2. Review on Probability Theory

Probability Space
Definition of Probability space

A probability space is defined by a triplet $(\Omega, \mathcal F, \mathcal P)$:

$\Omega$: sample space

$\mathcal F$: $\sigma$-algebra on $\Omega$

$\mathcal P$: probability measure, $\mathcal F \to [0, 1]$

  • Definition of Sample space

    Set of all possible outcomes, where an outcome is the result of a single execution of the model

  • Definition of Event

    A subset of the sample space $\Omega$

  • Definition of Field

    A collection $\mathcal F$ of subsets of $\Omega$ forms a field if the following 3 conditions hold.

    (Equivalently, $\mathcal F$ is a collection of events on $\Omega$ that satisfies these conditions.)

    1. $\emptyset \in \mathcal F$ and $\Omega \in \mathcal F$
    2. If $A \in \mathcal F$, then $A^c \in \mathcal F$
    3. If $A, B \in \mathcal F$, then $A \cup B \in \mathcal F$ and $A \cap B \in \mathcal F$ (closed under finite union and intersection)
  • Definition of $\sigma$-field

    A collection $\mathcal F$ of subsets of $\Omega$ forms a $\sigma$-field if the following 3 conditions hold.

    (Almost the same as the definition of a field, except for the 3rd condition)

    1. $\emptyset \in \mathcal F$ and $\Omega \in \mathcal F$
    2. If $A \in \mathcal F$, then $A^c \in \mathcal F$
    3. If $A_i \in \mathcal F$ for every $i = 1, 2, \dots$, then $\bigcup_{i=1}^\infty A_i \in \mathcal F$ and $\bigcap_{i=1}^\infty A_i \in \mathcal F$ (closed under countable union and intersection)
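On a finite sample space every countable union reduces to a finite one, so the field conditions and the $\sigma$-field conditions coincide there. The sketch below checks the three conditions for a collection of subsets of a small finite $\Omega$. The function name `is_field` and the example sets are illustrative assumptions, not part of the notes.

```python
from itertools import combinations

def is_field(omega, collection):
    """Check the three field conditions for subsets of a finite omega."""
    omega = frozenset(omega)
    sets = {frozenset(s) for s in collection}
    # Condition 1: the empty set and omega belong to the collection.
    if frozenset() not in sets or omega not in sets:
        return False
    # Condition 2: closed under complement.
    if any((omega - a) not in sets for a in sets):
        return False
    # Condition 3: closed under pairwise (hence finite) union and intersection.
    return all((a | b) in sets and (a & b) in sets
               for a, b in combinations(sets, 2))

omega = {1, 2, 3, 4}
good = [set(), {1, 2}, {3, 4}, omega]
bad = [set(), {1}, {2}, omega]  # the complements of {1} and {2} are missing

print(is_field(omega, good))  # True
print(is_field(omega, bad))   # False
```

The second collection fails at condition 2, since $\{1\}^c = \{2, 3, 4\}$ is not in it.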
Why we need to define a $\sigma$-field

It is needed to define probability formally: the probability measure $\mathcal P$ assigns a probability only to the sets in $\mathcal F$.

What is a probability measure?

We say $\mathcal P: \mathcal F \to [0,1]$ is a probability measure if it satisfies the following 3 conditions, together with countable additivity over pairwise disjoint events (stated among the properties below).

  1. $\mathcal P(\emptyset) = 0$
  2. $\mathcal P(\Omega) = 1$
  3. $\forall A \in \mathcal F,\ \mathcal P(A) \ge 0$
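As a sanity check, the conditions can be verified on a small finite example. The sketch below assumes a fair six-sided die, takes $\mathcal F$ to be the power set of $\Omega$, and builds $\mathcal P$ by summing outcome weights. The helper names `power_set` and `make_measure` are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations

def power_set(omega):
    """All subsets of a finite set, used here as the sigma-field F."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]

def make_measure(weights):
    """Build P(A) as the sum of the weights of the outcomes in A (finite omega only)."""
    return lambda event: sum((weights[w] for w in event), Fraction(0))

# Assumed example: a fair six-sided die.
omega = frozenset(range(1, 7))
P = make_measure({w: Fraction(1, 6) for w in omega})
F = power_set(omega)

assert P(frozenset()) == 0          # P(empty set) = 0
assert P(omega) == 1                # P(omega) = 1
assert all(P(A) >= 0 for A in F)    # non-negativity

# Additivity on two disjoint events.
evens, odds = frozenset({2, 4, 6}), frozenset({1, 3, 5})
assert P(evens | odds) == P(evens) + P(odds)
print(P(evens))  # 1/2
```

Exact fractions are used instead of floats so that the equalities hold without rounding error.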

Properties of probability

Joint probability: $P(A, B) := P(A \cap B)$

Marginal probabilities: $P(A)$, $P(B)$

Independence: $A$ and $B$ are independent if $P(A \cap B) = P(A)P(B)$

Conditional probability: $P(A \mid B) := \frac{P(A \cap B)}{P(B)}$ if $P(B) \ne 0$

→ If $A$ and $B$ are independent, $P(A \mid B) = P(A)$
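The definitions of conditional probability and independence can be checked by enumeration. The sketch below assumes two fair dice with a uniform measure on the 36 outcomes. The events `A`, `B`, `C` and the helpers `prob` and `cond_prob` are illustrative choices.

```python
from fractions import Fraction
from itertools import product

# Assumed example: two fair dice, all 36 outcomes equally likely.
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under the uniform measure on omega."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) != 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be non-zero")
    return prob(a & b) / prob(b)

A = {w for w in omega if w[0] == 6}            # first die shows 6
B = {w for w in omega if w[1] % 2 == 0}        # second die is even
C = {w for w in omega if w[0] + w[1] >= 10}    # sum is at least 10

print(prob(A & B) == prob(A) * prob(B))  # True: A and B are independent
print(cond_prob(A, B) == prob(A))        # True: so P(A | B) = P(A)
print(cond_prob(A, C), prob(A))          # 1/2 1/6: A and C are dependent
```

Knowing that the sum is at least 10 raises the probability of a 6 on the first die from 1/6 to 1/2, which is what dependence means here.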

Law of total probability: $P(A) = \sum_{i = 1}^n P(A \cap B_i) = \sum_{i = 1}^n P(A \mid B_i)P(B_i)$

  • $\{B_i\}_{i = 1}^n$ is a partition of $\Omega$ with $P(B_i) > 0$ for every $i$

Marginalizing out unwanted variables in this way is a basic operation when processing raw data
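A short numerical sketch of marginalization via the law of total probability. The scenario (three machines partitioning production, with $A$ the event "item is defective") and all numbers are made up for illustration.

```python
from fractions import Fraction

# Assumed numbers: B1, B2, B3 partition omega (which machine made the item).
p_b = {"B1": Fraction(1, 2), "B2": Fraction(3, 10), "B3": Fraction(1, 5)}   # P(B_i)
p_a_given_b = {"B1": Fraction(1, 100),                                      # P(A | B_i)
               "B2": Fraction(2, 100),
               "B3": Fraction(5, 100)}

assert sum(p_b.values()) == 1  # the probabilities of a partition sum to 1

# Joint probabilities P(A and B_i) = P(A | B_i) P(B_i).
joint = {b: p_a_given_b[b] * p_b[b] for b in p_b}

# Marginalize B out: P(A) = sum_i P(A and B_i).
p_a = sum(joint.values())
print(p_a)  # 21/1000
```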

For every countable collection $\{A_i\}_{i \in I}$ of pairwise disjoint events,

$P(\bigcup_{i \in I} A_i) = \sum_{i \in I} P(A_i)$

Bayes' rule: $P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)}$ if $P(B) \ne 0$

→ In ML, a model typically describes $P(x \mid y)P(y)$ rather than $P(y \mid x)$ directly; Bayes' rule converts the former into the latter.
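The step from $P(x \mid y)P(y)$ to $P(y \mid x)$ can be sketched for a discrete label $y$ as follows. The spam-filter setting, the probabilities, and the function name `posterior` are hypothetical.

```python
from fractions import Fraction

def posterior(prior, likelihood, x):
    """Return P(y | x) for every y, given P(y) and P(x | y)."""
    joint = {y: likelihood[y][x] * prior[y] for y in prior}   # P(x | y) P(y)
    evidence = sum(joint.values())                            # P(x), by total probability
    if evidence == 0:
        raise ValueError("P(x) must be non-zero")
    return {y: joint[y] / evidence for y in joint}

# Assumed example: y is the class, x is whether a keyword appears.
prior = {"spam": Fraction(1, 5), "ham": Fraction(4, 5)}
likelihood = {
    "spam": {"keyword": Fraction(3, 5), "no_keyword": Fraction(2, 5)},
    "ham": {"keyword": Fraction(1, 20), "no_keyword": Fraction(19, 20)},
}

post = posterior(prior, likelihood, "keyword")
print(post["spam"])  # 3/4
```

The denominator $P(x)$ is obtained by summing the joint over all classes, so only the prior and the likelihood need to be modeled.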

Support

First, we need to distinguish between the support of a random variable and the support of its distribution.

The support of a random variable $X$ is the set of all possible values that $X$ can take on with non-zero probability (or non-zero density, in the continuous case).

  • When $X$ is a discrete r.v.

    The support is the set of all values that $X$ takes with non-zero probability

  • When $X$ is a continuous r.v. with probability density function $f(x)$

    The support is the set of all values of $x$ for which $f(x)$ is non-zero

The support of a probability distribution is the set of all values for which its probability mass function or probability density function is non-zero.

  • When $f(x)$ is a discrete probability distribution

    The support is the set of all values for which the probability mass function is non-zero

  • When $f(x)$ is a continuous probability distribution

    The support is the set of all values of $x$ for which $f(x)$ is non-zero.

The support should not be confused with the domain on which the density is defined: a density can be defined on the whole real line while the random variable it generates has bounded support.

For example, if $X$ is generated by a uniform distribution over the interval $(-1, 1)$, then its probability density function $f(x)$ is:

$f(x) = \begin{cases} 1/2 & -1 < x < 1 \\ 0 & \text{otherwise} \end{cases}$

In this case, $f(x)$ is defined on the entire real line, but it is non-zero only on the interval $(-1, 1)$ and zero elsewhere. So $X$ has bounded support, since it can only take on values in $(-1, 1)$. The statement "$X$ has a finite support" is therefore true in the sense of bounded support, even though the domain of $f(x)$ is infinite.
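The definitions can be illustrated with a small sketch: the support of a discrete distribution read off its mass function, and the uniform density on $(-1, 1)$ evaluated at points inside and outside the interval. The helper names `pmf_support` and `uniform_pdf` and the example PMF are assumptions made for illustration.

```python
def pmf_support(pmf):
    """Support of a discrete distribution: values with non-zero probability mass."""
    return {x for x, p in pmf.items() if p > 0}

def uniform_pdf(x, low=-1.0, high=1.0):
    """Density of the uniform distribution on (low, high); zero elsewhere."""
    return 1.0 / (high - low) if low < x < high else 0.0

# Assumed PMF that lists a value (3) carrying zero mass.
print(pmf_support({1: 0.5, 2: 0.5, 3: 0.0}))  # {1, 2}

# The density can be evaluated at any real x but is non-zero only inside (-1, 1).
for x in (-2.0, -0.5, 0.0, 0.99, 1.0, 5.0):
    print(x, uniform_pdf(x))
```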

Continuous Probability

→ the probability of any single point is always zero

→ probabilities are measured over intervals
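Both statements follow from computing probabilities as differences of the cumulative distribution function. The sketch below assumes a uniform distribution on $(-1, 1)$. The function names are illustrative.

```python
def uniform_cdf(x, low=-1.0, high=1.0):
    """CDF of the uniform distribution on (low, high)."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

def interval_prob(a, b):
    """P(a <= X <= b) = F(b) - F(a) for a continuous random variable."""
    return uniform_cdf(b) - uniform_cdf(a)

print(interval_prob(0.0, 0.5))   # 0.25: probability of an interval
print(interval_prob(0.3, 0.3))   # 0.0: a single point has probability zero
```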

Bernoulli Distribution

Bernoulli distribution with parameter $p \in [0,1]$
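A minimal sketch of a Bernoulli random variable, using the standard mass function $P(X = 1) = p$, $P(X = 0) = 1 - p$ (not written out in these notes). The value $p = 0.3$, the seed, and the function names are arbitrary choices.

```python
import random

def bernoulli_pmf(x, p):
    """P(X = x) for X ~ Bernoulli(p): p if x == 1, 1 - p if x == 0."""
    if x not in (0, 1):
        return 0.0
    return p if x == 1 else 1.0 - p

def bernoulli_sample(p, rng):
    """Draw one sample: 1 with probability p, else 0."""
    return 1 if rng.random() < p else 0

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
p = 0.3
samples = [bernoulli_sample(p, rng) for _ in range(10_000)]

print(bernoulli_pmf(1, p), bernoulli_pmf(0, p))
print(sum(samples) / len(samples))  # empirical frequency of 1s, close to p
```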

Binomial Distribution

A binomial random variable with parameters $n$ and $p$ can be interpreted as the sum of $n$ independent Bernoulli random variables that share the same parameter $p$.

$E[X] = np$

$\mathrm{Var}[X] = np(1-p)$
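The mean and variance can be checked numerically, both from the standard binomial mass function $\binom{n}{k} p^k (1-p)^{n-k}$ (not stated in these notes) and by simulating sums of Bernoulli draws. The values $n = 10$, $p = 0.3$ and the seed are arbitrary.

```python
import random
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Exact moments computed from the mass function.
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
print(mean, n * p)            # both approximately 3.0
print(var, n * p * (1 - p))   # both approximately 2.1

# Simulation: each draw is a sum of n independent Bernoulli(p) variables.
rng = random.Random(0)
draws = [sum(rng.random() < p for _ in range(n)) for _ in range(5_000)]
sim_mean = sum(draws) / len(draws)
print(sim_mean)               # close to n * p
```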

Beta Distribution

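Beta distribution with parameters $\alpha, \beta > 0$

$f(x) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1}(1-x)^{\beta - 1}$ for $x \in [0, 1]$, where $B(\alpha, \beta)$ is the normalizing constant (the beta function)

The Beta distribution is often used to model the parameter $p$ of a Bernoulli distribution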
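A sketch that evaluates the normalized Beta density $x^{\alpha-1}(1-x)^{\beta-1} / B(\alpha, \beta)$, with $B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$, and checks numerically that it integrates to 1 over $[0, 1]$ and has mean $\alpha/(\alpha+\beta)$. The parameters $\alpha = 2$, $\beta = 5$ and the crude midpoint integrator are assumptions made for illustration.

```python
from math import gamma

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x; endpoints are treated as zero for simplicity."""
    if not 0.0 < x < 1.0:
        return 0.0
    norm = gamma(a) * gamma(b) / gamma(a + b)   # beta function B(a, b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / norm

def integrate(f, lo=0.0, hi=1.0, steps=10_000):
    """Midpoint-rule integral, a crude numerical check only."""
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h

a, b = 2.0, 5.0  # assumed parameters
total = integrate(lambda x: beta_pdf(x, a, b))
mean = integrate(lambda x: x * beta_pdf(x, a, b))

print(total)              # approximately 1
print(mean, a / (a + b))  # both approximately 0.2857
```

Because the density lives on $[0, 1]$, its values can be read as degrees of belief about a Bernoulli parameter $p$.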

Multivariate Gaussian PDF

