A probability space is defined by a triplet (Ω, F, P):
Ω : Sample space
F : σ-algebra on Ω
P : Probability measure, a function F→[0,1]
Definition of Sample space
Set of all possible outcomes, where an outcome is the result of a single execution of the model
Definition of Event
A subset of the sample space Ω (more precisely, a subset that belongs to F)
Definition of Field
A collection F of subsets of Ω forms a field if the following 3 conditions hold.
(The members of F are the events on Ω.)
∅∈F,Ω∈F
If A∈F, then A^c∈F (closed under complement)
If A,B∈F, then A∪B∈F and A∩B∈F (closed under finite union and intersection)
Definition of σ-field
A collection F of subsets of Ω forms a σ-field if the following 3 conditions hold.
(Almost the same as the definition of a field, except for the 3rd condition)
∅∈F,Ω∈F
If A∈F, then A^c∈F (closed under complement)
If A_i∈F for i=1,2,…, then ⋃_{i=1}^{∞} A_i∈F and ⋂_{i=1}^{∞} A_i∈F (closed under countable union and intersection)
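The three conditions can be checked mechanically when Ω is finite. The sketch below is only an illustration: the function name is_field and the example collections are assumptions, not part of the notes. On a finite Ω every countable union reduces to a finite one, so a field on a finite Ω is also a σ-field.

```python
from itertools import combinations

def is_field(omega, collection):
    """Check the three field conditions for a collection of subsets of a finite omega."""
    omega = frozenset(omega)
    sets = {frozenset(s) for s in collection}
    # Condition 1: the empty set and omega both belong to the collection.
    if frozenset() not in sets or omega not in sets:
        return False
    # Condition 2: closed under complement.
    if any(omega - a not in sets for a in sets):
        return False
    # Condition 3: closed under pairwise (hence finite) union and intersection.
    return all((a | b) in sets and (a & b) in sets
               for a, b in combinations(sets, 2))

omega = {1, 2, 3, 4}
trivial = [set(), omega]
generated = [set(), {1, 2}, {3, 4}, omega]
not_closed = [set(), {1}, {2}, omega]  # complements {2,3,4} and {1,3,4} are missing

print(is_field(omega, trivial))     # True
print(is_field(omega, generated))   # True
print(is_field(omega, not_closed))  # False
```

The collection {∅, {1,2}, {3,4}, Ω} is the smallest field containing {1,2}, which is why it passes.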
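Why do we need to define a σ-field?
It is required to define probability formally: the σ-field F is the domain of P, i.e. the collection of events to which a probability is assigned.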
What is probability measure?
We say P : F→[0,1] is a probability measure if it satisfies the following 3 conditions (together with countable additivity, stated under the properties below).
P(∅)=0
P(Ω)=1
∀A∈F, P(A)≥0
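As a concrete check, the conditions can be verified on a small finite example. This is a minimal sketch assuming a fair six-sided die with the uniform measure P(A) = |A| / |Ω| and F taken to be the power set of Ω; the die and the names prob and power_set are illustrative choices, not from the notes.

```python
from fractions import Fraction
from itertools import chain, combinations

omega = frozenset(range(1, 7))  # outcomes of one roll of a fair die (illustrative)

def prob(event):
    """Uniform measure on omega: P(A) = |A| / |omega|."""
    event = frozenset(event)
    if not event <= omega:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(omega))

# Take F to be the power set of omega (all 2**6 = 64 subsets).
power_set = [frozenset(c) for c in chain.from_iterable(
    combinations(sorted(omega), r) for r in range(len(omega) + 1))]

print(prob(set()))                            # 0
print(prob(omega))                            # 1
print(all(prob(a) >= 0 for a in power_set))   # True
```

Using Fraction keeps every probability exact, so the checks do not depend on floating-point rounding.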
Properties of probability
Joint probability
P(A,B):=P(A∩B)
Marginal probability
P(A),P(B)
Independence
P(A∩B)=P(A)P(B)
Conditional probability
P(A∣B) := P(A∩B) / P(B), if P(B)>0
→ If A and B are independent, P(A∣B)=P(A)
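The definitions of independence and conditional probability can be tried out on a fair die. The sketch below assumes the uniform measure on Ω = {1,…,6}; the events and the helper names prob, cond_prob and independent are made up for illustration.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # one roll of a fair die (illustrative choice)

def prob(event):
    """Uniform measure: P(A) = |A| / |omega|."""
    return Fraction(len(frozenset(event) & omega), len(omega))

def cond_prob(a, b):
    """P(A | B) = P(A ∩ B) / P(B), defined only when P(B) > 0."""
    a, b = frozenset(a), frozenset(b)
    if prob(b) == 0:
        raise ValueError("P(A | B) is undefined when P(B) = 0")
    return prob(a & b) / prob(b)

def independent(a, b):
    """A and B are independent when P(A ∩ B) = P(A) P(B)."""
    a, b = frozenset(a), frozenset(b)
    return prob(a & b) == prob(a) * prob(b)

even = {2, 4, 6}
at_most_two = {1, 2}
at_most_three = {1, 2, 3}

print(independent(even, at_most_two))    # True
print(cond_prob(even, at_most_two))      # 1/2, equal to P(even)
print(independent(even, at_most_three))  # False
print(cond_prob(even, at_most_three))    # 1/3, differs from P(even)
```

The first pair shows the implication stated above: when the events are independent, conditioning on B leaves P(A) unchanged.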
Law of total probability (a.k.a. marginalization)
P(A) = ∑_{i=1}^{n} P(A∩B_i) = ∑_{i=1}^{n} P(A∣B_i)P(B_i)
where {B_1,…,B_n} is a partition of Ω
Note: marginalizing out unwanted variables is a basic operation when processing raw data
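Both forms of the law of total probability can be compared numerically. This is a small sketch on a fair die, where the event A and the three-block partition are arbitrary illustrative choices.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # one roll of a fair die (illustrative choice)

def prob(event):
    """Uniform measure: P(A) = |A| / |omega|."""
    return Fraction(len(frozenset(event) & omega), len(omega))

a = frozenset({2, 4, 6})  # event A: the roll is even
# B_1, B_2, B_3: disjoint blocks whose union is omega
partition = [frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6})]

# First form: sum of the joint probabilities P(A ∩ B_i).
via_joint = sum(prob(a & b) for b in partition)

# Second form: sum of P(A | B_i) P(B_i), with P(A | B_i) = P(A ∩ B_i) / P(B_i).
via_conditional = sum((prob(a & b) / prob(b)) * prob(b) for b in partition)

print(prob(a), via_joint, via_conditional)  # 1/2 1/2 1/2
```

Summing over the blocks B_i removes them from the result, which is what "marginalizing out" refers to.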
Countable additivity
For every countable collection {A_i}_{i∈I} of pairwise disjoint events,
P(⋃_{i∈I} A_i) = ∑_{i∈I} P(A_i)
Bayes' theorem (very important in ML)
P(A∣B) = P(B∣A)P(A) / P(B), if P(B)>0
→ In ML, a model typically explains P(x∣y)P(y) rather than P(y∣x) directly
→ TODO: summarize this
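To see how P(y∣x) is recovered from P(x∣y)P(y), here is a minimal sketch with two classes. The class names and all numbers are made up for illustration; the evidence P(x) is computed with the law of total probability.

```python
from fractions import Fraction

# P(y): prior over two hypothetical classes (made-up numbers).
prior = {"spam": Fraction(3, 10), "ham": Fraction(7, 10)}
# P(x | y): likelihood of one fixed observation x under each class (made-up numbers).
likelihood = {"spam": Fraction(4, 5), "ham": Fraction(1, 10)}

# Evidence P(x) = sum over y of P(x | y) P(y).
evidence = sum(likelihood[y] * prior[y] for y in prior)

# Posterior P(y | x) = P(x | y) P(y) / P(x).
posterior = {y: likelihood[y] * prior[y] / evidence for y in prior}

print(evidence)           # 31/100
print(posterior["spam"])  # 24/31
print(posterior["ham"])   # 7/31
```

The posterior values sum to 1 because dividing by the evidence normalizes the products P(x∣y)P(y).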
What is Support?
First, we need to distinguish between the support of a random variable and the support of its distribution
The support of a random variable X
The support of a random variable X is the set of all values that X can take on with non-zero probability (discrete case) or non-zero probability density (continuous case).
When X is a discrete r.v.
The support is the set of all values x with P(X=x)>0
When X is a continuous r.v with probability density function f(x)
The support is the set of all values of x for which f(x) is non-zero
The support of a probability distribution
The support of a probability distribution is the set of all values for which the probability mass function (discrete case) or probability density function (continuous case) is non-zero.
When f(x) is a discrete probability distribution
The support is the set of all values for which the probability mass function is non-zero
When f(x) is a continuous probability distribution
The support is the set of all values of x for which f(x) is non-zero.
Why do we need to distinguish the two concepts?
The support of a probability distribution can be different from the support of the random variable generated by that distribution.
For example, if X is generated by a uniform distribution over the interval (-1, 1), then its probability density function f(x) is:
f(x) = 1/2 if −1<x<1
f(x) = 0 otherwise
In this case, f(x) is defined on the whole real line, but it is non-zero only on the interval (-1, 1). The set of values that X can take is therefore the interval (-1, 1), which is bounded (finite in length) even though it contains infinitely many points. So the statement "X has a finite support" is true in the sense that the support is bounded, even though f(x) is defined on an infinite domain.
Continuous Probability
→ the probability of any single point is always zero
→ probabilities are measured over intervals
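Both points can be illustrated with the uniform distribution on (-1, 1) from the support example. The sketch below hard-codes its density and cumulative distribution function; the function names are illustrative, and the closed-form CDF (x + 1) / 2 on (-1, 1) follows from integrating the density 1/2.

```python
def pdf(x):
    """Density of the uniform distribution on (-1, 1)."""
    return 0.5 if -1 < x < 1 else 0.0

def cdf(x):
    """P(X <= x), obtained by integrating the density from -1 up to x."""
    if x <= -1:
        return 0.0
    if x >= 1:
        return 1.0
    return (x + 1) / 2

def prob_interval(a, b):
    """P(a < X <= b) = cdf(b) - cdf(a)."""
    return cdf(b) - cdf(a)

print(prob_interval(-0.5, 0.5))  # 0.5
print(prob_interval(0.3, 0.3))   # 0.0, a single point carries no probability
print(pdf(0.3), pdf(2.0))        # 0.5 0.0
```

Note that pdf(0.3) is 0.5 while P(X = 0.3) is 0: a density value is not itself a probability.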
Bernoulli Distribution
Bernoulli distribution with parameter p∈[0,1]
Binomial Distribution
A binomial random variable can be interpreted as the sum of n independent Bernoulli random variables.
E[X] = np
Var[X] = np(1−p)
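The "sum of n independent Bernoulli random variables" interpretation can be checked exactly by convolving Bernoulli PMFs. In this sketch the values n = 5 and p = 1/3 are arbitrary, the helper names are illustrative, and the closed form C(n,k) p^k (1−p)^(n−k) is the standard binomial PMF, which the notes do not state explicitly.

```python
from fractions import Fraction
from math import comb

def bernoulli_pmf(p):
    """PMF of a Bernoulli(p) variable: P(X=1) = p, P(X=0) = 1 - p."""
    return {0: 1 - p, 1: p}

def convolve(pmf_a, pmf_b):
    """PMF of the sum of two independent integer-valued random variables."""
    out = {}
    for x, px in pmf_a.items():
        for y, py in pmf_b.items():
            out[x + y] = out.get(x + y, 0) + px * py
    return out

def sum_of_bernoullis(n, p):
    """PMF of X_1 + ... + X_n for independent Bernoulli(p) variables."""
    pmf = {0: Fraction(1)}  # the empty sum equals 0 with probability 1
    for _ in range(n):
        pmf = convolve(pmf, bernoulli_pmf(p))
    return pmf

n, p = 5, Fraction(1, 3)
pmf = sum_of_bernoullis(n, p)
closed_form = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

mean = sum(k * pk for k, pk in pmf.items())
var = sum((k - mean) ** 2 * pk for k, pk in pmf.items())

print(pmf == closed_form)      # True
print(mean, n * p)             # 5/3 5/3
print(var, n * p * (1 - p))    # 10/9 10/9
```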
Beta Distribution
Beta distribution with parameters α,β>0
f(x) = x^{α−1}(1−x)^{β−1} / B(α,β) for 0<x<1, where B(α,β) is the beta function (the normalizing constant)
The Beta distribution is often used to model the parameter p of a Bernoulli distribution
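A quick numerical check of the Beta density: the sketch below assumes the normalizing constant B(α,β) = Γ(α)Γ(β) / Γ(α+β) and uses a simple midpoint rule, so the results are approximations. The parameter values α = 2, β = 5 and the helper names are arbitrary illustrative choices.

```python
from math import gamma

def beta_pdf(x, alpha, beta):
    """Beta density x^(alpha-1) (1-x)^(beta-1) / B(alpha, beta) on (0, 1)."""
    if not 0 < x < 1:
        return 0.0
    norm = gamma(alpha) * gamma(beta) / gamma(alpha + beta)  # B(alpha, beta)
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / norm

def midpoint_integral(f, n=20_000):
    """Approximate the integral of f over (0, 1) with the midpoint rule."""
    h = 1.0 / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h

alpha, beta = 2.0, 5.0
total = midpoint_integral(lambda x: beta_pdf(x, alpha, beta))
mean = midpoint_integral(lambda x: x * beta_pdf(x, alpha, beta))

print(round(total, 6))  # close to 1: the density integrates to one
print(round(mean, 6))   # close to alpha / (alpha + beta) = 2/7
```

Because the support is (0, 1), every sample from a Beta distribution is a valid value for a Bernoulli parameter p.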