
2. Review on Probability Theory



Probability Space

Definition of Probability space

A probability space is defined by the triplet $(\Omega, \mathcal F, \mathcal P)$:

$\Omega$ : Sample space

$\mathcal F$ : $\sigma$ -algebra on $\Omega$

$\mathcal P$ : Probability measure, $\mathcal F \to [0, 1]$
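Here is a minimal sketch of the triplet for one roll of a fair six-sided die, which is an illustrative assumption and not an example from the source. $\Omega$ is the set of six faces. $\mathcal F$ is the power set of $\Omega$. $\mathcal P$ assigns each event its share of the equally likely outcomes. The helper `power_set` is hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations

# Sample space: the six faces of one die roll (illustrative assumption)
omega = frozenset(range(1, 7))

def power_set(s):
    """All subsets of s; on a finite sample space this is the largest sigma-algebra."""
    items = list(s)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return {frozenset(c) for c in subsets}

F = power_set(omega)  # sigma-algebra: here every subset is an event

def P(event):
    """Probability measure of a fair die: |event| / |omega|."""
    if event not in F:
        raise ValueError("event is not in the sigma-algebra F")
    return Fraction(len(event), len(omega))

even = frozenset({2, 4, 6})
print(len(F))   # 64
print(P(even))  # 1/2
```

Exact `Fraction` values are used so that probabilities compare exactly rather than up to floating-point error.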

Definition of Sample space
The set of all possible outcomes, where an outcome is the result of a single execution of the model.

Definition of Event
A subset of the sample space $\Omega$ (formally, an element of $\mathcal F$).

Definition of Field
A collection $\mathcal F$ of subsets of $\Omega$ forms a field if the following 3 conditions hold.
(The members of $\mathcal F$ are the events on $\Omega$.)
1. $\emptyset \in \mathcal F, \Omega \in \mathcal F$
2. $\text{If }A \in \mathcal F, \text{ then } A^c \in \mathcal F$
3. $\text{If }A, B \in \mathcal F, \text{ then } A \cup B \in \mathcal F \text{ and } A \cap B \in \mathcal F$ (closed under finite union and intersection)

Definition of $\sigma$ -field
A collection $\mathcal F$ of subsets of $\Omega$ forms a $\sigma$-field if the following 3 conditions hold.
(This is the same as the definition of a field except for the 3rd condition.)
1. $\emptyset \in \mathcal F, \Omega \in \mathcal F$
2. $\text{If }A \in \mathcal F, \text{ then } A^c \in \mathcal F$
3. $\text{If }A_i \in \mathcal F \ (i = 1, 2, \dots), \text{ then } \bigcup_{i = 1}^\infty A_i\in \mathcal F \text{ and } \bigcap_{i = 1}^\infty A_i \in \mathcal F$ (closed under countable union and intersection)
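The closure conditions can be checked mechanically on a finite sample space. A collection $\mathcal F$ on a finite $\Omega$ is itself finite, so any countable union or intersection involves only finitely many distinct sets. On such a space the field conditions and the $\sigma$-field conditions coincide. The sketch below relies on that fact. The function name `is_field` and the example collections are hypothetical.

```python
from itertools import combinations

def is_field(omega, F):
    """Check the field conditions for a collection F of subsets of a finite omega.

    Because omega is finite, this also checks the sigma-field conditions.
    """
    omega = frozenset(omega)
    F = {frozenset(A) for A in F}
    # Condition 1: the empty set and omega belong to F
    if frozenset() not in F or omega not in F:
        return False
    # Condition 2: closed under complement
    if any(omega - A not in F for A in F):
        return False
    # Condition 3: closed under pairwise union and intersection (hence finite ones)
    for A, B in combinations(F, 2):
        if A | B not in F or A & B not in F:
            return False
    return True

omega = {1, 2, 3, 4}
print(is_field(omega, [set(), {1, 2}, {3, 4}, omega]))  # True
print(is_field(omega, [set(), {1}, {2}, omega]))        # False: {1}^c = {2, 3, 4} is missing
```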

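Why do we need to define a $\sigma$-field?

It is required to define probability formally. A probability measure must be countably additive, so its domain has to be closed under complements and countable unions, which is exactly what a $\sigma$-field guarantees.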

What is a probability measure?

We say $\mathcal P : \mathcal F \to [0,1]$ is a probability measure if it satisfies the following 3 conditions, together with countable additivity (stated below).

$\mathcal P(\emptyset) = 0$

$\mathcal P(\Omega) = 1$

$\forall A\in \mathcal F, \mathcal P(A) \ge 0$
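Here is a small sketch that checks these conditions for a measure built from point masses on a finite $\Omega$, with $\mathcal F$ taken to be the power set. It also checks additivity for disjoint pairs of events, because on a finite space countable additivity reduces to finite additivity. The helpers `make_measure` and `is_probability_measure` and the biased-coin weights are hypothetical illustrations.

```python
from fractions import Fraction
from itertools import chain, combinations

def power_set(s):
    items = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]

def make_measure(weights):
    """P(A) = sum of the point masses of the outcomes in A (hypothetical helper)."""
    def P(A):
        return sum((weights[w] for w in A), Fraction(0))
    return P

def is_probability_measure(omega, P):
    events = power_set(omega)
    return (
        P(frozenset()) == 0                    # P(empty set) = 0
        and P(frozenset(omega)) == 1           # P(omega) = 1
        and all(P(A) >= 0 for A in events)     # non-negativity
        # additivity on disjoint pairs (countable additivity reduces to this on a finite space)
        and all(P(A | B) == P(A) + P(B) for A, B in combinations(events, 2) if not A & B)
    )

coin = {"H": Fraction(3, 4), "T": Fraction(1, 4)}
print(is_probability_measure(coin.keys(), make_measure(coin)))  # True
bad = {"H": Fraction(3, 4), "T": Fraction(1, 2)}
print(is_probability_measure(bad.keys(), make_measure(bad)))    # False: P(omega) = 5/4
```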

Properties of probability

Joint probability

$P(A, B) := P(A \cap B)$

Marginal probability

$P(A), P(B)$ : the probability of each event on its own, without regard to the other

Independence

A and B are independent if $P(A\cap B)= P(A)P(B)$
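Here is an illustrative sketch of joint probability, marginal probability and independence. It uses two fair coin flips, which is an assumed example. Events are written as predicates on outcomes. The event "first flip is heads" is independent of "second flip is heads" but not of "at least one heads".

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair coin flips, each of the 4 outcomes equally likely (illustrative assumption)
omega = list(product("HT", repeat=2))
p = {w: Fraction(1, 4) for w in omega}

def P(event):
    """Probability of an event given as a predicate on outcomes."""
    return sum((p[w] for w in omega if event(w)), Fraction(0))

A = lambda w: w[0] == "H"              # first flip is heads
B = lambda w: w[1] == "H"              # second flip is heads
C = lambda w: "H" in w                 # at least one heads
both_heads = lambda w: A(w) and B(w)   # the joint event "A and B"

print(P(both_heads))                              # joint P(A, B): 1/4
print(P(A), P(B))                                 # marginals: 1/2 1/2
print(P(both_heads) == P(A) * P(B))               # True: A and B are independent
print(P(lambda w: A(w) and C(w)) == P(A) * P(C))  # False: A and C are dependent
```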

Conditional probability

$P(A|B) := \frac{P(A\cap B)}{P(B)} \text{ if } P(B) \ne 0$

→ If A and B are independent (and $P(B) \ne 0$), then $P(A|B) = P(A)$
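Here is a minimal sketch of the definition for one roll of a fair die, which is an assumed example. The helper `cond` is hypothetical. It refuses to condition on an event of probability zero, matching the $P(B) \ne 0$ requirement. The events "even" and "at most 4" happen to be independent, so conditioning on the second leaves the probability of the first unchanged.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # one roll of a fair die (illustrative assumption)

def P(event):
    return Fraction(len(event & omega), len(omega))

def cond(A, B):
    """P(A | B) = P(A and B) / P(B); undefined when P(B) = 0."""
    if P(B) == 0:
        raise ZeroDivisionError("P(B) = 0, so P(A | B) is undefined")
    return P(A & B) / P(B)

six, even, low = frozenset({6}), frozenset({2, 4, 6}), frozenset({1, 2, 3, 4})
print(cond(six, even))                      # 1/3
print(P(even & low) == P(even) * P(low))    # True: "even" and "at most 4" are independent
print(cond(even, low) == P(even))           # True: so conditioning does not change P(even)
```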

Law of total probability (a.k.a. marginalization)

$P(A) = \sum_{i = 1}^n P(A\cap B_i) = \sum_{i = 1}^nP(A|B_i)P(B_i)$

where $\{B_i\}_{i = 1}^n$ is a partition of $\Omega$


Marginalizing out unwanted variables is a basic operation when processing raw data.
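Here is a sketch of marginalization on a two-variable joint table. The weather/traffic variables and all the probabilities below are made up for illustration. Summing out one variable gives the marginal of the other. The same number also comes out of $\sum_i P(A|B_i)P(B_i)$, with the weather values playing the role of the partition $\{B_i\}$.

```python
from fractions import Fraction as Fr

# Hypothetical joint distribution P(Weather, Traffic); the numbers are made up for illustration
joint = {
    ("sunny", "light"): Fr(4, 10), ("sunny", "heavy"): Fr(2, 10),
    ("rainy", "light"): Fr(1, 10), ("rainy", "heavy"): Fr(3, 10),
}

def marginal(joint, axis):
    """Sum out every variable except the one at position `axis`."""
    out = {}
    for outcome, p in joint.items():
        out[outcome[axis]] = out.get(outcome[axis], 0) + p
    return out

p_weather = marginal(joint, 0)  # sunny: 3/5, rainy: 2/5
p_traffic = marginal(joint, 1)  # light: 1/2, heavy: 1/2

# Law of total probability: P(heavy) = sum over weather w of P(heavy | w) P(w)
p_heavy = sum(joint[(w, "heavy")] / p_weather[w] * p_weather[w] for w in p_weather)
print(p_heavy == p_traffic["heavy"])  # True
```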

Countable additivity

For every countable collection $\{A_i\}_{i \in I}$ of pairwise disjoint events,

$P(\bigcup_{i \in I}A_i) = \sum_{i\in I}P(A_i)$
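Countable additivity matters when $\Omega$ is infinite. Here is an illustrative sketch using $\Omega = \{1, 2, 3, \dots\}$ with $P(\{k\}) = 2^{-k}$, a geometric distribution chosen as an assumed example. The singletons are pairwise disjoint and their union is $\Omega$, so their probabilities must sum to 1. In the same way, the even numbers, as a disjoint union of $\{2\}, \{4\}, \dots$, get probability $\sum_k 4^{-k} = 1/3$. The code can only take finite partial sums.

```python
from fractions import Fraction

def p_point(k):
    """P({k}) = 2**-k on omega = {1, 2, 3, ...} (an illustrative geometric distribution)."""
    return Fraction(1, 2 ** k)

# The remaining mass after n terms is exactly 2**-n, which shrinks to 0
for n in (5, 10, 20):
    partial = sum(p_point(k) for k in range(1, n + 1))
    print(n, float(1 - partial))

# Even numbers: disjoint union of {2}, {4}, ...; the partial sum approaches 1/3
evens = sum(p_point(k) for k in range(2, 61, 2))
print(abs(float(evens) - 1 / 3) < 1e-15)  # True
```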

Bayes' theorem (very important in ML)

$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$

→ In ML, a model typically describes $P(x|y)P(y)$ rather than $P(y|x)$ directly; Bayes' theorem converts the former into the latter

→ To be organized later
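Here is a minimal sketch of that conversion. A model specifies a prior $P(y)$ and a likelihood $P(x|y)$, and Bayes' theorem yields the posterior $P(y|x)$. The evidence $P(x)$ comes from the law of total probability. The spam/ham classes, the words and every probability below are hypothetical values chosen for illustration. The function `posterior` is not from any library.

```python
from fractions import Fraction as Fr

def posterior(prior, likelihood, x):
    """Bayes' theorem: P(y | x) = P(x | y) P(y) / P(x), with P(x) = sum_y P(x | y) P(y)."""
    unnormalized = {y: likelihood[y][x] * prior[y] for y in prior}
    evidence = sum(unnormalized.values())  # law of total probability
    return {y: v / evidence for y, v in unnormalized.items()}

# Hypothetical generative model: class prior P(y) and word likelihood P(x | y)
prior = {"spam": Fr(1, 5), "ham": Fr(4, 5)}
likelihood = {
    "spam": {"free": Fr(1, 2), "meeting": Fr(1, 2)},
    "ham": {"free": Fr(1, 10), "meeting": Fr(9, 10)},
}

post = posterior(prior, likelihood, "free")
print(post["spam"], post["ham"])  # 5/9 4/9
```

Even with a small prior on "spam", observing a word that is five times more likely under "spam" pushes the posterior above one half.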

What is Support?

First, we need to distinguish between the support of a random variable and the support of its distribution.

The support of a random variable X

The support of a random variable $X$ is the set of all values that $X$ can take on with non-zero probability (for a continuous $X$, with non-zero probability density).

When $X$ is a discrete r.v.
The support is the set of all values $x$ with $P(X = x) > 0$.

When $X$ is a continuous r.v. with probability density function $f(x)$
The support is the set of all values of $x$ for which $f(x)$ is non-zero.
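For the discrete case, here is a one-line sketch. Given a probability mass function stored as a dictionary, the support keeps only the values with positive probability. The pmf below is a hypothetical example that deliberately lists a value with probability zero.

```python
from fractions import Fraction

def discrete_support(pmf):
    """Values x with P(X = x) > 0; pmf maps each value to its probability."""
    return {x for x, p in pmf.items() if p > 0}

# Hypothetical pmf that also lists a value (3) with zero probability
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(0)}
print(sorted(discrete_support(pmf)))  # [0, 1, 2]
```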

The support of a probability distribution

The support of a probability distribution is the set of all values of the random variable for which the probability density (or mass) function is non-zero.

When the distribution is discrete, with probability mass function $f(x)$
The support is the set of all values of $x$ for which $f(x)$ is non-zero.

When the distribution is continuous, with probability density function $f(x)$
The support is the set of all values of $x$ for which $f(x)$ is non-zero.

Why do we need to distinguish the two concepts?

The support of a probability distribution can be different from the support of the random variable generated by that distribution.

For example, if $X$ is generated by a uniform distribution over the interval $(-1, 1)$, then its probability density function $f(x)$ is:

$$f(x) = \begin{cases} 1/2 & \text{if } -1 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$

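In this case, $f(x)$ is non-zero exactly on $(-1, 1)$, and $X$ can only take on values in $(-1, 1)$, so under the definitions above both supports are the same interval. That interval is bounded (it has finite length) but contains infinitely many points. The statement "X has a finite support" is therefore true only in the sense of boundedness, and mixing up these two meanings of "finite" is a common source of confusion. The two notions genuinely differ when the support of a distribution is defined measure-theoretically as the smallest closed set with probability 1. For this uniform distribution that set is the closed interval $[-1, 1]$, which contains the endpoints $\pm 1$ even though $f(\pm 1) = 0$ and $P(X = \pm 1) = 0$.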
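Here is a small numerical sketch of the uniform example. The density is non-zero exactly on the open interval $(-1, 1)$. That interval already carries probability 1, so adding the endpoints does not change any probability. Sampled values stay inside the bounded interval. The helpers `uniform_pdf` and `uniform_cdf` are written out by hand for illustration. Only `random.uniform` comes from the Python standard library, and it is documented to return values in $[a, b]$.

```python
import random

def uniform_pdf(x, a=-1.0, b=1.0):
    """Density of Uniform(a, b): 1 / (b - a) on the open interval (a, b), 0 elsewhere."""
    return 1.0 / (b - a) if a < x < b else 0.0

def uniform_cdf(x, a=-1.0, b=1.0):
    """P(X <= x) for X ~ Uniform(a, b)."""
    return min(max((x - a) / (b - a), 0.0), 1.0)

# {x : f(x) != 0} is the open interval (-1, 1): the endpoints get density 0
print(uniform_pdf(-1.0), uniform_pdf(0.0), uniform_pdf(1.0))  # 0.0 0.5 0.0

# P(-1 <= X <= 1) = F(1) - F(-1) = 1, the same as P(-1 < X < 1) for a continuous X
print(uniform_cdf(1.0) - uniform_cdf(-1.0))  # 1.0

# Infinitely many possible values, yet every sample lies in the bounded interval [-1, 1]
random.seed(0)
samples = [random.uniform(-1.0, 1.0) for _ in range(10_000)]
print(all(-1.0 <= s <= 1.0 for s in samples))  # True
```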