Fitting a Distribution
All statistical distributions and models have parameters. The values given to these parameters determine the exact mathematical function involved. When we fit a distribution to data, we estimate good values for these parameters from the observed data.
Maximum Likelihood (MLE)
The likelihood of the model given observed data is just the probability of the data given the model. Maximum likelihood estimation is therefore equivalent to selecting the parameters of the model that maximize the probability of the observed data, under the assumption that the data was generated by the model.
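As a minimal sketch of this idea, assuming a Gaussian model (chosen here only for illustration), the parameters that maximize the likelihood have closed forms: the sample mean and the sample variance with an n (not n-1) denominator.

```python
import numpy as np

# MLE for a Gaussian model (assumed for illustration).
# The MLE picks the parameters that maximize the probability
# of the observed data; for a Gaussian these have closed forms.
data = np.array([2.1, 1.9, 2.4, 2.0, 1.6])

mu_mle = data.mean()       # MLE of the mean
sigma2_mle = data.var()    # MLE of the variance (divides by n, not n-1)

print(mu_mle, sigma2_mle)
```

Note that `np.var` defaults to the biased (divide-by-n) estimator, which is exactly the Gaussian MLE.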
Maximum A Posteriori (MAP)
Assume some form of prior distribution over the parameters of the model. This distribution represents our initial belief about the probability of possible parameter values before observing any data.
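A small sketch of MAP estimation, assuming a Beta prior on a coin's heads probability (the specific pseudo-counts are made up for illustration). The posterior of a Beta prior with Bernoulli data is Beta(a + heads, b + tails), and its mode is the MAP estimate.

```python
# MAP estimate for a coin's heads probability with a Beta prior.
# The Beta(a, b) prior encodes belief before seeing any data.
a, b = 2.0, 2.0          # prior pseudo-counts (assumed for illustration)
heads, tails = 7, 3      # observed data

# Mode of the posterior Beta(a + heads, b + tails),
# valid when both posterior parameters exceed 1.
theta_map = (a + heads - 1) / (a + heads + b + tails - 2)
print(theta_map)  # (2 + 7 - 1) / (2 + 7 + 2 + 3 - 2) = 8/12
```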
Bayes Estimate
The Bayes estimate chooses the parameters of the model based on the expected value of the posterior distribution p(M∣D).
The MAP estimate, in contrast, chooses the parameters at the maximum (mode) of p(M∣D).
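The contrast can be sketched on the same Beta-Bernoulli posterior (prior pseudo-counts again assumed for illustration): the Bayes estimate is the posterior mean, while the MAP estimate is the posterior mode.

```python
# Bayes (posterior-mean) estimate vs. MAP (posterior-mode) estimate
# for the posterior p(M|D) = Beta(a + heads, b + tails).
a, b = 2.0, 2.0          # prior pseudo-counts (assumed for illustration)
heads, tails = 7, 3      # observed data

post_a, post_b = a + heads, b + tails   # posterior Beta parameters

theta_bayes = post_a / (post_a + post_b)          # expected value of p(M|D)
theta_map = (post_a - 1) / (post_a + post_b - 2)  # maximum of p(M|D)
print(theta_bayes, theta_map)  # 9/14 vs. 8/12
```

The two estimates differ because the posterior is skewed; they coincide only when the posterior is symmetric about its mode.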
Dirichlet Distributions
The parameters of a Dirichlet distribution act as counts: they may record how often we have actually observed the categorical variable take each of its values. Complete ignorance of a categorical distribution's parameters is represented by a Dirichlet distribution whose parameters are all one.
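A minimal sketch of this count interpretation, assuming a 3-valued categorical variable with made-up observed counts: an all-ones Dirichlet encodes total ignorance, and adding the observed counts to it gives the posterior Dirichlet.

```python
import numpy as np

# Dirichlet prior over the parameters of a 3-valued categorical
# distribution. All-ones parameters encode complete ignorance.
alpha_prior = np.ones(3)         # uniform prior: total ignorance
counts = np.array([5, 1, 0])     # observed counts for each category

# Because the parameters act as counts, updating on data is
# just adding the observed counts to the prior parameters.
alpha_post = alpha_prior + counts

# Posterior mean of each category probability:
theta_mean = alpha_post / alpha_post.sum()
print(theta_mean)  # [6/9, 2/9, 1/9]
```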
Example
But if we have reason to think there is something special about the variable, we should encode this prior experience into our Dirichlet prior!
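As a sketch of encoding such prior experience (the specific pseudo-counts here are hypothetical), suppose past experience suggests a die is loaded toward six: we can give that outcome a larger pseudo-count instead of starting from all ones.

```python
import numpy as np

# Informative Dirichlet prior for a six-sided die believed to be
# loaded toward six (pseudo-counts are hypothetical).
alpha_informed = np.array([1, 1, 1, 1, 1, 10])  # assumed prior belief
rolls = np.array([2, 3, 1, 2, 2, 8])            # observed counts per face

# Posterior parameters: prior pseudo-counts plus observed counts.
alpha_post = alpha_informed + rolls
theta_mean = alpha_post / alpha_post.sum()
print(theta_mean)
```

Compared with an all-ones prior, the informed prior pulls the posterior probability of a six upward even with modest data.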