Fitting a Distribution
All statistical distributions and models have parameters. The values given to these parameters determine the exact mathematical function involved. When we fit a distribution to data, we estimate good values for these parameters from the observed data.
Maximum Likelihood (MLE)
The likelihood of the model given observed data is just the probability of the data given the model. Maximum likelihood estimation is therefore equivalent to selecting the parameters of the model that maximize the probability of the observed data, under the assumption that the data was generated by the model.
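As a minimal sketch of this idea, assuming a Gaussian model (chosen here only for illustration), the parameters that maximize the likelihood have closed forms: the sample mean and the sample variance with an n (not n-1) denominator.

```python
import numpy as np

# MLE for a Gaussian model (assumed for illustration).
# The MLE picks the parameters that maximize the probability
# of the observed data; for a Gaussian these have closed forms.
data = np.array([2.1, 1.9, 2.4, 2.0, 1.6])

mu_mle = data.mean()       # MLE of the mean
sigma2_mle = data.var()    # MLE of the variance (divides by n, not n-1)

print(mu_mle, sigma2_mle)
```

Note that `np.var` defaults to the biased (divide-by-n) estimator, which is exactly the Gaussian MLE.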
Maximum A Posteriori (MAP)
Assume some form of prior distribution over the parameters of the model. This distribution represents our initial belief about the probability of possible parameter values before observing any data.
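A small sketch of MAP estimation, assuming a Beta prior on a coin's heads probability (the specific pseudo-counts are made up for illustration). The posterior of a Beta prior with Bernoulli data is Beta(a + heads, b + tails), and its mode is the MAP estimate.

```python
# MAP estimate for a coin's heads probability with a Beta prior.
# The Beta(a, b) prior encodes belief before seeing any data.
a, b = 2.0, 2.0          # prior pseudo-counts (assumed for illustration)
heads, tails = 7, 3      # observed data

# Mode of the posterior Beta(a + heads, b + tails),
# valid when both posterior parameters exceed 1.
theta_map = (a + heads - 1) / (a + heads + b + tails - 2)
print(theta_map)  # (2 + 7 - 1) / (2 + 7 + 2 + 3 - 2) = 8/12
```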
Bayes Estimate
The Bayes estimate chooses the parameters of the model based on the expected value of the posterior distribution p(M∣D).
The MAP estimate, in contrast, chooses the parameters at the maximum (mode) of p(M∣D).
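The contrast can be sketched on the same Beta-Bernoulli posterior (prior pseudo-counts again assumed for illustration): the Bayes estimate is the posterior mean, while the MAP estimate is the posterior mode.

```python
# Bayes (posterior-mean) estimate vs. MAP (posterior-mode) estimate
# for the posterior p(M|D) = Beta(a + heads, b + tails).
a, b = 2.0, 2.0          # prior pseudo-counts (assumed for illustration)
heads, tails = 7, 3      # observed data

post_a, post_b = a + heads, b + tails   # posterior Beta parameters

theta_bayes = post_a / (post_a + post_b)          # expected value of p(M|D)
theta_map = (post_a - 1) / (post_a + post_b - 2)  # maximum of p(M|D)
print(theta_bayes, theta_map)  # 9/14 vs. 8/12
```

The two estimates differ because the posterior is skewed; they coincide only when the posterior is symmetric about its mode.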
Dirichlet Distributions
The parameters of a Dirichlet distribution act as counts: they may record how often we have actually observed the categorical variable take each of its values. Complete ignorance of a categorical distribution's parameters is represented by a Dirichlet distribution whose parameters are all one.
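A minimal sketch of this count interpretation, assuming a 3-valued categorical variable with made-up observed counts: an all-ones Dirichlet encodes total ignorance, and adding the observed counts to it gives the posterior Dirichlet.

```python
import numpy as np

# Dirichlet prior over the parameters of a 3-valued categorical
# distribution. All-ones parameters encode complete ignorance.
alpha_prior = np.ones(3)         # uniform prior: total ignorance
counts = np.array([5, 1, 0])     # observed counts for each category

# Because the parameters act as counts, updating on data is
# just adding the observed counts to the prior parameters.
alpha_post = alpha_prior + counts

# Posterior mean of each category probability:
theta_mean = alpha_post / alpha_post.sum()
print(theta_mean)  # [6/9, 2/9, 1/9]
```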
Example
But if we have reason to think there is something special about the variable, we should encode this prior experience into our Dirichlet prior!
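As a sketch of encoding such prior experience (the specific pseudo-counts here are hypothetical), suppose past experience suggests a die is loaded toward six: we can give that outcome a larger pseudo-count instead of starting from all ones.

```python
import numpy as np

# Informative Dirichlet prior for a six-sided die believed to be
# loaded toward six (pseudo-counts are hypothetical).
alpha_informed = np.array([1, 1, 1, 1, 1, 10])  # assumed prior belief
rolls = np.array([2, 3, 1, 2, 2, 8])            # observed counts per face

# Posterior parameters: prior pseudo-counts plus observed counts.
alpha_post = alpha_informed + rolls
theta_mean = alpha_post / alpha_post.sum()
print(theta_mean)
```

Compared with an all-ones prior, the informed prior pulls the posterior probability of a six upward even with modest data.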