Markov chains appear all over computer science, mathematics, and AI. Just some of the applications:
Biology: birth-death processes, disease spreading
Biology: DNA/RNA/protein sequence analysis
Speech recognition
Control theory, filtering
What is the difference between a Markov chain and a Markov process?
In general, anything that satisfies the Markov property is called a Markov process, and the discrete case is usually called a Markov chain. In practice, the two terms can be used more or less interchangeably.
Assumption
The system is in one of a number of states: $S_i$
For each pair of states $S_i$ and $S_j$, we have some probability $p_{i,j} = P(S_j \mid S_i)$ of going from state $S_i$ to $S_j$
The probability depends only on the current state (or, more generally, on a finite number of previous states), not on the rest of the history
For example, if we model day-by-day weather as a Markov chain, then "today's" weather depends only on "yesterday's" weather; the weather of the day before yesterday and earlier is not taken into account.
This property is called the Markov property, or memorylessness.
As a result, we can describe the system as a sequence of states.
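Written out for the first-order case, the Markov property says that conditioning on the whole history is the same as conditioning on the most recent state:

$$P(X_{t+1} = S_j \mid X_t = S_i, X_{t-1}, \dots, X_0) = P(X_{t+1} = S_j \mid X_t = S_i) = p_{i,j}$$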
Questions you can ask of Markov Models
Given the transition probabilities, what is the most likely sequence of states?
If I let the system run for a long time, what is the probability distribution of the states that I will end up in? What is the most likely state I will end up in?
Given a sequence of states, how likely is that sequence based on our model?
Can I estimate transition probabilities from the data?
Divide the states into observable and hidden states. Given a sequence of observed states, what is the most likely sequence of hidden states that produced the observations?
Example - Rain or no Rain
Two states: sunny ($S = 0$) and raining ($R = 1$)
Initial probability distribution
50% chance of sunny, 50% chance of rain.
We need four numbers: $p(S \mid R), p(S \mid S), p(R \mid R), p(R \mid S)$
After one day, what is the probability that it is sunny?
After one day, what is the probability that it is rainy?
By the law of total probability,

$$p(\text{sunny tomorrow}) = p(S \mid S)\,p(S) + p(S \mid R)\,p(R), \qquad p(\text{rainy tomorrow}) = p(R \mid S)\,p(S) + p(R \mid R)\,p(R)$$

Therefore, we can interpret this result as a matrix-vector product: multiplying the vector of today's state probabilities by the matrix of transition probabilities gives tomorrow's distribution.
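As a concrete check, with assumed transition values (these notes do not record them), say $p(S \mid S) = 0.8$ and $p(S \mid R) = 0.4$, together with the 50/50 initial distribution:

$$p(\text{sunny after one day}) = 0.8 \times 0.5 + 0.4 \times 0.5 = 0.6, \qquad p(\text{rainy after one day}) = 0.2 \times 0.5 + 0.6 \times 0.5 = 0.4$$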
General Setting
Given $n$ states $(1, \dots, n)$, we have an $n \times n$ transition matrix $T$ with entries $T_{i,j} = p_{i,j}$
Given the initial distribution of states $S_0$ (as a row vector), after $n$ steps we end up in the distribution $S_n = S_0 T^n$
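A minimal sketch of this computation in Python/NumPy; the transition values here are assumptions for illustration, not values from the notes:

```python
import numpy as np

# Transition matrix T: T[i, j] = probability of moving from state i to state j.
# These numbers are illustrative assumptions, not values from the notes.
T = np.array([[0.8, 0.2],
              [0.4, 0.6]])

s0 = np.array([0.5, 0.5])  # initial distribution over the two states

# Distribution after n steps: S_n = S_0 T^n
n = 10
sn = s0 @ np.linalg.matrix_power(T, n)
print(sn)  # converges toward the stationary distribution as n grows
```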
Sampling from a Markov Chain
Useful for simulation
Sample the initial state: randomly pick a state according to the initial distribution
Calculate the new state probabilities from the transition matrix and sample again with those probabilities; repeat (see the code sketch after the example below)
Example
Two states A and B with initial probability distribution $(0.65, 0.35)$
and a transition matrix $T$ whose row for A is $(0.3, 0.7)$ (the row for B is not used in this walk-through)
Sampling from the initial distribution means that when you pick A or B, you have a 65% chance of picking A
Suppose you pick A; this gives the state vector $(1, 0)$
Calculate $(1, 0)\,T = (0.3, 0.7)$
Then sample states A and B with probabilities 0.3 and 0.7
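A minimal Python sketch of this sampling procedure; only the A row of the transition matrix appears above, so the B row here is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

states = ["A", "B"]
init = np.array([0.65, 0.35])      # initial distribution from the example
T = np.array([[0.3, 0.7],          # row for A, as in the example
              [0.4, 0.6]])         # row for B: an assumed value for illustration

def sample_chain(n_steps):
    """Draw a trajectory of length n_steps from the Markov chain."""
    path = [rng.choice(2, p=init)]                 # sample the initial state
    for _ in range(n_steps - 1):
        path.append(rng.choice(2, p=T[path[-1]]))  # sample next state given current
    return [states[i] for i in path]

print(sample_chain(10))
```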
Hidden Markov Models
The Markov models so far dealt only with states that can be observed. A Hidden Markov Model, by contrast, deals with a system in which states that cannot be observed satisfy the Markov property, and these hidden states influence the observations.
Applications
Speech recognition: You observe the sound (observation), and the hidden state is what word is being spoken
Error-correcting codes: You observe a noisy bit-stream (observation), and you want to infer the sent message (hidden state)
Language processing: Given a sentence "The band goes whoop", work out the probability that "whoop" is an adverb or a noun (find the hidden state) from its place in the sentence (observation)
Biological sequence analysis: Genetic sequences have lots of codes, start and stop sequences, protein codes, etc. You observe the raw sequence AGTAGAATAA (observation) and you want to know what it is doing (hidden state).
Example
Divide the world up into states (raining, R = 0, and sunny, S = 1) and observations (my mother takes my umbrella, U = 1, or does not take it, N = 0)
As before, the initial probability distribution is 50% rain, 50% sun.
Again we have a transition matrix for the states:
Unlike before, we also need to know the emission probabilities: for each state, the probability that the umbrella is taken or not
Suppose I observe for three days and see U,N,U. What is the probability of this sequence?
If I knew that the hidden states on those three days were R,S,R, then I could use the conditional probabilities:

$$p(U,N,U \mid R,S,R) = p(U \mid R)\,p(N \mid S)\,p(U \mid R)$$
For three days there are $2^3 = 8$ possible sequences of internal hidden states
But note that these hidden sequences are not equally likely, so each conditional term has to be weighted by the probability of its hidden sequence
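Putting the two pieces together, the probability of the observed sequence is a sum over all eight hidden sequences:

$$p(U,N,U) = \sum_{h_1 h_2 h_3 \in \{R,S\}^3} p(U \mid h_1)\,p(N \mid h_2)\,p(U \mid h_3)\;p(h_1)\,p(h_2 \mid h_1)\,p(h_3 \mid h_2)$$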
But this approach is impractical: for $n$ days there are $2^n$ hidden sequences. So we are going to use dynamic programming instead
By Bayes' theorem,

$$p(R_1 \mid U) = \frac{p(U \mid R_1)\,p(R_1)}{p(U)}, \qquad p(S_1 \mid U) = \frac{p(U \mid S_1)\,p(S_1)}{p(U)}$$

But we have the same factor $p(U)$ in each term, so we just calculate the numerators and rescale them to sum to 1.
Then how do we calculate $p(R_2 \mid UN)$? The Markov assumption is that the current state depends only on the previous state. We can get to $R_2$ from $R_1$ or $S_1$, and we have already calculated those probabilities:

$$p(R_2 \mid U) = p(R_2 \mid R_1)\,p(R_1 \mid U) + p(R_2 \mid S_1)\,p(S_1 \mid U)$$
Since the day-2 observation $N$ is only related to $R_2$,

$$p(R_2 \mid UN) \propto p(N \mid R_2)\,p(R_2 \mid U)$$
In addition, the same two steps give

$$p(S_2 \mid UN) \propto p(N \mid S_2)\,\bigl(p(S_2 \mid R_1)\,p(R_1 \mid U) + p(S_2 \mid S_1)\,p(S_1 \mid U)\bigr)$$
Therefore, each new day is handled by the same recursion: propagate yesterday's probabilities through the transitions, weight by today's observation, and normalize (writing $e_{1:t}$ for the observations up to day $t$):

$$p(X_t \mid e_{1:t}) \propto p(e_t \mid X_t) \sum_{x_{t-1}} p(X_t \mid x_{t-1})\,p(x_{t-1} \mid e_{1:t-1})$$
Also, applying the recursion once more with the day-3 observation $U$ gives $p(R_3 \mid UNU)$ and $p(S_3 \mid UNU)$.
Expressed in matrix form, this is

$$f_{t+1} \propto o_{t+1} * (T^{\top} f_t)$$

where $*$ is the element-wise product, $f_t$ is the column vector of state probabilities given the observations up to time $t$, $T$ is the transition matrix, and $o_{t+1}$ is the vector of probabilities of the time-$(t+1)$ observation under each state
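A minimal Python sketch of the forward recursion above; all of the probability values are assumptions for illustration, since the notes do not record them:

```python
import numpy as np

# All numbers below are illustrative assumptions; the notes do not record them.
T = np.array([[0.7, 0.3],    # p(next state | R), states ordered [R, S]
              [0.3, 0.7]])   # p(next state | S)
E = np.array([[0.9, 0.1],    # p(obs | R), observations ordered [U, N]
              [0.2, 0.8]])   # p(obs | S)
f = np.array([0.5, 0.5])     # initial distribution over [R, S]

obs = [0, 1, 0]  # the observed sequence U, N, U

# Forward recursion: f_{t+1} ∝ o_{t+1} * (T^T f_t)
for t, y in enumerate(obs):
    if t > 0:
        f = T.T @ f          # propagate through the transitions
    f = E[:, y] * f          # weight by the observation probability
    f /= f.sum()             # normalize away the common factor
    print(f"after day {t + 1}: p(R)={f[0]:.3f}, p(S)={f[1]:.3f}")
```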
Smoothing
Given that I know what state I am in now and I have the observations, what is the probability of the previous states? Different from the forward calculation, here we condition on later information and reason backward.
→ This can be solved with a combination of forward and backward reasoning.
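The standard way to write this (not spelled out in these notes) combines the forward message $f_k$ with a backward message $b_k$ that carries the evidence after time $k$:

$$p(X_k \mid e_{1:n}) \propto f_k * b_k, \qquad b_k(x) = p(e_{k+1:n} \mid X_k = x)$$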
Viterbi’s Algorithm
Given a sequence of observations, what is the most likely sequence of hidden states that gives rise to that sequence?
→ In other words, the goal is to find the hidden state sequence that best explains the given observations.
Basic idea
Suppose the most probable path from $S_1$ to $S_n$ passes through $S_k$
Then it is the concatenation of the most probable path from $S_1$ to $S_k$ and the most probable path from $S_k$ to $S_n$
→ That is, the strategy is to find the most probable path on each side of $S_k$ and then concatenate them (a code sketch follows below).
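A minimal Python sketch of Viterbi's algorithm on the umbrella example, reusing the same assumed numbers as the forward-algorithm sketch:

```python
import numpy as np

# Same illustrative (assumed) numbers as in the forward-algorithm sketch.
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])   # T[i, j] = p(state j tomorrow | state i today)
E = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # E[i, k] = p(observation k | state i)
init = np.array([0.5, 0.5])
states = ["R", "S"]
obs = [0, 1, 0]              # U, N, U

# delta[i] = probability of the best path ending in state i;
# back[t][i] = best predecessor of state i at step t+1.
delta = init * E[:, obs[0]]
back = []
for y in obs[1:]:
    scores = delta[:, None] * T          # scores[i, j]: best path to i, then i -> j
    back.append(scores.argmax(axis=0))   # best predecessor for each state j
    delta = scores.max(axis=0) * E[:, y]

# Trace the best final state back to the start.
path = [int(delta.argmax())]
for ptr in reversed(back):
    path.append(int(ptr[path[-1]]))
path.reverse()
print([states[i] for i in path])  # most likely hidden sequence for U, N, U
```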