Lecture on the Hidden Markov Model technique in image processing (graduate program)
Slide 1: Hidden Markov Models
Ankur Jain Y7073
Slide 2: What is Covered
• Observable Markov Model
• Hidden Markov Model
• Evaluation problem
• Decoding Problem
Slide 3: Markov Models
• Set of states: {s1, s2, ..., sN}
• Markov chain property: the probability of each subsequent state depends only on the previous state:
  P(sik | si1, si2, ..., sik-1) = P(sik | sik-1)
• To define a Markov model, transition probabilities aij = P(si | sj) and initial probabilities πi = P(si) have to be specified
• The output of the process is the set of states at each instant of time
Slide 4: Calculation of sequence probability
• By the Markov chain property, the probability of a state sequence can be found by the formula:
  P(si1, si2, ..., sik) = P(sik | si1, si2, ..., sik-1) · P(si1, si2, ..., sik-1)
                        = P(sik | sik-1) · P(si1, si2, ..., sik-1)
                        = ...
                        = P(sik | sik-1) · P(sik-1 | sik-2) · ... · P(si2 | si1) · P(si1)
Slide 5: Example of Markov Model
• Two states: ‘Rain’ and ‘Dry’
[Figure: state transition diagram between ‘Rain’ and ‘Dry’ with transition probabilities]
• Suppose we want to calculate the probability of a sequence of states in our example, {‘Dry’,’Dry’,’Rain’,’Rain’}
  P({‘Dry’,’Dry’,’Rain’,’Rain’}) = ??
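The probability asked for above follows directly from the Slide 4 chain rule. A minimal Python sketch, using illustrative transition and initial probabilities (assumed values standing in for the diagram's numbers, which are not fully recoverable here):

```python
# Sequence probability for a plain (observable) Markov model.
# NOTE: the numbers below are illustrative assumptions, not necessarily
# the values shown in the slide's transition diagram.
initial = {'Rain': 0.4, 'Dry': 0.6}                        # P(first state)
transition = {('Dry', 'Dry'): 0.8, ('Dry', 'Rain'): 0.2,   # P(current | previous)
              ('Rain', 'Dry'): 0.7, ('Rain', 'Rain'): 0.3}

def sequence_probability(states):
    """P(s_i1,...,s_ik) = P(s_i1) * product over k of P(s_ik | s_ik-1)."""
    p = initial[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= transition[(prev, cur)]
    return p

print(sequence_probability(['Dry', 'Dry', 'Rain', 'Rain']))
```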
Slide 6: Hidden Markov Models
• The observation is a probabilistic function (discrete or continuous) of a state, instead of being in one-to-one correspondence with the state
• Each state randomly generates one of M observations (or visible states): {v1, v2, ..., vM}
• To define a hidden Markov model, the following probabilities have to be specified: matrix of transition probabilities A=(aij), aij = P(si | sj); matrix of observation probabilities B=(bi(vm)), bi(vm) = P(vm | si); and a vector of initial probabilities π=(πi), πi = P(si). The model is represented by M=(A, B, π).
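For the later sketches it helps to fix a concrete representation of M=(A, B, π). A minimal Python/NumPy sketch, using the weather example's labels and assumed numbers; note it uses the row-stochastic convention A[i, j] = P(sj | si):

```python
import numpy as np

# M = (A, B, pi) for a discrete HMM with N hidden states and M observation symbols.
# Labels and probabilities are illustrative assumptions for the 'Low'/'High' example.
states  = ['Low', 'High']           # hidden states s_i
symbols = ['Dry', 'Rain']           # observable symbols v_m

A  = np.array([[0.3, 0.7],          # A[i, j] = P(s_j at t+1 | s_i at t)
               [0.2, 0.8]])
B  = np.array([[0.6, 0.4],          # B[i, m] = b_i(v_m) = P(v_m | s_i)
               [0.4, 0.6]])
pi = np.array([0.4, 0.6])           # pi[i]   = P(s_i at t = 1)
```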
Slide 7: HMM Assumptions
• Markov assumption: the state transition depends only on the origin and destination states
• Output-independence assumption: each observation frame depends only on the state that generated it, not on neighbouring observation frames
Slide 8: Example of Hidden Markov Model
[Figure: hidden-state transition diagram for ‘Low’ and ‘High’ with observation probabilities for ‘Dry’ and ‘Rain’]
Slide 9: Example of Hidden Markov Model
• Two states: ‘Low’ and ‘High’ atmospheric pressure
• Two observations: ‘Rain’ and ‘Dry’
Slide 10: Calculation of observation sequence probability
• Suppose we want to calculate the probability of a sequence of observations in our example, {‘Dry’,’Rain’}
• Consider all possible hidden state sequences:
  P({‘Dry’,’Rain’}) = P({‘Dry’,’Rain’}, {‘Low’,’Low’}) + P({‘Dry’,’Rain’}, {‘Low’,’High’}) + P({‘Dry’,’Rain’}, {‘High’,’Low’}) + P({‘Dry’,’Rain’}, {‘High’,’High’})
  where the first term is:
  P({‘Dry’,’Rain’}, {‘Low’,’Low’}) = P({‘Dry’,’Rain’} | {‘Low’,’Low’}) · P({‘Low’,’Low’}) = P(‘Dry’|’Low’) · P(‘Rain’|’Low’) · P(‘Low’) · P(‘Low’|’Low’)
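A minimal sketch of this brute-force evaluation in Python/NumPy, summing P(O, S) over every hidden state sequence; the parameter values are the same assumed ones as in the Slide 6 sketch:

```python
import itertools
import numpy as np

def brute_force_evaluate(A, B, pi, obs):
    """P(O) = sum over all hidden state sequences S of P(O, S); exponential in len(obs)."""
    n = len(pi)
    total = 0.0
    for path in itertools.product(range(n), repeat=len(obs)):
        p = pi[path[0]] * B[path[0], obs[0]]                   # initial state + its emission
        for t in range(1, len(obs)):
            p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]  # transition + emission
        total += p
    return total

# Assumed parameters (same conventions as the Slide 6 sketch); obs=[0, 1] is {'Dry','Rain'}.
A  = np.array([[0.3, 0.7], [0.2, 0.8]])
B  = np.array([[0.6, 0.4], [0.4, 0.6]])
pi = np.array([0.4, 0.6])
print(brute_force_evaluate(A, B, pi, obs=[0, 1]))
```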
Slide 11: Main issues using HMMs
• Evaluation problem: given the HMM M=(A, B, π) and the observation sequence O=o1 o2 ... oK, calculate the probability that model M has generated sequence O
• Decoding problem: given the HMM M=(A, B, π) and the observation sequence O=o1 o2 ... oK, calculate the most likely sequence of hidden states that produced this observation sequence O
• Learning problem: given some training observation sequences O=o1 o2 ... oK and the general structure of the HMM (numbers of hidden and visible states), determine the HMM parameters M=(A, B, π) that best fit the training data
Here O=o1 ... oK denotes a sequence of observations, ok ∈ {v1,…,vM}.
Slide 12: Word recognition example (1)
• Typed word recognition, assuming all characters are separated
• The character recognizer outputs the probability of the image being a particular character, P(image | character)
[Figure: hidden state (character) → observation (character image), with recognizer scores such as 0.5, 0.31, 0.03, 0.005 for candidate characters a, b, c, …, z]
Slide 13: Word recognition example (2)
• Hidden states of the HMM = characters
• Observations = typed images of characters segmented from the image. Note that there is an infinite number of observations
• Observation probabilities = character recognizer scores
• Transition probabilities will be defined differently in the two following approaches
Slide 14: Word recognition example (3)
• If a lexicon is given, we can construct a separate HMM model for each lexicon word
[Figure: separate word HMMs with their transition probabilities]
• Here, recognition of a word image is equivalent to the problem of evaluating a few HMM models
• This is an application of the Evaluation problem
Slide 15: Word recognition example (4)
• We can construct a single HMM for all words
• Hidden states = all characters in the alphabet
• Transition probabilities and initial probabilities are calculated from a language model
• Observations and observation probabilities are as before
• This is an application of the Decoding problem
Slide 16: Evaluation Problem
• Evaluation problem: given the HMM M=(A, B, π) and the observation sequence O=o1 o2 ... oK, calculate the probability that model M has generated sequence O
• Direct evaluation: find the probability of observations O=o1 o2 ... oT by considering all hidden state sequences S:
  P(o1 o2 ... oT) = Σ_S P(o1 o2 ... oT , S)          {S is a state sequence}
  P(o1 o2 ... oT) = Σ_S P(o1 o2 ... oT | S) · P(S)
  P(S) = P(s1) · P(s2 | s1) · ... · P(sT | sT-1)      {Markov property}
Slide 17: Evaluation Problem
• There are N^T hidden state sequences - exponential complexity
• Use the Forward-Backward HMM algorithm for efficient calculation
• Define the forward variable αk(i) as the joint probability of the partial observation sequence o1 o2 ... ok and that the hidden state at time k is si: αk(i) = P(o1 o2 ... ok , qk = si)
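A minimal sketch of the forward recursion (Python/NumPy), under the same assumed array conventions as the earlier sketches; α is filled one time step at a time, and P(O) is the sum of its last row:

```python
import numpy as np

def forward(A, B, pi, obs):
    """Forward variables alpha[k, i] = P(o_1..o_k, q_k = s_i); returns (alpha, P(O))."""
    K, N = len(obs), len(pi)
    alpha = np.zeros((K, N))
    alpha[0] = pi * B[:, obs[0]]                      # initialization
    for k in range(1, K):
        alpha[k] = (alpha[k - 1] @ A) * B[:, obs[k]]  # recursion: sum over previous states
    return alpha, alpha[-1].sum()                     # termination: P(O) = sum_i alpha_K(i)

# e.g. forward(A, B, pi, obs=[0, 1]) with the arrays from the Slide 6 sketch
```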
Slide 20: Evaluation Problem
• Define the backward variable βk(i) as the probability of the partial observation sequence ok+1 ok+2 ... oK given that the hidden state at time k is si: βk(i) = P(ok+1 ok+2 ... oK | qk = si)
Slide 21: Evaluation Problem
• αk(i) · βk(i) = P(o1 o2 ... oK , qk = si)
• P(o1 o2 ... oK) = Σi αk(i) · βk(i)
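A matching sketch of the backward recursion, with a comment restating the Slide 21 identity (same assumed conventions; forward() refers to the earlier sketch):

```python
import numpy as np

def backward(A, B, pi, obs):
    """Backward variables beta[k, i] = P(o_{k+1}..o_K | q_k = s_i)."""
    K, N = len(obs), len(pi)
    beta = np.ones((K, N))                              # initialization: beta_K(i) = 1
    for k in range(K - 2, -1, -1):
        beta[k] = A @ (B[:, obs[k + 1]] * beta[k + 1])  # recursion: sum over next states
    return beta

# Identity check: for every k, sum_i alpha_k(i) * beta_k(i) equals P(O), e.g.
#   alpha, p = forward(A, B, pi, obs)
#   assert np.allclose((alpha * backward(A, B, pi, obs)).sum(axis=1), p)
```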
Slide 22: Decoding problem
• We want to find the state sequence Q = q1…qK which maximizes P(Q | o1 o2 ... oK), or equivalently P(Q , o1 o2 ... oK)
• Brute-force consideration of all paths takes exponential time. Use the efficient Viterbi algorithm instead
• Define the variable δk(i) as the maximum probability of producing the observation sequence o1 o2 ... ok when moving along any hidden state sequence q1… qk-1 and getting into qk = si:
  δk(i) = max P(q1… qk-1 , qk = si , o1 o2 ... ok)
  where the max is taken over all possible paths q1… qk-1
Slide 23: Viterbi algorithm (1)
• General idea: if the best path ending in qk = sj goes through qk-1 = si, then it should coincide with the best path ending in qk-1 = si
Slide 24: Viterbi algorithm (2)
• Termination: choose the best path ending at time K: maxi [ δK(i) ]
• Backtrack the best path
• This algorithm is similar to the forward recursion of the evaluation problem, with Σ replaced by max and additional backtracking
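A minimal Viterbi sketch under the same assumed conventions; delta is the slide's δ variable and psi holds the backpointers used in the backtracking step:

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Most likely hidden state sequence for obs; returns (best_path, its probability)."""
    K, N = len(obs), len(pi)
    delta = np.zeros((K, N))           # delta[k, i] = max prob of a path ending in state i at k
    psi = np.zeros((K, N), dtype=int)  # psi[k, i]   = best predecessor of state i at time k
    delta[0] = pi * B[:, obs[0]]
    for k in range(1, K):
        scores = delta[k - 1][:, None] * A           # scores[i, j]: end in i at k-1, move to j
        psi[k] = scores.argmax(axis=0)
        delta[k] = scores.max(axis=0) * B[:, obs[k]]
    path = [int(delta[-1].argmax())]                 # termination: best final state
    for k in range(K - 1, 0, -1):                    # backtracking
        path.append(int(psi[k][path[-1]]))
    return path[::-1], delta[-1].max()

# e.g. viterbi(A, B, pi, obs=[0, 1]) with the arrays from the Slide 6 sketch
```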
Slide 25: Learning problem
• The most difficult of the three problems, because there is no known analytical method that maximizes the joint probability of the training data
• Solved by the Baum-Welch algorithm (also known as the forward-backward algorithm), an instance of the EM (Expectation-Maximization) algorithm
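As a rough illustration of what one Baum-Welch re-estimation step does, here is a compact sketch for a single short observation sequence, assuming the same array conventions as the earlier sketches (unscaled forward/backward variables, so it is not suitable for long sequences):

```python
import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One EM (Baum-Welch) re-estimation step; returns updated (A, B, pi)."""
    K, N = len(obs), len(pi)
    obs = np.asarray(obs)
    # E-step: forward/backward variables (unscaled).
    alpha = np.zeros((K, N)); beta = np.ones((K, N))
    alpha[0] = pi * B[:, obs[0]]
    for k in range(1, K):
        alpha[k] = (alpha[k - 1] @ A) * B[:, obs[k]]
    for k in range(K - 2, -1, -1):
        beta[k] = A @ (B[:, obs[k + 1]] * beta[k + 1])
    p_obs = alpha[-1].sum()
    gamma = alpha * beta / p_obs                # gamma[k, i] = P(q_k = s_i | O)
    xi = np.zeros((K - 1, N, N))                # xi[k, i, j] = P(q_k = s_i, q_k+1 = s_j | O)
    for k in range(K - 1):
        xi[k] = alpha[k][:, None] * A * (B[:, obs[k + 1]] * beta[k + 1])[None, :] / p_obs
    # M-step: re-estimate parameters from expected counts.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.stack([gamma[obs == m].sum(axis=0) for m in range(B.shape[1])], axis=1)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_A, new_B, new_pi
```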
Slide 26: HMM & Neural network
• Predicting protein structures
• Fault detection in ground antennas
Slide 27
• What is hidden in an HMM?
• How many states do I think my model needs?
• How many possible observations do I think there are?
• What kind of topology do I think my state transition graph should have?
Slide 28: Thank You