
HIDDEN MARKOV MODEL (HMM)
Anshul Jaiswal - (MCA/45017/23)
Introduction
Hidden Markov Models (HMMs) are particularly effective for POS tagging, as they model sequences and predict each tag based on the context of previous tags.
A Hidden Markov Model (HMM) is a statistical model that represents systems with unobserved (hidden) states. In an HMM, we can observe certain outputs, but the actual sequence of states that generated those outputs is hidden.

Introduction to POS Tagging
POS tagging assigns a grammatical category (e.g., noun, verb, adjective) to each word in a sentence. It is crucial for understanding sentence structure and meaning.
Challenges in POS Tagging: Some words may serve multiple functions depending on context. For example, “run” can be a noun or a verb.
Importance of Context: POS tagging algorithms rely on the surrounding words to determine the correct tag for ambiguous words.
Goal of HMM POS Tagging
Objective
Given a sentence, determine the sequence of tags
T = t_1, t_2, …, t_n
that maximizes the likelihood of tags given the sequence of words.

Sequence of Words
Let
W = w_1, w_2, …, w_n
represent the words in the sentence.

Probability Maximization
The goal is to find the tag sequence T that maximizes the
conditional probability P(T | W).
Applying Bayes’ Theorem
Bayes’ Theorem: Expresses the probability of the tag sequence given the words as P(T | W) = P(W | T) × P(T) / P(W).
Simplified for HMM: Since P(W) is constant for all possible tag sequences, we focus on maximizing P(W | T) × P(T).
Interpretation: This expression combines two probabilities:
Prior Probability P(T): Likelihood of a tag sequence occurring.
Likelihood P(W | T): Likelihood of observing the words given the tag sequence.
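Spelled out, with T* denoting the chosen tag sequence, the simplification is:

T* = argmax_T P(T | W) = argmax_T [ P(W | T) × P(T) / P(W) ] = argmax_T P(W | T) × P(T)

Dropping P(W) is safe because it is the same for every candidate tag sequence T.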
Using the Chain Rule for Probability Calculation
Application of Chain Rule
P(T) × P(W | T) = ∏_{i=1}^{n} P(w_i | t_i) × P(t_i | t_{i-1})

Transition Probability P(t_i | t_{i-1})
Probability of moving from one tag to the next (e.g., noun to verb).

Emission Probability P(w_i | t_i)
Probability of seeing a word given a specific tag (e.g., probability of “run” as a verb).

Contextual Example
For “The dog barks,” the chain rule lets us score how likely each candidate tag assignment is by multiplying the corresponding transition and emission probabilities, as sketched below.
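A minimal Python sketch of this scoring, using made-up probability values and a hypothetical "<s>" start marker; a real tagger estimates these numbers from an annotated corpus, as shown later in the slides.

```python
# Toy illustration of the chain-rule scoring above. The probability values
# are invented purely for illustration.

transition = {   # P(t_i | t_{i-1}); "<s>" marks the start of the sentence
    ("<s>", "DET"): 0.6, ("DET", "NOUN"): 0.7, ("NOUN", "VERB"): 0.5,
}
emission = {     # P(w_i | t_i)
    ("the", "DET"): 0.4, ("dog", "NOUN"): 0.05, ("barks", "VERB"): 0.02,
}

def score(words, tags):
    """Product over i of P(w_i | t_i) * P(t_i | t_{i-1})."""
    prob, prev = 1.0, "<s>"
    for w, t in zip(words, tags):
        prob *= emission.get((w, t), 0.0) * transition.get((prev, t), 0.0)
        prev = t
    return prob

# 0.6 * 0.4 * 0.7 * 0.05 * 0.5 * 0.02
print(score(["the", "dog", "barks"], ["DET", "NOUN", "VERB"]))
```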
SIMPLIFYING ASSUMPTIONS FOR HMM POS TAGGING
Emission Independence: Assumes that each word’s likelihood depends only on its corresponding tag, not on other words.
P(w_i | w_1, …, w_{i-1}, t_1, …, t_i) ≈ P(w_i | t_i)
Transition Independence: Assumes each tag depends only on the immediately preceding tag.
P(t_i | t_{i-1}, …, t_1) ≈ P(t_i | t_{i-1})
Resulting Formula
T* = argmax_T ∏_{i=1}^{n} P(w_i | t_i) × P(t_i | t_{i-1})
Benefit: These assumptions reduce the model’s complexity, making computation feasible.
Calculating Transition and Emission Probabilities
Transition Probability P(t_i | t_{i-1}):
P(t_i | t_{i-1}) = count(t_{i-1}, t_i) / count(t_{i-1})
Interpretation: Likelihood of a tag following a specific previous tag (e.g., likelihood of a verb following a noun).
Emission Probability P(w_i | t_i):
P(w_i | t_i) = count(w_i, t_i) / count(t_i)
Interpretation: Likelihood of a word given a tag (e.g., “dog” as a noun).
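A minimal sketch of these count-based estimates in Python, assuming a tiny made-up tagged corpus and a hypothetical "<s>" start marker; real systems train on large annotated corpora and typically add smoothing for unseen word/tag pairs.

```python
from collections import Counter

# Tiny made-up tagged corpus, purely for illustration.
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("run", "NOUN"), ("ends", "VERB")],
]

tag_counts, pair_counts, bigram_counts = Counter(), Counter(), Counter()
for sentence in corpus:
    prev = "<s>"                         # start-of-sentence marker
    for word, tag in sentence:
        tag_counts[tag] += 1             # count(t_i)
        pair_counts[(word, tag)] += 1    # count(w_i, t_i)
        bigram_counts[(prev, tag)] += 1  # count(t_{i-1}, t_i)
        prev = tag

def transition(prev_tag, tag):
    """P(t_i | t_{i-1}) = count(t_{i-1}, t_i) / count(t_{i-1})."""
    total = sum(c for (p, _), c in bigram_counts.items() if p == prev_tag)
    return bigram_counts[(prev_tag, tag)] / total if total else 0.0

def emission(word, tag):
    """P(w_i | t_i) = count(w_i, t_i) / count(t_i)."""
    return pair_counts[(word, tag)] / tag_counts[tag] if tag_counts[tag] else 0.0

print(transition("DET", "NOUN"))  # 1.0 in this toy corpus
print(emission("dog", "NOUN"))    # 0.5 in this toy corpus
```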
Steps in HMM POS Tagging
Training Phase
Using a Labeled Dataset: The model
learns transition and emission
probabilities from annotated text.
Goal: Build a probability matrix for
each tag transition and word-tag
pair.

Inference Phase
POS Tagging: The algorithm
identifies the most likely POS tags
for each word in a sentence.
Dynamic Programming: Algorithms
like the Viterbi algorithm calculate
the most probable tag sequence by
maximizing the product of transition
and emission probabilities.
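A compact Viterbi sketch in Python, under the independence assumptions above; the tag list and the transition/emission helpers (like the hypothetical ones in the training sketch) are assumptions for illustration, not part of the original slides.

```python
# Illustrative Viterbi: finds the tag sequence that maximizes the product of
# transition and emission probabilities. transition(prev, tag) and
# emission(word, tag) are assumed to behave like the hypothetical helpers
# defined in the training sketch; "<s>" marks the start of the sentence.

def viterbi(words, tags, transition, emission):
    # best[i][t] = (probability of the best path ending in tag t at word i, previous tag)
    best = [{t: (transition("<s>", t) * emission(words[0], t), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            prob, prev = max(
                (best[i - 1][p][0] * transition(p, t) * emission(words[i], t), p)
                for p in tags
            )
            column[t] = (prob, prev)
        best.append(column)

    # Backtrack from the highest-scoring tag of the last word.
    last = max(best[-1], key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return list(reversed(path))

# Example with the toy tables from the earlier sketches:
# viterbi(["the", "dog", "barks"], ["DET", "NOUN", "VERB"], transition, emission)
# -> ["DET", "NOUN", "VERB"]
```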
Limitations
Simplified Dependency Assumption: HMMs assume each tag depends only on the previous tag. This limits the model’s ability to capture more complex dependencies.
Reliance on Annotated Training Data: HMMs require substantial labeled data to accurately estimate transition and emission probabilities, which can be time-consuming and resource-intensive to obtain.
Performance on Long Dependencies: HMMs are less effective in capturing dependencies between non-adjacent tags, which might affect accuracy in complex sentences.
Conclusion
Effectiveness: Hidden Markov Models (HMMs) provide a
reliable and interpretable framework for POS tagging by
capturing sequential dependencies through transition and
emission probabilities.
Application: Widely used in natural language processing
tasks like speech recognition, machine translation, and text
analysis.
Limitations: HMMs assume each tag depends only on the
previous tag, which may not capture complex dependencies
in natural language. Additionally, their performance relies
on high-quality annotated training data.
Thank You
