Week 3
Language Modelling: Advanced Smoothing Models
Pawan Goyal
CSE, IITKGP
Week 3: Lecture 1
Advanced smoothing algorithms
Some Examples
Good-Turing
Kneser-Ney
N_c: frequency of frequency c
Example Sentences
<s>I am here </s>
<s>who am I </s>
<s>I would like </s>
Computing Nc
word    count
I       3
am      2
here    1
who     1
would   1
like    1

N1 = 4, N2 = 1, N3 = 1
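A small Python sketch (not from the slides) that reproduces this frequency-of-frequency table from the three example sentences; variable names are illustrative:

```python
from collections import Counter

# Unigram counts from the three example sentences (sentence markers excluded)
sentences = [["I", "am", "here"], ["who", "am", "I"], ["I", "would", "like"]]
counts = Counter(w for sent in sentences for w in sent)

# N_c: how many word types occur exactly c times
freq_of_freq = Counter(counts.values())

print(counts)        # I: 3, am: 2, here/who/would/like: 1
print(freq_of_freq)  # {1: 4, 2: 1, 3: 1}  -> N1 = 4, N2 = 1, N3 = 1
```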
Good Turing Estimation
Idea
Reallocate the probability mass of n-grams that occur r + 1 times in the training data to the n-grams that occur r times
In particular, reallocate the probability mass of n-grams that were seen once to the n-grams that were never seen
Adjusted count
For each count c, an adjusted count c* is computed as:

c* = (c + 1) N_{c+1} / N_c

where N_c is the number of n-grams seen exactly c times
Good Turing Estimation
c* = (c + 1) N_{c+1} / N_c

What if c = 0?

P*_GT(things never seen, i.e. c = 0) = N_1 / N

where N denotes the total number of bigrams that actually occur in training
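A minimal sketch of the adjusted counts and the unseen mass, using toy counts (not the data from the slides); the fallback for missing N_{c+1} is an assumption, see "Complications" below:

```python
from collections import Counter

def good_turing(counts):
    """Adjusted counts c* = (c+1) N_{c+1} / N_c and the Good-Turing mass for unseen events."""
    N_c = Counter(counts.values())        # frequency of frequencies
    N = sum(counts.values())              # total number of observed n-gram tokens
    adjusted = {}
    for ngram, c in counts.items():
        if N_c[c + 1] > 0:
            adjusted[ngram] = (c + 1) * N_c[c + 1] / N_c[c]
        else:
            adjusted[ngram] = c           # fall back when N_{c+1} = 0
    return adjusted, N_c[1] / N           # second value: P*_GT(unseen) = N1 / N

toy_bigrams = Counter({("I", "am"): 3, ("am", "here"): 2, ("who", "am"): 1, ("I", "would"): 1})
print(good_turing(toy_bigrams))
```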
Complications
For small c, N_c > N_{c+1}
For large c, too jumpy

Simple Good-Turing
Replace empirical N_k with a best-fit power law once counts get unreliable
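One common way to realize this idea is to fit log N_c as a linear function of log c and use the fitted curve where the raw counts are jumpy or zero. A rough sketch (assumes numpy; the published Simple Good-Turing procedure has additional details not shown here):

```python
import numpy as np

def fit_Nc(N_c):
    """Fit log N_c = a + b log c (a power law) and return a smoothed N_c function."""
    cs = np.array(sorted(N_c), dtype=float)
    ys = np.array([N_c[c] for c in sorted(N_c)], dtype=float)
    b, a = np.polyfit(np.log(cs), np.log(ys), 1)   # slope b, intercept a
    return lambda c: np.exp(a + b * np.log(c))

N_c = {1: 120, 2: 40, 3: 22, 4: 15, 5: 9, 7: 4, 10: 2}   # toy frequency-of-frequency table
S = fit_Nc(N_c)
c = 6
c_star = (c + 1) * S(c + 1) / S(c)   # smoothed values fill in jumpy/missing N_c entries
print(round(float(c_star), 2))
```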
Good-Turing numbers: Example
22 million words of AP Newswire

c* = (c + 1) N_{c+1} / N_c

It looks like c* ≈ c - 0.75
Absolute Discounting Interpolation
P_AbsoluteDiscounting(w_i | w_{i-1}) = (c(w_{i-1}, w_i) - d) / c(w_{i-1}) + λ(w_{i-1}) P(w_i)
We may keep separate discount values d for counts 1 and 2
But can we do better than using the regular unigram probabilities?
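A minimal sketch of absolute-discounting interpolation. The λ normalizer used here is the one defined later on the Kneser-Ney slide; all names and the toy structure are illustrative, not the lecture's implementation:

```python
from collections import defaultdict

def make_absolute_discounting(bigram_counts, unigram_counts, d=0.75):
    """P_AD(w | prev) = max(c(prev, w) - d, 0) / c(prev) + lambda(prev) * P_unigram(w)."""
    total = sum(unigram_counts.values())
    context_count = defaultdict(int)   # c(prev)
    context_types = defaultdict(int)   # |{w : c(prev, w) > 0}|
    for (prev, w), c in bigram_counts.items():
        context_count[prev] += c
        context_types[prev] += 1

    def prob(w, prev):
        p_uni = unigram_counts.get(w, 0) / total
        if context_count[prev] == 0:
            return p_uni
        discounted = max(bigram_counts.get((prev, w), 0) - d, 0) / context_count[prev]
        lam = d * context_types[prev] / context_count[prev]   # mass freed by discounting
        return discounted + lam * p_uni

    return prob
```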
Kneser-Ney Smoothing
Intuition
Shannon game: I can’t see without my reading ...: glasses/Francisco?
“Francisco” is more common than “glasses”
But “Francisco” mostly follows “San”
Kneser-Ney Smoothing
P_continuation(w): how likely is w to appear as a novel continuation?
For each word w, count the number of bigram types it completes, normalized by the total number of word bigram types:

P_continuation(w) = |{w_{i-1} : c(w_{i-1}, w) > 0}| / |{(w_{j-1}, w_j) : c(w_{j-1}, w_j) > 0}|
N
A frequent word (Francisco) occurring in only one context (San) will have a low
continuation probability
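A small sketch of this continuation probability on toy bigram counts (the counts are made up for illustration):

```python
from collections import Counter

def continuation_probability(bigram_counts):
    """P_continuation(w) = |{w' : c(w', w) > 0}| / |{bigram types}|."""
    total_types = len(bigram_counts)                     # number of distinct bigram types
    completes = Counter(w for (_, w) in bigram_counts)   # distinct left contexts per w
    return {w: n / total_types for w, n in completes.items()}

# "Francisco" occurs often but only after "San", so its continuation probability is low.
toy = Counter({("San", "Francisco"): 50, ("reading", "glasses"): 3,
               ("my", "glasses"): 2, ("sun", "glasses"): 1})
print(continuation_probability(toy))   # glasses: 3/4, Francisco: 1/4
```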
Kneser-Ney Smoothing
P_KN(w_i | w_{i-1}) = max(c(w_{i-1}, w_i) - d, 0) / c(w_{i-1}) + λ(w_{i-1}) P_continuation(w_i)

λ is a normalizing constant:

λ(w_{i-1}) = (d / c(w_{i-1})) · |{w : c(w_{i-1}, w) > 0}|
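A compact sketch putting the two pieces together for bigram Kneser-Ney; a toy implementation under the formulas above, not the lecture's code:

```python
from collections import Counter, defaultdict

def make_kneser_ney(bigram_counts, d=0.75):
    """P_KN(w | prev) = max(c(prev, w) - d, 0)/c(prev) + lambda(prev) * P_continuation(w)."""
    total_bigram_types = len(bigram_counts)
    completes = Counter(w for (_, w) in bigram_counts)   # contexts each w completes
    context_count, context_types = defaultdict(int), defaultdict(int)
    for (prev, w), c in bigram_counts.items():
        context_count[prev] += c
        context_types[prev] += 1

    def prob(w, prev):
        cont = completes.get(w, 0) / total_bigram_types
        if context_count[prev] == 0:
            return cont
        first = max(bigram_counts.get((prev, w), 0) - d, 0) / context_count[prev]
        lam = (d / context_count[prev]) * context_types[prev]
        return first + lam * cont

    return prob
```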
Model Combination
As N increases
The power (expressiveness) of an N-gram model increases
But the ability to estimate accurate parameters from sparse data decreases (i.e., the smoothing problem gets worse)
Backoff and Interpolation
Backoff
use trigram if you have good evidence
otherwise bigram, otherwise unigram
Interpolation
mix unigram, bigram, trigram
Backoff
(Slide figure: the recursive definition of backoff, bottoming out in the unigram probability P(w_i))
Example Problem
In a corpus, suppose there are 4 words, a, b, c, and d. You are provided with
the following counts.
n-gram  count      n-gram  count      n-gram  count
aba     4          ba      5          a       8
abb     0          bb      3          b       9
abc     0          bc      0          c       8
abd     0          bd      0          d       7
Use the recursive definition of backoff smoothing to obtain the probability distribution P_backoff(w_n | w_{n-2} w_{n-1}), where w_{n-1} = b and w_{n-2} = a.
Also assume that P̂(x) = P(x) - 1/8.
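The recursive backoff definition itself is given on the lecture slides and is not reproduced here, so the sketch below only loads the counts and computes the ingredients you would plug into it: the per-order maximum-likelihood estimates and the discounted unigram P̂. All names are illustrative:

```python
# Counts from the problem statement
trigram = {("a", "b", "a"): 4, ("a", "b", "b"): 0, ("a", "b", "c"): 0, ("a", "b", "d"): 0}
bigram  = {("b", "a"): 5, ("b", "b"): 3, ("b", "c"): 0, ("b", "d"): 0}
unigram = {"a": 8, "b": 9, "c": 8, "d": 7}

total = sum(unigram.values())
P     = {w: c / total for w, c in unigram.items()}   # ML unigram probabilities
P_hat = {w: p - 1 / 8 for w, p in P.items()}         # discounted unigram, as assumed in the problem

def p_ml(counts, history, w):
    """Maximum-likelihood estimate at one order: c(history, w) / c(history)."""
    denom = sum(c for k, c in counts.items() if k[:-1] == history)
    return counts.get(history + (w,), 0) / denom if denom else 0.0

for w in "abcd":
    print(w, p_ml(trigram, ("a", "b"), w), p_ml(bigram, ("b",), w), P_hat[w])
```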
Linear Interpolation
Simple Interpolation
Mix the unigram, bigram, and trigram estimates with weights λ_i, where

Σ_i λ_i = 1
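A tiny sketch of the interpolated estimate, assuming unigram/bigram/trigram estimators trained on the training data are supplied by the caller; the λ values shown are placeholders:

```python
def interpolate(p_uni, p_bi, p_tri, lambdas=(0.2, 0.3, 0.5)):
    """P(w | u, v) = l1*P(w) + l2*P(w | v) + l3*P(w | u, v), with l1 + l2 + l3 = 1."""
    l1, l2, l3 = lambdas
    assert abs(l1 + l2 + l3 - 1.0) < 1e-9
    def prob(w, u, v):
        return l1 * p_uni(w) + l2 * p_bi(w, v) + l3 * p_tri(w, u, v)
    return prob
```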
Setting the lambda values
Use a held-out corpus
Choose the λs to maximize the probability of held-out data:
Find the N-gram probabilities on the training data
Search for λs that give the largest probability to held-out data
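One simple (if crude) way to do this search is a coarse grid over λ values summing to 1, scoring each setting by held-out log-likelihood; in practice EM is often used instead. A sketch with illustrative names, assuming the component estimators never all return zero:

```python
import itertools, math

def heldout_loglik(lambdas, heldout_trigrams, p_uni, p_bi, p_tri):
    l1, l2, l3 = lambdas
    return sum(math.log(l1 * p_uni(w) + l2 * p_bi(w, v) + l3 * p_tri(w, u, v))
               for (u, v, w) in heldout_trigrams)

def grid_search(heldout_trigrams, p_uni, p_bi, p_tri, step=0.1):
    """Try all (l1, l2, l3) on a coarse grid with l1 + l2 + l3 = 1; keep the best on held-out data."""
    best, best_ll = None, -math.inf
    grid = [i * step for i in range(int(1 / step) + 1)]
    for l1, l2 in itertools.product(grid, grid):
        l3 = 1.0 - l1 - l2
        if l3 < 0:
            continue
        ll = heldout_loglik((l1, l2, l3), heldout_trigrams, p_uni, p_bi, p_tri)
        if ll > best_ll:
            best, best_ll = (l1, l2, l3), ll
    return best
```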
Computational Morphology
Pawan Goyal
CSE, IITKGP
Week 3: Lecture 2
Morphology studies the internal structure of words, how words are built up
from smaller meaningful units called morphemes
dogs
2 morphemes, ‘dog’ and ‘s’
‘s’ is a plural marker on nouns
unladylike
3 morphemes
un- ‘not’
lady ‘well-behaved woman’
-like ‘having the characteristic of’
Variants of the same morpheme, but cannot be replaced by one another
Example
opposite: un-happy, in-comprehensible, im-possible, ir-rational
Bound
Cannot appear as a word by itself.
-s (dog-s), -ly (quick-ly), -ed (walk-ed)
Free
Can appear as a word by itself; often can combine with other morphemes too.
house (house-s), walk (walk-ed), of, the, or
Stems (roots): The core meaning bearing units
Affixes: Bits and pieces adhering to stems to change their meanings and
grammatical functions
Mostly, stems are free morphemes and affixes are bound morphemes
Suffix: -ity, -ation, etc (-taa, -ke, -ka etc.)
talk-ing, quick-ly
Infix: ‘n’ in ‘vindati’ (he knows), as contrasted with vid (to know).
Philippines: basa ‘read’ → b-um-asa ‘read’
English: abso-bloody-lutely (emphasis)
Circumfix: precedes and follows the stem
Dutch: berg ‘mountain’, ge-berg-te ‘mountains’
Content morphemes
Carry some semantic content
car, -able, un-
Functional morphemes
Provide grammatical information
-s (plural), -s (3rd singular)
Inflectional morphology
Grammatical: number, tense, case, gender
Creates new forms of the same word: bring, brought, brings, bringing
Derivational morphology
Creates new words by changing part-of-speech: logic, logical, illogical,
illogicality, logician
Fairly systematic but some derivations missing: sincere - sincerity, scarce -
scarcity, curious - curiosity, fierce - fiercity?
Concatenation
Adding continuous affixes - the most common process:
hope+less, un+happy, anti+capital+ist+s
Often, there are phonological/graphemic changes on morpheme boundaries:
book + s [s], shoe + s [z]
happy + er → happier
Reduplication
Nama: ‘go’ (look), ‘go-go’ (examine with attention)
Tagalog: ‘basa’ (read), ‘ba-basa’(will read)
Sanskrit: ‘pac’ (cook), ‘papāca’ (perfect form, cooked)
Phrasal reduplication (Telugu): pillavād.u nad.ustū nad.ustū pad.i pōyād.u
(The child fell down while walking)
Suppletion
‘irregular’ relation between the words
go - went, good - better
Compounding
Words formed by combining two or more words
Example in English:
Adj + Adj → Adj: bitter-sweet
N + N → N: rain-bow
V + N → V: pick-pocket
P + V → V: over-do
Particular to languages
room-temperature: Hindi translation?
Acronyms
laser: Light Amplification by Stimulated Emission of Radiation

Blending
Parts of two different words are combined
breakfast + lunch → brunch
smoke + fog → smog
motor + hotel → motel
Clipping
Longer words are shortened
doctor, laboratory, advertisement, dormitory, examination, bicycle, refrigerator
Morphological analysis: word → setOf(lemma + tag)
saw → { <see, verb.past>, <saw, noun.sg> }
Tagging: word → tag, considers context
Peter saw her → { <see, verb.past> }
Morpheme segmentation: de-nation-al-iz-ation
Generation: see + verb.past → saw
Text-to-speech synthesis:
lead: verb or noun?
read: present or past?
Search and information retrieval
Machine translation, grammar correction
Goal
To take input forms like those in the first column and produce output forms like
those in the second column.
Output contains stem and additional information; +N for noun, +SG for
singular, +PL for plural, +V for verb etc.
boy → boys
fly → flys → flies (y → i rule)
Toiling → toil
Duckling → duckl?
Getter → get + er
Doer → do + er
Beer → be + er?
Morphotactics
Which classes of morphemes can follow other classes of morphemes inside the word?
Ex: plural morpheme follows the noun
Only some endings go on some words
Do + er: ok
Be + er: not so
Get + er → getter
Why can’t this be put in a big lexicon?
English: just 317,477 forms from 90,196 lexical entries, a ratio of 3.5:1
Sanskrit: 11 million forms from a lexicon of 170,000 entries, a ratio of
64.7:1
New forms can be created, compounding etc.
Finite-state Methods for Morphology
Pawan Goyal
CSE, IITKGP
Week 3: Lecture 3
Finite State Automaton (FSA)
What is FSA?
A kind of directed graph
Nodes are called states, edges are labeled with symbols (possibly the empty symbol ε)
Start state and accepting states
Recognizes regular languages, i.e., languages specified by regular
expressions
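A toy illustration of an FSA as a transition table (not the automaton from the slides); for brevity the edges here are labeled with whole morphemes rather than single characters:

```python
# A tiny deterministic FSA: state -> {symbol: next_state}.
# This toy machine accepts "dog", "dogs", "cat", "cats" (a stem plus an optional plural -s).
transitions = {
    0: {"dog": 1, "cat": 1},
    1: {"s": 2},
}
accepting = {1, 2}

def accepts(symbols):
    state = 0
    for sym in symbols:
        if sym not in transitions.get(state, {}):
            return False
        state = transitions[state][sym]
    return state in accepting

print(accepts(["dog", "s"]), accepts(["cat"]), accepts(["s"]))   # True True False
```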
FSA for nominal inflection in English
FSA for English Adjectives
Words modeled
happy, happier, happiest, real, unreal, cool, coolly, clear, clearly, unclear,
unclearly, ...
Morphotactics
The last two examples model some parts of the English morphotactics
But what about the information about regular and irregular roots?
Lexicon
Can we include the lexicon in the FSA?
FSA for nominal inflection in English
After adding a mini-lexicon
Some properties of FSAs: Elegance
There is an algorithm to transform each automaton into a unique
equivalent automaton with the least number of states
An FSA is deterministic iff it has no empty (ε) transition and for each state and each symbol, there is at most one applicable transition
Every non-deterministic automaton can be transformed into a deterministic one
But ...
We need transducers to build Morphological Analyzers
Translate strings from one language to strings in another language
Like FSA, but each edge is associated with two strings
An example FST
Two-level morphology
Given the input cats, we would like to output cat +N +PL, telling us that cat is a plural noun.

We do this via a version of two-level morphology: a correspondence between a lexical level (morphemes and features) and a surface level (actual spelling).
Intermediate tape for Spelling change rules
English Nominal Inflection FST
Spelling Handling
Rule Handling
Rule Notation
a → b / c _ d : “rewrite a as b when it occurs between c and d.”
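A toy illustration of a rule in this spirit, implemented with a regular expression rather than the FST from the slides; the specific rule (y → ie between a consonant and the plural ending s) is chosen for illustration only:

```python
import re

def apply_rule(word):
    """Toy spelling rule in the a -> b / c _ d spirit:
    rewrite y as ie when it occurs between a consonant and a final plural s
    (fly + s -> flies)."""
    return re.sub(r"(?<=[^aeiou])y(?=s$)", "ie", word)

print(apply_rule("flys"), apply_rule("toys"))   # flies toys
```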
Morphological Analysis: Approaches
Linguistic approach: A phonological component accompanying the simple
concatenative process of attaching an ending
Engineering approach: Phonological changes and irregularities are
factored into endings and a higher number of paradigms
Different Approaches: Example from Czech
Tools Available
AT&T FSM Library and Lextools
http://www2.research.att.com/~fsmtools/fsm/
OpenFST (Google and NYU)
http://www.openfst.org/
Introduction to POS Tagging
Pawan Goyal
CSE, IITKGP
Week 3: Lecture 4
Task
Given a text of English, identify the part of speech of each word
mostly content-bearing: they refer to objects, actions, and features in the
world
open class, since new words are added all the time
N, V, Adj, Adv
The more commonly used set is finer grained: the “Penn Treebank tagset”, 45 tags
Example Sentence
The grand jury commented on a number of other topics.
On my back: back/NN
Win the voters back: back/RB
Promised to back the bill: back/VB
In the Brown corpus, race is a noun 98% of the time, and a verb 2% of the time
A tagger for English that simply chooses the most likely tag for each word can achieve good performance
Any new approach should be compared against the unigram baseline (assigning each token to its most likely tag)
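A toy sketch of this most-frequent-tag baseline; the training sentences and the NN default for unknown words are illustrative assumptions, not data from the lecture:

```python
from collections import Counter, defaultdict

def train_unigram_baseline(tagged_sentences):
    """Most-frequent-tag baseline: assign each word its most common tag in training."""
    tag_counts = defaultdict(Counter)
    for sent in tagged_sentences:
        for word, tag in sent:
            tag_counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in tag_counts.items()}

train = [[("the", "DT"), ("race", "NN"), ("ended", "VBD")],
         [("they", "PRP"), ("race", "VBP"), ("daily", "RB")],
         [("the", "DT"), ("race", "NN"), ("car", "NN")]]
best_tag = train_unigram_baseline(train)
print([best_tag.get(w, "NN") for w in ["the", "race", "car"]])   # ['DT', 'NN', 'NN']
```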
Mrs./NNP Shaefer/NNP never/RB got/VBD around/_ to/TO joining/VBG.
All/DT we/PRP gotta/VBN do/VB is/VBZ go/VB around/_ the/DT
corner/NN.
Chateau/NNP Petrus/NNP costs/VBZ around/_ 2500/CD.
Mrs./NNP Shaefer/NNP never/RB got/VBD around/RP to/TO joining/VBG.
All/DT we/PRP gotta/VBN do/VB is/VBZ go/VB around/IN the/DT
corner/NN.
Chateau/NNP Petrus/NNP costs/VBZ around/RB 2500/CD.
Some words are ambiguous, e.g. like, flies
Probabilities may help, if one tag is more likely than another
Local context
Two determiners rarely follow each other
Two base form verbs rarely follow each other
Determiner is almost always followed by adjective or noun
Rule-based Approach
Assign each word in the input a list of potential POS tags
Then winnow down this list to a single tag using hand-written rules
Statistical tagging
Get a training corpus of tagged text, learn the transformation rules from
the most frequent tags (TBL tagger)
Probabilistic: Find the most likely sequence of tags T for a sequence of
words W
The can was rusted.
The/DT can/MD was/VBD rusted/VBD.
MD → NN: DT _
VBD → VBN: VBD _
Add transformation rules to reduce training mistakes
Problem at hand
We have some data {(d, c)} of paired observations d and hidden classes c.
Part-of-Speech Tagging: words are observed and tags are hidden.
Text Classification: sentences/documents are observed and the
category is hidden.
Categories can be positive/negative for sentiment, or sports/politics/business for documents, ...
What gives rise to the two families?
Whether they generate the observed data from the hidden structure, or model the hidden structure given the data.

Generative (joint) models: e.g., Naïve Bayes classifiers, Hidden Markov Models, etc.
Discriminative (conditional) models: take the data as given, and put a probability over the hidden structure given the data: P(c|d)
Joint vs. conditional likelihood
A joint model gives probabilities P(d, c) and tries to maximize this joint
likelihood.
A conditional model gives probabilities P(c|d), taking the data as given
and modeling only the conditional probability of the class.
Hidden Markov Models for POS Tagging
Pawan Goyal
CSE, IITKGP
Week 3: Lecture 5
Probabilistic Tagging
Tagging: Probabilistic View (Generative Model)
Find

T̂ = argmax_T P(T | W)
   = argmax_T P(W | T) P(T) / P(W)
   = argmax_T P(W | T) P(T)
   = argmax_T Π_i P(w_i | w_1 ... w_{i-1}, t_1 ... t_i) P(t_i | t_1 ... t_{i-1})
Further simplifications
T̂ = argmax_T Π_i P(w_i | w_1 ... w_{i-1}, t_1 ... t_i) P(t_i | t_1 ... t_{i-1})

The probability of a word appearing depends only on its own POS tag:
P(w_i | w_1 ... w_{i-1}, t_1 ... t_i) ≈ P(w_i | t_i)

Bigram assumption: the probability of a tag appearing depends only on the previous tag:
P(t_i | t_1 ... t_{i-1}) ≈ P(t_i | t_{i-1})

Using these simplifications:
T̂ = argmax_T Π_i P(w_i | t_i) P(t_i | t_{i-1})
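A short sketch of how one candidate tag sequence would be scored under this simplified model, assuming the two probability tables are supplied (estimated as on the next slide); the start symbol and function names are illustrative:

```python
import math

def sequence_score(words, tags, p_word_given_tag, p_tag_given_prev, start="<s>"):
    """log P(W, T) under the simplified model: sum of log P(w_i|t_i) + log P(t_i|t_{i-1})."""
    score, prev = 0.0, start
    for w, t in zip(words, tags):
        score += math.log(p_tag_given_prev(t, prev)) + math.log(p_word_given_tag(w, t))
        prev = t
    return score
```

Tagging then amounts to picking the tag sequence that maximizes this score over all candidate sequences.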
Computing the probability values
Tag transition probabilities P(t_i | t_{i-1}):

P(t_i | t_{i-1}) = C(t_{i-1}, t_i) / C(t_{i-1})
P(NN | DT) = C(DT, NN) / C(DT) = 56,509 / 116,454 = 0.49

Word likelihood probabilities P(w_i | t_i):

P(w_i | t_i) = C(t_i, w_i) / C(t_i)
P(is | VBZ) = C(VBZ, is) / C(VBZ) = 10,073 / 21,627 = 0.47
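A sketch of how these relative-frequency estimates could be computed from a tagged corpus; no smoothing is applied, so unseen transitions or emissions get probability zero:

```python
from collections import Counter

def estimate_hmm(tagged_sentences):
    """Relative-frequency estimates: P(t|prev) = C(prev, t)/C(prev), P(w|t) = C(t, w)/C(t)."""
    tag_count, trans_count, emit_count = Counter(), Counter(), Counter()
    for sent in tagged_sentences:
        prev = "<s>"
        tag_count[prev] += 1
        for word, tag in sent:
            trans_count[(prev, tag)] += 1
            emit_count[(tag, word)] += 1
            tag_count[tag] += 1
            prev = tag
    p_trans = lambda t, prev: trans_count[(prev, t)] / tag_count[prev]
    p_emit = lambda w, t: emit_count[(t, w)] / tag_count[t]
    return p_trans, p_emit
```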
Disambiguating “race”
Disambiguating “race”
P(race|VB) vs. P(race|NN)
P(NR|VB) vs. P(NR|NN)
After computing the probabilities:

P(NN|TO) P(NR|NN) P(race|NN) = 0.00047 × 0.0012 × 0.00057 = 0.00000000032
P(VB|TO) P(NR|VB) P(race|VB) = 0.83 × 0.0027 × 0.00012 = 0.00000027
What is this model?
This is a Hidden Markov Model
Hidden Markov Models
Tag transition probabilities (transitions) P(t_i | t_{i-1})
Word likelihood probabilities (emissions) P(w_i | t_i)
What we have described with these probabilities is a Hidden Markov Model.
Let us quickly introduce the Markov Chain, or observable Markov Model.
Markov Chain = First-order Markov Model
Weather example
Three types of weather: sunny, rainy, foggy
q_n: variable denoting the weather on the nth day
We want to find conditional probabilities such as P(q_n | q_1, ..., q_{n-1})
Markov Chain Transition Table
Using Markov Chain
Given that today the weather is sunny, what is the probability that
tomorrow is sunny and day after is rainy?
P(q_2 = sunny, q_3 = rainy | q_1 = sunny)
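A small sketch of the computation. The transition numbers below are illustrative placeholders, not the values from the slide's transition table; the point is that, by the Markov property, the answer factors into two one-step transitions:

```python
# Transition probabilities P(tomorrow | today); placeholder values for illustration only.
P = {
    "sunny": {"sunny": 0.8, "rainy": 0.05, "foggy": 0.15},
    "rainy": {"sunny": 0.2, "rainy": 0.6,  "foggy": 0.2},
    "foggy": {"sunny": 0.2, "rainy": 0.3,  "foggy": 0.5},
}

# Markov property: P(q2=sunny, q3=rainy | q1=sunny) = P(sunny|sunny) * P(rainy|sunny)
answer = P["sunny"]["sunny"] * P["sunny"]["rainy"]
print(answer)   # 0.04 with these placeholder numbers
```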
Hidden Markov Model
For Markov chains, the output symbols are the same as the states
‘sunny’ weather is both observable and state
But in POS tagging
The output symbols are words
But the hidden states are POS tags
A Hidden Markov Model is an extension of a Markov chain in which the
output symbols are not the same as the states
We don’t know which state we are in
Hidden Markov Models (HMMs)
A set of states (here: the tags)
An output alphabet (here: words)
Initial state (here: beginning of sentence)
State transition probabilities (here P(t_n | t_{n-1}))
Symbol emission probabilities (here P(w_i | t_i))
Graphical Representation
When tagging a sentence, we are walking through the state graph:
Edges are labeled with the state transition probabilities: P(t_n | t_{n-1})
Graphical Representation
At each state we emit a word: P(w_n | t_n)
Walking through the states: best path
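The slides walk through the state graph on a figure; a compact log-space Viterbi sketch of the same idea is given below. It assumes the p_trans/p_emit estimators from the earlier sketch and that every needed probability is nonzero (in practice one adds smoothing); all names are illustrative:

```python
import math

def viterbi(words, tags, p_trans, p_emit, start="<s>"):
    """Find the best tag path through the state graph (log-space Viterbi sketch)."""
    # Each column maps a tag to (best log score ending in that tag, best path so far).
    V = [{t: (math.log(p_trans(t, start)) + math.log(p_emit(words[0], t)), [t]) for t in tags}]
    for w in words[1:]:
        col = {}
        for t in tags:
            best_prev = max(tags, key=lambda p: V[-1][p][0] + math.log(p_trans(t, p)))
            score = V[-1][best_prev][0] + math.log(p_trans(t, best_prev)) + math.log(p_emit(w, t))
            col[t] = (score, V[-1][best_prev][1] + [t])
        V.append(col)
    return max(V[-1].values())[1]   # path with the highest final score
```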