Week 3

This document covers advanced smoothing models for language modelling (Good-Turing, absolute discounting, and Kneser-Ney), backoff and interpolation, computational morphology (morphemes, affixes, and the distinction between inflectional and derivational morphology), finite-state methods for morphological analysis, and an introduction to part-of-speech tagging with hidden Markov models.


Language Modelling: Advanced Smoothing Models

Pawan Goyal
CSE, IITKGP
Week 3: Lecture 1

Advanced smoothing algorithms

Some Examples
Good-Turing
Kneser-Ney

Good-Turing: Basic Intuition
Use the count of things we have seen once to help estimate the count of things we have never seen.

Nc: Frequency of frequency c

Example Sentences
<s> I am here </s>
<s> who am I </s>
<s> I would like </s>

Computing Nc

word    count
I       3
am      2
here    1        N1 = 4
who     1        N2 = 1
would   1        N3 = 1
like    1

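The frequency-of-frequency table above can be reproduced in a few lines of Python (a minimal sketch; the variable names are ours, not from the lecture):

    from collections import Counter

    # The three example sentences, without the <s>/</s> markers.
    sentences = [["I", "am", "here"], ["who", "am", "I"], ["I", "would", "like"]]

    word_counts = Counter(w for sent in sentences for w in sent)
    # Counter({'I': 3, 'am': 2, 'here': 1, 'who': 1, 'would': 1, 'like': 1})

    # N_c: number of word types that occur exactly c times.
    freq_of_freq = Counter(word_counts.values())
    print(sorted(freq_of_freq.items()))  # [(1, 4), (2, 1), (3, 1)] -> N1 = 4, N2 = 1, N3 = 1
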
Good-Turing Estimation

Idea
Reallocate the probability mass of n-grams that occur r + 1 times in the training data to the n-grams that occur r times.
In particular, reallocate the probability mass of n-grams that were seen once to the n-grams that were never seen.

Adjusted count
For each count c, an adjusted count c* is computed as:

c* = (c + 1) N_{c+1} / N_c

where N_c is the number of n-grams seen exactly c times.

Good-Turing Estimation

Good-Turing Smoothing

P*_GT(things with frequency c) = c* / N

where c* = (c + 1) N_{c+1} / N_c

What if c = 0?

P*_GT(things with frequency 0) = N_1 / N

where N denotes the total number of bigrams that actually occur in training.

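As a rough illustration of the adjusted-count formula, here is a minimal Python sketch (the function name and the handling of a missing N_{c+1} are our own choices; practical implementations first smooth the N_c values, as in Simple Good-Turing):

    def good_turing_adjusted_count(c, freq_of_freq):
        # c* = (c + 1) * N_{c+1} / N_c; not defined when N_c or N_{c+1} is zero.
        n_c = freq_of_freq.get(c, 0)
        n_c_plus_1 = freq_of_freq.get(c + 1, 0)
        if n_c == 0 or n_c_plus_1 == 0:
            return None
        return (c + 1) * n_c_plus_1 / n_c

    # Toy numbers from the earlier example: N1 = 4, N2 = 1, N3 = 1, with N = 9 tokens.
    freq_of_freq = {1: 4, 2: 1, 3: 1}
    N = 9
    print(good_turing_adjusted_count(1, freq_of_freq))  # (1 + 1) * 1 / 4 = 0.5
    print(freq_of_freq[1] / N)                          # N1 / N: mass reserved for unseen events
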
Complications

What about words with high frequency?
For small c, Nc > Nc+1
For large c, too jumpy

Simple Good-Turing
Replace empirical Nk with a best-fit power law once counts get unreliable.

Good-Turing numbers: Example

22 million words of AP Newswire

c* = (c + 1) N_{c+1} / N_c

Empirically, it looks like c* ≈ c − 0.75

Absolute Discounting Interpolation

Why don't we just subtract 0.75 (or some d)?

P_AbsoluteDiscounting(w_i | w_{i-1}) = (c(w_{i-1}, w_i) − d) / c(w_{i-1}) + λ(w_{i-1}) P(w_i)

We may keep some more values of d for counts 1 and 2.
But can we do better than using the regular unigram probability?

Kneser-Ney Smoothing

Intuition
Shannon game: I can't see without my reading ...: glasses/Francisco?
"Francisco" is more common than "glasses"
But "Francisco" mostly follows "San"

P(w): "How likely is w?"
Instead, P_continuation(w): "How likely is w to appear as a novel continuation?"
For each word, count the number of bigram types it completes
Every bigram type was a novel continuation the first time it was seen

P_continuation(w) ∝ |{w_{i-1} : c(w_{i-1}, w) > 0}|

Kneser-Ney Smoothing

How many times does w appear as a novel continuation?

P_continuation(w) ∝ |{w_{i-1} : c(w_{i-1}, w) > 0}|

Normalized by the total number of word bigram types:

P_continuation(w) = |{w_{i-1} : c(w_{i-1}, w) > 0}| / |{(w_{j-1}, w_j) : c(w_{j-1}, w_j) > 0}|

A frequent word (Francisco) occurring in only one context (San) will have a low continuation probability.

Kneser-Ney Smoothing

P_KN(w_i | w_{i-1}) = max(c(w_{i-1}, w_i) − d, 0) / c(w_{i-1}) + λ(w_{i-1}) P_continuation(w_i)

λ is a normalizing constant:

λ(w_{i-1}) = (d / c(w_{i-1})) · |{w : c(w_{i-1}, w) > 0}|

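To make the pieces concrete, here is a minimal bigram Kneser-Ney sketch in Python (the names and the treatment of unseen contexts are our own simplifications, not code from the lecture):

    from collections import Counter, defaultdict

    def kneser_ney_bigram(bigram_counts, d=0.75):
        # bigram_counts: a Counter over (prev, w) pairs from the training data.
        unigram_counts = Counter()        # c(prev), summed over continuations
        followers = defaultdict(set)      # prev -> distinct words following it
        continuations = defaultdict(set)  # w -> distinct left contexts
        for (prev, w), c in bigram_counts.items():
            unigram_counts[prev] += c
            followers[prev].add(w)
            continuations[w].add(prev)
        num_bigram_types = len(bigram_counts)

        def p_kn(w, prev):
            # Assumes prev was seen in training.
            p_cont = len(continuations[w]) / num_bigram_types
            lam = d * len(followers[prev]) / unigram_counts[prev]
            discounted = max(bigram_counts[(prev, w)] - d, 0) / unigram_counts[prev]
            return discounted + lam * p_cont

        return p_kn
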
Model Combination

As N increases
The power (expressiveness) of an N-gram model increases
But the ability to estimate accurate parameters from sparse data decreases (i.e. the smoothing problem gets worse)

A general approach is to combine the results of multiple N-gram models.

Backoff and Interpolation

It might help to use less context when you haven't learned much about larger contexts.

Backoff
use trigram if you have good evidence
otherwise bigram, otherwise unigram

Interpolation
mix unigram, bigram, trigram

Backoff

Estimating P(w_i | w_{i-2} w_{i-1})

If we do not have counts to compute P(w_i | w_{i-2} w_{i-1}), estimate it using the bigram probability P(w_i | w_{i-1}).
If we do not have counts to compute P(w_i | w_{i-1}), estimate it using the unigram probability P(w_i).

P_bo(w_i | w_{i-2} w_{i-1}) =
    P̂(w_i | w_{i-2} w_{i-1}),               if c(w_{i-2} w_{i-1} w_i) > 0
    λ(w_{i-2} w_{i-1}) P_bo(w_i | w_{i-1}),  otherwise

where P_bo(w_i | w_{i-1}) =
    P̂(w_i | w_{i-1}),    if c(w_{i-1} w_i) > 0
    λ(w_{i-1}) P̂(w_i),   otherwise

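The recursive structure of backoff can be sketched as follows (a simplified illustration; p_hat and lam stand for the discounted estimate P̂ and the backoff weight λ defined above, and the example call uses hypothetical arguments):

    def backoff_prob(w, context, counts, p_hat, lam):
        # Use the longest context with a nonzero count; otherwise back off to a
        # shorter context, weighted by lambda(context).
        if not context:                       # base case: unigram estimate
            return p_hat(w, ())
        if counts.get(context + (w,), 0) > 0:
            return p_hat(w, context)          # discounted higher-order estimate
        return lam(context) * backoff_prob(w, context[1:], counts, p_hat, lam)

    # e.g. backoff_prob("a", ("a", "b"), counts, p_hat, lam)
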
Example Problem

In a corpus, suppose there are 4 words: a, b, c, and d. You are provided with the following counts.

n-gram   count     n-gram   count     n-gram   count
aba      4         ba       5         a        8
abb      0         bb       3         b        9
abc      0         bc       0         c        8
abd      0         bd       0         d        7

Use the recursive definition of backoff smoothing to obtain the probability distribution P_backoff(w_n | w_{n-2} w_{n-1}), where w_{n-1} = b and w_{n-2} = a.
Also assume that P̂(x) = P(x) − 1/8.

Linear Interpolation

Simple Interpolation

P̃(w_n | w_{n-1} w_{n-2}) = λ1 P(w_n | w_{n-1} w_{n-2}) + λ2 P(w_n | w_{n-1}) + λ3 P(w_n)

Σ_i λ_i = 1

Lambdas conditional on context

P̃(w_n | w_{n-1} w_{n-2}) = λ1(w_{n-2}, w_{n-1}) P(w_n | w_{n-1} w_{n-2}) + λ2(w_{n-2}, w_{n-1}) P(w_n | w_{n-1}) + λ3(w_{n-2}, w_{n-1}) P(w_n)

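A minimal sketch of simple interpolation in Python (the lambda values shown are placeholders, not tuned values):

    def interpolated_prob(w, prev2, prev1, p_uni, p_bi, p_tri, lambdas=(0.5, 0.3, 0.2)):
        # P~(w | prev2 prev1) = l1 P(w | prev2 prev1) + l2 P(w | prev1) + l3 P(w)
        l1, l2, l3 = lambdas  # must sum to 1
        return l1 * p_tri(w, prev2, prev1) + l2 * p_bi(w, prev1) + l3 * p_uni(w)
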
Setting the lambda values

Use a held-out corpus
Choose λs to maximize the probability of held-out data:
Find the N-gram probabilities on the training data
Search for λs that give the largest probability to held-out data

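One simple, if brute-force, way to pick the lambdas on held-out data is a grid search over triples that sum to 1 (a sketch; in practice EM or a smarter search is used, and all names here are ours):

    import math

    def heldout_log_prob(lambdas, heldout_trigrams, p_uni, p_bi, p_tri):
        l1, l2, l3 = lambdas
        total = 0.0
        for prev2, prev1, w in heldout_trigrams:
            p = l1 * p_tri(w, prev2, prev1) + l2 * p_bi(w, prev1) + l3 * p_uni(w)
            total += math.log(p) if p > 0 else float("-inf")
        return total

    def grid_search_lambdas(heldout_trigrams, p_uni, p_bi, p_tri, steps=10):
        candidates = [(i / steps, j / steps, (steps - i - j) / steps)
                      for i in range(steps + 1) for j in range(steps + 1 - i)]
        return max(candidates,
                   key=lambda ls: heldout_log_prob(ls, heldout_trigrams, p_uni, p_bi, p_tri))
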
Computational Morphology

Pawan Goyal
CSE, IITKGP
Week 3: Lecture 2


Morphology

Morphology studies the internal structure of words: how words are built up from smaller meaningful units called morphemes.

dogs
2 morphemes, 'dog' and 's'
's' is a plural marker on nouns

unladylike
3 morphemes
un- 'not'
lady 'well-behaved woman'
-like 'having the characteristic of'


Allomorphs

Variants of the same morpheme, but cannot be replaced by one another.

Example
opposite: un-happy, in-comprehensible, im-possible, ir-rational


Bound and Free Morphemes

Bound
Cannot appear as a word by itself.
-s (dog-s), -ly (quick-ly), -ed (walk-ed)

Free
Can appear as a word by itself; often can combine with other morphemes too.
house (house-s), walk (walk-ed), of, the, or




Stems and Affixes

Stems (roots): The core meaning-bearing units
Affixes: Bits and pieces adhering to stems to change their meanings and grammatical functions
Mostly, stems are free morphemes and affixes are bound morphemes


Types of affixes

Prefix: un-, anti-, etc. (a-, ati-, pra-, etc.)
un-happy, pre-existing

Suffix: -ity, -ation, etc. (-taa, -ke, -ka, etc.)
talk-ing, quick-ly

Infix: 'n' in 'vindati' (he knows), as contrasted with vid (to know).
Philippines: basa 'read' → b-um-asa 'read'
English: abso-bloody-lutely (emphasis)

Circumfixes: precede and follow the stem
Dutch: berg 'mountain', ge-berg-te 'mountains'


Content and functional morphemes

Content morphemes
Carry some semantic content
car, -able, un-

Functional morphemes
Provide grammatical information
-s (plural), -s (3rd singular)




Inflectional and Derivational Morphology

Two different kinds of relationships among words

Inflectional morphology
Grammatical: number, tense, case, gender
Creates new forms of the same word: bring, brought, brings, bringing

Derivational morphology
Creates new words by changing part-of-speech: logic, logical, illogical, illogicality, logician
Fairly systematic but some derivations missing: sincere - sincerity, scarce - scarcity, curious - curiosity, fierce - fiercity?


Morphological processes

Concatenation
Adding continuous affixes - the most common process:
hope+less, un+happy, anti+capital+ist+s

Often, there are phonological/graphemic changes on morpheme boundaries:
book + s [s], shoe + s [z]
happy + er → happier


Morphological processes

Reduplication: part of the word or the entire word is doubled
Nama: 'go' (look), 'go-go' (examine with attention)
Tagalog: 'basa' (read), 'ba-basa' (will read)
Sanskrit: 'pac' (cook), 'papāca' (perfect form, cooked)
Phrasal reduplication (Telugu): pillavād.u nad.ustū nad.ustū pad.i pōyād.u (The child fell down while walking)


Morphological processes

Suppletion
'irregular' relation between the words
go - went, good - better

Morpheme internal changes
The word changes internally
sing - sang - sung, man - men, goose - geese


Word Formation

Compounding
Words formed by combining two or more words
Examples in English:
Adj + Adj → Adj: bitter-sweet
N + N → N: rain-bow
V + N → V: pick-pocket
P + V → V: over-do

Particular to languages
room-temperature: Hindi translation?






Word Formation

Acronyms
laser: Light Amplification by Stimulated Emission of Radiation

Blending
Parts of two different words are combined
breakfast + lunch → brunch
smoke + fog → smog
motor + hotel → motel

Clipping
Longer words are shortened
doctor, laboratory, advertisement, dormitory, examination, bicycle, refrigerator


Processing morphology

Lemmatization: word → lemma
saw → {see, saw}

Morphological analysis: word → setOf(lemma + tag)
saw → { <see, verb.past>, <saw, noun.sg> }

Tagging: word → tag, considers context
Peter saw her → { <see, verb.past> }

Morpheme segmentation: de-nation-al-iz-ation

Generation: see + verb.past → saw


What are the applications?

Text-to-speech synthesis:
lead: verb or noun?
read: present or past?

Search and information retrieval
Machine translation, grammar correction


Morphological Analysis

Goal
To take input forms like those in the first column and produce output forms like those in the second column (e.g., cats → cat +N +PL).

[Table of example input/output forms not reproduced in this transcript.]

Output contains stem and additional information: +N for noun, +SG for singular, +PL for plural, +V for verb, etc.


Issues involved

boy → boys
fly → flys → flies (y → i rule)

Toiling → toil
Duckling → duckl?

Getter → get + er
Doer → do + er
Beer → be + er?




Knowledge Required

Knowledge of stems or roots
Duck is a possible root, not duckl.
We need a dictionary (lexicon)

Morphotactics
Which class of morphemes follow other classes of morphemes inside the word?
Ex: plural morpheme follows the noun

Only some endings go on some words
Do+er: ok
Be+er: not so

Spelling change rules
Adjust the surface form using spelling change rules
Get + er → getter

Why can't this be put in a big lexicon?

English: just 317,477 forms from 90,196 lexical entries, a ratio of 3.5:1
Sanskrit: 11 million forms from a lexicon of 170,000 entries, a ratio of 64.7:1
New forms can be created, compounding etc.

One of the most common methods is finite-state machines.


Finite-state methods for morphology

Pawan Goyal
CSE, IITKGP
Week 3: Lecture 3

Finite State Automaton (FSA)

What is an FSA?
A kind of directed graph
Nodes are called states; edges are labeled with symbols (possibly the empty symbol ε)
Start state and accepting states
Recognizes regular languages, i.e., languages specified by regular expressions

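A deterministic FSA can be run with a few lines of Python; the toy automaton below only illustrates the idea of states, labeled edges, and accepting states (it is not the automaton from the lecture's figures):

    def accepts(transitions, start, accepting, symbols):
        # transitions maps (state, symbol) -> next state.
        state = start
        for sym in symbols:
            if (state, sym) not in transitions:
                return False
            state = transitions[(state, sym)]
        return state in accepting

    # Toy FSA: a regular noun optionally followed by the plural morpheme "s".
    transitions = {(0, "noun"): 1, (1, "s"): 2}
    print(accepts(transitions, 0, {1, 2}, ["noun", "s"]))  # True
    print(accepts(transitions, 0, {1, 2}, ["s"]))          # False
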
FSA for nominal inflection in English

[Figure: FSA diagram not reproduced in this transcript.]

FSA for English Adjectives

[Figure: FSA diagram not reproduced in this transcript.]

Words modeled
happy, happier, happiest, real, unreal, cool, coolly, clear, clearly, unclear, unclearly, ...

Morphotactics

The last two examples model some parts of the English morphotactics
But what about the information about regular and irregular roots?

Lexicon
Can we include the lexicon in the FSA?

FSA for nominal inflection in English

[Figure repeated; not reproduced in this transcript.]

After adding a mini-lexicon

[Figure not reproduced in this transcript.]

Some properties of FSAs: Elegance

The recognition problem can be solved in linear time (independent of the size of the automaton)
There is an algorithm to transform each automaton into a unique equivalent automaton with the least number of states
An FSA is deterministic iff it has no empty (ε) transition and, for each state and each symbol, there is at most one applicable transition
Every non-deterministic automaton can be transformed into a deterministic one

But ...

FSAs are language recognizers/generators.
We need transducers to build Morphological Analyzers.

Finite State Transducers
Translate strings from one language to strings in another language
Like an FSA, but each edge is associated with two strings

An example FST

[Figure: FST diagram not reproduced in this transcript.]

Two-level morphology

Given the input cats, we would like to output cat+N+PL, telling us that cat is a plural noun.

We do this via a version of two-level morphology: a correspondence between a lexical level (morphemes and features) and a surface level (actual spelling).

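The desired input/output behaviour can be imitated, very crudely, with a lexicon lookup plus a regular-plural rule; a real system composes lexicon and rule transducers, so treat this only as an illustration (the lexicon entries are our own toy examples):

    LEXICON = {"cat", "dog", "fox"}

    def analyze(surface):
        analyses = []
        if surface in LEXICON:
            analyses.append(surface + "+N+SG")
        if surface.endswith("s") and surface[:-1] in LEXICON:
            analyses.append(surface[:-1] + "+N+PL")
        return analyses

    print(analyze("cats"))  # ['cat+N+PL']
    print(analyze("cat"))   # ['cat+N+SG']
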
Intermediate tape for Spelling change rules

[Figure not reproduced in this transcript.]

English Nominal Inflection FST

[Figure not reproduced in this transcript.]

Spelling Handling

A spelling change rule would insert an e only in the appropriate environment.

[Figure not reproduced in this transcript.]

Rule Handling

Rule Notation
a → b / c _ d : "rewrite a as b when it occurs between c and d."

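The rewrite notation can be mimicked with a regular-expression substitution; the strings in this example are placeholders, not rules from the lecture:

    import re

    def rewrite(s, a, b, c, d):
        # Rewrite a as b only when it occurs between c and d.
        pattern = "(?<=" + re.escape(c) + ")" + re.escape(a) + "(?=" + re.escape(d) + ")"
        return re.sub(pattern, b, s)

    print(rewrite("cad cat", "a", "b", "c", "d"))  # "cbd cat": only the "a" before "d" is rewritten
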
Morphological Analysis: Approaches

Two different ways to address phonological/graphemic variations

Linguistic approach: A phonological component accompanying the simple concatenative process of attaching an ending
Engineering approach: Phonological changes and irregularities are factored into endings and a higher number of paradigms

Different Approaches: Example from Czech

[Figure/table not reproduced in this transcript.]

Tools Available

AT&T FSM Library and Lextools
http://www2.research.att.com/~fsmtools/fsm/
OpenFST (Google and NYU)
http://www.openfst.org/

Introduction to POS Tagging

Pawan Goyal
CSE, IITKGP
Week 3: Lecture 4


Part-of-Speech (POS) tagging

Task
Given a text of English, identify the parts of speech of each word


Parts of Speech: How many?

Open class words (content words)
nouns, verbs, adjectives, adverbs
mostly content-bearing: they refer to objects, actions, and features in the world
open class, since new words are added all the time

Closed class words
pronouns, determiners, prepositions, connectives, ...
there is a limited number of these
mostly functional: to tie the concepts of a sentence together


POS examples

[Table of POS examples not reproduced in this transcript.]


POS tagging: Choosing a tagset

To do POS tagging, a standard tag set needs to be chosen
Could pick very coarse tagsets: N, V, Adj, Adv
More commonly used is a finer-grained set, the "UPenn TreeBank tagset", with 45 tags

A Nice Tutorial on POS tags
https://sites.google.com/site/partofspeechhelp/


UPenn TreeBank POS tag set

[Table of tags not reproduced in this transcript.]


Using the UPenn tagset

Example Sentence
The grand jury commented on a number of other topics.

POS tagged sentence
The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.


Why is POS tagging hard?

Words often have more than one POS: back
The back door: back/JJ
On my back: back/NN
Win the voters back: back/RB
Promised to back the bill: back/VB

POS tagging problem
To determine the POS tag for a particular instance of a word


Ambiguous word types in the Brown Corpus

Ambiguity in the Brown corpus
40% of word tokens are ambiguous
12% of word types are ambiguous

[Table: breakdown of ambiguous word types not reproduced in this transcript.]




How bad is the ambiguity problem?

One tag is usually more likely than the others.
In the Brown corpus, race is a noun 98% of the time, and a verb 2% of the time
A tagger for English that simply chooses the most likely tag for each word can achieve good performance
Any new approach should be compared against the unigram baseline (assigning each token its most likely tag)

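The unigram (most-frequent-tag) baseline is easy to implement; a minimal sketch, where the fallback for unseen words is our own choice:

    from collections import Counter, defaultdict

    def train_unigram_baseline(tagged_corpus):
        # tagged_corpus: a list of (word, tag) pairs.
        tag_counts = defaultdict(Counter)
        for word, tag in tagged_corpus:
            tag_counts[word][tag] += 1
        overall = Counter(tag for _, tag in tagged_corpus)
        default_tag = overall.most_common(1)[0][0]
        best = {w: c.most_common(1)[0][0] for w, c in tag_counts.items()}
        return lambda w: best.get(w, default_tag)

    tagger = train_unigram_baseline([("the", "DT"), ("back", "JJ"), ("back", "NN"),
                                     ("back", "NN"), ("door", "NN")])
    print(tagger("back"))  # NN, its most frequent tag in this toy corpus
    print(tagger("fish"))  # NN, falling back to the overall most frequent tag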


Deciding the correct POS

Can be difficult even for people
Mrs./NNP Shaefer/NNP never/RB got/VBD around/_ to/TO joining/VBG.
All/DT we/PRP gotta/VBN do/VB is/VBZ go/VB around/_ the/DT corner/NN.
Chateau/NNP Petrus/NNP costs/VBZ around/_ 2500/CD.


Deciding the correct POS

Can be difficult even for people
Mrs./NNP Shaefer/NNP never/RB got/VBD around/RP to/TO joining/VBG.
All/DT we/PRP gotta/VBN do/VB is/VBZ go/VB around/IN the/DT corner/NN.
Chateau/NNP Petrus/NNP costs/VBZ around/RB 2500/CD.


Relevant knowledge for POS tagging

The word itself
Some words may only be nouns, e.g. arrow
Some words are ambiguous, e.g. like, flies
Probabilities may help, if one tag is more likely than another

Local context
Two determiners rarely follow each other
Two base form verbs rarely follow each other
A determiner is almost always followed by an adjective or noun


POS tagging: Two approaches

Rule-based Approach
Assign each word in the input a list of potential POS tags
Then winnow down this list to a single tag using hand-written rules

Statistical tagging
Get a training corpus of tagged text, learn transformation rules from the most frequent tags (TBL tagger)
Probabilistic: Find the most likely sequence of tags T for a sequence of words W


TBL Tagger

Label the training set with most frequent tags
The can was rusted.
The/DT can/MD was/VBD rusted/VBD.

Add transformation rules to reduce training mistakes
MD → NN: DT _
VBD → VBN: VBD _


Probabilistic Tagging: Two different families of models

Problem at hand
We have some data {(d, c)} of paired observations d and hidden classes c.

Different instances of d and c
Part-of-Speech Tagging: words are observed and tags are hidden.
Text Classification: sentences/documents are observed and the category is hidden.
Categories can be positive/negative for sentiments, sports/politics/business for documents, ...

What gives rise to the two families?
Whether they generate the observed data from the hidden classes, or model the hidden structure given the data.


Generative vs. Conditional Models

Generative (Joint) Models
Generate the observed data from hidden stuff, i.e. put a probability over the observations given the class: P(d, c) in terms of P(d|c)
e.g. Naïve Bayes classifiers, Hidden Markov Models, etc.

Discriminative (Conditional) Models
Take the data as given, and put a probability over hidden structure given the data: P(c|d)
e.g. logistic regression, maximum entropy models, conditional random fields
SVMs, perceptron, etc. are discriminative classifiers but not directly probabilistic




Generative vs. Discriminative Models

[Figure not reproduced in this transcript.]

Joint vs. conditional likelihood
A joint model gives probabilities P(d, c) and tries to maximize this joint likelihood.
A conditional model gives probabilities P(c|d), taking the data as given and modeling only the conditional probability of the class.

Hidden Markov Models for POS Tagging

Pawan Goyal
CSE, IITKGP
Week 3: Lecture 5

Probabilistic Tagging

W = w1 ... wn: words in the corpus (observed)
T = t1 ... tn: the corresponding tags (unknown)

Tagging: Probabilistic View (Generative Model)
Find

T̂ = argmax_T P(T|W)
  = argmax_T P(W|T) P(T) / P(W)
  = argmax_T P(W|T) P(T)
  = argmax_T Π_i P(w_i | w_1 ... w_{i-1}, t_1 ... t_i) P(t_i | t_1 ... t_{i-1})

Further simplifications

T̂ = argmax_T Π_i P(w_i | w_1 ... w_{i-1}, t_1 ... t_i) P(t_i | t_1 ... t_{i-1})

The probability of a word appearing depends only on its own POS tag:
P(w_i | w_1 ... w_{i-1}, t_1 ... t_i) ≈ P(w_i | t_i)

Bigram assumption: the probability of a tag appearing depends only on the previous tag:
P(t_i | t_1 ... t_{i-1}) ≈ P(t_i | t_{i-1})

Using these simplifications:
T̂ = argmax_T Π_i P(w_i | t_i) P(t_i | t_{i-1})

Computing the probability values

Tag Transition probabilities p(t_i | t_{i-1})

P(t_i | t_{i-1}) = C(t_{i-1}, t_i) / C(t_{i-1})

P(NN|DT) = C(DT, NN) / C(DT) = 56,509 / 116,454 = 0.49

Word Likelihood probabilities p(w_i | t_i)

P(w_i | t_i) = C(t_i, w_i) / C(t_i)

P(is|VBZ) = C(VBZ, is) / C(VBZ) = 10,073 / 21,627 = 0.47

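Estimating these tables from a tagged corpus is just counting; a minimal sketch (the sentence-boundary pseudo-tag "<s>" is our own convention):

    from collections import Counter

    def estimate_hmm_params(tagged_sentences):
        # tagged_sentences: a list of sentences, each a list of (word, tag) pairs.
        trans, emit = Counter(), Counter()
        prev_counts, tag_counts = Counter(), Counter()
        for sent in tagged_sentences:
            prev = "<s>"
            for word, tag in sent:
                trans[(prev, tag)] += 1
                prev_counts[prev] += 1
                emit[(tag, word)] += 1
                tag_counts[tag] += 1
                prev = tag
        p_trans = {k: v / prev_counts[k[0]] for k, v in trans.items()}  # P(t_i | t_{i-1})
        p_emit = {k: v / tag_counts[k[0]] for k, v in emit.items()}     # P(w_i | t_i)
        return p_trans, p_emit
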
Disambiguating "race"

[Figure not reproduced in this transcript.]

Disambiguating "race"

Difference in probability due to
P(VB|TO) vs. P(NN|TO)
P(race|VB) vs. P(race|NN)
P(NR|VB) vs. P(NR|NN)

After computing the probabilities
P(NN|TO) P(NR|NN) P(race|NN) = 0.00047 × 0.0012 × 0.00057 = 0.00000000032
P(VB|TO) P(NR|VB) P(race|VB) = 0.83 × 0.0027 × 0.00012 = 0.00000027

What is this model?

[Figure not reproduced in this transcript.]

This is a Hidden Markov Model.

Hidden Markov Models

Tag Transition probabilities p(t_i | t_{i-1})
Word Likelihood probabilities (emissions) p(w_i | t_i)
What we have described with these probabilities is a hidden Markov model.
Let us quickly introduce the Markov Chain, or observable Markov Model.

Markov Chain = First-order Markov Model

Weather example
Three types of weather: sunny, rainy, foggy
q_n: variable denoting the weather on the nth day
We want to find the following conditional probabilities:
P(q_n | q_{n-1}, q_{n-2}, ..., q_1)

First-order Markov Assumption
P(q_n | q_{n-1}, q_{n-2}, ..., q_1) = P(q_n | q_{n-1})

Markov Chain Transition Table

[Table of transition probabilities not reproduced in this transcript.]

Using Markov Chain

Given that today the weather is sunny, what is the probability that tomorrow is sunny and the day after is rainy?

P(q2 = sunny, q3 = rainy | q1 = sunny)
= P(q3 = rainy | q2 = sunny, q1 = sunny) × P(q2 = sunny | q1 = sunny)
= P(q3 = rainy | q2 = sunny) × P(q2 = sunny | q1 = sunny)
= 0.05 × 0.8
= 0.04

Hidden Markov Model

For Markov chains, the output symbols are the same as the states
'sunny' weather is both the observable and the state

But in POS tagging
The output symbols are words
But the hidden states are POS tags

A Hidden Markov Model is an extension of a Markov chain in which the output symbols are not the same as the states
We don't know which state we are in

Hidden Markov Models (HMMs)

Elements of an HMM model
A set of states (here: the tags)
An output alphabet (here: words)
Initial state (here: beginning of sentence)
State transition probabilities (here: p(t_n | t_{n-1}))
Symbol emission probabilities (here: p(w_i | t_i))

Graphical Representation

When tagging a sentence, we are walking through the state graph:

[Figure: state graph not reproduced in this transcript.]

Edges are labeled with the state transition probabilities: p(t_n | t_{n-1})

Graphical Representation

At each state we emit a word: P(w_n | t_n)

[Figure not reproduced in this transcript.]

Walking through the states: best path

[Figures: best-path walkthrough not reproduced in this transcript.]

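Finding the best path means computing argmax_T Π_i P(w_i|t_i) P(t_i|t_{i-1}). The brute-force sketch below enumerates all tag sequences for clarity; in practice the same argmax is computed efficiently with dynamic programming (the Viterbi algorithm), which these slides do not spell out:

    import itertools

    def best_tag_sequence(words, tags, p_trans, p_emit, start="<s>"):
        # p_trans: {(prev_tag, tag): prob}, p_emit: {(tag, word): prob}.
        best, best_score = None, 0.0
        for seq in itertools.product(tags, repeat=len(words)):
            prev, score = start, 1.0
            for w, t in zip(words, seq):
                score *= p_trans.get((prev, t), 0.0) * p_emit.get((t, w), 0.0)
                prev = t
            if score > best_score:
                best, best_score = seq, score
        return best, best_score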
