Natural Language Processing
(Weekly Laboratory Assignments)
Sumit Kumar Banerjee
Contents
1
2
3 Assignments on Language Modeling
  3.1 Question 1
  3.2 Question 2
  3.3 Question 3
  3.4 Question 4
  3.5 Question 5
  3.6 Question 6
  3.7 Question 7
Chapter 1
Chapter 2
Chapter 3
Language Modeling
3.1 Write a Python program to implement a Unigram Language Model with Laplace Smoothing.
from collections import Counter

def unigram_model(corpus):
    tokens = corpus.split()
    counts = Counter(tokens)
    v = len(counts)                        # vocabulary size
    total_tokens = sum(counts.values())

    def prob(word):
        # Laplace (add-one) smoothing
        return (counts[word] + 1) / (total_tokens + v)

    return prob

corpus = "the cat sat on the mat the cat ate fish"
model = unigram_model(corpus)
print("P(cat) =", model("cat"))
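As a quick check of the smoothing, the add-one unigram probabilities should sum to 1 over the seen vocabulary. The sketch below rebuilds the model so it runs on its own; it is an illustration, not part of the assignment.

```python
from collections import Counter

# Rebuild the unigram model from Question 3.1 so this check is self-contained.
def unigram_model(corpus):
    tokens = corpus.split()
    counts = Counter(tokens)
    v = len(counts)
    total_tokens = sum(counts.values())
    return lambda w: (counts[w] + 1) / (total_tokens + v)

corpus = "the cat sat on the mat the cat ate fish"
model = unigram_model(corpus)

# Each word contributes (count + 1) / (total + V); over the vocabulary the
# numerators sum to total + V, so the mass is exactly 1.
total_mass = sum(model(w) for w in set(corpus.split()))
print(total_mass)
```

Note that words outside the training vocabulary still receive the small probability 1 / (total + V), so the distribution over all possible words sums to slightly more than 1; that is a known limitation of this simple formulation.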
3.2 Write a Python program to implement a Bigram Model with Laplace Smoothing.
from collections import defaultdict

def bigram_model(corpus):
    tokens = corpus.split()
    b = defaultdict(int)   # bigram counts
    u = defaultdict(int)   # unigram (history) counts
    vocab = set(tokens)
    for i in range(len(tokens) - 1):
        u[tokens[i]] += 1
        b[(tokens[i], tokens[i + 1])] += 1
    v = len(vocab)

    def prob(w1, w2):
        # Laplace-smoothed P(w2 | w1)
        return (b[(w1, w2)] + 1) / (u[w1] + v)

    return prob

corpus = "the cat sat on the mat the cat ate fish"
model = bigram_model(corpus)
print("P(cat | the) =", model("the", "cat"))
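A common use of a bigram model is scoring a whole sentence by chaining the conditional probabilities. The helper `sentence_prob` below is an illustrative addition, not part of the assignment; the model itself is rebuilt so the sketch runs on its own.

```python
from collections import defaultdict

# Rebuild the bigram model from Question 3.2 so this sketch is self-contained.
def bigram_model(corpus):
    tokens = corpus.split()
    b, u = defaultdict(int), defaultdict(int)
    vocab = set(tokens)
    for i in range(len(tokens) - 1):
        u[tokens[i]] += 1
        b[(tokens[i], tokens[i + 1])] += 1
    v = len(vocab)
    return lambda w1, w2: (b[(w1, w2)] + 1) / (u[w1] + v)

def sentence_prob(sentence, prob):
    # Approximate P(w1..wn) as the product of P(w_i | w_{i-1}).
    words = sentence.split()
    p = 1.0
    for w1, w2 in zip(words, words[1:]):
        p *= prob(w1, w2)
    return p

model = bigram_model("the cat sat on the mat the cat ate fish")
print(sentence_prob("the cat sat", model))
```

Ignoring the probability of the first word is a simplification; a fuller treatment would add a sentence-start marker so the first word is also conditioned.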
3.3 Write a Python program to implement a Trigram Text Generator.
from random import choice
from collections import defaultdict

def trigram_generator(corpus, start, length=10):
    o = corpus.split()
    t = defaultdict(list)
    # map each word pair to the list of words that followed it
    for i in range(len(o) - 2):
        t[(o[i], o[i + 1])].append(o[i + 2])
    text = list(start)
    for _ in range(length):
        pair = tuple(text[-2:])
        # fall back to <END> when the pair was never seen
        next_word = choice(t.get(pair, ["<END>"]))
        if next_word == "<END>":
            break
        text.append(next_word)
    return " ".join(text)

corpus = input()
print(trigram_generator(corpus, ("the", "cat")))
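To see why the generator can produce different outputs on different runs, it helps to inspect the trigram table directly. For the toy corpus used in the earlier questions, the pair ("the", "cat") has two recorded continuations, and `choice` picks one at random:

```python
from collections import defaultdict

# Build the same trigram table that trigram_generator builds internally.
corpus = "the cat sat on the mat the cat ate fish"
o = corpus.split()
t = defaultdict(list)
for i in range(len(o) - 2):
    t[(o[i], o[i + 1])].append(o[i + 2])

print(t[("the", "cat")])  # ['sat', 'ate']
print(t[("cat", "sat")])  # ['on']
```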
3.4 Write a Python program to perform Bigram Spell Correction.
from difflib import get_close_matches

# p is a bigram probability function such as the one returned by
# bigram_model() from Question 3.2.
def spell_correct(sentence, vocab, p):
    words = sentence.split()
    o = [words[0]]
    for i in range(1, len(words)):
        if words[i] not in vocab:
            # candidate corrections by string similarity
            c1 = get_close_matches(words[i], vocab)
            if c1:
                # rank candidates by bigram probability given the previous word
                s = [(c, p(o[-1], c)) for c in c1]
                words[i] = max(s, key=lambda x: x[1])[0]
        o.append(words[i])
    return " ".join(o)

corpus = "the cat sat on the mat"
vocab = set(corpus.split())
model = bigram_model(corpus)
print(spell_correct("the cet sat on teh mat", vocab, model))
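The quality of the correction depends on what difflib.get_close_matches returns as candidates. A quick standalone probe with the same toy vocabulary shows it ranks "cat" as the best in-vocabulary match for the misspelling "cet":

```python
from difflib import get_close_matches

vocab = ["the", "cat", "sat", "on", "mat"]
# Defaults: up to n=3 matches with similarity ratio >= 0.6.
matches = get_close_matches("cet", vocab)
print(matches)
```

If no candidate clears the 0.6 similarity cutoff, the list is empty and `spell_correct` leaves the word unchanged, which is why the `if c1:` guard is needed.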
3.5 Write a Python program to perform Viterbi POS Tagging.
N = [ ’ Noun ’ , ’ Verb ’ ]
s t a r t _ p = { ’ Noun ’ : 0 . 6 , ’ Verb ’ : 0 . 4 }
T = { ’ Noun ’ : { ’ Noun ’ : 0 . 1 , ’ Verb ’ : 0 . 9 } ,
’ Verb ’ : { ’ Noun ’ : 0 . 8 , ’ Verb ’ : 0 . 2 } }
E = { ’ Noun ’ : { ’ f i s h ’ : 0 . 5 , ’ eat ’ : 0 . 5 } ,
’ Verb ’ : { ’ f i s h ’ : 0 . 4 , ’ eat ’ : 0 . 6 } }
d e f v i t e r b i ( o , N, s tar t_p , T, E ) :
V = [{}]
path = {}
f o r s i n N:
V [ 0 ] [ s ] = s t a r t _ p [ s ] ∗ E [ s ] . g e t ( o [ 0 ] , 1 e −4)
path [ s ] = [ s ]
f o r t in range (1 , len ( o ) ) :
V. append ( { } )
new_path = {}
f o r s i n N:
(P , S ) = max ( (V[ t − 1 ] [ x ] ∗ T [ x ] [ s ] ∗ E [ s ] . g e t ( o [ t ] , 1 e −4) , x )
V[ t ] [ s ] = P
new_path [ s ] = path [ S ] + [ s ]
path = new_path
( prob , s t a t e ) = max ( (V[ l e n ( o ) − 1 ] [ s ] , s ) f o r s i n N)
r e t u r n path [ s t a t e ]
p r i n t ( v i t e r b i ( [ ’ f i s h ’ , ’ eat ’ ] , sn , sta rt_ p , T, E ) )
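Because the observation sequence here is short, the Viterbi result can be cross-checked against brute-force enumeration of every tag sequence. The sketch below redefines the same model tables so it runs independently; `joint` and `brute_force` are illustrative helpers, not part of the assignment.

```python
from itertools import product

N = ['Noun', 'Verb']
start_p = {'Noun': 0.6, 'Verb': 0.4}
T = {'Noun': {'Noun': 0.1, 'Verb': 0.9},
     'Verb': {'Noun': 0.8, 'Verb': 0.2}}
E = {'Noun': {'fish': 0.5, 'eat': 0.5},
     'Verb': {'fish': 0.4, 'eat': 0.6}}

def joint(seq, obs):
    # Joint probability of a full tag sequence and the observations.
    p = start_p[seq[0]] * E[seq[0]].get(obs[0], 1e-4)
    for t in range(1, len(obs)):
        p *= T[seq[t - 1]][seq[t]] * E[seq[t]].get(obs[t], 1e-4)
    return p

def brute_force(obs):
    # Try every possible tag sequence and keep the most probable one.
    return list(max(product(N, repeat=len(obs)), key=lambda seq: joint(seq, obs)))

print(brute_force(['fish', 'eat']))  # ['Noun', 'Verb'], matching viterbi()
```

Enumeration costs |N|^T sequences, which is exactly the exponential blow-up that Viterbi's dynamic programming avoids.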
3.6 Write a Python program to compute the Forward Probability.
# Assumes HMM parameter dictionaries named states, start_p, trans_p and
# emit_p are already defined (e.g. the Noun/Verb tables from Question 3.5
# under these names).
def forward(obs, states, start_p, trans_p, emit_p):
    fwd = [{}]
    for s in states:
        fwd[0][s] = start_p[s] * emit_p[s].get(obs[0], 0.0001)
    for t in range(1, len(obs)):
        fwd.append({})
        for s in states:
            # sum over all predecessor states, then apply the emission
            fwd[t][s] = (sum(fwd[t - 1][s0] * trans_p[s0][s] for s0 in states)
                         * emit_p[s].get(obs[t], 0.0001))
    return sum(fwd[-1][s] for s in states)

print(forward(['fish', 'eat'], states, start_p, trans_p, emit_p))
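The forward probability should equal the sum of the joint probabilities over all possible state sequences, not just the best one as in Viterbi. The standalone check below uses the Noun/Verb tables from Question 3.5 as the HMM parameters; `total_by_enumeration` is an illustrative helper.

```python
from itertools import product

states = ['Noun', 'Verb']
start_p = {'Noun': 0.6, 'Verb': 0.4}
trans_p = {'Noun': {'Noun': 0.1, 'Verb': 0.9},
           'Verb': {'Noun': 0.8, 'Verb': 0.2}}
emit_p = {'Noun': {'fish': 0.5, 'eat': 0.5},
          'Verb': {'fish': 0.4, 'eat': 0.6}}

def forward(obs):
    fwd = [{}]
    for s in states:
        fwd[0][s] = start_p[s] * emit_p[s].get(obs[0], 0.0001)
    for t in range(1, len(obs)):
        fwd.append({})
        for s in states:
            fwd[t][s] = (sum(fwd[t - 1][s0] * trans_p[s0][s] for s0 in states)
                         * emit_p[s].get(obs[t], 0.0001))
    return sum(fwd[-1][s] for s in states)

def total_by_enumeration(obs):
    # Sum the joint probability of every state sequence explicitly.
    total = 0.0
    for seq in product(states, repeat=len(obs)):
        p = start_p[seq[0]] * emit_p[seq[0]].get(obs[0], 0.0001)
        for t in range(1, len(obs)):
            p *= trans_p[seq[t - 1]][seq[t]] * emit_p[seq[t]].get(obs[t], 0.0001)
        total += p
    return total

obs = ['fish', 'eat']
print(forward(obs), total_by_enumeration(obs))  # the two values agree
```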
3.7 Write a Python program to perform HMM Named Entity Recognition.
states = ['O', 'PER']
start_p = {'O': 0.9, 'PER': 0.1}
trans_p = {'O': {'O': 0.9, 'PER': 0.1}, 'PER': {'O': 0.4, 'PER': 0.6}}
# 'Smith': 0.3 is an assumed value, chosen so the PER emissions sum to 1.
emit_p = {'O': {'I': 0.4, 'live': 0.6}, 'PER': {'John': 0.7, 'Smith': 0.3}}

def viterbi(o, N, start_p, T, E):
    V = [{}]
    path = {}
    for s in N:
        V[0][s] = start_p[s] * E[s].get(o[0], 1e-4)
        path[s] = [s]
    for t in range(1, len(o)):
        V.append({})
        new_path = {}
        for s in N:
            (P, S) = max((V[t - 1][x] * T[x][s] * E[s].get(o[t], 1e-4), x)
                         for x in N)
            V[t][s] = P
            new_path[s] = path[S] + [s]
        path = new_path
    (prob, state) = max((V[len(o) - 1][s], s) for s in N)
    return path[state]

print(viterbi(['John', 'Smith'], states, start_p, trans_p, emit_p))
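With these parameters, a sentence containing no names should decode as all 'O' tags. The standalone sketch below re-runs the same Viterbi decoder on ['I', 'live'] (the 0.3 emission for 'Smith' is an assumed value, as noted above):

```python
states = ['O', 'PER']
start_p = {'O': 0.9, 'PER': 0.1}
trans_p = {'O': {'O': 0.9, 'PER': 0.1}, 'PER': {'O': 0.4, 'PER': 0.6}}
# 'Smith': 0.3 is an assumed value so the PER emissions sum to 1.
emit_p = {'O': {'I': 0.4, 'live': 0.6}, 'PER': {'John': 0.7, 'Smith': 0.3}}

def viterbi(o, N, start_p, T, E):
    V = [{}]
    path = {}
    for s in N:
        V[0][s] = start_p[s] * E[s].get(o[0], 1e-4)
        path[s] = [s]
    for t in range(1, len(o)):
        V.append({})
        new_path = {}
        for s in N:
            (P, S) = max((V[t - 1][x] * T[x][s] * E[s].get(o[t], 1e-4), x)
                         for x in N)
            V[t][s] = P
            new_path[s] = path[S] + [s]
        path = new_path
    (prob, state) = max((V[len(o) - 1][s], s) for s in N)
    return path[state]

tags = viterbi(['I', 'live'], states, start_p, trans_p, emit_p)
print(tags)  # ['O', 'O']
```

Words absent from the emission tables fall back to the tiny probability 1e-4, so an unseen word is tagged mainly on the strength of the transition probabilities around it.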