
Natural Language Processing

(Weekly Laboratory Assignments)

Sumit Kumar Banerjee


Contents

3 Assignments on Language Modeling
  3.1 Question 1
  3.2 Question 2
  3.3 Question 3
  3.4 Question 4
  3.5 Question 5
  3.6 Question 6
  3.7 Question 7

Chapter 3

Language Modeling

3.1 Write a Python program to implement a Unigram Language Model with Laplace Smoothing.
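With c(w) the count of word w in the corpus, N the total number of tokens, and V the vocabulary size, the smoothed probability the program computes is

\[ P(w) = \frac{c(w) + 1}{N + V} \]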
from collections import Counter

def unigram_model(corpus):
    tokens = corpus.split()
    counts = Counter(tokens)
    v = len(counts)                      # vocabulary size V
    total_tokens = sum(counts.values())  # N
    def prob(word):
        # Laplace (add-one) smoothing
        return (counts[word] + 1) / (total_tokens + v)
    return prob

corpus = "the cat sat on the mat the cat ate fish"
model = unigram_model(corpus)
print("P(cat) =", model("cat"))
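For the toy corpus, c(cat) = 2, N = 10, and V = 7, so the program prints P(cat) = (2 + 1)/(10 + 7) = 3/17 ≈ 0.1765.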

3.2 Write a Python program to implement a Bigram Model with Laplace Smoothing.
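With c(w1, w2) the bigram count and c(w1) the count of w1 as a history word, the smoothed conditional probability the code computes is

\[ P(w_2 \mid w_1) = \frac{c(w_1, w_2) + 1}{c(w_1) + V} \]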
from collections import defaultdict

def bigram_model(corpus):
    tokens = corpus.split()
    b = defaultdict(int)  # bigram counts c(w1, w2)
    u = defaultdict(int)  # history counts c(w1)
    vocab = set(tokens)
    for i in range(len(tokens) - 1):
        u[tokens[i]] += 1
        b[(tokens[i], tokens[i + 1])] += 1
    v = len(vocab)
    def prob(w1, w2):
        # Laplace-smoothed P(w2 | w1)
        return (b[(w1, w2)] + 1) / (u[w1] + v)
    return prob

corpus = "the cat sat on the mat the cat ate fish"
model = bigram_model(corpus)
print("P(cat | the) =", model("the", "cat"))
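In the toy corpus the bigram (the, cat) occurs twice, "the" occurs 3 times as a history word, and V = 7, so the program prints P(cat | the) = (2 + 1)/(3 + 7) = 0.3.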

3.3 Write a Python program to implement a Trigram Text Generator.
from random import choice
from collections import defaultdict

def trigram_generator(corpus, start, length=10):
    o = corpus.split()
    t = defaultdict(list)
    # map each word pair to the list of words observed after it
    for i in range(len(o) - 2):
        t[(o[i], o[i + 1])].append(o[i + 2])
    text = list(start)
    for _ in range(length):
        pair = tuple(text[-2:])
        next_word = choice(t.get(pair, ["<END>"]))
        if next_word == "<END>":
            break
        text.append(next_word)
    return " ".join(text)

corpus = input()
print(trigram_generator(corpus, ("the", "cat")))
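The corpus is read from standard input. Because each continuation is appended once per occurrence, choice samples the next word in proportion to trigram frequency; fed the toy corpus from Section 3.1 with the start pair ("the", "cat"), for example, the first step picks "sat" or "ate" with equal probability.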

3.4 Write a Python program to perform Bigram Spell Correction.
from difflib import get_close_matches

def spell_correct(sentence, vocab, p):
    words = sentence.split()
    o = [words[0]]
    for i in range(1, len(words)):
        if words[i] not in vocab:
            c1 = get_close_matches(words[i], vocab)
            if c1:
                # rank candidates by bigram probability given the previous word
                s = [(c, p(o[-1], c)) for c in c1]
                words[i] = max(s, key=lambda x: x[1])[0]
        o.append(words[i])
    return " ".join(o)

corpus = "the cat sat on the mat"
vocab = set(corpus.split())
model = bigram_model(corpus)  # bigram_model from Section 3.2
print(spell_correct("the cet sat on teh mat", vocab, model))
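get_close_matches returns vocabulary words whose difflib similarity to the misspelling is at least 0.6 (the default cutoff), so "cet" is matched to "cat" and "teh" to "the"; when several candidates pass the cutoff, the bigram probability breaks the tie in favor of the word most likely to follow the previous one. Note that the first word of the sentence is copied through uncorrected.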

3.5 Write a Python program to perform Viterbi POS Tagging.
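The Viterbi recursion the code implements keeps, for each state s and time t, the probability of the best tag sequence ending in s, with π the start distribution, T the transition probabilities, and E the emission probabilities:

\[ V_0(s) = \pi(s)\, E(s, o_0), \qquad V_t(s) = \max_{x}\, V_{t-1}(x)\, T(x, s)\, E(s, o_t) \]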
# HMM parameters: states, start, transition, and emission probabilities
N = ['Noun', 'Verb']
start_p = {'Noun': 0.6, 'Verb': 0.4}
T = {'Noun': {'Noun': 0.1, 'Verb': 0.9},
     'Verb': {'Noun': 0.8, 'Verb': 0.2}}
E = {'Noun': {'fish': 0.5, 'eat': 0.5},
     'Verb': {'fish': 0.4, 'eat': 0.6}}

def viterbi(o, N, start_p, T, E):
    V = [{}]
    path = {}
    for s in N:
        V[0][s] = start_p[s] * E[s].get(o[0], 1e-4)
        path[s] = [s]
    for t in range(1, len(o)):
        V.append({})
        new_path = {}
        for s in N:
            # best predecessor state x for reaching s at time t
            (P, S) = max((V[t - 1][x] * T[x][s] * E[s].get(o[t], 1e-4), x)
                         for x in N)
            V[t][s] = P
            new_path[s] = path[S] + [s]
        path = new_path
    (prob, state) = max((V[len(o) - 1][s], s) for s in N)
    return path[state]

print(viterbi(['fish', 'eat'], N, start_p, T, E))
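For ['fish', 'eat']: V_0(Noun) = 0.6 · 0.5 = 0.30 and V_0(Verb) = 0.4 · 0.4 = 0.16; at t = 1, V_1(Verb) = max(0.30 · 0.9, 0.16 · 0.2) · 0.6 = 0.162 (reached from Noun) beats V_1(Noun) = 0.064, so the program prints ['Noun', 'Verb'].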

3.6 Write a Python program to compute Forward Probability.
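The forward algorithm replaces Viterbi's max with a sum, so α_t(s) accumulates the probability of all state sequences that emit the observations and end in state s:

\[ \alpha_t(s) = \Big( \sum_{x} \alpha_{t-1}(x)\, T(x, s) \Big) E(s, o_t), \qquad P(\mathbf{o}) = \sum_{s} \alpha_{|o|-1}(s) \]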
# The snippet reuses the HMM from Section 3.5; the aliases below map its
# variable names onto the ones used here.
states, trans_p, emit_p = N, T, E

def forward(obs, states, start_p, trans_p, emit_p):
    fwd = [{}]
    for s in states:
        fwd[0][s] = start_p[s] * emit_p[s].get(obs[0], 1e-4)
    for t in range(1, len(obs)):
        fwd.append({})
        for s in states:
            # sum over predecessor states, then apply the emission probability
            fwd[t][s] = (sum(fwd[t - 1][s0] * trans_p[s0][s] for s0 in states)
                         * emit_p[s].get(obs[t], 1e-4))
    return sum(fwd[-1][s] for s in states)

print(forward(['fish', 'eat'], states, start_p, trans_p, emit_p))
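Numerically: α_0(Noun) = 0.30 and α_0(Verb) = 0.16; then α_1(Noun) = (0.30 · 0.1 + 0.16 · 0.8) · 0.5 = 0.079 and α_1(Verb) = (0.30 · 0.9 + 0.16 · 0.2) · 0.6 = 0.1812, so the program prints 0.2602.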

3.7 Write a Python program to perform HMM Named Entity Recognition.
states = ['O', 'PER']
start_p = {'O': 0.9, 'PER': 0.1}
trans_p = {'O': {'O': 0.9, 'PER': 0.1}, 'PER': {'O': 0.4, 'PER': 0.6}}
# P('Smith' | PER) is cut off in the source; 0.3 is assumed here so the
# emission distribution for PER sums to 1
emit_p = {'O': {'I': 0.4, 'live': 0.6}, 'PER': {'John': 0.7, 'Smith': 0.3}}

def viterbi(o, N, start_p, T, E):
    V = [{}]
    path = {}
    for s in N:
        V[0][s] = start_p[s] * E[s].get(o[0], 1e-4)
        path[s] = [s]
    for t in range(1, len(o)):
        V.append({})
        new_path = {}
        for s in N:
            # best predecessor state x for reaching s at time t
            (P, S) = max((V[t - 1][x] * T[x][s] * E[s].get(o[t], 1e-4), x)
                         for x in N)
            V[t][s] = P
            new_path[s] = path[S] + [s]
        path = new_path
    (prob, state) = max((V[len(o) - 1][s], s) for s in N)
    return path[state]

print(viterbi(['John', 'Smith'], states, start_p, trans_p, emit_p))
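With the assumed P(Smith | PER) = 0.3, 'John' is far more likely under PER (0.7) than under O (the 1e-4 fallback for unseen words), and the PER → PER transition (0.6) keeps the tag, so the program prints ['PER', 'PER'].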
