
Assignment 1

Part 1: N-Grams (60 points)

Consider the following dataset for all the problems below:


Training Corpus
The quick brown fox jumps over the lazy dog. The lazy dog barks loudly.
The brown fox jumps over the log. The quick cat meows softly.
The lazy cat sleeps soundly. The quick brown dog barks at night.
The brown fox jumps high. The lazy cat wakes up early.
The quick cat sleeps peacefully. The lazy dog snores loudly.

Testing Sentences
The lazy dog barks loudly.
The brown fox jumps high.
The lazy cat sleeps soundly.
The quick brown dog barks at night.
The lazy cat wakes up early.

1. Compute the probabilities of the test sentences using bigram, trigram, and quadgram LMs;
a bigram sketch appears after this list. (20 points)

2. Complete each of the following sentences to 10 words using the bigram, trigram, and
quadgram LMs (see the complete() helper in the sketch below). (10 points)
● "The lazy dog …"
● "The brown …"
● "The quick …"
3. Apply Laplace (add-k) smoothing with k = 0.5 to the bigram LM and compute the
probabilities of the test sentences; see the smoothing parameter in the sketch below. (10 points)
4. Implement the interpolation approach for a trigram LM in Python on the above training
dataset, with λ1 = 0.5, λ2 = 0.3, and λ3 = 0.2; a sketch follows the bigram one below. (20 points)
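
The following is a minimal sketch for the bigram case of tasks 1–3, not a reference solution. It assumes sentence-boundary padding with <s>/</s>, a lowercase whitespace tokenizer, and greedy argmax decoding for the completion task; none of these choices is fixed by the assignment. Setting k = 0 gives the unsmoothed MLE for task 1, and k = 0.5 gives the add-k smoothing of task 3. Trigram and quadgram models follow the same pattern with longer histories.

from collections import Counter

# Assumed preprocessing: lowercase, strip the final period, pad with <s>/</s>.
def tokenize(sentence):
    return ["<s>"] + sentence.lower().rstrip(".").split() + ["</s>"]

train = [
    "The quick brown fox jumps over the lazy dog.",
    "The lazy dog barks loudly.",
    "The brown fox jumps over the log.",
    "The quick cat meows softly.",
    "The lazy cat sleeps soundly.",
    "The quick brown dog barks at night.",
    "The brown fox jumps high.",
    "The lazy cat wakes up early.",
    "The quick cat sleeps peacefully.",
    "The lazy dog snores loudly.",
]

# Count unigrams and bigrams over the training corpus.
unigrams, bigrams = Counter(), Counter()
vocab = set()
for sent in train:
    toks = tokenize(sent)
    vocab.update(toks)
    unigrams.update(toks)
    bigrams.update(zip(toks, toks[1:]))

V = len(vocab)

def bigram_prob(w_prev, w, k=0.0):
    """P(w | w_prev) with add-k smoothing; k=0 gives the unsmoothed MLE.
    With k=0 this assumes w_prev was seen in training (true for all
    test sentences here)."""
    return (bigrams[(w_prev, w)] + k) / (unigrams[w_prev] + k * V)

def sentence_prob(sentence, k=0.0):
    """Probability of a sentence under the bigram LM (tasks 1 and 3)."""
    toks = tokenize(sentence)
    p = 1.0
    for prev, cur in zip(toks, toks[1:]):
        p *= bigram_prob(prev, cur, k)
    return p

def complete(prefix, length=10, k=0.5):
    """Greedily extend a prefix to `length` words with the bigram LM (task 2)."""
    toks = ["<s>"] + prefix.lower().split()
    candidates = vocab - {"<s>", "</s>"}   # never emit boundary markers
    while len(toks) - 1 < length:
        toks.append(max(candidates, key=lambda w: bigram_prob(toks[-1], w, k)))
    return " ".join(toks[1:])

print(sentence_prob("The lazy dog barks loudly."))         # unsmoothed (task 1)
print(sentence_prob("The lazy dog barks loudly.", k=0.5))  # add-0.5 (task 3)
print(complete("The lazy dog"))                            # task 2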
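For task 4, here is a sketch of simple linear interpolation over trigram, bigram, and unigram estimates, reusing tokenize(), train, unigrams, and bigrams from the sketch above. The pairing of λ1 with the trigram term, λ2 with the bigram, and λ3 with the unigram is an assumption; the assignment does not state which weight goes with which order.

from collections import Counter

# Trigram counts over the same padded corpus.
trigrams = Counter()
for sent in train:
    toks = tokenize(sent)
    trigrams.update(zip(toks, toks[1:], toks[2:]))

N = sum(unigrams.values())  # total tokens, including <s>/</s> (a simplification)

def interp_trigram_prob(w1, w2, w3, l1=0.5, l2=0.3, l3=0.2):
    """Interpolated P(w3 | w1, w2) = l1*P_tri + l2*P_bi + l3*P_uni."""
    p_tri = trigrams[(w1, w2, w3)] / bigrams[(w1, w2)] if bigrams[(w1, w2)] else 0.0
    p_bi = bigrams[(w2, w3)] / unigrams[w2] if unigrams[w2] else 0.0
    p_uni = unigrams[w3] / N
    return l1 * p_tri + l2 * p_bi + l3 * p_uni

def interp_sentence_prob(sentence):
    toks = tokenize(sentence)
    p = 1.0
    for w1, w2, w3 in zip(toks, toks[1:], toks[2:]):
        p *= interp_trigram_prob(w1, w2, w3)
    return p

print(interp_sentence_prob("The brown fox jumps high."))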
Part 2: Naive Bayes Classification (40 points)

1. Assume the following likelihoods for each word being part of a positive or negative
movie review, and equal prior probabilities for each class. What class will Naive Bayes
assign to the sentence “I always like foreign films.”? (5 points)

2. Given the following short movie reviews, each labeled with a genre, either comedy or
action:

A new document D contains the words: fast, couple, shoot, fly. Compute the most likely
class for D, assuming a Naive Bayes classifier with add-1 smoothing for the likelihoods;
the sketch after this list shows the corresponding computation in code. (5 points)

3. Train two models, multinomial Naive Bayes and binarized Naive Bayes, both with add-1
smoothing, on the following document counts for key sentiment words, with the positive or
negative class assigned as noted; a from-scratch sketch of both variants appears after
this list. (10 points)

Use both Naive Bayes models to assign a class (pos or neg) to this sentence:

A good, good plot and great characters, but poor acting.

4. Implement a Naive Bayes classifier for sentiment classification in Python. You may use
any dataset of movie reviews, tweets, emails, Amazon reviews, or similar, as long as the
data is publicly accessible; toy datasets are not allowed. A starter sketch using a
public corpus follows this list. (20 points)
