Practice Set - NLP

Intro to NLP and Challenges (1–4)

1.​ What is Natural Language Processing (NLP)? Provide two real-world examples of NLP
applications and explain their significance.
2.​ Discuss the challenge of lexical ambiguity in NLP. Provide an example sentence with a
polysemous word and explain how it affects processing.
3.​ Why is context understanding a major challenge in NLP? Illustrate with an example of a
sentence where context changes meaning.
4.​ Explain how evolving language (e.g., slang, emojis) poses a challenge for NLP systems.
Suggest one approach to address this issue.

Preprocessing 1: Tokenization and Stopwords (5–8)

5.​ Given the text: "Wow!! NLP is super fun...", tokenize it into lowercase words, remove
punctuation, and list the tokens. [3 Marks]
6.​ What are two types of tokenization techniques? Provide an example of each applied to
the sentence "I’m learning NLP."
7.​ Given the text: "The dog runs in the park.", remove stopwords {the, in, a} and list the
remaining tokens. [3 Marks]
8.​ Explain the impact of stopword removal on text classification. Provide an example where
stopword removal improves model performance.
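
A minimal sketch of the preprocessing in Questions 5 and 7, using only the standard library (the regex tokenizer is an assumption; NLTK's word_tokenize, relevant to Question 6, would split contractions like "I'm" differently):

```python
import re

def tokenize(text):
    # Lowercase, then keep only runs of alphabetic characters as tokens.
    return re.findall(r"[a-z]+", text.lower())

STOPWORDS = {"the", "in", "a"}  # the stopword list given in Question 7

print(tokenize("Wow!! NLP is super fun..."))
# ['wow', 'nlp', 'is', 'super', 'fun']

tokens = tokenize("The dog runs in the park.")
print([t for t in tokens if t not in STOPWORDS])
# ['dog', 'runs', 'park']
```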

Preprocessing 2: Stemming and Lemmatization (9–12)

9.​ What is the difference between stemming and lemmatization? Apply Porter Stemmer
and a lemmatizer to the word "running" and compare results.
10.​Given the words {studies, studying, studied}, apply Porter Stemmer to each and list the
stems. [3 Marks]
11.​Lemmatize the words {geese, better, running} using a standard lemmatizer. Assume verb
context for "running". List the lemmas. [3 Marks]
12.​Explain why lemmatization is preferred over stemming in tasks like information retrieval.
Provide an example.
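
The stems and lemmas in Questions 9–11 can be checked with NLTK (assuming the nltk package and its wordnet data are installed; the commented outputs are what the Porter stemmer and WordNet lemmatizer produce):

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()  # requires the NLTK 'wordnet' corpus

for w in ["studies", "studying", "studied", "running"]:
    print(w, "->", stemmer.stem(w))
# studies -> studi, studying -> studi, studied -> studi, running -> run

print(lemmatizer.lemmatize("geese"))            # goose (noun is the default)
print(lemmatizer.lemmatize("better", pos="a"))  # good (adjective context)
print(lemmatizer.lemmatize("running", pos="v")) # run (verb context, per Question 11)
```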

Morphology (13–15)

13. What is morphological parsing? Provide an example of parsing the word "unhappiness"
into its morphemes.
14.​Explain how Finite State Transducers (FSTs) are used in morphological analysis. Give
an example of an FST rule for pluralization.
15.​Given the word "cats", use an FST to generate its singular form and explain the
transformation steps. [4 Marks]
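
Real morphological FSTs are built with toolkits such as OpenFST or HFST; the fragment below is only a toy sketch of the Question 15 idea, with a one-word lexicon and made-up feature labels, showing how the surface suffix 's' pairs with a plural feature (and, read in the generation direction, how dropping it yields the singular):

```python
def analyze(surface):
    # Toy two-state transducer: recognize the stem, then transduce the suffix.
    if not surface.startswith("cat"):
        return None  # stem not in this toy lexicon
    rest = surface[len("cat"):]
    if rest == "s":
        return "cat+N+PL"  # surface 's' maps to the plural feature
    if rest == "":
        return "cat+N+SG"  # empty suffix maps to the singular feature
    return None

print(analyze("cats"))  # cat+N+PL -> the singular surface form is the stem "cat"
print(analyze("cat"))   # cat+N+SG
```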

Syntax and CKY Algorithm (16–19)

16.​What is syntactic parsing? Draw a constituency parse tree for the sentence "The cat
sleeps."
17.​Given the CFG: S → NP VP, NP → Det N, VP → V, Det → the, N → dog, V → barks,
use the CKY algorithm to parse "the dog barks". Show the parsing table. [6 Marks]
18.​Explain why the CKY algorithm requires grammars in Chomsky Normal Form (CNF).
Provide an example of converting a rule to CNF.
19.​Given the sentence "a cat runs" and CFG: S → NP VP, NP → Det N, VP → V, Det → a,
N → cat, V → runs, fill the CKY table to verify if it’s grammatical. [5 Marks]
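
A compact recognizer for Question 19's grammar (Question 17's is analogous). One caveat: VP → V is a unit production, which strict CNF, as Question 18 notes, would eliminate; as a sketch, the code instead closes the diagonal cells under unit productions:

```python
# Grammar from Question 19; lexical rules live in LEXICON.
LEXICON = {"a": {"Det"}, "cat": {"N"}, "runs": {"V"}}
BINARY = [("NP", ("Det", "N")), ("S", ("NP", "VP"))]
UNARY = [("VP", "V")]  # CNF conversion would fold this into VP -> runs

def cky(words):
    n = len(words)
    # table[i][j] holds the nonterminals that span words[i:j].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):
        cell = set(LEXICON.get(w, set()))
        for a, b in UNARY:  # unit-production closure on the diagonal
            if b in cell:
                cell.add(a)
        table[i][i + 1] = cell
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for a, (b, c) in BINARY:
                    if b in table[i][k] and c in table[k][j]:
                        table[i][j].add(a)
    return table

table = cky("a cat runs".split())
print(table[0][3])  # {'S'} -> "a cat runs" is grammatical under this CFG
```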

Language Modeling and N-grams (20–23)

20.​What is a language model? Provide an example of how a trigram model predicts the next
word in a sentence.
21.​Given the corpus: "I eat rice. I eat bread.", list all bigrams and their counts. [4 Marks]
22.​Using the corpus: "the cat runs the dog jumps", calculate the MLE probability of the
bigram P(runs|cat). [4 Marks]
23.​Explain the difference between unigram, bigram, and trigram models. Why do
higher-order n-grams capture more context?
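
The counting in Questions 21–22 can be checked in a few lines (periods are dropped and no sentence-boundary markers are used, which is an assumption; adding <s>/</s> tokens would change the counts):

```python
from collections import Counter

def bigram_counts(tokens):
    return Counter(zip(tokens, tokens[1:]))

# Question 21:
print(bigram_counts("i eat rice i eat bread".split()))
# (i, eat): 2, (eat, rice): 1, (rice, i): 1, (eat, bread): 1

# Question 22: MLE P(runs|cat) = count(cat runs) / count(cat)
tokens = "the cat runs the dog jumps".split()
big, uni = bigram_counts(tokens), Counter(tokens)
print(big[("cat", "runs")] / uni["cat"])  # 1 / 1 = 1.0
```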

Smoothing (24–26)

24. Why is smoothing necessary in n-gram models? Provide an example of a
zero-probability issue without smoothing.
25.​Given a unigram model with counts {the: 3, cat: 2, runs: 1}, vocabulary size 4, calculate
Laplace-smoothed probabilities for "cat" and an unseen word "dog". [5 Marks]
26.​Given a bigram model with counts: P(dog|the) = 2/5, P(cat|the) = 3/5, total "the" count =
5, vocabulary size = 3, compute Laplace-smoothed P(cat|the). [4 Marks]
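
A worked check for Question 25: with add-one (Laplace) smoothing, P(w) = (count(w) + 1) / (N + V), where N is the total token count and V the vocabulary size:

```python
counts = {"the": 3, "cat": 2, "runs": 1}
N = sum(counts.values())  # 6 total tokens
V = 4                     # vocabulary size given in Question 25

def p_laplace(word):
    return (counts.get(word, 0) + 1) / (N + V)

print(p_laplace("cat"))  # (2 + 1) / (6 + 4) = 0.3
print(p_laplace("dog"))  # (0 + 1) / (6 + 4) = 0.1, the unseen word

# Question 26 (bigram case): P(cat|the) = (count(the cat) + 1) / (count(the) + V)
print((3 + 1) / (5 + 3))  # 0.5
```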

POS Tagging and HMMs (27–30)

27.​What is Part-of-Speech (POS) tagging? Explain how statistical POS taggers use context
to assign tags.
28.​Given an HMM with tags {NOUN, VERB}, transition counts: NOUN→VERB = 2,
NOUN→NOUN = 1, and emission counts: NOUN(cat) = 2, VERB(runs) = 1, calculate
P(VERB|NOUN) and P(cat|NOUN). [5 Marks]
29.​Explain the components of a Hidden Markov Model (HMM) for POS tagging. Provide an
example of states and observations.
30.​Given a tag sequence: [NOUN, VERB, NOUN] and emissions: P(dog|NOUN) = 0.5,
P(runs|VERB) = 0.6, P(cat|NOUN) = 0.4, calculate the probability of the sequence "dog
runs cat". [5 Marks]

Viterbi Algorithm (31–32)

31.​Explain how the Viterbi algorithm decodes the most likely tag sequence in an HMM.
Provide a simple example with two tags.
32.​Given an HMM with tags {NOUN, VERB}, transitions: P(NOUN|NOUN) = 0.6,
P(VERB|NOUN) = 0.4, emissions: P(cat|NOUN) = 0.7, P(runs|VERB) = 0.8, use Viterbi
to find the most likely tag sequence for "cat runs". Show the trellis. [6 Marks]
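
A small Viterbi sketch for Question 32. The question omits the start distribution and the VERB-row transitions, so the code assumes the sequence starts in NOUN with probability 1 and treats missing probabilities as 0; the trellis it prints is what the question asks to show:

```python
# Probabilities from Question 32; missing entries default to 0.
start = {"NOUN": 1.0, "VERB": 0.0}  # assumption: start in NOUN
trans = {("NOUN", "NOUN"): 0.6, ("NOUN", "VERB"): 0.4}
emit = {("NOUN", "cat"): 0.7, ("VERB", "runs"): 0.8}
tags, words = ["NOUN", "VERB"], ["cat", "runs"]

# Initialization: first trellis column.
delta = [{t: start[t] * emit.get((t, words[0]), 0.0) for t in tags}]
back = [{}]
# Recursion: extend the best path into each tag for each later word.
for w in words[1:]:
    prev, col, ptr = delta[-1], {}, {}
    for t in tags:
        best = max(tags, key=lambda s: prev[s] * trans.get((s, t), 0.0))
        col[t] = prev[best] * trans.get((best, t), 0.0) * emit.get((t, w), 0.0)
        ptr[t] = best
    delta.append(col)
    back.append(ptr)

print(delta)  # [{'NOUN': 0.7, 'VERB': 0.0}, {'NOUN': 0.0, 'VERB': 0.224}]
# Backtrace: best final tag VERB, preceded by NOUN -> sequence NOUN VERB.
```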

One-Hot Encoding and BoW (33–35)

33.​Given a vocabulary {apple, banana, orange}, create one-hot encoding vectors for "apple"
and "banana". [4 Marks]
34.​Given the sentence "I eat an apple" and vocabulary {I, eat, an, apple, banana}, create a
Bag-of-Words vector. [4 Marks]
35.​Explain the limitations of one-hot encoding in NLP. Why is it less effective than dense
embeddings for semantic tasks?
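
Questions 33–34 are mechanical; a short sketch:

```python
vocab = ["apple", "banana", "orange"]

def one_hot(word):
    # 1 in the word's vocabulary position, 0 elsewhere.
    return [1 if w == word else 0 for w in vocab]

print(one_hot("apple"))   # [1, 0, 0]
print(one_hot("banana"))  # [0, 1, 0]

# Question 34: Bag-of-Words counts over the given vocabulary.
bow_vocab = ["I", "eat", "an", "apple", "banana"]
sentence = "I eat an apple".split()
print([sentence.count(w) for w in bow_vocab])  # [1, 1, 1, 1, 0]
```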

TF-IDF (36–37)

36.​Given two documents: Doc1: "I love NLP", Doc2: "NLP is fun", calculate the TF-IDF
score for "NLP" in Doc1. Show TF, IDF, and final score. [6 Marks]
37.​Explain the TF-IDF formula. Why does it prioritize rare terms in document
representation? Provide an example.
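
A worked sketch for Question 36 with TF = count/document length and IDF = log(N/df). Because "NLP" occurs in both documents, the raw IDF is log(2/2) = 0; the last line shows the smoothed variant log(N/df) + 1, which is an assumption some textbooks use:

```python
import math

doc1 = "I love NLP".lower().split()
doc2 = "NLP is fun".lower().split()
docs = [doc1, doc2]

tf = doc1.count("nlp") / len(doc1)       # 1/3
df = sum(1 for d in docs if "nlp" in d)  # appears in 2 of 2 documents
idf = math.log(len(docs) / df)           # log(2/2) = 0
print(tf, idf, tf * idf)                 # 0.333... 0.0 0.0

# Smoothed variant (assumption): idf' = log(N/df) + 1 -> tf-idf = 1/3
print(tf * (idf + 1))                    # 0.333...
```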

Word2Vec and Semantics (38–40)

38.​Explain the core idea behind Word2Vec’s CBOW model. How does it learn word
embeddings from a corpus?
39.​What is Word Sense Disambiguation (WSD)? Apply a simplified Lesk algorithm to
disambiguate "pen" in "She wrote with a pen" using glosses: pen₁ (writing tool), pen₂
(animal enclosure). [5 Marks]
40.​Explain how semantic similarity is measured in Word2Vec. Provide an example of two
words with high similarity.
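
A simplified Lesk sketch for Question 39: score each sense by word overlap between its gloss and the sentence. The tiny lemma table is an assumption added so that "wrote" can match "writing"; without some normalization the raw overlap is zero for both senses:

```python
# Tiny lemma table (assumption) so that "wrote" matches "writing".
LEMMA = {"wrote": "write", "writing": "write"}
STOP = {"a", "an", "the", "with", "she", "for"}

def normalize(text):
    return {LEMMA.get(w, w) for w in text.lower().split()} - STOP

def simplified_lesk(context, glosses):
    ctx = normalize(context)
    # Choose the sense whose gloss has the largest word overlap with the context.
    overlaps = {sense: len(ctx & normalize(gloss)) for sense, gloss in glosses.items()}
    return max(overlaps, key=overlaps.get), overlaps

glosses = {"pen_1": "writing tool", "pen_2": "animal enclosure"}
print(simplified_lesk("She wrote with a pen", glosses))
# ('pen_1', {'pen_1': 1, 'pen_2': 0})
```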

Sentiment, Classification, and Summarization (41–44)

41.​Given a dataset: Positive: "Great app", Negative: "Bad app", calculate Naive Bayes
likelihoods for "great" and "bad". Classify "Great app" assuming equal priors. [6 Marks]
42.​What is a sentiment lexicon? Provide an example of using a lexicon to classify "This
movie is awesome" as positive.
43.​Explain how TextRank performs extractive summarization. Given sentences S1–S3 with
similarities S1–S2 = 0.5, S1–S3 = 0.3, S2–S3 = 0.4, rank them after one iteration
(damping = 0.85). [6 Marks]
44.​What is text classification? Provide an example of classifying emails as spam or not
spam using Naive Bayes.
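
The arithmetic of Question 41 in a few lines. Following the question literally, the sketch uses unsmoothed MLE likelihoods (with only one training document per class, Laplace smoothing would change the numbers):

```python
from collections import Counter

train = {"pos": "great app".split(), "neg": "bad app".split()}
counts = {c: Counter(words) for c, words in train.items()}
totals = {c: sum(cnt.values()) for c, cnt in counts.items()}

def score(text, c, prior=0.5):
    p = prior
    for w in text.lower().split():
        p *= counts[c][w] / totals[c]  # MLE likelihood, no smoothing
    return p

print(score("Great app", "pos"))  # 0.5 * (1/2) * (1/2) = 0.125
print(score("Great app", "neg"))  # 0.5 * 0 * (1/2) = 0.0 -> classify positive
```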

Information Extraction and Question Answering (45–47)

45. What is relation extraction? Provide an example of extracting a "works-at" relation from
"Jane works at IBM."
46.​Given a query: "AI tools" and documents: Doc1: {AI, tech}, Doc2: {AI, tools}, calculate
Jaccard similarity for each. [5 Marks]
47.​Given a QnA system with documents: Doc1: {cat, runs}, Doc2: {dog, jumps}, and query:
{cat}, compute cosine similarity for each document using term frequency vectors.
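
For Question 47, with term-frequency vectors over the vocabulary {cat, runs, dog, jumps}, only Doc1 shares a term with the query; a sketch:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vocab = ["cat", "runs", "dog", "jumps"]
def vec(words):
    return [1 if w in words else 0 for w in vocab]

query = vec({"cat"})
print(cosine(query, vec({"cat", "runs"})))   # 1 / sqrt(2) ~= 0.707 -> Doc1
print(cosine(query, vec({"dog", "jumps"})))  # 0.0 -> Doc2
```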
Machine Translation (48–52)

48. Given the English sentence "The blue car is big", translate it to Hindi using rule-based
MT. Explain morphological and syntactic transfer rules. [6 Marks]
49.​What are the challenges of multilingual NLP? Provide an example of word order
differences affecting translation.
50.​Explain how dictionary lookup fails in machine translation. Provide an example sentence
where it produces incorrect output.
51.​Explain how the attention mechanism in NMT improves translation quality over Statistical
MT. Provide an example of translating "The cat sleeps" to French, highlighting attention’s
role.
52.​Given an NMT model with a vocabulary of 3 words {the, cat, sleeps} and a test sentence
"the cat sleeps", compute the probability of the target French translation "le chat dort"
using hypothetical softmax outputs: P(le|start) = 0.8, P(chat|le) = 0.7, P(dort|chat) = 0.6.
[5 Marks]
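
Question 52 is a direct application of the chain rule over the decoder's softmax outputs; a one-line check:

```python
# P(le chat dort) = P(le|start) * P(chat|le) * P(dort|chat)
print(0.8 * 0.7 * 0.6)  # 0.336
```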
Retrieval, Sentiment, and Summarization (53–55)

53. A QnA system uses Jaccard Similarity to retrieve the most relevant document for a
user’s query. The knowledge base and query are preprocessed (stopwords removed):

Knowledge Base Documents:

○ Doc A: {‘artificial’, ‘intelligence’, ‘systems’}
○ Doc B: {‘machine’, ‘intelligence’, ‘models’}
○ Doc C: {‘systems’, ‘data’, ‘processing’}

User Query: Q = {‘intelligence’, ‘systems’}

Tasks:
a) State the formula for Jaccard Similarity. [2 Marks]
b) Calculate the Jaccard Similarity between Query Q and each document (Doc A, Doc B,
Doc C). Show step-by-step calculations for intersection and union sizes. [6 Marks]
c) Which document is retrieved as the most relevant? Explain why based on the
similarity scores. [2 Marks]
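
A quick cross-check for part (b), using Python's set operations (Jaccard(A, B) = |A ∩ B| / |A ∪ B|):

```python
def jaccard(a, b):
    return len(a & b) / len(a | b)

q = {"intelligence", "systems"}
docs = {
    "Doc A": {"artificial", "intelligence", "systems"},
    "Doc B": {"machine", "intelligence", "models"},
    "Doc C": {"systems", "data", "processing"},
}
for name, d in docs.items():
    print(name, jaccard(q, d))
# Doc A: 2/3 ~= 0.667, Doc B: 1/4 = 0.25, Doc C: 1/4 = 0.25 -> retrieve Doc A
```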

54. You are using the VADER sentiment analysis tool to analyze the following tweet:
“Love the new app’s design, but it crashes constantly!”

Tasks (Token-Level Analysis):

a) Identify the sentiment-bearing words in the tweet. [2 Marks]

b) For each, indicate whether it contributes to positive, negative, or neutral sentiment in
VADER. [2 Marks]

c) Explain how the word “constantly” impacts the sentiment score of “crashes” in
VADER. [2 Marks]

d) Describe how the conjunction “but” affects the overall sentiment score per VADER’s
rules. [2 Marks]

e) If VADER’s compound score is -0.15, classify the sentiment as positive, negative, or
neutral using VADER thresholds. [1 Mark]
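
For part (e), VADER's conventional thresholds are: compound >= 0.05 is positive, <= -0.05 is negative, and anything in between is neutral, so -0.15 classifies as negative. A quick check of the full tweet with the vaderSentiment package (assuming it is installed; exact scores vary by lexicon version):

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
scores = analyzer.polarity_scores("Love the new app's design, but it crashes constantly!")
print(scores)  # dict with 'neg', 'neu', 'pos' and a 'compound' score

compound = scores["compound"]
label = "positive" if compound >= 0.05 else "negative" if compound <= -0.05 else "neutral"
print(label)
```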
55. You are tasked with summarizing a news article using the TextRank algorithm. The
article has four sentences:

S1: “AI improves efficiency in manufacturing.”

S2: “Tests were conducted across multiple factories.”

S3: “AI achieved 90% accuracy in quality control.”

S4: “Factories report cost savings with AI.”

Assume cosine similarity scores: S1–S2 = 0.3, S1–S3 = 0.7, S1–S4 = 0.5, S2–S3
= 0.4, S2–S4 = 0.2, S3–S4 = 0.6.​
Tasks:​
a) Construct the sentence similarity graph (nodes = sentences, edges =
similarities). [2 Marks]​
b) Perform two iterations of the TextRank algorithm with a damping factor of
0.85. Show initial scores and calculations. [5 Marks]​
c) Rank the sentences and select the top two for the summary. Explain their
relevance. [3 Marks]
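
A sketch of parts (b) and (c). It uses the common TextRank update TR(i) = (1 - d) + d * sum_j [ w(j, i) / sum_k w(j, k) ] * TR(j) with d = 0.85; uniform initial scores of 1.0 are an assumption (1/N is also used in practice):

```python
sims = {
    ("S1", "S2"): 0.3, ("S1", "S3"): 0.7, ("S1", "S4"): 0.5,
    ("S2", "S3"): 0.4, ("S2", "S4"): 0.2, ("S3", "S4"): 0.6,
}
nodes = ["S1", "S2", "S3", "S4"]

def w(a, b):
    # Undirected similarity graph: look the edge up in either order.
    return sims.get((a, b), sims.get((b, a), 0.0))

scores = {n: 1.0 for n in nodes}  # initial scores (assumption)
d = 0.85
for _ in range(2):  # two iterations, as the task asks
    scores = {
        i: (1 - d) + d * sum(
            w(j, i) / sum(w(j, k) for k in nodes if k != j) * scores[j]
            for j in nodes if j != i
        )
        for i in nodes
    }

for n, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(n, round(s, 3))
# With these assumptions the ranking is S3 > S1 > S4 > S2,
# so S3 and S1 are selected for the two-sentence summary.
```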
