NLP Assignment 1
Units Covered: I – Classical Approaches of NLP, II – Empirical & Statistical Approaches
Total Marks: 20
Instructions:
• Answer all questions.
• Write neatly and label all diagrams.
• Show all steps in numerical questions.
• Use examples wherever necessary.
Section A: Descriptive & Conceptual (6 Marks)
Q1. Explain the four major levels of linguistic analysis used in classical NLP:
• Morphology
• Syntax
• Semantics
• Pragmatics
Provide one example sentence and use it to illustrate all four levels.
Q2. Discuss three real-life applications of classical NLP. For each, mention which linguistic
levels are most relevant and why.
Q3. What major challenges does ambiguity pose in natural language processing? Explain
lexical ambiguity and syntactic ambiguity, with one example of each.
Section B: Comparative & Short Notes (4 Marks)
Q4. Write short notes (100–120 words) on any two of the following:
a) Corpus Linguistics
b) Rule-based vs Statistical NLP
c) Need for Treebanks
d) Lexical Resources and Annotation
Q5. Differentiate between the following (any two – 1 mark each):
a) Tokenization vs Segmentation
b) Hidden Markov Model vs Conditional Random Field
c) Supervised vs Unsupervised POS tagging
d) Shallow Parsing vs Deep Parsing
Section C: Numerical & Analytical (6 Marks)
Q6. Given the following corpus:
"He loves NLP. She learns NLP. NLP is fun."
Build a bigram probability table (only for the bigrams that include “NLP”) using add-one
(Laplace) smoothing. State how you handle case, punctuation, and sentence boundaries.
Then calculate the probability of the phrase:
“She learns NLP”
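Note (illustrative only): the add-one estimate is P(w | prev) = (count(prev, w) + 1) / (count(prev) + V), where V is the vocabulary size. The sketch below applies it to a different, hypothetical toy corpus, not the one in Q6. The function names, the lowercasing, the `<s>` and `</s>` boundary markers, and the choice to count both markers in V are all assumptions made for illustration; other conventions are equally acceptable if you state them.

```python
from collections import Counter

def tokenize(sentence):
    # Assumption: lowercase, split on whitespace, add boundary markers.
    return ["<s>"] + sentence.lower().split() + ["</s>"]

def train_bigram(sentences):
    # Count unigrams and bigrams over all tokenized sentences.
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        toks = tokenize(s)
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    vocab = set(unigrams)  # assumption: V includes <s> and </s>
    return unigrams, bigrams, vocab

def laplace_prob(prev, word, unigrams, bigrams, vocab):
    # P(word | prev) = (count(prev, word) + 1) / (count(prev) + V)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

def phrase_prob(words, unigrams, bigrams, vocab):
    # Product of smoothed bigram probabilities over consecutive word pairs.
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= laplace_prob(prev, word, unigrams, bigrams, vocab)
    return p

# Hypothetical toy corpus (punctuation already removed).
corpus = ["i like tea", "you like coffee"]
unigrams, bigrams, vocab = train_bigram(corpus)
p_tea_given_like = laplace_prob("like", "tea", unigrams, bigrams, vocab)
p_phrase = phrase_prob(["i", "like", "tea"], unigrams, bigrams, vocab)
print(p_tea_given_like, p_phrase)
```

Here `phrase_prob` multiplies only the within-phrase bigrams; whether to also include the start and end marker transitions is another assumption to state in your answer.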
Q7. Apply the Viterbi Algorithm to perform POS tagging on the sentence:
"Time flies like an arrow."
Assume a simplified tag set {Noun, Verb, Det, Prep} and choose your own initial,
transition, and emission probabilities.
(State assumptions clearly.)
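Note (illustrative only): the Viterbi recurrence keeps, for each position and tag, the probability of the best tag sequence ending in that tag, plus a back-pointer to the previous tag. The sketch below shows this on a hypothetical two-word, two-tag example; it is not the sentence or tag set from Q7. The function name, the dictionary layout, and every probability value are invented assumptions for illustration.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    # best[i][t]: probability of the best tag path ending in tag t at position i
    best = [{t: start_p[t] * emit_p[t].get(words[0], 0.0) for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Pick the previous tag that maximizes the path probability into t.
            prev, score = max(
                ((p, best[i - 1][p] * trans_p[p][t]) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score * emit_p[t].get(words[i], 0.0)
            back[i][t] = prev
    # Trace back from the most probable final tag.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, best[-1][last]

# Hypothetical model: all numbers below are assumed, not estimated from data.
tags = ["Noun", "Verb"]
start_p = {"Noun": 0.8, "Verb": 0.2}
trans_p = {
    "Noun": {"Noun": 0.3, "Verb": 0.7},
    "Verb": {"Noun": 0.6, "Verb": 0.4},
}
emit_p = {
    "Noun": {"dogs": 0.5, "bark": 0.1},
    "Verb": {"dogs": 0.1, "bark": 0.6},
}
path, prob = viterbi(["dogs", "bark"], tags, start_p, trans_p, emit_p)
print(path, prob)
```

In a written answer the same computation is usually laid out as a trellis table, one column per word and one row per tag, with the back-pointers marked.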
Q8. Consider the word: “unbelievable”
a) Perform morphological analysis (root + affixes)
b) Show how it might be stemmed and lemmatized.
c) State whether stemming or lemmatization is more suitable for sentiment analysis, and why.
Section D: Diagram & Case-Based (4 Marks)
Q9.
a) Draw the architecture of POS tagging using HMM with state transitions and
observation emissions.
b) Label each part of the model and describe what the transition and emission
probabilities represent.
Q10. Case Study: Consider an NLP-based Question Answering system.
a) Describe which types of corpora and annotations are necessary for building such a
system.
b) Draw a simple system architecture to show how user input gets processed.