1. Define the main focus of Natural Language Processing.
a) Image recognition
b) Signal processing
c) Interaction between computers and human language
d) Circuit design
2. Describe the two broad categories of NLP approaches.
a) Symbolic and Analog
b) Rule-based and Statistical
c) Linear and Nonlinear
d) Sequential and Parallel
3. Identify which component deals with sentence meaning.
a) Syntax
b) Semantics
c) Morphology
d) Phonology
4. Classify the applications of NLP.
a) Data mining, Sorting
b) Machine translation, Chatbots, Sentiment analysis
c) Hardware optimization, Storage
d) Circuit evaluation, Compiling
5. Examine which is a subfield of NLP.
a) Compiler design
b) Information Retrieval
c) Operating systems
d) Database indexing
6. Locate the earliest NLP milestone among the following.
a) Google Translate
b) ELIZA (1966)
c) Siri
d) Alexa
7. Recall the first stage of the NLP pipeline.
a) Lexical analysis
b) Syntax analysis
c) Semantic analysis
d) Pragmatics
8. Enumerate challenges of NLP.
a) Ambiguity, Context, Sarcasm
b) Sorting, Searching, Indexing
c) Multiplication, Addition
d) Compiling, Linking
9. Identify the stage that checks grammar.
a) Lexical analysis
b) Syntax analysis
c) Pragmatics
d) Information retrieval
10. Distinguish between syntactic and semantic analysis.
a) Syntax deals with meaning, semantics with structure
b) Syntax deals with structure, semantics with meaning
c) Both deal with phonetics
d) Both are about speech recognition
11. Classify the biggest challenge in NLP.
a) Large memory
b) Ambiguity
c) Parallel computation
d) Indexing speed
12. Explain the role of pragmatics.
a) Meaning of individual words
b) Meaning in context of conversation
c) Sound recognition
d) Data mining
13. Define a regular expression.
a) Random text
b) A sequence of characters defining a search pattern
c) Binary search tree
d) Language compiler
14. Identify which regex symbol matches zero or more repetitions of the preceding element.
a) +
b) ?
c) *
d) ^
15. Match the regex symbol “^” with its use.
a) End of string
b) Any digit
c) Whitespace
d) Start of string
16. Recall the regex shorthand that matches a digit.
a) [a-z]
b) \s
c) \d
d) \w
17. Compare greedy and non-greedy matching.
a) Greedy takes the longest possible match; non-greedy takes the shortest
b) Both take same length
c) Greedy is faster
d) Non-greedy ignores regex rules
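Questions 14-17 can be checked with a minimal sketch using Python's standard `re` module. The sample strings are invented; the example shows `*`, the `^` anchor, the `\d` digit class, and the difference between a greedy `.*` and a non-greedy `.*?`.

```python
import re

text = "Invoice 42 issued on 2024-05-01"

# "^" anchors the pattern at the start of the string.
starts_with_invoice = re.search(r"^Invoice", text) is not None

# "\d" matches one digit; "\d+" matches a run of one or more digits.
numbers = re.findall(r"\d+", text)

# "*" matches zero or more repetitions of the preceding element ("b" here).
zero_or_more = re.findall(r"ab*", "a ab abbb")

# Greedy ".*" extends as far as possible; non-greedy ".*?" stops as early as possible.
html = "<b>bold</b> and <i>italic</i>"
greedy = re.search(r"<.*>", html).group()
non_greedy = re.search(r"<.*?>", html).group()

print(starts_with_invoice)  # True
print(numbers)              # ['42', '2024', '05', '01']
print(zero_or_more)         # ['a', 'ab', 'abbb']
print(greedy)               # <b>bold</b> and <i>italic</i>
print(non_greedy)           # <b>
```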
18. Examine a practical use of regex.
a) Compiler optimization
b) Email validation
c) Machine learning training
d) File compression
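For question 18, a sketch of regex-based email validation. The pattern is a deliberately simplified assumption (it is not a full RFC 5322 address grammar), and `is_valid_email` is a hypothetical helper name.

```python
import re

# Simplified shape (an assumption): local part, "@", one or more
# dot-terminated domain labels, then a top-level domain of 2+ letters.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")

def is_valid_email(address: str) -> bool:
    """Return True if the whole string matches the simplified pattern."""
    return EMAIL_RE.fullmatch(address) is not None

samples = [
    "alice@example.com",
    "bob.smith+nlp@mail.example.org",
    "no-at-sign.example.com",
    "carol@localhost",
]
print([is_valid_email(a) for a in samples])  # [True, True, False, False]
```

Real-world address syntax is broader than this pattern, so treat it as an illustration of pattern matching rather than a complete validator.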
19. Define text normalization.
a) Transforming text into a standard format
b) Compressing text
c) Encrypting text
d) Tokenizing text
20. Identify which of the following is not part of normalization.
a) Encryption
b) Lowercasing
c) Removing punctuation
d) Expanding contractions
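Questions 19-20 and 23-24 describe normalization steps. A minimal sketch, assuming a tiny hypothetical contraction table (`CONTRACTIONS`) and ASCII punctuation from `string.punctuation`; real pipelines differ in which steps they apply and in what order.

```python
import re
import string

# Tiny hypothetical contraction table; real systems use much larger lists.
CONTRACTIONS = {"don't": "do not", "it's": "it is", "can't": "cannot"}

def normalize(text: str) -> str:
    text = text.lower()  # lowercasing
    # Expand contractions while the apostrophes are still present.
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    # Remove ASCII punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()

print(normalize("It's GREAT, don't   you think?!"))
# it is great do not you think
```

Expanding contractions before stripping punctuation matters: otherwise "don't" would become "dont" and no longer match the table.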
21. Describe stemming.
a) Removing suffixes or prefixes to reach a root (stem) form
b) Converting to lowercase
c) Adding tokens
d) Encoding
22. Distinguish stemming from lemmatization.
a) Both return random roots
b) Lemmatization uses a dictionary; stemming cuts off suffixes by rule
c) Lemmatization is faster
d) Stemming uses POS tags
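To make question 22 concrete, here is a toy contrast. `toy_stem` cuts suffixes by rule and can produce non-words such as "stud", while `toy_lemmatize` looks words up in a dictionary. Both functions, the suffix list, and the dictionary are hypothetical and far smaller than real tools such as the Porter stemmer or WordNet-based lemmatizers, which typically also use the part of speech.

```python
# Hypothetical suffix rules, checked in order (much simpler than Porter's).
SUFFIXES = ["ing", "ies", "es", "ed", "s"]

def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical lemma dictionary: lookup instead of cutting.
LEMMA_DICT = {"studies": "study", "studying": "study", "better": "good", "ran": "run"}

def toy_lemmatize(word: str) -> str:
    """Return the dictionary lemma, or the word itself if it is not listed."""
    return LEMMA_DICT.get(word, word)

for w in ["studies", "studying", "better", "ran"]:
    print(w, toy_stem(w), toy_lemmatize(w))
# studies stud study
# studying study study
# better better good
# ran ran run
```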
23. Recall a step commonly applied before tokenization.
a) Parsing
b) Cleaning text (punctuation removal, lowercasing)
c) Compiling
d) POS tagging
24. Explain why normalization is necessary.
a) To make text encrypted
b) To reduce file size
c) To make text consistent for processing
d) To identify stopwords only
25. Define minimum edit distance.
a) Minimum number of operations needed to convert one string into another
b) Number of sentences in a paragraph
c) Steps in parsing
d) Syllables in speech
26. Identify the three operations in edit distance.
a) Merge, Delete, Sort
b) Insert, Delete, Substitute
c) Copy, Replace, Divide
d) Tokenize, Encode, Decode
27. Recall the Levenshtein edit distance (unit costs) between “kitten” and “sitting”.
a) 2
b) 3
c) 4
d) 1
28. Describe the algorithm commonly used to compute minimum edit distance.
a) Merge Sort
b) Quick Sort
c) Dynamic Programming (Wagner-Fischer)
d) BFS
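Questions 25-28 describe minimum edit distance computed by dynamic programming. A minimal sketch of the Wagner-Fischer table, assuming unit cost for insertion, deletion, and substitution. Some textbooks charge 2 for a substitution, which gives 5 rather than 3 for "kitten" and "sitting".

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum edit distance with unit-cost insert, delete, and substitute."""
    m, n = len(source), len(target)
    # dp[i][j] = distance between source[:i] and target[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all i characters of the prefix
    for j in range(n + 1):
        dp[0][j] = j  # insert all j characters of the prefix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub_cost = 0 if source[i - 1] == target[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,             # deletion
                dp[i][j - 1] + 1,             # insertion
                dp[i - 1][j - 1] + sub_cost,  # substitution or match
            )
    return dp[m][n]

print(levenshtein("kitten", "sitting"))  # 3
```

Each cell depends only on its left, upper, and upper-left neighbors, so the table is filled in O(mn) time for strings of lengths m and n.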
29. Distinguish Levenshtein distance from Hamming distance.
a) Both require equal length strings
b) Hamming is for equal-length strings only, Levenshtein allows different lengths
c) Levenshtein is faster
d) Hamming allows insertions
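For question 29, a sketch of Hamming distance, which only counts mismatched positions and therefore rejects strings of different lengths (raising `ValueError` is a design choice of this sketch).

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))

print(hamming("karolin", "kathrin"))  # 3

try:
    hamming("kitten", "sitting")  # lengths 6 and 7: no insertions allowed
except ValueError as err:
    print(err)
```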
30. Explain an application of edit distance.
a) POS tagging
b) Parsing
c) Spell correction
d) Tokenization
31. Define a word n-gram.
a) Random set of n tokens
b) Contiguous sequence of n words
c) Sequence of n characters
d) Sentence structure
32. Identify what a bigram model estimates.
a) Probability of a word given the previous word
b) Probability of sentence length
c) Word embedding method
d) Grammar parser
33. Recall the unigram model's assumption.
a) Words occur independently
b) Words depend on previous two words
c) Words are random noise
d) Word order is preserved
34. Compare trigram and bigram models.
a) Trigram conditions on the two previous words; bigram on one
b) Trigram is faster
c) Both ignore history
d) Bigram uses three words
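Questions 31-34 describe n-grams and bigram models. A minimal sketch, assuming a two-sentence toy corpus with `<s>` and `</s>` boundary markers, that extracts n-grams and computes the maximum-likelihood bigram estimate P(word | prev) = C(prev, word) / C(prev).

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]

# Toy corpus (hypothetical); counting per sentence avoids cross-sentence bigrams.
sentences = ["<s> i like nlp </s>", "<s> i like tea </s>"]
unigram_counts = Counter()
bigram_counts = Counter()
for sentence in sentences:
    tokens = sentence.split()
    unigram_counts.update(ngrams(tokens, 1))
    bigram_counts.update(ngrams(tokens, 2))

def bigram_prob(prev, word):
    """MLE estimate P(word | prev) = C(prev, word) / C(prev)."""
    return bigram_counts[(prev, word)] / unigram_counts[(prev,)]

print(bigram_prob("i", "like"))    # 1.0
print(bigram_prob("like", "nlp"))  # 0.5
print(ngrams("<s> i like nlp </s>".split(), 3)[:2])
# [('<s>', 'i', 'like'), ('i', 'like', 'nlp')]
```

A trigram model would condition on the two previous tokens, dividing trigram counts by the counts of the preceding bigram.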
35. Examine the main problem of n-gram models.
a) Tokenization
b) Data sparsity
c) Large vocabulary
d) Lowercasing
36. Classify the type of model that n-gram models belong to.
a) Neural models
b) Statistical models
c) Rule-based models
d) Machine translation
37. Identify the main intrinsic evaluation metric for language models.
a) Perplexity
b) Accuracy
c) Recall
d) BLEU
38. Describe held-out test data.
a) Data used for training
b) Data kept aside for evaluation
c) Validation set
d) Augmented data
39. Recall the purpose of cross-validation.
a) Reduce vocabulary size
b) Ensure generalization
c) Improve syntax
d) Normalize text
40. Distinguish intrinsic from extrinsic evaluation.
a) Intrinsic: measures the model directly; Extrinsic: measures performance on a downstream task
b) Both are task-based
c) Intrinsic uses BLEU
d) Extrinsic ignores accuracy
41. Explain why log probabilities are used.
a) To speed up compilation
b) To avoid numerical underflow and turn products into sums
c) To reduce grammar rules
d) To create embeddings
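Question 41 can be illustrated numerically. Multiplying many small probabilities underflows to zero in double precision, while summing their logarithms stays representable. The 400 probabilities of 0.01 are invented for the demonstration.

```python
import math

probs = [0.01] * 400  # hypothetical per-word probabilities

product = 1.0
for p in probs:
    product *= p  # the true value, 1e-800, is below the smallest positive double

# In log space the product becomes a sum of (negative) log probabilities.
log_prob = sum(math.log(p) for p in probs)

print(product)   # 0.0
print(log_prob)  # approximately -1842.07
```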
42. Examine an application of the BLEU score.
a) Sentiment analysis
b) Machine translation
c) Speech tagging
d) Syntax checking
43. Identify the problem of zeros in n-gram models.
a) Negative probabilities
b) Unseen events get probability zero
c) Overflow in computation
d) Division by zero
44. Recall why generalization is needed.
a) To reduce file size
b) To avoid ambiguity
c) To assign probabilities to unseen words/sequences
d) To improve tokenization
45. Compare open and closed vocabularies.
a) Both handle infinitely many words
b) Closed has a fixed vocabulary; open allows unseen words
c) Open ignores OOV words
d) Closed allows infinitely many words
46. Describe a common solution for unseen words.
a) Drop them
b) Introduce an unknown-word (UNK) token
c) Ignore them
d) Encode them
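For questions 45-46, a sketch of the unknown-word approach under a closed vocabulary. The minimum-count threshold of 2, the `<UNK>` spelling, and the helper names are assumptions; the idea is that rare training words and unseen test words share one unknown-word token, so the model can still assign them probability.

```python
from collections import Counter

def build_vocab(train_tokens, min_count=2):
    """Closed vocabulary: keep only words seen at least min_count times."""
    counts = Counter(train_tokens)
    return {word for word, count in counts.items() if count >= min_count}

def replace_oov(tokens, vocab, unk="<UNK>"):
    """Map every out-of-vocabulary token to the unknown-word token."""
    return [tok if tok in vocab else unk for tok in tokens]

train = "the cat sat on the mat the cat ran".split()
vocab = build_vocab(train)

print(sorted(vocab))                              # ['cat', 'the']
print(replace_oov(train, vocab))                  # rare training words become <UNK> too
print(replace_oov("the dog sat".split(), vocab))  # ['the', '<UNK>', '<UNK>']
```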
47. Distinguish the OOV problem from ambiguity.
a) OOV: unseen word; Ambiguity: multiple meanings
b) Both are same
c) OOV deals with multiple senses
d) Ambiguity deals with spelling errors
48. Explain why zero probabilities are harmful.
a) They improve speed
b) They make the probability of the whole sentence zero
c) They reduce perplexity
d) They simplify models
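Questions 43 and 48 can be seen in a toy unsmoothed bigram model: a single unseen bigram makes the whole product zero. The corpus and the helper name are hypothetical.

```python
from collections import Counter

# Toy training corpus (hypothetical), one token list per sentence.
sentences = [
    ["<s>", "i", "like", "tea", "</s>"],
    ["<s>", "you", "like", "tea", "</s>"],
]
unigrams = Counter(w for s in sentences for w in s)
bigrams = Counter(pair for s in sentences for pair in zip(s, s[1:]))

def mle_sentence_prob(tokens):
    """Product of unsmoothed maximum-likelihood bigram probabilities."""
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        if unigrams[prev] == 0:
            return 0.0  # history never seen in training: no MLE estimate exists
        prob *= bigrams[(prev, word)] / unigrams[prev]
    return prob

print(mle_sentence_prob(["<s>", "i", "like", "tea", "</s>"]))       # 0.5
print(mle_sentence_prob(["<s>", "you", "like", "coffee", "</s>"]))  # 0.0
```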
49. Define smoothing.
a) Technique to handle zero probabilities
b) Removing stopwords
c) Lowercasing text
d) Tokenizing text
50. Identify a simple smoothing method.
a) Add-one (Laplace) smoothing
b) Regex
c) POS tagging
d) Parsing
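For questions 50 and 52, a sketch of add-one (Laplace) smoothing on a toy corpus: P(word | prev) = (C(prev, word) + 1) / (C(prev) + V). Counting `<s>` and `</s>` in the vocabulary size V is an assumption; conventions vary.

```python
from collections import Counter

# Toy training corpus (hypothetical), one token list per sentence.
sentences = [
    ["<s>", "i", "like", "tea", "</s>"],
    ["<s>", "you", "like", "tea", "</s>"],
]
unigrams = Counter(w for s in sentences for w in s)
bigrams = Counter(pair for s in sentences for pair in zip(s, s[1:]))
V = len(unigrams)  # 6 word types, counting <s> and </s> (an assumed convention)

def laplace_bigram_prob(prev, word):
    """Add-one estimate: (C(prev, word) + 1) / (C(prev) + V)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

print(laplace_bigram_prob("like", "tea"))  # 0.375 (the unsmoothed MLE is 1.0)
print(laplace_bigram_prob("like", "you"))  # 0.125 (unseen, was 0.0)
```

The seen bigram "like tea" drops from 1.0 to 0.375 while every unseen continuation gets 0.125; this large shift of mass toward unseen events is the weakness question 52 refers to.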
51. Compare Laplace and Good-Turing smoothing.
a) Both same
b) Good-Turing estimates the probability of unseen events better
c) Laplace is advanced
d) Good-Turing ignores unseen events
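Question 51 contrasts Laplace with Good-Turing. A minimal sketch of the basic Good-Turing estimates using invented counts: the total probability of unseen types is N_1 / N, and a type seen c times gets the adjusted count c* = (c + 1) N_{c+1} / N_c.

```python
from collections import Counter

# Invented counts: how often each type was observed.
counts = Counter({"carp": 10, "perch": 3, "whitefish": 2, "trout": 1, "salmon": 1, "eel": 1})
N = sum(counts.values())                    # 18 observations in total
count_of_counts = Counter(counts.values())  # N_c: number of types seen exactly c times

def p_unseen_total():
    """Probability mass Good-Turing reserves for all unseen types: N_1 / N."""
    return count_of_counts[1] / N

def adjusted_count(c):
    """Good-Turing adjusted count c* = (c + 1) * N_{c+1} / N_c (requires N_c > 0)."""
    return (c + 1) * count_of_counts[c + 1] / count_of_counts[c]

print(p_unseen_total())       # 3 / 18, about 0.167
print(adjusted_count(1))      # 2 * 1 / 3, about 0.667
print(adjusted_count(1) / N)  # smoothed probability of one singleton type
```

When N_{c+1} is zero (here for c = 3 and c = 10) the raw formula gives an adjusted count of zero, which is why practical variants such as Simple Good-Turing smooth the N_c values first.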
52. Recall the problem with add-one smoothing.
a) Too fast
b) Overestimates the probability of unseen events
c) Ignores seen events
d) Reduces vocabulary
53. Describe backoff smoothing.
a) Uses lower-order n-grams when the higher-order n-gram has no counts
b) Ignores unseen words
c) Only uses unigrams
d) Normalizes text
54. Distinguish interpolation from backoff.
a) Both drop higher n-grams
b) Interpolation always mixes probabilities from all orders; backoff falls back to a lower order only when needed
c) Both are same
d) Backoff is faster
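Questions 53-54 contrast backoff with interpolation. A sketch, assuming a toy corpus and an invented interpolation weight `lam = 0.7` (in practice tuned on held-out data). The backoff function is the simple unnormalized "stupid backoff" scheme with factor 0.4, swapped in for brevity; Katz backoff additionally discounts counts so that the result remains a probability distribution.

```python
from collections import Counter

# Toy training corpus (hypothetical), one token list per sentence.
sentences = [
    ["<s>", "i", "like", "tea", "</s>"],
    ["<s>", "you", "like", "green", "tea", "</s>"],
]
unigrams = Counter(w for s in sentences for w in s)
bigrams = Counter(pair for s in sentences for pair in zip(s, s[1:]))
total_tokens = sum(unigrams.values())

def p_unigram(word):
    return unigrams[word] / total_tokens

def p_bigram(prev, word):
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

def interpolated(prev, word, lam=0.7):
    """Linear interpolation: always mixes the bigram and unigram estimates."""
    return lam * p_bigram(prev, word) + (1 - lam) * p_unigram(word)

def stupid_backoff(prev, word, alpha=0.4):
    """Use the bigram estimate if the bigram was seen, else a scaled unigram.
    The result is a score, not a normalized probability."""
    if bigrams[(prev, word)] > 0:
        return p_bigram(prev, word)
    return alpha * p_unigram(word)

print(interpolated("like", "tea"))    # 0.7 * 0.5 + 0.3 * 2/11, about 0.405
print(stupid_backoff("like", "tea"))  # 0.5: bigram seen, no fallback
print(interpolated("i", "green"))     # 0.3 * 1/11, about 0.027
print(stupid_backoff("i", "green"))   # 0.4 * 1/11, about 0.036
```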
55. Define perplexity.
a) Random guessing
b) Measure of how well a model predicts test data
c) Grammar rule
d) Probability of sentence length
56. Identify the relation between perplexity and entropy.
a) Perplexity = 2^(Entropy)
b) Entropy = Perplexity²
c) Both are unrelated
d) Perplexity = Entropy/2
57. Recall what lower perplexity indicates.
a) Worse model
b) Better predictive model
c) Random model
d) No effect
58. Explain entropy in NLP.
a) Word embeddings
b) Average information content per word
c) Tokenization
d) Syntax rule
59. Distinguish perplexity from accuracy.
a) Accuracy is probabilistic, perplexity is binary
b) Accuracy measures correctness, perplexity measures uncertainty
c) Both same
d) Perplexity uses F1-score
60. Describe why perplexity is an exponential quantity.
a) To simplify
b) Because it is 2 raised to the entropy, which is measured in bits
c) To normalize data
d) To reduce vocabulary
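Questions 55-60 relate perplexity to entropy. A minimal sketch: the per-word cross-entropy H is the average of -log2 P(word), and perplexity is 2 ** H, which equals P(w_1 ... w_N) ** (-1/N). The per-word probabilities are invented.

```python
import math

def perplexity(word_probs):
    """Return (perplexity, cross-entropy in bits) for per-word test probabilities."""
    cross_entropy = -sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2 ** cross_entropy, cross_entropy

pp, h = perplexity([0.25, 0.25, 0.25, 0.25])
print(h, pp)  # 2.0 4.0

pp_better, h_better = perplexity([0.5, 0.5, 0.5, 0.5])
print(h_better, pp_better)  # 1.0 2.0
```

A model that is uniformly unsure among 4 words has perplexity 4; one that assigns 0.5 to each test word has perplexity 2, so the lower value indicates the better predictor.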
61. Define morphology in NLP.
a) Syntax analysis
b) Study of word structure and formation
c) Sentence meaning
d) Pragmatics
62. Identify the smallest unit of meaning.
a) Phoneme
b) Morpheme
c) Grapheme
d) Token
63. Classify “unhappiness” into morphemes.
a) un + happy + ness
b) unhappy + ness
c) un + happiness
d) happiness
64. Distinguish inflectional morphemes from derivational ones.
a) Both change meaning
b) Inflection marks grammatical features such as tense and number; derivation changes word category or meaning
c) Derivational changes tense only
d) Inflectional creates new words
65. Describe the type of morphology in English.
a) Agglutinative
b) Inflectional
c) Polysynthetic
d) Isolating
66. Recall an example of an inflectional suffix.
a) un-
b) -ed
c) re-
d) mis-
67. Identify the word class of “quickly”.
a) Adjective
b) Adverb
c) Noun
d) Pronoun
68. Define open word classes.
a) Closed set of function words
b) Classes that accept new members (nouns, verbs, adjectives, adverbs)
c) Classes that never change
d) Prepositions only
69. Recall which of the following is a closed class.
a) Verb
b) Preposition
c) Adjective
d) Adverb
70. Classify the word class of “the”.
a) Verb
b) Determiner
c) Adjective
d) Pronoun
71. Distinguish nouns from pronouns.
a) Both are identical
b) A noun names a person, place, or thing; a pronoun replaces a noun
c) Pronoun is descriptive
d) Noun is functional
72. Describe interjections.
a) Complex phrases
b) Exclamatory expressions (Oh!, Wow!)
c) Helping verbs
d) Closed class
73. Define POS tagging.
a) Tokenizing text
b) Assigning word classes to tokens
c) Removing stopwords
d) Normalizing text
74. Identify the POS tag for “run” in “I will run fast”.
a) Noun
b) Verb
c) Adjective
d) Adverb
75. Describe rule-based POS tagging.
a) Uses probabilities
b) Uses handcrafted grammar rules
c) Uses embeddings
d) Uses CRFs
76. Recall the Penn Treebank tag for a plural common noun.
a) NN
b) NNS
c) VB
d) JJ
77. Distinguish supervised from unsupervised tagging.
a) Both require labeled data
b) Supervised uses labeled corpora; unsupervised typically relies on clustering
c) Unsupervised is always faster
d) Both use rules only
78. Examine the main application of POS tagging.
a) Speech synthesis
b) Parsing and information extraction
c) Image recognition
d) Sorting words
79. Define a Hidden Markov Model (HMM).
a) Statistical model with hidden states and observed outputs
b) Rule-based grammar model
c) Embedding model
d) Parsing algorithm
80. Identify the hidden states in HMM-based POS tagging.
a) Words
b) POS tags
c) Sentences
d) Morphemes
81. Recall the observed sequence in HMM tagging.
a) Words in a sentence
b) POS tags
c) Morphemes
d) Syntax tree
82. Describe transition probabilities.
a) Probability of a tag given the previous tag
b) Probability of a word given its tag
c) Probability of morpheme
d) Probability of sentence length
83. Distinguish emission probabilities from transition probabilities.
a) Both same
b) Emission: word given tag; Transition: tag given previous tag
c) Transition is word-based
d) Emission ignores probabilities
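For questions 79-83, a toy sketch of how an HMM scores one tagging: hidden states are tags, observations are words, and the joint probability multiplies a start probability, transition probabilities P(tag | previous tag), and emission probabilities P(word | tag). All probabilities and the three-word vocabulary are invented for illustration, and the end-of-sentence transition is omitted as a simplification.

```python
# Toy HMM parameters (invented numbers, for illustration only).
start = {"DT": 0.6, "NN": 0.3, "VB": 0.1}  # P(first tag)
trans = {  # P(tag_t | tag_{t-1}); each row sums to 1
    "DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
    "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.5, "NN": 0.3, "VB": 0.2},
}
emit = {  # P(word | tag) over a tiny three-word vocabulary
    "DT": {"the": 1.0, "dog": 0.0, "runs": 0.0},
    "NN": {"the": 0.0, "dog": 0.7, "runs": 0.3},
    "VB": {"the": 0.0, "dog": 0.2, "runs": 0.8},
}

def joint_prob(words, tags):
    """P(words, tags) = P(t1) P(w1|t1) * product over t>1 of P(t_t|t_{t-1}) P(w_t|t_t)."""
    prob = start[tags[0]] * emit[tags[0]][words[0]]
    for t in range(1, len(words)):
        prob *= trans[tags[t - 1]][tags[t]] * emit[tags[t]][words[t]]
    return prob

words = ["the", "dog", "runs"]
print(joint_prob(words, ["DT", "NN", "VB"]))  # approximately 0.18144
print(joint_prob(words, ["DT", "NN", "NN"]))  # approximately 0.03402
```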
84. Explain a limitation of HMMs in tagging.
a) Always accurate
b) Cannot handle long dependencies well
c) Ignores syntax
d) Uses neural networks
85. Define the Viterbi algorithm.
a) Sorting method
b) Dynamic programming algorithm for finding the most probable sequence of hidden states
c) Neural embedding method
d) Parsing algorithm
86. Identify what Viterbi computes in POS tagging.
a) Lexicon
b) Best sequence of tags
c) Syntax tree
d) Lemmas
87. Recall the Viterbi initialization step.
a) Probability = 1 for all tags
b) Start probabilities multiplied by the emission probabilities of the first word
c) Transition matrix only
d) Zero for all
88. Distinguish the forward algorithm from the Viterbi algorithm.
a) Both same
b) Forward sums probabilities; Viterbi chooses maximum
c) Forward ignores states
d) Viterbi ignores probabilities
89. Describe backtracking in Viterbi.
a) Recovering best tag sequence
b) Building syntax tree
c) Tokenizing
d) Counting words
90. Examine the time complexity of the Viterbi algorithm.
a) O(n)
b) O(n × T²) (n = words, T = tags)
c) O(T^n)
d) O(1)
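Questions 85-90 describe the Viterbi algorithm, and question 88 contrasts it with the forward algorithm. A sketch on an invented toy HMM: both fill a table word by word, but `forward` sums over predecessor tags while `viterbi` takes the maximum and keeps back-pointers for backtracking. The parameters are hypothetical and no end-state transition is modeled.

```python
# Invented toy HMM (same numbers as a self-contained example).
STATES = ["DT", "NN", "VB"]
start = {"DT": 0.6, "NN": 0.3, "VB": 0.1}
trans = {
    "DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
    "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.5, "NN": 0.3, "VB": 0.2},
}
emit = {
    "DT": {"the": 1.0, "dog": 0.0, "runs": 0.0},
    "NN": {"the": 0.0, "dog": 0.7, "runs": 0.3},
    "VB": {"the": 0.0, "dog": 0.2, "runs": 0.8},
}

def forward(words):
    """P(words): sums over all tag paths."""
    alpha = {s: start[s] * emit[s][words[0]] for s in STATES}
    for word in words[1:]:
        alpha = {
            s: sum(alpha[p] * trans[p][s] for p in STATES) * emit[s][word]
            for s in STATES
        }
    return sum(alpha.values())

def viterbi(words):
    """Most probable tag path and its probability: max instead of sum."""
    # Initialization: start probability times emission of the first word.
    v = [{s: start[s] * emit[s][words[0]] for s in STATES}]
    backptr = [{}]
    for t in range(1, len(words)):
        v.append({})
        backptr.append({})
        for s in STATES:
            # Best predecessor tag for being in state s at position t.
            best_prev = max(STATES, key=lambda p: v[t - 1][p] * trans[p][s])
            v[t][s] = v[t - 1][best_prev] * trans[best_prev][s] * emit[s][words[t]]
            backptr[t][s] = best_prev
    # Termination: best final tag, then follow back-pointers (backtracking).
    last = max(STATES, key=lambda s: v[-1][s])
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return path, v[-1][last]

words = ["the", "dog", "runs"]
path, best = viterbi(words)
print(path)            # ['DT', 'NN', 'VB']
print(best)            # approximately 0.18144
print(forward(words))  # approximately 0.21696, the sum over all tag paths
```

The two loops over tags nested inside the loop over words give the O(n × T²) cost from question 90. For long sentences, practical implementations usually work with log probabilities to avoid underflow.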
91. Define Named Entity Recognition (NER).
a) Identifying and classifying named entities such as persons, locations, and organizations
b) Tokenization
c) POS tagging
d) Parsing
92. Identify the entities in “Google was founded in California”.
a) Founded
b) Google = Organization, California = Location
c) Organization only
d) Action word
93. Recall common NER categories.
a) Pronoun, Verb, Adjective
b) Person, Location, Organization, Date
c) Root, Stem, Affix
d) Syntax, Pragmatics
94. Distinguish NER from POS tagging.
a) Both same
b) NER detects named entities; POS tagging assigns word classes
c) POS is for parsing
d) NER ignores text
95. Describe the BIO tagging scheme.
a) Bigram model
b) Begin-Inside-Outside notation for entities
c) Binary index operator
d) Bag-of-words
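For question 95, a sketch that decodes BIO tags into entity spans, using the sentence from question 92. Treating an `I-` tag with no matching open entity as a new entity start is an assumption; tools differ on this repair.

```python
def bio_to_spans(tags):
    """Convert a list of BIO tags into (entity_type, start, end) spans, end exclusive."""
    spans = []
    ent_type, start = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        continues = tag.startswith("I-") and tag[2:] == ent_type
        if not continues:
            if ent_type is not None:
                spans.append((ent_type, start, i))
                ent_type, start = None, None
            if tag != "O":
                # "B-X" starts an entity; a stray "I-X" is also treated as a start.
                ent_type, start = tag[2:], i
    return spans

tokens = ["Google", "was", "founded", "in", "California"]
tags = ["B-ORG", "O", "O", "O", "B-LOC"]
spans = bio_to_spans(tags)
print([(etype, " ".join(tokens[s:e])) for etype, s, e in spans])
# [('ORG', 'Google'), ('LOC', 'California')]
```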
96. Explain an application of NER.
a) Information extraction in text (e.g., news, resumes)
b) Syntax analysis
c) Tokenization
d) Lowercasing
97. Define conditional random fields (CRFs).
a) Neural networks
b) Discriminatively trained probabilistic sequence models
c) Rule-based grammar
d) Embedding models
98. Identify the difference between HMMs and CRFs.
a) Both are generative
b) HMM is generative; CRF is discriminative
c) Both discriminative
d) HMM ignores probabilities
99. Recall why CRFs often outperform HMMs for NER.
a) Faster
b) They capture overlapping, global features
c) Use fewer labels
d) No probabilities needed
100. Describe a feature function in a CRF.
a) Maps input sequence and label sequence to real values
b) Tokenizes text
c) Embeds words
d) Parses grammar
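Questions 97-101 describe feature functions in a linear-chain CRF. A sketch, assuming four hypothetical binary feature functions f(y_prev, y_curr, x, t) and hand-picked weights (a trained CRF learns the weights). The score sums weighted features over positions, and P(y | x) normalizes exp(score) over every possible label sequence. Brute-force enumeration of that normalizer is only feasible for tiny inputs; real implementations compute it with the forward algorithm.

```python
import itertools
import math

LABELS = ["O", "B-PER", "I-PER"]

# Hypothetical feature functions: each maps (y_prev, y_curr, x, t) to a real value.
def f_capitalized_entity(y_prev, y_curr, x, t):
    return 1.0 if x[t][0].isupper() and y_curr != "O" else 0.0

def f_lowercase_outside(y_prev, y_curr, x, t):
    return 1.0 if x[t][0].islower() and y_curr == "O" else 0.0

def f_begin_then_inside(y_prev, y_curr, x, t):
    return 1.0 if y_prev == "B-PER" and y_curr == "I-PER" else 0.0

def f_outside_then_inside(y_prev, y_curr, x, t):
    return 1.0 if y_prev == "O" and y_curr == "I-PER" else 0.0

FEATURES = [f_capitalized_entity, f_lowercase_outside, f_begin_then_inside, f_outside_then_inside]
WEIGHTS = [2.0, 1.0, 1.5, -3.0]  # invented weights; a real CRF learns these

def score(x, y):
    """Unnormalized log-score: weighted features summed over all positions."""
    total = 0.0
    for t in range(len(x)):
        y_prev = y[t - 1] if t > 0 else "START"
        total += sum(w * f(y_prev, y[t], x, t) for w, f in zip(WEIGHTS, FEATURES))
    return total

def conditional_prob(x, y):
    """P(y | x) = exp(score(x, y)) / Z(x), with Z(x) summed over all label sequences."""
    z = sum(math.exp(score(x, list(seq))) for seq in itertools.product(LABELS, repeat=len(x)))
    return math.exp(score(x, y)) / z

x = ["meet", "Ada", "Lovelace"]
gold = ["O", "B-PER", "I-PER"]
print(score(x, gold))             # 6.5
print(conditional_prob(x, gold))  # the largest probability among the 27 sequences
```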
101. Identify what distinguishes a linear-chain CRF.
a) Specialized for sequential data like text
b) Ignores sequence order
c) Used for parsing trees
d) Random clustering
102. Explain a training challenge of CRFs.
a) Easy optimization
b) High computational cost
c) Small data requirement
d) No labeling needed
103. Define the standard metrics for NER evaluation.
a) BLEU, Perplexity
b) Precision, Recall, F1-score
c) Accuracy only
d) Word error rate
104. Identify what precision measures.
a) Correct entities out of predicted entities
b) Correct entities out of total entities
c) Predicted entities out of all tokens
d) Errors in tagging
105. Recall the formula for recall.
a) TP / (TP+FP)
b) TP / (TP+FN)
c) FP / (TP+FN)
d) FN / (TP+FP)
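Questions 103-105 use precision, recall, and F1. A minimal sketch computed from true-positive, false-positive, and false-negative counts; the counts are invented, and returning 0.0 for empty denominators is an assumed convention.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = their harmonic mean."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Invented counts: 8 correct entities, 2 spurious predictions, 4 missed entities.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(p, r, f1)  # precision 0.8, recall about 0.667, F1 about 0.727
```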
106. Distinguish micro-averaged from macro-averaged evaluation.
a) Micro pools counts over all instances; Macro averages per-class scores
b) Both same
c) Micro ignores recall
d) Macro ignores precision
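For question 106, a sketch with invented per-class counts showing why micro- and macro-averaged F1 differ: micro pools the counts, so frequent classes dominate, while macro averages per-class scores, so the rare, poorly recognized ORG class pulls it down.

```python
# Invented per-class counts: (true positives, false positives, false negatives).
counts = {"PER": (90, 10, 10), "LOC": (40, 10, 10), "ORG": (2, 8, 8)}

def f1(tp, fp, fn):
    """F1 written directly in counts: 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Macro: score each class separately, then take the unweighted mean.
macro_f1 = sum(f1(*c) for c in counts.values()) / len(counts)

# Micro: pool the counts over all classes first, then score once.
total_tp = sum(c[0] for c in counts.values())
total_fp = sum(c[1] for c in counts.values())
total_fn = sum(c[2] for c in counts.values())
micro_f1 = f1(total_tp, total_fp, total_fn)

print(round(macro_f1, 3), round(micro_f1, 3))  # 0.633 0.825
```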
107. Describe the effect of high recall but low precision.
a) Few entities detected
b) Many false positives included
c) Many entities missed
d) Perfect accuracy
108. Explain the CoNLL NER evaluation metric.
a) F1 score for entity-level evaluation
b) Word error rate
c) BLEU score
d) Entropy
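Question 108 refers to entity-level F1 as used in the CoNLL shared tasks. A simplified sketch of exact-match scoring (not the official evaluation script): a predicted entity counts only if its type and both boundaries match a gold entity. The spans are invented.

```python
def entity_f1(gold_spans, pred_spans):
    """Entity-level P/R/F1: a prediction counts only if type, start and end all match."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Spans are (type, start, end) with end exclusive, e.g. from decoded BIO tags.
gold = [("PER", 0, 2), ("ORG", 5, 6), ("LOC", 8, 10)]
pred = [("PER", 0, 2), ("ORG", 5, 6), ("LOC", 8, 9)]  # LOC boundary is off by one
print(entity_f1(gold, pred))  # each value is about 0.667 (2 of 3 entities match exactly)
```

The LOC prediction overlaps the gold entity but gets no credit, which is the main way entity-level scores differ from token-level accuracy.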
109.