NLP
One mark Questions with answers
UNIT - I
1.) List out a few applications of NLP.
• Question answering
• Spam detection
• Machine translation
• Spelling correction
• Chatbots
• Speech recognition
2.) Components of NLP
• NLU (Natural Language Understanding)
• NLG (Natural Language Generation)
3.) Name five phases involved in NLP.
• Lexical and morphological analysis
• Syntactic analysis
• Semantic analysis
• Discourse integration
• Pragmatic analysis
4.) Differentiate lexeme and lemma
Aspect | Lexeme | Lemma
Definition | The base unit of meaning; an abstract unit representing all inflected forms of a word. | The dictionary or canonical form of a lexeme.
Represents | All the inflectional variants (e.g., walk, walks, walked, walking). | A single standard form, typically used as a headword in dictionaries.
Example | Lexeme: RUN → run, runs, ran, running | Lemma: run
Used in | Linguistic analysis, corpus studies, NLP | Dictionaries, NLP, morphological parsing
Nature | Abstract and general | Specific and representative
5.) Define Morphology
Morphology is the branch of linguistics that studies the structure and formation of words. It
analyzes how morphemes (the smallest units of meaning) combine to form words, including roots,
prefixes, and suffixes.
Example: In the word “unhappiness”, un- (prefix), happy (root), and -ness (suffix) are all morphemes.
6.) What is typology?
Typology in linguistics is the study and classification of languages based on their structural features,
such as word order, sentence structure, or morphological patterns. It helps identify similarities and
differences among languages, regardless of their historical or genetic relationships.
Example: English follows SVO (Subject-Verb-Object) word order, while Hindi follows SOV (Subject-
Object-Verb).
7.) What is a fusional language?
Fusional languages are defined by a feature-per-morpheme ratio higher than one: a single affix can encode several grammatical features at once (as in Arabic, Czech, Latin, Sanskrit, German, etc.).
Ex: Latin amō ("I love"): the single ending -ō simultaneously marks person (first), number (singular), tense (present), mood (indicative), and voice (active).
8.) Features of NLTK
• Tokenization
• Lowercasing
• Removing stopwords
• Punctuation removal
• Stemming
• Lemmatization
• POS tagging
• Named Entity Recognition (NER)
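The listed steps can be chained into one preprocessing pass. A minimal sketch (note: resource names such as "punkt" can vary slightly across NLTK versions, e.g., newer releases may ask for "punkt_tab"):

import string
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

nltk.download("punkt")       # tokenizer model (one-time download)
nltk.download("stopwords")   # stopword lists (one-time download)

text = "NLTK makes text preprocessing easy!"
tokens = [t.lower() for t in word_tokenize(text)]                    # tokenization + lowercasing
tokens = [t for t in tokens if t not in stopwords.words("english")]  # stopword removal
tokens = [t for t in tokens if t not in string.punctuation]          # punctuation removal
print(tokens)   # ['nltk', 'makes', 'text', 'preprocessing', 'easy']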
9.) Define stemming
Stemming is the process of reducing a word to its base or root form, called a "stem."
It helps group related words together so they can be analyzed as a single item, regardless of tense or
form.
Ex: helping → help
studying → studi
flying → fli
helper → help
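A minimal sketch with NLTK's PorterStemmer (exact outputs vary between stemmers, e.g., Porter vs. Lancaster):

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["helping", "studying", "flying"]:
    print(word, "->", stemmer.stem(word))   # help, studi, fli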
10.) Define Lemmatizing
Lemmatizing is the process of reducing a word to its lemma, or base form. Unlike stemming, it
produces a valid English word that makes sense on its own.
Stemming:
→ "caring" → "car" (not meaningful in context)
→ Fast but less accurate.
Lemmatizing:
→ "caring" → "care" (meaningful root word)
→ Slower but context-aware and grammatically correct.
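A minimal sketch with NLTK's WordNetLemmatizer (requires the wordnet corpus; the pos argument tells the lemmatizer the word class):

import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet")   # one-time corpus download

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("caring", pos="v"))   # care (treated as a verb)
print(lemmatizer.lemmatize("feet"))              # foot (default pos is noun)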
11.) List out the libraries commonly imported when working with NLTK.
import contractions                       # expands contractions ("don't" → "do not")
import nltk                               # core NLTK toolkit
import re                                 # regular expressions for text cleanup
from nltk.tokenize import word_tokenize   # tokenization
from nltk.corpus import stopwords         # stopword lists
from nltk.stem import PorterStemmer, WordNetLemmatizer  # stemming and lemmatization
from nltk import pos_tag                  # part-of-speech tagging
12.) Differentiate chunking and chinking
Chunking: The process of identifying and grouping phrases in a sentence, such as noun phrases and
verb phrases.
Chinking: The process of removing specific patterns from within a chunk (like verbs or adverbs that don't belong).
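A minimal sketch using NLTK's RegexpParser on a pre-tagged sentence ({...} defines a chunk pattern, }...{ defines a chink):

from nltk import RegexpParser

grammar = r"""
NP: {<.*>+}      # chunk everything into one NP
    }<VBD|IN>{   # chink: remove verbs and prepositions from the chunk
"""
sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
print(RegexpParser(grammar).parse(sentence))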
13.) Define NER (Named Entity Recognition)
The process of identifying and classifying named entities in a given sentence, such as:
Ex: Person names (e.g., Mahatma Gandhi)
Organizations (e.g., MRCET)
Locations (e.g., Hyderabad, India)
Dates (e.g., 20 June 2025)
Monetary values (e.g., ₹500, $1000)
Time, percentages, events, etc.
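A minimal sketch with NLTK's built-in chunker (resource names may differ slightly across NLTK versions):

import nltk

for pkg in ["punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"]:
    nltk.download(pkg)   # one-time model/corpus downloads

sentence = "Mahatma Gandhi was born in Porbandar, India."
tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sentence)))
print(tree)   # labels entities such as PERSON and GPE (geo-political entity)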
UNIT-II
1.) Define Parsing/Syntax Analysis.
A. The process of analyzing a sentence's grammatical structure according to the rules of a formal
grammar. It identifies the syntactic structure of a sentence and determines how the words relate to
each other.
2.) Applications of Syntactic analysis
• Grammar checking (e.g., Grammarly)
• Question answering systems
• Chatbots
• Machine translation
• Text summarization
3.) List out Approaches to Syntax Analysis.
Top-Down Parsing – Starts from the start symbol and tries to derive the sentence.
Bottom-Up Parsing – Builds the parse tree from the input up to the start symbol.
Chart Parsing – Uses dynamic programming to store intermediate parsing results.
Shift-Reduce Parsing – A bottom-up method using a stack to shift and reduce tokens.
Recursive Descent Parsing – A top-down parser using recursive functions for grammar rules.
Dependency Parsing – Focuses on word-to-word relations (head-dependent).
Constituency Parsing – Breaks sentences into phrase structures (like NP, VP).
Probabilistic Parsing – Uses probabilities to select the most likely parse tree.
4.) Define Treebanks
Treebanks are annotated text corpora that include syntactic or grammatical structure (usually in
the form of parse trees) for each sentence. They are used in Natural Language Processing (NLP) and
linguistics to train and evaluate parsers and grammar models.
Example: A sentence like "The cat sat on the mat." would be annotated to show how words group
into phrases (like noun phrases and verb phrases).
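NLTK ships a sample of the Penn Treebank, which can be loaded directly; a minimal sketch:

import nltk
from nltk.corpus import treebank

nltk.download("treebank")          # Penn Treebank sample (one-time download)
tree = treebank.parsed_sents()[0]  # first annotated sentence as a parse tree
print(tree)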
5.) Types of Syntax trees and what are they?
There are two main types of syntax trees in linguistics:
1. Constituency Tree (Phrase Structure Tree):
Shows how words group into phrases (like noun phrases or verb phrases) based on grammar
rules.
Example: [NP The cat] [VP sat [PP on [NP the mat]]]
2. Dependency Tree:
Shows word-to-word relationships, where one word (the "head") governs the others (its
"dependents").
Example: In "The cat sat," "sat" is the main verb, and "cat" is its subject dependent.
These trees help analyze sentence structure and grammatical relationships.
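A constituency tree like the one above can be built from its bracketed form with nltk.Tree; a minimal sketch:

from nltk import Tree

t = Tree.fromstring(
    "(S (NP (DT The) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))"
)
t.pretty_print()   # draws the tree as ASCII art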
6.) Uses of Treebanks.
• Training parsers (e.g., probabilistic context-free grammar parsers, neural parsers)
• Evaluating parsing algorithms
• Linguistic research
• Building tools for translation, sentiment analysis, etc.
7.) Write about data driven approach
A data-driven approach in linguistics and NLP relies on large annotated datasets (corpora) to learn
patterns and make predictions. Instead of using fixed grammar rules, this approach uses statistical
models or machine learning algorithms trained on real language data.
Example: A machine translation system trained on parallel corpora learns how to translate based on
patterns in the data, not predefined rules.
8.) Define dependency graph
A. A dependency graph shows how the words in a sentence are connected based on their
grammatical roles, with each word linked to the "head" word it depends on.
Ex: "Don't drink and drive."
9.) Where are dependency graphs used?
• NLP parsers (like spaCy, Stanford NLP)
• Grammar checking tools
• Machine translation
• Information extraction
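A minimal spaCy sketch (assumes the small English model is installed via: python -m spacy download en_core_web_sm):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Don't drink and drive.")
for token in doc:
    # each token points to its head word, forming the dependency graph
    print(f"{token.text:8} --{token.dep_}--> {token.head.text}")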
10.) List out the tools used to build Phrase structure trees.
• NLTK (Natural Language Toolkit) — Python
• Stanford Parser / CoreNLP
• spaCy + Benepar (Berkeley Neural Parser)
• RSyntaxTree (Web GUI Tool)
• SyntaxNet
11.) Write about types of Parsing algorithms.
• Shift-Reduce Parsing
• Chart Parsing (CYK Algorithm)
• Hypergraph-based Parsing
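A minimal chart-parsing sketch with NLTK and a toy grammar:

import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> DT NN
VP -> VBD PP
PP -> IN NP
DT -> 'The' | 'the'
NN -> 'cat' | 'mat'
VBD -> 'sat'
IN -> 'on'
""")
parser = nltk.ChartParser(grammar)   # stores partial parses in a chart (dynamic programming)
for tree in parser.parse("The cat sat on the mat".split()):
    print(tree)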
12.) Define Hypergraph.
A hypergraph is a type of graph in which an edge, called a hyperedge, can connect more than two
vertices. It is used to represent multi-way relationships between elements.
Ex: Vertices: A, B, C, D
Hyperedge E1 = {A, B, C, D} (a single edge connecting all four vertices at once)
13.) Write about Probabilistic Context-Free Grammar.
A. Probabilistic Context-Free Grammar (PCFG) is an extension of CFG (Context-Free Grammar) where:
• Each production rule has an associated probability.
• These probabilities help choose the most likely parse tree when a sentence has multiple
possible meanings.
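A minimal sketch with NLTK's PCFG and Viterbi parser (toy probabilities; the probabilities for each left-hand side must sum to 1):

import nltk

pcfg = nltk.PCFG.fromstring("""
S -> NP VP [1.0]
NP -> DT NN [0.7] | 'I' [0.3]
VP -> VB NP [1.0]
DT -> 'the' [1.0]
NN -> 'man' [0.5] | 'telescope' [0.5]
VB -> 'saw' [1.0]
""")
parser = nltk.ViterbiParser(pcfg)   # selects the most probable parse tree
for tree in parser.parse("I saw the man".split()):
    print(tree)   # tree printed with its probability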
14.) List out Types of Generative models.
• PCFG (Probabilistic Context-Free Grammar)
• Lexicalized PCFG
• Generative Neural Parsers
• Data-Oriented Parsing (DOP)
• Bayesian Generative Models
• Stochastic Tree-Substitution Grammars (STSG)
• Generative Dependency Parsers
• Minimalist Grammars (generative, theoretical)
15.) What are the advantages of discriminative models for parsing?
• Can use rich and overlapping features (lexical, syntactic, semantic).
• Do not rely on the strong independence assumptions that generative models make.
• Generally provide higher parsing accuracy.
UNIT-III
1.) How many types of n-gram models are there? What are they?
Types of n-gram models:
• Unigram
• Bigram
• Trigram
• Higher-order N-gram Models
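A minimal sketch with NLTK's ngrams helper:

from nltk.util import ngrams

tokens = "I love natural language processing".split()
print(list(ngrams(tokens, 1)))   # unigrams
print(list(ngrams(tokens, 2)))   # bigrams
print(list(ngrams(tokens, 3)))   # trigrams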
2.) What is the purpose of language model evaluation?
• The accuracy of word predictions
• The fluency and naturalness of generated text
• How well the model captures language structure and meaning
3.) Define perplexity.
Perplexity is a measurement of how well a language model predicts a sequence of words.
It tells the user how “confused” or “surprised” the model is when it sees the actual text.
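Formally, for a test sequence W = w1 w2 ... wN, perplexity is the inverse probability of the text normalized by its length:
PP(W) = P(w1 w2 ... wN)^(-1/N)
A lower perplexity means the model predicts the text better.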
4.) Types of Smoothing techniques.
• Add-One (Laplace) Smoothing
• Add-k Smoothing
• Good-Turing Discounting
• Backoff and Interpolation
5.) Describe the role of smoothing in N-gram models. Why is it necessary?
Answer:
Smoothing helps when some N-grams in the test sentence do not appear in the training corpus,
resulting in zero probabilities.
Example: If "I enjoy mango" never appeared in training, then:
P("mango" | "enjoy") = 0 → Whole sentence probability = 0
Solution:
Laplace Smoothing: Adds 1 to all counts to avoid zeros.
Backoff Models: Fall back to smaller N-grams if higher ones are missing.
Smoothing ensures the model assigns non-zero probabilities to unseen sequences.
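A minimal add-one (Laplace) smoothing sketch for bigram probabilities, using made-up counts and vocabulary size for illustration:

# toy counts from a hypothetical training corpus
bigram_count = {("enjoy", "coding"): 3}   # ("enjoy", "mango") was never seen
unigram_count = {"enjoy": 5}
V = 1000   # assumed vocabulary size

def laplace_bigram(w1, w2):
    # P(w2 | w1) = (count(w1, w2) + 1) / (count(w1) + V)
    return (bigram_count.get((w1, w2), 0) + 1) / (unigram_count[w1] + V)

print(laplace_bigram("enjoy", "mango"))   # non-zero even though the bigram is unseen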
6.) What are the limitations of N-gram models and how can they be addressed?
Limitations:
Data sparsity: Many possible word sequences may not appear in training data.
Limited context: N-gram models only look at a few previous words.
High memory: Storing large N-gram tables is resource-heavy.
Solutions:
Smoothing: Adjusts probabilities of unseen N-grams (e.g., Laplace Smoothing).
Backoff and Interpolation: Uses lower-order N-grams when higher-order ones are
unavailable.