1. Context-Free Grammars (CFGs)
Overview:
Context-Free Grammars (CFGs) are a fundamental concept in computational linguistics and
formal language theory, used extensively in Natural Language Processing (NLP) to model the
syntactic structure of languages. A CFG is a type of formal grammar that consists of a set of
production rules used to generate all possible strings in a language.
A CFG is defined by a 4-tuple (N, Σ, R, S):
N: A finite set of non-terminal symbols (syntactic categories like NP, VP, S)
Σ: A finite set of terminal symbols (the actual words or tokens)
R: A set of production rules, each of the form A → α, where A ∈ N and α ∈ (N ∪ Σ)*
S ∈ N: The start symbol, representing a sentence or root node
How CFG Works:
CFGs allow recursive rules. For example, a sentence S might be composed of a noun phrase
NP followed by a verb phrase VP, and NP itself could contain other non-terminals or
terminals. This recursive property enables CFGs to capture the hierarchical nature
of language syntax.
Example:
Consider a simplified English grammar:
S → NP VP
NP → Det N
VP → V NP
Det → "the" | "a"
N → "dog" | "cat"
V → "chased" | "saw"
Using these rules, the sentence “the dog chased a cat” can be generated:
1. S → NP VP
2. NP → Det N → "the" "dog"
3. VP → V NP → "chased" NP
4. NP → Det N → "a" "cat"
Thus, the CFG provides a formal mechanism to generate syntactically valid sentences and
parse them.
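The same toy grammar can be written with NLTK's CFG class (a minimal sketch, assuming the nltk package is installed); generate() simply enumerates strings the grammar licenses:

import nltk
from nltk.parse.generate import generate

# The toy grammar from the example above, in NLTK's grammar notation.
grammar = nltk.CFG.fromstring("""
  S  -> NP VP
  NP -> Det N
  VP -> V NP
  Det -> 'the' | 'a'
  N  -> 'dog' | 'cat'
  V  -> 'chased' | 'saw'
""")

# Print a few sentences generated by the grammar.
for sentence in generate(grammar, n=5):
    print(" ".join(sentence))   # e.g. "the dog chased the dog", ...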
Applications:
Parsing natural language sentences
Syntax checking in compilers
Defining the syntax of programming languages
2. Grammar Rules for English
Overview:
English grammar rules describe how words combine to form phrases and sentences. Unlike
programming languages, natural language grammar is complex and often ambiguous.
Grammar rules include syntax (sentence structure), morphology (word forms), and semantics
(meaning), but here we focus mainly on syntactic rules.
Phrase Structure Rules:
These rules describe valid sentence structure. The typical English sentence can be broken
down as:
Sentence (S) → Noun Phrase (NP) + Verb Phrase (VP)
NP → (Determiner) + (Adjective)* + Noun + (Prepositional Phrase)*
VP → Verb + (NP) + (Prepositional Phrase)* + (Adverb)*
Prepositional Phrase (PP) → Preposition + NP
Example Sentence:
"The quick brown fox jumps over the lazy dog."
S → NP + VP
NP → Det + Adj + Adj + Noun → "The" + "quick" + "brown" + "fox"
VP → Verb + PP → "jumps" + "over the lazy dog"
PP → Preposition + NP → "over" + ("the" + "lazy" + "dog")
Transformational Rules:
English also has transformational grammar, where deep structures can be transformed into
surface structures (questions, passive voice, etc.). For example:
Declarative: "She is eating an apple."
Interrogative: "Is she eating an apple?"
Here, a transformational rule moves the auxiliary verb "is" before the subject.
Complexities:
English grammar also involves agreement rules (subject-verb agreement), tense, aspect,
modality, and syntactic phenomena like coordination ("and", "or"), subordination (relative
clauses), etc.
Example Rule Set for a Simple Fragment:
S → NP VP
NP → Pronoun | Det N | Det Adj N
VP → V NP | V
Det → "the" | "a"
Adj → "quick" | "lazy"
N → "fox" | "dog"
V → "jumps" | "runs" | "sees"
3. Treebanks
Overview:
Treebanks are large corpora of sentences annotated with their syntactic (and sometimes
semantic) structure. They provide parsed trees that show how each sentence can be broken
down into constituents according to a grammar. Treebanks are essential resources for training
and evaluating NLP models, especially syntactic parsers.
Types of Treebanks:
Constituency Treebanks: Annotate phrase structures using CFG-like trees. Each node
is a phrase or syntactic category. Example: Penn Treebank (widely used for English).
Dependency Treebanks: Annotate relationships between words (head-dependent),
showing which word governs another.
Example from Penn Treebank:
Sentence: "The dog chased the cat."
Constituency tree:
(S
  (NP (DT The) (NN dog))
  (VP (VBD chased)
    (NP (DT the) (NN cat))))
This shows the hierarchical phrase structure where the sentence S consists of an NP and a
VP.
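Bracketed annotations in this style can be read directly with NLTK's Tree class (a small sketch, assuming nltk is installed):

import nltk

# The Penn-style bracketed annotation from the example above.
bracketed = "(S (NP (DT The) (NN dog)) (VP (VBD chased) (NP (DT the) (NN cat))))"

tree = nltk.Tree.fromstring(bracketed)
tree.pretty_print()      # draws the constituency tree as ASCII art
print(tree.leaves())     # ['The', 'dog', 'chased', 'the', 'cat']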
Utility:
Training parsers
Linguistic analysis
Benchmarking NLP tools
Challenges:
Treebank creation is labor-intensive and requires expert linguists. Different treebanks may
follow different annotation guidelines.
4. Normal Forms for Grammar
Overview:
Normal forms are standardized ways of writing grammar rules to simplify parsing and
analysis. For CFGs, the two most common normal forms are Chomsky Normal Form (CNF) and
Greibach Normal Form (GNF).
Chomsky Normal Form (CNF):
Every rule is either of the form:
o A → BC, where B and C are non-terminals, or
o A → a, where a is a terminal.
No epsilon (ε) productions or unit productions are allowed (except for the start symbol).
Why use CNF?
Facilitates efficient parsing algorithms like the CYK parser.
Simplifies theoretical proofs.
Example:
Original rules:
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
These rules are in CNF format because the right side has either two non-terminals or one
terminal.
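A small helper (a sketch, not a standard library function) that checks the CNF condition for these rules:

# Rules from the example above: RHS is a list of symbols.
rules = {
    "S": [["A", "B"], ["B", "C"]],
    "A": [["B", "A"], ["a"]],
    "B": [["C", "C"], ["b"]],
    "C": [["A", "B"], ["a"]],
}

def is_cnf(rules):
    """True if every rule is A -> B C (two non-terminals) or A -> a (one terminal)."""
    nonterminals = set(rules)
    for lhs, rhss in rules.items():
        for rhs in rhss:
            two_nonterminals = len(rhs) == 2 and all(s in nonterminals for s in rhs)
            one_terminal = len(rhs) == 1 and rhs[0] not in nonterminals
            if not (two_nonterminals or one_terminal):
                return False
    return True

print(is_cnf(rules))   # True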
Greibach Normal Form (GNF):
Every rule is of the form A → aα, where a is a terminal and α is a (possibly empty) string of non-terminals.
Useful for top-down parsing.
5. Dependency Grammar
Overview:
Dependency Grammar is an alternative to constituency grammar. Instead of breaking
sentences into nested phrases, dependency grammar focuses on binary relations between
words: one word (the head) governs another (the dependent).
Key Concepts:
Head: The main word that governs others (e.g., the verb in a sentence).
Dependent: Words that depend on the head (subjects, objects, modifiers).
Dependency Relation: Labeled edges that describe grammatical roles (subject,
object, modifier).
Example:
For the sentence "The dog chased the cat":
"chased" is the head (root)
"dog" is the subject dependent of "chased"
"cat" is the object dependent of "chased"
"The" modifies both "dog" and "cat"
Graphically:
      chased
     /      \
  dog        cat
   |          |
  The        the
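A sketch of the same dependencies with spaCy (assuming spaCy and its en_core_web_sm model are installed); the exact relation labels depend on the model:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The dog chased the cat")

# Print each word with its dependency relation and its head.
for token in doc:
    print(f"{token.text:<8} --{token.dep_}--> {token.head.text}")

# Typical output: "dog --nsubj--> chased", "cat --dobj--> chased",
# "The --det--> dog", "the --det--> cat", and "chased --ROOT--> chased".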
Advantages:
More direct representation of syntactic relations
Useful for languages with free word order
Easier integration with semantic roles and dependency parsing
6. Syntactic Parsing
Definition and Purpose:
Syntactic parsing, also called syntactic analysis, is the process by which a sentence is
analyzed to determine its grammatical structure with respect to a given formal grammar,
often a Context-Free Grammar (CFG). The goal is to identify how the words in a sentence are
related syntactically, revealing the hierarchical structure of phrases or the dependency
relations between words.
Parsing is a critical step in many NLP applications because it provides a structural
understanding of the sentence that helps in semantic interpretation, machine translation,
question answering, and more.
Types of Parsing:
Constituency Parsing (Phrase-Structure Parsing):
In constituency parsing, the parser builds a tree that groups words into nested constituents or
phrases based on grammatical categories (e.g., noun phrase, verb phrase). The output is a
tree where internal nodes represent syntactic categories, and leaf nodes are the actual words.
Dependency Parsing:
Dependency parsing focuses on the binary relations between words, defining a directed graph
that represents “head-dependent” relationships. Each word (except the root) depends on
exactly one head word, and the structure expresses grammatical functions such as subject,
object, modifier, etc.
Parsing Techniques:
Top-Down Parsing:
Starts with the start symbol (e.g., S for sentence) and tries to rewrite it to produce the input
sentence. This approach uses the grammar rules to expand non-terminals, predicting what
the sentence structure should look like before matching the words.
Bottom-Up Parsing:
Starts from the input tokens (words) and attempts to combine them into higher-level
constituents until reaching the start symbol. It’s data-driven, building structure based on
actual input rather than prediction.
Chart Parsing (Dynamic Programming):
Algorithms like the Earley parser and CYK parser efficiently handle ambiguity and recursion
using dynamic programming techniques. They store partial results in a "chart" to avoid
redundant computation.
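A tiny top-down (recursive-descent) recognizer sketch in plain Python, using the toy grammar of the example below; it expands non-terminals first and only then checks the words, which is the defining trait of top-down parsing:

# Grammar: S -> NP VP, NP -> Det N, VP -> V NP, plus lexical rules.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"], ["a"]],
    "N": [["dog"], ["cat"]],
    "V": [["chased"], ["saw"]],
}

def expand(symbols, words):
    """Try to rewrite the symbol list so that it derives exactly `words`."""
    if not symbols:
        return not words                      # success only if all words consumed
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:                      # non-terminal: try each production
        return any(expand(rhs + rest, words) for rhs in GRAMMAR[first])
    # terminal: must match the next input word
    return bool(words) and words[0] == first and expand(rest, words[1:])

print(expand(["S"], "the dog chased the cat".split()))   # True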
Example:
Consider the sentence:
“The dog chased the cat.”
Step 1: Lexical Analysis
Words and their parts of speech (POS):
The (Det)
dog (Noun)
chased (Verb)
the (Det)
cat (Noun)
Step 2: Parsing (Constituency)
Using simplified grammar rules:
S → NP VP
NP → Det N
VP → V NP
Parsing tree:
S
├── NP
│   ├── Det → The
│   └── N → dog
└── VP
    ├── V → chased
    └── NP
        ├── Det → the
        └── N → cat
This tree shows the hierarchical grouping: "The dog" is a noun phrase, "chased the cat" is a
verb phrase composed of a verb and a noun phrase.
Dependency Parsing Example:
Dependency tree for the same sentence:
      chased
     /      \
  dog        cat
   |          |
  The        the
"chased" is the root verb.
"dog" is the subject (dependent) of "chased".
"cat" is the object (dependent) of "chased".
Each determiner ("The") modifies its noun.
Parsing Applications:
Machine Translation: Helps map structures between source and target languages.
Information Extraction: Understanding syntactic roles enables better extraction of
entities and relations.
Question Answering: Parses questions to understand focus and expected answer
type.
Grammar Checking: Identifies syntactic errors.
7. Ambiguity
Overview:
Ambiguity is a fundamental challenge in natural language because a single sentence can
have multiple valid interpretations. It is one of the reasons why natural language processing is
difficult and why computers struggle to "understand" human languages.
Ambiguity can occur at different levels:
Lexical Ambiguity: A single word has multiple meanings.
Syntactic Ambiguity: A sentence structure allows more than one possible parse.
Semantic Ambiguity: Multiple meanings arise from sentence interpretation.
Syntactic Ambiguity in Detail:
Syntactic ambiguity happens when a sentence can be parsed in multiple ways because the
grammar allows different structures.
Example:
“I saw the man with the telescope.”
Two interpretations:
1. I used the telescope to see the man.
Here, the prepositional phrase "with the telescope" modifies the verb "saw."
(I saw [the man] [with the telescope])
2. The man has the telescope.
Here, the prepositional phrase "with the telescope" modifies the noun "man."
(I saw [the man with the telescope])
Parse Tree Example:
Interpretation 1 (the PP attaches to the verb phrase):
S
├── NP → I
└── VP
    ├── V → saw
    ├── NP
    │   ├── Det → the
    │   └── N → man
    └── PP
        ├── P → with
        └── NP
            ├── Det → the
            └── N → telescope
Interpretation 2 (the PP attaches to the noun "man"):
S
├── NP → I
└── VP
    ├── V → saw
    └── NP
        ├── Det → the
        ├── N → man
        └── PP
            ├── P → with
            └── NP
                ├── Det → the
                └── N → telescope
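Both attachments can be produced automatically from a small grammar that licenses PP attachment to either the VP or the NP (a sketch with NLTK, assuming it is installed):

import nltk

grammar = nltk.CFG.fromstring("""
  S   -> NP VP
  NP  -> Pronoun | Det N | NP PP
  VP  -> V NP | VP PP
  PP  -> P NP
  Pronoun -> 'I'
  Det -> 'the'
  N   -> 'man' | 'telescope'
  V   -> 'saw'
  P   -> 'with'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I saw the man with the telescope".split()):
    print(tree)   # prints two trees, one per attachment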
Resolving Ambiguity:
Ambiguity can be tackled using:
Probabilistic Parsing: Assign probabilities to parse trees based on large annotated
corpora (treebanks). The parser chooses the most likely parse.
Contextual Information: Semantic and pragmatic context can disambiguate
interpretations.
Semantic Role Labeling: Identifies roles (agent, instrument, etc.) to clarify
ambiguous attachments.
Lexical Ambiguity Example:
Word: “bank”
A financial institution
The side of a river
Sentence: “He sat by the bank.”
Without context, the meaning is ambiguous.
Summary of Ambiguity Challenges:
Ambiguity complicates parsing and downstream tasks.
Multiple parses may need to be maintained or ranked.
Requires sophisticated algorithms combining syntax, semantics, and context.
8. Dynamic Programming Parsing
Overview:
Dynamic programming (DP) parsing is used to optimize parsing algorithms by avoiding
redundant sub-computations. It is especially effective when dealing with ambiguous
grammars and long sentences. The central idea is to store and reuse intermediate parsing
results.
Example Algorithm: CYK (Cocke–Younger–Kasami)
Requires the grammar to be in Chomsky Normal Form (CNF).
Parses a sentence in O(n³) time, where n is the sentence length.
Steps:
For input: “the dog barks”
Grammar in CNF (except the unit production VP → V, which is kept here for simplicity):
S → NP VP
VP → V
NP → Det N
Det → the
N → dog
V → barks
CYK Matrix (cells indexed by word span):

Span   Constituent(s)   How the cell is filled
0-1    Det              "the" (lexical rule)
1-2    N                "dog" (lexical rule)
2-3    V, VP            "barks" (lexical rule); VP → V
0-2    NP               NP → Det N
0-3    S                S → NP VP
DP stores combinations like:
NP from Det+N
VP from V
S from NP+VP
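A minimal CYK recognizer sketch in Python for this toy grammar; since VP → V is a unit production, the sketch adds a small unary step after filling each cell (a strictly CNF grammar would not need it):

from itertools import product

# Grammar from the example above.
lexical = {"the": {"Det"}, "dog": {"N"}, "barks": {"V"}}
unary   = {"V": {"VP"}}                                  # VP -> V (unit production)
binary  = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}}    # S -> NP VP, NP -> Det N

def unary_closure(symbols):
    """Repeatedly apply unary rules until no new symbols are added."""
    changed = True
    while changed:
        changed = False
        for sym in list(symbols):
            for parent in unary.get(sym, set()):
                if parent not in symbols:
                    symbols.add(parent)
                    changed = True
    return symbols

def cyk_recognize(words, start="S"):
    n = len(words)
    # table[i][j] holds the non-terminals that span words[i..j]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][i] = unary_closure(set(lexical.get(w, set())))
    for span in range(2, n + 1):                  # span length
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):                 # split point
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= binary.get((left, right), set())
            table[i][j] = unary_closure(table[i][j])
    return start in table[0][n - 1]

print(cyk_recognize("the dog barks".split()))     # True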
Benefits:
Efficient for parsing with ambiguity.
Foundation for Probabilistic CYK.
9. Shallow Parsing (Chunking)
❖ What is Shallow Parsing?
Shallow parsing, also called chunking, is the process of identifying and labeling constituent
parts of a sentence such as noun phrases (NP), verb phrases (VP), or prepositional phrases
(PP) without building full parse trees.
Unlike deep (full) parsing, which analyzes hierarchical syntactic structure (e.g., nested
phrases), shallow parsing gives only surface-level groupings.
❖ Why is Shallow Parsing Useful?
Faster and less complex than full parsing.
Useful in:
o Named Entity Recognition (NER)
o Information extraction
o Question answering
o POS tagging and chunking
❖ Example:
Consider the sentence:
"The quick brown fox jumps over the lazy dog."
POS Tags:
The/DT quick/JJ brown/JJ fox/NN jumps/VBZ over/IN the/DT lazy/JJ dog/NN
Shallow Parse Output:
[NP The quick brown fox] [VP jumps] [PP over] [NP the lazy dog]
We can chunk:
NP: Determiner + adjectives + noun
VP: Verb (and possibly object NP)
PP: Preposition + NP
This gives a partial syntactic view — no internal decomposition like nested NPs.
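A chunking sketch with NLTK's RegexpParser (assuming nltk is installed), using the POS tags above; the chunk patterns are illustrative rather than a fixed standard:

import nltk

sentence = [("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
            ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dog", "NN")]

grammar = r"""
  NP: {<DT>?<JJ>*<NN>}     # determiner + adjectives + noun
  VP: {<VB.*>}             # a bare verb
  PP: {<IN>}               # a bare preposition (attachment left flat)
"""

chunker = nltk.RegexpParser(grammar)
print(chunker.parse(sentence))
# (S (NP The/DT quick/JJ brown/JJ fox/NN) (VP jumps/VBZ) (PP over/IN) (NP the/DT lazy/JJ dog/NN))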
❖ Tools:
NLTK (with regular expression parsers)
spaCy and OpenNLP
CRFs, BiLSTMs, Transformers in modern systems
10. Probabilistic Context-Free Grammars (PCFGs)
❖ What is a PCFG?
A Probabilistic Context-Free Grammar (PCFG) extends a CFG by attaching a probability
to each rule. It models how likely a rule is used in real language.
This is vital when multiple parses exist — PCFGs allow choosing the most likely parse.
❖ Rule Format:
A → B C [p]
Where p is the probability of the rule, and the probabilities of all rules expanding the same non-terminal A sum to 1.
❖ Example Grammar:
S → NP VP [1.0]
VP → V NP [0.6]
VP → V [0.4]
NP → Det N [0.5]
NP → N [0.5]
Det → the [1.0]
N → dog [0.6]
N → cat [0.4]
V → sees [1.0]
❖ Sentence: "the dog sees the cat"
Possible Parse:
S
├── NP (the dog)
│   ├── Det → the
│   └── N → dog
└── VP
    ├── V → sees
    └── NP (the cat)
        ├── Det → the
        └── N → cat
Probability:
P = P(S → NP VP) × P(NP → Det N) × P(Det → the) × P(N → dog)
× P(VP → V NP) × P(V → sees) × P(NP → Det N) × P(N → cat)
= 1.0 × 0.5 × 1.0 × 0.6 × 0.6 × 1.0 × 0.5 × 0.4 = 0.036
This helps compare multiple parses based on likelihood.
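The same grammar and sentence can be run through NLTK (a sketch, assuming nltk is installed); ViterbiParser returns the most probable parse together with its probability:

import nltk

pcfg = nltk.PCFG.fromstring("""
  S   -> NP VP   [1.0]
  VP  -> V NP    [0.6]
  VP  -> V       [0.4]
  NP  -> Det N   [0.5]
  NP  -> N       [0.5]
  Det -> 'the'   [1.0]
  N   -> 'dog'   [0.6]
  N   -> 'cat'   [0.4]
  V   -> 'sees'  [1.0]
""")

parser = nltk.ViterbiParser(pcfg)
for tree in parser.parse("the dog sees the cat".split()):
    print(tree)   # the most probable tree, with p = 0.036 for this sentence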
11. Probabilistic CYK Parsing (PCYK)
🧠 What is PCYK?
The Probabilistic Cocke-Younger-Kasami (PCYK) algorithm extends the classic CYK
parsing algorithm by incorporating probabilities from Probabilistic Context-Free
Grammars (PCFGs). It finds not just any parse, but the most probable one.
🧮 How it Works:
PCYK uses a dynamic programming table (chart) where each cell contains:
The non-terminal symbol(s) that can generate the span of words.
The probability of that derivation.
The backpointers to reconstruct the most probable parse tree.
📘 Grammar (in CNF):
Let’s parse:
Sentence: the dog chased the cat
PCFG in CNF form (the unit productions NP → N and VP → V are kept for readability; strict CNF would eliminate them):
S → NP VP [1.0]
NP → Det N [0.6]
NP → N [0.4]
VP → V NP [0.7]
VP → V [0.3]
Det → the [1.0]
N → dog [0.5]
N → cat [0.5]
V → chased [1.0]
🧱 Step-by-Step Table Construction:
Let’s consider n = 5 words in the sentence. We'll create a triangular chart with base (length 1
spans) filled using lexical rules.
Then for larger spans, a constituent X built by a rule X → Y Z gets probability:
P(X) = P(X → Y Z) × P(Y) × P(Z)
and each cell keeps the highest-probability derivation over all rules and split points.
Example:
To form NP from Det (the) and N (dog):
P(NP → Det N) = 0.6 × 1.0 (Det) × 0.5 (N) = 0.3
For VP → V NP, compute:
P(VP) = 0.7 × 1.0 (V) × 0.3 (NP) = 0.21
This continues recursively until the full span (0–4) is filled, where:
If S exists in that cell → successful parse
Use the highest probability derivation
Here S spans 0–4 with probability 1.0 × 0.3 (NP "the dog") × 0.21 (VP "chased the cat") = 0.063.
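A compact probabilistic CYK sketch in Python for this toy grammar; the unit productions are handled with a small unary step after each cell, and every cell keeps only the best probability per non-terminal:

lexical = {  # word -> {nonterminal: probability}
    "the": {"Det": 1.0}, "dog": {"N": 0.5}, "cat": {"N": 0.5}, "chased": {"V": 1.0},
}
unary  = {"N": {"NP": 0.4}, "V": {"VP": 0.3}}            # child -> {parent: rule prob}
binary = {                                                # (left, right) -> {parent: rule prob}
    ("NP", "VP"): {"S": 1.0}, ("Det", "N"): {"NP": 0.6}, ("V", "NP"): {"VP": 0.7},
}

def apply_unary(cell):
    """Add parents reachable by unit productions, keeping the best probability."""
    for child, p_child in list(cell.items()):
        for parent, p_rule in unary.get(child, {}).items():
            p = p_rule * p_child
            if p > cell.get(parent, 0.0):
                cell[parent] = p
    return cell

def pcyk(words):
    n = len(words)
    table = [[{} for _ in range(n)] for _ in range(n)]    # cell: {nonterminal: best prob}
    for i, w in enumerate(words):
        table[i][i] = apply_unary(dict(lexical.get(w, {})))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            cell = table[i][j]
            for k in range(i, j):                         # split point
                for l, pl in table[i][k].items():
                    for r, pr in table[k + 1][j].items():
                        for parent, p_rule in binary.get((l, r), {}).items():
                            p = p_rule * pl * pr
                            if p > cell.get(parent, 0.0):
                                cell[parent] = p
            apply_unary(cell)
    return table[0][n - 1].get("S", 0.0)

print(pcyk("the dog chased the cat".split()))             # ≈ 0.063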
✅ Why it Matters:
Handles ambiguity quantitatively.
Foundation for statistical parsers (e.g., Stanford, Berkeley).
Trained from annotated corpora like the Penn Treebank.
12. Probabilistic Lexicalized Context-Free Grammars (PLCFGs)
🔍 What is a Lexicalized CFG?
In Lexicalized CFGs, each non-terminal carries a lexical head (usually a key word like a
verb or noun) to capture more precise syntactic and semantic information.
📌 Example:
Without lexicalization:
VP → V NP
With lexicalization:
VP(chased) → V(chased) NP(cat) [P]
Here:
chased is the head of VP.
cat is the head of NP.
🎯 Why It’s Powerful:
Lexical heads encode dependencies, improving disambiguation.
Example:
"He saw the man with a telescope."
o Is “with a telescope” modifying “saw” or “man”?
o With heads, you can model:
saw(with) = high likelihood
man(with) = low likelihood
o Statistical data decides which structure is more probable.
💡 Collins’ Parser (1997) and others use lexicalized PCFGs to:
Model head word propagation
Estimate distance-based probabilities
Handle prepositional attachment and subcategorization
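To make the head-word idea concrete, a toy sketch in Python with made-up head–preposition scores (the numbers are purely illustrative, not from any treebank):

# Hypothetical head-conditioned attachment scores.
attach_prob = {
    ("saw", "with"): 0.9,   # instrument reading: PP attaches to the verb
    ("man", "with"): 0.1,   # possession reading: PP attaches to the noun
}

def prefer_attachment(verb, noun, prep):
    """Pick the attachment whose (head, preposition) pair scores higher."""
    verb_score = attach_prob.get((verb, prep), 0.0)
    noun_score = attach_prob.get((noun, prep), 0.0)
    return "verb attachment" if verb_score >= noun_score else "noun attachment"

print(prefer_attachment("saw", "man", "with"))   # verb attachment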
13. Feature Structures
🧬 What is a Feature Structure?
A Feature Structure is a formal representation of linguistic information using attribute-
value pairs. It’s the backbone of constraint-based grammars like:
HPSG (Head-driven Phrase Structure Grammar)
LFG (Lexical Functional Grammar)
📦 Example:
A noun phrase like “the cats” can have:
{
  "category": "NP",
  "number": "plural",
  "person": "third",
  "case": "nominative"
}
A verb like “run” in past tense:
{
  "category": "V",
  "tense": "past",
  "agreement": {
    "number": "plural",
    "person": "third"
  }
}
🧩 Benefits:
Encodes multiple dimensions: agreement, tense, gender, etc.
Can represent complex linguistic phenomena like:
o Verb agreement
o Control structures
o Binding and anaphora
14. Unification of Feature Structures
⚙️What is Unification?
Unification is the process of merging two feature structures under the constraint that no
conflicting values exist.
It is a fundamental operation in unification-based grammars to ensure grammatical
constraints are satisfied.
📘 Example 1: Successful Unification
Structure A:
{ "number": "singular", "gender": "feminine" }
Structure B:
{ "number": "singular" }
✅ Unified:
{ "number": "singular", "gender": "feminine" }
📕 Example 2: Failed Unification
A:
{ "number": "singular" }
B:
{ "number": "plural" }
❌ Conflict → unification fails
Application in Parsing:
Grammar rules require that subjects and verbs agree in number and person.
During parsing, unification checks whether combining two constituents (like a subject
and a verb) is allowed.
Rule:
S → NP VP
Constraint: NP.number = VP.agreement.number
If NP has {number: plural} and VP has {agreement: {number: plural}}, unification succeeds.
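A minimal unification sketch in Python over feature structures represented as nested dicts; it returns None on conflict. (NLTK's FeatStruct class offers a comparable unify operation.)

def unify(fs1, fs2):
    """Merge two feature structures; return None if their values conflict."""
    if fs1 == fs2:
        return fs1
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        result = dict(fs1)
        for key, value in fs2.items():
            if key in result:
                merged = unify(result[key], value)
                if merged is None:
                    return None          # conflicting values -> unification fails
                result[key] = merged
            else:
                result[key] = value
        return result
    return None                          # differing atomic values conflict

# Example 1: success
print(unify({"number": "singular", "gender": "feminine"}, {"number": "singular"}))
# -> {'number': 'singular', 'gender': 'feminine'}

# Example 2: failure
print(unify({"number": "singular"}, {"number": "plural"}))
# -> None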
🧠 Where It’s Used:
HPSG uses large, detailed feature structures and unification at every rule application.
Constraint-based parsing and generation systems.
Useful in multilingual NLP where agreement rules vary.