1. What is a Grammar in Formal Language Theory?
A grammar is a set of rules that defines a language. It specifies
how valid sequences of symbols (or sentences) in a language can
be constructed.
Grammars are essential in both natural language processing and
programming languages, as they determine the structure of valid
sentences or expressions.
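Types of Grammars (the Chomsky hierarchy, from most restrictive to
most general):
o Regular Grammar: Generates regular languages, which are
exactly the languages recognized by finite automata.
o Context-Free Grammar (CFG): Generates context-free
languages, which are recognized by pushdown automata;
derivations can be drawn as parse trees.
o Context-Sensitive Grammar: More powerful than a CFG; a
production may rewrite a non-terminal only in a given
context of surrounding symbols. The resulting languages are
recognized by linear-bounded automata.
o Unrestricted Grammar: The most general form; it generates
exactly the languages recognized by Turing machines (the
recursively enumerable languages).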
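To make the difference between the first two levels concrete, the Python sketch below contrasts a finite automaton, which has only a fixed set of states, with a pushdown-style recognizer, which adds a stack. The function names and the two example languages are illustrative choices: strings over {a, b} with an even number of a's form a regular language, while balanced parentheses form a context-free language that is not regular, because recognizing it requires tracking arbitrarily deep nesting.

```python
def even_as_dfa(s: str) -> bool:
    """Finite automaton over {a, b}: accepts strings with an even number of a's."""
    # Transition table: (state, symbol) -> next state. Two states suffice.
    delta = {
        ("even", "a"): "odd", ("even", "b"): "even",
        ("odd", "a"): "even", ("odd", "b"): "odd",
    }
    state = "even"  # start state, also the only accepting state
    for ch in s:
        if (state, ch) not in delta:
            return False  # symbol outside the alphabet
        state = delta[(state, ch)]
    return state == "even"


def balanced_pda(s: str) -> bool:
    """Pushdown-style recognizer: accepts balanced strings over {(, )}."""
    stack = []
    for ch in s:
        if ch == "(":
            stack.append(ch)  # push on every opening parenthesis
        elif ch == ")":
            if not stack:
                return False  # closing parenthesis with nothing to match
            stack.pop()
        else:
            return False  # symbol outside the alphabet
    return not stack  # accept only if every "(" was matched


print(even_as_dfa("abba"), even_as_dfa("ab"))       # True False
print(balanced_pda("(()())"), balanced_pda("(()"))  # True False
```

No fixed number of states can play the role of the unbounded stack, which is the intuition behind balanced parentheses not being regular.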
2. Context-Free Grammar (CFG)
Definition: A CFG is a set of production rules that describe all
possible strings in a context-free language. It is “context-free”
because the left-hand side of every rule is a single non-terminal,
so that non-terminal can be rewritten regardless of the symbols
surrounding it.
Formal Components of a CFG (written as a 4-tuple G = (N, Σ, P, S)):
o Non-terminal symbols (N): These represent syntactic
categories (like <Expression> or <Term>) and can be
expanded.
o Terminal symbols (Σ): These are actual characters or tokens
from the language (e.g., a, b, +, *); they cannot be expanded
further.
o Start symbol (S): A special non-terminal from which every
derivation (and top-down parsing) begins.
o Production rules (P): Rules that describe how a non-terminal
can be replaced by a string of terminals and non-terminals.
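The four components can be written down directly as data. The sketch below is one possible Python encoding (the `CFG` class and its field names are illustrative, not a standard library API) of the layered arithmetic grammar listed under CFG Notation, together with a check of the defining properties: the start symbol is a non-terminal, N and Σ are disjoint, every left-hand side is a single non-terminal, and every right-hand side uses only symbols from N ∪ Σ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    nonterminals: frozenset  # N
    terminals: frozenset     # Σ
    productions: tuple       # P: pairs (lhs, rhs), rhs a tuple of symbols
    start: str               # S

    def is_well_formed(self) -> bool:
        """Check the defining properties of a context-free grammar."""
        if self.start not in self.nonterminals:
            return False
        if self.nonterminals & self.terminals:
            return False  # N and Σ must be disjoint
        symbols = self.nonterminals | self.terminals
        for lhs, rhs in self.productions:
            # Context-free: the left-hand side is exactly one non-terminal.
            if lhs not in self.nonterminals:
                return False
            if any(sym not in symbols for sym in rhs):
                return False
        return True


# The layered arithmetic grammar, one (lhs, rhs) pair per alternative.
arith = CFG(
    nonterminals=frozenset({"Expression", "Term", "Factor"}),
    terminals=frozenset({"+", "*", "(", ")", "a", "b", "c"}),
    productions=(
        ("Expression", ("Expression", "+", "Term")),
        ("Expression", ("Term",)),
        ("Term", ("Term", "*", "Factor")),
        ("Term", ("Factor",)),
        ("Factor", ("(", "Expression", ")")),
        ("Factor", ("a",)),
        ("Factor", ("b",)),
        ("Factor", ("c",)),
    ),
    start="Expression",
)

print(arith.is_well_formed())  # True
```

Each alternative of a rule such as a | b | c becomes its own (lhs, rhs) pair, which is how P is usually counted in the formal definition.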
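CFG Notation: CFGs are often written in Backus-Naur Form
(BNF) or a similar notation (BNF proper writes ::= where the
rules below write →). The following grammar defines arithmetic
expressions in layers, so that * binds more tightly than +: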
o <Expression> → <Expression> + <Term>
o <Expression> → <Term>
o <Term> → <Term> * <Factor>
o <Term> → <Factor>
o <Factor> → ( <Expression> )
o <Factor> → a | b | c
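To see how the layering enforces precedence, here is a leftmost derivation of a + b * c, in which each step rewrites the leftmost non-terminal (E, T, and F abbreviate <Expression>, <Term>, and <Factor>). The * ends up inside a <Term>, one level below the +, so b * c is grouped first:

```latex
\begin{aligned}
E &\Rightarrow E + T \\
  &\Rightarrow T + T \\
  &\Rightarrow F + T \\
  &\Rightarrow a + T \\
  &\Rightarrow a + T * F \\
  &\Rightarrow a + F * F \\
  &\Rightarrow a + b * F \\
  &\Rightarrow a + b * c
\end{aligned}
```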
Example CFG for Simple Arithmetic Expressions:
S→S+S
S→S*S
S→(S)
S→a|b|c
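This grammar allows arithmetic expressions involving +, *,
parentheses, and the variables a, b, and c. It is more compact than
the layered grammar above, but it is ambiguous: a string such as
a + b * c has two different parse trees, because the rules do not
say which operator is grouped first.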
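The ambiguity of the compact grammar can be measured directly. The Python sketch below (`count_parse_trees` is a hypothetical helper, and inputs are assumed to be unspaced strings of one-character tokens) counts the distinct parse trees of S for a string by trying every way each rule could cover each substring, memoizing the results in the style of chart parsing. A count of 0 means the string is not in the language; a count above 1 means it is ambiguous.

```python
from functools import lru_cache


def count_parse_trees(s: str) -> int:
    """Count parse trees of s under S -> S+S | S*S | (S) | a | b | c.

    Input is a string of single-character tokens with no spaces, e.g. "a+b*c".
    """
    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # Number of ways S can derive the substring s[i:j] (always non-empty).
        if j - i == 1:
            return 1 if s[i] in "abc" else 0  # S -> a | b | c
        total = 0
        # S -> ( S ): the span must be wrapped in a pair of parentheses.
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")":
            total += count(i + 1, j - 1)
        # S -> S + S and S -> S * S: try every operator as the top-level split.
        for k in range(i + 1, j - 1):
            if s[k] in "+*":
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(s)) if s else 0


print(count_parse_trees("a+b*c"))    # 2 (ambiguous)
print(count_parse_trees("(a+b)*c"))  # 1 (parentheses fix the grouping)
```

The layered grammar avoids this problem by building precedence into its non-terminals, which is why grammars for real languages are usually written in that style.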
3. Parsing with Context-Free Grammar (CFG)
Parsing: Parsing is the process of analyzing a sequence of symbols
to determine its grammatical structure.
Goal: Check whether a string belongs to the language defined by
the CFG and, if it does, produce a parse tree for that string.
Parsing Techniques:
o Top-Down Parsing: Starts with the start symbol and applies
production rules to match the input string. Common methods
include Recursive Descent Parsing and LL Parsing.
o Bottom-Up Parsing: Begins with the input string and
applies production rules in reverse to reach the start symbol.
LR Parsing is a common bottom-up technique.
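As an illustration of top-down parsing, here is a minimal recursive-descent parser sketch in Python for the layered grammar of Section 2 (the `Parser` class and its method names are illustrative). One assumption to flag: recursive descent cannot use the left-recursive rules <Expression> → <Expression> + <Term> and <Term> → <Term> * <Factor> as written, because the function for <Expression> would call itself forever without consuming input. The sketch therefore uses the equivalent form <Expression> → <Term> { + <Term> }, implemented as a loop that still groups left to right. It returns a simplified parse tree as nested tuples, with operators as interior nodes and parentheses dropped.

```python
class Parser:
    """Recursive-descent parser for the layered arithmetic grammar.

    Expression -> Term { "+" Term }
    Term       -> Factor { "*" Factor }
    Factor     -> "(" Expression ")" | "a" | "b" | "c"
    """

    def __init__(self, text: str):
        self.tokens = [ch for ch in text if not ch.isspace()]  # one char per token
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok: str) -> None:
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r} at token {self.pos}, got {self.peek()!r}")
        self.pos += 1

    def parse(self):
        tree = self.expression()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r} at token {self.pos}")
        return tree

    def expression(self):
        tree = self.term()
        while self.peek() == "+":  # loop keeps + left-associative
            self.pos += 1
            tree = ("+", tree, self.term())
        return tree

    def term(self):
        tree = self.factor()
        while self.peek() == "*":
            self.pos += 1
            tree = ("*", tree, self.factor())
        return tree

    def factor(self):
        tok = self.peek()
        if tok == "(":
            self.pos += 1
            tree = self.expression()
            self.expect(")")
            return tree  # parentheses only affect grouping
        if tok in ("a", "b", "c"):
            self.pos += 1
            return tok
        raise SyntaxError(f"unexpected {tok!r} at token {self.pos}")


print(Parser("(a + b) * c").parse())  # ('*', ('+', 'a', 'b'), 'c')
print(Parser("a + b * c").parse())    # ('+', 'a', ('*', 'b', 'c'))
```

Each non-terminal becomes one function, and the call stack plays the role of the parse tree being built from the root down.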
4. Detailed Example of Parsing with CFG
CFG for Arithmetic Expressions
S→S+S|S*S|(S)|a|b|c
Start Symbol: S
Non-terminals: {S}
Terminals: {a, b, c, +, *, (, )}
Production Rules:
o S → S + S (for addition)
o S → S * S (for multiplication)
o S → ( S ) (for grouping)
o S → a | b | c (for variables)
Example 1: Parsing (a + b) * c
Let's break down the parsing process step-by-step, building the parse
tree from the root down:
1. Initialize with Start Symbol S: We begin with S, which is the
starting point.
2. Apply Production Rule: S → S * S for multiplication. The * is
the only operator outside the parentheses, so it must be at the root.
3. Expand First S: Use the production rule S → ( S ) to match the
parentheses.
4. Expand Inside Parentheses: Apply S → S + S for the addition.
5. Match Terminals: Expand the two S's inside the addition to a and
b, and the second S of S * S to c.
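The steps of this example correspond to the following derivation, in which each step rewrites the leftmost remaining S (the annotations name the rule applied):

```latex
\begin{aligned}
S &\Rightarrow S * S       && \text{step 2: } S \to S * S \\
  &\Rightarrow (S) * S     && \text{step 3: } S \to (S) \\
  &\Rightarrow (S + S) * S && \text{step 4: } S \to S + S \\
  &\Rightarrow (a + S) * S && \text{step 5: } S \to a \\
  &\Rightarrow (a + b) * S && \text{step 5: } S \to b \\
  &\Rightarrow (a + b) * c && \text{step 5: } S \to c
\end{aligned}
```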
Applications of CFG and Parsing
Programming Languages: CFGs are fundamental in defining the
syntax of programming languages (e.g., Java, Python).
Natural Language Processing (NLP): CFGs help model the
syntax of human languages.
Compilers and Interpreters: CFGs drive syntactic analysis
(parsing), allowing compilers to check whether code follows the
language’s syntax rules; lexical analysis (tokenizing) usually relies
on the simpler regular grammars or regular expressions.
Feature-Based Grammar Overview
Feature-Based Grammar (FBG) enhances context-free grammars
by attaching features (e.g., number, tense, gender) to grammar
symbols, so that rules can constrain them and support more
precise syntactic and semantic analysis.
This grammar type is especially useful where features like
agreement (in number, gender, etc.) must be enforced between
different elements. A plain CFG can express agreement only by
multiplying non-terminals (e.g., separate singular and plural noun
phrase categories), one for every combination of feature values.
Example Use Case:
o In English, a verb must agree in number with its subject,
while in languages like French, adjectives must also agree in
gender and number with nouns.
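A minimal Python sketch of how a feature-based rule differs from a plain CFG rule, assuming a tiny hand-written French lexicon (the `LEXICON` table, the `np_rule` function, and the feature names are illustrative). A plain rule NP → N Adj would accept any noun followed by any adjective; the feature-based version additionally requires the noun and the adjective to share the same Gender and Number values.

```python
# Hypothetical mini-lexicon: word -> (category, features).
LEXICON = {
    "chat":    ("N",   {"Gender": "masculine", "Number": "singular"}),
    "chats":   ("N",   {"Gender": "masculine", "Number": "plural"}),
    "maison":  ("N",   {"Gender": "feminine",  "Number": "singular"}),
    "maisons": ("N",   {"Gender": "feminine",  "Number": "plural"}),
    "noir":    ("Adj", {"Gender": "masculine", "Number": "singular"}),
    "noirs":   ("Adj", {"Gender": "masculine", "Number": "plural"}),
    "noire":   ("Adj", {"Gender": "feminine",  "Number": "singular"}),
    "noires":  ("Adj", {"Gender": "feminine",  "Number": "plural"}),
}

AGREEMENT = ("Gender", "Number")  # features the rule requires to match


def np_rule(noun: str, adj: str) -> bool:
    """NP -> N Adj, with the constraint that N and Adj agree on AGREEMENT."""
    if noun not in LEXICON or adj not in LEXICON:
        return False
    n_cat, n_feats = LEXICON[noun]
    a_cat, a_feats = LEXICON[adj]
    if (n_cat, a_cat) != ("N", "Adj"):
        return False  # the plain CFG part of the rule
    return all(n_feats[f] == a_feats[f] for f in AGREEMENT)  # the feature part


print(np_rule("maison", "noire"))  # True
print(np_rule("maison", "noir"))   # False: gender mismatch
print(np_rule("chats", "noir"))    # False: number mismatch
```

One rule plus a feature constraint replaces the four separate gender-and-number variants of NP that a plain CFG would need.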
Grammatical Features
Grammatical Features are additional constraints that specify
properties of syntactic elements. Each feature is defined as an
attribute-value pair.
Key Grammatical Features:
o Number: Singular or plural (e.g., "cat" is singular, "cats" is
plural).
o Person: Indicates if the subject is first, second, or third
person (e.g., "I" is first person, "you" is second person).
o Tense: Specifies the timing of the action (past, present,
future).
o Gender: Masculine, feminine, or neuter, in languages with
grammatical gender (e.g., in Spanish, "niño" (boy) is
masculine, "niña" (girl) is feminine).
o Case: Indicates the syntactic role of a noun (e.g., nominative
for the subject, accusative for the object).
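Each feature above is an attribute with a small set of allowed values, so a lexical entry can be stored as a mapping from attribute to value. The Python sketch below uses the Case feature; the value inventory, the `PRONOUNS` table, and the `is_valid` and `check_clause` functions are illustrative assumptions covering only a few English pronouns. In a subject-verb-object clause, the subject slot requires nominative case and the object slot requires accusative case.

```python
# Allowed values for each feature (a small, illustrative inventory).
FEATURE_VALUES = {
    "Number": {"singular", "plural"},
    "Person": {"first", "second", "third"},
    "Gender": {"masculine", "feminine", "neuter"},
    "Case":   {"nominative", "accusative"},
}

# Hypothetical lexicon of English pronouns as attribute-value mappings.
PRONOUNS = {
    "she":  {"Person": "third", "Number": "singular", "Gender": "feminine", "Case": "nominative"},
    "her":  {"Person": "third", "Number": "singular", "Gender": "feminine", "Case": "accusative"},
    "he":   {"Person": "third", "Number": "singular", "Gender": "masculine", "Case": "nominative"},
    "him":  {"Person": "third", "Number": "singular", "Gender": "masculine", "Case": "accusative"},
    "they": {"Person": "third", "Number": "plural", "Case": "nominative"},
    "them": {"Person": "third", "Number": "plural", "Case": "accusative"},
}


def is_valid(features: dict) -> bool:
    """Every attribute must be known and every value allowed for it."""
    return all(attr in FEATURE_VALUES and val in FEATURE_VALUES[attr]
               for attr, val in features.items())


def check_clause(subject: str, obj: str) -> bool:
    """Subject slot needs nominative case; object slot needs accusative case."""
    return (PRONOUNS[subject]["Case"] == "nominative"
            and PRONOUNS[obj]["Case"] == "accusative")


print(all(is_valid(f) for f in PRONOUNS.values()))  # True
print(check_clause("she", "him"))  # True:  "she sees him"
print(check_clause("her", "he"))   # False: "her sees he"
```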
Example: The sentence "She walks" can be described with the
following feature annotations:
o “She”: [Gender: feminine, Number: singular, Person: third]
o “walks”: [Tense: present, Number: singular, Person: third]
o The agreement between "She" and "walks" is enforced by
matching the number and person features.
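Detailed Example with Different Features:
o Sentence: "The children are playing."
"children": [Number: plural]
"are": [Number: plural, Tense: present]
"playing": [Form: present participle]
Together, "are playing" expresses present tense with
continuous (progressive) aspect.
Agreement is enforced between the subject "children" and
the auxiliary "are", which must both be plural; "playing"
carries no Number feature, so it places no constraint on
number.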
Processing Feature Structures
Feature Structures are a formal representation of grammatical
features, typically in attribute-value pairs.
Attribute-Value Pairs:
o Attributes (like Tense, Number, Gender) represent
grammatical categories.
o Values (like present, singular, masculine) specify properties
for those categories.
Operations on Feature Structures:
o Unification is the primary operation in feature-based
grammar, combining compatible feature structures.
When two structures unify, they merge their attribute-value
pairs; if any attribute they share has different values,
unification fails.
Example of Feature Structures and Unification:
Feature Structure for "she": [Gender: feminine, Number: singular,
Person: third]
Feature Structure for "walks": [Tense: present, Number: singular,
Person: third]
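Unified Structure (succeeds because the shared features, Number
and Person, have the same values in both structures):
[Gender: feminine, Number: singular, Person: third, Tense: present]
If the verb were plural instead of singular (e.g., "walk" as in "they
walk", with Number: plural), unification would fail because of the
mismatched Number values.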
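Unification of flat feature structures like these can be sketched in a few lines of Python, representing each structure as a dict. This is an assumption for illustration: the sketch handles only flat structures and omits the nested values and shared (reentrant) substructures that full feature-based grammar formalisms allow. The hypothetical `unify` function returns the merged structure, or None when some shared attribute has conflicting values.

```python
from typing import Optional


def unify(fs1: dict, fs2: dict) -> Optional[dict]:
    """Unify two flat feature structures.

    Attributes found in only one structure are copied over; attributes found
    in both must have equal values, otherwise unification fails (None).
    """
    result = dict(fs1)  # copy, so the inputs are left unchanged
    for attr, value in fs2.items():
        if attr in result and result[attr] != value:
            return None  # conflicting values: unification fails
        result[attr] = value
    return result


she = {"Gender": "feminine", "Number": "singular", "Person": "third"}
walks = {"Tense": "present", "Number": "singular", "Person": "third"}
walk_plural = {"Tense": "present", "Number": "plural", "Person": "third"}

print(unify(she, walks))
# {'Gender': 'feminine', 'Number': 'singular', 'Person': 'third', 'Tense': 'present'}
print(unify(she, walk_plural))  # None: Number values conflict
```

Because missing attributes never conflict, unification also explains why "playing", which has no Number feature, combines freely with plural "are".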