Unit 3: Grammars and Languages
Syllabus
• Derivation and ambiguity,
• BNF & CNF notations,
• Union, concatenation and Kleene star (*) of CFLs,
• Eliminating ε-productions & unit productions from a CFG,
• Eliminating useless variables from a Context-Free Grammar.
• Parsing: Top-Down, Recursive Descent
and Bottom-Up Parsing
Introduction
• Grammar is a set of rules by which strings in a language
can be generated.
• Here we discuss the CFG (Context-Free Grammar), a more powerful method of describing languages.
• The set of strings generated by a context-free grammar is
called a context-free language and context-free languages
can describe many practically important systems.
• Most programming languages can be approximated by context-free grammars, and compilers for them have been developed based on properties of context-free languages.
• Let us define context-free grammars and context-free
languages here.
CFG vs. RE
• CFGs have more expressive power than regular expressions (REs).
• Regular expressions are useful for describing the structure of lexical constructs such as identifiers, keywords and constants.
• But they cannot specify the recursive (nested) structure of programming constructs.
• CFGs, however, can define such recursive structures.
• Thus, CFGs can define every regular language as well as many languages that are not regular.
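As an illustration of a recursive structure, the language { a^n b^n : n ≥ 0 } is a standard example of a non-regular language that has a very small CFG, S → aSb | ε. The Python sketch below (the function name `derives` is only illustrative) mirrors the two productions directly:

```python
def derives(s: str) -> bool:
    """Return True if s is generated by the grammar S -> a S b | ε."""
    if s == "":
        return True                       # production S -> ε
    if len(s) >= 2 and s[0] == "a" and s[-1] == "b":
        return derives(s[1:-1])           # production S -> a S b
    return False


print(derives("aabb"))   # True
print(derives("aab"))    # False
```

The recursion in the function follows the recursion in the production S → aSb, which is exactly what a regular expression cannot express.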
Definition (Context-Free Grammar) :
• A 4-tuple G = <V, Σ, S, P> is a context-free grammar (CFG)
• if V (the variables) and Σ (the terminals) are finite sets sharing no elements between them (V ∩ Σ = ∅),
• S ∈ V is the start symbol, and
• P is a finite set of productions of the form X → α, where X ∈ V and α ∈ (V ∪ Σ)*.
A language L is a context-free language (CFL) if some context-free grammar generates exactly the strings of L.
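The conditions of the definition can be checked mechanically. The following is a minimal Python sketch, assuming productions are stored as (head, body) pairs with the body a tuple of symbols; the name `is_cfg` and this representation are choices made for the illustration only.

```python
def is_cfg(V, Sigma, S, P):
    """Check the conditions of the definition for G = <V, Sigma, S, P>.

    V, Sigma: sets of symbols; S: start symbol;
    P: set of (head, body) pairs, body being a tuple of symbols.
    """
    if V & Sigma:                 # V and Sigma must share no elements
        return False
    if S not in V:                # the start symbol must be a variable
        return False
    symbols = V | Sigma
    # Every head is a variable, every body is a string over V ∪ Sigma.
    return all(head in V and all(x in symbols for x in body)
               for head, body in P)


V = {"S"}
Sigma = {"a", "b"}
P = {("S", ("a", "S", "b")), ("S", ())}    # S -> aSb | ε (empty tuple = ε)
print(is_cfg(V, Sigma, "S", P))             # True
```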
Properties of Context-Free Language
Theorem 1:
Let L1 and L2 be context-free languages. Then
L1 ∪ L2 , L1L2 , and L1* are context-free languages.
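The usual proof of Theorem 1 builds a new grammar from the given ones by adding a fresh start symbol. A minimal Python sketch of these constructions follows; it assumes each grammar is a dict from variable to a list of bodies (tuples of symbols), that the two grammars use disjoint variable names, and that the new start name "S" is not already in use. The function names are illustrative.

```python
def union(g1, s1, g2, s2, start="S"):
    """Grammar for L1 ∪ L2: add S -> S1 | S2."""
    g = {**g1, **g2}
    g[start] = [(s1,), (s2,)]
    return g


def concat(g1, s1, g2, s2, start="S"):
    """Grammar for L1L2: add S -> S1 S2."""
    g = {**g1, **g2}
    g[start] = [(s1, s2)]
    return g


def star(g1, s1, start="S"):
    """Grammar for L1*: add S -> S1 S | ε (empty tuple = ε)."""
    g = dict(g1)
    g[start] = [(s1, start), ()]
    return g


G1 = {"A": [("a", "A", "b"), ()]}      # A -> aAb | ε
G2 = {"B": [("c", "B"), ("c",)]}       # B -> cB | c
print(union(G1, "A", G2, "B")["S"])    # [('A',), ('B',)]
print(concat(G1, "A", G2, "B")["S"])   # [('A', 'B')]
print(star(G1, "A")["S"])              # [('A', 'S'), ()]
```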
Normal forms and Simplification of CFG
BNF & CNF notations,
• Productions in CFG, satisfying certain restrictions
are said to be in Normal Forms.
• There are 2 notations used for Normal Forms,
1. CNF- Chomsky Normal Form:
A context-free grammar is said to be in
Chomsky normal form if every production is
of one of these two types:
A → BC (where B and C are variables)
A → a (where a is a terminal symbol)
Simplification of CFG
• To put a CFG into CNF, we first make a number of preliminary simplifications, which are themselves useful in various ways.
• A CFG is simplified by eliminating the following:
1. "Useless symbols": those variables or terminals that do not appear in any derivation of a terminal string from the start symbol.
2. "Null-productions" (ε-productions): those of the form A → ε for some variable A.
3. "Unit productions": those of the form A → B for variables A and B.
1) Eliminating "Useless Symbols"
Two types of symbols are useless:
• Non-generating symbols
• Non-reachable symbols
Thus useful symbols are those variables or terminals that appear in some derivation of a terminal string from the start symbol.
Eliminating useless symbols requires identifying whether or not each symbol is "generating" and "reachable".
• Generating symbol:
We say X is generating if X ⇒* w for some terminal string w. Note that every terminal is generating, since w can be that terminal itself, which is derived in zero steps.
• Reachable symbol:
We say X is reachable if there is a derivation S ⇒* αXβ for some α and β.
• Thus, if we first eliminate the non-generating symbols and then the non-reachable symbols, only the useful symbols are left.
• Example :
Consider a grammar defined by following
productions:
S→aB | bX
A → Bad | bSX | a
B → aSB | bBX
X → SBd | aBX | ad
Here,
A and X can directly generate terminal strings, as we have the productions A → a and X → ad. So, A and X are generating symbols.
Also, S → bX and X generates a terminal string, so S can also generate a terminal string. Hence, S is also a generating symbol.
B cannot produce any terminal string, because every production for B contains B again, so it is non-generating.
Hence, the new grammar after removing
non-generating symbols is:
S → bX
A → bSX | a
X → ad
• Here, A is non-reachable, as there is no derivation of the form S ⇒* αAβ in the grammar. Thus, eliminating the non-reachable symbols, the resulting grammar is:
S→ bX
X→ ad
This is the grammar with only useful symbols.
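The two passes used in this example can be sketched in Python. This is a minimal sketch, assuming a grammar is a dict from variable to a list of bodies (tuples of symbols) and that any symbol that is not a dict key is a terminal; the function names are illustrative.

```python
def generating(g):
    """Variables that derive some terminal string (fixed-point iteration)."""
    gen = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in g.items():
            if head in gen:
                continue
            # A body qualifies if every symbol is a terminal or already generating.
            if any(all(x not in g or x in gen for x in body) for body in bodies):
                gen.add(head)
                changed = True
    return gen


def reachable(g, start):
    """Variables that occur in some sentential form derived from start."""
    seen, stack = {start}, [start]
    while stack:
        for body in g[stack.pop()]:
            for x in body:
                if x in g and x not in seen:
                    seen.add(x)
                    stack.append(x)
    return seen


def remove_useless(g, start):
    gen = generating(g)
    if start not in gen:
        return {}                          # the language is empty
    # Pass 1: drop non-generating variables and every body mentioning one.
    g1 = {h: [b for b in bodies if all(x not in g or x in gen for x in b)]
          for h, bodies in g.items() if h in gen}
    # Pass 2: keep only the variables reachable from the start symbol.
    reach = reachable(g1, start)
    return {h: bodies for h, bodies in g1.items() if h in reach}


g = {
    "S": [("a", "B"), ("b", "X")],
    "A": [("B", "a", "d"), ("b", "S", "X"), ("a",)],
    "B": [("a", "S", "B"), ("b", "B", "X")],
    "X": [("S", "B", "d"), ("a", "B", "X"), ("a", "d")],
}
print(remove_useless(g, "S"))   # {'S': [('b', 'X')], 'X': [('a', 'd')]}
```

The order of the passes matters: removing non-generating symbols first can make further symbols unreachable, which the second pass then removes.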
Exercise
1) Remove the useless symbols from the following grammar:
S→ xyZ | XyzZ
X → Xz | xYZ
Y → yYy | XZ
Z → Zy | z
2) Remove the useless symbols from the following grammar:
S → aC | SB
A → bSCa
B → aSB | bBC
C → aBc | ad
2) Eliminating "Null-productions":
A grammar is said to have ε-productions if there is a production of the form A → ε.
Here our strategy is to begin by discovering which variables are "nullable".
A variable A is "nullable" if A ⇒* ε.
Algorithm (steps to remove ε-productions from the grammar):
• If there is a production of the form A → ε, then A is nullable.
• If there is a production of the form B → X1X2…Xn and every Xi is nullable, then B is also nullable.
• Repeat until all the nullable variables have been found.
• If B → X1X2…Xn is a production in P, then add to P' all productions formed by striking out some subset of those Xi that are nullable.
• Do not include B → ε, even if striking out every Xi would produce it, and drop any ε-production already in P.
Example:
Consider the grammar:
S→ABC
A → BB | Є
B → CC | a
C → AA | b
Here,
A → ε, so A is nullable.
C → AA ⇒* ε, so C is nullable.
B → CC ⇒* ε, so B is nullable.
S → ABC ⇒* ε, so S is nullable.
Now, for the removal of ε-productions:
• In the production S → ABC, all of A, B and C are nullable.
So, striking out each possible subset of them (except all three at once, which would give S → ε) gives the new productions:
S → ABC | AB | BC | AC | A | B | C
• The other productions are handled similarly, and the resulting grammar after removal of ε-productions is:
S → ABC | AB | BC | AC | A | B | C
A → BB | B
B → CC | C | a
C → AA | A | b
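The nullable-variable computation and the striking-out step can be sketched in Python as follows. This is a minimal sketch, assuming a grammar is a dict from variable to a list of bodies (tuples of symbols, with the empty tuple standing for ε); the function names are illustrative.

```python
from itertools import product


def nullable_vars(g):
    """Variables A with A =>* ε, found by fixed-point iteration."""
    null = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in g.items():
            # An empty body, or a body made only of nullable variables.
            if head not in null and any(all(x in null for x in b) for b in bodies):
                null.add(head)
                changed = True
    return null


def remove_epsilon(g):
    null = nullable_vars(g)
    new = {}
    for head, bodies in g.items():
        out = []
        for body in bodies:
            # Each nullable symbol may be kept or struck out; others are kept.
            choices = [((x,), ()) if x in null else ((x,),) for x in body]
            for pick in product(*choices):
                b = tuple(s for part in pick for s in part)
                if b and b not in out:        # never add an ε-production
                    out.append(b)
        new[head] = out
    return new


g = {
    "S": [("A", "B", "C")],
    "A": [("B", "B"), ()],
    "B": [("C", "C"), ("a",)],
    "C": [("A", "A"), ("b",)],
}
print(sorted(nullable_vars(g)))     # ['A', 'B', 'C', 'S']
for head, bodies in remove_epsilon(g).items():
    print(head, "->", " | ".join("".join(b) for b in bodies))
```

Up to the order of the alternatives, this produces the same productions as the grammar shown above.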
Exercise:
Remove the ε-productions from the following grammar:
1)
S → AB
A → aAA | ε
B → bBB | ε
3) Eliminating Unit Production:
• A unit production is a production of the
form A→ B, where A and B are both
variables.
• Here, if A → B, we say B is A-derivable; if B → C, we say C is B-derivable.
• Thus, if we have both A → B and B → C, then A ⇒* C, hence C is also A-derivable.
• Here the pairs (A, B), (B, C) and (A, C) are called unit pairs.
To eliminate the unit productions, first find all of the unit pairs. The unit pairs are found as follows:
(A, A) is a unit pair for any variable A, as A ⇒* A in zero steps.
If we have A → B, then (A, B) is a unit pair.
If (A, B) is a unit pair, i.e. A ⇒* B using only unit productions, and we have B → C, then (A, C) is also a unit pair.
Now, to eliminate the unit productions from a given grammar, say G = (V, T, P, S), we have to find another grammar G' = (V, T, P', S) with no unit productions. For this, we may work as below:
• Initialize P' = P.
• For each A ∈ V, find the set of A-derivable variables.
• For every pair (A, B) such that B is A-derivable and for every non-unit production B → α, add the production A → α to P' if it is not in P' already.
• Delete all unit productions from P'.
Example
Remove the unit productions from the grammar G defined by the productions:
P = { S→ S + T | T
T → T* F | F
F → (S) | a };
Solution:
1) Initialize
P' = { S → S + T | T
T → T * F | F
F → (S) | a };
2) Now, find the unit pairs:
Here, S → T, so (S, T) is a unit pair.
T → F, so (T, F) is a unit pair.
Also, S → T and T → F, so (S, F) is a unit pair.
3) Now, for each unit pair (A, B), add A → α for each non-unit production B → α:
P' = {
S → S + T | T | T * F | (S) | a
T → T * F | F | (S) | a
F → (S) | a
}
4) Delete the unit productions from the grammar:
P' = {
S → S + T | T * F | (S) | a
T → T * F | (S) | a
F → (S) | a
}
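The unit-pair procedure used in this solution can be sketched in Python. This is a minimal sketch, assuming a grammar is a dict from variable to a list of bodies (tuples of symbols) and that a unit production is a body consisting of exactly one variable; the function names are illustrative.

```python
def unit_pairs(g):
    """All pairs (A, B) with A =>* B using unit productions only."""
    pairs = {(a, a) for a in g}               # (A, A) is always a unit pair
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for body in g[b]:
                if len(body) == 1 and body[0] in g and (a, body[0]) not in pairs:
                    pairs.add((a, body[0]))   # (A, B) and B -> C give (A, C)
                    changed = True
    return pairs


def remove_units(g):
    new = {a: [] for a in g}
    for a, b in sorted(unit_pairs(g)):
        for body in g[b]:
            is_unit = len(body) == 1 and body[0] in g
            if not is_unit and body not in new[a]:
                new[a].append(body)           # add A -> α for non-unit B -> α
    return new


g = {
    "S": [("S", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "S", ")"), ("a",)],
}
for head, bodies in remove_units(g).items():
    print(head, "->", " | ".join(" ".join(b) for b in bodies))
```

Because only non-unit bodies are ever copied, the separate "delete all unit productions" step is folded into the construction here; the resulting productions agree with step 4 above up to the order of the alternatives.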
Exercise
1) Simplify the grammar G = (V, T, P, S) defined by the following productions:
S → ASB | ε
A → aAS | a
B → SbS | A | bb | ε
Note: Here "simplify" means that you have to remove all the ε-productions, unit productions and useless symbols.
2) Simplify the grammar defined by the following productions:
S → 0A0 | 1B1 | BB
A → C
B → S | A
C → S | ε
1) CNF - Chomsky Normal Form:
A context-free grammar is said to be in Chomsky
normal form if every production is of one of
these two types:
A → BC (where B and C are variables)
A → a (where a is a terminal symbol)
Thus a grammar in CNF is one which has no:
• ε-productions
• Unit productions
• Useless symbols
Algorithm to convert a CFG into CNF:
• Step 1: Eliminate ε-productions, unit productions and useless symbols from the given CFG.
• Step 2: If all the productions are of the form A → a or A → BC with A, B, C ∈ V and a ∈ T, we are done.
Otherwise, we have to do two tasks:
1. Arrange that all bodies of length 2 or more consist only of variables.
2. Break bodies of length 3 or more into a cascade of productions, each with a body consisting of two variables.
• The construction for task (1) is as follows:
If a production is of the form A → X1X2…Xm with m ≥ 2, and some Xi is a terminal a, then we replace that Xi by a new variable Ca and add the production Ca → a.
As a result, every production with a body of length 2 or more has the form
A → B1B2…Bm, m ≥ 2,
where all the Bi are variables.
• The construction for task (2) is as follows:
We break each production A → B1B2…Bm with m ≥ 3 into a group of productions with two variables in each body.
We introduce m-2 new variables C1, C2, …, Cm-2.
The original production is replaced by the m-1 productions:
A → B1C1,
C1 → B2C2,
…
Cm-3 → Bm-2Cm-2,
Cm-2 → Bm-1Bm
• Finally, all of the productions have the form A → BC or A → a.
This is a grammar in CNF, and it generates the language of the original grammar without the empty string ε.
• Consider an example: convert the following CFG to CNF.
S → AAC
A → aAb | ε
C → aC | a
Solution: 1) First, remove the ε-productions.
Here, A is a nullable symbol, as A → ε.
So, eliminating the ε-productions, we have:
S → AAC | AC | C
A → aAb | ab
C → aC | a
2) Removing unit productions:
Here, the only non-trivial unit pair is (S, C), as S → C.
So, removing the unit production, we have the CFG:
S → AAC | AC | aC | a
A → aAb | ab
C → aC | a
3) This CFG does not have any useless symbols.
Now, we can convert the grammar to CNF. For this:
• First, in every body of length 2 or more, replace each terminal by a new variable and introduce the corresponding productions (C1 → a, B1 → b):
S → AAC | AC | C1C | a
C1 → a
A → C1AB1 | C1B1
B1 → b
C → C1C | a
Now, replace each body of three variables by a cascade of two-variable bodies, introducing new variables.
Here, replace S → AAC by S → AC2, C2 → AC.
Similarly, replace A → C1AB1 by A → C1C3, C3 → AB1.
Thus the final grammar in CNF is:
S → AC2 | AC | C1C | a
A → C1C3 | C1B1
C1 → a
B1 → b
C2 → AC
C3 → AB1
C → C1C | a
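Tasks (1) and (2) of the conversion can be sketched in Python. This is a minimal sketch, assuming the input grammar already has no ε-productions, unit productions or useless symbols, and is stored as a dict from variable to a list of bodies (tuples of symbols). The generated names `T_a`, `T_b`, `N1`, `N2` are choices of this sketch (assumed not to clash with existing variables); they play the roles of C1, B1, C2 and C3 above.

```python
def to_cnf(g):
    variables = set(g)
    term_var = {}                         # terminal -> stand-in variable
    step1 = {}
    # Task 1: in bodies of length >= 2, replace each terminal a by T_a.
    for head, bodies in g.items():
        step1[head] = []
        for body in bodies:
            if len(body) >= 2:
                body = tuple(x if x in variables
                             else term_var.setdefault(x, "T_" + x)
                             for x in body)
            step1[head].append(body)
    for t, v in term_var.items():
        step1[v] = [(t,)]                 # T_a -> a
    # Task 2: break bodies of length >= 3 into a cascade of pairs.
    result = {}
    counter = 0
    for head, bodies in step1.items():
        result.setdefault(head, [])
        for body in bodies:
            cur = head
            while len(body) > 2:
                counter += 1
                fresh = f"N{counter}"
                result.setdefault(cur, []).append((body[0], fresh))
                cur, body = fresh, body[1:]
            result.setdefault(cur, []).append(body)
    return result


g = {
    "S": [("A", "A", "C"), ("A", "C"), ("a", "C"), ("a",)],
    "A": [("a", "A", "b"), ("a", "b")],
    "C": [("a", "C"), ("a",)],
}
cnf = to_cnf(g)
for head, bodies in cnf.items():
    print(head, "->", " | ".join(" ".join(b) for b in bodies))
```

With the renaming T_a = C1, T_b = B1, N1 = C2 and N2 = C3, the output corresponds to the final grammar above.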
Q. Convert the following CFG into CNF:
S → TU | V
T → aTb | ε
U → cU | ε
V → aVc | W
W → bW | ε
Solution:
1. (Identifying nullable variables)
The variables T, U and W are nullable because they have ε-productions;
V is nullable because of the production V → W; and S is also nullable, either because of the production S → TU or because of S → V.
So all the variables are nullable.
2. (Eliminating ε-productions)
Before the ε-productions are eliminated, the following productions are added:
S → T, S → U, T → ab, U → c, V → ac, W → b
After eliminating the ε-productions, we are left with
S → TU | T | U | V
T → aTb | ab
U → cU | c
V → aVc | ac | W
W → bW | b
3. (Identifying A-derivable variables, for each A)
The S-derivable variables obviously include T, U and V, and they also include W because of the production V → W. The only V-derivable variable is W.
4. (Eliminating unit productions)
We add the productions
S → aTb | ab | cU | c | aVc | ac | bW | b
V → bW | b
before eliminating unit productions.
At this stage, we have
S → TU | aTb | ab | cU | c | aVc | ac | bW | b
T → aTb | ab
U → cU | c
V → aVc | ac | bW | b
W → bW | b
5. (Converting to Chomsky normal form)
We introduce new variables Xa, Xb and Xc with the productions Xa → a, Xb → b and Xc → c, and replace a, b and c by Xa, Xb and Xc, respectively, in productions whose right sides are not single terminals, obtaining
S → TU | XaTXb | XaXb | XcU | c | XaVXc | XaXc | XbW | b
T → XaTXb | XaXb
U → XcU | c
V → XaVXc | XaXc | XbW | b
W → XbW | b
This grammar fails to be in Chomsky normal form only because of the productions S → XaTXb, S → XaVXc, T → XaTXb and V → XaVXc. When we take care of these as described above, we obtain the final CFG G1 with productions
S → TU | XaY1 | XaXb | XcU | c | XaY2 | XaXc | XbW | b
Y1 → TXb
Y2 → VXc
T → XaY3 | XaXb
Y3 → TXb
U → XcU | c
V → XaY4 | XaXc | XbW | b
Y4 → VXc
W → XbW | b
Xa → a
Xb → b
Xc → c
(We obviously don't need both Y1 and Y3, and we don't need both Y2 and Y4, so we could simplify G1 slightly.)
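A result like G1 can be checked against the two allowed production forms with a few lines of Python. This is a minimal sketch, assuming a grammar is a dict from variable to a list of bodies (tuples of symbols); the name `is_cnf` is illustrative, and the productions Xa → a, Xb → b and Xc → c, which the replacement step implies, are written out explicitly here.

```python
def is_cnf(g):
    """True if every body is either two variables or one terminal."""
    for bodies in g.values():
        for body in bodies:
            two_vars = len(body) == 2 and all(x in g for x in body)
            one_term = len(body) == 1 and body[0] not in g
            if not (two_vars or one_term):
                return False
    return True


G1 = {
    "S": [("T", "U"), ("Xa", "Y1"), ("Xa", "Xb"), ("Xc", "U"), ("c",),
          ("Xa", "Y2"), ("Xa", "Xc"), ("Xb", "W"), ("b",)],
    "Y1": [("T", "Xb")],
    "Y2": [("V", "Xc")],
    "T": [("Xa", "Y3"), ("Xa", "Xb")],
    "Y3": [("T", "Xb")],
    "U": [("Xc", "U"), ("c",)],
    "V": [("Xa", "Y4"), ("Xa", "Xc"), ("Xb", "W"), ("b",)],
    "Y4": [("V", "Xc")],
    "W": [("Xb", "W"), ("b",)],
    # Implied by replacing a, b, c with Xa, Xb, Xc:
    "Xa": [("a",)],
    "Xb": [("b",)],
    "Xc": [("c",)],
}
print(is_cnf(G1))                    # True
print(G1["Y1"] == G1["Y3"])          # True: Y1 and Y3 are interchangeable
print(G1["Y2"] == G1["Y4"])          # True: Y2 and Y4 are interchangeable
```

The last two checks show why G1 could be simplified slightly: Y1 and Y3 have identical productions, as do Y2 and Y4.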