Unit 3

The document covers the concepts of grammars and languages, focusing on Context-Free Grammars (CFG) and Context-Free Languages (CFL). It discusses various topics including derivation, ambiguity, normal forms, simplification techniques, and parsing methods. Additionally, it provides algorithms and examples for eliminating useless symbols, null-productions, and unit productions, as well as converting CFGs into Chomsky Normal Form (CNF).

Uploaded by

iameverywhere792

Unit 3: Grammars and Languages
Syllabus
• Derivation and ambiguity
• BNF & CNF notations
• Union, concatenation and Kleene star (*) of CFLs
• Eliminating Є-productions and unit productions from a
CFG
• Eliminating useless variables from a context-free
grammar
• Parsing: top-down, recursive descent
and bottom-up parsing
Introduction
• A grammar is a set of rules by which the strings of a
language can be generated.
• This unit discusses the context-free grammar (CFG), a
more powerful method of describing languages than
regular expressions.
• The set of strings generated by a context-free grammar is
called a context-free language, and context-free languages
can describe many practically important systems.
• The syntax of most programming languages can be
approximated by a context-free grammar, and compilers for
them have been developed based on properties of
context-free languages.
• Context-free grammars and context-free languages are
defined below.
CFG vs RE
• CFGs have more expressive power than regular
expressions.
• Regular expressions are generally useful for
describing the structure of lexical constructs such as
identifiers, keywords, constants etc.
• But they cannot specify the recursive (nested)
structure of programming constructs.
• CFGs, however, can describe such recursive
structure.
• Thus, CFGs can define every regular language as
well as some languages that are not regular.
Definition (Context-Free Grammar):
• A 4-tuple G = < V, ∑, S, P > is a context-free
grammar (CFG)
• if V and ∑ are finite sets sharing no elements
between them (V ∩ ∑ = ∅),
• S ∈ V is the start symbol, and
• P is a finite set of productions of the form
X → α, where X ∈ V and α ∈ (V ∪ ∑)*.
A language is a context-free language (CFL) if it is
exactly the set of strings generated by some
context-free grammar.
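The definition can be made concrete with a small sketch. The grammar S → aSb | Є used here is a hypothetical example (it is not from the slides), and the dictionary representation, the function name generate and the slack parameter are all assumptions made for illustration. Each body is a tuple of symbols, and the empty tuple stands for Є.

```python
from collections import deque

# Hypothetical example grammar (not from the slides): S -> aSb | Є
V = {"S"}                               # variables
SIGMA = {"a", "b"}                      # terminals
START = "S"                             # start symbol
P = {"S": [("a", "S", "b"), ()]}        # productions; () stands for Є


def generate(productions, start, max_len, slack=2):
    """Return the terminal strings of length <= max_len found by a
    breadth-first search over leftmost derivations.

    Sentential forms longer than max_len + slack are pruned so that the
    search always stops; strings that need longer intermediate forms
    can therefore be missed (slack is an illustrative assumption).
    """
    results = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Index of the leftmost variable in the sentential form, if any.
        idx = next((i for i, s in enumerate(form) if s in productions), None)
        if idx is None:
            results.add("".join(form))  # only terminals left
            continue
        for body in productions[form[idx]]:
            new_form = form[:idx] + body + form[idx + 1:]
            n_terminals = sum(1 for s in new_form if s not in productions)
            if (n_terminals <= max_len
                    and len(new_form) <= max_len + slack
                    and new_form not in seen):
                seen.add(new_form)
                queue.append(new_form)
    return results


print(sorted(generate(P, START, 4)))
```

For this grammar the search finds the empty string, ab and aabb, i.e. the strings a^n b^n with at most four symbols.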
Properties of Context-Free Language
Theorem 1:
Let L1 and L2 be context-free languages. Then
L1 ∪ L2 , L1L2 , and L1* are context-free languages.
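The usual proof of Theorem 1 builds a new grammar with a fresh start symbol. The sketch below shows those three constructions under stated assumptions: grammars are dictionaries from a variable to a list of bodies (tuples of symbols, with the empty tuple for Є), the two grammars use disjoint variable names, and the fresh start symbol name "S0" is a hypothetical choice that must not already occur in either grammar. The example grammars G1 and G2 are also hypothetical.

```python
def union(g1, s1, g2, s2, new_start="S0"):
    """Grammar for L1 ∪ L2: add S0 -> S1 | S2."""
    assert not set(g1) & set(g2), "variable sets must be disjoint"
    assert new_start not in g1 and new_start not in g2
    productions = {**g1, **g2}
    productions[new_start] = [(s1,), (s2,)]
    return productions, new_start


def concat(g1, s1, g2, s2, new_start="S0"):
    """Grammar for L1L2: add S0 -> S1 S2."""
    assert not set(g1) & set(g2), "variable sets must be disjoint"
    assert new_start not in g1 and new_start not in g2
    productions = {**g1, **g2}
    productions[new_start] = [(s1, s2)]
    return productions, new_start


def star(g1, s1, new_start="S0"):
    """Grammar for L1*: add S0 -> S1 S0 | Є."""
    assert new_start not in g1
    productions = dict(g1)
    productions[new_start] = [(s1, new_start), ()]
    return productions, new_start


# Hypothetical grammars: A -> aA | a and B -> bB | b
G1 = {"A": [("a", "A"), ("a",)]}
G2 = {"B": [("b", "B"), ("b",)]}

g_union, start_union = union(G1, "A", G2, "B")
print(g_union[start_union])
```

Each construction only adds productions for the new start symbol, so the result is again a context-free grammar.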
Normal forms and Simplification of CFG
BNF & CNF notations
• Productions in a CFG that satisfy certain restrictions
are said to be in a normal form.
• There are 2 notations used for normal forms:
1. CNF (Chomsky Normal Form):
A context-free grammar is said to be in
Chomsky normal form if every production is
of one of these two types:
A → BC (where B and C are variables)
A → a (where a is a terminal symbol)
Simplification of CFG
• To put a CFG into CNF, we first make a number of
preliminary simplifications, which are themselves
useful in various ways.
• A CFG is simplified by eliminating the following:
1. “Useless symbols”: those variables or terminals
that do not appear in any derivation of a terminal
string from the start symbol.
2. “Null-productions”: those of the form
A → Є for some variable A.
3. “Unit productions”: those of the form
A → B for variables A and B.
1) Eliminating “Useless Symbols”
There are 2 types of useless symbols:
• Non-generating symbols
• Non-reachable symbols
Thus useful symbols are those variables or
terminals that appear in some derivation of a
terminal string from the start symbol.
Eliminating useless symbols involves identifying
whether or not each symbol is “generating” and
“reachable”.
• Generating symbol:
We say X is generating if X →* w for some
terminal string w. Note that every terminal is
generating, since w can be that terminal itself,
which is derived in zero steps.
• Reachable symbol:
We say X is reachable if there is a derivation
S →* αXβ for some α and β.
• Thus if we first eliminate the non-generating symbols
and then the non-reachable symbols, only the useful
symbols are left. The order matters: removing
non-generating symbols can make further symbols
unreachable.
• Example :
Consider a grammar defined by following
productions:

S→aB | bX
A → Bad | bSX | a
B → aSB | bBX
X → SBd | aBX | ad
Here,
A and X can directly generate terminal strings, since we
have the productions A → a and X → ad. So, A and X are
generating symbols.
Also,
S → bX and X generates a terminal string, so S can also
generate a terminal string. Hence, S is also a generating
symbol.
B cannot derive any terminal string, because every
B-production contains B itself, so it is non-generating.
Hence, the new grammar after removing the
non-generating symbol B and every production that
contains it is:
S → bX
A → bSX | a
X → ad
• Here, A is non-reachable, as there is no derivation
of the form S →* αAβ in the grammar. Thus,
eliminating the non-reachable symbols, the
resulting grammar is:
S→ bX
X→ ad
This is the grammar with only useful symbols.
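The two passes in this example can be sketched as follows. This is a minimal sketch under an assumed representation (a dictionary from each variable to a list of bodies, each body a tuple of symbols); the function names are illustrative and not from the slides.

```python
def generating_symbols(productions, terminals):
    """Symbols that derive some terminal string (fixed-point iteration)."""
    gen = set(terminals)                 # every terminal is generating
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head not in gen and any(
                all(s in gen for s in body) for body in bodies
            ):
                gen.add(head)
                changed = True
    return gen


def reachable_symbols(productions, start):
    """Symbols that occur in some sentential form derived from start."""
    reach = {start}
    stack = [start]
    while stack:
        head = stack.pop()
        for body in productions.get(head, []):
            for s in body:
                if s not in reach:
                    reach.add(s)
                    stack.append(s)
    return reach


def remove_useless(productions, terminals, start):
    """Drop non-generating symbols first, then non-reachable ones."""
    gen = generating_symbols(productions, terminals)
    step1 = {
        head: [b for b in bodies if all(s in gen for s in b)]
        for head, bodies in productions.items()
        if head in gen
    }
    reach = reachable_symbols(step1, start)
    return {h: bodies for h, bodies in step1.items() if h in reach}


# The grammar from the example above.
example = {
    "S": [("a", "B"), ("b", "X")],
    "A": [("B", "a", "d"), ("b", "S", "X"), ("a",)],
    "B": [("a", "S", "B"), ("b", "B", "X")],
    "X": [("S", "B", "d"), ("a", "B", "X"), ("a", "d")],
}
useful = remove_useless(example, {"a", "b", "d"}, "S")
print(useful)
```

On the example grammar this leaves only S → bX and X → ad, matching the result worked out by hand.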
Exercise
1) Remove the useless symbols from the following
grammar:

S→ xyZ | XyzZ
X → Xz | xYZ
Y → yYy | XZ
Z → Zy | z
2) Remove the useless symbols from the following grammar:

S → aC | SB
A → bSCa
B → aSB | bBC
C → aBc | ad
2) Eliminating “Null-productions”:
A grammar is said to have Є-productions if
there is a production of the form
A → Є.
Here our strategy is to begin by
discovering which variables are
“nullable”.

A variable A is “nullable” if A →* Є.

Algorithm (steps to remove Є-productions from the
grammar):
• If there is a production of the form A → Є,
then A is nullable.
• If there is a production of the form B → X1X2…Xn
and every Xi is nullable, then B is also
nullable.
• Repeat the two rules above until no new nullable
variable is found.
• If B → X1X2…Xn is a production in P, then add to P′
every production formed by striking out some
subset of the Xi that are nullable.
• Do not include B → Є in P′, even if striking out
every Xi would produce it or it is already a
production of P.
Example:
Consider the grammar:
S→ABC
A → BB | Є
B → CC | a
C → AA | b
Here,
A → Є, so A is nullable.
C → AA →* Є, so C is nullable.
B → CC →* Є, so B is nullable.
S → ABC →* Є, so S is nullable.
Now for removal of Є-productions:
• In the production
S → ABC, all of A, B and C are nullable.
So, striking out every possible subset of them
(except all three at once, which would give S → Є)
gives the new productions:
S → ABC | AB | BC | AC | A | B | C
• The other productions are handled similarly, and the
resulting grammar after removal of Є-productions is:
S → ABC | AB | BC | AC | A | B | C
A → BB | B
B → CC | C | a
C → AA | A | b
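The nullable computation and the striking-out step can be sketched as below. This is an illustrative sketch, assuming productions are stored as a dictionary from each variable to a list of bodies (tuples of symbols, with the empty tuple for Є); the function names are not from the slides.

```python
from itertools import product


def nullable_variables(productions):
    """Variables A with A =>* Є, found by fixed-point iteration."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            # An empty body (Є) satisfies all(...) trivially.
            if head not in nullable and any(
                all(s in nullable for s in body) for body in bodies
            ):
                nullable.add(head)
                changed = True
    return nullable


def eliminate_epsilon(productions):
    """Strike out every subset of nullable symbols; never emit A -> Є."""
    nullable = nullable_variables(productions)
    result = {}
    for head, bodies in productions.items():
        new_bodies = []
        for body in bodies:
            # Nullable symbols may be kept or dropped; others are kept.
            options = [(True, False) if s in nullable else (True,)
                       for s in body]
            for keep in product(*options):
                candidate = tuple(s for s, k in zip(body, keep) if k)
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result


# The grammar from the example above; () stands for Є.
example = {
    "S": [("A", "B", "C")],
    "A": [("B", "B"), ()],
    "B": [("C", "C"), ("a",)],
    "C": [("A", "A"), ("b",)],
}
print(eliminate_epsilon(example))
```

Because S is nullable here, the original language contains Є and the resulting grammar generates every string of the original language except Є.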
Exercise:
Remove the Є-productions from the following grammar:
S→ AB
A → aAA | Є
B → bBB | Є
3) Eliminating Unit Productions:
• A unit production is a production of the
form A → B, where A and B are both
variables.
• Here, if A → B, we say B is A-derivable;
if B → C, we say C is B-derivable.
• Thus if we have both A → B and B → C, then
A →* C, hence C is also A-derivable.
• Here the pairs (A, B), (B, C) and (A, C) are
called unit pairs.
To eliminate the unit productions, first find all of the unit
pairs. The unit pairs are:
(A, A) is a unit pair for any variable A, as A →* A in zero steps.
If we have A → B, then (A, B) is a unit pair.
If (A, B) is a unit pair, i.e. A →* B, and we have B → C for a
variable C, then (A, C) is also a unit pair.
Now, to eliminate the unit productions of a given
grammar, say G = (V, T, P, S), we have to find another
grammar G′ = (V, T, P′, S) with no unit productions. For this,
we may work as below:
• Initialize P′ = P.
• For each A ∈ V, find the set of A-derivable variables.
• For every pair (A, B) such that B is A-derivable and for every
non-unit production B → α, add the production A → α to P′
if it is not in P′ already.
• Delete all unit productions from P′.
Example
Remove the unit productions from the grammar G defined
by the productions:
P = { S → S + T | T
T → T * F | F
F → (S) | a };
Solution:
1) Initialize P′ = P:
P′ = { S → S + T | T
T → T * F | F
F → (S) | a };
2) Now, find the unit pairs:
Here, S → T, so (S, T) is a unit pair.
T → F, so (T, F) is a unit pair.
Also, S → T and T → F, so (S, F) is a unit pair.

3) Now, for each unit pair (A, B), add A → α for every
non-unit production B → α:
P′ = {
S → S + T | T | T * F | (S) | a
T → T * F | F | (S) | a
F → (S) | a
}
4) Delete the unit productions from the grammar:
P′ = {
S → S + T | T * F | (S) | a
T → T * F | (S) | a
F → (S) | a
}
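The unit-pair procedure can be sketched as follows. This is a minimal sketch, assuming productions are stored as a dictionary from each variable to a list of bodies (tuples of symbols); the function names are illustrative and not from the slides.

```python
def unit_pairs(productions):
    """All pairs (A, B) of variables with A =>* B via unit productions."""
    variables = set(productions)
    pairs = {(a, a) for a in variables}      # basis: (A, A) for every A
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for body in productions[b]:
                is_unit = len(body) == 1 and body[0] in variables
                if is_unit and (a, body[0]) not in pairs:
                    pairs.add((a, body[0]))
                    changed = True
    return pairs


def eliminate_unit(productions):
    """For each unit pair (A, B), give A every non-unit body of B."""
    variables = set(productions)
    pairs = unit_pairs(productions)
    result = {a: [] for a in productions}
    for a in productions:
        for b in productions:
            if (a, b) not in pairs:
                continue
            for body in productions[b]:
                is_unit = len(body) == 1 and body[0] in variables
                if not is_unit and body not in result[a]:
                    result[a].append(body)
    return result


# The expression grammar from the example above.
example = {
    "S": [("S", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "S", ")"), ("a",)],
}
print(eliminate_unit(example))
```

This version builds P′ directly from the non-unit productions instead of copying P and deleting the unit productions afterwards; on the example both routes give the same grammar.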
Exercise
1) Simplify the grammar G = (V, T, P, S) defined by the following
productions:
S → ASB | Є
A → aAS | a
B → SbS | A | bb | Є
Note: Here "simplify" means removing all Є-productions, unit
productions and useless symbols.

2) Simplify the grammar defined by the following productions:

S → 0A0 | 1B1 | BB
A → C
B → S | A
C → S | Є
1) CNF (Chomsky Normal Form):
A context-free grammar is said to be in Chomsky
normal form if every production is of one of
these two types:
A → BC (where B and C are variables)
A → a (where a is a terminal symbol)
Thus a grammar in CNF is one which should
not have:
• Є-productions
• Unit productions
• Useless symbols
Algorithm to convert CFG into CNF:
• Step 1: Eliminate Є-productions, unit productions and
useless symbols from the given CFG.
• Step 2: If all the productions are of the form
A → a and A → BC with A, B, C ∈ V and
a ∈ T, we are done.
Otherwise, we have to do two tasks:
1. Arrange that all bodies of length 2 or more
consist only of variables.
2. Break bodies of length 3 or more into a
cascade of productions, each with a body
consisting of two variables.
• The construction for task (1) is as follows:
If a production is of the form
A → X1X2…Xm, m ≥ 2, and some
Xi is a terminal a,
then we replace that Xi by a new variable Ca
and add the production Ca → a.
As a result, all productions with bodies of
length 2 or more have the form
A → B1B2…Bm, m ≥ 2,
where all the Bi are variables.
• The construction for task (2) is as follows:
We break each production
A → B1B2…Bm with m ≥ 3 into a group of
productions with two variables in each body.
We introduce m−2 new variables
C1, C2, …, Cm−2.
The original production is replaced by the m−1
productions:
A → B1C1,
C1 → B2C2,
…
Cm−3 → Bm−2Cm−2,
Cm−2 → Bm−1Bm
• Finally, all of the productions are
of the form
A → BC or A → a
This is a grammar in CNF, and it generates the
language of the original grammar without the
empty string Є.
• Consider an example: convert the following CFG to CNF:
S → AAC
A → aAb | Є
C → aC | a
Solution: 1) First, remove the Є-productions.
Here, A is the only nullable variable, as A → Є.
Eliminating the Є-production,
we have:
S → AAC | AC | C
A → aAb | ab
C → aC | a
2) Remove the unit productions.
Here, the only non-trivial unit pair is (S, C), as S → C.
So, removing the unit production,
we have the CFG:
S → AAC | AC | aC | a
A → aAb | ab
C → aC | a

3) This CFG does not have any useless symbols.
Now, we can convert the grammar to CNF. For this:
• First, in every body of length 2 or more, replace each terminal by a new
variable and introduce a production from that variable to the terminal:
S → AAC | AC | C1C | a
C1 → a
A → C1AB1 | C1B1
B1 → b
C → C1C | a
• Now, break each body of three or more variables by introducing new
variables and productions.
Here, replace S → AAC by S → AC2, C2 → AC.
Similarly, replace A → C1AB1 by A → C1C3, C3 → AB1.
Thus the final grammar in CNF is:
S → AC2 | AC | C1C | a
A → C1C3 | C1B1
C1 → a
B1 → b
C2 → AC
C3 → AB1
C → C1C | a
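Tasks (1) and (2) can be sketched as below for a grammar that already has no Є-productions, unit productions or useless symbols. This is an illustrative sketch: the dictionary representation, the function names, and the fresh variable names ("T_a" for the variable standing for terminal a, "N1", "N2", … for the cascade variables) are assumptions, and the fresh names are assumed not to clash with existing variables.

```python
def to_cnf(productions):
    """Convert a grammar without Є-/unit productions to CNF."""
    variables = set(productions)
    term_var = {}                        # terminal -> fresh variable name

    # Task 1: in bodies of length >= 2, replace terminals by variables.
    staged = {}
    for head, bodies in productions.items():
        staged[head] = []
        for body in bodies:
            if len(body) >= 2:
                body = tuple(
                    s if s in variables
                    else term_var.setdefault(s, "T_" + s)
                    for s in body
                )
            staged[head].append(body)
    for terminal, var in term_var.items():
        staged[var] = [(terminal,)]

    # Task 2: break bodies of length >= 3 into a cascade of pairs.
    result = {}
    counter = 0
    for head, bodies in staged.items():
        result.setdefault(head, [])
        for body in bodies:
            current = head
            while len(body) > 2:
                counter += 1
                fresh = "N%d" % counter  # hypothetical fresh variable
                result.setdefault(current, []).append((body[0], fresh))
                current = fresh
                body = body[1:]
            result.setdefault(current, []).append(body)
    return result


def is_cnf(productions):
    """True if every body is one terminal or exactly two variables."""
    variables = set(productions)
    for bodies in productions.values():
        for body in bodies:
            single_terminal = len(body) == 1 and body[0] not in variables
            two_variables = len(body) == 2 and all(
                s in variables for s in body
            )
            if not (single_terminal or two_variables):
                return False
    return True


# The grammar from the example above after steps 1) to 3).
example = {
    "S": [("A", "A", "C"), ("A", "C"), ("a", "C"), ("a",)],
    "A": [("a", "A", "b"), ("a", "b")],
    "C": [("a", "C"), ("a",)],
}
cnf = to_cnf(example)
print(cnf)
```

Up to renaming (T_a for C1, T_b for B1, N1 for C2, N2 for C3), the output matches the final grammar obtained by hand.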
Q. Convert the following CFG into CNF:
S → TU | V
T → aTb | Є
U → cU | Є
V → aVc | W
W → bW | Є
Solution:
1. (Identifying nullable variables)
The variables T, U, and W are nullable because they have
Є-productions;
V is nullable because of the production V → W; and S is also nullable, either
because of the production S → TU or
because of S → V.
So all the variables are nullable.
2. (Eliminating Є-productions)
Before the Є-productions are eliminated, the following productions are added:
S → T, S → U, T → ab, U → c, V → ac, W → b
After eliminating the Є-productions, we are left with
S → TU | T | U | V
T → aTb | ab
U → cU | c
V → aVc | ac | W
W → bW | b
3. (Identifying A-derivable variables, for each A)
The S-derivable variables obviously include T, U, and V, and
they also include W because of the production V → W. The
only V-derivable variable is W.
4. (Eliminating unit productions)
We add the productions
S → aTb | ab | cU | c | aVc | ac | bW | b
V → bW | b
before eliminating unit productions.
At this stage, we have
S → TU | aTb | ab | cU | c | aVc | ac | bW | b
T → aTb | ab
U → cU | c
V → aVc | ac | bW | b
W → bW | b
5. (Converting to Chomsky normal form)
We replace a, b, and c by Xa, Xb, and Xc, respectively, in productions whose right
sides are not single terminals, and add the productions Xa → a, Xb → b, and
Xc → c, obtaining
S → T U | Xa T Xb | Xa Xb | Xc U | c | Xa V Xc | Xa Xc | Xb W | b
T → Xa T Xb | Xa Xb
U → Xc U | c
V → Xa V Xc | Xa Xc | Xb W | b
W → Xb W | b
Xa → a
Xb → b
Xc → c
This grammar fails to be in Chomsky normal form only because of the productions
S → Xa T Xb, S → Xa V Xc, T → Xa T Xb, and V → Xa V Xc. When we take care of
these as described above, we obtain the final CFG G1 with productions
S → T U | Xa Y1 | Xa Xb | Xc U | c | Xa Y2 | Xa Xc | Xb W | b
Y1 → T Xb
Y2 → V Xc
T → Xa Y3 | Xa Xb
Y3 → T Xb
U → Xc U | c
V → Xa Y4 | Xa Xc | Xb W | b
Y4 → V Xc
W → Xb W | b
Xa → a
Xb → b
Xc → c
(We obviously don’t need both Y1 and Y3, and we don’t need both Y2 and Y4, so we
could simplify G1 slightly.)
