Theory of Computation

(Context-Free Grammars)

Pramod Ganapathi
Department of Computer Science
State University of New York at Stony Brook

January 24, 2021


Contents

Context-Free Grammars (CFG)
Context-Free Languages
Pushdown Automata (PDA)
Transformations
Pumping Lemma
Context-Free Grammars (CFG)
Computer program compilation

C++ program 1:
    #include <iostream>
    using namespace std;
    int main()
    {
        if (true)
        {
            cout << "Hi 1";
        else
            cout << "Hi 2";
        }
        return 0;
    }

Output:
    error: expected ‘}’ before ‘else’

C++ program 2:
    #include <iostream>
    using namespace std;

    int main()
    {
        if (true)
            cout << "Hi 1";
        else
            cout << "Hi 2";

        return 0;
    }

Output:
    Hi 1

A DFA cannot check the syntax of a computer program, because
matching nested braces requires unbounded counting.

We need context-free grammars, a computational model more
powerful than finite automata, to check the syntax of most
structures in a computer program.
Construct CFG for L = {a^n b^n | n ≥ 0}

Problem
Construct a CFG that accepts all strings from the language
L = {a^n b^n | n ≥ 0}
Solution
Language L = {ε, ab, aabb, aaabbb, aaaabbbb, . . .}
CFG G.
S → aSb | ε
Accepting ε. ▷ 1-step computation
S ⇒ ε (∵ S → ε)
Accepting ab. ▷ 2-step computation
S ⇒ aSb (∵ S → aSb)
⇒ ab (∵ S → ε)
Accepting aabb. ▷ 3-step computation
S ⇒ aSb (∵ S → aSb)
⇒ aaSbb (∵ S → aSb)
⇒ aabb (∵ S → ε)
Accepting aaabbb. ▷ 4-step computation
S ⇒ aSb (∵ S → aSb)
⇒ aaSbb (∵ S → aSb)
⇒ aaaSbbb (∵ S → aSb)
⇒ aaabbb (∵ S → ε)
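
The derivations above follow one pattern: apply S → aSb n times, then S → ε once. A minimal Python sketch of that pattern is given here; the function name derive_anbn is a hypothetical helper, not part of the slides.

```python
def derive_anbn(n):
    """Return the sentential forms of the derivation of a^n b^n
    under the grammar S -> aSb | epsilon."""
    current = "S"
    forms = [current]
    for _ in range(n):
        # Apply S -> aSb to the single occurrence of S.
        current = current.replace("S", "aSb")
        forms.append(current)
    # Finish with S -> epsilon.
    forms.append(current.replace("S", ""))
    return forms


if __name__ == "__main__":
    print(" => ".join(form or "ε" for form in derive_anbn(3)))
```

The derivation of a^n b^n always has n + 1 steps, which matches the step counts listed above.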
Construct CFGs

Problems
Construct CFGs to accept all strings from the following languages:
R = a∗
R = a+
R = a∗ b∗
R = a+ b+
R = a∗ ∪ b∗
R = (a ∪ b)∗
R = a∗ b∗ c∗
Construct CFG for palindromes over {a, b}

Problem
Construct a CFG that accepts all strings from the language
L = {w | w = w^R and Σ = {a, b}}
Solution
Language L = {ε, a, b, aa, bb, aaa, aba, bab, bbb,
aaaa, abba, baab, bbbb, . . .}
CFG G.
S → aSa | bSb | a | b | ε
Accepting ε. S ⇒ ε ▷ 1 step
Accepting a. S ⇒ a
Accepting b. S ⇒ b
Accepting aa. S ⇒ aSa ⇒ aa ▷ 2 steps
Accepting bb. S ⇒ bSb ⇒ bb
Accepting aaa. S ⇒ aSa ⇒ aaa ▷ 2 steps
Accepting aba. S ⇒ aSa ⇒ aba
Accepting bab. S ⇒ bSb ⇒ bab
Accepting bbb. S ⇒ bSb ⇒ bbb
Accepting aaaa. S ⇒ aSa ⇒ aaSaa ⇒ aaaa ▷ 3 steps
Accepting abba. S ⇒ aSa ⇒ abSba ⇒ abba
Accepting baab. S ⇒ bSb ⇒ baSab ⇒ baab
Accepting bbbb. S ⇒ bSb ⇒ bbSbb ⇒ bbbb
Construct CFG for non-palindromes over {a, b}

Problem
Construct a CFG that accepts all strings from the language
L = {w | w ≠ w^R and Σ = {a, b}}
Solution
Language L = {ab, ba, aab, abb, baa, bba, . . .}
(ε is a palindrome, so it is not in L.)
CFG G.
S → aSa | bSb | aAb | bAa
A → Aa | Ab | ε
Accepting abbbbaaba. ▷ 7-step derivation
S ⇒ aSa
⇒ abSba
⇒ abbAaba
⇒ abbAaaba
⇒ abbAbaaba
⇒ abbAbbaaba
⇒ abbbbaaba
What is a context-free grammar (CFG)?

Grammar = A set of rules for a language

Context-free = The LHS of every production is a single nonterminal

Definition
A context-free grammar (CFG) G is a 4-tuple
G = (N, Σ, S, P), where,
1. N: A finite set (set of nonterminals/variables).
2. Σ: A finite set (set of terminals).
3. P: A finite set of productions/rules of the form A → α,
A ∈ N, α ∈ (N ∪ Σ)*. ▷ Time (computation)
▷ Space (computer memory)
4. S: The start nonterminal (belongs to N).
Derivation, acceptance, and rejection

Definitions
Derivation.
αAγ ⇒ αβγ (∵ A → β) ▷ 1-step derivation
Acceptance.
G accepts string w iff
S ⇒*_G w ▷ multistep derivation
Rejection.
G rejects string w iff
S ⇏*_G w ▷ no derivation
What is a context-free language (CFL)?

Definition
If G = (N, Σ, S, P ) is a CFG, the language generated by G is
L(G) = {w ∈ Σ* | S ⇒*_G w}
A language L is a context-free language (CFL) if there is a CFG
G with L = L(G).
Construct CFG for L = {w | n_a(w) = n_b(w)}

Problem
Construct a CFG that accepts all strings from the language
L = {w | n_a(w) = n_b(w)}
Solution
Language L = {ε, ab, ba, aabb, abab, abba, bbaa, . . .}
CFGs.
1. S → SaSbS | SbSaS | ε
2. S → aSbS | bSaS | ε
3. S → aSb | bSa | SS | ε
Derive the following 4-letter strings from G.
aabb, abab, abba, bbaa, baba, baab
Write G as a 4-tuple.
What is the meaning/interpretation/logic of the grammar?
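
One way to gain confidence in the three candidate grammars is to enumerate every string they generate up to a small length and compare the result with a brute-force listing of the language. The sketch below does this with a fixed-point computation; the dictionary encoding of grammars and the name language_up_to are assumptions made for illustration, and an empty tuple stands for ε.

```python
from itertools import product


def language_up_to(grammar, start, max_len):
    """Return every terminal string of length <= max_len derivable from start.

    grammar maps each nonterminal to a list of right-hand sides, each a
    tuple of symbols; the empty tuple encodes an epsilon-production.
    """
    lang = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate until no nonterminal gains a new string
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                partial = {""}
                for sym in body:
                    choices = lang[sym] if sym in grammar else {sym}
                    partial = {p + c for p in partial for c in choices
                               if len(p) + len(c) <= max_len}
                new = partial - lang[nt]
                if new:
                    lang[nt] |= new
                    changed = True
    return lang[start]


def equal_counts(max_len):
    """Brute-force listing of {w | n_a(w) = n_b(w)} up to max_len."""
    return {"".join(p) for n in range(max_len + 1)
            for p in product("ab", repeat=n) if p.count("a") == p.count("b")}


G1 = {"S": [("S", "a", "S", "b", "S"), ("S", "b", "S", "a", "S"), ()]}
G2 = {"S": [("a", "S", "b", "S"), ("b", "S", "a", "S"), ()]}
G3 = {"S": [("a", "S", "b"), ("b", "S", "a"), ("S", "S"), ()]}

if __name__ == "__main__":
    for name, g in (("G1", G1), ("G2", G2), ("G3", G3)):
        print(name, language_up_to(g, "S", 6) == equal_counts(6))
```

Agreement up to a fixed length is evidence, not a proof; a proof needs an induction argument.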
Construct CFGs

Problem
Construct CFGs that accept all strings from the following languages
1. L = {w | n_a(w) > n_b(w)}
2. L = {w | n_a(w) = 2 n_b(w)}
3. L = {w | n_a(w) ≠ n_b(w)}
Solutions
1. S → aS | bSS | SSb | SbS | a
2. S → SS | bAA | AbA | AAb | ε
A → aS | SaS | Sa | a
3. ?
Union, concatenation, and star are closed on CFL’s

Properties
If L1 and L2 are context-free languages over an alphabet Σ,
then L1 ∪ L2, L1 L2, and L1* are also CFL’s.
Construction
Let G1 = (N1, Σ, S1, P1) be CFG for L1.
Let G2 = (N2, Σ, S2, P2) be CFG for L2.
Assume N1 and N2 are disjoint (rename nonterminals if necessary).
Union.
Let Gu = (Nu, Σ, Su, Pu) be CFG for L1 ∪ L2.
Nu = N1 ∪ N2 ∪ {Su}; Pu = P1 ∪ P2 ∪ {Su → S1 | S2}
Concatenation.
Let Gc = (Nc, Σ, Sc, Pc) be CFG for L1 L2.
Nc = N1 ∪ N2 ∪ {Sc}; Pc = P1 ∪ P2 ∪ {Sc → S1 S2}
Kleene star.
Let Gs = (Ns, Σ, Ss, Ps) be CFG for L1*.
Ns = N1 ∪ {Ss}; Ps = P1 ∪ {Ss → Ss S1 | ε}
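
The three constructions are mechanical, so they can be sketched as small Python functions. The dictionary encoding (nonterminal to list of right-hand-side tuples, with the empty tuple for ε) and the function and start-symbol names are assumptions for illustration; the two input grammars are assumed to use disjoint nonterminal names.

```python
def union(g1, s1, g2, s2, new_start="Su"):
    """Grammar for L1 ∪ L2: add Su -> S1 | S2."""
    grammar = {**g1, **g2}
    grammar[new_start] = [(s1,), (s2,)]
    return grammar, new_start


def concatenation(g1, s1, g2, s2, new_start="Sc"):
    """Grammar for L1 L2: add Sc -> S1 S2."""
    grammar = {**g1, **g2}
    grammar[new_start] = [(s1, s2)]
    return grammar, new_start


def star(g1, s1, new_start="Ss"):
    """Grammar for L1*: add Ss -> Ss S1 | epsilon."""
    grammar = dict(g1)
    grammar[new_start] = [(new_start, s1), ()]
    return grammar, new_start


# Example inputs: S1 -> a S1 b | epsilon and S2 -> c S2 | epsilon.
G1 = {"S1": [("a", "S1", "b"), ()]}
G2 = {"S2": [("c", "S2"), ()]}

if __name__ == "__main__":
    print(union(G1, "S1", G2, "S2"))
    print(concatenation(G1, "S1", G2, "S2"))
    print(star(G1, "S1"))
```

Each function copies the old productions unchanged and adds productions only for the fresh start nonterminal, exactly as in the construction.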
Union is closed on CFL’s

Problem
If L1 and L2 are CFL’s then L3 = L1 ∪ L2 is a CFL.
If L1 and L3 = L1 ∪ L2 are CFL’s, is L2 a CFL?
Solution
L2 may or may not be a CFL.
L1 = Σ* ▷ CFL
L3 = L1 ∪ L2 = Σ* ▷ CFL
L2 = {a^n | n is prime} ▷ Non-CFL
Reversal is closed on CFL’s

Property
If L is a CFL, then L^R is a CFL.
Construction
Let G = (N, Σ, S, P) be CFG for L.
Let Gr = (N, Σ, S, Pr) be CFG for L^R. Then
Reversal.
Pr = productions from P with the symbols on the right-hand
side of every production reversed.
i.e., If A → α is in P, then A → α^R is in Pr
Example.
Grammar for accepting L is S → aSb | ab.
Grammar for accepting L^R is S → bSa | ba.
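
The reversal construction only touches right-hand sides, which a short Python sketch makes explicit. The dictionary encoding of the grammar and the name reverse_grammar are assumptions for illustration.

```python
def reverse_grammar(grammar):
    """Reverse the right-hand side of every production A -> alpha."""
    return {head: [tuple(reversed(body)) for body in bodies]
            for head, bodies in grammar.items()}


# The example grammar S -> aSb | ab.
G = {"S": [("a", "S", "b"), ("a", "b")]}

if __name__ == "__main__":
    print(reverse_grammar(G))
```

Applying the function twice returns the original grammar, mirroring (L^R)^R = L.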
Intersection is not closed on CFL’s

Problem
Show that L1, L2 are CFL’s and L = L1 ∩ L2 is a non-CFL.
L = {a^i b^j c^k | i = j and j = k}
= {a^i b^i c^k | i, k ≥ 0} ∩ {a^i b^j c^j | i, j ≥ 0}
= L1 ∩ L2

Solution
L1 is a CFL.
L1 = {a^i b^i c^k | i, k ≥ 0} = {a^i b^i | i ≥ 0}{c^k | k ≥ 0}
= L3 L4 = CFL (∵ L3, L4 are CFL’s)
L2 is a CFL.
L2 = {a^i b^j c^j | i, j ≥ 0} = {a^i | i ≥ 0}{b^j c^j | j ≥ 0}
= L5 L6 = CFL (∵ L5, L6 are CFL’s)
L is a non-CFL.
Use pumping lemma for CFL’s.
Complementation is not closed on CFL’s

Problem
Show that complementation is not closed on CFL’s.
Solution
Proof by contradiction.
Suppose CFL’s are closed under complementation.
i.e., if L is a CFL, then L̄ is a CFL.
Consider the equation L1 ∩ L2 = complement(L̄1 ∪ L̄2) (De Morgan’s law).
Since union is closed on CFL’s, closure on complementation
implies closure on intersection.
But, intersection is not closed on CFL’s.
Contradiction!
Hence, complementation is not closed on CFL’s.
Complementation is not closed on CFL’s

Problem
Show that L̄ is a CFL and L is a non-CFL.
L̄ = Σ* − {ww | w ∈ Σ*} = Σ* − L

Solution
L̄ is a CFL.
S → A | B | AB | BA
A → EAE | a
B → EBE | b
E → a | b
Why does this grammar work?
L is a non-CFL.
Use pumping lemma for CFL’s.
Set difference is not closed on CFL’s

Problem
Show that set difference is not closed on CFL’s.
Solution
Proof by contradiction.
Suppose set difference is closed under CFL’s.
i.e., if L1 , L2 are CFL’s, then L1 − L2 is a CFL.
Consider the equation L1 ∩ L2 = L1 − (L1 − L2 ).
Closure on set difference implies closure on intersection.
But, intersection is not closed on CFL’s.
Contradiction!
Hence, set difference is not closed on CFL’s.
Summary: Closure properties of CFL’s

Operation                    Closed on CFL’s?
Union (L1 ∪ L2)              Yes
Concatenation (L1 L2)        Yes
Kleene star (L*)             Yes
Reversal (L^R)               Yes
Intersection (L1 ∩ L2)       No
Complementation (L̄)          No
Set difference (L1 − L2)     No
Construct CFG for L = {a^i b^j c^k | j = i + k}
Problem
Construct a CFG that accepts all strings from the language
L = {a^i b^j c^k | j = i + k}
Solution
Language L = {ε, ab, bc, a^2 b^2, b^2 c^2, ab^2 c, . . .}
L = {a^i b^j c^k | j = i + k}
= {a^i b^(i+k) c^k} (∵ substitute for j)
= {a^i b^i b^k c^k} (∵ expand)
= {a^i b^i}{b^k c^k} (∵ split the concatenated languages)
= L1 L2
Solve the problem completely by constructing CFG’s for L1,
L2, and then L1 L2.
Divide-and-conquer. We can solve a complicated problem if
we can break the problem into several simpler subproblems and
solve those simpler problems.
Construct CFG for the variant where j ≠ i + k.
Construct CFG for L = {a^i b^j c^k | j ≠ i + k}

Problem
Construct a CFG that accepts all strings from the language
L = {a^i b^j c^k | j ≠ i + k}
Solution
Language L = {a, b, c, ac, a^2, b^2, c^2, . . .}
(ε is not in L, since it has j = i + k = 0.)
L = {a^i b^j c^k | j ≠ i + k}
= {a^i b^j c^k | j > (i + k)} ∪ {a^i b^j c^k | j < (i + k)}
= L1 ∪ L2
Can we represent L1 and L2 using simpler languages?
Case 1. L1 = {a^i b^j c^k | j > i + k}
= {a^i b^j c^k | j = i + m + k and m ≥ 1}
= {a^i b^(i+m+k) c^k | m ≥ 1}
= {a^i b^i} · {b^m | m ≥ 1} · {b^k c^k}
= {a^i b^i} · {b b^n | n ≥ 0} · {b^k c^k}
= L11 · L12 · L13
We know how to construct CFG’s for L11, L12, L13
Case 2. L2 = {a^i b^j c^k | j < i + k}
= {a^i b^j c^k | j < i or i ≤ j < i + k}
= {a^i b^j c^k | j < i} ∪ {a^i b^j c^k | i ≤ j < i + k}
= L21 ∪ L22
How to proceed?
Case 3. L21 = {a^i b^j c^k | j < i}
= {a^i b^j c^k | i = m + j and m ≥ 1}
= {a^(m+j) b^j c^k | m ≥ 1}
= {a^m | m ≥ 1} · {a^j b^j} · {c^k}
= L211 · L212 · L213
We know how to construct CFG’s for L211, L212, L213
Case 4. L22 = {a^i b^j c^k | i ≤ j < i + k}
= {a^i b^j c^k | j ≥ i and k > j − i}
= {a^i b^(i+(j−i)) c^((j−i)+m) | (j − i) ≥ 0 and m ≥ 1}
= {a^i b^i} · {b^(j−i) c^(j−i) | (j − i) ≥ 0} · {c^m | m ≥ 1}
= {a^i b^i} · {b^n c^n | n ≥ 0} · {c^m | m ≥ 1}
= L221 · L222 · L223
We know how to construct CFG’s for L221, L222, L223
Construct CFG for bba(ab)* | (ab | ba*b)* ba

Problem
Construct a CFG that accepts all strings from the language
corresponding to R.E. bba(ab)* | (ab | ba*b)* ba.
Solution
Language L = {ba, bba, abba, bbba, . . .}
This is a regular language.
CFG G.
S → S1 | S2
S1 → S1 ab | bba ▷ Generates bba(ab)*
S2 → T S2 | ba ▷ Generates (ab | ba*b)* ba
T → ab | bUb ▷ Generates ab | ba*b
U → aU | ε ▷ Generates a*
Construct CFG for strings of a DFA
Problem
Construct a CFG that accepts all strings accepted by the following DFA.
[Figure: DFA with start state S, states A and B, and accepting state B.]
Transitions: δ(S, a) = S, δ(S, b) = A, δ(A, b) = A, δ(A, a) = B,
δ(B, b) = A, δ(B, a) = S

Solution
Language L = {(a | b)* ba} ▷ Strings ending with ba
= {ba, aba, bba, aaba, abba, baba, bbba, . . .}
This is a regular language.
How to construct CFG for this DFA?
Approach 1: Compute R.E. Construct CFG for the R.E.
Approach 2: Construct CFG from the DFA using transitions.
Idea.
For every transition δ(Q, a) = R, add a production Q → aR.
What does this mean? Why should it work?
CFG. ▷ 3 states = 3 nonterminals
S → aS | bA
A → bA | aB
B → bA | aS | ε ▷ ε-production for halting state
Accepting bbaaba.
S −b→ A −b→ A −a→ B −a→ S −b→ A −a→ B
S ⇒ bA ⇒ bbA ⇒ bbaB ⇒ bbaaS ⇒ bbaabA ⇒ bbaabaB
⇒ bbaaba
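
The transition-to-production idea can be sketched in Python. The encoding below (a dictionary from (state, symbol) to the next state) and the names dfa_to_grammar and derivation are assumptions for illustration; the DFA is the one reconstructed from the productions of this example.

```python
def dfa_to_grammar(delta, accepting):
    """For every transition delta(Q, a) = R add Q -> aR;
    for every accepting state Q add Q -> epsilon (empty tuple)."""
    grammar = {}
    for (state, symbol), target in delta.items():
        grammar.setdefault(state, []).append((symbol, target))
    for state in accepting:
        grammar.setdefault(state, []).append(())
    return grammar


def derivation(delta, start, accepting, word):
    """Mirror the DFA run as a derivation; return None if word is rejected."""
    state, prefix, forms = start, "", [start]
    for ch in word:
        state = delta[(state, ch)]
        prefix += ch
        forms.append(prefix + state)
    if state not in accepting:
        return None
    forms.append(prefix)  # final step uses the epsilon-production
    return forms


DELTA = {("S", "a"): "S", ("S", "b"): "A",
         ("A", "b"): "A", ("A", "a"): "B",
         ("B", "b"): "A", ("B", "a"): "S"}

if __name__ == "__main__":
    print(dfa_to_grammar(DELTA, {"B"}))
    print(" => ".join(derivation(DELTA, "S", {"B"}, "bbaaba")))
```

Each sentential form is the input read so far followed by the current state, which is why the construction works: the nonterminal remembers the state.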
What is a regular grammar/language?

Definitions
A context-free grammar G = (N, Σ, S, P) is called a regular
grammar if every production is of the form A → aB or A → ε,
where A, B ∈ N and a ∈ Σ.
A language L ⊆ Σ* is called a regular language iff L = L(G)
for some regular grammar G.
Construct CFG for understanding human languages

Problem
Construct a CFG to understand some structures in the English
language.
Solution
CFG:
⟨Sentence⟩ → ⟨NounPhrase⟩ ⟨VerbPhrase⟩
⟨NounPhrase⟩ → ⟨ComplexNoun⟩ | ⟨ComplexNoun⟩ ⟨PrepPhrase⟩
⟨VerbPhrase⟩ → ⟨ComplexVerb⟩ | ⟨ComplexVerb⟩ ⟨PrepPhrase⟩
⟨PrepPhrase⟩ → ⟨Prep⟩ ⟨ComplexNoun⟩
⟨ComplexNoun⟩ → ⟨Article⟩ ⟨Noun⟩
⟨ComplexVerb⟩ → ⟨Verb⟩ | ⟨Verb⟩ ⟨NounPhrase⟩
⟨Article⟩ → a | the
⟨Noun⟩ → boy | girl | flower
⟨Verb⟩ → touches | likes | sees
⟨Prep⟩ → with
Accepting “a girl likes”.
⟨Sentence⟩ ⇒ ⟨NounPhrase⟩ ⟨VerbPhrase⟩
⇒ ⟨ComplexNoun⟩ ⟨VerbPhrase⟩
⇒ ⟨Article⟩ ⟨Noun⟩ ⟨VerbPhrase⟩
⇒ a ⟨Noun⟩ ⟨VerbPhrase⟩
⇒ a girl ⟨VerbPhrase⟩
⇒ a girl ⟨ComplexVerb⟩
⇒ a girl ⟨Verb⟩
⇒ a girl likes
Derive “a girl with a flower likes the boy”.
Construct CFG for strings with valid parentheses

Problem
Construct a CFG that accepts all strings from the language
L = {ε, (), ()(), (()), ()()(), (()()), ()(()), (())(), ((())), . . .}
Solution
Applications. Compilers check for syntactic correctness in:
1. Computer programs written by you that possibly contain
nested code blocks with { }, ( ), and [ ].
2. Web pages written by you that contain nested code blocks
with <div></div>, <table></table>, and <ul></ul>.
Language L = {w | w ∈ {(, )}* such that n_((w) = n_)(w) and,
in every prefix p of w, n_((p) ≥ n_)(p)}
What is the CFG?
Multiple correct ways to write the CFG:
1. S → S(S)S | ε
2. S → SS | (S) | ε
3. S → S(S) | ε
4. S → (S)S | ε
5. S → SR) | ε
R → ( | RR)
6. S → (RS | ε
R → ) | (RR
Are some CFG’s better than the others?
If so, better in what?
Construct CFG for valid arithmetic expressions

Problem
Construct a CFG that accepts all valid arithmetic expressions
from Σ = {(, ), +, ×, n}, where n represents any integer.
Solution
Language L = {15 + 85, 57 × 3, (27 + 46) × 10, . . .}
Abstraction: Denote n to mean any integer.
Valid expressions: (n + n) + n × n, etc.
Invalid expressions: +n, (n+)n, (), n × n), etc.
Hint: Use some ideas from the parenthesis problem
Multiple correct ways to write the CFG:
1. E → E + E | E × E | ( E ) | n
2. E → E + T | T ▷ expression
T → T × F | F ▷ term
F → ( E ) | n ▷ factor
3. E → T E′
E′ → +T E′ | ε
T → F T′
T′ → ×F T′ | ε
F → ( E ) | n
Can you derive (n × n)?
Are some CFG’s better than the others? If so, better in what?
What is a derivation?

Definition
A derivation in a context-free grammar is a leftmost derivation
(LMD) if, at each step, a production is applied to the leftmost
variable-occurrence in the current string. A rightmost derivation
(RMD) is defined similarly.
Example
CFG: E → E + E | E × E | ( E ) | n
Accepting n + (n).
LMD: E ⇒ E + E ⇒ n + E ⇒ n + (E) ⇒ n + (n)
RMD: E ⇒ E + E ⇒ E + (E) ⇒ E + (n) ⇒ n + (n)
What is an ambiguous grammar?

Definition
A context-free grammar G is ambiguous if for at least one w ∈
L(G), w has more than one derivation tree (or, equivalently,
more than one leftmost derivation).
Intuition: A CFG is ambiguous if it generates a string in several
different ways.
Arithmetic expression: Ambiguous grammar
Problem
Show that the following CFG is ambiguous:
E → E + E | E × E | ( E ) | n
Solution
Consider the strings n + n × n or n + n + n.
There are two derivation trees for each of the strings.
Two derivation (or parse) trees ⟹ Ambiguity

Accepting n + n × n.
(Reason 1: The precedence of different operators isn’t enforced.)
LMD 1: E ⇒ E + E ⇒ n + E ⇒ n + E × E ⇒ n + n × E
⇒ n + n × n
LMD 2: E ⇒ E × E ⇒ E + E × E ⇒ n + E × E ⇒ n + n × E
⇒ n + n × n

    Tree for LMD 1:          Tree for LMD 2:
          E                        E
        / | \                    / | \
       E  +  E                  E  ×  E
       |   / | \              / | \   |
       n  E  ×  E            E  +  E  n
          |     |            |     |
          n     n            n     n

Accepting n + n + n.
(Reason 2: Order of operators of same precedence isn’t enforced.)
LMD 1: E ⇒ E + E ⇒ n + E ⇒ n + E + E ⇒ n + n + E
⇒ n + n + n
LMD 2: E ⇒ E + E ⇒ E + E + E ⇒ n + E + E ⇒ n + n + E
⇒ n + n + n

    Tree for LMD 1:          Tree for LMD 2:
          E                        E
        / | \                    / | \
       E  +  E                  E  +  E
       |   / | \              / | \   |
       n  E  +  E            E  +  E  n
          |     |            |     |
          n     n            n     n
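
Ambiguity can also be checked by counting parse trees. The sketch below counts the parse trees of a token sequence under E → E + E | E × E | ( E ) | n with a memoized recursion over token ranges; the function name count_parse_trees and the one-character-per-token encoding are assumptions for illustration.

```python
from functools import lru_cache

OPERATORS = ("+", "×")


def count_parse_trees(expression):
    """Count parse trees of expression under E -> E+E | E×E | (E) | n."""
    tokens = tuple(expression)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees deriving tokens[i:j] from E.
        if j - i < 1:
            return 0
        if j - i == 1:
            return 1 if tokens[i] == "n" else 0
        total = 0
        # E -> ( E )
        if tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)
        # E -> E + E and E -> E × E, trying every operator as the root
        for k in range(i + 1, j - 1):
            if tokens[k] in OPERATORS:
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


if __name__ == "__main__":
    for text in ("n+n×n", "n+n+n", "n+(n)"):
        print(text, count_parse_trees(text))
```

A count above 1 for any string shows the grammar is ambiguous; a count of 0 means the string is not in the language.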
Arithmetic expression: Ambiguous grammar

Problem
Consider the following ambiguous grammar:
E → E + E | E × E | ( E ) | n
How many different derivations (or LMDs) are possible for the
string n + n + · · · + n, where n is repeated k times?
Solution
Let d(k) = number of derivations for k operands. Then
d(1) = 1
d(2) = 1
d(3) = 2
d(4) = 5 How?
How do you compute d(k)?
d(k) = Σ_{i=1}^{k−1} d(i) · d(k − i), for k ≥ 2
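
The recurrence can be evaluated directly. The sketch below memoizes d(k) and compares it with the closed form for the Catalan numbers, d(k) = C(2(k−1), k−1) / k; the closed form is a standard fact brought in for illustration, not something stated on the slides.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def d(k):
    """Number of leftmost derivations of n + n + ... + n with k operands."""
    if k == 1:
        return 1
    # The root '+' splits the k operands into i on the left, k - i on the right.
    return sum(d(i) * d(k - i) for i in range(1, k))


def catalan(m):
    """m-th Catalan number."""
    return comb(2 * m, m) // (m + 1)


if __name__ == "__main__":
    print([d(k) for k in range(1, 9)])
```

The value d(4) = 5 comes from d(1)d(3) + d(2)d(2) + d(3)d(1) = 2 + 1 + 2.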
If-else ladder: Ambiguous grammar

Problem
Show that the following CFG is ambiguous:
S → if ( E ) S | if ( E ) S else S | O
where, S = statement, E = expression, O = other statement.
Solution
Consider the string: if (e1 ) if (e2 ) F(); else G();
There are two derivation trees for the string.
Can you identify the two derivation trees for the string?
If-else ladder: Ambiguous grammar
Solution (continued)
What is the output of this program?

C++ program:
    #include <iostream>
    using namespace std;

    int main()
    {
        if (true)
            if (false)
                ;
            else
                cout << "Hi!";

        return 0;
    }

Output:
    Hi!

C++ resolves the ambiguity by attaching each else to the nearest
unmatched if, so the else belongs to if (false).
If-else ladder: Unambiguous grammar

Problem
Can you come up with an unambiguous grammar for the language
accepted by the following ambiguous grammar?
S → if ( E ) S | if ( E ) S else S | O
where, S = statement, E = expression, O = other statement.
Solution
S → S1 | S2
S1 → if ( E ) S1 else S1 | O
S2 → if ( E ) S | if ( E ) S1 else S2
How do you prove that the grammar is really unambiguous?
What is an inherently ambiguous language?

Definition
A context-free language is called inherently ambiguous if there
exists no unambiguous grammar to generate the language.
Examples
L = {a^i b^j c^k | i = j or j = k}
L = {a^i b^i c^j d^j} ∪ {a^i b^j c^j d^i}
Proofs?
Language generated by a grammar

Problem
Prove that the following grammar G generates all strings of
balanced parentheses and only such strings.
S → (S)S | ε
Solution
L(G) = language generated by the grammar G.
L = language of balanced parentheses.
Show that L(G) = L. Two cases.
Case 1. Show that every string derivable from S is balanced.
i.e., L(G) ⊆ L.
Case 2. Show that every balanced string is derivable from S.
i.e., L ⊆ L(G).

Case 1. Show that every string derivable from S is balanced.
Let n = number of steps in derivation.
Basis.
The only string derivable from S in 1 step is ε and ε is balanced.
Induction.
Suppose every string derivable in fewer than n steps is balanced.
Consider a LMD of exactly n steps, where n > 1.
Its first step must use S → (S)S, so the derivation is of the form
S ⇒ (S)S ⇒* (x)S ⇒* (x)y (LMD)
Derivations of x and y take fewer than n steps.
So, x and y are balanced.
Therefore, the string (x)y must be balanced.

Case 2. Show that every balanced string is derivable from S.
Let 2n = length of a balanced string.
Basis.
A 0-length string is ε, which is derivable from S (∵ S → ε).
Induction.
Assume that every balanced string of length less than 2n is
derivable from S. Consider a balanced string w of length 2n
such that n ≥ 1. String w must begin with a left parenthesis.
Let (x) be the shortest nonempty prefix of w having an equal
number of left and right parentheses. Then, w can be written
as w = (x)y, where both x and y are balanced. Since x and
y are of length less than 2n, they are derivable from S. Thus,
we can find a derivation of the form
S ⇒ (S)S ⇒* (x)S ⇒* (x)y (LMD)
proving that w = (x)y must also be derivable from S.
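
The claim L(G) = L can be sanity-checked by brute force on short strings. The sketch below pairs a counter-based test for balanced strings with a recursive-descent recognizer that follows S → (S)S | ε, choosing (S)S exactly when the next symbol is a left parenthesis. The function names are hypothetical, and the check is no substitute for the induction proof.

```python
from itertools import product


def is_balanced(w):
    """Equal counts overall, and no prefix with more ')' than '('."""
    depth = 0
    for ch in w:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0


def parse_S(w, i=0):
    """Consume a string derived from S starting at index i.

    Returns the index just past it, or None if a '(' has no matching ')'.
    """
    if i < len(w) and w[i] == "(":
        j = parse_S(w, i + 1)              # S -> ( S ...
        if j is None or j >= len(w) or w[j] != ")":
            return None
        return parse_S(w, j + 1)           # ... ) S
    return i                               # S -> epsilon


def derivable(w):
    """True iff the whole string w is derivable from S."""
    return parse_S(w) == len(w)


if __name__ == "__main__":
    words = ("".join(p) for n in range(11) for p in product("()", repeat=n))
    print(all(derivable(w) == is_balanced(w) for w in words))
```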
What is Chomsky normal form (CNF)?

Definition
A context-free grammar is said to be in Chomsky normal form
(CNF) if every production is of one of these three types:
A → BC (where B, C are nonterminals and they cannot be
the start nonterminal S)
A → a (where a is a terminal symbol)
S → ε
Why should we care for CNF?
For every context-free grammar G, there is another CFG G_CNF
in Chomsky normal form such that L(G_CNF) = L(G).
Example
S → AA | ε
A → AA | a
Converting a CFG to CNF

Algorithm rule                        Before rule     After rule
1. Start nonterminal must not         S → ASABS       S0 → S
   appear on the RHS                                  S → ASABS
2. Remove productions like A → ε      R → ARA         R → ARA | AR | RA | R
                                      A → a | ε       A → a
3. Remove productions like A → B      A → B           A → CDD
                                      B → CDD         B → CDD
4. Convert to CNF                     A → BCD         A → BC′
                                                      C′ → CD

CFG-to-CNF(G)
1. Start nonterminal must not appear on RHS
2. Remove ε-productions
3. Remove unit productions
4. Convert to CNF
Converting a CFG to CNF
Problem
Convert the following CFG to CNF.
S → ASA | aB
A → B | S
B → b | ε
Solution
Start nonterminal must not appear on the right hand side
S0 → S
S → ASA | aB
A → B | S
B → b | ε
Remove B → ε
S0 → S
S → ASA | aB | a
A → B | S | ε
B → b
Remove A → ε
S0 → S
S → ASA | SA | AS | S | aB | a
A → B | S
B → b
Remove A → B
S0 → S
S → ASA | SA | AS | S | aB | a
A → S | b
B → b
Remove S → S ▷ Do nothing
S0 → S
S → ASA | SA | AS | aB | a
A → S | b
B → b
Remove A → S
S0 → S
S → ASA | SA | AS | aB | a
A → ASA | SA | AS | aB | a | b
B → b
Remove S0 → S
S0 → ASA | SA | AS | aB | a
S → ASA | SA | AS | aB | a
A → ASA | SA | AS | aB | a | b
B → b
Convert ASA → AA1
S0 → AA1 | SA | AS | aB | a
S → AA1 | SA | AS | aB | a
A → AA1 | SA | AS | aB | a | b
A1 → SA
B → b
Introduce A2 → a
S0 → AA1 | SA | AS | A2 B | a
S → AA1 | SA | AS | A2 B | a
A → AA1 | SA | AS | A2 B | a | b
A1 → SA
A2 → a
B → b
This grammar is now in Chomsky normal form.
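
The three permitted production shapes can be checked mechanically. The sketch below tests whether a grammar is in Chomsky normal form and applies the test to the grammar before and after the conversion; the helper names rules and is_cnf and the text encoding of productions are assumptions for illustration.

```python
def rules(text):
    """Parse 'A A1 | S A | a' into a list of tuples; 'ε' is the empty tuple."""
    return [tuple(sym for sym in alt.split() if sym != "ε")
            for alt in text.split("|")]


def is_cnf(grammar, start):
    """Check that every production is A -> BC, A -> a, or start -> epsilon."""
    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) == 0:
                if head != start:                 # only the start may vanish
                    return False
            elif len(body) == 1:
                if body[0] in grammar:            # unit production A -> B
                    return False
            elif len(body) == 2:
                if any(s not in grammar or s == start for s in body):
                    return False
            else:                                 # right-hand side too long
                return False
    return True


ORIGINAL = {"S": rules("A S A | a B"), "A": rules("B | S"), "B": rules("b | ε")}

SHARED = "A A1 | S A | A S | A2 B | a"
CONVERTED = {"S0": rules(SHARED), "S": rules(SHARED),
             "A": rules(SHARED + " | b"), "A1": rules("S A"),
             "A2": rules("a"), "B": rules("b")}

if __name__ == "__main__":
    print(is_cnf(ORIGINAL, "S"), is_cnf(CONVERTED, "S0"))
```

The check is purely syntactic; it does not verify that the two grammars generate the same language.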
What is Greibach normal form (GNF)?

Definition
A context-free grammar is said to be in Greibach normal form
(GNF) if every production is of the following type:
A → aA1 A2 . . . Ad (where a is a terminal symbol and
A1, A2, . . . , Ad are nonterminals)
S → ε (Not always included)
Why should we care for GNF?
For every context-free grammar G, there is another CFG G_GNF
in Greibach normal form such that L(G_GNF) = L(G).
A string of length n has a derivation of exactly n steps.
Example
S → aA | bB
B → bB | b
A → aA | a
Equivalence of different computation models

[Figure: CFGs, PDAs, and recursive automata all describe the
context-free languages.]

Pushdown Automata (PDA)
Pushdown automaton

[Figure: schematic of a pushdown automaton. Source: Wikipedia]

PDA has access to a stack of unlimited memory

What is a pushdown automaton (PDA)?

Nondeterministic = Events cannot be determined precisely
Pushdown = Using stack of infinite memory
Automaton = Computing machine

Definition
A pushdown automaton (PDA) M is a 6-tuple
M = (Q, Σ, Γ, δ, q0, F), where,
1. Q: A finite set (set of states).
2. Σ: A finite set (input alphabet).
3. Γ: A finite set (stack alphabet).
4. δ : Q × Σ_ε × Γ_ε → P(Q × Γ_ε) is the transition function,
where Σ_ε = Σ ∪ {ε} and Γ_ε = Γ ∪ {ε}.
▷ Time (computation)
5. q0: The start state (belongs to Q).
6. F: The set of accepting/final states, where F ⊆ Q.

Stack ▷ Space (computer memory)

What is a context-free language?

Definition
A PDA M = (Q, Σ, Γ, δ, q0, F) accepts a string w ∈ Σ* iff

(q0, w, $) ⊢*_M (qf, ε, α)

for some α ∈ Γ* and some qf ∈ F.

A PDA rejects a string iff it does not accept it.
We say that a PDA M accepts a language L if
L = {w | M accepts w}.
A language is called a context-free language if some PDA
accepts or recognizes it.
Construct PDA for L = {a^n b^n}

Problem
Construct a PDA that accepts all strings from the language
L = {a^n b^n}
Solution
PDA()
1. while next input character is a do
2.     push a
3. while next input character is b do
4.     pop a

Transition (i, s1 → s2) means that when you see input character
i, replace s1 with s2 as the top of stack.
[Figure: state diagram of the PDA.] Transitions:
q0 → q1: ε, ε → $
q1 → q1: a, ε → a
q1 → q2: b, a → ε
q2 → q2: b, a → ε
q2 → q3: ε, $ → ε

PDA P is specified as
Set of states is Q = {q0, q1, q2, q3}
Set of input symbols is Σ = {a, b}
Set of stack symbols is Γ = {a, $}
Start state is q0
Set of accept states is F = {q0, q3}
Transition function δ is: (Every entry not listed is φ)
δ(q0, ε, ε) = {(q1, $)}
δ(q1, a, ε) = {(q1, a)}
δ(q1, b, a) = {(q2, ε)}
δ(q2, b, a) = {(q2, ε)}
δ(q2, ε, $) = {(q3, ε)}

Step  State  Stack  Input   Action
1     q0     ε      aaabbb  push $
2     q1     $      aaabbb  push a
3     q1     $a     aabbb   push a
4     q1     $aa    abbb    push a
5     q1     $aaa   bbb     pop a
6     q2     $aa    bb      pop a
7     q2     $a     b       pop a
8     q2     $      ε       pop $
9     q3     ε      ε       accept

Step  State  Stack  Input   Action
1     q0     ε      aababb  push $
2     q1     $      aababb  push a
3     q1     $a     ababb   push a
4     q1     $aa    babb    pop a
5     q2     $a     abb     crash
6     qφ     $a     bb
7     qφ     $a     b
8     qφ     $a     ε       reject
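
The two traces can be reproduced with a small simulator. The sketch below explores all configurations (state, input position, stack) breadth-first, which handles nondeterminism; the encoding of δ as a Python dictionary and the function name accepts are assumptions for illustration. The search is guaranteed to stop only for PDAs without cycles of ε-moves that keep pushing, which holds for this machine.

```python
from collections import deque

EPS = ""  # stands for ε in transitions

# δ for the PDA of this section: (state, input, pop) -> {(state, push)}
DELTA = {
    ("q0", EPS, EPS): {("q1", "$")},
    ("q1", "a", EPS): {("q1", "a")},
    ("q1", "b", "a"): {("q2", EPS)},
    ("q2", "b", "a"): {("q2", EPS)},
    ("q2", EPS, "$"): {("q3", EPS)},
}
ACCEPTING = {"q0", "q3"}


def accepts(delta, start, accepting, word):
    """Accept iff some run consumes all of word and ends in an accepting state."""
    first = (start, 0, ())            # stack is a tuple, top at the end
    seen, frontier = {first}, deque([first])
    while frontier:
        state, i, stack = frontier.popleft()
        if i == len(word) and state in accepting:
            return True
        for (src, read, pop), targets in delta.items():
            if src != state:
                continue
            if read != EPS and (i >= len(word) or word[i] != read):
                continue
            if pop != EPS and (not stack or stack[-1] != pop):
                continue
            rest = stack[:-1] if pop != EPS else stack
            next_i = i + 1 if read != EPS else i
            for dst, push in targets:
                config = (dst, next_i, rest + ((push,) if push != EPS else ()))
                if config not in seen:
                    seen.add(config)
                    frontier.append(config)
    return False


if __name__ == "__main__":
    for w in ("aaabbb", "aababb"):
        print(w, accepts(DELTA, "q0", ACCEPTING, w))
```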
Construct PDA for L = {w w^R | w ∈ {a, b}*}

Problem
Construct a PDA that accepts all strings from the language
L = {w w^R | w ∈ {a, b}*}
Solution
PDA()
1. while next input character is a or b do
2.     push the symbol
3. Nondeterministically guess the mid point of the string
4. while next input character is a or b do
5.     pop the symbol
Solution (continued)
[Figure: state diagram of the PDA.] Transitions:
q0 → q1: ε, ε → $
q1 → q1: a, ε → a and b, ε → b
q1 → q2: ε, ε → ε
q2 → q2: a, a → ε and b, b → ε
q2 → q3: ε, $ → ε
Construct PDA for L = {a^i b^j c^k | i = j or i = k}

Problem
Construct a PDA that accepts all strings from the language
L = {a^i b^j c^k | i = j or i = k}
Solution
PDA()
1. while next input character is a do push a
2. Nondeterministically guess whether a’s = b’s or a’s = c’s
Case 1. a’s = b’s.
1. while next input character is b do pop a
2. while next input character is c do nothing
Case 2. a’s = c’s.
1. while next input character is b do nothing
2. while next input character is c do pop a

Solution (continued)
[Figure: state diagram of the PDA.] Transitions:
q0 → q1: ε, ε → $
q1 → q1: a, ε → a
q1 → q2: ε, ε → ε
q2 → q2: b, a → ε
q2 → q3: ε, $ → ε
q3 → q3: c, ε → ε
q1 → q4: ε, ε → ε
q4 → q4: b, ε → ε
q4 → q5: ε, ε → ε
q5 → q5: c, a → ε
q5 → q6: ε, $ → ε
Non-Context-Free Languages
Pumping lemma for context-free languages

Theorem
Suppose L is a context-free language over alphabet Σ. Then
there is a natural number s so that for every string w ∈ L
satisfying |w| ≥ s, the string w can be split into five strings
w = uvxyz such that the following three conditions are true.
|vxy| ≤ s.
|vy| ≥ 1.
For every i ≥ 0, the string u v^i x y^i z also belongs to L.
L = {a^n b^n c^n} is a non-CFL

Problem
Prove that L = {a^n b^n c^n} is not a CFL.
Solution
Suppose L is a CFL. Then it must satisfy the pumping property.
Choose w = a^s b^s c^s, where s is the pumping length.
Let w = uvxyz where |vxy| ≤ s and |vy| ≥ 1.
Then uv^i x y^i z must belong to L for all i ≥ 0.
We will show that uxz ∉ L (the case i = 0) for all possible cases.
Three cases:
Case 1. vxy consists of exactly 1 kind of symbol (a’s or b’s or c’s).
Case 2. vxy consists of exactly 2 kinds of symbols (a’s and b’s, or b’s and c’s).
Case 3. vxy consists of all 3 kinds of symbols (a’s, b’s, and c’s).
This case is impossible: vxy would have to contain all s b’s plus at least one a and one c, giving |vxy| ≥ s + 2 > s.
Solution (continued)
Case 1. vxy consists of exactly 1 kind of symbol (a’s or b’s or c’s).
Three subcases:
Subcase i. vxy consists only of a’s.
Let w = uvxyz = a^s b^s c^s.
uxz is not in L.
Reason: uxz = a^{s−(|v|+|y|)} b^s c^s ∉ L as |v| + |y| > 0.
uxz has fewer a’s than b’s and c’s.
Subcase ii. vxy consists only of b’s.
Similar to Subcase i.
Subcase iii. vxy consists only of c’s.
Similar to Subcase i.
Solution (continued)
Case 2. vxy consists of exactly 2 kinds of symbols (a’s and b’s, or b’s and c’s).
Two subcases:
Subcase i. vxy consists only of a’s and b’s.
Let w = uvxyz = a^s b^s c^s.
uxz is not in L.
Reason: uxz = a^{k1} b^{k2} c^s ∉ L
where k1 + k2 = 2s − (|v| + |y|) < 2s as |v| + |y| > 0.
So k1 < s or k2 < s: uxz has either fewer a’s or fewer b’s than c’s.
Subcase ii. vxy consists only of b’s and c’s.
Similar to Subcase i.
Every case leads to a contradiction. Hence, L is not a CFL.
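The case analysis can be cross-checked by brute force for a small pumping length: enumerate every split w = uvxyz with |vxy| ≤ s and |vy| ≥ 1, and test whether pumping keeps the string in the language. The sketch below is only a sanity check for one fixed s, not a proof; the helper names (`surviving_splits`, `in_anbncn`, `in_anbn`) and the bound `max_i` are assumptions made for illustration.

```python
def in_anbncn(w: str) -> bool:
    """Membership test for {a^n b^n c^n}."""
    n = len(w) // 3
    return w == "a" * n + "b" * n + "c" * n


def in_anbn(w: str) -> bool:
    """Membership test for the context-free language {a^n b^n}."""
    n = len(w) // 2
    return w == "a" * n + "b" * n


def surviving_splits(w, s, in_lang, max_i=3):
    """All splits (u, v, x, y, z) with |vxy| <= s and |vy| >= 1 such that
    u v^i x y^i z stays in the language for every i in 0..max_i."""
    good = []
    n = len(w)
    for start in range(n + 1):                           # vxy = w[start:end]
        for end in range(start, min(start + s, n) + 1):
            for p in range(start, end + 1):              # v = w[start:p]
                for q in range(p, end + 1):              # x = w[p:q], y = w[q:end]
                    u, v, x, y, z = w[:start], w[start:p], w[p:q], w[q:end], w[end:]
                    if not v and not y:
                        continue                         # violates |vy| >= 1
                    if all(in_lang(u + v * i + x + y * i + z) for i in range(max_i + 1)):
                        good.append((u, v, x, y, z))
    return good


s = 4
print(surviving_splits("a" * s + "b" * s + "c" * s, s, in_anbncn))  # no split survives
print(surviving_splits("aabb", 2, in_anbn))                         # at least one split survives
```

For the context-free language {a^n b^n} a pumpable split exists, which is the behavior the lemma guarantees; for {a^n b^n c^n} the search finds none, in line with the proof.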
L = {ww | w ∈ {a, b}*} is a non-CFL

Problem
Prove that L = {ww | w ∈ {a, b}*} is not a CFL.
Solution
Suppose L is a CFL. Then it must satisfy the pumping property.
Choose w = a^s b^s a^s b^s, where s is the pumping length.
Let w = uvxyz where |vxy| ≤ s and |vy| ≥ 1.
Then uv^i x y^i z must belong to L for all i ≥ 0.
We will show that uxz ∉ L (the case i = 0) for all possible cases.
Two cases:
Case 1. vxy consists of exactly 1 kind of symbol (a’s or b’s).
Case 2. vxy consists of exactly 2 kinds of symbols (it straddles an ab or a ba boundary).
Solution (continued)
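Case 1. vxy consists of exactly 1 kind of symbol (a’s or b’s).
Two subcases:
Subcase i. vxy consists only of a’s.
Let w = uvxyz = a^s b^s a^s b^s.
uxz is not in L.
Reason: if vxy lies in the first block of a’s, then uxz = a^{s−(|v|+|y|)} b^s a^s b^s ∉ L as |v| + |y| > 0.
The two blocks of a’s now have different lengths, so uxz is not of the form ww.
The second block of a’s is handled the same way.
Subcase ii. vxy consists only of b’s.
Similar to Subcase i.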
Solution (continued)
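Case 2. vxy consists of exactly 2 kinds of symbols (it straddles an ab or a ba boundary).
Two subcases:
Subcase i. vxy consists of a’s followed by b’s.
Let w = uvxyz = a^s b^s a^s b^s.
uxz is not in L.
Reason: if vxy straddles the first ab boundary, then uxz = a^{k1} b^{k2} a^s b^s ∉ L
where k1 + k2 = 2s − (|v| + |y|) < 2s as |v| + |y| > 0.
uxz is not of the form ww. The second ab boundary is handled the same way.
Subcase ii. vxy consists of b’s followed by a’s.
Similar to Subcase i.
Every case leads to a contradiction. Hence, L is not a CFL.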
L = {a^n | n is a square} is a non-CFL

Problem
Prove that L = {a^n | n is a square} is not a CFL.
Solution
Suppose L is a CFL. Then it must satisfy the pumping property.
Choose w = a^{s^2}, where s is the pumping length.
Let w = uvxyz where |vxy| ≤ s and |vy| ≥ 1.
Then uv^i x y^i z must belong to L for all i ≥ 0.
But uv^2 x y^2 z ∉ L.
Reason: Let |vy| = k. Then k ∈ [1, s].
uv^2 x y^2 z = a^{s^2 + |vy|} = a^{s^2 + k} ∉ L.
Because s^2 < s^2 + k < (s + 1)^2 as k ∈ [1, s].
Contradiction! Hence, L is not a CFL.
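The key inequality holds because (s + 1)^2 = s^2 + 2s + 1 and k ≤ s < 2s + 1. A quick numeric check of the claim for small pumping lengths might look like the following sketch; the function names and the bound 200 are arbitrary choices for illustration.

```python
import math


def is_square(n: int) -> bool:
    """True if n is a perfect square."""
    r = math.isqrt(n)
    return r * r == n


def pumped_length_never_square(max_s: int) -> bool:
    """Check that s^2 + k is not a perfect square for 1 <= k <= s <= max_s."""
    return all(
        not is_square(s * s + k)
        for s in range(1, max_s + 1)
        for k in range(1, s + 1)
    )


print(pumped_length_never_square(200))
```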
L = {a^n | n is a power of 2} is a non-CFL

Problem
Prove that L = {a^n | n is a power of 2} is not a CFL.
Solution
Suppose L is a CFL. Then it must satisfy the pumping property.
Choose w = a^{2^s}, where s is the pumping length.
Let w = uvxyz where |vxy| ≤ s and |vy| ≥ 1.
Then uv^i x y^i z must belong to L for all i ≥ 0.
But uv^2 x y^2 z ∉ L.
Reason: Let |vy| = k, where k ∈ [1, s].
Then uv^2 x y^2 z = a^{2^s + k} ∉ L.
Because 2^s < 2^s + k < 2^{s+1}, as k ≤ s < 2^s.
Contradiction! Hence, L is not a CFL.
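The inequality used in the last step can be written out as a chain, using only 1 ≤ k ≤ s and the standard fact that s < 2^s for every natural number s:

```latex
\[
2^s \;<\; 2^s + k \;\le\; 2^s + s \;<\; 2^s + 2^s \;=\; 2^{s+1}
\qquad \text{for } 1 \le k \le s .
\]
```

So the pumped length 2^s + k lies strictly between two consecutive powers of 2 and cannot itself be a power of 2.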
L = {a^n | n is prime} is a non-CFL

Problem
Prove that L = {a^n | n is prime} is not a CFL.
Solution
Suppose L is a CFL. Then it must satisfy the pumping property.
Choose w = a^m, where m is prime and m ≥ s.
Let w = uvxyz where |vxy| ≤ s and |vy| ≥ 1.
Then uv^i x y^i z must belong to L for all i ≥ 0.
But uv^{m+1} x y^{m+1} z ∉ L.
Reason: Let |vy| = k. Then k ∈ [1, s].
uv^{m+1} x y^{m+1} z = a^{m + m|vy|} = a^{m + mk} = a^{m(k+1)} ∉ L,
because m(k + 1) is a product of two integers m ≥ 2 and k + 1 ≥ 2, hence not prime.
Contradiction! Hence, L is not a CFL.
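The exponent in the reason above follows from counting symbols: each extra copy of v and y adds |vy| = k symbols to the original m. Written out as a short derivation:

```latex
\begin{align*}
|uv^{i} x y^{i} z| &= |uvxyz| + (i-1)\,|vy| = m + (i-1)\,k, \\
|uv^{m+1} x y^{m+1} z| &= m + m\,k = m\,(k+1).
\end{align*}
```

Choosing i = m + 1 is what makes the pumped length a multiple of m.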
Membership problem: A decision problem on CFL’s

Problem
Given a CFG G and a string w, is w ∈ L(G)?
Solution
This is a difficult problem. Why?
Nondeterminism cannot always be eliminated from PDAs, unlike in finite automata.
Algorithmically solvable.
CYK algorithm (for grammars in CNF)
Earley parser
GLR parser
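As an illustration of the first option, here is a minimal sketch of the CYK table-filling idea for a grammar already in Chomsky normal form. The rule encoding (`unit_rules`, `pair_rules`) and the example grammar for {a^n b^n | n ≥ 1} are assumptions chosen for this sketch, not taken from the slides.

```python
def cyk(word, start, unit_rules, pair_rules):
    """CYK membership test for a grammar in Chomsky normal form.

    unit_rules: dict mapping a terminal t to the set of A with rule A -> t
    pair_rules: dict mapping (B, C) to the set of A with rule A -> B C
    """
    n = len(word)
    if n == 0:
        return False  # the empty string needs separate handling in CNF
    # table[i][l] holds the nonterminals that derive word[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][1] = set(unit_rules.get(ch, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):          # word[i:i+split] + the rest
                for B in table[i][split]:
                    for C in table[i + split][length - split]:
                        table[i][length] |= pair_rules.get((B, C), set())
    return start in table[0][n]


# Hypothetical CNF grammar for {a^n b^n | n >= 1}:
#   S -> A B | A T,  T -> S B,  A -> a,  B -> b
unit_rules = {"a": {"A"}, "b": {"B"}}
pair_rules = {("A", "B"): {"S"}, ("A", "T"): {"S"}, ("S", "B"): {"T"}}

print(cyk("aabb", "S", unit_rules, pair_rules))
print(cyk("abab", "S", unit_rules, pair_rules))
```

The three nested loops over length, start position, and split point give the O(n^3) running time (times a grammar-dependent factor) usually quoted for CYK.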
More decision problems involving CFL’s

Decision problems
Algorithmically solvable.
Given a CFG G, is L(G) nonempty?
Given a CFG G, is L(G) infinite?
Given a CFG G, is G a regular grammar?
Algorithmically unsolvable.
Given a CFG G, is L(G) a regular language?
Given a CFG G, is L(G) = Σ*?
Given a CFG G, is G ambiguous?
Given a CFG G, is L(G) inherently ambiguous?
Given two CFG’s G1 and G2, is L(G1) = L(G2)?
Given two CFG’s G1 and G2, is L(G1) ⊆ L(G2)?
Given two CFG’s G1 and G2, is L(G1) ∩ L(G2) nonempty?
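To make one of the solvable problems concrete, nonemptiness of L(G) can be decided by marking the nonterminals that derive some terminal string and checking whether the start symbol gets marked. The sketch below assumes a grammar encoded as a dict from each nonterminal to a list of right-hand sides (tuples of symbols); this encoding and the function names are illustrative choices.

```python
def generating_nonterminals(rules, terminals):
    """Return the nonterminals that derive at least one string of terminals.

    rules: dict mapping a nonterminal to a list of right-hand sides,
           each a tuple of symbols (the empty tuple encodes an ε-rule).
    """
    generating = set()
    changed = True
    while changed:                      # repeat until no new nonterminal is marked
        changed = False
        for head, bodies in rules.items():
            if head in generating:
                continue
            for body in bodies:
                # A body is productive if every symbol is a terminal or already marked.
                if all(sym in terminals or sym in generating for sym in body):
                    generating.add(head)
                    changed = True
                    break
    return generating


def is_nonempty(rules, terminals, start):
    """L(G) is nonempty exactly when the start symbol is generating."""
    return start in generating_nonterminals(rules, terminals)


# S -> a S b | ε generates {a^n b^n}, which is nonempty.
print(is_nonempty({"S": [("a", "S", "b"), ()]}, {"a", "b"}, "S"))
# S -> a S never terminates, so its language is empty.
print(is_nonempty({"S": [("a", "S")]}, {"a"}, "S"))
```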
