Context-Free Grammars
Pramod Ganapathi
Department of Computer Science
State University of New York at Stony Brook
Contents
Context-Free Grammars (CFG)
Context-Free Languages
Pushdown Automata (PDA)
Transformations
Pumping Lemma
Context-Free Grammars (CFG)
Computer program compilation
C++ program 1:
    #include <iostream>
    using namespace std;
    int main()
    {
        if (true)
        {
            cout << "Hi 1";
        else
            cout << "Hi 2";
        }
        return 0;
    }
Output:
    error: expected ‘}’ before ‘else’
C++ program 2:
    #include <iostream>
    using namespace std;

    int main()
    {
        if (true)
            cout << "Hi 1";
        else
            cout << "Hi 2";

        return 0;
    }
Output:
    Hi 1
Construct CFG for L = {a^n b^n | n ≥ 0}
Problem
Construct a CFG that accepts all strings from the language
L = {a^n b^n | n ≥ 0}
Solution
Language L = {ε, ab, aabb, aaabbb, aaaabbbb, . . .}
CFG G.
S → aSb
S → ε
Construct CFG for L = {a^n b^n | n ≥ 0}
Solution (continued)
CFG G.
S → aSb | ε
Accepting ε. ▷ 1-step derivation
S ⇒ ε (∵ S → ε)
Accepting ab. ▷ 2-step derivation
S ⇒ aSb (∵ S → aSb)
⇒ ab (∵ S → ε)
Accepting aabb. ▷ 3-step derivation
S ⇒ aSb (∵ S → aSb)
⇒ aaSbb (∵ S → aSb)
⇒ aabb (∵ S → ε)
Accepting aaabbb. ▷ 4-step derivation
S ⇒ aSb (∵ S → aSb)
⇒ aaSbb (∵ S → aSb)
⇒ aaaSbbb (∵ S → aSb)
⇒ aaabbb (∵ S → ε)
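The derivations above follow one pattern: apply S → aSb n times, then S → ε once. A minimal Python sketch of that pattern follows; the function name `derive_anbn` is a hypothetical helper, not part of the slides.

```python
def derive_anbn(n):
    """Return the sentential forms in the derivation of a^n b^n
    using the productions S -> aSb | ε (illustrative helper)."""
    current = "S"
    forms = [current]
    for _ in range(n):
        current = current.replace("S", "aSb")   # apply S -> aSb
        forms.append(current)
    forms.append(current.replace("S", ""))       # apply S -> ε
    return forms

print(" => ".join(derive_anbn(2)))  # S => aSb => aaSbb => aabb
```

The list has n + 2 forms, so the derivation of a^n b^n takes n + 1 steps, matching the step counts above.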
Construct CFGs
Problems
Construct CFGs to accept all strings from the following languages:
R = a∗
R = a+
R = a∗ b∗
R = a+ b+
R = a∗ ∪ b∗
R = (a ∪ b)∗
R = a∗ b∗ c∗
Construct CFG for palindromes over {a, b}
Problem
Construct a CFG that accepts all strings from the language
L = {w ∈ {a, b}∗ | w = w^R}
Solution
Language L = {ε, a, b, aa, bb, aaa, aba, bab, bbb,
aaaa, abba, baab, bbbb, . . .}
CFG G.
S → aSa | bSb | a | b | ε
Construct CFG for palindromes over {a, b}
Solution (continued)
CFG G. S → aSa | bSb | a | b | ε
Accepting ε. S ⇒ ε ▷ 1 step
Accepting a. S ⇒ a
Accepting b. S ⇒ b
Accepting aa. S ⇒ aSa ⇒ aa ▷ 2 steps
Accepting bb. S ⇒ bSb ⇒ bb
Accepting aaa. S ⇒ aSa ⇒ aaa ▷ 2 steps
Accepting aba. S ⇒ aSa ⇒ aba
Accepting bab. S ⇒ bSb ⇒ bab
Accepting bbb. S ⇒ bSb ⇒ bbb
Accepting aaaa. S ⇒ aSa ⇒ aaSaa ⇒ aaaa ▷ 3 steps
Accepting abba. S ⇒ aSa ⇒ abSba ⇒ abba
Accepting baab. S ⇒ bSb ⇒ baSab ⇒ baab
Accepting bbbb. S ⇒ bSb ⇒ bbSbb ⇒ bbbb
Construct CFG for non-palindromes over {a, b}
Problem
Construct a CFG that accepts all strings from the language
L = {w ∈ {a, b}∗ | w ≠ w^R}
Solution
Language L = {ab, ba, aab, abb, baa, bba, . . .}
CFG G.
S → aSa | bSb | aAb | bAa
A → Aa | Ab | ε
Construct CFG for non-palindromes over {a, b}
Solution (continued)
CFG G.
S → aSa | bSb | aAb | bAa
A → Aa | Ab | ε
Accepting abbbbaaba. ▷ 7-step derivation
S ⇒ aSa
⇒ abSba
⇒ abbAaba
⇒ abbAaaba
⇒ abbAbaaba
⇒ abbAbbaaba
⇒ abbbbaaba
What is a context-free grammar (CFG)?
Definition
A context-free grammar (CFG) G is a 4-tuple
G = (N, Σ, S, P ), where,
1. N : A finite set (set of nonterminals/variables).
2. Σ: A finite set (set of terminals).
3. P : A finite set of productions/rules of the form A → α,
A ∈ N, α ∈ (N ∪ Σ)∗. ▷ Time (computation)
▷ Space (computer memory)
4. S: The start nonterminal (belongs to N ).
Derivation, acceptance, and rejection
Definitions
Derivation.
αAγ ⇒ αβγ (∵ A → β) ▷ 1-step derivation
Acceptance.
G accepts string w iff
S ⇒∗_G w ▷ multistep derivation
Rejection.
G rejects string w iff
S ⇏∗_G w ▷ no derivation
What is a context-free language (CFL)?
Definition
If G = (N, Σ, S, P ) is a CFG, the language generated by G is
L(G) = {w ∈ Σ∗ | S ⇒∗_G w}
A language L is a context-free language (CFL) if there is a CFG
G with L = L(G).
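To make the definition of L(G) concrete, the sketch below enumerates the terminal strings reachable from the start nonterminal within a bounded number of leftmost derivation steps. The grammar encoding (a dict from nonterminal to a list of right-hand-side tuples) and the name `generate` are assumptions made for this illustration only.

```python
from collections import deque

def generate(productions, start, max_steps):
    """Terminal strings derivable from `start` in at most `max_steps`
    leftmost derivation steps. `productions` maps each nonterminal to a
    list of right-hand sides (tuples of symbols); () stands for ε."""
    results = set()
    queue = deque([((start,), 0)])
    while queue:
        form, steps = queue.popleft()
        # index of the leftmost nonterminal, if any
        idx = next((i for i, s in enumerate(form) if s in productions), None)
        if idx is None:
            results.add("".join(form))      # only terminals left: a word of L(G)
            continue
        if steps == max_steps:
            continue                        # step budget exhausted
        for rhs in productions[form[idx]]:
            queue.append((form[:idx] + tuple(rhs) + form[idx + 1:], steps + 1))
    return results

anbn = {"S": [("a", "S", "b"), ()]}         # S -> aSb | ε
print(sorted(generate(anbn, "S", 4), key=len))
```

This only explores a finite part of L(G); it is a way to inspect a grammar, not a membership test.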
Construct CFG for L = {w | n_a(w) = n_b(w)}
Problem
Construct a CFG that accepts all strings from the language
L = {w | n_a(w) = n_b(w)}
Solution
Language L = {ε, ab, ba, aabb, abab, abba, bbaa, . . .}
CFGs.
1. S → SaSbS | SbSaS | ε
2. S → aSbS | bSaS | ε
3. S → aSb | bSa | SS | ε
Derive the following 4-letter strings from each grammar.
aabb, abab, abba, bbaa, baba, baab
Write each grammar as a 4-tuple.
What is the meaning/interpretation/logic of each grammar?
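One way to gain confidence in such a grammar is to compare it with a brute-force description of the language on short strings. The sketch below does this for grammar 2 (S → aSbS | bSaS | ε); the helper names are hypothetical, and agreement up to a length bound is evidence, not a proof.

```python
from itertools import product

def language_up_to(max_len):
    """Strings of length <= max_len generated by S -> aSbS | bSaS | ε."""
    seen, stack, words = set(), ["S"], set()
    while stack:
        form = stack.pop()
        if form in seen:
            continue
        seen.add(form)
        if "S" not in form:
            words.add(form)
            continue
        for rhs in ("aSbS", "bSaS", ""):
            new = form.replace("S", rhs, 1)            # rewrite the leftmost S
            if len(new.replace("S", "")) <= max_len:   # prune: too many terminals
                stack.append(new)
    return words

def equal_counts_up_to(max_len):
    """All strings over {a, b} of length <= max_len with n_a = n_b."""
    return {"".join(p)
            for n in range(max_len + 1)
            for p in product("ab", repeat=n)
            if p.count("a") == p.count("b")}

print(language_up_to(4) == equal_counts_up_to(4))
```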
Construct CFGs
Problem
Construct CFGs that accept all strings from the following languages:
1. L = {w | n_a(w) > n_b(w)}
2. L = {w | n_a(w) = 2 n_b(w)}
3. L = {w | n_a(w) ≠ n_b(w)}
Solutions
1. S → aS | bSS | SSb | SbS | a
2. S → SS | bAA | AbA | AAb | ε
A → aS | SaS | Sa | a
3. ?
Union, concatenation, and star are closed on CFL’s
Properties
If L1 and L2 are context-free languages over an alphabet Σ,
then L1 ∪ L2, L1 L2, and L1∗ are also CFL’s.
Construction
Let G1 = (N1, Σ, S1, P1) be a CFG for L1.
Let G2 = (N2, Σ, S2, P2) be a CFG for L2, with N1 ∩ N2 = ∅.
Union.
Let Gu = (Nu, Σ, Su, Pu) be a CFG for L1 ∪ L2.
Nu = N1 ∪ N2 ∪ {Su}; Pu = P1 ∪ P2 ∪ {Su → S1 | S2}
Concatenation.
Let Gc = (Nc, Σ, Sc, Pc) be a CFG for L1 L2.
Nc = N1 ∪ N2 ∪ {Sc}; Pc = P1 ∪ P2 ∪ {Sc → S1 S2}
Kleene star.
Let Gs = (Ns, Σ, Ss, Ps) be a CFG for L1∗.
Ns = N1 ∪ {Ss}; Ps = P1 ∪ {Ss → Ss S1 | ε}
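The three constructions translate directly into operations on production sets. In this sketch a grammar is a dict from nonterminal to a list of right-hand-side tuples; that encoding, the function names, and the fresh start symbols "Su", "Sc", "Ss" are assumptions for illustration. The nonterminal sets of the two grammars are assumed disjoint.

```python
def union(g1, s1, g2, s2, new_start="Su"):
    """Pu = P1 ∪ P2 ∪ {Su -> S1 | S2}."""
    return {**g1, **g2, new_start: [(s1,), (s2,)]}, new_start

def concatenation(g1, s1, g2, s2, new_start="Sc"):
    """Pc = P1 ∪ P2 ∪ {Sc -> S1 S2}."""
    return {**g1, **g2, new_start: [(s1, s2)]}, new_start

def star(g1, s1, new_start="Ss"):
    """Ps = P1 ∪ {Ss -> Ss S1 | ε}; () encodes ε."""
    return {**g1, new_start: [(new_start, s1), ()]}, new_start

g1 = {"S1": [("a", "S1", "b"), ()]}    # {a^n b^n | n >= 0}
g2 = {"S2": [("c", "S2"), ()]}         # c*
gu, su = union(g1, "S1", g2, "S2")
print(gu[su])
```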
Union is closed on CFL’s
Problem
If L1 and L2 are CFL’s then L3 = L1 ∪ L2 is a CFL.
If L1 and L3 = L1 ∪ L2 are CFL’s, is L2 a CFL?
Solution
L2 may or may not be a CFL. Example where it is not:
L1 = Σ∗ ▷ CFL
L3 = L1 ∪ L2 = Σ∗ ▷ CFL
L2 = {a^n | n is prime} ▷ Non-CFL
Reversal is closed on CFL’s
Property
If L is a CFL, then L^R is a CFL.
Construction
Let G = (N, Σ, S, P) be a CFG for L.
Let Gr = (N, Σ, S, Pr) be a CFG for L^R. Then
Reversal.
Pr = the productions of P with the right-hand side of every
production reversed.
i.e., if A → α is in P, then A → α^R is in Pr.
Example.
Grammar for accepting L is S → aSb | ab.
Grammar for accepting L^R is S → bSa | ba.
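The reversal construction is a one-line transformation on the production set. The sketch below uses a dict-of-tuples grammar encoding and the name `reverse_grammar`, both chosen here for illustration.

```python
def reverse_grammar(productions):
    """For every production A -> α, emit A -> α^R."""
    return {head: [tuple(reversed(rhs)) for rhs in bodies]
            for head, bodies in productions.items()}

g = {"S": [("a", "S", "b"), ("a", "b")]}    # S -> aSb | ab
print(reverse_grammar(g))                   # {'S': [('b', 'S', 'a'), ('b', 'a')]}
```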
Intersection is not closed on CFL’s
Problem
Show that L1, L2 are CFL’s and L = L1 ∩ L2 is a non-CFL.
L = {a^i b^j c^k | i = j and j = k}
= {a^i b^i c^k | i, k ≥ 0} ∩ {a^i b^j c^j | i, j ≥ 0}
= L1 ∩ L2
Solution
L1 is a CFL.
L1 = {a^i b^i c^k | i, k ≥ 0} = {a^i b^i | i ≥ 0} {c^k | k ≥ 0}
= L3 L4 = CFL (∵ L3, L4 are CFL’s)
L2 is a CFL.
L2 = {a^i b^j c^j | i, j ≥ 0} = {a^i | i ≥ 0} {b^j c^j | j ≥ 0}
= L5 L6 = CFL (∵ L5, L6 are CFL’s)
L is a non-CFL.
Use pumping lemma for CFL’s.
Complementation is not closed on CFL’s
Problem
Show that complementation is not closed on CFL’s.
Solution
Proof by contradiction.
Suppose CFL’s are closed under complementation,
i.e., if L is a CFL, then L̄ is a CFL.
Consider the identity L1 ∩ L2 = complement of (L̄1 ∪ L̄2) (De Morgan’s law).
Since union is closed on CFL’s, closure on complementation
implies closure on intersection.
But, intersection is not closed on CFL’s.
Contradiction!
Hence, complementation is not closed on CFL’s.
Complementation is not closed on CFL’s
Problem
Show that L̄ is a CFL and L is a non-CFL.
L̄ = Σ∗ − {ww | w ∈ Σ∗ } = Σ∗ − L
Solution
L̄ is a CFL.
S → A | B | AB | BA
A → EAE | a
B → EBE | b
E → a | b
Why does this grammar work?
L is a non-CFL.
Use pumping lemma for CFL’s.
Set difference is not closed on CFL’s
Problem
Show that set difference is not closed on CFL’s.
Solution
Proof by contradiction.
Suppose set difference is closed under CFL’s.
i.e., if L1 , L2 are CFL’s, then L1 − L2 is a CFL.
Consider the equation L1 ∩ L2 = L1 − (L1 − L2 ).
Closure on set difference implies closure on intersection.
But, intersection is not closed on CFL’s.
Contradiction!
Hence, set difference is not closed on CFL’s.
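The proof rests on the set identity L1 ∩ L2 = L1 − (L1 − L2). As a sanity check, the sketch below verifies it exhaustively for every pair of subsets of a small universe of strings; the universe and the helper names are arbitrary choices for illustration.

```python
from itertools import combinations

def all_subsets(universe):
    """Every subset of a finite universe, as Python sets."""
    items = sorted(universe)
    return [set(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def identity_holds(universe):
    """Check L1 ∩ L2 == L1 − (L1 − L2) for all pairs of subsets."""
    subsets = all_subsets(universe)
    return all(l1 & l2 == l1 - (l1 - l2) for l1 in subsets for l2 in subsets)

print(identity_holds({"", "a", "ab", "ba"}))  # True
```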
Summary: Closure properties of CFL’s
Operation          Closed on CFL’s?
Union              Yes
Concatenation      Yes
Kleene star        Yes
Reversal           Yes
Intersection       No
Complementation    No
Set difference     No
Construct CFG for L = {a^i b^j c^k | j ≠ i + k}
Problem
Construct a CFG that accepts all strings from the language
L = {a^i b^j c^k | j ≠ i + k}
Solution
Language L = {a, b, c, ac, aa, bb, cc, . . .}
L = {a^i b^j c^k | j ≠ i + k}
= {a^i b^j c^k | j > (i + k)} ∪ {a^i b^j c^k | j < (i + k)}
= L1 ∪ L2
Can we represent L1 and L2 using simpler languages?
Construct CFG for L = {a^i b^j c^k | j ≠ i + k}
Solution (continued)
Case 1. L1 = {a^i b^j c^k | j > i + k}
= {a^i b^j c^k | j = i + m + k and m ≥ 1}
= {a^i b^(i+m+k) c^k | m ≥ 1}
= {a^i b^i} · {b^m | m ≥ 1} · {b^k c^k}
= {a^i b^i} · {b b^n | n ≥ 0} · {b^k c^k}
= L11 · L12 · L13
We know how to construct CFG’s for L11, L12, L13
Case 2. L2 = {a^i b^j c^k | j < i + k}
= {a^i b^j c^k | j < i or i ≤ j < i + k}
= {a^i b^j c^k | j < i} ∪ {a^i b^j c^k | i ≤ j < i + k}
= L21 ∪ L22
How to proceed?
Construct CFG for L = {a^i b^j c^k | j ≠ i + k}
Solution (continued)
Case 3. L21 = {a^i b^j c^k | j < i}
= {a^i b^j c^k | i = m + j and m ≥ 1}
= {a^(m+j) b^j c^k | m ≥ 1}
= {a^m | m ≥ 1} · {a^j b^j} · {c^k}
= L211 · L212 · L213
We know how to construct CFG’s for L211, L212, L213
Case 4. L22 = {a^i b^j c^k | i ≤ j < i + k}
= {a^i b^j c^k | j ≥ i and k > j − i}
= {a^i b^(i+(j−i)) c^((j−i)+m) | (j − i) ≥ 0 and m ≥ 1}
= {a^i b^i} · {b^(j−i) c^(j−i) | (j − i) ≥ 0} · {c^m | m ≥ 1}
= {a^i b^i} · {b^n c^n | n ≥ 0} · {c^m | m ≥ 1}
= L221 · L222 · L223
We know how to construct CFG’s for L221, L222, L223
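The case analysis claims that L is exactly the union of the three concatenations L11·L12·L13, L211·L212·L213, and L221·L222·L223. The sketch below checks this claim on all strings up to a length bound; the function names and the bound are arbitrary choices for illustration.

```python
def from_pieces(max_len):
    """Strings of length <= max_len built from the three concatenations."""
    words = set()
    for p in range(max_len + 1):
        for q in range(max_len + 1):
            for m in range(1, max_len + 1):
                words.add("a" * p + "b" * p + "b" * m + "b" * q + "c" * q)  # L11 L12 L13
                words.add("a" * m + "a" * p + "b" * p + "c" * q)            # L211 L212 L213
                words.add("a" * p + "b" * p + "b" * q + "c" * q + "c" * m)  # L221 L222 L223
    return {w for w in words if len(w) <= max_len}

def direct(max_len):
    """Strings a^i b^j c^k with j != i + k and total length <= max_len."""
    r = range(max_len + 1)
    return {"a" * i + "b" * j + "c" * k
            for i in r for j in r for k in r
            if j != i + k and i + j + k <= max_len}

print(from_pieces(6) == direct(6))
```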
Construct CFG for bba(ab)∗ | (ab | ba∗ b)∗ ba
Problem
Construct a CFG that accepts all strings from the language
corresponding to R.E. bba(ab)∗ | (ab | ba∗ b)∗ ba.
Solution
Language L = {ba, bba, abba, bbba, . . .}
This is a regular language.
CFG G.
S → S1 | S2
S1 → S1 ab | bba ▷ Generates bba(ab)∗
S2 → T S2 | ba ▷ Generates (ab | ba∗ b)∗ ba
T → ab | bU b ▷ Generates ab | ba∗ b
U → aU | ε ▷ Generates a∗
Construct CFG for strings of a DFA
Problem
Construct a CFG that accepts all strings accepted by the following DFA.
[DFA: states S (start), A, B (accepting).
Transitions: δ(S, a) = S, δ(S, b) = A, δ(A, b) = A, δ(A, a) = B, δ(B, b) = A, δ(B, a) = S.]
Solution
Language L = {(a | b)∗ ba} ▷ Strings ending with ba
= {ba, aba, bba, aaba, abba, baba, bbba, . . .}
This is a regular language.
How to construct CFG for this DFA?
Approach 1: Compute R.E. Construct CFG for the R.E.
Approach 2: Construct CFG from the DFA using transitions.
Construct CFG for strings of a DFA
Solution (continued)
Idea.
For every transition δ(Q, a) = R, add a production Q → aR.
For every accepting state Q, add a production Q → ε.
What does this mean? Why should it work?
CFG. ▷ 3 states = 3 nonterminals
S → aS | bA
A → bA | aB
B → bA | aS | ε ▷ ε-production for halting state
Accepting bbaaba.
DFA run: S -b→ A -b→ A -a→ B -a→ S -b→ A -a→ B
Derivation: S ⇒ bA ⇒ bbA ⇒ bbaB ⇒ bbaaS ⇒ bbaabA ⇒ bbaabaB
⇒ bbaaba
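The transition-to-production idea is mechanical, so it can be sketched in a few lines. The transition table below is read off the grammar on this slide (with B as the only accepting state); the dict encoding and the name `dfa_to_grammar` are assumptions for illustration.

```python
def dfa_to_grammar(delta, accepting):
    """For each transition δ(Q, a) = R add Q -> aR;
    for each accepting state Q add Q -> ε (encoded as "")."""
    productions = {}
    for (state, symbol), target in delta.items():
        productions.setdefault(state, []).append(symbol + target)
    for state in accepting:
        productions.setdefault(state, []).append("")
    return productions

delta = {("S", "a"): "S", ("S", "b"): "A",
         ("A", "b"): "A", ("A", "a"): "B",
         ("B", "b"): "A", ("B", "a"): "S"}
grammar = dfa_to_grammar(delta, {"B"})
for head, bodies in grammar.items():
    print(head, "->", " | ".join(b or "ε" for b in bodies))
```

Each derivation step in the resulting grammar mirrors one DFA move, and the ε-production ends the derivation only in an accepting state.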
What is a regular grammar/language?
Definitions
A context-free grammar G = (N, Σ, S, P) is called a regular
grammar if every production is of the form A → aB or A → ε,
where A, B ∈ N and a ∈ Σ.
A language L ⊆ Σ∗ is called a regular language iff L = L(G)
for some regular grammar G.
Construct CFG for understanding human languages
Problem
Construct a CFG to understand some structures in the English
language.
Solution
CFG:
⟨Sentence⟩ → ⟨NounPhrase⟩ ⟨VerbPhrase⟩
⟨NounPhrase⟩ → ⟨ComplexNoun⟩ | ⟨ComplexNoun⟩ ⟨PrepPhrase⟩
⟨VerbPhrase⟩ → ⟨ComplexVerb⟩ | ⟨ComplexVerb⟩ ⟨PrepPhrase⟩
⟨PrepPhrase⟩ → ⟨Prep⟩ ⟨ComplexNoun⟩
⟨ComplexNoun⟩ → ⟨Article⟩ ⟨Noun⟩
⟨ComplexVerb⟩ → ⟨Verb⟩ | ⟨Verb⟩ ⟨NounPhrase⟩
⟨Article⟩ → a | the
⟨Noun⟩ → boy | girl | flower
⟨Verb⟩ → touches | likes | sees
⟨Prep⟩ → with
Construct CFG for understanding human languages
Solution (continued)
Accepting “a girl likes”.
⟨Sentence⟩ ⇒ ⟨NounPhrase⟩ ⟨VerbPhrase⟩
⇒ ⟨ComplexNoun⟩ ⟨VerbPhrase⟩
⇒ ⟨Article⟩ ⟨Noun⟩ ⟨VerbPhrase⟩
⇒ a ⟨Noun⟩ ⟨VerbPhrase⟩
⇒ a girl ⟨VerbPhrase⟩
⇒ a girl ⟨ComplexVerb⟩
⇒ a girl ⟨Verb⟩
⇒ a girl likes
Derive “a girl with a flower likes the boy”.
Construct CFG for strings with valid parentheses
Problem
Construct a CFG that accepts all strings from the language
L = {ε, (), ()(), (()), ()()(), (()()), ()(()), (())(), ((())), . . .}
Solution
Applications. Compilers check for syntactic correctness in:
1. Computer programs written by you that possibly contain
nested code blocks with { }, ( ), and [ ].
2. Web pages written by you that contain nested code blocks
with <div></div>, <table></table>, and <ul></ul>.
Language L = {w | w ∈ {(, )}∗ such that n_((w) = n_)(w) and,
in every prefix p of w, n_((p) ≥ n_)(p)}
What is the CFG?
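The two conditions in the definition of L (equal totals, and no prefix with more right than left parentheses) can be checked with a single counter. This is a minimal sketch; the name `is_balanced` is hypothetical and the input is assumed to contain only the characters "(" and ")".

```python
def is_balanced(w):
    """True iff n_( (w) == n_) (w) and every prefix p has n_( (p) >= n_) (p)."""
    depth = 0                       # n_( minus n_) over the prefix read so far
    for ch in w:
        depth += 1 if ch == "(" else -1
        if depth < 0:               # some prefix has more ')' than '('
            return False
    return depth == 0               # totals must match

print([w for w in ["", "()", "(()())", ")(", "(()"] if is_balanced(w)])
```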
Construct CFG for strings with valid parentheses
Solution (continued)
Multiple correct ways to write the CFG:
1. S → S(S)S | ε
2. S → SS | (S) | ε
3. S → S(S) | ε
4. S → (S)S | ε
5. S → SR) | ε
R → ( | RR)
6. S → (RS | ε
R → ) | (RR
Are some CFG’s better than the others?
If so, better in what?
Construct CFG for valid arithmetic expressions
Problem
Construct a CFG that accepts all valid arithmetic expressions
from Σ = {(, ), +, ×, n}, where n represents any integer.
Solution
Language L = {15 + 85, 57 × 3, (27 + 46) × 10, . . .}
Abstraction: Denote n to mean any integer.
Valid expressions: (n + n) + n × n, etc
Invalid expressions: +n, (n+)n, (), n × n), etc
Hint: Use some ideas from the parenthesis problem
Construct CFG for valid arithmetic expressions
Solution (continued)
Multiple correct ways to write the CFG:
1. E → E + E | E × E | ( E ) | n
2. E → E + T | T ▷ expression
T → T × F | F ▷ term
F → ( E ) | n ▷ factor
3. E → T E′
E′ → + T E′ | ε
T → F T′
T′ → × F T′ | ε
F → ( E ) | n
Can you derive (n × n)?
Are some CFG’s better than the others? If so, better in what?
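One sense in which grammar 3 is "better" is that it can be parsed top-down with one symbol of lookahead. The recursive-descent recognizer below is a sketch of that idea, with one function per nonterminal; it uses the ASCII character `*` in place of ×, and all names are assumptions for illustration.

```python
def accepts(tokens):
    """Recognizer for E -> T E', E' -> + T E' | ε, T -> F T',
    T' -> * F T' | ε, F -> ( E ) | n. `tokens` is a string over "()+*n"."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}")
        pos += 1

    def E():
        T(); E_prime()

    def E_prime():
        if peek() == "+":           # E' -> + T E'
            eat("+"); T(); E_prime()
        # otherwise E' -> ε

    def T():
        F(); T_prime()

    def T_prime():
        if peek() == "*":           # T' -> * F T'
            eat("*"); F(); T_prime()
        # otherwise T' -> ε

    def F():
        if peek() == "(":           # F -> ( E )
            eat("("); E(); eat(")")
        else:                       # F -> n
            eat("n")

    try:
        E()
    except SyntaxError:
        return False
    return pos == len(tokens)       # all input must be consumed

print(accepts("(n+n)+n*n"), accepts("(n+)n"))  # True False
```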
What is a derivation?
Definition
A derivation in a context-free grammar is a leftmost derivation
(LMD) if, at each step, a production is applied to the leftmost
variable-occurrence in the current string. A rightmost derivation
(RMD) is defined similarly.
Example
CFG: E → E + E | E × E | ( E ) | n
Accepting n + (n).
LMD: E ⇒ E + E ⇒ n + E ⇒ n + (E) ⇒ n + (n)
RMD: E ⇒ E + E ⇒ E + (E) ⇒ E + (n) ⇒ n + (n)
What is an ambiguous grammar?
Definition
A context-free grammar G is ambiguous if for at least one w ∈
L(G), w has more than one derivation tree (or, equivalently,
more than one leftmost derivation).
Intuition: A CFG is ambiguous if it generates a string in several
different ways.
Arithmetic expression: Ambiguous grammar
Problem
Show that the following CFG is ambiguous:
E →E+E | E×E | ( E ) | n
Solution
Consider the strings n + n × n or n + n + n.
There are two derivation trees for each of the strings.
Accepting n + n × n.
LMD 1: E ⇒ E + E ⇒ n + E ⇒ n + E × E ⇒ n + n × E
⇒n+n×n
LMD 2: E ⇒ E × E ⇒ E + E × E ⇒ n + E × E ⇒ n + n × E
⇒n+n×n
Accepting n + n + n.
LMD 1: E ⇒ E + E ⇒ n + E ⇒ n + E + E ⇒ n + n + E
⇒n+n+n
LMD 2: E ⇒ E + E ⇒ E + E + E ⇒ n + E + E ⇒ n + n + E
⇒n+n+n
Arithmetic expression: Ambiguous grammar
Solution (continued)
Two derivation (or parse) trees =⇒ Ambiguity
(Reason 1: The precedence of different operators isn’t enforced.)
LMD 1: E ⇒ E + E ⇒ n + E ⇒ n + E × E ⇒ n + n × E ⇒ n + n × n
LMD 2: E ⇒ E × E ⇒ E + E × E ⇒ n + E × E ⇒ n + n × E ⇒ n + n × n
Tree for LMD 1 (groups as n + (n × n)):
      E
    / | \
   E  +  E
   |   / | \
   n  E  ×  E
      |     |
      n     n
Tree for LMD 2 (groups as (n + n) × n):
        E
      / | \
     E  ×  E
   / | \   |
  E  +  E  n
  |     |
  n     n
Arithmetic expression: Ambiguous grammar
Solution (continued)
Two derivation (or parse) trees =⇒ Ambiguity
(Reason 2: Order of operators of same precedence isn’t enforced.)
LMD 1: E ⇒ E + E ⇒ n + E ⇒ n + E + E ⇒ n + n + E ⇒ n + n + n
LMD 2: E ⇒ E + E ⇒ E + E + E ⇒ n + E + E ⇒ n + n + E ⇒ n + n + n
Tree for LMD 1 (groups as n + (n + n)):
      E
    / | \
   E  +  E
   |   / | \
   n  E  +  E
      |     |
      n     n
Tree for LMD 2 (groups as (n + n) + n):
        E
      / | \
     E  +  E
   / | \   |
  E  +  E  n
  |     |
  n     n
Arithmetic expression: Ambiguous grammar
Problem
Consider the following ambiguous grammar:
E →E+E | E×E | ( E ) | n
How many different derivations (or LMDs) are possible for the
string n + n + · · · + n, where n is repeated k times?
Solution
Let d(k) = number of LMDs (equivalently, derivation trees) for k operands. Then
d(1) = 1
d(2) = 1
d(3) = 2
d(4) = 5 How?
How do you compute d(k)?
d(k) = Σ_{i=1}^{k−1} d(i) d(k − i), for k ≥ 2
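The recurrence splits the k operands at the top-level + into a left part with i operands and a right part with k − i operands. A short memoized sketch (the use of `lru_cache` is an implementation choice made here):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def d(k):
    """Number of derivation trees for n + n + ... + n with k operands."""
    if k == 1:
        return 1
    return sum(d(i) * d(k - i) for i in range(1, k))

print([d(k) for k in range(1, 7)])  # [1, 1, 2, 5, 14, 42]
```

For example, d(4) = d(1)d(3) + d(2)d(2) + d(3)d(1) = 2 + 1 + 2 = 5. These values are the Catalan numbers.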
If-else ladder: Ambiguous grammar
Problem
Show that the following CFG is ambiguous:
S → if ( E ) S | if ( E ) S else S | O
where, S = statement, E = expression, O = other statement.
Solution
Consider the string: if (e1 ) if (e2 ) F(); else G();
There are two derivation trees for the string.
Can you identify the two derivation trees for the string?
If-else ladder: Ambiguous grammar
Solution (continued)
What is the output of this program?
C++ program:
    #include <iostream>
    using namespace std;

    int main()
    {
        if (true)
            if (false)
                ;
            else
                cout << "Hi!";

        return 0;
    }
Output:
    Hi!
If-else ladder: Unambiguous grammar
Problem
Can you come up with an unambiguous grammar for the language
accepted by the following ambiguous grammar?
S → if ( E ) S | if ( E ) S else S | O
where, S = statement, E = expression, O = other statement.
Solution
S → S1 | S2
S1 → if ( E ) S1 else S1 | O
S2 → if ( E ) S | if ( E ) S1 else S2
How do you prove that the grammar is really unambiguous?
What is an inherently ambiguous language?
Definition
A context-free language is called inherently ambiguous if there
exists no unambiguous grammar to generate the language.
Examples
L = {a^i b^j c^k | i = j or j = k}
L = {a^i b^i c^j d^j} ∪ {a^i b^j c^j d^i}
Proofs?
Language generated by a grammar
Problem
Prove that the following grammar G generates all strings of
balanced parentheses and only such strings.
S → (S)S | ε
Solution
L(G) = language generated by the grammar G.
L = language of balanced parentheses.
Show that L(G) = L. Two cases.
Case 1. Show that every string derivable from S is balanced.
i.e., L(G) ⊆ L.
Case 2. Show that every balanced string is derivable from S.
i.e., L ⊆ L(G).
Language generated by a grammar
Solution (continued)
Case 1. Show that every string derivable from S is balanced.
Let n = number of steps in derivation.
Basis.
The only string derivable from S in 1 step is ε, which is balanced.
Induction.
Suppose every string whose derivation takes fewer than n steps is
balanced.
Consider a LMD of exactly n steps, where n ≥ 2.
That derivation must be of the form
S ⇒ (S)S ⇒∗ (x)S ⇒∗ (x)y (LMD)
Derivations of x and y take fewer than n steps.
So, x and y are balanced.
Therefore, the string (x)y must be balanced.
Language generated by a grammar
Solution (continued)
Case 2. Show that every balanced string is derivable from S.
Let 2n = length of a balanced string.
Basis.
A 0-length string is ε, which is balanced and is derivable from S (∵ S → ε).
Induction.
Assume that every balanced string of length less than 2n is
derivable from S. Consider a balanced string w of length 2n
such that n ≥ 1. String w must begin with a left parenthesis.
Let (x) be the shortest nonempty prefix of w having an equal
number of left and right parentheses. Then, w can be written
as w = (x)y, where, both x and y are balanced. Since x and
y are of length less than 2n, they are derivable from S. Thus,
we can find a derivation of the form
S ⇒ (S)S ⇒∗ (x)S ⇒∗ (x)y (LMD)
proving that w = (x)y must also be derivable from S.
What is Chomsky normal form (CNF)?
Definition
A context-free grammar is said to be in Chomsky normal form
(CNF) if every production is of one of these three types:
A → BC (where B, C are nonterminals and they cannot be
the start nonterminal S)
A → a (where a is a terminal symbol)
S → ε
Why should we care for CNF?
For every context-free grammar G, there is another CFG G_CNF
in Chomsky normal form such that L(G_CNF) = L(G).
Example
S → AA | ε
A → AA | a
Converting a CFG to CNF
Solution (continued)
Remove A → ε
S0 → S
S → ASA | SA | AS | S | aB | a
A → B | S
B → b
Remove A → B
S0 → S
S → ASA | SA | AS | S | aB | a
A → S | b
B → b
Remove S → S ▷ Do nothing
S0 → S
S → ASA | SA | AS | aB | a
A → S | b
B → b
Converting a CFG to CNF
Solution (continued)
Remove A → S
S0 → S
S → ASA | SA | AS | aB | a
A → ASA | SA | AS | aB | a | b
B→b
Remove S0 → S
S0 → ASA | SA | AS | aB | a
S → ASA | SA | AS | aB | a
A → ASA | SA | AS | aB | a | b
B→b
Convert ASA → AA1
S0 → AA1 | SA | AS | aB | a
S → AA1 | SA | AS | aB | a
A → AA1 | SA | AS | aB | a | b
A1 → SA
B→b
Converting a CFG to CNF
Solution (continued)
Introduce A2 → a
S0 → AA1 | SA | AS | A2 B | a
S → AA1 | SA | AS | A2 B | a
A → AA1 | SA | AS | A2 B | a | b
A1 → SA
A2 → a
B→b
This grammar is now in Chomsky normal form.
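The final claim can be checked mechanically against the three production shapes in the CNF definition. The checker below is a sketch; the dict-of-tuples grammar encoding and the name `is_cnf` are assumptions for illustration.

```python
def is_cnf(productions, start):
    """Check that every production is A -> BC (B, C nonterminals other
    than the start symbol), A -> a (a terminal), or start -> ε."""
    nonterminals = set(productions)
    for head, bodies in productions.items():
        for rhs in bodies:
            if len(rhs) == 2:
                ok = all(s in nonterminals and s != start for s in rhs)
            elif len(rhs) == 1:
                ok = rhs[0] not in nonterminals
            else:
                ok = len(rhs) == 0 and head == start
            if not ok:
                return False
    return True

shared = [("A", "A1"), ("S", "A"), ("A", "S"), ("A2", "B"), ("a",)]
final = {"S0": shared, "S": shared, "A": shared + [("b",)],
         "A1": [("S", "A")], "A2": [("a",)], "B": [("b",)]}
print(is_cnf(final, "S0"))  # True
```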
What is Greibach normal form (GNF)?
Definition
A context-free grammar is said to be in Greibach normal form
(GNF) if every production is of the following type:
A → aA1 A2 . . . Ad (where a is a terminal symbol and
A1, A2, . . . , Ad are nonterminals, d ≥ 0)
S → ε (Not always included)
Why should we care for GNF?
For every context-free grammar G, there is another CFG G_GNF
in Greibach normal form such that L(G_GNF) = L(G).
A string of length n has a derivation of exactly n steps.
Example
S → aA | bB
B → bB | b
A → aA | a
Equivalence of different computation models
[Figure: CFGs, PDAs, and recursive automata are equivalent models; each characterizes the context-free languages.]
Pushdown Automata (PDA)
Pushdown automaton
[Figure: schematic of a pushdown automaton. Source: Wikipedia]
Definition
A pushdown automaton (PDA) M is a 6-tuple
M = (Q, Σ, Γ, δ, q0, F), where,
1. Q: A finite set (set of states).
2. Σ: A finite set (input alphabet).
3. Γ: A finite set (stack alphabet).
4. δ : Q × Σ_ε × Γ_ε → P(Q × Γ_ε) is the transition function,
where Σ_ε = Σ ∪ {ε} and Γ_ε = Γ ∪ {ε}. ▷ Time (computation)
5. q0: The start state (belongs to Q).
6. F : The set of accepting/final states, where F ⊆ Q.
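Definition
A PDA M = (Q, Σ, Γ, δ, q0, F) accepts a string w ∈ Σ∗ iff
some sequence of transitions that starts in q0 with an empty stack
reads all of w and ends in a state of F.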
Construct PDA for L = {a^n b^n | n ≥ 0}
Problem
Construct a PDA that accepts all strings from the language
L = {a^n b^n | n ≥ 0}
Solution
PDA()
1. while next input character is a do
2.     push a
3. while next input character is b do
4.     pop a
Construct PDA for L = {a^n b^n | n ≥ 0}
Solution (continued)
Transition (i, s1 → s2) means that when you see input character i,
replace s1 with s2 as the top of stack.
State diagram (start state q0):
q0 → q1 on ε, ε → $
q1 → q1 on a, ε → a
q1 → q2 on b, a → ε
q2 → q2 on b, a → ε
q2 → q3 on ε, $ → ε
Construct PDA for L = {a^n b^n | n ≥ 0}
Solution (continued)
PDA P is specified as
Set of states is Q = {q0, q1, q2, q3}
Set of input symbols is Σ = {a, b}
Set of stack symbols is Γ = {a, $}
Start state is q0
Set of accept states is F = {q0, q3}
Transition function δ is: (Every entry not listed is ∅)
State  Input  Stack top  δ
q0     ε      ε          {(q1, $)}
q1     a      ε          {(q1, a)}
q1     b      a          {(q2, ε)}
q2     b      a          {(q2, ε)}
q2     ε      $          {(q3, ε)}
Construct PDA for L = {a^n b^n | n ≥ 0}
Solution (continued)
Trace on input aaabbb:
Step  State  Stack  Input   Action
1     q0     ε      aaabbb  push $
2     q1     $      aaabbb  push a
3     q1     $a     aabbb   push a
4     q1     $aa    abbb    push a
5     q1     $aaa   bbb     pop a
6     q2     $aa    bb      pop a
7     q2     $a     b       pop a
8     q2     $      ε       pop $
9     q3     ε      ε       accept
Trace on input aababb:
Step  State  Stack  Input   Action
1     q0     ε      aababb  push $
2     q1     $      aababb  push a
3     q1     $a     ababb   push a
4     q1     $aa    babb    pop a
5     q2     $a     abb     crash
6     q∅     $a     bb
7     q∅     $a     b
8     q∅     $a     ε       reject
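The two traces can be reproduced with a direct simulation of the push and pop phases. This is a sketch that hard-codes this particular PDA rather than a general PDA interpreter; the name `accepts_anbn` is hypothetical.

```python
def accepts_anbn(w):
    """Simulate the PDA for {a^n b^n | n >= 0}: push a's, then pop one a per b."""
    stack = ["$"]                                   # q0 -> q1: push the bottom marker
    i = 0
    while i < len(w) and w[i] == "a":               # q1: a, ε -> a
        stack.append("a")
        i += 1
    while i < len(w) and w[i] == "b" and stack[-1] == "a":   # q2: b, a -> ε
        stack.pop()
        i += 1
    # accept iff all input was read and only $ remains to be popped (q2 -> q3)
    return i == len(w) and stack == ["$"]

print(accepts_anbn("aaabbb"), accepts_anbn("aababb"))  # True False
```

On aababb the simulation stops at the second a with input left over, which corresponds to the crash in the second trace.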
Construct PDA for L = {w w^R | w ∈ {a, b}∗}
Problem
Construct a PDA that accepts all strings from the language
L = {w w^R | w ∈ {a, b}∗}
Solution
PDA()
1. while next input character is a or b do
2. push the symbol
3. Nondeterministically guess the mid point of the string
4. while next input character is a or b do
5. pop the symbol
Construct PDA for L = {w w^R | w ∈ {a, b}∗}
Solution (continued)
State diagram (start state q0; accept state q3):
q0 → q1 on ε, ε → $
q1 → q1 on a, ε → a and on b, ε → b
q1 → q2 on ε, ε → ε
q2 → q2 on a, a → ε and on b, b → ε
q2 → q3 on ε, $ → ε
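A deterministic program cannot guess the midpoint, but it can simulate the nondeterministic guess by trying every possible midpoint in turn. The sketch below does that; the name `accepts_wwr` is hypothetical.

```python
def accepts_wwr(w):
    """Simulate the PDA for {w w^R}: try every midpoint guess."""
    for mid in range(len(w) + 1):
        stack = list(w[:mid])           # q1: push the first `mid` symbols
        matched = True
        for ch in w[mid:]:              # q2: pop while symbols match
            if not stack or stack.pop() != ch:
                matched = False
                break
        if matched and not stack:       # only $ would remain: move to q3
            return True
    return False

print([w for w in ["", "aa", "abba", "aba", "abab"] if accepts_wwr(w)])
```

The input is accepted if at least one guess succeeds, which mirrors how a nondeterministic PDA accepts.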
Construct PDA for L = {a^i b^j c^k | i = j or i = k}
Problem
Construct a PDA that accepts all strings from the language
L = {a^i b^j c^k | i = j or i = k}
Solution
PDA()
1. while next input character is a do push a
2. Nondeterministically guess whether a’s = b’s or a’s = c’s
Case 1. a’s = b’s.
1. while next input character is b do pop a
2. while next input character is c do nothing
Case 2. a’s = c’s.
1. while next input character is b do nothing
2. while next input character is c do pop a
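Construct PDA for L = {a^i b^j c^k | i = j or i = k}
Solution (continued)
State diagram (start state q0; accept states q3 and q6):
q0 → q1 on ε, ε → $
q1 → q1 on a, ε → a
Branch for i = j:
q1 → q2 on ε, ε → ε
q2 → q2 on b, a → ε
q2 → q3 on ε, $ → ε
q3 → q3 on c, ε → ε
Branch for i = k:
q1 → q4 on ε, ε → ε
q4 → q4 on b, ε → ε
q4 → q5 on ε, ε → ε
q5 → q5 on c, a → ε
q5 → q6 on ε, $ → ε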
Non-Context-Free Languages
Pumping lemma for context-free languages
Theorem
Suppose L is a context-free language over alphabet Σ. Then
there is a natural number s (the pumping length) such that every
string w ∈ L satisfying |w| ≥ s can be split into five strings
w = uvxyz such that the following three conditions are true.
|vxy| ≤ s.
|vy| ≥ 1.
For every i ≥ 0, the string u v^i x y^i z also belongs to L.
L = {a^n b^n c^n} is a non-CFL
Problem
Prove that L = {a^n b^n c^n | n ≥ 0} is not a CFL.
Solution
Suppose L is a CFL. Then it must satisfy the pumping property.
Let s be the pumping length and take w = a^s b^s c^s.
Let w = uvxyz where |vxy| ≤ s and |vy| ≥ 1.
Then u v^i x y^i z must belong to L for all i ≥ 0.
We will show that uxz ∉ L (the case i = 0) in all possible cases.
Three cases:
Case 1. vxy consists of exactly 1 kind of symbol (a’s or b’s or c’s).
Case 2. vxy consists of exactly 2 kinds of symbols (ab’s or bc’s).
Case 3. vxy consists of exactly 3 kinds of symbols (abc’s).
This case is impossible. Why?
L = {a^n b^n c^n} is a non-CFL
Solution (continued)
Case 1. vxy consists of exactly 1 kind of symbol (a’s or b’s or c’s).
Three subcases:
Subcase i. vxy consists only of a’s.
Let w = uvxyz = a^s b^s c^s.
uxz is not in L.
Reason: uxz = a^(s−(|v|+|y|)) b^s c^s ∉ L as (|v| + |y|) > 0.
uxz has fewer a’s than b’s or c’s.
Subcase ii. vxy consists only of b’s.
Similar to Subcase i.
Subcase iii. vxy consists only of c’s.
Similar to Subcase i.
L = {a^n b^n c^n} is a non-CFL
Solution (continued)
Case 2. vxy consists of exactly 2 kinds of symbols (ab’s or bc’s).
Two subcases:
Subcase i. vxy consists only of a’s and b’s.
Let w = uvxyz = a^s b^s c^s.
uxz is not in L.
Reason: uxz = a^(k1) b^(k2) c^s ∉ L
where k1 + k2 = 2s − (|v| + |y|) < 2s as (|v| + |y|) > 0.
uxz has either fewer a’s or fewer b’s than c’s.
Subcase ii. vxy consists only of b’s and c’s.
Similar to Subcase i.
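For a small pumping length the case analysis can be confirmed by brute force: enumerate every split w = uvxyz of a^s b^s c^s with |vxy| ≤ s and |vy| ≥ 1 and check that uxz falls outside L. This sketch illustrates the argument for specific values of s only; the function names are hypothetical.

```python
def in_L(w):
    """Membership in {a^n b^n c^n | n >= 0}."""
    n = len(w) // 3
    return w == "a" * n + "b" * n + "c" * n

def every_split_fails(s):
    """True iff uxz is not in L for every valid split of a^s b^s c^s."""
    w = "a" * s + "b" * s + "c" * s
    for start in range(len(w)):
        for end in range(start + 1, min(start + s, len(w)) + 1):
            # vxy = w[start:end], so 1 <= |vxy| <= s
            for i in range(start, end + 1):
                for j in range(i, end + 1):
                    v, x, y = w[start:i], w[i:j], w[j:end]
                    if len(v) + len(y) == 0:
                        continue                    # need |vy| >= 1
                    u, z = w[:start], w[end:]
                    if in_L(u + x + z):             # pumping down with i = 0
                        return False
    return True

print(all(every_split_fails(s) for s in range(1, 5)))
```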
L = {ww | w ∈ {a, b}∗} is a non-CFL
Problem
Prove that L = {ww | w ∈ {a, b}∗} is not a CFL.
Solution
Suppose L is a CFL. Then it must satisfy the pumping property.
Let s be the pumping length and take w = a^s b^s a^s b^s.
Let w = uvxyz where |vxy| ≤ s and |vy| ≥ 1.
Then u v^i x y^i z must belong to L for all i ≥ 0.
We will show that uxz ∉ L (the case i = 0) in all possible cases.
Two cases:
Case 1. vxy consists of exactly 1 kind of symbol (a’s or b’s).
Case 2. vxy consists of exactly 2 kinds of symbols (ab’s or ba’s).
L = {ww | w ∈ {a, b}∗} is a non-CFL
Solution (continued)
Case 1. vxy consists of exactly 1 kind of symbol (a’s or b’s).
Two subcases:
Subcase i. vxy consists only of a’s (say, from the first block of a’s).
Let w = uvxyz = a^s b^s a^s b^s.
uxz is not in L.
Reason: uxz = a^(s−(|v|+|y|)) b^s a^s b^s ∉ L as (|v| + |y|) > 0.
The two blocks of a’s in uxz have different lengths, so uxz is not of the form ww.
Subcase ii. vxy consists only of b’s.
Similar to Subcase i.
L = {ww | w ∈ {a, b}∗} is a non-CFL
Solution (continued)
Case 2. vxy consists of exactly 2 kinds of symbols (ab’s or ba’s).
Two subcases:
Subcase i. vxy consists only of a’s and b’s.
Let w = uvxyz = a^s b^s a^s b^s.
uxz is not in L.
Reason: uxz = a^(k1) b^(k2) a^s b^s ∉ L
where k1 + k2 = 2s − (|v| + |y|) < 2s as (|v| + |y|) > 0.
uxz is not in the form of ww.
Subcase ii. vxy consists only of b’s and a’s.
Similar to Subcase i.
L = {a^n | n is a square} is a non-CFL
Problem
Prove that L = {a^n | n is a square} is not a CFL.
Solution
Suppose L is a CFL. Then it must satisfy the pumping property.
Let s be the pumping length and take w = a^(s^2).
Let w = uvxyz where |vxy| ≤ s and |vy| ≥ 1.
Then u v^i x y^i z must belong to L for all i ≥ 0.
But, u v^2 x y^2 z ∉ L.
Reason: Let |vy| = k. Then, k ∈ [1, s].
u v^2 x y^2 z = a^(s^2 + |vy|) = a^(s^2 + k) ∉ L.
Because, s^2 < s^2 + k < (s + 1)^2 as k ∈ [1, s].
Contradiction! Hence, L is not a CFL.
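The key inequality says that s^2 + k lies strictly between two consecutive squares whenever 1 ≤ k ≤ s. A quick numeric check over a range of s (the bound 200 is an arbitrary choice, and the function names are hypothetical):

```python
from math import isqrt

def is_square(n):
    r = isqrt(n)
    return r * r == n

def pumped_lengths_never_square(max_s):
    """Check s^2 < s^2 + k < (s+1)^2 and that s^2 + k is not a square,
    for all 1 <= s <= max_s and 1 <= k <= s."""
    return all(s * s < s * s + k < (s + 1) ** 2 and not is_square(s * s + k)
               for s in range(1, max_s + 1)
               for k in range(1, s + 1))

print(pumped_lengths_never_square(200))  # True
```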
L = {a^n | n is a power of 2} is a non-CFL
Problem
Prove that L = {a^n | n is a power of 2} is not a CFL.
Solution
Suppose L is a CFL. Then it must satisfy the pumping property.
Suppose w = a^(2^s), where s is the pumping length.
Let w = uvxyz where |vxy| ≤ s and |vy| ≥ 1.
Then u v^i x y^i z must belong to L for all i ≥ 0.
But, u v^2 x y^2 z ∉ L.
Reason: Let |vy| = k, where k ∈ [1, s].
Then, u v^2 x y^2 z = a^(2^s + k) ∉ L.
Because, 2^s < 2^s + k < 2^(s+1) as k ≤ s < 2^s.
Contradiction! Hence, L is not a CFL.
L = {a^n | n is prime} is a non-CFL
Problem
Prove that L = {a^n | n is prime} is not a CFL.
Solution
Suppose L is a CFL. Then it must satisfy the pumping property.
Suppose w = a^m, where m is prime and m ≥ s.
Let w = uvxyz where |vxy| ≤ s and |vy| ≥ 1.
Then u v^i x y^i z must belong to L for all i ≥ 0.
But, u v^(m+1) x y^(m+1) z ∉ L.
Reason: Let |vy| = k. Then, k ∈ [1, s].
u v^(m+1) x y^(m+1) z = a^(m + m|vy|) = a^(m + mk) = a^(m(k+1)) ∉ L,
as m(k + 1) is a product of two factors that are both at least 2.
Contradiction! Hence, L is not a CFL.
Membership problem: A decision problem on CFL’s
Problem
Given a CFG G and a string w, is w ∈ L(G)?
Solution
This is a difficult problem. Why?
Nondeterminism cannot be eliminated unlike in finite automata.
Algorithmically solvable.
CYK algorithm (for grammars in CNF)
Earley parser
GLR parser
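As an illustration of the first algorithm in the list, here is a compact CYK sketch for a grammar in CNF. The encoding is an assumption made for this example: `unit` maps a terminal a to the set of nonterminals A with A → a, `binary` maps a pair (B, C) to the set of nonterminals A with A → BC, and the ε case is passed as a flag.

```python
def cyk(word, unit, binary, start, start_derives_empty=False):
    """CYK membership test for a CNF grammar, O(n^3) in the word length."""
    n = len(word)
    if n == 0:
        return start_derives_empty
    # table[i][l - 1] = nonterminals that derive word[i : i + l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = set(unit.get(ch, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for left in table[i][split - 1]:
                    for right in table[i + split][length - split - 1]:
                        table[i][length - 1] |= binary.get((left, right), set())
    return start in table[0][n - 1]

# CNF example from the slides: S -> AA | ε, A -> AA | a
unit = {"a": {"A"}}
binary = {("A", "A"): {"S", "A"}}
print([w for w in ["", "a", "aa", "aaa", "ab"]
       if cyk(w, unit, binary, "S", start_derives_empty=True)])
```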
More decision problems involving CFL’s
Decision problems
Algorithmically solvable.
Given a CFG G, is L(G) nonempty?
Given a CFG G, is L(G) infinite?
Given a CFG G, is G a regular grammar?
Algorithmically unsolvable.
Given a CFG G, is L(G) a regular language?
Given a CFG G, is L(G) = Σ∗ ?
Given a CFG G, is G ambiguous?
Given a CFG G, is L(G) inherently ambiguous?
Given two CFG’s G1 and G2, is L(G1) = L(G2)?
Given two CFG’s G1 and G2, is L(G1) ⊆ L(G2)?
Given two CFG’s G1 and G2, is L(G1) ∩ L(G2) nonempty?