
Chapter 3

Context-Free Grammars, Context-Free Languages, Parse Trees and Ogden’s Lemma

3.1 Context-Free Grammars


A context-free grammar basically consists of a finite set of grammar rules. In order to define
grammar rules, we assume that we have two kinds of symbols: the terminals, which are the
symbols of the alphabet underlying the languages under consideration, and the nonterminals,
which behave like variables ranging over strings of terminals. A rule is of the form A → α,
where A is a single nonterminal, and the right-hand side α is a string of terminal and/or
nonterminal symbols. As usual, first we need to define what the object is (a context-free
grammar), and then we need to explain how it is used. Unlike automata, grammars are used
to generate strings, rather than recognize strings.

Definition 3.1.1 A context-free grammar (for short, CFG) is a quadruple G = (V, Σ, P, S),
where

• V is a finite set of symbols called the vocabulary (or set of grammar symbols);

• Σ ⊆ V is the set of terminal symbols (for short, terminals);

• S ∈ (V − Σ) is a designated symbol called the start symbol;

• P ⊆ (V − Σ) × V ∗ is a finite set of productions (or rewrite rules, or rules).

The set N = V − Σ is called the set of nonterminal symbols (for short, nonterminals). Thus, P ⊆ N × V^∗, and every production ⟨A, α⟩ is also denoted as A → α. A production of the form A → ε is called an epsilon rule, or null rule.


Remark: Context-free grammars are sometimes defined as G = (V_N, V_T, P, S). The correspondence with our definition is that Σ = V_T and N = V_N, so that V = V_N ∪ V_T. Thus, in this other definition, it is necessary to assume that V_T ∩ V_N = ∅.
Example 1. G1 = ({E, a, b}, {a, b}, P, E), where P is the set of rules
E −→ aEb,
E −→ ab.

As we will see shortly, this grammar generates the language L1 = {a^n b^n | n ≥ 1}, which is not regular.
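As an illustration, here is a small Python sketch of how such a grammar can be manipulated; the representation (a dict mapping each nonterminal to the list of its right-hand sides) and the function derive are our own conventions, not part of the definition.

    # G1 as a dict: each nonterminal maps to its right-hand sides,
    # which are strings over V = N ∪ Σ; the start symbol is E.
    G1 = {"E": ["aEb", "ab"]}

    def derive(grammar, choices, start="E"):
        """Apply a sequence of (nonterminal, rhs-index) rewrite choices,
        always expanding the leftmost occurrence of the nonterminal."""
        s = start
        for nt, i in choices:
            j = s.index(nt)                       # leftmost occurrence
            s = s[:j] + grammar[nt][i] + s[j+1:]
        return s

    # E =⇒ aEb =⇒ aabb, i.e. a^2 b^2:
    print(derive(G1, [("E", 0), ("E", 1)]))       # prints "aabb"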
Example 2. G2 = ({E, +, ∗, (, ), a}, {+, ∗, (, ), a}, P, E), where P is the set of rules
E −→ E + E,
E −→ E ∗ E,
E −→ (E),
E −→ a.

This grammar generates a set of arithmetic expressions.

3.2 Derivations and Context-Free Languages


The productions of a grammar are used to derive strings. In this process, the productions
are used as rewrite rules. Formally, we define the derivation relation associated with a
context-free grammar. First, let us review the concepts of transitive closure and reflexive
and transitive closure of a binary relation.
Given a set A, a binary relation R on A is any set of ordered pairs, i.e. R ⊆ A × A. For
short, instead of binary relation, we often simply say relation. Given any two relations R, S
on A, their composition R ◦ S is defined as
R ◦ S = {(x, y) ∈ A × A | ∃z ∈ A, (x, z) ∈ R and (z, y) ∈ S}.
The identity relation IA on A is the relation IA defined such that
IA = {(x, x) | x ∈ A}.
For short, we often denote IA as I. Note that
R ◦ I = I ◦ R = R
for every relation R on A. Given a relation R on A, for any n ≥ 0 we define R^n as follows:

R^0 = I,
R^{n+1} = R^n ◦ R.

It is obvious that R^1 = R. It is also easily verified by induction that R^n ◦ R = R ◦ R^n.


The transitive closure R^+ of the relation R is defined as

R^+ = ⋃_{n≥1} R^n.

It is easily verified that R^+ is the smallest transitive relation containing R, and that (x, y) ∈ R^+ iff there is some n ≥ 1 and some x_0, x_1, . . . , x_n ∈ A such that x_0 = x, x_n = y, and (x_i, x_{i+1}) ∈ R for all i, 0 ≤ i ≤ n − 1. The transitive and reflexive closure R^∗ of the relation R is defined as

R^∗ = ⋃_{n≥0} R^n.

Clearly, R^∗ = R^+ ∪ I. It is easily verified that R^∗ is the smallest transitive and reflexive relation containing R.
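Since these closures are used repeatedly below, a direct computational sketch may help; this is a straightforward Python rendering of the definitions for a finite relation R on a finite set A (the function names are ours).

    def compose(R, S):
        """R ◦ S = {(x, y) | there is z with (x, z) ∈ R and (z, y) ∈ S}."""
        return {(x, y) for (x, z1) in R for (z2, y) in S if z1 == z2}

    def transitive_closure(R):
        """R^+ as the union of R^1, R^2, ...; stops once R^{n+1} adds
        nothing new, which must happen since R is finite."""
        closure, power = set(R), set(R)
        while True:
            power = compose(power, R)             # R^{n+1}
            if power <= closure:
                return closure
            closure |= power

    def reflexive_transitive_closure(R, A):
        return transitive_closure(R) | {(x, x) for x in A}   # R^* = R^+ ∪ I

    print(transitive_closure({(1, 2), (2, 3)}))   # {(1, 2), (2, 3), (1, 3)}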

Definition 3.2.1 Given a context-free grammar G = (V, Σ, P, S), the (one-step) derivation
relation =⇒G associated with G is the binary relation =⇒G ⊆ V ∗ × V ∗ defined as follows:
for all α, β ∈ V ∗ , we have
α =⇒G β
iff there exist λ, ρ ∈ V ∗ , and some production (A → γ) ∈ P , such that

α = λAρ and β = λγρ.


The transitive closure of =⇒_G is denoted as =⇒_G^+ and the reflexive and transitive closure of =⇒_G is denoted as =⇒_G^∗.

When the grammar G is clear from the context, we usually omit the subscript G in =⇒_G, =⇒_G^+, and =⇒_G^∗.

A string α ∈ V^∗ such that S =⇒^∗ α is called a sentential form, and a string w ∈ Σ^∗ such that S =⇒^∗ w is called a sentence. A derivation α =⇒^∗ β involving n steps is denoted as α =⇒^n β.
Note that a derivation step
α =⇒G β
is rather nondeterministic. Indeed, one can choose among various occurrences of nontermi-
nals A in α, and also among various productions A → γ with left-hand side A.
For example, using the grammar G1 = ({E, a, b}, {a, b}, P, E), where P is the set of rules

E −→ aEb,
E −→ ab,

every derivation from E is of the form

E =⇒^∗ a^n E b^n =⇒ a^n ab b^n = a^{n+1} b^{n+1},

or

E =⇒^∗ a^n E b^n =⇒ a^n aEb b^n = a^{n+1} E b^{n+1},

where n ≥ 0.
Grammar G1 is very simple: every string a^n b^n has a unique derivation. This is usually
not the case. For example, using the grammar G2 = ({E, +, ∗, (, ), a}, {+, ∗, (, ), a}, P, E),
where P is the set of rules

E −→ E + E,
E −→ E ∗ E,
E −→ (E),
E −→ a,

the string a + a ∗ a has the following distinct derivations (at each step, the leftmost occurrence of E is rewritten):

E =⇒ E ∗ E =⇒ E + E ∗ E
=⇒ a + E ∗ E =⇒ a + a ∗ E =⇒ a + a ∗ a,

and

E =⇒ E + E =⇒ a + E
=⇒ a + E ∗ E =⇒ a + a ∗ E =⇒ a + a ∗ a.

In the above derivations, the leftmost occurrence of a nonterminal is chosen at each step.
Such derivations are called leftmost derivations. We could systematically rewrite the right-
most occurrence of a nonterminal, getting rightmost derivations. The string a + a ∗ a also
has the following two rightmost derivations (at each step, the rightmost occurrence of E is rewritten):

E =⇒ E + E =⇒ E + E ∗ E
=⇒ E + E ∗ a =⇒ E + a ∗ a =⇒ a + a ∗ a,

and

E =⇒ E ∗ E =⇒ E ∗ a
=⇒ E + E ∗ a =⇒ E + a ∗ a =⇒ a + a ∗ a.

The language generated by a context-free grammar is defined as follows.



Definition 3.2.2 Given a context-free grammar G = (V, Σ, P, S), the language generated by G is the set

L(G) = {w ∈ Σ^∗ | S =⇒^+ w}.

A language L ⊆ Σ^∗ is a context-free language (for short, CFL) iff L = L(G) for some context-free grammar G.
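Concretely, L(G) can be enumerated up to a length bound by a search over sentential forms; the following Python sketch assumes single-character grammar symbols and no ε-rules (so rewriting never shortens a sentential form, and pruning by length is sound).

    from collections import deque

    def sentences(grammar, start, terminals, max_len):
        """Breadth-first search over sentential forms derivable from start,
        collecting the sentences of length at most max_len."""
        seen, found, queue = {start}, set(), deque([start])
        while queue:
            form = queue.popleft()
            if all(c in terminals for c in form):
                found.add(form)
                continue
            i = next(i for i, c in enumerate(form) if c not in terminals)
            for rhs in grammar[form[i]]:          # expand leftmost nonterminal
                new = form[:i] + rhs + form[i+1:]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
        return found

    G1 = {"E": ["aEb", "ab"]}
    print(sorted(sentences(G1, "E", {"a", "b"}, 6), key=len))
    # prints ['ab', 'aabb', 'aaabbb']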

It is technically very useful to consider derivations in which the leftmost nonterminal is always selected for rewriting, and dually, derivations in which the rightmost nonterminal is always selected for rewriting.

Definition 3.2.3 Given a context-free grammar G = (V, Σ, P, S), the (one-step) leftmost derivation relation =⇒_lm associated with G is the binary relation =⇒_lm ⊆ V^∗ × V^∗ defined as follows: for all α, β ∈ V^∗, we have

α =⇒_lm β

iff there exist u ∈ Σ^∗, ρ ∈ V^∗, and some production (A → γ) ∈ P, such that

α = uAρ and β = uγρ.

The transitive closure of =⇒_lm is denoted as =⇒_lm^+ and the reflexive and transitive closure of =⇒_lm is denoted as =⇒_lm^∗. The (one-step) rightmost derivation relation =⇒_rm associated with G is the binary relation =⇒_rm ⊆ V^∗ × V^∗ defined as follows: for all α, β ∈ V^∗, we have

α =⇒_rm β

iff there exist λ ∈ V^∗, v ∈ Σ^∗, and some production (A → γ) ∈ P, such that

α = λAv and β = λγv.

The transitive closure of =⇒_rm is denoted as =⇒_rm^+ and the reflexive and transitive closure of =⇒_rm is denoted as =⇒_rm^∗.

Remarks: It is customary to use the symbols a, b, c, d, e for terminal symbols, and the
symbols A, B, C, D, E for nonterminal symbols. The symbols u, v, w, x, y, z denote terminal
strings, and the symbols α, β, γ, λ, ρ, µ denote strings in V ∗ . The symbols X, Y, Z usually
denote symbols in V .
Given a context-free grammar G = (V, Σ, P, S), parsing a string w consists in finding
out whether w ∈ L(G), and if so, in producing a derivation for w. The following lemma is
technically very important. It shows that leftmost and rightmost derivations are “universal”.
This has some important practical implications for the complexity of parsing algorithms.

Lemma 3.2.4 Let G = (V, Σ, P, S) be a context-free grammar. For every w ∈ Σ^∗, for every derivation S =⇒^+ w, there is a leftmost derivation S =⇒_lm^+ w, and there is a rightmost derivation S =⇒_rm^+ w.

Proof. Of course, we have to somehow use induction on derivations, but this is a little tricky, and it is necessary to prove a stronger fact. We treat leftmost derivations, rightmost derivations being handled in a similar way.

Claim: For every w ∈ Σ^∗, for every α ∈ V^+, for every n ≥ 1, if α =⇒^n w, then there is a leftmost derivation α =⇒_lm^n w.

The claim is proved by induction on n.

For n = 1, there exist some λ, ρ ∈ V^∗ and some production A → γ, such that α = λAρ and w = λγρ. Since w is a terminal string, λ, ρ, and γ are terminal strings. Thus, A is the only nonterminal in α, and the derivation step α =⇒^1 w is a leftmost step (and a rightmost step!).
If n > 1, then the derivation α =⇒^n w is of the form

α =⇒ α1 =⇒^{n−1} w.

There are two subcases.


Case 1. If the derivation step α =⇒ α1 is a leftmost step α =⇒ α1 , by the induction
lm
n−1
hypothesis, there is a leftmost derivation α1 =⇒ w, and we get the leftmost derivation
lm

n−1
α =⇒ α1 =⇒ w.
lm lm

Case 2. The derivation step α =⇒ α1 is not a leftmost step. In this case, there must be some u ∈ Σ^∗, µ, ρ ∈ V^∗, some nonterminals A and B, and some production B → δ, such that

α = uAµBρ and α1 = uAµδρ,

where A is the leftmost nonterminal in α. Since we have a derivation α1 =⇒^{n−1} w of length n − 1, by the induction hypothesis, there is a leftmost derivation

α1 =⇒_lm^{n−1} w.

Since α1 = uAµδρ, where A is the leftmost nonterminal in α1, the first step in the leftmost derivation α1 =⇒_lm^{n−1} w is of the form

uAµδρ =⇒_lm uγµδρ,

for some production A → γ. Thus, we have a derivation of the form

α = uAµBρ =⇒ uAµδρ =⇒_lm uγµδρ =⇒_lm^{n−2} w.

We can commute the first two steps involving the productions B → δ and A → γ, and we get the derivation

α = uAµBρ =⇒_lm uγµBρ =⇒ uγµδρ =⇒_lm^{n−2} w.

This may no longer be a leftmost derivation, but the first step is leftmost, and we are back in case 1. Thus, we conclude by applying the induction hypothesis to the derivation uγµBρ =⇒^{n−1} w, as in case 1.
Lemma 3.2.4 implies that

L(G) = {w ∈ Σ^∗ | S =⇒_lm^+ w} = {w ∈ Σ^∗ | S =⇒_rm^+ w}.

We observed that if we consider the grammar G2 = ({E, +, ∗, (, ), a}, {+, ∗, (, ), a}, P, E),
where P is the set of rules
E −→ E + E,
E −→ E ∗ E,
E −→ (E),
E −→ a,
the string a + a ∗ a has the following two distinct leftmost derivations (at each step, the leftmost occurrence of E is rewritten):
E =⇒ E ∗ E =⇒ E + E ∗ E
=⇒ a + E ∗ E =⇒ a + a ∗ E =⇒ a + a ∗ a,
and
E =⇒ E + E =⇒ a + E
=⇒ a + E ∗ E =⇒ a + a ∗ E =⇒ a + a ∗ a.
When this happens, we say that we have an ambiguous grammar. In some cases, it is possible to modify a grammar to make it unambiguous. For example, the grammar G2 can be modified as follows.
Let G3 = ({E, T, F, +, ∗, (, ), a}, {+, ∗, (, ), a}, P, E), where P is the set of rules
E −→ E + T,
E −→ T,
T −→ T ∗ F,
T −→ F,
F −→ (E),
F −→ a.

We leave as an exercise to show that L(G3 ) = L(G2 ), and that every string in L(G3 ) has
a unique leftmost derivation. Unfortunately, it is not always possible to modify a context-
free grammar to make it unambiguous. There exist context-free languages that have no
unambiguous context-free grammars. For example, the language
L3 = {a^m b^m c^n | m, n ≥ 1} ∪ {a^m b^n c^n | m, n ≥ 1}
is context-free, since it is generated by the following context-free grammar:
S → S1 ,
S → S2 ,
S1 → XC,
S2 → AY,
X → aXb,
X → ab,
Y → bY c,
Y → bc,
A → aA,
A → a,
C → cC,
C → c.
However, it can be shown that L3 has no unambiguous grammars. All this motivates the
following definition.
Definition 3.2.5 A context-free grammar G = (V, Σ, P, S) is ambiguous if there is some
string w ∈ L(G) that has two distinct leftmost derivations (or two distinct rightmost deriva-
tions). Thus, a grammar G is unambiguous if every string w ∈ L(G) has a unique leftmost
derivation (or a unique rightmost derivation). A context-free language L is inherently am-
biguous if every CFG G for L is ambiguous.

Whether or not a grammar is ambiguous affects the complexity of parsing. Parsing algo-
rithms for unambiguous grammars are more efficient than parsing algorithms for ambiguous
grammars.
We now consider various normal forms for context-free grammars.

3.3 Normal Forms for Context-Free Grammars, Chomsky Normal Form
One of the main goals of this section is to show that every CFG G can be converted to an
equivalent grammar in Chomsky Normal Form (for short, CNF). A context-free grammar

G = (V, Σ, P, S) is in Chomsky Normal Form iff its productions are of the form

A → BC,
A → a, or
S → ε,

where A, B, C ∈ N, a ∈ Σ, S → ε is in P iff ε ∈ L(G), and S does not occur on the right-hand side of any production.
Note that a grammar in Chomsky Normal Form does not have ε-rules, i.e., rules of the form A → ε, except when ε ∈ L(G), in which case S → ε is the only ε-rule. It also does not have chain rules, i.e., rules of the form A → B, where A, B ∈ N. Thus, in order to convert a grammar to Chomsky Normal Form, we need to show how to eliminate ε-rules and chain rules. This is not the end of the story, since we may still have rules of the form A → α where either |α| ≥ 3 or |α| ≥ 2 and α contains terminals. However, dealing with such rules is a simple recoding matter, and we first focus on the elimination of ε-rules and chain rules. It turns out that ε-rules must be eliminated first.
The first step to eliminate ε-rules is to compute the set E(G) of erasable (or nullable) nonterminals

E(G) = {A ∈ N | A =⇒^+ ε}.

The set E(G) is computed using a sequence of approximations E_i defined as follows:

E_0 = {A ∈ N | (A → ε) ∈ P},
E_{i+1} = E_i ∪ {A | ∃(A → B_1 . . . B_j . . . B_k) ∈ P, B_j ∈ E_i, 1 ≤ j ≤ k}.

Clearly, the E_i form an ascending chain

E_0 ⊆ E_1 ⊆ · · · ⊆ E_i ⊆ E_{i+1} ⊆ · · · ⊆ N,

and since N is finite, there is a least i, say i_0, such that E_{i_0} = E_{i_0+1}. We claim that E(G) = E_{i_0}. Actually, we prove the following lemma.
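Before turning to the proof, note that the ascending-chain computation of E(G) is immediate to implement; here is a Python sketch (same grammar representation as before, using the empty string for ε).

    def erasable(grammar):
        """Compute E(G): start from the nonterminals with an ε-rule, then
        close under 'some right-hand side consists only of erasable
        nonterminals' (terminals are never erasable)."""
        E = {A for A, rhss in grammar.items() if "" in rhss}
        changed = True
        while changed:                            # E_{i+1} from E_i
            changed = False
            for A, rhss in grammar.items():
                if A not in E and any(rhs and all(X in E for X in rhs)
                                      for rhs in rhss):
                    E.add(A)
                    changed = True
        return E

    G = {"S": ["ABc", "AB"], "A": ["a", ""], "B": ["b", ""]}
    print(sorted(erasable(G)))                    # prints ['A', 'B', 'S']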

Lemma 3.3.1 Given a context-free grammar G = (V, Σ, P, S), one can construct a context-free grammar G′ = (V′, Σ, P′, S′) such that:

(1) L(G′) = L(G);

(2) P′ contains no ε-rules other than S′ → ε, and S′ → ε ∈ P′ iff ε ∈ L(G);

(3) S′ does not occur on the right-hand side of any production in P′.

Proof. We begin by proving that E(G) = E_{i_0}. For this, we prove that E(G) ⊆ E_{i_0} and E_{i_0} ⊆ E(G).

To prove that E_{i_0} ⊆ E(G), we proceed by induction on i. Since E_0 = {A ∈ N | (A → ε) ∈ P}, we have A =⇒^1 ε for every A ∈ E_0, and thus A ∈ E(G). By the induction hypothesis, E_i ⊆ E(G). If A ∈ E_{i+1}, either A ∈ E_i and then A ∈ E(G), or there is some production (A → B_1 . . . B_j . . . B_k) ∈ P, such that B_j ∈ E_i for all j, 1 ≤ j ≤ k. By the induction hypothesis, B_j =⇒^+ ε for each j, 1 ≤ j ≤ k, and thus

A =⇒ B_1 . . . B_j . . . B_k =⇒^+ B_2 . . . B_j . . . B_k =⇒^+ · · · =⇒^+ B_j . . . B_k =⇒^+ ε,

which shows that A ∈ E(G).

To prove that E(G) ⊆ E_{i_0}, we also proceed by induction, but on the length of a derivation A =⇒^+ ε. If A =⇒^1 ε, then A → ε ∈ P, and thus A ∈ E_0, since E_0 = {A ∈ N | (A → ε) ∈ P}. If A =⇒^{n+1} ε, then

A =⇒ α =⇒^n ε,

for some production A → α ∈ P. If α contains terminals or nonterminals not in E(G), it is impossible to derive ε from α, and thus, we must have α = B_1 . . . B_j . . . B_k, with B_j ∈ E(G) for all j, 1 ≤ j ≤ k. However, B_j =⇒^{n_j} ε where n_j ≤ n, and by the induction hypothesis, B_j ∈ E_{i_0}. But then, we get A ∈ E_{i_0+1} = E_{i_0}, as desired.

Having shown that E(G) = E_{i_0}, we construct the grammar G′. Its set of productions P′ is defined as follows. First, we create the production S′ → S, where S′ ∉ V, to make sure that S′ does not occur on the right-hand side of any rule in P′. Let

P1 = {A → α ∈ P | α ∈ V^+} ∪ {S′ → S},

and let P2 be the set of productions

P2 = {A → α_1 α_2 . . . α_k α_{k+1} | ∃α_1 ∈ V^∗, . . . , ∃α_{k+1} ∈ V^∗, ∃B_1 ∈ E(G), . . . , ∃B_k ∈ E(G),
A → α_1 B_1 α_2 . . . α_k B_k α_{k+1} ∈ P, k ≥ 1, α_1 . . . α_{k+1} ≠ ε}.

Note that ε ∈ L(G) iff S ∈ E(G). If S ∉ E(G), then let P′ = P1 ∪ P2, and if S ∈ E(G), then let P′ = P1 ∪ P2 ∪ {S′ → ε}. We claim that L(G′) = L(G), which is proved by showing that every derivation using G can be simulated by a derivation using G′, and vice-versa. All the conditions of the lemma are now met.

From a practical point of view, the construction of lemma 3.3.1 is very costly. For

example, given a grammar containing the productions

S → ABCDEF,
A → ε,
B → ε,
C → ε,
D → ε,
E → ε,
F → ε,
... → ...,

eliminating ε-rules will create 2^6 − 1 = 63 new rules corresponding to the 63 nonempty subsets of the set {A, B, C, D, E, F}. We now turn to the elimination of chain rules.
It turns out that matters are greatly simplified if we first apply lemma 3.3.1 to the input grammar G, and we explain the construction assuming that G = (V, Σ, P, S) satisfies the conditions of lemma 3.3.1. For every nonterminal A ∈ N, we define the set

I_A = {B ∈ N | A =⇒^+ B}.

The sets I_A are computed using approximations I_{A,i} defined as follows:

I_{A,0} = {B ∈ N | (A → B) ∈ P},
I_{A,i+1} = I_{A,i} ∪ {C ∈ N | ∃(B → C) ∈ P, and B ∈ I_{A,i}}.

Clearly, for every A ∈ N, the I_{A,i} form an ascending chain

I_{A,0} ⊆ I_{A,1} ⊆ · · · ⊆ I_{A,i} ⊆ I_{A,i+1} ⊆ · · · ⊆ N,

and since N is finite, there is a least i, say i_0, such that I_{A,i_0} = I_{A,i_0+1}. We claim that I_A = I_{A,i_0}. Actually, we prove the following lemma.
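The sets I_A can be computed the same way; here is a Python sketch (same representation as before; a right-hand side is a chain rule exactly when it is a single nonterminal). It anticipates the grammar G3 used as an example below.

    def chain_sets(grammar):
        """Compute I_A = {B | A derives B through chain rules alone}
        for every nonterminal A, by closing the one-step relation A → B."""
        N = set(grammar)
        step = {A: {rhs for rhs in rhss if rhs in N}
                for A, rhss in grammar.items()}   # I_{A,0}
        I = {A: set(step[A]) for A in grammar}
        changed = True
        while changed:                            # I_{A,i+1} from I_{A,i}
            changed = False
            for A in grammar:
                for B in list(I[A]):
                    if not step[B] <= I[A]:
                        I[A] |= step[B]
                        changed = True
        return I

    G3 = {"E": ["E+T", "T"], "T": ["T*F", "F"], "F": ["(E)", "a"]}
    print(chain_sets(G3))   # {'E': {'T', 'F'}, 'T': {'F'}, 'F': set()}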

Lemma 3.3.2 Given a context-free grammar G = (V, Σ, P, S), one can construct a context-free grammar G′ = (V′, Σ, P′, S′) such that:

(1) L(G′) = L(G);

(2) Every rule in P′ is of the form A → α where |α| ≥ 2, or A → a where a ∈ Σ, or S′ → ε iff ε ∈ L(G);

(3) S′ does not occur on the right-hand side of any production in P′.



Proof. First, we apply lemma 3.3.1 to the grammar G, obtaining a grammar G1 = (V1, Σ, P1, S1). The proof that I_A = I_{A,i_0} is similar to the proof that E(G) = E_{i_0}. First, we prove that I_{A,i} ⊆ I_A by induction on i. This is straightforward. Next, we prove that I_A ⊆ I_{A,i_0} by induction on derivations of the form A =⇒^+ B. In this part of the proof, we use the fact that G1 has no ε-rules except perhaps S1 → ε, and that S1 does not occur on the right-hand side of any rule. This implies that a derivation A =⇒^{n+1} C is necessarily of the form A =⇒^n B =⇒ C for some B ∈ N. Then, in the induction step, we have B ∈ I_{A,i_0}, and thus C ∈ I_{A,i_0+1} = I_{A,i_0}.
We now define the following sets of rules. Let

P2 = P1 − {A → B | A → B ∈ P1},

and let

P3 = {A → α | B → α ∈ P1, α ∉ N1, B ∈ I_A}.

We claim that G′ = (V1, Σ, P2 ∪ P3, S1) satisfies the conditions of the lemma. For example, S1 does not appear on the right-hand side of any production, since the productions in P3 have right-hand sides from P1, and S1 does not appear on the right-hand side in P1. It is also easily shown that L(G′) = L(G1) = L(G).
Let us apply the method of lemma 3.3.2 to the grammar
G3 = ({E, T, F, +, ∗, (, ), a}, {+, ∗, (, ), a}, P, E),
where P is the set of rules
E −→ E + T,
E −→ T,
T −→ T ∗ F,
T −→ F,
F −→ (E),
F −→ a.
We get I_E = {T, F}, I_T = {F}, and I_F = ∅. The new grammar G′3 has the set of rules
E −→ E + T,
E −→ T ∗ F,
E −→ (E),
E −→ a,
T −→ T ∗ F,
T −→ (E),
T −→ a,
F −→ (E),
F −→ a.

At this stage, the grammar obtained in lemma 3.3.2 no longer has ε-rules (except perhaps S′ → ε iff ε ∈ L(G)) or chain rules. However, it may contain rules A → α with |α| ≥ 3, or with |α| ≥ 2 where α contains terminal(s). To obtain the Chomsky Normal Form, we need to eliminate such rules. This is not difficult, but notationally a bit messy.

Lemma 3.3.3 Given a context-free grammar G = (V, Σ, P, S), one can construct a context-free grammar G′ = (V′, Σ, P′, S′) such that L(G′) = L(G) and G′ is in Chomsky Normal Form, that is, a grammar whose productions are of the form

A → BC,
A → a, or
S′ → ε,

where A, B, C ∈ N′, a ∈ Σ, S′ → ε is in P′ iff ε ∈ L(G), and S′ does not occur on the right-hand side of any production in P′.

Proof. First, we apply lemma 3.3.2, obtaining G1. Let Σ_r be the set of terminals occurring on the right-hand side of rules A → α ∈ P1, with |α| ≥ 2. For every a ∈ Σ_r, let X_a be a new nonterminal not in V1. Let

P2 = {X_a → a | a ∈ Σ_r}.

Let P_{1,r} be the set of productions

A → α_1 a_1 α_2 · · · α_k a_k α_{k+1},

where a_1, . . . , a_k ∈ Σ_r and α_i ∈ N_1^∗. For every production

A → α_1 a_1 α_2 · · · α_k a_k α_{k+1}

in P_{1,r}, let

A → α_1 X_{a_1} α_2 · · · α_k X_{a_k} α_{k+1}

be a new production, and let P3 be the set of all such productions. Let P4 = (P1 − P_{1,r}) ∪ P2 ∪ P3. Now, productions A → α in P4 with |α| ≥ 2 do not contain terminals. However, we may still have productions A → α ∈ P4 with |α| ≥ 3. We can perform some recoding using some new nonterminals. For every production of the form
some new nonterminals. For every production of the form

A → B_1 · · · B_k,

where k ≥ 3, create the new nonterminals

[B_1 · · · B_{k−1}], [B_1 · · · B_{k−2}], · · · , [B_1 B_2 B_3], [B_1 B_2],



and the new productions

A → [B_1 · · · B_{k−1}] B_k,
[B_1 · · · B_{k−1}] → [B_1 · · · B_{k−2}] B_{k−1},
· · · → · · · ,
[B_1 B_2 B_3] → [B_1 B_2] B_3,
[B_1 B_2] → B_1 B_2.

All the productions are now in Chomsky Normal Form, and it is clear that the same language
is generated.
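The two recoding phases lend themselves to a short implementation; the following Python sketch (our own encoding: a rule is a pair of a left-hand side and a tuple of symbols) assumes ε-rules and chain rules have already been eliminated.

    def to_cnf_body(rules, terminals):
        """Phase 1: replace each terminal a in a rhs of length >= 2 by a
        fresh nonterminal X_a with rule X_a -> a.  Phase 2: split every
        rhs B1...Bk (k >= 3) using bracket nonterminals [B1...Bk-1]."""
        out = []
        for A, rhs in rules:
            if len(rhs) >= 2:
                rhs = tuple("X" + s if s in terminals else s for s in rhs)
            out.append((A, rhs))
        out += [("X" + a, (a,)) for a in terminals]   # rules X_a -> a
        final = []
        for A, rhs in out:
            while len(rhs) >= 3:                  # A -> [B1...Bk-1] Bk, etc.
                head, last = rhs[:-1], rhs[-1]
                name = "[" + "".join(head) + "]"
                final.append((A, (name, last)))
                A, rhs = name, head
            final.append((A, rhs))
        return final

    rules = [("E", ("E", "+", "T")), ("T", ("T", "*", "F"))]
    for lhs, rhs in to_cnf_body(rules, {"+", "*"}):
        print(lhs, "->", " ".join(rhs))
    # E -> [EX+] T,  [EX+] -> E X+,  T -> [TX*] F,  [TX*] -> T X*,
    # X+ -> +,  X* -> *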

Applying the first phase of the method of lemma 3.3.3 to the grammar G′3, we get the rules

E −→ E X_+ T,
E −→ T X_∗ F,
E −→ X_( E X_),
E −→ a,
T −→ T X_∗ F,
T −→ X_( E X_),
T −→ a,
F −→ X_( E X_),
F −→ a,
X_+ −→ +,
X_∗ −→ ∗,
X_( −→ (,
X_) −→ ).

After applying the second phase of the method, we get the following grammar in Chomsky Normal Form:
E −→ [E X_+] T,
[E X_+] −→ E X_+,
E −→ [T X_∗] F,
[T X_∗] −→ T X_∗,
E −→ [X_( E] X_),
[X_( E] −→ X_( E,
E −→ a,
T −→ [T X_∗] F,
T −→ [X_( E] X_),
T −→ a,
F −→ [X_( E] X_),
F −→ a,
X_+ −→ +,
X_∗ −→ ∗,
X_( −→ (,
X_) −→ ).

For large grammars, it is often convenient to use the abbreviation which consists in group-
ing productions having a common left-hand side, and listing the right-hand sides separated
by the symbol |. Thus, a group of productions
A → α1 ,
A → α2 ,
··· → ··· ,
A → αk ,
may be abbreviated as
A → α1 | α2 | · · · | αk .

An interesting corollary of the CNF is the following decidability result. There is an algorithm which, given a context-free grammar G, given any string w ∈ Σ^∗, decides whether w ∈ L(G). Indeed, we first convert G to a grammar G′ in Chomsky Normal Form. If w = ε, we can test whether ε ∈ L(G), since this is the case iff S′ → ε ∈ P′. If w ≠ ε, letting n = |w|, note that since the rules are of the form A → BC or A → a, where a ∈ Σ, any derivation for w has n − 1 + n = 2n − 1 steps. Thus, we enumerate all (leftmost) derivations of length 2n − 1.
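A brute-force Python sketch of this decision procedure (assuming single-character symbols, w ≠ ε, and a grammar already in CNF, given as a dict):

    def cnf_member(grammar, start, w):
        """Decide w in L(G) for G in Chomsky Normal Form by bounded search
        over leftmost derivations of exactly 2|w| - 1 steps."""
        n = len(w)
        def search(form, steps_left):
            t = 0
            while t < len(form) and form[t] not in grammar:
                t += 1                            # terminal prefix of form
            if form[:t] != w[:t] or len(form) > n:
                return False                      # wrong prefix, or too long
            if t == len(form):                    # form is all terminals
                return steps_left == 0 and form == w
            if steps_left == 0:
                return False
            A = form[t]                           # leftmost nonterminal
            return any(search(form[:t] + rhs + form[t+1:], steps_left - 1)
                       for rhs in grammar[A])
        return n > 0 and search(start, 2 * n - 1)

    # A CNF grammar for {a^n b^n | n >= 1}: S -> AX | AB, X -> SB, A -> a, B -> b
    G = {"S": ["AX", "AB"], "X": ["SB"], "A": ["a"], "B": ["b"]}
    print(cnf_member(G, "S", "aabb"), cnf_member(G, "S", "abab"))  # True False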
There are much better parsing algorithms than this naive algorithm. We now show that
every regular language is context-free.

3.4 Regular Languages are Context-Free


The regular languages can be characterized in terms of very special kinds of context-free
grammars, right-linear (and left-linear) context-free grammars.

Definition 3.4.1 A context-free grammar G = (V, Σ, P, S) is left-linear iff its productions are of the form

A → Ba,
A → a,
A → ε,

where A, B ∈ N, and a ∈ Σ. A context-free grammar G = (V, Σ, P, S) is right-linear iff its productions are of the form

A → aB,
A → a,
A → ε,

where A, B ∈ N, and a ∈ Σ.

The following lemma shows the equivalence between NFA’s and right-linear grammars.

Lemma 3.4.2 A language L is regular if and only if it is generated by some right-linear grammar.

Proof. Let L = L(D) for some DFA D = (Q, Σ, δ, q_0, F). We construct a right-linear grammar G as follows. Let V = Q ∪ Σ, S = q_0, and let P be defined as follows:

P = {p → aq | q = δ(p, a), p, q ∈ Q, a ∈ Σ} ∪ {p → ε | p ∈ F}.

It is easily shown by induction on the length of w that

p =⇒^∗ wq iff q = δ^∗(p, w),

and thus, L(D) = L(G).
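This half of the construction is entirely mechanical; here is a Python sketch (the encoding of productions as pairs is ours, and missing transitions of a partial DFA are simply omitted):

    def dfa_to_right_linear(delta, q0, final):
        """Turn a DFA (Q, Sigma, delta, q0, F) into a right-linear grammar
        with start symbol q0: one rule p -> a q per transition q = delta(p, a),
        plus p -> epsilon for every accepting state p."""
        P = [(p, (a, q)) for (p, a), q in delta.items()]
        P += [(p, ()) for p in final]             # p -> epsilon
        return P, q0

    # A (partial) DFA accepting a(ba)* over {a, b}:
    delta = {("q0", "a"): "q1", ("q1", "b"): "q0"}
    P, S = dfa_to_right_linear(delta, "q0", {"q1"})
    for lhs, rhs in P:
        print(lhs, "->", " ".join(rhs) if rhs else "ε")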


Conversely, let G = (V, Σ, P, S) be a right-linear grammar. First, let G′ = (V′, Σ, P′, S) be the right-linear grammar obtained from G by adding the new nonterminal E to N, replacing every rule in P of the form A → a where a ∈ Σ by the rule A → aE, and adding the rule E → ε. It is immediately verified that L(G′) = L(G). Next, we construct the NFA M = (Q, Σ, δ, q_0, F) as follows: Q = N′ = N ∪ {E}, q_0 = S, F = {A ∈ N′ | (A → ε) ∈ P′}, and

δ(A, a) = {B ∈ N′ | A → aB ∈ P′},

for all A ∈ N′ and all a ∈ Σ. It is easily shown by induction on the length of w that

A =⇒^∗ wB iff B ∈ δ^∗(A, w),

and thus, L(M) = L(G′) = L(G).


A similar lemma holds for left-linear grammars. It is also easily shown that the regular
languages are exactly the languages generated by context-free grammars whose rules are of
the form

A → Bu,
A → u,

where A, B ∈ N, and u ∈ Σ∗ .

3.5 Useless Productions in Context-Free Grammars


Given a context-free grammar G = (V, Σ, P, S), it may contain rules that are useless for
a number of reasons. For example, consider the grammar G3 = ({E, A, a, b}, {a, b}, P, E),
where P is the set of rules

E −→ aEb,
E −→ ab,
E −→ A,
A −→ bAa.

The problem is that the nonterminal A does not derive any terminal strings, and thus, it
is useless, as well as the last two productions. Let us now consider the grammar G4 =
({E, A, a, b, c, d}, {a, b, c, d}, P, E), where P is the set of rules

E −→ aEb,
E −→ ab,
A −→ cAd,
A −→ cd.

This time, the nonterminal A generates strings of the form c^n d^n, but there is no derivation E =⇒^+ α from E where A occurs in α. The nonterminal A is not connected to E, and the last two rules are useless. Fortunately, it is possible to find such useless rules, and to eliminate them.
Let T (G) be the set of nonterminals that actually derive some terminal string, i.e.

T (G) = {A ∈ (V − Σ) | ∃w ∈ Σ∗ , A =⇒+ w}.



The set T(G) can be defined by stages. We define the sets T_n (n ≥ 1) as follows:

T_1 = {A ∈ (V − Σ) | ∃(A −→ w) ∈ P, with w ∈ Σ^∗},

and

T_{n+1} = T_n ∪ {A ∈ (V − Σ) | ∃(A −→ β) ∈ P, with β ∈ (T_n ∪ Σ)^∗}.

It is easy to prove that there is some least n such that T_{n+1} = T_n, and that for this n, T(G) = T_n.
If S ∉ T(G), then L(G) = ∅, and G is equivalent to the trivial grammar

G′ = ({S}, Σ, ∅, S).

If S ∈ T(G), then let U(G) be the set of nonterminals that are actually useful, i.e.,

U(G) = {A ∈ T(G) | ∃α, β ∈ (T(G) ∪ Σ)^∗, S =⇒^∗ αAβ}.

The set U(G) can also be computed by stages. We define the sets U_n (n ≥ 1) as follows:

U_1 = {A ∈ T(G) | ∃(S −→ αAβ) ∈ P, with α, β ∈ (T(G) ∪ Σ)^∗},

and

U_{n+1} = U_n ∪ {B ∈ T(G) | ∃(A −→ αBβ) ∈ P, with A ∈ U_n, α, β ∈ (T(G) ∪ Σ)^∗}.

It is easy to prove that there is some least n such that U_{n+1} = U_n, and that for this n, U(G) = U_n ∪ {S}. Then, we can use U(G) to transform G into an equivalent CFG in which every nonterminal is useful (i.e., for which V − Σ = U(G)). Indeed, simply delete all rules containing symbols not in U(G). The details are left as an exercise. We say that a context-free grammar G is reduced if all its nonterminals are useful, i.e., N = U(G).
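Both stages are easy to implement; the following Python sketch computes T(G), then U(G), and deletes the useless rules (same dict representation as before, single-character symbols assumed).

    def reduce_grammar(grammar, start):
        """Return an equivalent reduced grammar: first compute T(G), the
        nonterminals deriving some terminal string, then keep only rules
        reachable from start through symbols of T(G) and the terminals."""
        N = set(grammar)
        T = set()
        while True:                               # T_{n+1} from T_n
            new = {A for A, rhss in grammar.items() if A not in T and
                   any(all(X in T or X not in N for X in rhs) for rhs in rhss)}
            if not new:
                break
            T |= new
        if start not in T:
            return {start: []}                    # L(G) is empty: trivial grammar
        def ok(rhs):
            return all(X in T or X not in N for X in rhs)
        U, frontier = {start}, {start}
        while frontier:                           # U_{n+1} from U_n
            frontier = {X for A in frontier for rhs in grammar[A]
                        if ok(rhs) for X in rhs if X in T} - U
            U |= frontier
        return {A: [rhs for rhs in grammar[A] if ok(rhs)] for A in U}

    G4 = {"E": ["aEb", "ab"], "A": ["cAd", "cd"]}
    print(reduce_grammar(G4, "E"))                # {'E': ['aEb', 'ab']}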
It should be noted that although dull, the above considerations are important in practice. Certain algorithms for constructing parsers, for example, LR-parsers, may loop if useless rules are not eliminated!
We now consider another normal form for context-free grammars, the Greibach Normal
Form.

3.6 The Greibach Normal Form


Every CFG G can also be converted to an equivalent grammar in Greibach Normal Form (for short, GNF). A context-free grammar G = (V, Σ, P, S) is in Greibach Normal Form iff its productions are of the form

A → aBC,
A → aB,
A → a, or
S → ε,

where A, B, C ∈ N, a ∈ Σ, S → ε is in P iff ε ∈ L(G), and S does not occur on the right-hand side of any production.

Note that a grammar in Greibach Normal Form does not have ε-rules other than possibly S → ε. More importantly, except for the special rule S → ε, every rule produces some terminal symbol.
An important consequence of the Greibach Normal Form is that no nonterminal is left recursive. A nonterminal A is left recursive iff A =⇒^+ Aα for some α ∈ V^∗. Left recursive nonterminals cause top-down deterministic parsers to loop. The Greibach Normal Form provides a way of avoiding this problem.
There are no easy proofs that every CFG can be converted to a Greibach Normal Form.
A particularly elegant method due to Rosenkrantz using least fixed-points and matrices will
be given in section 3.9.

Lemma 3.6.1 Given a context-free grammar G = (V, Σ, P, S), one can construct a context-free grammar G′ = (V′, Σ, P′, S′) such that L(G′) = L(G) and G′ is in Greibach Normal Form, that is, a grammar whose productions are of the form

A → aBC,
A → aB,
A → a, or
S′ → ε,

where A, B, C ∈ N′, a ∈ Σ, S′ → ε is in P′ iff ε ∈ L(G), and S′ does not occur on the right-hand side of any production in P′.

3.7 Least Fixed-Points


Context-free languages can also be characterized as least fixed-points of certain functions
induced by grammars. This characterization yields a rather quick proof that every context-
free grammar can be converted to Greibach Normal Form. This characterization also reveals
very clearly the recursive nature of the context-free languages.
We begin by reviewing what we need from the theory of partially ordered sets.

Definition 3.7.1 Given a partially ordered set ⟨A, ≤⟩, an ω-chain (a_n)_{n≥0} is a sequence such that a_n ≤ a_{n+1} for all n ≥ 0. The least upper bound of an ω-chain (a_n) is an element a ∈ A such that:

(1) a_n ≤ a, for all n ≥ 0;

(2) For any b ∈ A, if a_n ≤ b, for all n ≥ 0, then a ≤ b.



A partially ordered set ⟨A, ≤⟩ is an ω-chain complete poset iff it has a least element ⊥, and iff every ω-chain has a least upper bound denoted as ⊔ a_n.

Remark: The ω in ω-chain means that we are considering countable chains (ω is the
ordinal associated with the order-type of the set of natural numbers). This notation may
seem arcane, but is standard in denotational semantics.
For example, given any set X, the power set 2^X ordered by inclusion is an ω-chain complete poset with least element ∅. The Cartesian product 2^X × · · · × 2^X (n factors) ordered such that

(A_1, . . . , A_n) ≤ (B_1, . . . , B_n)

iff A_i ⊆ B_i (where A_i, B_i ∈ 2^X) is an ω-chain complete poset with least element (∅, . . . , ∅).
We are interested in functions between partially ordered sets.

Definition 3.7.2 Given any two partially ordered sets ⟨A_1, ≤_1⟩ and ⟨A_2, ≤_2⟩, a function f : A_1 → A_2 is monotonic iff for all x, y ∈ A_1,

x ≤_1 y implies that f(x) ≤_2 f(y).

If ⟨A_1, ≤_1⟩ and ⟨A_2, ≤_2⟩ are ω-chain complete posets, a function f : A_1 → A_2 is ω-continuous iff it is monotonic, and for every ω-chain (a_n),

f(⊔ a_n) = ⊔ f(a_n).

Remark: Note that we are not requiring that an ω-continuous function f : A_1 → A_2 preserve least elements, i.e., it is possible that f(⊥_1) ≠ ⊥_2.

We now define the crucial concept of a least fixed-point.

Definition 3.7.3 Let ⟨A, ≤⟩ be a partially ordered set, and let f : A → A be a function. A fixed-point of f is an element a ∈ A such that f(a) = a. The least fixed-point of f is an element a ∈ A such that f(a) = a, and for every b ∈ A such that f(b) = b, then a ≤ b.

The following lemma gives sufficient conditions for the existence of least fixed-points. It
is one of the key lemmas in denotational semantics.

Lemma 3.7.4 Let ⟨A, ≤⟩ be an ω-chain complete poset with least element ⊥. Every ω-continuous function f : A → A has a unique least fixed-point x_0 given by

x_0 = ⊔ f^n(⊥).

Furthermore, for any b ∈ A such that f(b) ≤ b, then x_0 ≤ b.



Proof. First, we prove that the sequence

⊥, f(⊥), f^2(⊥), . . . , f^n(⊥), . . .

is an ω-chain. This is shown by induction on n. Since ⊥ is the least element of A, we have ⊥ ≤ f(⊥). Assuming by induction that f^n(⊥) ≤ f^{n+1}(⊥), since f is ω-continuous, it is monotonic, and thus we get f^{n+1}(⊥) ≤ f^{n+2}(⊥), as desired.
Since A is an ω-chain complete poset, the ω-chain (f^n(⊥)) has a least upper bound

x_0 = ⊔ f^n(⊥).

Since f is ω-continuous, we have

f(x_0) = f(⊔ f^n(⊥)) = ⊔ f(f^n(⊥)) = ⊔ f^{n+1}(⊥) = x_0,

and x_0 is indeed a fixed-point of f.


Clearly, if f (b) ≤ b implies that x0 ≤ b, then f (b) = b implies that x0 ≤ b. Thus, assume
that f (b) ≤ b for some b ∈ A. We prove by induction of n that f n (⊥) ≤ b. Indeed, ⊥≤ b,
since ⊥ is the least element of A. Assuming by induction that f n (⊥) ≤ b, by monotonicity
of f , we get
f (f n (⊥)) ≤ f (b),
and since f (b) ≤ b, this yields
f n+1 (⊥) ≤ b.
Since f n (⊥) ≤ b for all n ≥ 0, we have
'
x0 = f n (⊥) ≤ b.

The second part of lemma 3.7.4 is very useful to prove that functions have the same least fixed-point. For example, under the conditions of lemma 3.7.4, if g : A → A is another ω-continuous function, letting x_0 be the least fixed-point of f and y_0 be the least fixed-point of g, if f(y_0) ≤ y_0 and g(x_0) ≤ x_0, we can deduce that x_0 = y_0. Indeed, since f(y_0) ≤ y_0 and x_0 is the least fixed-point of f, we get x_0 ≤ y_0, and since g(x_0) ≤ x_0 and y_0 is the least fixed-point of g, we get y_0 ≤ x_0, and therefore x_0 = y_0.
Lemma 3.7.4 also shows that the least fixed-point x_0 of f can be approximated as much as desired, using the sequence (f^n(⊥)). We will now apply this fact to context-free grammars. For this, we need to show how a context-free grammar G = (V, Σ, P, S) with m nonterminals induces an ω-continuous map

Φ_G : 2^{Σ^∗} × · · · × 2^{Σ^∗} → 2^{Σ^∗} × · · · × 2^{Σ^∗} (m copies on each side).

3.8 Context-Free Languages as Least Fixed-Points


Given a context-free grammar G = (V, Σ, P, S) with m nonterminals A_1, . . . , A_m, grouping all the productions having the same left-hand side, the grammar G can be concisely written as

A_1 → α_{1,1} + · · · + α_{1,n_1},
· · · → · · ·
A_i → α_{i,1} + · · · + α_{i,n_i},
· · · → · · ·
A_m → α_{m,1} + · · · + α_{m,n_m}.

Given any set A, let P_fin(A) be the set of finite subsets of A.

Definition 3.8.1 Let G = (V, Σ, P, S) be a context-free grammar with m nonterminals A_1, . . . , A_m. For any m-tuple Λ = (L_1, . . . , L_m) of languages L_i ⊆ Σ^∗, we define the function

Φ[Λ] : P_fin(V^∗) → 2^{Σ^∗}

inductively as follows:

Φ[Λ](∅) = ∅,
Φ[Λ]({ε}) = {ε},
Φ[Λ]({a}) = {a}, if a ∈ Σ,
Φ[Λ]({A_i}) = L_i, if A_i ∈ N,
Φ[Λ]({αX}) = Φ[Λ]({α})Φ[Λ]({X}), if α ∈ V^+, X ∈ V,
Φ[Λ](Q ∪ {α}) = Φ[Λ](Q) ∪ Φ[Λ]({α}), if Q ∈ P_fin(V^∗), Q ≠ ∅, α ∈ V^∗, α ∉ Q.
Then, writing the grammar G as

A_1 → α_{1,1} + · · · + α_{1,n_1},
· · · → · · ·
A_i → α_{i,1} + · · · + α_{i,n_i},
· · · → · · ·
A_m → α_{m,1} + · · · + α_{m,n_m},

we define the map

Φ_G : 2^{Σ^∗} × · · · × 2^{Σ^∗} → 2^{Σ^∗} × · · · × 2^{Σ^∗}

such that

Φ_G(L_1, . . . , L_m) = (Φ[Λ]({α_{1,1}, . . . , α_{1,n_1}}), . . . , Φ[Λ]({α_{m,1}, . . . , α_{m,n_m}}))

for all Λ = (L_1, . . . , L_m) ∈ 2^{Σ^∗} × · · · × 2^{Σ^∗}.

One should verify that the map Φ[Λ] is well defined, but this is easy. The following
lemma is easily shown:

Lemma 3.8.2 Given a context-free grammar G = (V, Σ, P, S) with m nonterminals A_1, . . . , A_m, the map

Φ_G : 2^{Σ^∗} × · · · × 2^{Σ^∗} → 2^{Σ^∗} × · · · × 2^{Σ^∗}

is ω-continuous.

Now, 2^{Σ^∗} × · · · × 2^{Σ^∗} is an ω-chain complete poset, and the map Φ_G is ω-continuous. Thus, by lemma 3.7.4, the map Φ_G has a least fixed-point. It turns out that the components of this least fixed-point are precisely the languages generated by the grammars (V, Σ, P, A_i). Before proving this fact, let us give an example illustrating it.
Example. Consider the grammar G = ({A, B, a, b}, {a, b}, P, A) defined by the rules

A → BB + ab,
B → aBb + ab.

The least fixed-point of Φ_G is the least upper bound of the chain

(Φ_G^n(∅, ∅)) = ((Φ_{G,A}^n(∅, ∅), Φ_{G,B}^n(∅, ∅))),

where

Φ_{G,A}^0(∅, ∅) = Φ_{G,B}^0(∅, ∅) = ∅,

and

Φ_{G,A}^{n+1}(∅, ∅) = Φ_{G,B}^n(∅, ∅) Φ_{G,B}^n(∅, ∅) ∪ {ab},
Φ_{G,B}^{n+1}(∅, ∅) = a Φ_{G,B}^n(∅, ∅) b ∪ {ab}.

It is easy to verify that

Φ_{G,A}^1(∅, ∅) = {ab},
Φ_{G,B}^1(∅, ∅) = {ab},
Φ_{G,A}^2(∅, ∅) = {ab, abab},
Φ_{G,B}^2(∅, ∅) = {ab, aabb},
Φ_{G,A}^3(∅, ∅) = {ab, abab, abaabb, aabbab, aabbaabb},
Φ_{G,B}^3(∅, ∅) = {ab, aabb, aaabbb}.
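These approximations are exactly what one gets by running the recurrences; a Python sketch for this particular grammar:

    def phi(A, B):
        """One application of the map for A -> BB + ab, B -> aBb + ab:
        the new A-component is B·B ∪ {ab}, the new B-component aBb ∪ {ab}."""
        return ({x + y for x in B for y in B} | {"ab"},
                {"a" + x + "b" for x in B} | {"ab"})

    A, B = set(), set()                           # the zeroth approximation
    for n in range(1, 4):
        A, B = phi(A, B)
        print(n, sorted(A, key=len), sorted(B, key=len))
    # n = 3 prints the two components listed above.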

By induction, we can easily prove that the two components of the least fixed-point are the languages

L_A = {a^m b^m a^n b^n | m, n ≥ 1} ∪ {ab} and L_B = {a^n b^n | n ≥ 1}.

Letting G_A = ({A, B, a, b}, {a, b}, P, A) and G_B = ({A, B, a, b}, {a, b}, P, B), it is indeed true that L_A = L(G_A) and L_B = L(G_B).
We have the following theorem due to Ginsburg and Rice:

Theorem 3.8.3 Given a context-free grammar G = (V, Σ, P, S) with m nonterminals A_1, . . . , A_m, the least fixed-point of the map Φ_G is the m-tuple of languages

(L(G_{A_1}), . . . , L(G_{A_m})),

where G_{A_i} = (V, Σ, P, A_i).

Proof. Writing G as

A_1 → α_{1,1} + · · · + α_{1,n_1},
· · · → · · ·
A_i → α_{i,1} + · · · + α_{i,n_i},
· · · → · · ·
A_m → α_{m,1} + · · · + α_{m,n_m},

let M = max{|α_{i,j}|} be the maximum length of right-hand sides of rules in P. Let

Φ_G^n(∅, . . . , ∅) = (Φ_{G,1}^n(∅, . . . , ∅), . . . , Φ_{G,m}^n(∅, . . . , ∅)).

Then, for any w ∈ Σ^∗, observe that

w ∈ Φ_{G,i}^1(∅, . . . , ∅)

iff there is some rule A_i → α_{i,j} with w = α_{i,j}, and that

w ∈ Φ_{G,i}^n(∅, . . . , ∅)

for some n ≥ 2 iff there is some rule A_i → α_{i,j} with α_{i,j} of the form

α_{i,j} = u_1 A_{j_1} u_2 · · · u_k A_{j_k} u_{k+1},

where u_1, . . . , u_{k+1} ∈ Σ^∗, k ≥ 1, and some w_1, . . . , w_k ∈ Σ^∗ such that

w_h ∈ Φ_{G,j_h}^{n−1}(∅, . . . , ∅),

and

w = u_1 w_1 u_2 · · · u_k w_k u_{k+1}.

We prove the following two claims.

Claim 1: For every w ∈ Σ^∗, if A_i =⇒^n w, then w ∈ Φ_{G,i}^p(∅, . . . , ∅), for some p ≥ 1.

Claim 2: For every w ∈ Σ^∗, if w ∈ Φ_{G,i}^n(∅, . . . , ∅), with n ≥ 1, then A_i =⇒^p w for some p ≤ (M + 1)^{n−1}.
Proof of Claim 1. We proceed by induction on n. If A_i =⇒^1 w, then w = α_{i,j} for some rule A_i → α_{i,j}, and by the remark just before the claim, w ∈ Φ_{G,i}^1(∅, . . . , ∅).

If A_i =⇒^{n+1} w with n ≥ 1, then

A_i =⇒ α_{i,j} =⇒^n w

for some rule A_i → α_{i,j}. If

α_{i,j} = u_1 A_{j_1} u_2 · · · u_k A_{j_k} u_{k+1},

where u_1, . . . , u_{k+1} ∈ Σ^∗, k ≥ 1, then A_{j_h} =⇒^{n_h} w_h, where n_h ≤ n, and

w = u_1 w_1 u_2 · · · u_k w_k u_{k+1}

for some w_1, . . . , w_k ∈ Σ^∗. By the induction hypothesis,

w_h ∈ Φ_{G,j_h}^{p_h}(∅, . . . , ∅),

for some p_h ≥ 1, for every h, 1 ≤ h ≤ k. Letting p = max{p_1, . . . , p_k}, since each sequence (Φ_{G,i}^q(∅, . . . , ∅)) is an ω-chain, we have w_h ∈ Φ_{G,j_h}^p(∅, . . . , ∅) for every h, 1 ≤ h ≤ k, and by the remark just before the claim, w ∈ Φ_{G,i}^{p+1}(∅, . . . , ∅).

Proof of Claim 2. We proceed by induction on n. If w ∈ Φ_{G,i}^1(∅, . . . , ∅), then by the remark just before the claim, w = α_{i,j} for some rule A_i → α_{i,j}, and A_i =⇒^1 w.

If w ∈ Φ_{G,i}^n(∅, . . . , ∅) for some n ≥ 2, then there is some rule A_i → α_{i,j} with α_{i,j} of the form

α_{i,j} = u_1 A_{j_1} u_2 · · · u_k A_{j_k} u_{k+1},

where u_1, . . . , u_{k+1} ∈ Σ^∗, k ≥ 1, and some w_1, . . . , w_k ∈ Σ^∗ such that

w_h ∈ Φ_{G,j_h}^{n−1}(∅, . . . , ∅),

and

w = u_1 w_1 u_2 · · · u_k w_k u_{k+1}.

By the induction hypothesis, A_{j_h} =⇒^{p_h} w_h with p_h ≤ (M + 1)^{n−2}, and thus

A_i =⇒ u_1 A_{j_1} u_2 · · · u_k A_{j_k} u_{k+1} =⇒^{p_1} · · · =⇒^{p_k} w,

so that A_i =⇒^p w with

p ≤ p_1 + · · · + p_k + 1 ≤ M(M + 1)^{n−2} + 1 ≤ (M + 1)^{n−1},

since k ≤ M.

Combining Claim 1 and Claim 2, we have

L(G_{A_i}) = ⋃_n Φ_{G,i}^n(∅, . . . , ∅),

which proves that the least fixed-point of the map Φ_G is the m-tuple of languages

(L(G_{A_1}), . . . , L(G_{A_m})).

We now show how theorem 3.8.3 can be used to give a short proof that every context-free
grammar can be converted to Greibach Normal Form.

3.9 Least Fixed-Points and the Greibach Normal Form


The hard part in converting a grammar G = (V, Σ, P, S) to Greibach Normal Form is to convert it to a grammar in so-called weak Greibach Normal Form, where the productions are of the form

A → aα, or
S → ε,

where a ∈ Σ, α ∈ V^∗, and if S → ε is a rule, then S does not occur on the right-hand side of any rule. Indeed, if we first convert G to Chomsky Normal Form, it turns out that we will get rules of the form A → aBC, A → aB or A → a.
Using the algorithm for eliminating ε-rules and chain rules, we can first convert the original grammar to a grammar with no chain rules and no ε-rules except possibly S → ε, in which case, S does not appear on the right-hand side of rules. Thus, for the purpose of converting to weak Greibach Normal Form, we can assume that we are dealing with grammars without chain rules and without ε-rules. Let us also assume that we computed the set T(G) of nonterminals that actually derive some terminal string, and that useless productions involving symbols not in T(G) have been deleted.

Let us explain the idea of the conversion using the following grammar:

A → AaB + BB + b,
B → Bd + BAa + aA + c.

The first step is to group the right-hand sides α into two categories: those whose leftmost symbol is a terminal (α ∈ ΣV^∗) and those whose leftmost symbol is a nonterminal (α ∈ NV^∗). It is also convenient to adopt a matrix notation, and we can write the above grammar as

(A, B) = (A, B) ( aB    ∅
                  B     {d, Aa} ) + (b, {aA, c}).
Thus, we are dealing with matrices (and row vectors) whose entries are finite subsets of V^∗. For notational simplicity, braces around singleton sets are omitted. The finite subsets of V^∗ form a semiring, where addition is union, and multiplication is concatenation. Addition and multiplication of matrices are as usual, except that the semiring operations are used. We will also consider matrices whose entries are languages over Σ. Again, the languages over Σ form a semiring, where addition is union, and multiplication is concatenation. The identity element for addition is ∅, and the identity element for multiplication is {ε}. As above, addition and multiplication of matrices are as usual, except that the semiring operations are used. For example, given any languages A_{i,j} and B_{i,j} over Σ, where i, j ∈ {1, 2}, we have

( A_{1,1}  A_{1,2} ) ( B_{1,1}  B_{1,2} )   =   ( A_{1,1}B_{1,1} ∪ A_{1,2}B_{2,1}    A_{1,1}B_{1,2} ∪ A_{1,2}B_{2,2} )
( A_{2,1}  A_{2,2} ) ( B_{2,1}  B_{2,2} )       ( A_{2,1}B_{1,1} ∪ A_{2,2}B_{2,1}    A_{2,1}B_{1,2} ∪ A_{2,2}B_{2,2} )
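For concreteness, here is a Python sketch of this matrix product over the semiring of languages (sets of strings), with union as addition and concatenation as multiplication:

    def mat_mul(A, B):
        """Product of square matrices whose entries are sets of strings:
        entry (i, j) is the union over k of the concatenations
        A[i][k] · B[k][j]."""
        n = len(A)
        return [[set().union(*({x + y for x in A[i][k] for y in B[k][j]}
                               for k in range(n)))
                 for j in range(n)] for i in range(n)]

    H = [[{"aB"}, set()],
         [{"B"}, {"d", "Aa"}]]
    print(mat_mul(H, H))                          # the matrix H², entry by entry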
Letting X = (A, B), K = (b, {aA, c}), and

H = ( aB    ∅
      B     {d, Aa} ),

the above grammar can be concisely written as

X = XH + K.

More generally, given any context-free grammar G = (V, Σ, P, S) with m nonterminals A_1, . . . , A_m, assuming that there are no chain rules, no ε-rules, and that every nonterminal belongs to T(G), letting

X = (A_1, . . . , A_m),

we can write G as

X = XH + K,

for some appropriate m × m matrix H in which every entry contains a set (possibly empty) of strings in V^+, and some row vector K in which every entry contains a set (possibly empty) of strings α each beginning with a terminal (α ∈ ΣV^∗).
Given an m × m square matrix A = (A_{i,j}) of languages over Σ, we can define the matrix A^∗ whose entry A^∗_{i,j} is given by

A^∗_{i,j} = ⋃_{n≥0} A^n_{i,j},

where A^0 = Id_m, the identity matrix, and A^n is the n-th power of A. Similarly, we define A^+, where

A^+_{i,j} = ⋃_{n≥1} A^n_{i,j}.

Given a matrix A whose entries are finite subsets of V^∗, where N = {A_1, . . . , A_m}, for any m-tuple Λ = (L_1, . . . , L_m) of languages over Σ, we let

Φ[Λ](A) = (Φ[Λ](A_{i,j})).

Given a system X = XH + K, where H is an m × m matrix and X, K are row matrices, if H and K do not contain any nonterminals, we claim that the least fixed-point of the grammar G associated with X = XH + K is KH^∗. This is easily seen by computing the approximations X^n = Φ_G^n(∅, . . . , ∅). Indeed, X^0 = K, and

X^n = KH^n + KH^{n−1} + · · · + KH + K = K(H^n + H^{n−1} + · · · + H + I_m).

Similarly, if Y is an m × m matrix of nonterminals, the least fixed-point of the grammar associated with Y = HY + H is H^+ (provided that H does not contain any nonterminals).

Given any context-free grammar G = (V, Σ, P, S) with m nonterminals A_1, . . . , A_m, writing G as X = XH + K as explained earlier, we can form another grammar G_H by creating m^2 new nonterminals Y_{i,j}, where the rules of this new grammar are defined by the system of two matrix equations

X = KY + K,
Y = HY + H,

where Y = (Y_{i,j}).
The following lemma is the key to the Greibach Normal Form.

Lemma 3.9.1 Given any context-free grammar G = (V, Σ, P, S) with m nonterminals A_1, . . . , A_m, writing G as

X = XH + K

as explained earlier, if G_H is the grammar defined by the system of two matrix equations

X = KY + K,
Y = HY + H,

as explained above, then the components in X of the least fixed-points of the maps Φ_G and Φ_{G_H} are equal.

Proof. Let U be the least fixed-point of Φ_G, and let (V, W) be the least fixed-point of Φ_{G_H}. We shall prove that U = V. For notational simplicity, let us denote Φ[U](H) as H[U] and Φ[U](K) as K[U].

Since U is the least fixed-point of X = XH + K, we have

U = UH[U] + K[U].

Since H[U] and K[U] do not contain any nonterminals, by a previous remark, K[U]H^∗[U] is the least fixed-point of X = XH[U] + K[U], and thus,

K[U]H^∗[U] ≤ U.

On the other hand, by monotonicity,

K[U]H^∗[U] H[K[U]H^∗[U]] + K[K[U]H^∗[U]] ≤ K[U]H^∗[U] H[U] + K[U] = K[U]H^∗[U],

and since U is the least fixed-point of X = XH + K,

U ≤ K[U]H^∗[U].

Therefore, U = K[U]H^∗[U]. We can prove in a similar manner that W = H[V]^+.

Let Z = H[U]^+. We have

K[U]Z + K[U] = K[U]H[U]^+ + K[U] = K[U]H[U]^∗ = U,

and

H[U]Z + H[U] = H[U]H[U]^+ + H[U] = H[U]^+ = Z,

and since (V, W) is the least fixed-point of X = KY + K and Y = HY + H, we get V ≤ U and W ≤ H[U]^+.

We also have

V = K[V]W + K[V] = K[V]H[V]^+ + K[V] = K[V]H[V]^∗,

and

V H[V] + K[V] = K[V]H[V]^∗ H[V] + K[V] = K[V]H[V]^∗ = V,

and since U is the least fixed-point of X = XH + K, we get U ≤ V. Therefore, U = V, as claimed.
Note that the above lemma actually applies to any grammar. Applying lemma 3.9.1 to our example grammar, we get the following new grammar:

(A, B) = (b, {aA, c}) ( Y_1  Y_2
                        Y_3  Y_4 ) + (b, {aA, c}),

( Y_1  Y_2 )   ( aB    ∅       ) ( Y_1  Y_2 )   ( aB    ∅       )
( Y_3  Y_4 ) = ( B     {d, Aa} ) ( Y_3  Y_4 ) + ( B     {d, Aa} ).

There are still some nonterminals appearing as leftmost symbols, but using the equations defining A and B, we can replace A with

{bY_1, aAY_3, cY_3, b}

and B with

{bY_2, aAY_4, cY_4, aA, c},

obtaining a system in weak Greibach Normal Form. This amounts to converting the matrix

H = ( aB    ∅
      B     {d, Aa} )

to the matrix

L = ( aB                           ∅
      {bY_2, aAY_4, cY_4, aA, c}   {d, bY_1a, aAY_3a, cY_3a, ba} ).

The weak Greibach Normal Form corresponds to the new system

X = KY + K,
Y = LY + L.

This method works in general for any input grammar with no ε-rules, no chain rules, and such that every nonterminal belongs to T(G). Under these conditions, the row vector K contains some nonempty entry, all strings in K are in ΣV^∗, and all strings in H are in V^+. After obtaining the grammar G_H defined by the system

X = KY + K,
Y = HY + H,

we use the system X = KY + K to express every nonterminal A_i in terms of expressions containing strings α_{i,j} involving a terminal as the leftmost symbol (α_{i,j} ∈ ΣV^∗), and we replace all leftmost occurrences of nonterminals in H (occurrences A_i in strings of the form A_i β, where β ∈ V^∗) using the above expressions. In this fashion, we obtain a matrix L, and it is immediately shown that the system

X = KY + K,
Y = LY + L,

generates the same tuple of languages. Furthermore, this last system corresponds to a weak Greibach Normal Form.
If we start with a grammar in Chomsky Normal Form (with no production S → ε) such that every nonterminal belongs to T(G), we actually get a Greibach Normal Form (the entries in K are terminals, and the entries in H are nonterminals). Thus, we have justified lemma 3.6.1. The method is also quite economical, since it introduces only m^2 new nonterminals. However, the resulting grammar may contain some useless nonterminals.

3.10 Tree Domains and Gorn Trees


Derivation trees play a very important role in parsing theory and in the proof of a strong
version of the pumping lemma for the context-free languages known as Ogden’s lemma.
Thus, it is important to define derivation trees rigorously. We do so using Gorn trees.
Let N_+ = {1, 2, 3, . . .}.

Definition 3.10.1 A tree domain D is a nonempty subset of strings in N_+^∗ satisfying the conditions:

(1) For all u, v ∈ N_+^∗, if uv ∈ D, then u ∈ D.

(2) For all u ∈ N_+^∗, for every i ∈ N_+, if ui ∈ D then uj ∈ D for every j, 1 ≤ j ≤ i.

The tree domain

D = {ε, 1, 2, 11, 21, 22, 221, 222, 2211}

is represented as follows:

              ε
            /   \
           1     2
           |    / \
          11  21   22
                  /  \
               221    222
                |
              2211

A tree labeled with symbols from a set ∆ is defined as follows.

Definition 3.10.2 Given a set ∆ of labels, a ∆-tree (for short, a tree) is a total function
t : D → ∆, where D is a tree domain.
The domain of a tree t is denoted as dom(t). Every string u ∈ dom(t) is called a tree
address or a node.

Let ∆ = {f, g, h, a, b}. The tree t : D → ∆, where D is the tree domain of the previous example and t is the function whose graph is

{(ε, f), (1, h), (2, g), (11, a), (21, a), (22, f), (221, h), (222, b), (2211, a)}

is represented as follows:

              f
            /   \
           h     g
           |    / \
           a   a   f
                  / \
                 h   b
                 |
                 a
The outdegree (sometimes called ramification) r(u) of a node u is the cardinality of the
set
{i | ui ∈ dom(t)}.
Note that the outdegree of a node can be infinite. Most of the trees that we shall consider
will be finite-branching, that is, for every node u, r(u) will be an integer, and hence finite.
If the outdegree of all nodes in a tree is bounded by n, then we can view the domain of the
tree as being defined over {1, 2, . . . , n}∗ .
A node of outdegree 0 is called a leaf. The node whose address is ε is called the root of the tree. A tree is finite if its domain dom(t) is finite. Given a node u in dom(t), every node of the form ui in dom(t) with i ∈ N_+ is called a son (or immediate successor) of u.

Tree addresses are totally ordered lexicographically: u ≤ v if either u is a prefix of v or there exist strings x, y, z ∈ N_+^∗ and i, j ∈ N_+, with i < j, such that u = xiy and v = xjz. In the first case, we say that u is an ancestor (or predecessor) of v (or u dominates v) and in the second case, that u is to the left of v.

If y = ε and z = ε, we say that xi is a left brother (or left sibling) of xj, (i < j). Two tree addresses u and v are independent if u is not a prefix of v and v is not a prefix of u.
Given a finite tree t, the yield of t is the string

t(u_1) t(u_2) · · · t(u_k),

where u_1, u_2, . . . , u_k is the sequence of leaves of t in lexicographic order.

For example, the yield of the tree below is aaab:

              f
            /   \
           h     g
           |    / \
           a   a   f
                  / \
                 h   b
                 |
                 a

Given a finite tree t, the depth of t is the integer


d(t) = max{|u| | u ∈ dom(t)}.

Given a tree t and a node u in dom(t), the subtree rooted at u is the tree t/u, whose
domain is the set
{v | uv ∈ dom(t)}
and such that t/u(v) = t(uv) for all v in dom(t/u).
Another important operation is the operation of tree replacement (or tree substitution).
Definition 3.10.3 Given two trees t1 and t2 and a tree address u in t1 , the result of sub-
stituting t2 at u in t1 , denoted by t1 [u ← t2 ], is the function whose graph is the set of
pairs
{(v, t1 (v)) | v ∈ dom(t1 ), u is not a prefix of v} ∪ {(uv, t2(v)) | v ∈ dom(t2 )}.

Let t1 and t2 be the trees defined by the following diagrams:

Tree t1:

              f
            /   \
           h     g
           |    / \
           a   a   f
                  / \
                 h   b
                 |
                 a

Tree t2:

           g
          / \
         a   b

The tree t1[22 ← t2] is defined by the following diagram:

              f
            /   \
           h     g
           |    / \
           a   a   g
                  / \
                 a   b
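Gorn trees are pleasant to program with; here is a Python sketch representing a tree as a dict from addresses (strings of digits, so outdegree at most 9) to labels, with the yield and the substitution t1[u ← t2] of definition 3.10.3:

    t1 = {"": "f", "1": "h", "2": "g", "11": "a", "21": "a",
          "22": "f", "221": "h", "222": "b", "2211": "a"}
    t2 = {"": "g", "1": "a", "2": "b"}

    def tree_yield(t):
        """Concatenate the labels of the leaves in lexicographic order
        (for single-digit successors, string order coincides with it)."""
        leaves = sorted(u for u in t if u + "1" not in t)   # outdegree 0
        return "".join(t[u] for u in leaves)

    def substitute(t1, u, t2):
        """t1[u <- t2]: drop u and its descendants, graft t2 at u."""
        kept = {v: l for v, l in t1.items() if not v.startswith(u)}
        return {**kept, **{u + v: l for v, l in t2.items()}}

    print(tree_yield(t1))                         # prints "aaab"
    print(tree_yield(substitute(t1, "22", t2)))   # also "aaab"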

We can now define derivation trees and relate derivations to derivation trees.

3.11 Derivation Trees


Definition 3.11.1 Given a context-free grammar G = (V, Σ, P, S), for any A ∈ N, an A-derivation tree for G is a (V ∪ {ε})-tree t (a tree with set of labels (V ∪ {ε})) such that:

(1) t(ε) = A;

(2) For every nonleaf node u ∈ dom(t), if u1, . . . , uk are the successors of u, then either there is a production B → X_1 · · · X_k in P such that t(u) = B and t(ui) = X_i for all i, 1 ≤ i ≤ k, or B → ε ∈ P, t(u) = B and t(u1) = ε. A complete derivation (or parse tree) is an S-tree whose yield belongs to Σ^∗.

A derivation tree for the grammar

G3 = ({E, T, F, +, ∗, (, ), a}, {+, ∗, (, ), a}, P, E),

where P is the set of rules

E −→ E + T,
E −→ T,
T −→ T ∗ F,
T −→ F,
F −→ (E),
F −→ a,

is shown in Figure 3.1. The yield of the derivation tree is a + a ∗ a.


              E
           /  |  \
          E   +   T
          |     / | \
          T    T  ∗  F
          |    |     |
          F    F     a
          |    |
          a    a

Figure 3.1: A complete derivation tree

Derivation trees are associated with derivations inductively as follows.



Definition 3.11.2 Given a context-free grammar G = (V, Σ, P, S), for any A ∈ N, if π : A =⇒^n α is a derivation in G, we construct an A-derivation tree t_π with yield α as follows.

(1) If n = 0, then t_π is the one-node tree such that dom(t_π) = {ε} and t_π(ε) = A.

(2) If A =⇒^{n−1} λBρ =⇒ λγρ = α, then if t1 is the A-derivation tree with yield λBρ associated with the derivation A =⇒^{n−1} λBρ, and if t2 is the tree associated with the production B → γ (that is, if γ = X_1 · · · X_k, then dom(t2) = {ε, 1, . . . , k}, t2(ε) = B, and t2(i) = X_i for all i, 1 ≤ i ≤ k, or if γ = ε, then dom(t2) = {ε, 1}, t2(ε) = B, and t2(1) = ε), then

t_π = t1[u ← t2],

where u is the address of the leaf labeled B in t1.

The tree t_π is the A-derivation tree associated with the derivation A =⇒^n α.

Given the grammar

G2 = ({E, +, ∗, (, ), a}, {+, ∗, (, ), a}, P, E),

where P is the set of rules

E −→ E + E,
E −→ E ∗ E,
E −→ (E),
E −→ a,

the parse trees associated with two derivations of the string a + a ∗ a are shown in Figure
3.2:
        E                     E
      / | \                 / | \
     E  +  E               E  ∗  E
     |   / | \           / | \   |
     a  E  ∗  E         E  +  E  a
        |     |         |     |
        a     a         a     a

Figure 3.2: Two derivation trees for a + a ∗ a

The following lemma is easily shown.



Lemma 3.11.3 Let G = (V, Σ, P, S) be a context-free grammar. For any derivation A =⇒^n α, there is a unique A-derivation tree associated with this derivation, with yield α. Conversely, for any A-derivation tree t with yield α, there is a unique leftmost derivation A =⇒_lm^∗ α in G having t as its associated derivation tree.

We will now prove a strong version of the pumping lemma for context-free languages due
to Bill Ogden (1968).

3.12 Ogden’s Lemma


Ogden’s lemma states some combinatorial properties of parse trees that are deep enough.
The yield w of such a parse tree can be split into 5 substrings u, v, x, y, z such that

w = uvxyz,

where u, v, x, y, z satisfy certain conditions. It turns out that we get a more powerful version
of the lemma if we allow ourselves to mark certain occurrences of symbols in w before
invoking the lemma. We can imagine that marked occurrences in a nonempty string w are
occurrences of symbols in w in boldface, or red, or any given color (but one color only). For example, given w = aaababbbaa, we can mark the symbols of even index, shown here in brackets:

a[a]a[b]a[b]b[b]a[a].

More rigorously, we can define a marking of a nonnull string w : {1, . . . , n} → Σ as any
function m : {1, . . . , n} → {0, 1}. Then, a letter wi in w is a marked occurrence iff m(i) = 1,
and an unmarked occurrence if m(i) = 0. The number of marked occurrences in w is equal
to

     n
     ∑ m(i).
    i=1
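
Transcribed directly, with the 1-indexed positions of the definition, the even-index marking of the earlier example has 5 marked occurrences (a small illustrative snippet, not part of the text):

    w = "aaababbbaa"
    m = {i: 1 if i % 2 == 0 else 0 for i in range(1, len(w) + 1)}  # mark even indices
    assert sum(m[i] for i in range(1, len(w) + 1)) == 5            # marked occurrences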

Ogden’s lemma only yields useful information for grammars G generating an infinite
language. We could make this hypothesis, but it seems more elegant to use the precondition
that the lemma only applies to strings w ∈ L(G) such that w contains at least K marked
occurrences, for a constant K large enough. If K is large enough, L(G) will indeed be
infinite.

Lemma 3.12.1 For every context-free grammar G, there is some integer K > 1 such that,
for every string w ∈ Σ+ , for every marking of w, if w ∈ L(G) and w contains at least K
marked occurrences, then there exists some decomposition of w as w = uvxyz, and some
A ∈ N, such that the following properties hold:

(1) There are derivations S =⇒^+ uAz, A =⇒^+ vAy, and A =⇒^+ x, so that

    uv^n xy^n z ∈ L(G)

for all n ≥ 0 (the pumping property);

(2) x contains some marked occurrence;

(3) Either (both u and v contain some marked occurrence), or (both y and z contain some
marked occurrence);

(4) vxy contains fewer than K marked occurrences.

Proof . Let t be any parse tree for w. We call a leaf of t a marked leaf if its label is a
marked occurrence in the marked string w. The general idea is to make sure that K is large
enough so that parse trees with yield w contain enough repeated nonterminals along some
path from the root to some marked leaf. Let r = |N|, and let

p = max{2, max{|α| | (A → α) ∈ P }}.

We claim that K = p^{2r+3} does the job.


The key concept in the proof is the notion of a B-node. Given a parse tree t, a B-node
is a node with at least two immediate successors u1 , u2 , such that for i = 1, 2, either ui is
a marked leaf, or ui has some marked leaf as a descendant. We construct a path from the
root to some marked leaf, so that for every B-node, we pick the leftmost successor with the
maximum number of marked leaves as descendants. Formally, define a path (s0 , . . . , sn ) from
the root to some marked leaf, so that:

(i) Every node si has some marked leaf as a descendant, and s0 is the root of t;

(ii) If sj is in the path, sj is not a leaf, and sj has a single immediate descendant which is
either a marked leaf or has marked leaves as its descendants, let sj+1 be that unique
immediate descendant of sj.

(iii) If sj is a B-node in the path, then let sj+1 be the leftmost immediate successor of sj
with the maximum number of marked leaves as descendants (assuming that if sj+1 is
a marked leaf, then it is its own descendant).

(iv) If sj is a leaf, then it is a marked leaf and n = j.

We will show that the path (s0 , . . . , sn ) contains at least 2r + 3 B-nodes.


Claim: For every i, 0 ≤ i ≤ n, if the path (si , . . . , sn ) contains b B-nodes, then si has at
most p^b marked leaves as descendants.

Proof. We proceed by “backward induction”, i.e., by induction on n − i. For i = n, there
are no B-nodes, so that b = 0, and there is indeed p^0 = 1 marked leaf, namely sn itself.
Assume that the claim holds for the path (si+1 , . . . , sn ).

If si is not a B-node, then the number b of B-nodes in the path (si+1 , . . . , sn ) is the same
as the number of B-nodes in the path (si , . . . , sn ), and si+1 is the only immediate successor
of si having a marked leaf as descendant. By the induction hypothesis, si+1 has at most p^b
marked leaves as descendants, and this is also an upper bound on the number of marked
leaves which are descendants of si .

If si is a B-node, then if there are b B-nodes in the path (si+1 , . . . , sn ), there are b + 1
B-nodes in the path (si , . . . , sn ). By the induction hypothesis, si+1 has at most p^b marked
leaves as descendants. Since si is a B-node, si+1 was chosen to be the leftmost immediate
successor of si having the maximum number of marked leaves as descendants. Thus, since
the outdegree of si is at most p, and each of its immediate successors has at most p^b marked
leaves as descendants, the node si has at most p · p^b = p^{b+1} marked leaves as descendants, as
desired.
Applying the claim to s0 , since w has at least K = p^{2r+3} marked occurrences, we have
p^b ≥ p^{2r+3} , and since p ≥ 2, we have b ≥ 2r + 3, and the path (s0 , . . . , sn ) contains at least
2r + 3 B-nodes (note that this would not follow if we had p = 1).


Let us now select the lowest 2r + 3 B-nodes in the path, (s0 , . . . , sn ), and denote them
(b1 , . . . , b2r+3 ). Every B-node bi has at least two immediate successors ui < vi such that ui
or vi is on the path (s0 , . . . , sn ). If the path goes through ui , we say that bi is a right B-node
and if the path goes through vi , we say that bi is a left B-node. Since 2r + 3 = r + 2 + r + 1,
either there are r + 2 left B-nodes or there are r + 2 right B-nodes in the path (b1 , . . . , b2r+3 ).
Let us assume that there are r + 2 left B-nodes, the other case being similar.
Let (d1 , . . . , dr+2 ) be the lowest r + 2 left B-nodes in the path. Since there are r + 1
B-nodes in the sequence (d2 , . . . , dr+2), and there are only r distinct nonterminals, there are
two nodes di and dj , with 2 ≤ i < j ≤ r + 2, such that t(di ) = t(dj ) = A, for some A ∈ N.
We can assume that di is an ancestor of dj , and thus, dj = di α, for some α ≠ ε.
If we prune out the subtree t/di rooted at di from t, we get an S-derivation tree having
a yield of the form uAz, and we have a derivation of the form S =⇒^+ uAz, since there are
at least r + 2 left B-nodes on the path, and we are looking at the lowest r + 1 left B-nodes.
Considering the subtree t/di , pruning out the subtree t/dj , which is rooted at address α in
t/di , we get an A-derivation tree having a yield of the form vAy, and we have a derivation
of the form A =⇒^+ vAy. Finally, the subtree t/dj is an A-derivation tree with yield x, and
we have a derivation A =⇒^+ x. This proves (1) of the lemma.
Since sn is a marked leaf and a descendant of dj , x contains some marked occurrence,
proving (2).
Since d1 is a left B-node, some left sibling of the immediate successor of d1 on the path
has some marked leaf in u as a descendant. Similarly, since di is a left B-node, some
left sibling of the immediate successor of di on the path has some marked leaf in v as
a descendant. This proves (3).
The path (dj , . . . , b2r+3 ) has at most 2r + 1 B-nodes, and by the claim shown earlier, dj has at most
p^{2r+1} marked leaves as descendants. Since p^{2r+1} < p^{2r+3} = K, this proves (4).
Observe that condition (2) implies that x ≠ ε, and condition (3) implies that either
u ≠ ε and v ≠ ε, or y ≠ ε and z ≠ ε. Thus, the pumping condition (1) implies that the set
{uv^n xy^n z | n ≥ 0} is an infinite subset of L(G), and L(G) is indeed infinite, as we mentioned
earlier. Note that K ≥ 3, and in fact, since p ≥ 2 and r ≥ 1, we have K = p^{2r+3} ≥ 2^5 = 32.
The “standard pumping lemma” due to Bar-Hillel, Perles, and Shamir, is obtained by letting
all occurrences be marked in w ∈ L(G).

Lemma 3.12.2 For every context-free grammar G (without ε-rules), there is some integer
K > 1 such that, for every string w ∈ Σ+ , if w ∈ L(G) and |w| ≥ K, then there exists some
decomposition of w as w = uvxyz, and some A ∈ N, such that the following properties hold:

(1) There are derivations S =⇒^+ uAz, A =⇒^+ vAy, and A =⇒^+ x, so that

    uv^n xy^n z ∈ L(G)

for all n ≥ 0 (the pumping property);

(2) x ≠ ε;

(3) Either v ≠ ε or y ≠ ε;

(4) |vxy| ≤ K.

A stronger version could be stated, and we are just following tradition in stating this
standard version of the pumping lemma.
Ogden’s lemma or the pumping lemma can be used to show that certain languages are
not context-free. The method is to proceed by contradiction, i.e., to assume (contrary to
what we wish to prove) that a language L is indeed context-free, and derive a contradiction
of Ogden’s lemma (or of the pumping lemma). Thus, as in the case of the regular languages,
it would be helpful to see what the negation of Ogden’s lemma is, and for this, we first state
Ogden’s lemma as a logical formula.
For any nonnull string w : {1, . . . , n} → Σ, for any marking m : {1, . . . , n} → {0, 1} of w,
for any substring y of w, where w = xyz, with |x| = h and k = |y|, the number of marked
occurrences in y, denoted as |m(y)|, is defined as

              h+k
    |m(y)| =   ∑   m(i).
              i=h+1
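
In the same 1-indexed encoding as before, |m(y)| is just a partial sum of m; a small sketch, with names assumed:

    def marked_in(m, h, k):
        """|m(y)| for w = xyz with |x| = h and |y| = k."""
        return sum(m[i] for i in range(h + 1, h + k + 1))

    assert marked_in(m, 0, len(w)) == 5   # |m(w)| for the earlier marking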

We will also use the following abbreviations:

    nat = {0, 1, 2, . . .},
    nat32 = {32, 33, . . .},
    A ≡ w = uvxyz,
    B ≡ |m(x)| ≥ 1,
    C ≡ (|m(u)| ≥ 1 ∧ |m(v)| ≥ 1) ∨ (|m(y)| ≥ 1 ∧ |m(z)| ≥ 1),
    D ≡ |m(vxy)| < K,
    P ≡ ∀n : nat (uv^n xy^n z ∈ L(G)).

Ogden’s lemma can then be stated as

    ∀G : CFG ∃K : nat32 ∀w : Σ∗ ∀m : marking
    ( (w ∈ L(G) ∧ |m(w)| ≥ K) ⊃ (∃u, v, x, y, z : Σ∗ (A ∧ B ∧ C ∧ D ∧ P )) ).

Recalling that

¬(A ∧ B ∧ C ∧ D ∧ P ) ≡ ¬(A ∧ B ∧ C ∧ D) ∨ ¬P ≡ (A ∧ B ∧ C ∧ D) ⊃ ¬P

and
¬(P ⊃ Q) ≡ P ∧ ¬Q,
the negation of Ogden’s lemma can be stated as

    ∃G : CFG ∀K : nat32 ∃w : Σ∗ ∃m : marking
    ( (w ∈ L(G) ∧ |m(w)| ≥ K) ∧ (∀u, v, x, y, z : Σ∗ ((A ∧ B ∧ C ∧ D) ⊃ ¬P )) ).

Since

    ¬P ≡ ∃n : nat (uv^n xy^n z ∉ L(G)),

in order to show that Ogden’s lemma is contradicted, one needs to show that for some
context-free grammar G, for every K ≥ 2, there is some string w ∈ L(G) and some marking
m of w with at least K marked occurrences in w, such that for every possible decomposition
w = uvxyz satisfying the constraints A ∧ B ∧ C ∧ D, there is some n ≥ 0 such that
uv^n xy^n z ∉ L(G). When proceeding by contradiction, we have a language L that we are
(wrongly) assuming to be context-free, and we can use any context-free grammar G generating L.
The creative part of the argument is to pick the right w ∈ L and the right marking of w
(not making any assumption on K).
As an illustration, we show that the language

L = {a^n b^n c^n | n ≥ 1}

is not context-free. Since L is infinite, we will be able to use the pumping lemma.

The proof proceeds by contradiction. If L were context-free, there would be some context-
free grammar G such that L = L(G), and some constant K > 1 as in Ogden’s lemma. Let
w = a^K b^K c^K , and choose the b’s as marked occurrences. Then by Ogden’s lemma, x contains
some marked occurrence, and either both u, v or both y, z contain some marked occurrence.
Assume that both u and v contain some b. We have the following situation:

# · · · $%
a ab · · · &b b# ·$%
· · &b b# · · · $%
bc · · · &c .
u v xyz

If we consider the string uvvxyyz, the number of a’s is still K, but the number of b’s is strictly
greater than K since v contains at least one b, and thus uvvxyyz ∉ L, a contradiction.

If both y and z contain some b, we will also reach a contradiction because in the string
uvvxyyz, the number of c’s is still K, but the number of b’s is strictly greater than K.
Having reached a contradiction in all cases, we conclude that L is not context-free.
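
For toy values of K, this negation strategy can even be checked exhaustively. The sketch below is an illustration only (the true Ogden constant is at least 32, and no program replaces the proof): it enumerates every decomposition w = uvxyz of w = a^K b^K c^K with the b's marked, keeps those satisfying conditions (2)-(4), and confirms that pumping with n = 2 always yields a string outside L.

    def splits(w):
        """All decompositions w = uvxyz."""
        n = len(w)
        for i in range(n + 1):
            for j in range(i, n + 1):
                for k in range(j, n + 1):
                    for m in range(k, n + 1):
                        yield w[:i], w[i:j], w[j:k], w[k:m], w[m:]

    def in_L(s):
        n = len(s) // 3
        return len(s) % 3 == 0 and n >= 1 and s == "a" * n + "b" * n + "c" * n

    K = 4                                # toy value, standing in for Ogden's constant
    w = "a" * K + "b" * K + "c" * K
    marked = lambda s: s.count("b")      # only the b's are marked

    for u, v, x, y, z in splits(w):
        if (marked(x) >= 1                                       # condition (2)
                and ((marked(u) >= 1 and marked(v) >= 1)
                     or (marked(y) >= 1 and marked(z) >= 1))     # condition (3)
                and marked(v + x + y) < K):                      # condition (4)
            assert not in_L(u + v * 2 + x + y * 2 + z)           # n = 2 refutes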
Let us now show that the language

L = {a^m b^n c^m d^n | m, n ≥ 1}

is not context-free.
Again, we proceed by contradiction. This time, let

w = a^K b^K c^K d^K ,

where the b’s and c’s are marked occurrences.


By Ogden’s lemma, either both u, v contain some marked occurrence, or both y, z contain
some marked occurrence, and x contains some marked occurrence. Let us first consider the
case where both u, v contain some marked occurrence.
If v contains some b, then since uvvxyyz ∈ L, v must contain only b’s, since otherwise vv
would contain letters out of order and uvvxyyz could not be in L, and we have the following situation:

    a···ab···b   b···b   b···bc···cd···d
    \____u___/   \_v_/   \_____xyz_____/

Since uvvxyyz ∈ L, the only way to preserve an equal number of b’s and d’s is to have
y ∈ d+ . But then, vxy contains c^K , which contradicts (4) of Ogden’s lemma.
If v contains some c, since x also contains some marked occurrence, it must be some c,
and v contains only c’s and we have the following situation:

    a···ab···bc···c   c···c   c···cd···d
    \______u______/   \_v_/   \___xyz__/

Since uvvxyyz ∈ L and the number of a’s is still K whereas the number of c’s is strictly
more than K, this case is impossible.
Let us now consider the case where both y, z contain some marked occurrence. Reasoning
as before, the only possibility is that v ∈ a+ and y ∈ c+ :

    a···a   a···a   a···ab···bc···c   c···c   c···cd···d
    \_u_/   \_v_/   \______x______/   \_y_/   \___z____/

But then, vxy contains b^K , which contradicts (4) of Ogden’s lemma. Since a contradiction
was obtained in all cases, L is not context-free.
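
The same exhaustive check goes through for this second language, reusing splits from the previous sketch, with the b's and c's marked as above (again with a toy K, so it only illustrates the case analysis):

    import re

    def in_L2(s):
        mt = re.fullmatch(r"(a+)(b+)(c+)(d+)", s)
        return (mt is not None
                and len(mt.group(1)) == len(mt.group(3))
                and len(mt.group(2)) == len(mt.group(4)))

    K = 3
    w = "a" * K + "b" * K + "c" * K + "d" * K
    marked = lambda s: s.count("b") + s.count("c")   # b's and c's are marked

    for u, v, x, y, z in splits(w):
        if (marked(x) >= 1
                and ((marked(u) >= 1 and marked(v) >= 1)
                     or (marked(y) >= 1 and marked(z) >= 1))
                and marked(v + x + y) < K):
            assert not in_L2(u + v * 2 + x + y * 2 + z)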
Ogden’s lemma can also be used to show that the context-free language

{a^m b^n c^n | m, n ≥ 1} ∪ {a^m b^m c^n | m, n ≥ 1}

is inherently ambiguous. The proof is quite involved.


Another corollary of the pumping lemma is that it is decidable whether a context-free
grammar generates an infinite language.

Lemma 3.12.3 Given any context-free grammar, G, if K is the constant of Ogden’s lemma,
then the following equivalence holds:
L(G) is infinite iff there is some w ∈ L(G) such that K ≤ |w| < 2K.

Proof. Let K = p^{2r+3} be the constant from the proof of Lemma 3.12.1. If there is some
w ∈ L(G) such that |w| ≥ K, we already observed that the pumping lemma implies that
L(G) contains an infinite subset of the form {uv^n xy^n z | n ≥ 0}. Conversely, assume that
L(G) is infinite. If |w| < K for all w ∈ L(G), then L(G) is finite. Thus, there is some
w ∈ L(G) such that |w| ≥ K. Let w ∈ L(G) be a minimal string such that |w| ≥ K. By the
pumping lemma, we can write w as w = uvxyz, where x ≠ ε, vy ≠ ε, and |vxy| ≤ K. By
the pumping property, uxz ∈ L(G). If |w| ≥ 2K, then

    |uxz| = |uvxyz| − |vy| > |uvxyz| − |vxy| ≥ 2K − K = K,

and |uxz| < |uvxyz|, contradicting the minimality of w. Thus, we must have |w| < 2K.
In particular, if G is in Chomsky Normal Form, it can be shown that we just have to
consider derivations of length at most 4K − 3.
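
Lemma 3.12.3 thus yields a (very naive) decision procedure, sketched below under the assumption of a membership oracle accepts(G, w), for instance the CYK algorithm when G is in Chomsky Normal Form; neither the oracle nor the constant K is computed here, and the enumeration is exponential in K.

    from itertools import product

    def has_infinite_language(G, sigma, K, accepts):
        """L(G) is infinite iff some w with K <= |w| < 2K is in L(G)."""
        return any(accepts(G, "".join(w))
                   for n in range(K, 2 * K)
                   for w in product(sigma, repeat=n))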
