Context-Free Grammars,
Context-Free Languages, Parse Trees
and Ogden’s Lemma
Definition 3.1.1 A context-free grammar (for short, CFG) is a quadruple G = (V, Σ, P, S),
where
• V is a finite set of symbols called the vocabulary (or set of grammar symbols);
• Σ ⊆ V is the set of terminal symbols (for short, terminals);
• S ∈ V − Σ is a designated symbol called the start symbol;
• P ⊆ (V − Σ) × V ∗ is a finite set of productions (or rewrite rules).
The set N = V − Σ is called the set of nonterminal symbols (for short, nonterminals). Thus,
P ⊆ N × V ∗, and every production ⟨A, α⟩ is also denoted as A → α. A production of the
form A → ε is called an epsilon rule, or null rule.
Example 1. G1 = ({E, a, b}, {a, b}, P, E), where P is the set of rules
E −→ aEb,
E −→ ab.
As we will see shortly, this grammar generates the language L1 = {a^n b^n | n ≥ 1}, which
is not regular.
Example 2. G2 = ({E, +, ∗, (, ), a}, {+, ∗, (, ), a}, P, E), where P is the set of rules
E −→ E + E,
E −→ E ∗ E,
E −→ (E),
E −→ a.
It is easily verified that R+ is the smallest transitive relation containing R, and that
(x, y) ∈ R+ iff there is some n ≥ 1 and some x0, x1, . . . , xn ∈ A such that x0 = x, xn = y,
and (xi, xi+1) ∈ R for all i, 0 ≤ i ≤ n − 1. The transitive and reflexive closure R∗ of the
relation R is defined as
R∗ = ⋃_{n≥0} R^n.
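As a concrete illustration (ours, not part of the original text), the following Python sketch computes R+ and R∗ for a finite relation given as a set of pairs; the function names are our own.

    def transitive_closure(pairs):
        """R+ : iterate composition with R until nothing new is added."""
        closure = set(pairs)
        while True:
            # if (x, y) and (y, z) are in the closure, add (x, z)
            new_pairs = {(x, z)
                         for (x, y) in closure
                         for (y2, z) in closure if y2 == y}
            if new_pairs <= closure:
                return closure
            closure |= new_pairs

    def reflexive_transitive_closure(pairs, domain):
        """R* = R+ together with (x, x) for every x in the domain A."""
        return transitive_closure(pairs) | {(x, x) for x in domain}

    # Example: R = {(1, 2), (2, 3)} on A = {1, 2, 3}
    # transitive_closure(R) == {(1, 2), (2, 3), (1, 3)}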
Definition 3.2.1 Given a context-free grammar G = (V, Σ, P, S), the (one-step) derivation
relation =⇒G associated with G is the binary relation =⇒G ⊆ V ∗ × V ∗ defined as follows:
for all α, β ∈ V ∗ , we have
α =⇒G β
iff there exist λ, ρ ∈ V ∗, and some production (A → γ) ∈ P, such that
α = λAρ and β = λγρ.
When the grammar G is clear from the context, we usually omit the subscript G in =⇒G,
=⇒+G, and =⇒∗G.
A string α ∈ V ∗ such that S =⇒∗ α is called a sentential form, and a string w ∈ Σ∗ such
that S =⇒∗ w is called a sentence. A derivation α =⇒∗ β involving n steps is denoted as
α =⇒n β.
Note that a derivation step
α =⇒G β
is rather nondeterministic. Indeed, one can choose among various occurrences of nontermi-
nals A in α, and also among various productions A → γ with left-hand side A.
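To make this nondeterminism concrete, here is a small Python sketch (our own illustration, not from the text) that enumerates all strings β with α =⇒G β: each choice of a nonterminal occurrence in α and of a production for it contributes one successor.

    def one_step_derivations(alpha, productions):
        """All beta with alpha ==> beta in one derivation step.

        alpha: tuple of grammar symbols, e.g. ('a', 'E', 'b')
        productions: dict mapping a nonterminal A to the list of its
                     right-hand sides, each a tuple of symbols.
        """
        successors = []
        for i, symbol in enumerate(alpha):
            if symbol in productions:            # an occurrence of a nonterminal
                for rhs in productions[symbol]:  # a production A -> gamma
                    successors.append(alpha[:i] + rhs + alpha[i + 1:])
        return successors

    # Grammar G1: E -> aEb | ab
    P1 = {'E': [('a', 'E', 'b'), ('a', 'b')]}
    print(one_step_derivations(('a', 'E', 'b'), P1))
    # [('a', 'a', 'E', 'b', 'b'), ('a', 'a', 'b', 'b')]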
For example, using the grammar G1 = ({E, a, b}, {a, b}, P, E), where P is the set of rules
E −→ aEb,
E −→ ab,
every derivation from E is either of the form
E =⇒∗ a^n Eb^n =⇒ a^n abb^n = a^{n+1} b^{n+1},
or of the form
E =⇒∗ a^n Eb^n =⇒ a^n aEbb^n = a^{n+1} Eb^{n+1},
where n ≥ 0.
Grammar G1 is very simple: every string a^n b^n has a unique derivation. This is usually
not the case. For example, using the grammar G2 = ({E, +, ∗, (, ), a}, {+, ∗, (, ), a}, P, E),
where P is the set of rules
E −→ E + E,
E −→ E ∗ E,
E −→ (E),
E −→ a,
the string a + a ∗ a has the following two distinct derivations:
E =⇒ E ∗ E =⇒ E + E ∗ E
=⇒ a + E ∗ E =⇒ a + a ∗ E =⇒ a + a ∗ a,
and
E =⇒ E + E =⇒ a + E
=⇒ a + E ∗ E =⇒ a + a ∗ E =⇒ a + a ∗ a.
In the above derivations, the leftmost occurrence of a nonterminal is chosen at each step.
Such derivations are called leftmost derivations. We could systematically rewrite the right-
most occurrence of a nonterminal, getting rightmost derivations. The string a + a ∗ a also
has the following two rightmost derivations:
E =⇒ E + E =⇒ E + E ∗ E
=⇒ E + E ∗ a =⇒ E + a ∗ a =⇒ a + a ∗ a,
and
E =⇒ E ∗ E =⇒ E ∗ a
=⇒ E + E ∗ a =⇒ E + a ∗ a =⇒ a + a ∗ a.
Definition 3.2.2 Given a context-free grammar G = (V, Σ, P, S), the language generated
by G is the set
L(G) = {w ∈ Σ∗ | S =⇒+ w}.
A language L ⊆ Σ∗ is a context-free language (for short, CFL) iff L = L(G) for some
context-free grammar G.
Definition 3.2.3 Given a context-free grammar G = (V, Σ, P, S), the (one-step) leftmost
derivation relation =⇒lm associated with G is the binary relation =⇒lm ⊆ V ∗ × V ∗ defined as
follows: for all α, β ∈ V ∗, we have
α =⇒lm β
iff there exist u ∈ Σ∗, ρ ∈ V ∗, and some production (A → γ) ∈ P, such that
α = uAρ and β = uγρ.
The (one-step) rightmost derivation relation =⇒rm associated with G is defined similarly: for
all α, β ∈ V ∗, we have
α =⇒rm β
iff there exist λ ∈ V ∗, v ∈ Σ∗, and some production (A → γ) ∈ P, such that
α = λAv and β = λγv.
Remarks: It is customary to use the symbols a, b, c, d, e for terminal symbols, and the
symbols A, B, C, D, E for nonterminal symbols. The symbols u, v, w, x, y, z denote terminal
strings, and the symbols α, β, γ, λ, ρ, µ denote strings in V ∗ . The symbols X, Y, Z usually
denote symbols in V .
Given a context-free grammar G = (V, Σ, P, S), parsing a string w consists in finding
out whether w ∈ L(G), and if so, in producing a derivation for w. The following lemma is
technically very important. It shows that leftmost and rightmost derivations are “universal”.
This has some important practical implications for the complexity of parsing algorithms.

Lemma 3.2.4 Let G = (V, Σ, P, S) be a context-free grammar. For every w ∈ Σ∗, if
S =⇒+ w, then there is a leftmost derivation S =⇒+ w, and a rightmost derivation
S =⇒+ w.
Proof . Of course, we have to somehow use induction on derivations, but this is a little
tricky, and it is necessary to prove a stronger fact. We treat leftmost derivations, rightmost
derivations being handled in a similar way.
Claim: For every w ∈ Σ∗, for every α ∈ V +, for every n ≥ 1, if α =⇒n w, then there is a
leftmost derivation α =⇒n w, that is, a derivation of the same length n in which every step
is a leftmost step.

We proceed by induction on n. If n = 1, then α = λAρ and w = λγρ for some production
A → γ ∈ P, and since w is a terminal string, so are λ, ρ, and γ. Thus, A is the only
nonterminal in α, and the derivation step α =⇒ w is necessarily a leftmost step. If n > 1,
the derivation is of the form α =⇒ α1 =⇒n−1 w, and there are two cases.

Case 1. The derivation step α =⇒ α1 is a leftmost step. By the induction hypothesis,
there is a leftmost derivation α1 =⇒n−1 w, and we get the leftmost derivation
α =⇒lm α1 =⇒n−1 w.
Case 2. The derivation step α =⇒ α1 is not a leftmost step. In this case, there must
be some u ∈ Σ∗ , µ, ρ ∈ V ∗ , some nonterminals A and B, and some production B → δ, such
that
α = uAµBρ and α1 = uAµδρ,
where A is the leftmost nonterminal in α. Since we have a derivation α1 =⇒n−1 w of length
n − 1, by the induction hypothesis, there is a leftmost derivation
α1 =⇒n−1 w.
Since α1 = uAµδρ where A is the leftmost nonterminal in α1, the first step in this leftmost
derivation is of the form
uAµδρ =⇒lm uγµδρ,
for some production A → γ ∈ P.
We can commute the first two steps involving the productions B → δ and A → γ, and we
get the derivation
α = uAµBρ =⇒lm uγµBρ =⇒ uγµδρ =⇒n−2 w.
This may no longer be a leftmost derivation, but the first step is leftmost, and we are
back in Case 1. Thus, we conclude by applying the induction hypothesis to the derivation
uγµBρ =⇒n−1 w, as in Case 1.
Lemma 3.2.4 implies that
L(G) = {w ∈ Σ∗ | S =⇒+lm w} = {w ∈ Σ∗ | S =⇒+rm w}.
We observed that if we consider the grammar G2 = ({E, +, ∗, (, ), a}, {+, ∗, (, ), a}, P, E),
where P is the set of rules
E −→ E + E,
E −→ E ∗ E,
E −→ (E),
E −→ a,
the string a + a ∗ a has the following two distinct leftmost derivations:
E =⇒ E ∗ E =⇒ E + E ∗ E
=⇒ a + E ∗ E =⇒ a + a ∗ E =⇒ a + a ∗ a,
and
E =⇒ E + E =⇒ a + E
=⇒ a + E ∗ E =⇒ a + a ∗ E =⇒ a + a ∗ a.
When this happens, we say that the grammar is ambiguous. In some cases, it is
possible to modify a grammar to make it unambiguous. For example, the grammar G2 can
be modified as follows.
Let G3 = ({E, T, F, +, ∗, (, ), a}, {+, ∗, (, ), a}, P, E), where P is the set of rules
E −→ E + T,
E −→ T,
T −→ T ∗ F,
T −→ F,
F −→ (E),
F −→ a.
We leave as an exercise to show that L(G3 ) = L(G2 ), and that every string in L(G3 ) has
a unique leftmost derivation. Unfortunately, it is not always possible to modify a context-
free grammar to make it unambiguous. There exist context-free languages that have no
unambiguous context-free grammars. For example, the language
L3 = {a^m b^m c^n | m, n ≥ 1} ∪ {a^m b^n c^n | m, n ≥ 1}
is context-free, since it is generated by the following context-free grammar:
S → S1 ,
S → S2 ,
S1 → XC,
S2 → AY,
X → aXb,
X → ab,
Y → bY c,
Y → bc,
A → aA,
A → a,
C → cC,
C → c.
However, it can be shown that L3 has no unambiguous grammars. All this motivates the
following definition.
Definition 3.2.5 A context-free grammar G = (V, Σ, P, S) is ambiguous if there is some
string w ∈ L(G) that has two distinct leftmost derivations (or two distinct rightmost deriva-
tions). Thus, a grammar G is unambiguous if every string w ∈ L(G) has a unique leftmost
derivation (or a unique rightmost derivation). A context-free language L is inherently am-
biguous if every CFG G for L is ambiguous.
Whether or not a grammar is ambiguous affects the complexity of parsing. Parsing algo-
rithms for unambiguous grammars are more efficient than parsing algorithms for ambiguous
grammars.
We now consider various normal forms for context-free grammars.
A context-free grammar G = (V, Σ, P, S) is in Chomsky Normal Form iff its productions are of the form
A → BC,
A → a, or
S → ε,
where A, B, C ∈ N, a ∈ Σ, S → ε is in P iff ε ∈ L(G), and S does not occur on the
right-hand side of any production.
Note that a grammar in Chomsky Normal Form does not have ε-rules, i.e., rules of the
form A → ε, except when ε ∈ L(G), in which case S → ε is the only ε-rule. It also does not
have chain rules, i.e., rules of the form A → B, where A, B ∈ N. Thus, in order to convert
a grammar to Chomsky Normal Form, we need to show how to eliminate ε-rules and chain
rules. This is not the end of the story, since we may still have rules of the form A → α where
either |α| ≥ 3 or |α| ≥ 2 and α contains terminals. However, dealing with such rules is a
simple recoding matter, and we first focus on the elimination of ε-rules and chain rules. It
turns out that ε-rules must be eliminated first.
The first step to eliminate ε-rules is to compute the set E(G) of erasable (or nullable)
nonterminals
E(G) = {A ∈ N | A =⇒+ ε}.
The set E(G) is computed using a sequence of approximations Ei defined as follows:
E0 = {A ∈ N | (A → ε) ∈ P},
Ei+1 = Ei ∪ {A | ∃(A → B1 · · · Bk) ∈ P, with Bj ∈ Ei for all j, 1 ≤ j ≤ k}.
Clearly, we have an increasing chain
E0 ⊆ E1 ⊆ · · · ⊆ Ei ⊆ Ei+1 ⊆ · · · ⊆ N,
and since N is finite, there is a least i, say i0, such that Ei0 = Ei0+1. We claim that
E(G) = Ei0. Actually, we prove the following lemma.
Lemma 3.3.1 Given a context-free grammar G = (V, Σ, P, S), one can construct a context-
free grammar G′ = (V ′, Σ, P ′, S ′) such that:
(1) L(G′) = L(G);
(2) P ′ contains no ε-rules other than S ′ → ε, and S ′ → ε ∈ P ′ iff ε ∈ L(G);
(3) S ′ does not occur on the right-hand side of any production in P ′.
Proof . We begin by proving that E(G) = Ei0 . For this, we prove that E(G) ⊆ Ei0 and
Ei0 ⊆ E(G).
To prove that Ei0 ⊆ E(G), we proceed by induction on i, showing that A ∈ Ei implies
A =⇒+ ε. If A ∈ E0, then A → ε ∈ P, and A =⇒ ε. If A ∈ Ei+1 because of some production
A → B1 · · · Bk with Bj ∈ Ei for all j, then, since each Bj =⇒+ ε by the induction hypothesis,
we have a derivation
A =⇒ B1 · · · Bj · · · Bk =⇒+ B2 · · · Bj · · · Bk =⇒+ · · · =⇒+ Bj · · · Bk =⇒+ · · · =⇒+ ε,
showing that A ∈ E(G).
To prove that E(G) ⊆ Ei0, we also proceed by induction, but on the length of a derivation
A =⇒+ ε. If A =⇒ ε in one step, then A → ε ∈ P, and thus A ∈ E0 since E0 = {A ∈ N | (A → ε) ∈ P}.
If A =⇒n+1 ε, then
A =⇒ α =⇒n ε,
for some production A → α ∈ P. Since α derives ε, it cannot contain any terminal, so
α = B1 · · · Bk for some nonterminals Bj, and each Bj derives ε in at most n steps. By the
induction hypothesis, Bj ∈ Ei0 for all j, and thus A ∈ Ei0+1 = Ei0.
Having shown that E(G) = Ei0, we construct the grammar G′. Its set of productions P ′
is defined as follows. First, we create the production S ′ → S where S ′ ∉ V , to make sure
that S ′ does not occur on the right-hand side of any rule in P ′. Let
P1 = {A → α ∈ P | α ∈ V +} ∪ {S ′ → S},
and let P2 be the set of productions A → β, with β ∈ V +, such that β is obtained from the
right-hand side of some production A → α ∈ P1 by deleting one or more occurrences of
nonterminals in E(G).
Note that ε ∈ L(G) iff S ∈ E(G). If S ∉ E(G), then let P ′ = P1 ∪ P2, and if S ∈ E(G), then
let P ′ = P1 ∪ P2 ∪ {S ′ → ε}. We claim that L(G′) = L(G), which is proved by showing that
every derivation using G can be simulated by a derivation using G′, and vice-versa. All the
conditions of the lemma are now met.
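Both the fixpoint computation of E(G) and the construction of P1 ∪ P2 are straightforward to implement. The following Python sketch (ours; productions are pairs (A, rhs) with rhs a tuple of symbols, and the helper names are our own) follows the construction above.

    from itertools import combinations

    def nullable(nonterminals, productions):
        """E(G): the fixpoint of the chain E0, E1, ... from the text."""
        e = {a for (a, rhs) in productions if rhs == ()}
        while True:
            e2 = e | {a for (a, rhs) in productions
                      if rhs and all(x in e for x in rhs)}
            if e2 == e:
                return e
            e = e2

    def eliminate_epsilon_rules(nonterminals, productions, start):
        """Build P1 ∪ P2 (plus S' -> S, and S' -> ε when ε is in L(G))."""
        e = nullable(nonterminals, productions)
        new_start = start + "'"
        p = {(new_start, (start,))}
        p |= {(a, rhs) for (a, rhs) in productions if rhs != ()}   # P1
        # P2: delete any nonempty subset of erasable occurrences
        for (a, rhs) in productions:
            positions = [i for i, x in enumerate(rhs) if x in e]
            for k in range(1, len(positions) + 1):
                for chosen in combinations(positions, k):
                    beta = tuple(x for i, x in enumerate(rhs)
                                 if i not in chosen)
                    if beta:
                        p.add((a, beta))
        if start in e:
            p.add((new_start, ()))
        return p, new_start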
From a practical point of view, the construction of lemma 3.3.1 is very costly. For
example, given a grammar containing the rules
S → ABCDEF,
A → ε,
B → ε,
C → ε,
D → ε,
E → ε,
F → ε,
· · · → · · · ,
the elimination of ε-rules yields a rule S → β for every nonempty subsequence β of the string
ABCDEF, that is, 2^6 − 1 = 63 rules with left-hand side S. The elimination of chain rules
is also achieved by a fixpoint computation: for every nonterminal A, we compute the set IA
of nonterminals reachable from A using chain rules, as the limit of the approximations
IA,0 = {B ∈ N | (A → B) ∈ P },
IA,i+1 = IA,i ∪ {C ∈ N | ∃(B → C) ∈ P, and B ∈ IA,i }.
and since N is finite, there is a least i, say i0 , such that IA,i0 = IA,i0 +1 . We claim that
IA = IA,i0 . Actually, we prove the following lemma.
Lemma 3.3.2 Given a context-free grammar G = (V, Σ, P, S), one can construct a context-
free grammar G′ = (V ′, Σ, P ′, S ′) such that:
(1) L(G′) = L(G);
(2) P ′ contains no ε-rules other than S ′ → ε, and S ′ → ε ∈ P ′ iff ε ∈ L(G);
(3) P ′ contains no chain rules;
(4) S ′ does not occur on the right-hand side of any production in P ′.
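Before moving on, here is a Python sketch (ours) of the IA fixpoint computation just described, with a note on how the sets IA are used to remove chain rules.

    def chain_closure(nonterminals, productions):
        """I_A (nonterminals reachable from A by chain rules) for every A."""
        # one-step chain edges: A -> B with B a nonterminal
        step = {a: set() for a in nonterminals}
        for (a, rhs) in productions:
            if len(rhs) == 1 and rhs[0] in nonterminals:
                step[a].add(rhs[0])
        closure = {a: set(step[a]) for a in nonterminals}   # I_{A,0}
        changed = True
        while changed:                                       # I_{A,i+1}
            changed = False
            for a in nonterminals:
                new = {c for b in closure[a] for c in step[b]}
                if not new <= closure[a]:
                    closure[a] |= new
                    changed = True
        return closure

    # Chain rules are then removed by adding, for every B in I_A and every
    # non-chain rule B -> beta, the rule A -> beta, and deleting all chain rules.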
At this stage, the grammar obtained in lemma 3.3.2 no longer has ε-rules (except perhaps
S ′ → ε iff ε ∈ L(G)) or chain rules. However, it may contain rules A → α with |α| ≥ 3, or
with |α| ≥ 2 and where α contains terminal(s). To obtain the Chomsky Normal Form, we
need to eliminate such rules. This is not difficult, but notationally a bit messy.
Lemma 3.3.3 Given a context-free grammar G = (V, Σ, P, S), one can construct a context-
free grammar G′ = (V ′, Σ, P ′, S ′) such that L(G′) = L(G) and G′ is in Chomsky Normal
Form, that is, a grammar whose productions are of the form
A → BC,
A → a, or
S ′ → ε,
where A, B, C ∈ N ′, a ∈ Σ, S ′ → ε is in P ′ iff ε ∈ L(G), and S ′ does not occur on the
right-hand side of any production in P ′.
Proof . First, we apply lemma 3.3.2, obtaining G1 . Let Σr be the set of terminals
occurring on the right-hand side of rules A → α ∈ P1 , with |α| ≥ 2. For every a ∈ Σr , let
Xa be a new nonterminal not in V1 . Let
P2 = {Xa → a | a ∈ Σr }.
Let P1,r be the set of productions
A → α1a1α2 · · · αk ak αk+1,
where a1, . . . , ak ∈ Σr are the occurrences of terminals in the right-hand side (k ≥ 1),
αi ∈ N∗, and the right-hand side has length at least 2. For every production
A → α1a1α2 · · · αk ak αk+1
in P1,r, let
A → α1Xa1α2 · · · αk Xak αk+1
be a new production, and let P3 be the set of all such productions. Let P4 = (P1 − P1,r ) ∪
P2 ∪ P3 . Now, productions A → α in P4 with |α| ≥ 2 do not contain terminals. However, we
may still have productions A → α ∈ P4 with |α| ≥ 3. We can perform some recoding using
some new nonterminals. For every production of the form
A → B1 · · · Bk,
where k ≥ 3, we introduce the new nonterminals [B1 · · · Bk−1], [B1 · · · Bk−2], . . . , [B1B2],
and the new productions
A → [B1 · · · Bk−1]Bk,
[B1 · · · Bk−1] → [B1 · · · Bk−2]Bk−1,
· · · → · · · ,
[B1B2B3] → [B1B2]B3,
[B1B2] → B1B2.
All the productions are now in Chomsky Normal Form, and it is clear that the same language
is generated.
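This recoding with bracketed nonterminals is easily mechanized; the following Python sketch (ours, not from the text) performs it, naming the new nonterminals '[B1...Bj]' as in the proof.

    def binarize(productions):
        """Replace every rule A -> B1...Bk (k >= 3) by binary rules,
        introducing nonterminals named '[B1...Bj]' as in the text."""
        new_rules = set()
        for (a, rhs) in productions:
            while len(rhs) >= 3:
                head = "[" + "".join(rhs[:-1]) + "]"    # [B1 ... B_{k-1}]
                new_rules.add((a, (head, rhs[-1])))     # A -> [B1...B_{k-1}] Bk
                a, rhs = head, rhs[:-1]                 # continue with the prefix
            new_rules.add((a, rhs))                     # a rule of length <= 2
        return new_rules

    # binarize({('A', ('B', 'C', 'D'))}) yields
    # {('A', ('[BC]', 'D')), ('[BC]', ('B', 'C'))}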
Applying the first phase of the method of lemma 3.3.3 to the grammar G′3, we get the
rules
E −→ EX+ T,
E −→ T X∗ F,
E −→ X( EX) ,
E −→ a,
T −→ T X∗ F,
T −→ X( EX) ,
T −→ a,
F −→ X( EX) ,
F −→ a,
X+ −→ +,
X∗ −→ ∗,
X( −→ (,
X) −→ ).
After applying the second phase of the method, we get the following grammar in Chomsky
Normal Form:
E −→ [EX+ ]T,
[EX+ ] −→ EX+ ,
E −→ [T X∗ ]F,
[T X∗ ] −→ T X∗ ,
E −→ [X( E]X) ,
[X( E] −→ X( E,
E −→ a,
T −→ [T X∗ ]F,
T −→ [X( E]X) ,
T −→ a,
F −→ [X( E]X) ,
F −→ a,
X+ −→ +,
X∗ −→ ∗,
X( −→ (,
X) −→ ).
For large grammars, it is often convenient to use the abbreviation which consists in group-
ing productions having a common left-hand side, and listing the right-hand sides separated
by the symbol |. Thus, a group of productions
A → α1 ,
A → α2 ,
··· → ··· ,
A → αk ,
may be abbreviated as
A → α1 | α2 | · · · | αk .
A context-free grammar G = (V, Σ, P, S) is left-linear iff its productions are of the form
A → Ba,
A → a,
A → ε,
where A, B ∈ N, and a ∈ Σ. A grammar G is right-linear iff its productions are of the form
A → aB,
A → a,
A → ε,
where A, B ∈ N, and a ∈ Σ.
The following lemma shows the equivalence between NFA's and right-linear grammars.
The crux of the construction is to build, from a right-linear grammar G′ whose productions
have the form A → aB or A → a, an NFA whose states include the nonterminals, with
transition function
δ(A, a) = {B ∈ N ′ | A → aB ∈ P ′},
for all A ∈ N ′ and all a ∈ Σ. It is easily shown by induction on the length of w that
A =⇒∗ wB iff B ∈ δ∗(A, w).
Similar equivalences hold for grammars whose productions are of the more general form
A → Bu,
A → u,
where A, B ∈ N, and u ∈ Σ∗.
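The construction behind this equivalence is easy to implement. In the Python sketch below (our own; the fresh accepting state 'f' and the function names are our choices), productions are pairs (A, rhs) with rhs of length 1 or 2.

    def right_linear_to_nfa(nonterminals, productions, start):
        """NFA (states, delta, start, finals) from a right-linear grammar
        with productions A -> aB or A -> a."""
        final = "f"                    # fresh accepting state, not a nonterminal
        delta = {}                     # (state, letter) -> set of states
        for (a, rhs) in productions:
            if len(rhs) == 2:          # A -> aB
                delta.setdefault((a, rhs[0]), set()).add(rhs[1])
            elif len(rhs) == 1:        # A -> a
                delta.setdefault((a, rhs[0]), set()).add(final)
        states = set(nonterminals) | {final}
        return states, delta, start, {final}

    def accepts(nfa, w):
        """Run the NFA on w by tracking the set of reachable states."""
        states, delta, start, finals = nfa
        current = {start}
        for c in w:
            current = set().union(*(delta.get((q, c), set()) for q in current))
        return bool(current & finals)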
Consider the following grammar, with nonterminals E and A, terminals a and b, start
symbol E, and the set P of rules
E −→ aEb,
E −→ ab,
E −→ A,
A −→ bAa.
The problem is that the nonterminal A does not derive any terminal strings, and thus, it
is useless, as well as the last two productions. Let us now consider the grammar G4 =
({E, A, a, b, c, d}, {a, b, c, d}, P, E), where P is the set of rules
E −→ aEb,
E −→ ab,
A −→ cAd,
A −→ cd.
This time, the nonterminal A generates strings of the form c^n d^n, but there is no derivation
E =⇒+ α from E where A occurs in α. The nonterminal A is not connected to E, and the last
two rules are useless. Fortunately, it is possible to find such useless rules, and to eliminate
them.
Let T (G) be the set of nonterminals that actually derive some terminal string, i.e.,
T (G) = {A ∈ (V − Σ) | ∃w ∈ Σ∗, A =⇒+ w}.
The set T (G) can be defined by stages. We define the sets Tn (n ≥ 1) as follows:
T1 = {A ∈ (V − Σ) | ∃(A −→ w) ∈ P, with w ∈ Σ∗ },
and
Tn+1 = Tn ∪ {A ∈ (V − Σ) | ∃(A −→ β) ∈ P, with β ∈ (Tn ∪ Σ)∗ }.
It is easy to prove that there is some least n such that Tn+1 = Tn , and that for this n,
T (G) = Tn .
If S ∉ T (G), then L(G) = ∅, and G is equivalent to the trivial grammar
G′ = ({S}, Σ, ∅, S).
If S ∈ T (G), then let U(G) be the set of nonterminals that are actually useful, i.e.,
U(G) = {A ∈ T (G) | ∃α, β ∈ (T (G) ∪ Σ)∗ , S =⇒∗ αAβ}.
The set U(G) can also be computed by stages. We define the sets Un (n ≥ 1) as follows:
U1 = {A ∈ T (G) | ∃(S −→ αAβ) ∈ P, with α, β ∈ (T (G) ∪ Σ)∗ },
and
Un+1 = Un ∪ {B ∈ T (G) | ∃(A −→ αBβ) ∈ P, with A ∈ Un , α, β ∈ (T (G) ∪ Σ)∗ }.
It is easy to prove that there is some least n such that Un+1 = Un , and that for this n,
U(G) = Un ∪ {S}. Then, we can use U(G) to transform G into an equivalent CFG in
which every nonterminal is useful (i.e., for which V − Σ = U(G)). Indeed, simply delete all
rules containing symbols not in U(G). The details are left as an exercise. We say that a
context-free grammar G is reduced if all its nonterminals are useful, i.e., N = U(G).
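Since the details are left as an exercise, here is one possible Python sketch (ours) of the two fixpoint computations and of the reduction step.

    def deriving_terminals(nonterminals, productions):
        """T(G): nonterminals that derive some terminal string."""
        t = set()
        changed = True
        while changed:
            changed = False
            for (a, rhs) in productions:
                if a not in t and all(x in t or x not in nonterminals
                                      for x in rhs):
                    t.add(a)
                    changed = True
        return t

    def useful(nonterminals, productions, start):
        """U(G): nonterminals of T(G) reachable from S via (T(G) ∪ Σ)*."""
        t = deriving_terminals(nonterminals, productions)
        ok = lambda rhs: all(x in t or x not in nonterminals for x in rhs)
        u = {start} if start in t else set()
        changed = True
        while changed:
            changed = False
            for (a, rhs) in productions:
                if a in u and ok(rhs):
                    new = {x for x in rhs if x in t} - u
                    if new:
                        u |= new
                        changed = True
        return u

    def reduce_grammar(nonterminals, productions, start):
        """Delete every rule mentioning a symbol outside U(G) ∪ Σ."""
        u = useful(nonterminals, productions, start)
        keep = lambda x: x in u or x not in nonterminals
        return {(a, rhs) for (a, rhs) in productions
                if a in u and all(keep(x) for x in rhs)}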
It should be noted that although dull, the above considerations are important in practice.
Certain algorithms for constructing parsers, for example, LR-parsers, may loop if useless
rules are not eliminated!
We now consider another normal form for context-free grammars, the Greibach Normal
Form.
A context-free grammar G = (V, Σ, P, S) is in Greibach Normal Form iff its productions
are of the form
A → aBC,
A → aB,
A → a, or
S → ε,
where A, B, C ∈ N, a ∈ Σ, S → ε is in P iff ε ∈ L(G), and S does not occur on the
right-hand side of any production.
Note that a grammar in Greibach Normal Form does not have ε-rules other than possibly
S → ε. More importantly, except for the special rule S → ε, every rule produces some
terminal symbol.
An important consequence of the Greibach Normal Form is that no nonterminal is
left recursive. A nonterminal A is left recursive iff A =⇒+ Aα for some α ∈ V ∗. Left
recursive nonterminals cause top-down deterministic parsers to loop. The Greibach Normal
Form provides a way of avoiding this problem.
There are no easy proofs that every CFG can be converted to a Greibach Normal Form.
A particularly elegant method due to Rosenkrantz using least fixed-points and matrices will
be given in section 3.9.
Lemma 3.6.1 Given a context-free grammar G = (V, Σ, P, S), one can construct a context-
free grammar G′ = (V ′, Σ, P ′, S ′) such that L(G′) = L(G) and G′ is in Greibach Normal
Form, that is, a grammar whose productions are of the form
A → aBC,
A → aB,
A → a, or
S ′ → ε,
where A, B, C ∈ N ′, a ∈ Σ, S ′ → ε is in P ′ iff ε ∈ L(G), and S ′ does not occur on the
right-hand side of any production in P ′.
Definition 3.7.1 Given a partially ordered set ⟨A, ≤⟩, an ω-chain (an)n≥0 is a sequence
such that an ≤ an+1 for all n ≥ 0. The least upper bound of an ω-chain (an) is an element
a ∈ A such that:
(1) an ≤ a, for all n ≥ 0;
(2) for any b ∈ A, if an ≤ b for all n ≥ 0, then a ≤ b.
A partially ordered set ⟨A, ≤⟩ is an ω-chain complete poset iff it has a least element ⊥, and
every ω-chain has a least upper bound, denoted ⨆ an.
Remark : The ω in ω-chain means that we are considering countable chains (ω is the
ordinal associated with the order-type of the set of natural numbers). This notation may
seem arcane, but is standard in denotational semantics.
For example, given any set X, the power set 2^X ordered by inclusion is an ω-chain
complete poset with least element ∅. The Cartesian product 2^X × · · · × 2^X (with n factors)
ordered such that
(A1, . . . , An) ≤ (B1, . . . , Bn)
iff Ai ⊆ Bi (where Ai, Bi ∈ 2^X) is an ω-chain complete poset with least element (∅, . . . , ∅).
We are interested in functions between partially ordered sets.
Definition 3.7.2 Given any two partially ordered sets ⟨A1, ≤1⟩ and ⟨A2, ≤2⟩, a function
f : A1 → A2 is monotonic iff for all x, y ∈ A1,
x ≤1 y implies that f(x) ≤2 f(y).
If ⟨A1, ≤1⟩ and ⟨A2, ≤2⟩ are ω-chain complete posets, a function f : A1 → A2 is ω-continuous
iff it is monotonic, and for every ω-chain (an),
f(⨆ an) = ⨆ f(an).
Definition 3.7.3 Let ⟨A, ≤⟩ be a partially ordered set, and let f : A → A be a function.
A fixed-point of f is an element a ∈ A such that f(a) = a. The least fixed-point of f is an
element a ∈ A such that f(a) = a, and a ≤ b for every b ∈ A such that f(b) = b.
The following lemma gives sufficient conditions for the existence of least fixed-points. It
is one of the key lemmas in denotational semantics.
Lemma 3.7.4 Let ⟨A, ≤⟩ be an ω-chain complete poset with least element ⊥. Every ω-
continuous function f : A → A has a unique least fixed-point x0 given by
x0 = ⨆_{n≥0} f^n(⊥).
Furthermore, for any b ∈ A such that f(b) ≤ b, we have x0 ≤ b.
The second part of lemma 3.7.4 is very useful to prove that functions have the same
least fixed-point. For example, under the conditions of lemma 3.7.4, if g : A → A is another
ω-continuous function, letting x0 be the least fixed-point of f and y0 be the least
fixed-point of g, if f (y0) ≤ y0 and g(x0 ) ≤ x0 , we can deduce that x0 = y0 . Indeed, since
f (y0 ) ≤ y0 and x0 is the least fixed-point of f , we get x0 ≤ y0 , and since g(x0 ) ≤ x0 and y0
is the least fixed-point of g, we get y0 ≤ x0 , and therefore x0 = y0 .
Lemma 3.7.4 also shows that the least fixed-point x0 of f can be approximated as much as
desired, using the sequence (f^n(⊥)). We will now apply this fact to context-free grammars.
For this, we need to show how a context-free grammar G = (V, Σ, P, S) with m nonterminals
induces an ω-continuous map
ΦG : 2^{Σ∗} × · · · × 2^{Σ∗} → 2^{Σ∗} × · · · × 2^{Σ∗},
with m factors on each side.
Given an m-tuple of languages Λ = (L1, . . . , Lm), where N = {A1, . . . , Am}, we first define
a map Φ[Λ] on the finite subsets of V ∗, inductively as follows:
Φ[Λ](∅) = ∅,
Φ[Λ]({ε}) = {ε},
Φ[Λ]({a}) = {a}, if a ∈ Σ,
Φ[Λ]({Ai}) = Li, if Ai ∈ N,
Φ[Λ]({αX}) = Φ[Λ]({α})Φ[Λ]({X}), if α ∈ V +, X ∈ V,
Φ[Λ](Q ∪ {α}) = Φ[Λ](Q) ∪ Φ[Λ]({α}), if Q ∈ Pfin(V ∗), Q ≠ ∅, α ∈ V ∗, α ∉ Q.
Then, writing the grammar G as
A1 → α1,1 + · · · + α1,n1 ,
··· → ···
Ai → αi,1 + · · · + αi,ni ,
··· → ···
Am → αm,1 + · · · + αm,nm ,
we define the map
ΦG : 2^{Σ∗} × · · · × 2^{Σ∗} → 2^{Σ∗} × · · · × 2^{Σ∗}
(with m factors on each side) such that
ΦG(L1, . . . , Lm) = (Φ[Λ]({α1,1, . . . , α1,n1}), . . . , Φ[Λ]({αm,1, . . . , αm,nm}))
for all Λ = (L1, . . . , Lm) ∈ 2^{Σ∗} × · · · × 2^{Σ∗}.
One should verify that the map Φ[Λ] is well defined, but this is easy. The following
fact is also easily shown: given a context-free grammar G = (V, Σ, P, S), the map
ΦG : 2^{Σ∗} × · · · × 2^{Σ∗} → 2^{Σ∗} × · · · × 2^{Σ∗}
is ω-continuous.
Now, 2^{Σ∗} × · · · × 2^{Σ∗} (with m factors) is an ω-chain complete poset, and the map ΦG is
ω-continuous. Thus, by lemma 3.7.4, the map ΦG has a least fixed-point. It turns out that
the components of this least fixed-point are precisely the languages generated by the
grammars (V, Σ, P, Ai). Before proving this fact, let us give an example illustrating it.
Example. Consider the grammar G = ({A, B, a, b}, {a, b}, P, A) defined by the rules
A → BB + ab,
B → aBb + ab.
The least fixed-point of ΦG is the least upper bound of the chain of approximations
(Φ^n_{G,A}(∅, ∅), Φ^n_{G,B}(∅, ∅)),
where
Φ^0_{G,A}(∅, ∅) = Φ^0_{G,B}(∅, ∅) = ∅,
and
Φ^{n+1}_{G,A}(∅, ∅) = Φ^n_{G,B}(∅, ∅) Φ^n_{G,B}(∅, ∅) ∪ {ab},
Φ^{n+1}_{G,B}(∅, ∅) = a Φ^n_{G,B}(∅, ∅) b ∪ {ab}.
By induction, we can easily prove that the two components of the least fixed-point are
the languages
LA = {a^m b^m a^n b^n | m, n ≥ 1} ∪ {ab} and LB = {a^n b^n | n ≥ 1}.
Letting GA = ({A, B, a, b}, {a, b}, P, A) and GB = ({A, B, a, b}, {a, b}, P, B), it is indeed
true that LA = L(GA ) and LB = L(GB ) .
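Each approximation Φ^n_G(∅, ∅) is a finite set of strings, so the chain can be computed directly. A small Python sketch (ours) for this example grammar:

    def phi(LA, LB):
        """One application of Phi_G for A -> BB + ab, B -> aBb + ab."""
        new_LA = {u + v for u in LB for v in LB} | {"ab"}
        new_LB = {"a" + u + "b" for u in LB} | {"ab"}
        return new_LA, new_LB

    LA, LB = set(), set()            # the least element (∅, ∅)
    for n in range(4):               # four approximation steps
        LA, LB = phi(LA, LB)
    print(sorted(LB, key=len))       # ['ab', 'aabb', 'aaabbb', 'aaaabbbb']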
We have the following theorem due to Ginsburg and Rice:

Theorem 3.8.3 Given a context-free grammar G = (V, Σ, P, S) with m nonterminals
A1, . . . , Am, the least fixed-point of the map ΦG is the m-tuple of languages
(L(GA1), . . . , L(GAm)),
where GAi = (V, Σ, P, Ai).
Proof . Writing G as
A1 → α1,1 + · · · + α1,n1 ,
··· → ···
Ai → αi,1 + · · · + αi,ni ,
··· → ···
Am → αm,1 + · · · + αm,nm ,
observe that, directly from the definition of Φ[Λ],
w ∈ Φ^1_{G,i}(∅, . . . , ∅)
iff there is some rule Ai → αi,j with w = αi,j ∈ Σ∗, and that
w ∈ Φ^n_{G,i}(∅, . . . , ∅)
for some n ≥ 2 iff there is some rule Ai → αi,j with αi,j of the form
αi,j = u1Aj1u2 · · · uk Ajk uk+1,
where u1, . . . , uk+1 ∈ Σ∗ and k ≥ 1, and there are some strings w1, . . . , wk ∈ Σ∗ such that
wh ∈ Φ^{n−1}_{G,jh}(∅, . . . , ∅), for 1 ≤ h ≤ k,
and
w = u1w1u2 · · · uk wk uk+1.
Letting M be the maximum number of occurrences of nonterminals in the right-hand sides
αi,j, one proves the following two claims by induction.

Claim 1: For every w ∈ Σ∗, if w ∈ Φ^n_{G,i}(∅, . . . , ∅) with n ≥ 1, then Ai =⇒p w for some
p ≤ (M + 1)^{n−1}.

Claim 2: For every w ∈ Σ∗, if Ai =⇒p w with p ≥ 1, then w ∈ Φ^n_{G,i}(∅, . . . , ∅) for some
n ≥ 1.

For the induction step of Claim 1, if w ∈ Φ^n_{G,i}(∅, . . . , ∅) with n ≥ 2, then, as above,
there are strings w1, . . . , wk with
wh ∈ Φ^{n−1}_{G,jh}(∅, . . . , ∅),
and
w = u1w1u2 · · · uk wk uk+1.
By the induction hypothesis, Ajh =⇒ph wh with ph ≤ (M + 1)^{n−2}, and thus
Ai =⇒ u1Aj1u2 · · · uk Ajk uk+1 =⇒p1 · · · =⇒pk w,
so that Ai =⇒p w with
p ≤ 1 + k(M + 1)^{n−2} ≤ (M + 1)^{n−1},
since k ≤ M.
Combining Claim 1 and Claim 2, we have
L(GAi) = ⋃_{n≥1} Φ^n_{G,i}(∅, . . . , ∅),
which proves that the least fixed-point of the map ΦG is the m-tuple of languages
(L(GA1), . . . , L(GAm)).
We now show how theorem 3.8.3 can be used to give a short proof that every context-free
grammar can be converted to Greibach Normal Form.
The idea is first to convert a grammar to one in weak Greibach Normal Form, where the
productions are of the form
A → aα, or
S → ε,
where a ∈ Σ, α ∈ V ∗, and if S → ε is a rule, then S does not occur on the right-hand side of
any rule. Indeed, if we first convert G to Chomsky Normal Form, it turns out that we will
get rules of the form A → aBC, A → aB, or A → a.
Using the algorithm for eliminating ε-rules and chain rules, we can first convert the
original grammar to a grammar with no chain rules and no ε-rules except possibly S → ε,
in which case, S does not appear on the right-hand side of rules. Thus, for the purpose
of converting to weak Greibach Normal Form, we can assume that we are dealing with
grammars without chain rules and without ε-rules. Let us also assume that we computed
the set T (G) of nonterminals that actually derive some terminal string, and that useless
productions involving symbols not in T (G) have been deleted.
Let us explain the idea of the conversion using the following grammar:
A → AaB + BB + b,
B → Bd + BAa + aA + c.
The first step is to group the right-hand sides α into two categories: those whose leftmost
symbol is a terminal (α ∈ ΣV ∗ ) and those whose leftmost symbol is a nonterminal (α ∈
NV ∗ ). It is also convenient to adopt a matrix notation, and we can write the above grammar
as
(A, B) = (A, B) ( aB    ∅       )
                ( B     {d, Aa} ) + (b, {aA, c})
Thus, we are dealing with matrices (and row vectors) whose entries are finite subsets of
V ∗. For notational simplicity, braces around singleton sets are omitted. The finite subsets of
V ∗ form a semiring, where addition is union, and multiplication is concatenation. Addition
and multiplication of matrices are as usual, except that the semiring operations are used. We
will also consider matrices whose entries are languages over Σ. Again, the languages over Σ
form a semiring, where addition is union, and multiplication is concatenation. The identity
element for addition is ∅, and the identity element for multiplication is {"}. As above,
addition and multiplication of matrices are as usual, except that the semiring operations are
used. For example, given any languages Ai,j and Bi,j over Σ, where i, j ∈ {1, 2}, we have
( A1,1  A1,2 ) ( B1,1  B1,2 )   ( A1,1B1,1 ∪ A1,2B2,1   A1,1B1,2 ∪ A1,2B2,2 )
( A2,1  A2,2 ) ( B2,1  B2,2 ) = ( A2,1B1,1 ∪ A2,2B2,1   A2,1B1,2 ∪ A2,2B2,2 )
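This semiring matrix arithmetic is easy to implement; here is a Python sketch (ours, not from the text), with languages as sets of strings and matrices as nested lists.

    def lang_concat(L1, L2):
        """Multiplication in the semiring of languages: concatenation."""
        return {u + v for u in L1 for v in L2}

    def mat_mul(A, B):
        """Matrix product where + is union and * is concatenation."""
        n, m, p = len(A), len(B), len(B[0])
        return [[set().union(*(lang_concat(A[i][k], B[k][j])
                               for k in range(m)))
                 for j in range(p)]
                for i in range(n)]

    def mat_add(A, B):
        """Matrix sum: entrywise union."""
        return [[A[i][j] | B[i][j] for j in range(len(A[0]))]
                for i in range(len(A))]

    # Identity for multiplication: {""} (i.e., {ε}) on the diagonal,
    # the empty set everywhere else.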
Letting X = (A, B), K = (b, {aA, c}), and
H = ( aB    ∅       )
    ( B     {d, Aa} )
the above grammar can be concisely written as
X = XH + K.
Given a square matrix A of dimension m × m, we define the matrix A∗ such that
A∗_{i,j} = ⋃_{n≥0} A^n_{i,j},
where A^0 = Idm, the identity matrix, and A^n is the n-th power of A. Similarly, we define
A+ such that
A+_{i,j} = ⋃_{n≥1} A^n_{i,j}.
Given a matrix A whose entries are finite subsets of V ∗, where N = {A1, . . . , Am}, for
any m-tuple Λ = (L1, . . . , Lm) of languages over Σ, we let Φ[Λ](A) be the matrix obtained
by applying Φ[Λ] to every entry of A. Given a system X = XH + K as above, we can form
a new grammar GH over the nonterminals Ai and m^2 new nonterminals Yi,j, defined by the
system
X = KY + K,
Y = HY + H,
where Y = (Yi,j).
The following lemma is the key to the Greibach Normal Form.

Lemma 3.9.1 Given a context-free grammar G = (V, Σ, P, S) defined by a system
X = XH + K, if GH is the grammar defined by the system
X = KY + K,
Y = HY + H,
as explained above, then the components in X of the least fixed-points of the maps ΦG and
ΦGH are equal.
Proof . Let U be the least-fixed point of ΦG , and let (V, W ) be the least fixed-point of
ΦGH . We shall prove that U = V . For notational simplicity, let us denote Φ[U](H) as H[U]
and Φ[U](K) as K[U].
Since U is the least fixed-point of X = XH + K, we have
U = UH[U] + K[U].
Since H[U] and K[U] do not contain any nonterminals, by a previous remark, K[U]H ∗ [U] is
the least-fixed point of X = XH[U] + K[U], and thus,
K[U]H ∗ [U] ≤ U.
On the other hand, by monotonicity,
K[U]H∗[U] H[K[U]H∗[U]] + K[K[U]H∗[U]] ≤ K[U]H∗[U] H[U] + K[U] = K[U]H∗[U].
By the last part of lemma 3.7.4, this inequality yields U ≤ K[U]H∗[U], and therefore
U = K[U]H∗[U]. A similar computation applied to the system defining GH shows that the
components in X of its least fixed-point (V, W) are also equal to K[U]H∗[U], so that U = V.

Going back to our example, the new system X = KY + K, Y = HY + H is
(A, B) = (b, {aA, c}) ( Y1   Y2 )
                      ( Y3   Y4 ) + (b, {aA, c}),

( Y1   Y2 )   ( aB   ∅       ) ( Y1   Y2 )   ( aB   ∅       )
( Y3   Y4 ) = ( B    {d, Aa} ) ( Y3   Y4 ) + ( B    {d, Aa} )
There are still some nonterminals appearing as leftmost symbols, but using the equations
defining A and B, we can replace A with
{bY1 , aAY3 , cY3 , b}
and B with
{bY2 , aAY4 , cY4, aA, c},
obtaining a system in weak Greibach Normal Form. This amounts to converting the matrix
H = ( aB    ∅       )
    ( B     {d, Aa} )
to the matrix
L = ( aB                            ∅                            )
    ( {bY2, aAY4, cY4, aA, c}       {d, bY1a, aAY3a, cY3a, ba}   )
The weak Greibach Normal Form corresponds to the new system
X = KY + K,
Y = LY + L.
This method works in general for any input grammar with no ε-rules, no chain rules, and
such that every nonterminal belongs to T (G). Under these conditions, the row vector K
contains some nonempty entry, all strings in K are in ΣV ∗, and all strings in H are in V +.
After obtaining the grammar GH defined by the system
X = KY + K,
Y = HY + H,
we use the system X = KY + K to express every nonterminal Ai in terms of expressions
containing strings αi,j involving a terminal as the leftmost symbol (αi,j ∈ ΣV ∗ ), and we
replace all leftmost occurrences of nonterminals in H (occurrences Ai in strings of the form
Ai β, where β ∈ V ∗ ) using the above expressions. In this fashion, we obtain a matrix L, and
it is immediately shown that the system
X = KY + K,
Y = LY + L,
generates the same tuple of languages. Furthermore, this last system corresponds to a weak
Greibach Normal Form.
If we start with a grammar in Chomsky Normal Form (with no production S → ε) such
that every nonterminal belongs to T (G), we actually get a Greibach Normal Form (the entries
in K are terminals, and the entries in H are nonterminals). Thus, we have justified lemma
3.6.1. The method is also quite economical, since it introduces only m^2 new nonterminals.
However, the resulting grammar may contain some useless nonterminals.
Definition 3.10.1 A tree domain D is a nonempty subset of strings in N∗+ satisfying the
conditions:
(1) For all u, v ∈ N∗+, if uv ∈ D, then u ∈ D.
(2) For all u ∈ N∗+, for all i, j ∈ N+, if ui ∈ D and j ≤ i, then uj ∈ D.

For example, D = {ε, 1, 2, 11, 21, 22, 221, 222, 2211} is a tree domain, drawn as follows:

            ε
          /   \
         1     2
         |    / \
        11  21   22
                /  \
             221    222
              |
            2211
Definition 3.10.2 Given a set ∆ of labels, a ∆-tree (for short, a tree) is a total function
t : D → ∆, where D is a tree domain.
The domain of a tree t is denoted as dom(t). Every string u ∈ dom(t) is called a tree
address or a node.
Let ∆ = {f, g, h, a, b}. The tree t : D → ∆, where D is the tree domain of the previous
example and t is the function whose graph is
{(", f ), (1, h), (2, g), (11, a), (21, a), (22, f ), (221, h), (222, b), (2211, a)}
is represented as follows:
            f
          /   \
         h     g
         |    / \
         a   a   f
                / \
               h   b
               |
               a
The outdegree (sometimes called ramification) r(u) of a node u is the cardinality of the
set
{i | ui ∈ dom(t)}.
Note that the outdegree of a node can be infinite. Most of the trees that we shall consider
will be finite-branching, that is, for every node u, r(u) will be an integer, and hence finite.
If the outdegree of all nodes in a tree is bounded by n, then we can view the domain of the
tree as being defined over {1, 2, . . . , n}∗ .
A node of outdegree 0 is called a leaf . The node whose address is " is called the root of
the tree. A tree is finite if its domain dom(t) is finite. Given a node u in dom(t), every node
of the form ui in dom(t) with i ∈ N+ is called a son (or immediate successor ) of u.
Tree addresses are totally ordered lexicographically: u ≤ v if either u is a prefix of v or
there exist strings x, y, z ∈ N∗+ and i, j ∈ N+, with i < j, such that u = xiy and v = xjz.
In the first case, we say that u is an ancestor (or predecessor ) of v (or u dominates v)
and in the second case, that u is to the left of v.
If y = " and z = ", we say that xi is a left brother (or left sibling) of xj, (i < j). Two
tree addresses u and v are independent if u is not a prefix of v and v is not a prefix of u.
Given a finite tree t, the yield of t is the string
t(u1 )t(u2 ) · · · t(uk ),
where u1 , u2, . . . , uk is the sequence of leaves of t in lexicographic order.
For example, the yield of the tree below is aaab:
            f
          /   \
         h     g
         |    / \
         a   a   f
                / \
               h   b
               |
               a
Given a tree t and a node u in dom(t), the subtree rooted at u is the tree t/u, whose
domain is the set
{v | uv ∈ dom(t)}
and such that t/u(v) = t(uv) for all v in dom(t/u).
Another important operation is the operation of tree replacement (or tree substitution).
Definition 3.10.3 Given two trees t1 and t2 and a tree address u in t1 , the result of sub-
stituting t2 at u in t1 , denoted by t1 [u ← t2 ], is the function whose graph is the set of
pairs
{(v, t1 (v)) | v ∈ dom(t1 ), u is not a prefix of v} ∪ {(uv, t2(v)) | v ∈ dom(t2 )}.
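Gorn trees are conveniently represented as dictionaries from addresses to labels. The following Python sketch (ours) implements subtrees, tree substitution, and the yield; for simplicity it assumes outdegrees at most 9, so that addresses can be written as strings of digits and address prefixes coincide with string prefixes.

    def subtree(t, u):
        """t/u: the subtree of t rooted at address u."""
        return {v[len(u):]: label for v, label in t.items()
                if v.startswith(u)}

    def substitute(t1, u, t2):
        """t1[u <- t2]: replace the subtree of t1 at u by t2."""
        kept = {v: label for v, label in t1.items() if not v.startswith(u)}
        kept.update({u + v: label for v, label in t2.items()})
        return kept

    def tree_yield(t):
        """Concatenate the labels of the leaves in lexicographic order."""
        leaves = [u for u in t
                  if not any(v != u and v.startswith(u) for v in t)]
        return "".join(t[u] for u in sorted(leaves))

    # The tree from the text, with '' standing for the root address ε:
    t = {'': 'f', '1': 'h', '2': 'g', '11': 'a', '21': 'a',
         '22': 'f', '221': 'h', '222': 'b', '2211': 'a'}
    print(tree_yield(t))  # 'aaab'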
For example, let t1 and t2 be the following trees:

Tree t1:
            f
          /   \
         h     g
         |    / \
         a   a   f
                / \
               h   b
               |
               a

Tree t2:
        g
       / \
      a   b
The tree t1 [22 ← t2 ] is defined by the following diagram:
            f
          /   \
         h     g
         |    / \
         a   a   g
                / \
               a   b
We can now define derivation trees and relate derivations to derivation trees.

Definition 3.11.1 Given a context-free grammar G = (V, Σ, P, S), for any A ∈ N, an
A-derivation tree for G is a (V ∪ {ε})-tree t such that:

(1) t(ε) = A;

(2) For every nonleaf node u ∈ dom(t), if u1, . . . , uk are the successors of u, then either
there is a production B → X1 · · · Xk in P such that t(u) = B and t(ui) = Xi for all
i, 1 ≤ i ≤ k, or B → ε ∈ P, t(u) = B and t(u1) = ε.

A complete derivation (or parse tree) is an S-tree whose yield belongs to Σ∗.
E −→ E + T,
E −→ T,
T −→ T ∗ F,
T −→ F,
F −→ (E),
F −→ a,
we have the following parse tree for the string a + a ∗ a:

                E
              / | \
             E  +  T
             |   / | \
             T  T  ∗  F
             |  |     |
             F  F     a
             |  |
             a  a
A derivation step corresponds to a tree substitution: if a production π = B → γ is applied
at the node u of a derivation tree t1, the resulting derivation tree is
tπ = t1[u ← t2],
where t2 is the tree associated with the production π.
E −→ E + E,
E −→ E ∗ E,
E −→ (E),
E −→ a,
the parse trees associated with two derivations of the string a + a ∗ a are shown in Figure
3.2:
       E                       E
     / | \                   / | \
    E  +  E                 E  ∗  E
    |   / | \             / | \   |
    a  E  ∗  E           E  +  E  a
       |     |           |     |
       a     a           a     a

Figure 3.2: Two parse trees for the string a + a ∗ a
Lemma 3.11.3 Let G = (V, Σ, P, S) be a context-free grammar. For any derivation
A =⇒n α, there is a unique A-derivation tree associated with this derivation, with yield α.
Conversely, for any A-derivation tree t with yield α, there is a unique leftmost derivation
A =⇒∗ α in G having t as its associated derivation tree.
We will now prove a strong version of the pumping lemma for context-free languages due
to Bill Ogden (1968).
w = uvxyz,
where u, v, x, y, z satisfy certain conditions. It turns out that we get a more powerful version
of the lemma if we allow ourselves to mark certain occurrences of symbols in w before
invoking the lemma. We can imagine that marked occurrences in a nonempty string w are
occurrences of symbols in w in boldface, or red, or any given color (but one color only). For
example, given w = aaababbbaa, we can mark the symbols of even index as follows:
aaababbbaa.
Ogden’s lemma only yields useful information for grammars G generating an infinite
language. We could make this hypothesis, but it seems more elegant to use the precondition
that the lemma only applies to strings w ∈ L(G) such that w contains at least K marked
occurrences, for a constant K large enough. If K is large enough, L(G) will indeed be
infinite.
Lemma 3.12.1 For every context-free grammar G, there is some integer K > 1 such that,
for every string w ∈ Σ+, for every marking of w, if w ∈ L(G) and w contains at least K
marked occurrences, then there exists some decomposition of w as w = uvxyz, and some
A ∈ N, such that the following properties hold:

(1) There are derivations S =⇒+ uAz, A =⇒+ vAy, and A =⇒+ x, so that
uv^n xy^n z ∈ L(G)
for all n ≥ 0;

(2) x contains some marked occurrence;

(3) Either (both u and v contain some marked occurrence), or (both y and z contain some
marked occurrence);

(4) vxy contains at most K marked occurrences.
Proof. Let t be any parse tree for w. We call a leaf of t a marked leaf if its label is a
marked occurrence in the marked string w. The general idea is to make sure that K is large
enough so that parse trees with yield w contain enough repeated nonterminals along some
path from the root to some marked leaf. Let r = |N|, let p be the maximum of 2 and the
lengths of the right-hand sides of productions, and let K = p^{2r+3}.

We construct a path s0, s1, . . . in t as follows:

(i) Every node si has some marked leaf as a descendant, and s0 is the root of t;

(ii) If sj is in the path, sj is not a leaf, and sj has a single immediate descendant which is
either a marked leaf or has marked leaves as its descendants, let sj+1 be that unique
immediate descendant of sj.

(iii) If sj is a B-node in the path (a node having at least two immediate descendants with
marked leaves as descendants), then let sj+1 be the leftmost immediate successor of sj
with the maximum number of marked leaves as descendants (assuming that if sj+1 is
a marked leaf, then it is its own descendant).
left sibling of the immediate successor of di on the path has some distinguished leaf in v as
a descendant. This proves (3).
(dj, . . . , b2r+3) has at most 2r + 1 B-nodes, and by the claim shown earlier, dj has at most
p^{2r+1} marked leaves as descendants. Since p^{2r+1} < p^{2r+3} = K, this proves (4).
Observe that condition (2) implies that x ≠ ε, and condition (3) implies that either
u ≠ ε and v ≠ ε, or y ≠ ε and z ≠ ε. Thus, the pumping condition (1) implies that the set
{uv^n xy^n z | n ≥ 0} is an infinite subset of L(G), and L(G) is indeed infinite, as we mentioned
earlier. Note that K ≥ 3, and in fact, K ≥ 32. The “standard pumping lemma” due to
Bar-Hillel, Perles, and Shamir, is obtained by letting all occurrences be marked in w ∈ L(G).
Lemma 3.12.2 For every context-free grammar G (without ε-rules), there is some integer
K > 1 such that, for every string w ∈ Σ+, if w ∈ L(G) and |w| ≥ K, then there exists some
decomposition of w as w = uvxyz, and some A ∈ N, such that the following properties hold:

(1) There are derivations S =⇒+ uAz, A =⇒+ vAy, and A =⇒+ x, so that
uv^n xy^n z ∈ L(G)
for all n ≥ 0;

(2) x ≠ ε;

(3) Either (both u and v are nonempty), or (both y and z are nonempty);

(4) |vxy| ≤ K.
A stronger version could be stated, and we are just following tradition in stating this
standard version of the pumping lemma.
Ogden’s lemma or the pumping lemma can be used to show that certain languages are
not context-free. The method is to proceed by contradiction, i.e., to assume (contrary to
what we wish to prove) that a language L is indeed context-free, and derive a contradiction
of Ogden’s lemma (or of the pumping lemma). Thus, as in the case of the regular languages,
it would be helpful to see what the negation of Ogden’s lemma is, and for this, we first state
Ogden’s lemma as a logical formula.
For any nonnull string w : {1, . . . , n} → Σ, for any marking m : {1, . . . , n} → {0, 1} of w,
for any substring y of w, where w = xyz, with |x| = h and k = |y|, the number of marked
occurrences in y, denoted as |m(y)|, is defined as
|m(y)| = Σ_{i=h+1}^{h+k} m(i).
Recalling that
¬(A ∧ B ∧ C ∧ D ∧ P) ≡ ¬(A ∧ B ∧ C ∧ D) ∨ ¬P ≡ (A ∧ B ∧ C ∧ D) ⊃ ¬P
and
¬(P ⊃ Q) ≡ P ∧ ¬Q,
the negation of Ogden’s lemma can be stated accordingly. Since
¬P ≡ ∃n : nat (uv^n xy^n z ∉ L(G)),
in order to show that Ogden’s lemma is contradicted, one needs to show that for some
context-free grammar G, for every K ≥ 2, there is some string w ∈ L(G) and some marking
m of w with at least K marked occurrences in w, such that for every possible decomposition
w = uvxyz satisfying the constraints A ∧ B ∧ C ∧ D, there is some n ≥ 0 such that
uv^n xy^n z ∉ L(G). When proceeding by contradiction, we have a language L that we are
(wrongly) assuming to be context-free, and we can use any CFG grammar G generating L.
The creative part of the argument is to pick the right w ∈ L and the right marking of w
(not making any assumption on K).
As an illustration, we show that the language
L = {a^n b^n c^n | n ≥ 1}
is not context-free. Since L is infinite, we will be able to use the pumping lemma.
The proof proceeds by contradiction. If L was context-free, there would be some context-
free grammar G such that L = L(G), and some constant K > 1 as in Ogden’s lemma. Let
w = a^K b^K c^K, and choose the b’s as marked occurrences. Then by Ogden’s lemma, x contains
some marked occurrence, and either both u, v or both y, z contain some marked occurrence.
Assume that both u and v contain some b. We have the following situation:
u = a · · · ab · · · b,   v = b · · · b,   xyz = b · · · bc · · · c.
If we consider the string uvvxyyz, the number of a’s is still K, but the number of b’s is strictly
greater than K since v contains at least one b, and thus uvvxyyz ∉ L, a contradiction.
If both y and z contain some b we will also reach a contradiction because in the string
uvvxyyz, the number of c’s is still K, but the number of b’s is strictly greater than K.
Having reached a contradiction in all cases, we conclude that L is not context-free.
Let us now show that the language
L = {a^m b^n c^m d^n | m, n ≥ 1}
is not context-free.
Again, we proceed by contradiction. This time, let
w = a^K b^K c^K d^K,
and let all occurrences be marked. By Ogden’s lemma, x contains some marked occurrence,
and either both u, v or both y, z contain some marked occurrence. Assume that both u and
v contain some marked occurrence. If v contains some b, we have the following situation:
u = a · · · ab · · · b,   v = b · · · b,   xyz = b · · · bc · · · cd · · · d.
Since uvvxyyz ∈ L, the only way to preserve an equal number of b’s and d’s is to have
y ∈ d+. But then, vxy contains c^K, which contradicts (4) of Ogden’s lemma.
If v contains some c, since x also contains some marked occurrence, it must be some c,
and v contains only c’s, and we have the following situation:
u = a · · · ab · · · bc · · · c,   v = c · · · c,   xyz = c · · · cd · · · d.
Since uvvxyyz ∈ L and the number of a’s is still K whereas the number of c’s is strictly
more than K, this case is impossible.
Let us now consider the case where both y, z contain some marked occurrence. Reasoning
as before, the only possibility is that v ∈ a+ and y ∈ c+ :
u = a · · · a,   v = a · · · a,   x = a · · · ab · · · bc · · · c,   y = c · · · c,   z = c · · · cd · · · d.
But then, vxy contains bK , which contradicts (4) of Ogden’s Lemma. Since a contradiction
was obtained in all cases, L is not context-free.
Ogden’s lemma can also be used to show that the context-free language
{a^m b^n c^n | m, n ≥ 1} ∪ {a^m b^m c^n | m, n ≥ 1}
is inherently ambiguous.
Lemma 3.12.3 Given any context-free grammar, G, if K is the constant of Ogden’s lemma,
then the following equivalence holds:
L(G) is infinite iff there is some w ∈ L(G) such that K ≤ |w| < 2K.
Proof. Let K = p^{2r+3} be the constant from the proof of Lemma 3.12.1. If there is some
w ∈ L(G) such that |w| ≥ K, we already observed that the pumping lemma implies that
L(G) contains an infinite subset of the form {uv^n xy^n z | n ≥ 0}. Conversely, assume that
L(G) is infinite. If |w| < K for all w ∈ L(G), then L(G) is finite. Thus, there is some
w ∈ L(G) such that |w| ≥ K. Let w ∈ L(G) be a minimal string such that |w| ≥ K. By the
pumping lemma, we can write w as w = uvxyz, where x ≠ ε, vy ≠ ε, and |vxy| ≤ K. By
the pumping property, uxz ∈ L(G). If |w| ≥ 2K, then
|uxz| ≥ |uvxyz| − |vxy| ≥ 2K − K = K,
and |uxz| < |uvxyz|, contradicting the minimality of w. Thus, we must have |w| < 2K.
In particular, if G is in Chomsky Normal Form, it can be shown that we just have to
consider derivations of length at most 4K − 3.
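Lemma 3.12.3 makes infiniteness of L(G) decidable in principle: it suffices to test membership of the finitely many strings w with K ≤ |w| < 2K. For a grammar in Chomsky Normal Form, membership itself can be tested by the classical CYK dynamic-programming algorithm; CYK is not described in the text above, but it is the standard membership test for grammars in this form. A Python sketch (ours):

    def cyk(w, binary_rules, terminal_rules, start):
        """CYK membership test for a grammar in Chomsky Normal Form.

        binary_rules:   set of (A, B, C) for rules A -> BC
        terminal_rules: set of (A, a)    for rules A -> a
        """
        n = len(w)
        if n == 0:
            return False  # the empty string is handled via S -> epsilon
        # table[i][l]: the set of nonterminals deriving w[i : i + l]
        table = [[set() for _ in range(n + 1)] for _ in range(n)]
        for i, a in enumerate(w):
            table[i][1] = {A for (A, b) in terminal_rules if b == a}
        for l in range(2, n + 1):            # length of the span
            for i in range(n - l + 1):       # start of the span
                for k in range(1, l):        # split point
                    for (A, B, C) in binary_rules:
                        if B in table[i][k] and C in table[i + k][l - k]:
                            table[i][l].add(A)
        return start in table[0][n]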