1 Chomsky Hierarchy
Grammars for each task
Figure 1: Noam Chomsky
• Different types of rules allow one to describe different aspects of natural language
• These grammars form a hierarchy
Grammars in General
All grammars we consider will be of the form G = (V, Σ, R, S)
• V is a finite set of variables
• Σ is a finite set of terminals
• R is a finite set of rules
• S is the start symbol
The different grammars will be determined by the form of the rules in R.
1.1 Regular Languages
Type 3 Grammars
The rules in a type 3 grammar are of the form
A → aB or A → a
where A, B ∈ V and a ∈ Σ ∪ {ε}.
We say αAβ ⇒_G αγβ iff A → γ ∈ R, and we write ⇒∗_G for zero or more such steps. L(G) = {w ∈ Σ∗ | S ⇒∗_G w}
1.1.1 Type 3 Grammars and Regularity
Proposition 1. If G is a Type 3 grammar then L(G) is regular. Conversely, if L is regular then
there is a Type 3 grammar G such that L = L(G).
Proof. Let G = (V, Σ, R, S) be a Type 3 grammar. Consider the NFA M = (Q, Σ, δ, q0, F) where
• Q = V ∪ {qF}, where qF ∉ V
• q0 = S
• F = {qF }
• δ(A, a) = {B | A → aB ∈ R} ∪ {qF | A → a ∈ R} for A ∈ V, and δ(qF, a) = ∅ for all a.
L(M) = L(G), since for all A ∈ V and all w ∈ Σ∗, A ⇒∗_G w iff A −w→_M qF.
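The construction in this half of the proof can be sketched in Python. This is only an illustration under assumptions made here, not part of the original notes: rules are encoded as triples (A, a, B), the fresh state is named "qF", the empty string "" stands for ε, and the example grammar S → aS | bA, A → bA | ε is hypothetical.

```python
FINAL = "qF"  # fresh accepting state; assumed not to clash with a variable name


def grammar_to_nfa(rules, start):
    """Build the NFA of the proof from Type 3 rules.

    Each rule is a triple (A, a, B): it encodes A -> aB, or A -> a when B is None.
    The symbol a is a terminal or "" (standing for epsilon).
    """
    delta = {}
    for A, a, B in rules:
        target = FINAL if B is None else B
        delta.setdefault((A, a), set()).add(target)
    return delta, start, {FINAL}


def eps_closure(delta, states):
    """All states reachable from `states` using only epsilon-moves."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def accepts(nfa, word):
    """Simulate the NFA on `word` by tracking the set of reachable states."""
    delta, start, finals = nfa
    current = eps_closure(delta, {start})
    for ch in word:
        moved = set()
        for q in current:
            moved |= delta.get((q, ch), set())
        current = eps_closure(delta, moved)
    return bool(current & finals)


# Hypothetical example grammar: S -> aS | bA, A -> bA | epsilon
RULES = [("S", "a", "S"), ("S", "b", "A"), ("A", "b", "A"), ("A", "", None)]
NFA = grammar_to_nfa(RULES, "S")

print(accepts(NFA, "aabb"))  # True
print(accepts(NFA, "aba"))   # False
```

Because a may be ε, the resulting automaton can have ε-moves, which is why the simulation takes ε-closures after every step.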
Conversely, let M = (Q, Σ, δ, q0, F) be an NFA recognizing L. Consider G = (V, Σ, R, S) where
• V = Q
• S = q0
• q1 → aq2 ∈ R iff q2 ∈ δ(q1, a), and q → ε ∈ R iff q ∈ F.
We can show that, for any q, q′ ∈ Q and w ∈ Σ∗, q −w→_M q′ iff q ⇒∗_G wq′. Thus, L(M) = L(G).
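The converse construction can be sketched the same way. The encoding below is an assumption made for illustration: the NFA is a dictionary from (state, symbol) to a set of states, it is assumed to have no ε-moves, and the example automaton (strings over {0, 1} ending in 01) is hypothetical.

```python
def nfa_to_grammar(delta, start, finals):
    """Turn an NFA into Type 3 rules, one variable per state.

    Rules are triples (q1, a, q2) for q1 -> a q2 and (q, "", None) for q -> epsilon.
    """
    rules = []
    for (q1, a), targets in delta.items():
        for q2 in targets:
            rules.append((q1, a, q2))
    for q in finals:
        rules.append((q, "", None))
    return rules, start


def generates(rules, var, word):
    """Check whether `var` derives `word` using the rules.

    Assumes every rule with a successor variable consumes one symbol,
    so the recursion always terminates.
    """
    for lhs, a, nxt in rules:
        if lhs != var:
            continue
        if nxt is None:
            if word == a:
                return True
        elif a and word.startswith(a) and generates(rules, nxt, word[len(a):]):
            return True
    return False


# Hypothetical NFA accepting the strings over {0, 1} that end in "01"
DELTA = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}
GRAMMAR, START = nfa_to_grammar(DELTA, "q0", {"q2"})

print(generates(GRAMMAR, START, "1101"))  # True
print(generates(GRAMMAR, START, "10"))    # False
```

Each derivation of the grammar mirrors one run of the automaton: the single variable in the sentential form is the current state, and the rule q → ε can only finish the derivation in an accepting state.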
1.2 Context-free Languages
Type 2 Grammars
The rules in a type 2 grammar are of the form
A→β
where A ∈ V and β ∈ (Σ ∪ V )∗ .
We say αAβ ⇒_G αγβ iff A → γ ∈ R. L(G) = {w ∈ Σ∗ | S ⇒∗_G w}
By definition, Type 2 grammars describe exactly the class of context-free languages.
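As a small illustration of the one-step relation ⇒_G for a Type 2 grammar, the sketch below computes every sentential form reachable in a single step. The grammar S → 0S1 | ε (for the language {0^n 1^n | n ≥ 0}) and the dictionary encoding are assumptions chosen here; variables are taken to be single characters.

```python
def one_step(form, rules):
    """Return every form obtained from `form` by rewriting one variable once.

    `rules` maps a variable to the list of its right-hand sides;
    "" stands for epsilon.
    """
    result = set()
    for i, symbol in enumerate(form):
        for gamma in rules.get(symbol, ()):
            # alpha A beta  =>  alpha gamma beta
            result.add(form[:i] + gamma + form[i + 1:])
    return result


# Assumed example grammar: S -> 0S1 | epsilon
RULES = {"S": ["0S1", ""]}

print(one_step("S", RULES) == {"0S1", ""})         # True
print(one_step("0S1", RULES) == {"00S11", "01"})   # True
print(one_step("0011", RULES))                     # set(): no variable left to rewrite
```

Note that the rewrite ignores the surrounding symbols α and β entirely, which is exactly what "context-free" refers to.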
1.3 Beyond Context-Free Languages
1.3.1 Type 0 Grammars
The rules in a type 0 grammar are of the form
α→β
where α, β ∈ (Σ ∪ V )∗ .
We say γ1 α γ2 ⇒_G γ1 β γ2 iff α → β ∈ R. L(G) = {w ∈ Σ∗ | S ⇒∗_G w}
Example of Type 0 Grammar
Example 2. Consider the grammar G with Σ = {a} and rules
S → $Ca# | a | ε
Ca → aaC
$D → $C
C# → D# | E
aD → Da
aE → Ea
$E → ε
The following are derivations in this grammar:
S ⇒ $Ca# ⇒ $aaC# ⇒ $aaE ⇒ $aEa ⇒ $Eaa ⇒ aa
S ⇒ $Ca# ⇒ $aaC# ⇒ $aaD# ⇒ $aDa# ⇒ $Daa# ⇒ $Caa#
  ⇒ $aaCa# ⇒ $aaaaC# ⇒ $aaaaE ⇒ $aaaEa ⇒ $aaEaa
  ⇒ $aEaaa ⇒ $Eaaaa ⇒ aaaa
L(G) = {a^i | i is a power of 2}
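The derivations of Example 2 can be reproduced mechanically with a bounded breadth-first search over sentential forms. The sketch below is an illustration, not part of the original notes. It assumes that the blank right-hand sides in the rule table stand for ε (so S → ε and $E → ε), which makes the empty string derivable as well, and it cuts the search off at a chosen maximum form length so that it terminates.

```python
from collections import deque

# Rules of Example 2 as (left-hand side, right-hand side) pairs.
# Assumption: the blank right-hand sides in the notes are epsilon, written "".
RULES = [
    ("S", "$Ca#"), ("S", "a"), ("S", ""),
    ("Ca", "aaC"), ("$D", "$C"),
    ("C#", "D#"), ("C#", "E"),
    ("aD", "Da"), ("aE", "Ea"),
    ("$E", ""),
]


def derivable_words(rules, start, terminals, max_form_len):
    """Collect terminal strings derivable via forms of length <= max_form_len."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if all(ch in terminals for ch in form):
            words.add(form)
        for lhs, rhs in rules:
            i = form.find(lhs)
            while i != -1:
                # gamma1 alpha gamma2  =>  gamma1 beta gamma2
                nxt = form[:i] + rhs + form[i + len(lhs):]
                if len(nxt) <= max_form_len and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
                i = form.find(lhs, i + 1)
    return words


words = derivable_words(RULES, "S", {"a"}, max_form_len=11)
print(sorted(len(w) for w in words if w))  # [1, 2, 4, 8]
```

Deriving a^n in this grammar passes through forms of length up to n + 3 (for example $aaaaC# on the way to aaaa), so a bound of 11 covers every derivation of a word with at most 8 letters.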
Expressive Power of Type 0 Grammars
Recall that any decision problem can be thought of as a formal language L, where x ∈ L iff the
answer on input x is “yes”.
Proposition 3. A decision problem L can be “solved on computers” iff L can be described by a
Type 0 grammar.
Proof. This requires some theory that we will develop over the next few weeks.
1.3.2 Type 1 Grammars
The rules in a type 1 grammar are of the form
α→β
where α, β ∈ (Σ ∪ V )∗ and |α| ≤ |β|.
We say γ1 α γ2 ⇒_G γ1 β γ2 iff α → β ∈ R. L(G) = {w ∈ Σ∗ | S ⇒∗_G w}
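The length condition |α| ≤ |β| means a sentential form never shrinks during a derivation, so every word of length at most n can be found by searching only forms of length at most n. The sketch below illustrates this with a standard textbook grammar for {a^n b^n c^n | n ≥ 1}. The grammar (S → aSBc | abc, cB → Bc, bB → bb) is not taken from these notes, and the function names are assumptions made here.

```python
from collections import deque

# Assumed textbook grammar for { a^n b^n c^n | n >= 1 }; B is a variable.
RULES = [("S", "aSBc"), ("S", "abc"), ("cB", "Bc"), ("bB", "bb")]
TERMINALS = set("abc")


def is_noncontracting(rules):
    """Check the Type 1 condition |alpha| <= |beta| for every rule."""
    return all(len(lhs) <= len(rhs) for lhs, rhs in rules)


def words_up_to(rules, start, max_len):
    """All terminal words of length <= max_len, found by exhaustive search.

    Forms longer than max_len are discarded, which loses nothing
    because no rule makes a form shorter.
    """
    assert is_noncontracting(rules)
    seen, queue, words = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if all(ch in TERMINALS for ch in form):
            words.add(form)
        for lhs, rhs in rules:
            i = form.find(lhs)
            while i != -1:
                nxt = form[:i] + rhs + form[i + len(lhs):]
                if len(nxt) <= max_len and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
                i = form.find(lhs, i + 1)
    return words


print(sorted(words_up_to(RULES, "S", 9), key=len))
# ['abc', 'aabbcc', 'aaabbbccc']
```

The same search on a Type 0 grammar would need an ad hoc cut-off, because forms there may grow and then shrink again before producing a word.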
Normal Form for Type 1 Grammars
We can define a normal form for Type 1 grammars where all rules are of the form
α1 A α2 → α1 β α2
Thus, the rules of a Type 1 grammar can be seen as rules of a CFG, where a variable A is replaced
by a string β in one step, with the only difference being that the rule can be applied only when A
occurs in the context α1 _ α2, that is, between α1 and α2.
For this reason, languages described by Type 1 grammars are called context-sensitive languages.
1.3.3 Hierarchy
Chomsky Hierarchy
Theorem 4. Type 0, Type 1, Type 2, and Type 3 grammars define a strict hierarchy of formal
languages.
Proof. Clearly a Type 3 grammar is a special Type 2 grammar, a Type 2 grammar is a special
Type 1 grammar, and a Type 1 grammar is a special Type 0 grammar.
Moreover, there is a language that has a Type 2 grammar but no Type 3 grammar
(L = {0^n 1^n | n ≥ 0}), a language that has a Type 1 grammar but no Type 2 grammar
(L = {a^n b^n c^n | n ≥ 0}), and a language with a Type 0 grammar but no Type 1 grammar.
Overview of Languages
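[Figure: nested classes of languages. From innermost to outermost: Regular (= Type 3), CFL (= Type 2), CSL (= Type 1), Type 0, all Languages. The language {0^n 1^n} is marked inside CFL and {a^n b^n c^n} inside CSL.]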