MTL783: Theory of Computation 1st Semester, 2025-2026
Lecture 2 — 28/07/2025
Lecturer: Prof Minati De Scribe: 2
Scribed by:
1. Abhishek Singh (2022MT11934)
2. Adarsh Singh (2022MT11285)
3. Mehta Maalav Pranav (2022MT11265)
4. Tirth Golwala (2022MT11967)
5. Vagesh Mahajan (2022MT11260)
1 Overview
In the last lecture, we were introduced to the course and learned basic concepts such as alphabet
(a set of symbols), symbol, string (a sequence of symbols), and the length of a string.
In this lecture, we build on those ideas and define what a language is: a set of strings formed using
an alphabet. We cover important operations on strings, including concatenation, reverse, and
length, and see how these behave. We also introduce the special sets $\Sigma^*$ and $\Sigma^+$, and give examples of
finite and infinite languages.
Later, we discuss how to describe languages using grammars. A grammar is a set of rules that
define how strings can be generated. We learn the formal definition of a grammar, how derivations
work, and see several examples of grammars generating specific types of languages, like $a^n b^n$ and
$a^n b^{n+1}$.
2 Languages
We begin with a finite, nonempty set Σ of symbols, called the alphabet.
Definition 2.1 (String). A string (or word) on Σ is a finite sequence of symbols from Σ. The
empty string, denoted λ, has length zero.
If Σ = {a, b}, then for example w = abaaa is a string on Σ.
Definition 2.2 (Concatenation). Given strings $w = a_1 a_2 \cdots a_n$ and $v = b_1 b_2 \cdots b_m$, their concatenation is
$wv = a_1 a_2 \cdots a_n b_1 b_2 \cdots b_m$.
Definition 2.3 (Reverse and Length). The reverse of w, denoted $w^R$, is the string with the symbols of
w in reverse order. The length of w, denoted |w|, is the number of symbols in w. We have |λ| = 0
and for all w,
|λw| = |wλ| = |w|.
A fundamental property is:
|uv| = |u| + |v|.
Example 2.1 (Proof of length additivity). Show that |uv| = |u| + |v| for any strings u, v. Use the
recursive definition of length: for every symbol a ∈ Σ and every string w,
|a| = 1, |wa| = |w| + 1.
The proof proceeds by induction on |v|.
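The induction can be written out as follows. This is a sketch that uses only the recursive definition above; the case v = λ is immediate from |λ| = 0 and uλ = u, so the base case is taken to be |v| = 1. The symbol n for the length in the induction hypothesis is introduced here for illustration.

```latex
\begin{align*}
\textbf{Base case } (v = a,\ |v| = 1):\quad
  & |uv| = |ua| = |u| + 1 = |u| + |v|. \\
\textbf{Hypothesis:}\quad
  & |uw| = |u| + |w| \ \text{ for all } u \text{ and all } w \text{ with } |w| \le n. \\
\textbf{Step } (v = wa,\ |w| = n):\quad
  & |uv| = |(uw)a| = |uw| + 1 = \bigl(|u| + |w|\bigr) + 1 \\
  & \phantom{|uv|} = |u| + \bigl(|w| + 1\bigr) = |u| + |wa| = |u| + |v|.
\end{align*}
```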
Next, for any string w and integer n ≥ 0, define
$w^n = \underbrace{ww \cdots w}_{n \text{ times}}, \qquad w^0 = \lambda.$
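The string operations defined so far map directly onto built-in string operations. The sketch below models strings over Σ as Python `str` values and λ as the empty string; the function names `concat`, `reverse`, and `power` are chosen here for illustration only.

```python
LAMBDA = ""  # the empty string λ


def concat(w: str, v: str) -> str:
    """Concatenation wv: the symbols of w followed by the symbols of v."""
    return w + v


def reverse(w: str) -> str:
    """Reverse w^R: the symbols of w in reverse order."""
    return w[::-1]


def power(w: str, n: int) -> str:
    """w^n: n copies of w concatenated, with w^0 = λ."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return w * n


w, v = "abaaa", "bb"
print(concat(w, v))    # abaaabb
print(reverse(w))      # aaaba
print(power("ab", 3))  # ababab
# Length additivity |wv| = |w| + |v| on this instance:
print(len(concat(w, v)) == len(w) + len(v))  # True
```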
Definition 2.4. For alphabet Σ,
Σ∗ = {all strings over Σ},
Σ+ = Σ∗ \ {λ}.
A language L over Σ is any subset L ⊆ Σ∗ .
Example 2.2. If Σ = {a, b} then
Σ∗ = {λ, a, b, aa, ab, ba, bb, . . .}.
The set {a, aa, aab} is a finite language over Σ, while
$L = \{a^n b^n : n \ge 0\}$
is an infinite language.
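Since Σ∗ is infinite, a program can only list a finite part of it. The following sketch enumerates all strings up to a chosen length and filters out the members of {a^n b^n : n ≥ 0}; the helper names `strings_up_to` and `in_anbn` are hypothetical.

```python
from itertools import product


def strings_up_to(sigma, max_len):
    """Yield every string over `sigma` of length 0..max_len, shortest first."""
    for length in range(max_len + 1):
        for symbols in product(sigma, repeat=length):
            yield "".join(symbols)


def in_anbn(w):
    """Membership test for L = {a^n b^n : n >= 0}."""
    n, remainder = divmod(len(w), 2)
    return remainder == 0 and w == "a" * n + "b" * n


sigma = ["a", "b"]
print(list(strings_up_to(sigma, 2)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
print([w for w in strings_up_to(sigma, 6) if in_anbn(w)])
# ['', 'ab', 'aabb', 'aaabbb']
```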
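Language operations: union, intersection, complement (with respect to $\Sigma^*$), concatenation
$L_1 L_2 = \{xy : x \in L_1,\ y \in L_2\}$,
reverse $L^R = \{w^R : w \in L\}$, and the closures
$L^* = \bigcup_{n \ge 0} L^n, \qquad L^+ = \bigcup_{n \ge 1} L^n$,
where $L^0 = \{\lambda\}$ and $L^n$ is L concatenated with itself n times.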
Example 2.3. Let $L = \{a^n b^n : n \ge 0\}$. Then
$L^2 = \{a^n b^n a^m b^m : n, m \ge 0\}$.
Reverse gives
$L^R = \{b^n a^n : n \ge 0\}$.
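For finite languages, the operations above can be tried out directly on Python sets. The sketch below uses a finite slice of L (only n ≤ 2), because the full language is infinite; `lang_concat`, `lang_reverse`, `lang_power`, and `star_up_to` are illustrative names, and `star_up_to` is only a finite approximation of the star closure.

```python
def lang_concat(l1, l2):
    """L1 L2 = {xy : x in L1, y in L2}."""
    return {x + y for x in l1 for y in l2}


def lang_reverse(lang):
    """L^R = {w^R : w in L}."""
    return {w[::-1] for w in lang}


def lang_power(lang, n):
    """L^n, with L^0 = {λ}."""
    result = {""}
    for _ in range(n):
        result = lang_concat(result, lang)
    return result


def star_up_to(lang, max_power):
    """Union of L^0, ..., L^max_power: a finite approximation of L*."""
    result = set()
    for n in range(max_power + 1):
        result |= lang_power(lang, n)
    return result


# Finite slice of L = {a^n b^n : n >= 0}, keeping only n <= 2.
L = {"a" * n + "b" * n for n in range(3)}
print(sorted(lang_reverse(L), key=len))  # ['', 'ba', 'bbaa']
print("abaabb" in lang_power(L, 2))      # True: n = 1, m = 2
```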
3 Grammars
To describe languages, we use grammars.
Definition 3.1 (Grammar). A grammar is a quadruple G = (V, T, S, P ) where:
• V is a finite set of objects called variables,
• T is a finite set of terminal symbols,
• S ∈ V is a special symbol called the start variable,
• P is a finite set of productions.
It is assumed that the sets V and T are nonempty and disjoint.
The production rules define how the grammar transforms one string into another, thereby defining
a language associated with the grammar.
Each production rule is of the form:
x → y,
where $x \in (V \cup T)^+$ and $y \in (V \cup T)^*$.
Given a string w = uxv, the production x → y is applicable to w, and applying it produces the new string z = uyv.
This is denoted as:
w ⇒ z.
We say that w derives z, or z is derived from w.
Successive derivations are written as:
w1 ⇒ w2 ⇒ · · · ⇒ wn ,
and the notation:
w1 ⇒∗ wn
means that w1 derives wn in zero or more steps.
Definition 3.2. Let G = (V, T, S, P ) be a grammar. Then the language generated by G is
defined as:
$L(G) = \{w \in T^* : S \Rightarrow^* w\}$.
That is, L(G) consists of all terminal strings that can be derived from the start symbol S using the
production rules in P .
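The definitions of ⇒ and L(G) can be turned into a small search procedure. The sketch below is not a general algorithm: it assumes every symbol is a single character, that left-hand sides are nonempty, and that no derivation of a string of length at most `max_len` needs a sentential form longer than `max_len + slack`. The names `one_step`, `generate`, and `slack` are hypothetical, and the grammar S → aSb, S → λ is used purely as an illustration.

```python
from collections import deque


def one_step(form, productions):
    """Yield every z with form => z, rewriting one occurrence of some x by y."""
    for x, y in productions:
        i = form.find(x)
        while i != -1:
            yield form[:i] + y + form[i + len(x):]
            i = form.find(x, i + 1)


def generate(start, productions, terminals, max_len, slack=2):
    """Terminal strings of length <= max_len found by breadth-first derivation.

    Assumption: sentential forms longer than max_len + slack are never needed.
    """
    seen = {start}
    queue = deque([start])
    language = set()
    while queue:
        form = queue.popleft()
        if len(form) <= max_len and all(c in terminals for c in form):
            language.add(form)  # form is a sentence in T*
        for nxt in one_step(form, productions):
            if len(nxt) <= max_len + slack and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return language


# Illustrative grammar: S -> aSb, S -> λ (λ is modelled as the empty string).
productions = [("S", "aSb"), ("S", "")]
print(sorted(generate("S", productions, {"a", "b"}, 6), key=len))
# ['', 'ab', 'aabb', 'aaabbb']
```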
Example 3.1. Consider the grammar:
G = ({S}, {a, b}, S, P ),
with production rules P given by:
S → aSb, S → λ.
We can derive strings using this grammar as follows:
S ⇒ aSb ⇒ aaSbb ⇒ aabb.
Thus,
S ⇒∗ aabb.
The grammar G completely defines the language L(G), although an explicit description of L(G) is not
always easy to obtain from the grammar. In this case, we can conjecture
$L(G) = \{a^n b^n : n \ge 0\}$,
and it is straightforward to prove this by induction.
We observe that the rule S → aSb is recursive. Let us prove by induction that all sentential forms
(the intermediate strings in a derivation) that still contain S are of the form
$w_i = a^i S b^i$. (1.7)
Assume that (1.7) holds for all sentential forms $w_i$ of length 2i + 1 or less. The only way to obtain
another sentential form containing S is to apply the rule S → aSb, which yields
$a^i S b^i \Rightarrow a^{i+1} S b^{i+1}$,
which is of the same form as (1.7), now with i + 1. Since the base cases i = 0 (the start symbol S itself)
and i = 1 (the form aSb) are trivially true, by induction the form holds for all i.
To produce a sentence (a terminal string), we finally apply the rule S → λ. This gives
$S \Rightarrow^* a^n S b^n \Rightarrow a^n b^n$.
Thus, the grammar G derives only strings of the form $a^n b^n$.
We must also show that every string of the form $a^n b^n$ can be derived by G. This is evident, since
we can apply S → aSb exactly n times, followed by S → λ:
$S \Rightarrow aSb \Rightarrow aaSbb \Rightarrow \cdots \Rightarrow a^n S b^n \Rightarrow a^n b^n$.
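The derivation above can be replayed mechanically. This sketch applies S → aSb exactly n times and then S → λ, recording every sentential form along the way; the function name `derive_anbn` is made up for this illustration.

```python
def derive_anbn(n):
    """Sentential forms of S => aSb => ... => a^n S b^n => a^n b^n."""
    forms = ["S"]
    for _ in range(n):
        forms.append(forms[-1].replace("S", "aSb"))  # apply S -> aSb
    forms.append(forms[-1].replace("S", ""))         # apply S -> λ
    return forms


print(" => ".join(form or "λ" for form in derive_anbn(2)))
# S => aSb => aaSbb => aabb

# Every intermediate form has the shape a^i S b^i, as claimed in (1.7).
for i, form in enumerate(derive_anbn(5)[:-1]):
    assert form == "a" * i + "S" + "b" * i
```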
Example 3.2. We wish to construct a grammar that generates the language:
$L = \{a^n b^{n+1} : n \ge 0\}$.
We build upon the idea from the earlier example that generated $a^n b^n$, and simply add one extra b.
This can be achieved by defining:
G = ({S, A}, {a, b}, S, P )
with productions:
S → Ab
A → aAb
A → λ
1. Any string derived by G is in L
Every derivation starts with S → Ab, which places one b at the right end. Each application of A → aAb
adds one a to the left of A and one b to its right, so after n such steps followed by A → λ, we obtain $a^n b^{n+1}$.
2. Any string in L can be derived
Given any n ≥ 0, we can:
• Apply S → Ab, which supplies the extra b,
• Apply A → aAb n times to get $a^n A b^n b$,
• Then apply A → λ,
yielding:
$S \Rightarrow Ab \Rightarrow^* a^n A b^n b \Rightarrow a^n b^{n+1}$.
Hence the grammar G generates exactly
$L(G) = \{a^n b^{n+1} : n \ge 0\}$.
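The three steps of the construction can be replayed in code. The sketch below starts with S → Ab, applies A → aAb n times, and finishes with A → λ; `derive_anbn1` is a hypothetical helper name.

```python
def derive_anbn1(n):
    """Sentential forms of S => Ab =>* a^n A b^n b => a^n b^(n+1)."""
    forms = ["S", "Ab"]                              # apply S -> Ab
    for _ in range(n):
        forms.append(forms[-1].replace("A", "aAb"))  # apply A -> aAb
    forms.append(forms[-1].replace("A", ""))         # apply A -> λ
    return forms


print(" => ".join(derive_anbn1(2)))
# S => Ab => aAbb => aaAbbb => aabbb
```

For n = 0 the derivation is S ⇒ Ab ⇒ b, which matches the shortest string in L.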
Example 3.3. Let Σ = {a, b}, and let $n_a(w)$ and $n_b(w)$ denote the number of occurrences of a and
b in a string w, respectively. We define:
$L = \{w \in \Sigma^* : n_a(w) = n_b(w)\}$.
The grammar G = ({S}, {a, b}, S, P ) has the following productions:
S → SS
S → aSb
S → bSa
S → λ
We aim to show that:
$L(G) = \{w \in \Sigma^* : n_a(w) = n_b(w)\}$.
Observation: All strings generated by G are in L. This is straightforward: every production that
introduces an a also introduces a b. Hence every sentential form, and therefore every terminal
string derived, contains equal numbers of a's and b's.
Thus:
$\forall w \in L(G),\ n_a(w) = n_b(w)$.
Conversely, we must show that every w ∈ L can be derived by G. Let us examine how to construct a
derivation based on the structure of w ∈ L:
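Case 1: $w = a w_1 b$. If w begins with a and ends with b, then it can be derived via
$S \Rightarrow aSb \Rightarrow^* a w_1 b$,
provided $w_1$ can be derived from S. Note that $w_1 \in L$: removing one a and one b from w leaves equal
numbers of a's and b's.
Case 2: $w = b w_1 a$. Similarly, we can derive
$S \Rightarrow bSa \Rightarrow^* b w_1 a$,
provided $w_1$ can be derived from S; again $w_1 \in L$.
Case 3: $w = w_1 w_2$. If w starts and ends with the same letter (a...a or b...b), we use a
counting argument: define a running sum that adds +1 for each a and −1 for each b. Suppose w starts
and ends with a (the other case is symmetric). The sum is +1 after the first symbol. Since
$n_a(w) = n_b(w)$, the total is 0, so the sum is −1 just before the final a. The sum changes by ±1 at
each step, so it must equal 0 after some nonempty proper prefix $w_1$. This gives us
$w = w_1 w_2$ with $n_a(w_1) = n_b(w_1)$ and $n_a(w_2) = n_b(w_2)$,
where both $w_1$ and $w_2$ are nonempty. Thus, we can derive
$S \Rightarrow SS \Rightarrow^* w_1 S \Rightarrow^* w_1 w_2$,
provided $w_1$ and $w_2$ can be derived from S.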
To make this precise, we use induction on the length of w. Let P(n) be the statement: every string
w ∈ L of length ≤ 2n can be derived from S.
Base Case: n = 0. The only string of length 0 is λ, which satisfies $n_a = n_b = 0$, and S ⇒ λ.
Inductive Step: Assume P(n) holds. Let w ∈ L with |w| = 2n + 2.
• If $w = a w_1 b$ or $w = b w_1 a$, then $w_1 \in L$ and $|w_1| = 2n$, so by the inductive hypothesis $S \Rightarrow^* w_1$.
Then:
$S \Rightarrow aSb \Rightarrow^* a w_1 b = w$ or $S \Rightarrow bSa \Rightarrow^* b w_1 a = w$.
• If w starts and ends with the same letter, use the prefix-sum method to split $w = w_1 w_2$, where
both $w_1, w_2 \in L$ are nonempty, so $|w_1|, |w_2| \le 2n$. By the inductive hypothesis:
$S \Rightarrow SS \Rightarrow^* w_1 S \Rightarrow^* w_1 w_2 = w$.
Thus, P(n + 1) holds, and by induction, every w ∈ L can be derived. Together with the observation
that every string generated by G is in L, this gives
$L(G) = \{w \in \{a, b\}^* : n_a(w) = n_b(w)\}$.
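The case analysis in the proof is constructive, so it can be read as a recursive procedure that builds a derivation. The sketch below returns the right-hand sides used in a leftmost derivation of w and then replays them to confirm the result; `rules_for` and `replay` are hypothetical names, and "λ" is used as a label for the production S → λ.

```python
def rules_for(w):
    """Right-hand sides of a leftmost derivation S =>* w, following Cases 1 to 3."""
    if set(w) - {"a", "b"} or w.count("a") != w.count("b"):
        raise ValueError(f"{w!r} is not in L")
    if w == "":
        return ["λ"]                                # S -> λ
    if w[0] != w[-1]:
        rule = "aSb" if w[0] == "a" else "bSa"      # Case 1 or Case 2
        return [rule] + rules_for(w[1:-1])
    # Case 3: same first and last letter, so the running sum is 0
    # after some nonempty proper prefix w[:i].
    total = 0
    for i, symbol in enumerate(w[:-1], start=1):
        total += 1 if symbol == "a" else -1
        if total == 0:
            return ["SS"] + rules_for(w[:i]) + rules_for(w[i:])
    raise AssertionError("unreachable for strings in L")


def replay(rules):
    """Apply each rule to the leftmost S, starting from the form S."""
    form = "S"
    for rhs in rules:
        form = form.replace("S", "" if rhs == "λ" else rhs, 1)
    return form


print(rules_for("abba"))          # ['SS', 'aSb', 'λ', 'bSa', 'λ']
print(replay(rules_for("abba")))  # abba
```

The recursion terminates because each call is made on a strictly shorter string, which mirrors the induction on |w|.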
Grammar Equivalence
Normally, a given language has many grammars that generate it. Even though these grammars are
different, they are equivalent in some sense. We say that two grammars G1 and G2 are equivalent
if they generate the same language, that is, if
L(G1 ) = L(G2 ).
As we will see later, it is not always easy to determine whether two grammars are equivalent.
Example 3.4. Consider the grammar $G_1 = (\{A, S\}, \{a, b\}, S, P_1)$, with $P_1$ consisting of the productions:
S → aAb | λ
A → aAb | λ
Here we introduce a convenient shorthand notation in which several production rules with the same
left-hand side are written on the same line, with alternative right-hand sides separated by |. For
example:
S → aAb | λ
is equivalent to the two separate productions:
S → aAb and S → λ
This grammar is equivalent to the grammar in Example 3.1. Recall that the grammar in Example 3.1 is
G = ({S}, {a, b}, S, P), where
S → aSb and S → λ,
which generates the language
$L = \{a^n b^n : n \ge 0\}$.
To establish the equivalence, we show that $G_1$ generates the same language:
$L(G_1) = \{a^n b^n : n \ge 0\}$.
(1) Every string derived from $G_1$ is in L:
Each use of the right-hand side aAb, whether from S or from A, adds one a to the left and one b to the
right of the variable. A derivation ends with S → λ or A → λ, so the numbers of a's and b's always
match and all a's precede all b's. Thus:
$\forall w \in L(G_1),\ w = a^n b^n$ for some n ≥ 0.
(2) Every string in L can be derived by $G_1$:
We first show by induction on k that $A \Rightarrow^* a^k b^k$ for all k ≥ 0.
Base Case: k = 0: A ⇒ λ.
Inductive Step: Assume $A \Rightarrow^* a^k b^k$. Then
$A \Rightarrow aAb \Rightarrow^* a\,a^k b^k\,b = a^{k+1} b^{k+1}$.
Now for n = 0 we use S ⇒ λ, and for n ≥ 1 we use
$S \Rightarrow aAb \Rightarrow^* a\,a^{n-1} b^{n-1}\,b = a^n b^n$.
Hence, $G_1$ can derive all $a^n b^n$, and:
$L(G_1) = \{a^n b^n : n \ge 0\} = L(G)$.
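As a sanity check on the proof, the two grammars can be compared on all strings up to a fixed length. Agreement up to a bound is evidence, not a proof of equivalence. The sketch assumes single-character symbols with uppercase variables, left-hand sides that contain a variable, and that sentential forms longer than `max_len + slack` are never needed; `language_up_to` and `slack` are hypothetical names.

```python
from collections import deque


def language_up_to(start, productions, max_len, slack=2):
    """Terminal strings of length <= max_len derivable from `start` (bounded search)."""
    seen, queue, found = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if not any(c.isupper() for c in form):
            # No variables left: form is a sentence, and no rule applies to it.
            if len(form) <= max_len:
                found.add(form)
            continue
        for lhs, rhs in productions:
            i = form.find(lhs)
            while i != -1:
                nxt = form[:i] + rhs + form[i + len(lhs):]
                if len(nxt) <= max_len + slack and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
                i = form.find(lhs, i + 1)
    return found


G = [("S", "aSb"), ("S", "")]                             # Example 3.1
G1 = [("S", "aAb"), ("S", ""), ("A", "aAb"), ("A", "")]   # Example 3.4
bound = 8
expected = {"a" * n + "b" * n for n in range(bound // 2 + 1)}
print(language_up_to("S", G, bound) == language_up_to("S", G1, bound) == expected)
# True
```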
References
[1] Peter Linz, An Introduction to Formal Languages and Automata, 6th Edition, Jones & Bartlett
Learning, 2016.