15-453
FORMAL LANGUAGES,
AUTOMATA AND
COMPUTABILITY
THE PUMPING LEMMA FOR
REGULAR LANGUAGES
and
REGULAR EXPRESSIONS
TUESDAY Jan 21
WHICH OF THESE ARE REGULAR ?
B = {0^n1^n | n ≥ 0}
C = { w | w has equal number of
occurrences of 01 and 10 }
D = { w | w has equal number of 1s and 0s}
THE PUMPING LEMMA
Let L be a regular language with |L| = ∞
Then there is a positive integer P s.t.
if w ∈ L and |w| ≥ P
then we can write w = xyz, where:
1. |y| > 0 (y isn’t ε)
2. |xy| ≤ P
3. For every i ≥ 0, xy^iz ∈ L
Why is it called the pumping lemma? The word w
gets PUMPED into something longer…
Proof: Let M be a DFA that recognizes L
Let P be the number of states in M
Assume w ∈ L is such that |w| ≥ P
We show: w = xyz 1. |y| > 0
2. |xy| ≤ P
3. xy^iz ∈ L for all i ≥ 0
Let r_0, r_1, …, r_|w| be the sequence of states M visits while reading w
There must be j and k such that
j < k ≤ P, and r_j = r_k (why? r_0, …, r_P are P + 1 states, but M has only P states) (Note: k - j > 0)
Let x = the first j symbols of w (takes M from r_0 to r_j)
y = the next k - j symbols of w (takes M from r_j around a loop back to r_j = r_k)
z = the rest of w (takes M from r_k to r_|w|)
[Figure: path r_0 … r_j = r_k … r_|w|, with x labeling the path into r_j, y the loop at r_j, and z the path out to r_|w|]
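The proof is constructive: run the DFA on w and cut w at the first state that repeats. A minimal Python sketch of that idea follows. The names `pump_split` and `accepts`, and the two-state example DFA, are illustrative assumptions and not part of the lecture.

```python
def pump_split(delta, start, w):
    """Split w into (x, y, z) at the first repeated state of the DFA run."""
    seen = {start: 0}            # state -> position at which it was first visited
    state = start
    for k, symbol in enumerate(w, start=1):
        state = delta[(state, symbol)]
        if state in seen:        # r_j = r_k with j < k
            j = seen[state]
            return w[:j], w[j:k], w[k:]
        seen[state] = k
    return None                  # no repeat: w is shorter than the number of states


def accepts(delta, start, accepting, w):
    """Run the DFA on w and report whether it ends in an accepting state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in accepting


# Hypothetical 2-state DFA (P = 2): accepts strings with an even number of 1s.
delta = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}
x, y, z = pump_split(delta, "even", "1001")
pumped = [x + y * i + z for i in range(4)]
```

Because the first repeat must occur among r_0, …, r_P, the split always satisfies |xy| ≤ P, and every string in `pumped` is accepted by the same DFA.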
USING THE PUMPING LEMMA
Let’s prove that
B = {0^n1^n | n ≥ 0} is not regular
Assume B is regular, with pumping length P. Let w = 0^P1^P
If B is regular, we can write w = xyz, |y| > 0,
|xy| ≤ P, and for any i ≥ 0, xy^iz is also in B
y must be all 0s: Why? |xy| ≤ P, and the first P symbols of w are 0s
xyyz has more 0s than 1s
Contradiction!
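The argument can be sanity-checked by brute force for small P: enumerate every split of w = 0^P1^P allowed by the lemma and confirm that none of them survives pumping. This is only an illustration for a few stand-in values of P, not a replacement for the proof. The helper names are assumptions.

```python
def in_B(s):
    """Membership test for B = {0^n 1^n | n >= 0}."""
    n = len(s) // 2
    return s == "0" * n + "1" * n


def splits(w, p):
    """All (x, y, z) with w = xyz, |y| > 0 and |xy| <= p (requires p <= len(w))."""
    for j in range(p + 1):               # j = |x|
        for k in range(j + 1, p + 1):    # k = |xy|
            yield w[:j], w[j:k], w[k:]


def some_split_pumps(w, p, member, max_i=3):
    """True if some legal split keeps x y^i z in the language for i = 0..max_i."""
    return any(
        all(member(x + y * i + z) for i in range(max_i + 1))
        for x, y, z in splits(w, p)
    )


P = 5                                    # stand-in value for the pumping length
w = "0" * P + "1" * P
survives = some_split_pumps(w, P, in_B)  # no split of 0^P 1^P can be pumped
```

For contrast, calling `some_split_pumps(w, P, lambda s: True)` succeeds, since the language of all strings is regular and every split pumps.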
USING THE PUMPING LEMMA
D = { w | w has equal
number of 1s and 0s}
is not regular
Assume D is regular, with pumping length P. Let w = 0^P1^P (w is in D!)
If D is regular, we can write w = xyz, |y| > 0,
|xy| ≤ P, where for any i ≥ 0, xy^iz is also in D
y must be all 0s: Why? |xy| ≤ P, and the first P symbols of w are 0s
xyyz has more 0s than 1s
Contradiction!
WHAT DOES C LOOK LIKE?
C = { w | w has equal number of
occurrences of 01 and 10}
= { w | w = 1, w = 0, w = ε or
w starts with a 0 and ends with a 0 or
w starts with a 1 and ends with a 1 }
1 ∪ 0 ∪ ε ∪ 0(0∪1)*0 ∪ 1(0∪1)*1
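The claim that C equals this regular expression can be checked exhaustively on short strings. The sketch below counts occurrences of 01 and 10 directly and compares the result with a Python `re` translation of the expression (`|` for ∪, an optional group for ε). The length bound 10 is an arbitrary assumption.

```python
import re
from itertools import product

# Translation of 1 ∪ 0 ∪ ε ∪ 0(0∪1)*0 ∪ 1(0∪1)*1 into Python regex syntax.
C_REGEX = re.compile(r"(?:1|0|0[01]*0|1[01]*1)?")


def count_occurrences(w, pair):
    """Number of positions at which the 2-symbol string `pair` occurs in w."""
    return sum(1 for i in range(len(w) - 1) if w[i:i + 2] == pair)


def in_C(w):
    """Direct definition of C: equal number of 01 and 10 occurrences."""
    return count_occurrences(w, "01") == count_occurrences(w, "10")


def all_strings(max_len):
    """Every binary string of length 0..max_len."""
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            yield "".join(symbols)


# Strings on which the definition and the regular expression disagree.
mismatches = [w for w in all_strings(10)
              if in_C(w) != bool(C_REGEX.fullmatch(w))]
```

An empty `mismatches` list is evidence for the characterization on these lengths. The underlying reason is that 01 and 10 occurrences alternate along w, so the counts agree exactly when w starts and ends with the same symbol.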
REGULAR EXPRESSIONS
(expressions representing languages)
σ is a regexp representing {σ}
ε is a regexp representing {ε}
∅ is a regexp representing ∅
If R1 and R2 are regular expressions
representing L1 and L2 then:
(R1R2) represents L1 ⋅ L2
(R1 ∪ R2) represents L1 ∪ L2
(R1)* represents L1*
PRECEDENCE
* before ⋅ before ∪ (star binds tightest, then concatenation, then union)
EXAMPLE
R1*R2 ∪ R3 = ( ( R1* ) R2 ) ∪ R3
{ w | w has exactly a single 1 }
0*10*
What language does ∅* represent?
{ε}
{ w | w has length ≥ 3 and its 3rd symbol is 0 }
(0∪1)(0∪1)0(0∪1)*
{ w | every odd position of w is a 1 }
(1(0 ∪ 1))*(1 ∪ ε)
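These examples translate almost symbol for symbol into Python's `re` syntax, which makes them easy to try out. In the sketch below, `[01]` stands for (0∪1), `1?` stands for (1 ∪ ε), and `fullmatch` is used because a regexp here must describe the whole string. The dictionary keys and the `matches` helper are illustrative names.

```python
import re

# Slide regexps rewritten in Python syntax.
EXAMPLES = {
    "exactly one 1":             r"0*10*",
    "3rd symbol is 0":           r"[01][01]0[01]*",
    "every odd position is a 1": r"(?:1[01])*1?",
}


def matches(name, w):
    """True if the whole string w is described by the named regexp."""
    return re.fullmatch(EXAMPLES[name], w) is not None


checks = [
    matches("exactly one 1", "0010"),                 # one 1: in the language
    matches("exactly one 1", "0110"),                 # two 1s: not in it
    matches("3rd symbol is 0", "110"),                # 3rd symbol 0: in
    matches("3rd symbol is 0", "11"),                 # too short: not in
    matches("every odd position is a 1", "1011"),     # positions 1 and 3 are 1: in
    matches("every odd position is a 1", "0"),        # position 1 is 0: not in
]
```

Note that Python's `re` module supports features beyond regular expressions in the sense of this lecture. Only union, concatenation, and star are used here.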
EQUIVALENCE
L can be represented by a regexp
⇔ L is regular
1. L can be represented by a regexp
⇒ L is regular
2. L can be represented by a regexp
⇐
L is a regular language
1. Given regular expression R, we show there
exists NFA N such that R represents L(N)
Induction on the length of R:
Base Cases (R has length 1):
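R = σ: NFA with a start state and a single σ-arrow to an accept state
R = ε: NFA whose start state is also its accept state, with no arrows
R = ∅: NFA with a start state, no accept states, and no arrows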
Inductive Step:
Assume R has length k > 1,
and that every regular expression of length < k
represents a regular language
Three possibilities for R:
R = R1 ∪ R2 (Union Theorem!)
R = R1 R2 (Concatenation)
R = (R1)* (Star)
Therefore: L can be represented by a regexp
⇒ L is regular
Give an NFA that accepts the
language represented by (1(0 ∪ 1))*
[Figure: NFA for (1(0 ∪ 1))*, with arrows labeled ε, 1, and 0,1, and an ε-arrow looping back]
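The inductive step can be turned into code directly: one small constructor per case, each gluing sub-NFAs together with ε-arrows. The sketch below is one possible encoding, not the lecture's own. The `NFA` class, the edge-list representation, and the use of `None` as the ε label are all assumptions. Each sub-NFA should be built fresh, since the constructors share state names with their arguments.

```python
from itertools import count

_fresh = count()  # supplies unused state names


class NFA:
    def __init__(self, start, accept, edges):
        self.start = start      # start state
        self.accept = accept    # set of accept states
        self.edges = edges      # list of (p, label, q); label None means ε


def symbol(a):                  # base case R = σ
    s, t = next(_fresh), next(_fresh)
    return NFA(s, {t}, [(s, a, t)])


def union(n1, n2):              # new start state with ε-arrows to both starts
    s = next(_fresh)
    return NFA(s, n1.accept | n2.accept,
               n1.edges + n2.edges + [(s, None, n1.start), (s, None, n2.start)])


def concat(n1, n2):             # ε-arrows from n1's accept states to n2's start
    return NFA(n1.start, set(n2.accept),
               n1.edges + n2.edges + [(f, None, n2.start) for f in n1.accept])


def star(n):                    # new accepting start state, ε-arrows back to n's start
    s = next(_fresh)
    return NFA(s, n.accept | {s},
               n.edges + [(s, None, n.start)] + [(f, None, n.start) for f in n.accept])


def accepts(n, w):
    """Simulate the NFA on w by tracking the set of reachable states."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            p = stack.pop()
            for a, label, b in n.edges:
                if a == p and label is None and b not in seen:
                    seen.add(b)
                    stack.append(b)
        return seen

    current = closure({n.start})
    for ch in w:
        current = closure({b for a, label, b in n.edges
                           if a in current and label == ch})
    return bool(current & n.accept)


# NFA for (1(0 ∪ 1))*
nfa = star(concat(symbol("1"), union(symbol("0"), symbol("1"))))
```

With this construction `accepts(nfa, "1011")` holds while `accepts(nfa, "1")` does not, matching the language of (1(0 ∪ 1))*.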
2. L can be represented by a regexp
⇐
L is a regular language
Proof idea: Transform an NFA for L into a
regular expression by removing states and
re-labeling arrows with regular expressions
[Figure: the NFA with a new start state and a new accept state attached by ε-arrows]
Add unique and distinct start and accept states
While machine has more than 2 states:
Pick an internal state, rip it out and
re-label the arrows with regexps,
to account for the missing state
[Figure: an internal state with an incoming 0-arrow, a self-loop on 1, and an outgoing 0-arrow is ripped out; the path through it is replaced by a single arrow labeled 01*0]
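[Figure: the GNFA with its new start and accept states attached by ε-arrows]
While machine has more than 2 states:
More generally, to rip out q2:
[Figure: q1 to q2 labeled R(q1,q2), self-loop on q2 labeled R(q2,q2), q2 to q3 labeled R(q2,q3), and q1 to q3 labeled R(q1,q3)]
becomes
[Figure: q1 to q3 labeled R(q1,q2)R(q2,q2)*R(q2,q3) ∪ R(q1,q3)]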
[Figure: GNFA N with q0 to q1 on ε, self-loop on q1 labeled a, q1 to q2 on b, self-loop on q2 labeled a,b, and q2 to q3 on ε]
R(q0,q3) = (a*b)(a∪b)*
represents L(N)
Formally: Add qstart and qaccept to create G (GNFA)
Run CONVERT(G): (Outputs a regexp)
If #states = 2
return the expression on the arrow
going from qstart to qaccept
If #states > 2
select qrip∈Q different from qstart and qaccept
define Q′ = Q – {qrip}
define R′ as:
R′(qi,qj) = R(qi,qrip)R(qrip,qrip)*R(qrip,qj) ∪ R(qi,qj)
(R′ = the regexps for edges in G′)
Q′ and R′ define G′ (a GNFA)
We note that G and G′ are equivalent
return CONVERT(G′)
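CONVERT can be sketched in a few lines of Python. The version below keeps regexps as plain strings in the slide notation, uses a missing dictionary entry (`None`) for ∅, and simplifies around ε and ∅ so the output stays readable. The string representation, the helper names, and the choice of which state to rip first are assumptions made for illustration.

```python
def has_top_level_union(r):
    """True if r contains a ∪ that is not inside parentheses."""
    depth = 0
    for ch in r:
        depth += ch == "("
        depth -= ch == ")"
        if ch == "∪" and depth == 0:
            return True
    return False


def union(r, s):
    if r is None:
        return s                      # ∅ ∪ s = s
    if s is None:
        return r
    return r if r == s else f"{r} ∪ {s}"


def concat(*parts):
    if any(p is None for p in parts):
        return None                   # concatenating with ∅ gives ∅
    kept = [f"({p})" if has_top_level_union(p) else p
            for p in parts if p != "ε"]
    return "".join(kept) or "ε"


def star(r):
    if r is None or r == "ε":
        return "ε"                    # ∅* and ε* both represent {ε}
    return f"{r}*" if len(r) == 1 else f"({r})*"


def convert(states, R, q_start, q_accept):
    """states: list of GNFA states; R: dict (qi, qj) -> regexp string."""
    if len(states) == 2:
        return R.get((q_start, q_accept))
    q_rip = next(q for q in states if q not in (q_start, q_accept))
    rest = [q for q in states if q != q_rip]
    loop = star(R.get((q_rip, q_rip)))
    new_R = {}
    for qi in rest:
        for qj in rest:
            if qi == q_accept or qj == q_start:
                continue              # no arrows out of accept or into start
            via = concat(R.get((qi, q_rip)), loop, R.get((q_rip, qj)))
            r = union(via, R.get((qi, qj)))
            if r is not None:
                new_R[(qi, qj)] = r
    return convert(rest, new_R, q_start, q_accept)


# GNFA whose only arrows are q0 -ε-> q1, q1 -a-> q1, q1 -b-> q2,
# q2 -(a ∪ b)-> q2, q2 -ε-> q3, with q0 as start and q3 as accept.
R = {("q0", "q1"): "ε", ("q1", "q1"): "a", ("q1", "q2"): "b",
     ("q2", "q2"): "a ∪ b", ("q2", "q3"): "ε"}
result = convert(["q0", "q1", "q2", "q3"], R, "q0", "q3")
# result == "a*b(a ∪ b)*"
```

The result agrees with (a*b)(a∪b)* up to redundant parentheses. Ripping states in a different order can produce a different but equivalent expression.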
Claim: CONVERT(G) is equivalent to G
Proof by induction on k (number of states in G)
Base Case:
k=2
Inductive Step:
Assume claim is true for k-1 state GNFAs
Recall that G and G′ are equivalent
But, by the induction hypothesis, G′ is
equivalent to CONVERT(G′)
Thus: CONVERT(G) = CONVERT(G′) is equivalent to G
QED
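[Figure: worked example. A three-state NFA on q1, q2, q3 over {a, b} gets new start and accept states attached by ε-arrows, and its internal states are then ripped out one at a time]
With only q1 left between the new start and accept states:
self-loop on q1: bb ∪ (a ∪ ba)b*a
q1 to accept: b ∪ (a ∪ ba)b*
Result: (bb ∪ (a ∪ ba)b*a)* (b ∪ (a ∪ ba)b*)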
Convert the NFA to a regular expression
[Figure: three-state NFA on q1, q2, q3 with arrows labeled a and b]
[Figure: summary diagram relating DFA, NFA, Regular Language and Regular Expression; the link marked DEFINITION is the one defining a regular language]
WWW.FLAC.WS
Finish Chapter 1 of the book for next time