Regular Expressions
Regular expressions describe regular languages.
Example: (a + b·c)* describes the language
{a, bc}* = {λ, a, bc, aa, abc, bca, ...}
Courtesy Costas Busch - RPI
Recursive Definition
Primitive regular expressions: ∅, λ, and a (each letter a of the alphabet)
Given regular expressions r1 and r2,
r1 + r2
r1 · r2
r1*
(r1)
are regular expressions.
Examples
A regular expression: (a + b·c)*·(c + ∅)
Not a regular expression: (a + b +)
Languages of Regular Expressions
L(r): language of regular expression r
Example:
L((a + b·c)*) = {λ, a, bc, aa, abc, bca, ...}
Definition
For primitive regular expressions:
L(∅) = ∅
L(λ) = {λ}
L(a) = {a}
Definition (continued)
For regular expressions r1 and r2:
L(r1 + r2) = L(r1) ∪ L(r2)
L(r1 · r2) = L(r1) L(r2)
L(r1*) = (L(r1))*
L((r1)) = L(r1)
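
The four rules above can be tried out on finite sets of strings. The following is a minimal sketch, not part of the original slides: the function names `union`, `concat` and `star` and the length bound `max_len` are assumptions introduced here, and the bound is needed only because (L(r1))* is infinite in general.

```python
from itertools import product

def union(l1, l2):
    """L(r1 + r2) = L(r1) ∪ L(r2)."""
    return l1 | l2

def concat(l1, l2, max_len):
    """L(r1 · r2) = L(r1) L(r2), keeping only strings of length <= max_len."""
    return {x + y for x, y in product(l1, l2) if len(x) + len(y) <= max_len}

def star(l1, max_len):
    """L(r1*) = (L(r1))*, truncated to strings of length <= max_len."""
    result = {""}      # the empty string λ is always in a star closure
    frontier = {""}    # strings found in the previous round
    while frontier:
        frontier = concat(frontier, l1, max_len) - result
        result |= frontier
    return result

# L((a + b·c)*) restricted to strings of length <= 3
lang = star(union({"a"}, concat({"b"}, {"c"}, 3)), 3)
print(sorted(lang, key=lambda s: (len(s), s)))
# ['', 'a', 'aa', 'bc', 'aaa', 'abc', 'bca']
```

Parentheses need no function of their own, since L((r1)) = L(r1); in the sketch they correspond to the nesting of the function calls.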
Example
Regular expression: (a + b)·a*
L((a + b)·a*) = L((a + b)) L(a*)
= L(a + b) L(a*)
= (L(a) ∪ L(b)) (L(a))*
= ({a} ∪ {b}) ({a})*
= {a, b} {λ, a, aa, aaa, ...}
= {a, aa, aaa, ..., b, ba, baa, ...}
Example
Regular expression r = (a + b)*(a + bb)
L(r) = {a, bb, aa, abb, ba, bbb, ...}
Example
Regular expression r = (aa)*(bb)*b
L(r) = {a^(2n) b^(2m) b : n, m ≥ 0}
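
One way to gain confidence in a set-builder description like this is to compare it with the expression on all short strings. The sketch below assumes Python's `re` module as the matcher; the helper `in_set` and the length bound of 8 are illustrative choices, not part of the slides.

```python
import re
from itertools import product

pattern = re.compile(r"(aa)*(bb)*b")

def in_set(s):
    """True if s = a^(2n) b^(2m) b for some n, m >= 0."""
    n_a = len(s) - len(s.lstrip("a"))   # length of the leading run of a's
    rest = s[n_a:]                      # must be an odd, nonempty run of b's
    return n_a % 2 == 0 and set(rest) == {"b"} and len(rest) % 2 == 1

# every string over {a, b} of length 0..8
words = ["".join(p) for k in range(9) for p in product("ab", repeat=k)]
mismatches = [w for w in words if bool(pattern.fullmatch(w)) != in_set(w)]
print(mismatches)   # an empty list means the two descriptions agree
```

Agreement on short strings is evidence, not a proof; the proof is the derivation from the definition of L(r).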
Example
Regular expression r = (0 + 1)*00(0 + 1)*
L(r) = { all strings with at least two consecutive 0s }
Example
Regular expression r = (1 + 01)*(0 + λ)
L(r) = { all strings without two consecutive 0s }
Equivalent Regular Expressions
Definition:
Regular expressions r1 and r2 are equivalent if L(r1) = L(r2)
Example
L = { all strings without two consecutive 0s }
r1 = (1 + 01)*(0 + λ)
r2 = (1*011*)*(0 + λ) + 1*(0 + λ)
L(r1) = L(r2) = L, so r1 and r2 are equivalent regular expressions.
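
The equivalence can be spot-checked by brute force on all binary strings up to a fixed length. This is a sketch under assumptions: Python's `re` syntax is used, so the union written + on the slides becomes `|`, and (0 + λ) is written `0?`; the bound of 10 is arbitrary.

```python
import re
from itertools import product

r1 = re.compile(r"(1|01)*0?")
r2 = re.compile(r"(1*011*)*0?|1*0?")

def no_double_zero(s):
    """Membership test for L, stated directly."""
    return "00" not in s

# every string over {0, 1} of length 0..10
words = ["".join(p) for k in range(11) for p in product("01", repeat=k)]
agree = all(
    bool(r1.fullmatch(w)) == bool(r2.fullmatch(w)) == no_double_zero(w)
    for w in words
)
print(agree)   # True
```

`fullmatch` is used rather than `search` because membership in L(r) requires the whole string to match.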
Task
Q)
1) Let S = {ab, bb} and T = {ab, bb, bbbb}. Show
that S* = T*. [Hint: show S* ⊆ T* and T* ⊆ S*.]
2) Let S = {ab, bb} and T = {ab, bb, bbb}. Show
that S* ≠ T* but S* ⊂ T*.
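
Before writing the proofs, both claims can be explored experimentally. The sketch below builds the Kleene closure only up to a length bound; the helper `star`, the names `T1` and `T2`, and the bound `N` are assumptions made for illustration, and a bounded check does not replace the proof asked for.

```python
def star(base, max_len):
    """All concatenations of strings from base with length <= max_len, plus Λ."""
    result = {""}
    frontier = {""}
    while frontier:
        frontier = {
            w + s for w in frontier for s in base if len(w) + len(s) <= max_len
        } - result
        result |= frontier
    return result

S = {"ab", "bb"}
T1 = {"ab", "bb", "bbbb"}   # T of part 1
T2 = {"ab", "bb", "bbb"}    # T of part 2
N = 10

print(star(S, N) == star(T1, N))                   # True: same strings up to length N
print(star(S, N) < star(T2, N))                    # True: proper subset
print("bbb" in star(T2, N), "bbb" in star(S, N))   # True False
```

The last line points at a witness for part 2: bbb is in T* but every string in S* has even length.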
3) Let S = {a, bb, bab, abaab} be a set of strings.
Are abbabaabab and baabbbabbaabb in S*? Does
any word in S* have an odd number of b's?
Solution: Every attempt to split abbabaabab into
members of S gets stuck; for example, the grouping
(a)(bb)(abaab)ab leaves a final b that cannot be
matched by any member of S, so abbabaabab is not
in S*. The word baabbbabbaabb begins with baa, and
no member of S is a prefix of it, so it cannot be
grouped into members of S either; hence
baabbbabbaabb is not in S*. Since each string in S
has an even number of b's, every concatenation of
such strings also has an even number of b's, so no
string with an odd number of b's can be in S*.
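
Trying groupings by hand can miss a case, so membership in S* is worth checking mechanically. The following is a minimal sketch using dynamic programming over prefixes; the function name `in_star` is an assumption introduced here.

```python
def in_star(word, base):
    """True if word is a concatenation of zero or more strings from base."""
    # reachable[i] is True when word[:i] can be split into members of base
    reachable = [False] * (len(word) + 1)
    reachable[0] = True                     # the empty prefix is Λ
    for i in range(len(word)):
        if reachable[i]:
            for s in base:
                if word.startswith(s, i):
                    reachable[i + len(s)] = True
    return reachable[len(word)]

S = {"a", "bb", "bab", "abaab"}
print(in_star("abbabaabab", S))                 # False
print(in_star("baabbbabbaabb", S))              # False
print(in_star("abbbab", S))                     # True: (a)(bb)(bab)
print(all(s.count("b") % 2 == 0 for s in S))    # True: every member has an even number of b's
```

Because the table records every reachable prefix, the check covers all possible groupings rather than a single attempted one.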
Task
Q1) Is there any case when S+ contains Λ?
If yes, then justify your answer.
Q2) Prove that for any set of strings S
i. (S+)* = (S*)*
Solution: In general Λ is not in S+, while
Λ always belongs to S*. Taking the closure puts
Λ in (S+)*, so (S+)* and S* generate the same
set of strings. (S*)* and S* also generate the
same set of strings.
Hence (S+)* = (S*)*.
Q2) continued…
ii) (S+)+=S+
Solution: S+ generates all possible strings
that can be obtained by concatenating one or
more strings of S. (S+)+ generates all possible
strings that can be obtained by concatenating
one or more strings of S+; each of these is
again a concatenation of strings of S, so no
new string is generated. Conversely, every
string of S+ is in (S+)+.
Hence (S+)+ = S+.
Q2) continued…
iii) Is (S*)+ = (S+)*?
Solution: Λ belongs to S*, so Λ also belongs
to (S*)+ as a member of S*. In general Λ may
not belong to S+, but Λ automatically belongs
to (S+)*. Apart from Λ, both sides consist of
the concatenations of strings of S, so both
are equal to S*.
Hence (S*)+ = (S+)*.
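
The three identities of Q2 can be checked on a sample set, with every closure truncated at a length bound. This is only an illustration under assumptions: the helpers `plus` and `star`, the sample set S and the bound N are chosen here and are not part of the lecture.

```python
def plus(base, max_len):
    """Concatenations of one or more strings from base, length <= max_len."""
    result = set()
    frontier = {s for s in base if len(s) <= max_len}
    while frontier:
        result |= frontier
        frontier = {
            w + s for w in frontier for s in base if len(w) + len(s) <= max_len
        } - result
    return result

def star(base, max_len):
    """Kleene closure: the positive closure together with Λ."""
    return {""} | plus(base, max_len)

S = {"ab", "bb"}
N = 8

print(star(plus(S, N), N) == star(star(S, N), N))   # i.   (S+)* = (S*)*
print(plus(plus(S, N), N) == plus(S, N))            # ii.  (S+)+ = S+
print(plus(star(S, N), N) == star(plus(S, N), N))   # iii. (S*)+ = (S+)*
```

Truncation is harmless here: any string of length at most N in an outer closure is built from inner strings of length at most N.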
Regular Expression
As discussed earlier, a* generates
Λ, a, aa, aaa, …
and a+ generates a, aa, aaa, aaaa, …, so
the languages L1 = {Λ, a, aa, aaa, …} and
L2 = {a, aa, aaa, aaaa, …} can simply
be expressed by a* and a+, respectively.
a* and a+ are called the regular expressions
(RE) for L1 and L2, respectively.
Note: a+, aa* and a*a all generate L2.
Recursive definition of Regular
Expression (RE)
Step 1: Every letter of Σ, and also Λ, is a regular
expression.
Step 2: If r1 and r2 are regular expressions, then
1. (r1)
2. r1 r2
3. r1 + r2 and
4. r1*
are also regular expressions.
Step 3: Nothing else is a regular expression.
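
The recursive definition maps directly onto a small tree data type: one node kind per rule. The sketch below is an assumption-laden illustration, not the lecture's own notation: the class names and `to_python_re` are hypothetical, parentheses (rule 1) are implicit in the nesting of the tree, and matching is delegated to Python's `re` module, where union is written `|` instead of +.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Letter:       # Step 1: a letter of Σ
    char: str

@dataclass(frozen=True)
class Lambda:       # Step 1: the empty string Λ
    pass

@dataclass(frozen=True)
class Concat:       # Step 2: r1 r2
    left: object
    right: object

@dataclass(frozen=True)
class Union:        # Step 2: r1 + r2
    left: object
    right: object

@dataclass(frozen=True)
class Star:         # Step 2: r1*
    inner: object

def to_python_re(r):
    """Translate a textbook RE tree into Python re syntax."""
    if isinstance(r, Letter):
        return re.escape(r.char)
    if isinstance(r, Lambda):
        return "(?:)"                    # empty group matches only Λ
    if isinstance(r, Concat):
        return to_python_re(r.left) + to_python_re(r.right)
    if isinstance(r, Union):
        return "(?:" + to_python_re(r.left) + "|" + to_python_re(r.right) + ")"
    if isinstance(r, Star):
        return "(?:" + to_python_re(r.inner) + ")*"
    raise TypeError(f"not a regular expression: {r!r}")   # Step 3

# (a + b)*b, built only from the rules above
r = Concat(Star(Union(Letter("a"), Letter("b"))), Letter("b"))
pattern = to_python_re(r)
print(pattern)                                    # (?:(?:a|b))*b
print(re.fullmatch(pattern, "aab") is not None)   # True
print(re.fullmatch(pattern, "aba") is not None)   # False
```

The final `raise` plays the role of Step 3: anything not built from the listed node kinds is rejected.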
Defining Languages (continued)…
Method 3 (Regular Expressions)
Consider the language L={Λ, x, xx, xxx,…}
of strings, defined over Σ = {x}.
We can write this language as the Kleene
star closure of the alphabet Σ, i.e. L = Σ* = {x}*.
This language can also be expressed by the
regular expression x*.
Similarly the language L={x, xx, xxx,…},
defined over Σ = {x}, can be expressed by
the regular expression x+.
Now consider another language L,
consisting of all possible strings, defined
over Σ = {a, b}. This language can
also be expressed by the regular
expression
(a + b)*.
Now consider another language L, of
strings having exactly one double a (aa) and
no other a, defined over Σ = {a, b}; then its
regular expression may be
b*aab*
Now consider another language L, of
strings of even length, defined over Σ = {a, b};
then its regular expression may be
((a+b)(a+b))*
Now consider another language L, of
strings of odd length, defined over Σ = {a, b};
then its regular expression may be
(a+b)((a+b)(a+b))* or
((a+b)(a+b))*(a+b)
Remark
It may be noted that a language may be
expressed by more than one regular
expression, while for a given regular
expression there exists a unique language
generated by that regular expression.
Example:
The language, defined over Σ = {a, b}, of
words having at least one a may be
expressed by the regular expression
(a+b)*a(a+b)*.
The language, defined over Σ = {a, b}, of
words having at least one a and one b may
be expressed by the regular expression
(a+b)*a(a+b)*b(a+b)* + (a+b)*b(a+b)*a(a+b)*.
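
A quick experiment can confirm that the second expression accepts exactly the words with at least one a and one b. The sketch assumes Python's `re` syntax, where the union + is written `|`; the length bound of 8 is arbitrary.

```python
import re
from itertools import product

# (a+b)*a(a+b)*b(a+b)* + (a+b)*b(a+b)*a(a+b)*
pattern = re.compile(r"(a|b)*a(a|b)*b(a|b)*|(a|b)*b(a|b)*a(a|b)*")

def has_a_and_b(w):
    """The language stated in words: at least one a and at least one b."""
    return "a" in w and "b" in w

# every word over {a, b} of length 0..8
words = ["".join(p) for k in range(9) for p in product("ab", repeat=k)]
print(all(bool(pattern.fullmatch(w)) == has_a_and_b(w) for w in words))   # True
```

The two halves of the union cover the two possible orders: some a occurring before some b, or some b occurring before some a.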
Consider the language, defined over
Σ = {a, b}, of words starting with double
a and ending in double b; then its
regular expression may be aa(a+b)*bb
Consider the language, defined over
Σ = {a, b}, of words starting with a and
ending in b OR starting with b and
ending in a; then its regular expression
may be a(a+b)*b + b(a+b)*a
TASK
Consider the language, defined over
Σ = {a, b}, of words beginning with a;
then its regular expression may be a(a+b)*
Consider the language, defined over
Σ = {a, b}, of words beginning and
ending in the same letter; then its regular
expression may be
(a+b) + a(a+b)*a + b(a+b)*b
TASK
Consider the language, defined over
Σ = {a, b}, of words ending in b; then
its regular expression may be (a+b)*b.
Consider the language, defined over
Σ = {a, b}, of words not ending in a;
then its regular expression may be (a+b)*b
+ Λ. It is to be noted that this language
may also be expressed by ((a+b)*b)*.
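
That the two expressions for "not ending in a" describe the same language can be spot-checked on short words. The sketch assumes Python's `re` syntax: + becomes `|`, and the "+ Λ" alternative is written with `?`; the bound of 8 is arbitrary.

```python
import re
from itertools import product

e1 = re.compile(r"((a|b)*b)?")    # (a+b)*b + Λ
e2 = re.compile(r"((a|b)*b)*")    # ((a+b)*b)*

# every word over {a, b} of length 0..8
words = ["".join(p) for k in range(9) for p in product("ab", repeat=k)]
print(all(
    bool(e1.fullmatch(w)) == bool(e2.fullmatch(w)) == (not w.endswith("a"))
    for w in words
))   # True
```

Note that Λ is in the language: the empty word does not end in a, which is why (a+b)*b alone is not enough.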
More Practical RE Examples
Algebraic Properties
Using R.E for Tokenization
R.E. and Tokens – Our job
Assign a token type to a substring of the input stream,
based on its match with an RE.
The fifth line recognizes comments or white
space but does not report back to the parser.
White space is discarded and the lexer resumes.
The comments for this lexer begin with two
dashes, contain only alphabetic characters, and
end with a newline.
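
The lexer specification these notes refer to is not reproduced here, so the following is only a sketch of the idea in Python. The token names and the first four rules are hypothetical; only the fifth rule (SKIP) follows the description above: a comment is two dashes, then alphabetic characters, then a newline, and white space is discarded.

```python
import re

# Hypothetical token rules; only the SKIP rule follows the description above.
TOKEN_RULES = [
    ("NUMBER", r"[0-9]+"),
    ("IDENT",  r"[A-Za-z][A-Za-z0-9]*"),
    ("ASSIGN", r":="),
    ("PLUS",   r"\+"),
    ("SKIP",   r"--[A-Za-z]*\n|[ \t\n]+"),   # comment or white space
]
# One combined RE with a named group per token type
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_RULES))

def tokenize(text):
    """Yield (token_type, lexeme) pairs; SKIP matches are not reported."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            yield m.lastgroup, m.group()
        pos = m.end()    # resume lexing right after the match

print(list(tokenize("x := 42 + y --comment\n")))
# [('IDENT', 'x'), ('ASSIGN', ':='), ('NUMBER', '42'), ('PLUS', '+'), ('IDENT', 'y')]
```

Each token type is assigned by whichever named group matched, which is the "matching with an RE" step; the SKIP branch consumes input without yielding anything to the caller.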
Summing Up Lecture 2
RE, Recursive definition of RE, defining
languages by RE, {x}*, {x}+, (a+b)*,
Language of strings having exactly one aa,
Language of strings of even length,
Language of strings of odd length, RE defines
unique language (as Remark), Language of
strings having at least one a, Language of
strings having at least one a and one b,
Language of strings starting with aa and
ending in bb, Language of strings starting
with and ending in different letters.