Lecture # 1
CS-312
Theory of Automata and Formal
Languages
Books
1. Introduction to Computer Theory, by Daniel I. A. Cohen,
Latest Edition
2. An Introduction to Formal Languages and Automata,
by Peter Linz, 6th Edition
3. Reference Book: Introduction to Languages and the
Theory of Computation, by J. C. Martin, McGraw-Hill,
Latest Edition
Agenda 1
Objective of Course
Introduction
Theory of Automata
Evolution of languages
Type of Languages
Difference between Formal and Natural languages
Alphabets, Strings, Words
Valid and invalid alphabet
Length of a String, Reverse of a String
Objective of Course
This course is about the fundamental capabilities and
limitations of computers.
This theory is highly relevant to practice, for example,
in the design of new programming languages, compilers,
and string-searching algorithms.
This course helps you to develop problem-solving skills.
Every time we introduce a new machine, we will learn
its language; and every time we develop a new language,
we will try to find a machine that corresponds to it.
Introduction
Psychologists, mathematicians, engineers and some of
the first computer scientists shared a common
interest:
To model the human thought process.
Whether in the brain or in a computer.
Warren McCulloch, a neurophysiologist, and Walter Pitts,
a logician, were the first to present a
description of finite automata, in 1943.
What does Theory of Automata mean?
The word “Theory” means that this subject is more
mathematical and less practical.
It is not like other courses such as programming, but
it is the foundation of many other practical
subjects.
Automata is a Greek word.
It is the plural of automaton, and it means “something
that works automatically”.
An automaton accepts input, produces output, may have some
temporary storage, and can make decisions in
transforming the input into the output.
Theory of Automata
Theory of Automata is the study of:
• Abstract machines ('mathematical' machines or systems)
• and computational problems (languages) that can be solved using
these machines.
These abstract machines are called automata.
This subject plays a major role in:
• Theory of Computation
• Compiler Construction
• Formal Verification
• Defining Computer Languages
• Parsing
Introduction to Languages
Language: The collection of inputs through which a
machine is operated is said to be the language of that machine.
Basically, a language comprises three different types of entities:
letters, words, sentences
Letters are from a finite alphabet { a, b, c, . . . , z }
Words are made up of certain combinations of letters from the
alphabet.
Not all combinations of letters lead to a valid English word.
Sentences are made up of certain combinations of words.
Not all combinations of words lead to a valid English sentence.
So we see that some basic units are combined to make bigger units.
Introduction to Languages
There are two types of languages
• Informal Language (Semantic language)
• Formal Language (Syntactic language)
Informal/Semantic Language
• Concerned with the interpretation or meaning of a sentence
(what output to produce in context of machines)
• Affected by ambiguity the most.
Formal/Syntactic Language
Defines rules for combining the units to form valid sentences
(computer programs in context of machines)
Informal languages
Natural languages are generally defined informally
Human brain
Capable of understanding incoherent or even invalid sentences,
e.g. “You mangoes like”
Rectifies grammatical errors etc.
Resolves ambiguity
Interprets according to context
Uses supporting aids such as facial expressions and body language
How to communicate with machines?
Need a language: what sort?
Machines don’t have a human mind, though they may
imitate it partially
They would fail on incorrect or ambiguous input
Thus we need a precise, explicit and universal definition of
the communication language
Formal languages
Rules defined explicitly and clearly
No ambiguities
Lets the machine
Interpret an input uniformly every time, i.e.
always produce the same output for a particular
input
Explicitly reject invalid input
Formal Languages
Need uniformly understandable notation
Representations
Alphabet
Σ = {a, b, …, z}   Σ = {A, …, Z}
Binary Numbers
Σ = {0,1}
Whole Numbers
Σ = {0,1,2,3,4,5,6,7,8,9}
Alphabets
Definition:
A finite non-empty set of symbols (letters) is called an
alphabet. It is denoted by Σ (Greek letter sigma).
Example:
Σ={a,b}
Σ={0,1} //important, as this is the alphabet of the language
//which the computer understands.
Σ={i,j,k}
Strings
Definition:
A concatenation of finitely many symbols from the alphabet is
called a string.
Example:
If Σ= {a,b} then
a, abab, aaabb, ababababababababab, … are strings over Σ.
NOTE:
EMPTY STRING or NULL STRING
Sometimes a string with no symbols at all is used. It is denoted
by λ (small Greek letter lambda) or Λ (capital Greek
letter lambda) and is called the empty string or null string.
The capital lambda will mostly be used to denote the
empty string in further discussion.
Words
Definition:
Words are strings belonging to some language.
Example:
If Σ= {x} then a language L can be defined as
L={x^n : n=1,2,3,…} or L={x,xx,xxx,…}
Here x,xx,… are the words of L
NOTE:
Words are strings that belong to some specific language.
All words are strings, but not all strings are
words.
Alphabets Guideline
The following three rules are important when defining
the alphabet of a language:
• It should not contain the empty symbol Λ
• It should be finite
• It should not be ambiguous
Valid/In-valid Alphabets
While defining an alphabet, an alphabet may contain
letters consisting of a group of symbols, for example
Σ1= {B, aB, bab, d}.
Now consider an alphabet
Σ2= {B, Ba, bab, d} and
a string BababB.
Valid/In-valid alphabets
String : BababB , Σ2= {B, Ba, bab, d}
A scanner can attempt to factor this string in two different ways
(Ba), (bab), (B)
(B), (abab), (B)
In the second attempt the group (abab) is not a letter of Σ2,
so that attempt does not yield a string defined over Σ2.
This is due to ambiguity in the defined alphabet Σ2: after reading B,
the scanner cannot tell whether it has read the letter B or the
beginning of the letter Ba.
Valid/In-valid alphabets
When this string is scanned by the compiler
(Lexical Analyzer), the first symbol B is identified as a
letter belonging to Σ2,
while the lexical analyzer would not be able to
identify the second letter.
So while defining an alphabet, it should be kept in
mind that ambiguity should not be created.
Remarks:
While defining an alphabet whose letters consist of
more than one symbol, no letter should start
with another letter of the same alphabet, i.e. one letter
should not be the prefix of another, as in {B, Ba}.
However, a letter may end in another letter of the same
alphabet, i.e. one letter may be the suffix of
another, as in {B, aB}.
Conclusion
Σ1= {B, aB, bab, d}
Σ2= {B, Ba, bab, d}
Σ1 is a valid alphabet while Σ2 is an invalid alphabet.
Ambiguity Example
Σ1= {A, aA, bab, d}.
Σ2= {A, Aa, bab, d}.
Σ1 is a valid alphabet while Σ2 is an invalid alphabet.
Similarly,
Σ1= {a, ab, ac}.
Σ2= {a, ba, ca}.
In this case, Σ1 is an invalid alphabet while Σ2 is a valid
alphabet.
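The prefix rule can be checked mechanically. The following is a minimal Python sketch, not part of the course material: the function name is_valid_alphabet is hypothetical, and letters are assumed to be given as ordinary Python strings.

```python
from itertools import permutations

def is_valid_alphabet(letters):
    """Check the three guideline rules for a finite set of letters.

    Hypothetical helper for illustration: a Python set is always finite,
    so only non-emptiness, the empty letter and the prefix rule are tested.
    """
    letters = set(letters)
    if not letters or "" in letters:
        return False
    # x.startswith(y) means the letter y is a prefix of the letter x.
    return not any(x.startswith(y) for x, y in permutations(letters, 2))

print(is_valid_alphabet({"B", "aB", "bab", "d"}))  # True
print(is_valid_alphabet({"B", "Ba", "bab", "d"}))  # False
print(is_valid_alphabet({"a", "ab", "ac"}))        # False
print(is_valid_alphabet({"a", "ba", "ca"}))        # True
```

The check compares every ordered pair of distinct letters, which is enough for the small alphabets used in these examples.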
Length of Strings
Definition:
The length of string s, denoted by |s|,
is the number of letters in the string.
Example:
Σ={a,b}
s=ababa
Tokenize s = (a), (b),(a),(b),(a)
|s|=5
Length of Strings
Example:
Σ= {B, aB, bab, d}
s=BaBbabBd
Tokenizing=(B), (aB), (bab), (B), (d)
|s|=5
One important point to note here is that aB has length 1,
not 2. Similarly, bab has length 1, not 3.
Example:
If s=xxxx then length of s is 4
Length(428) is 3
Length(Λ) is 0
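The tokenizing step used in these examples can be sketched in Python. This is an illustrative sketch only: the function name tokenize is hypothetical, and the alphabet is assumed to be valid (no empty letter, no letter a prefix of another), so at most one letter can match at each position.

```python
def tokenize(s, alphabet):
    """Split s into letters of the alphabet, scanning left to right.

    Assumes a valid alphabet: no empty letter and no letter that is a
    prefix of another, so each position has at most one matching letter.
    """
    tokens = []
    i = 0
    while i < len(s):
        matches = [letter for letter in alphabet if s.startswith(letter, i)]
        if len(matches) != 1:
            raise ValueError(f"cannot tokenize at position {i}")
        tokens.append(matches[0])
        i += len(matches[0])
    return tokens

tokens = tokenize("BaBbabBd", {"B", "aB", "bab", "d"})
print(tokens)       # ['B', 'aB', 'bab', 'B', 'd']
print(len(tokens))  # 5, the length |s| of the string

print(len(tokenize("ababa", {"a", "b"})))  # 5
print(len(tokenize("", {"a", "b"})))       # 0, the length of the empty string
```

The length of a string is the number of tokens, not the number of characters, which is why aB and bab each count as 1.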
Reverse of a String
Definition:
The reverse of a string s, denoted by Rev(s) or s^r, is
obtained by writing the letters of s in reverse order.
Example:
If s=abc is a string defined over Σ={a,b,c}
then Rev(s) or s^r = cba
Reverse of a String
Example:
Σ= {B, aB, bab, d}
s=BaBbabBd
Tokenize=(B),(aB),(bab),(B),(d)
Rev(s)=dBbabaBB
Note:
Reversing symbol by symbol instead of letter by letter gives
the wrong result, Rev(s)= dBbabBaB
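The difference between reversing letters and reversing individual symbols can be seen in a short Python sketch. The function name reverse_string is hypothetical, and the token list is assumed to come from tokenizing s over Σ= {B, aB, bab, d}.

```python
def reverse_string(tokens):
    """Reverse a string given as a list of letters (tokens)."""
    return "".join(reversed(tokens))

# Letters of s = BaBbabBd over the alphabet {B, aB, bab, d}
tokens = ["B", "aB", "bab", "B", "d"]

print(reverse_string(tokens))  # dBbabaBB  (letters reversed, correct)
print("".join(tokens)[::-1])   # dBbabBaB  (symbols reversed, wrong)
```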
Agenda 2
Introduction to defining languages
Recursive definition of a language
Regular Expression
Kleene Closure
Union and Concatenation
Introduction to Defining Languages
In the Theory of Automata, languages can be defined in
different ways, such as
Descriptive definition
Recursive definition
Using Regular Expressions(RE)
Using Finite Automaton(FA) etc.
Descriptive definition of language
The language is defined by describing the conditions
imposed on its words.
Example:
The language L of strings of odd length, defined over
Σ={a}, can be written as
L={a, aaa, aaaaa,…..}
Example:
The language L of strings that do not start with a,
defined over Σ={a,b,c}, can be written as
L={b, c, ba, bb, bc, ca, cb, cc, …}
Example of Descriptive Language
Example:
The language L of strings of length 2, defined over
Σ={0,1,2}, can be written as
L={00, 01, 02,10, 11,12,20,21,22}
Example:
The language L of strings ending in 0, defined over
Σ ={0,1}, can be written as
L={0,00,10,000,010,100,110,…}
Example of Descriptive Language
Example: The language EQUAL, of strings with number of
a’s equal to number of b’s, defined over Σ={a,b}, can be
written as
L= {Λ ,ab, aabb, abab,baba,abba,…}
Example: The language EVEN-EVEN, of strings with even
number of a’s and even number of b’s, defined over
Σ={a,b}, can be written as
L={Λ, aa, bb, aaaa,aabb,abab, abba, baab, baba, bbaa,
bbbb,…}
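A descriptive definition translates directly into a membership test: a function that returns True exactly when a string satisfies the stated condition. The sketch below is illustrative only. The function names are hypothetical, each letter is assumed to be a single character, and Λ is represented by the empty Python string.

```python
def over(s, alphabet):
    """True if every symbol of s belongs to the alphabet."""
    return set(s) <= set(alphabet)

def in_ends_in_zero(s):
    """Strings ending in 0, defined over {0, 1}."""
    return over(s, "01") and s.endswith("0")

def in_equal(s):
    """EQUAL: number of a's equals number of b's, over {a, b}."""
    return over(s, "ab") and s.count("a") == s.count("b")

def in_even_even(s):
    """EVEN-EVEN: even number of a's and even number of b's, over {a, b}."""
    return over(s, "ab") and s.count("a") % 2 == 0 and s.count("b") % 2 == 0

print(in_ends_in_zero("110"))  # True
print(in_equal("abba"))        # True
print(in_equal("aab"))         # False
print(in_even_even(""))        # True, Λ has zero a's and zero b's
print(in_even_even("ab"))      # False
```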
Example of Descriptive Language
Example: The language {a^n b^n}, of strings defined over
Σ={a,b}, as
L={a^n b^n : n=1,2,3,…}, can be written as
L={ab, aabb, aaabbb,aaaabbbb,…}
Example: The language {a^n b^n a^n}, of strings defined over
Σ={a,b}, as
L={a^n b^n a^n : n=1,2,3,…}, can be written as
L= {aba, aabbaa, aaabbbaaa,aaaabbbbaaaa,…}
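Membership in these two languages can be tested by rebuilding the only candidate word of the right length and comparing. This is a minimal Python sketch with hypothetical function names, assuming n starts at 1 as in the definitions above.

```python
def in_anbn(s):
    """True if s = a^n b^n for some n >= 1."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n

def in_anbnan(s):
    """True if s = a^n b^n a^n for some n >= 1."""
    n = len(s) // 3
    return n >= 1 and s == "a" * n + "b" * n + "a" * n

print(in_anbn("aabb"))      # True
print(in_anbn("abab"))      # False
print(in_anbnan("aabbaa"))  # True
print(in_anbnan("aba"))     # True
print(in_anbnan("abba"))    # False
```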
Example of Descriptive Language
Example:
The language PRIME, of strings defined over
Σ={a}, as
L={a^p : p is prime}, can be written as
L={aa,aaa,aaaaa,aaaaaaa,aaaaaaaaaaa,…}
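For PRIME the condition is on the length of the word only. Below is a minimal Python sketch using simple trial division; the function names is_prime and in_prime are hypothetical.

```python
from math import isqrt

def is_prime(p):
    """Trial division: True if p is a prime number."""
    if p < 2:
        return False
    return all(p % d != 0 for d in range(2, isqrt(p) + 1))

def in_prime(s):
    """True if s = a^p for some prime p, i.e. s is a word of PRIME."""
    return set(s) <= {"a"} and is_prime(len(s))

print([n for n in range(12) if in_prime("a" * n)])  # [2, 3, 5, 7, 11]
```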
An Important language
PALINDROME:
The language consisting of Λ and the strings s
defined over Σ such that Rev(s)=s.
Note that the words of
PALINDROME are called palindromes.
Example:For Σ={a,b},
PALINDROME={Λ , a, b, aa, bb, aaa, aba, bab,
bbb, ...}
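The listing above can be reproduced by testing Rev(s)=s on every string up to a given length. This is a minimal Python sketch with hypothetical function names. It assumes single-character letters, so that reversing the Python string is the same as reversing the letters, and represents Λ by the empty string.

```python
from itertools import product

def is_palindrome(s):
    """True if s equals its reverse; the empty string (Λ) qualifies."""
    return s == s[::-1]

def palindromes_up_to(alphabet, max_length):
    """All palindromes over the alphabet with length <= max_length."""
    letters = sorted(alphabet)
    result = []
    for n in range(max_length + 1):
        for combo in product(letters, repeat=n):
            word = "".join(combo)
            if is_palindrome(word):
                result.append(word)
    return result

print(palindromes_up_to({"a", "b"}, 3))
# ['', 'a', 'b', 'aa', 'bb', 'aaa', 'aba', 'bab', 'bbb']
```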
Kleene Star Closure
Given an alphabet Σ, the Kleene Star Closure of
Σ, denoted by Σ*, is the collection of all
strings defined over Σ, including Λ.
Note that the Kleene Star Closure can be
defined over any set of strings.
Examples
If Σ = {x}
Then Σ* = {Λ, x, xx, xxx, xxxx, ….}
If Σ = {0,1}
Then Σ* = {Λ, 0, 1, 00, 01, 10, 11, ….}
If Σ = {a, b, c}
Then Σ* = {Λ, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb,
cc, aaa, aab, …… }
Note
Languages generated by the Kleene Star Closure of a set of
strings are infinite languages, provided the set contains
at least one non-empty string.
(By infinite language, it is meant that the language
contains infinitely many words, each of finite length.)
PLUS Operation (+)
Plus Operation is the same as Kleene Star Closure except that it
does not generate Λ (null string) automatically.
Example:
If Σ = {0,1}
Then Σ+ = {0, 1, 00, 01, 10, 11, ….}
If Σ = {a, b, c}
Then Σ+ = {a, b, c, aa, ab, ac, ba, bb, bc,….}
Remark
Note that the Kleene Star can also be applied to any
string, e.g. a* can be considered to be all possible strings
defined over {a}, which shows that a* generates
{Λ, a, aa, aaa, …}
Note also that a+ can be considered to be all
possible non-empty strings defined over {a}, which shows
that a+ generates
{a, aa, aaa, aaaa, …}
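Since Σ* and Σ+ are infinite, a program can only list their words one by one, shortest first. The following is a minimal Python sketch using a generator. The function name strings_over and its min_length parameter are hypothetical, the alphabet is assumed to be non-empty, and Λ is represented by the empty Python string.

```python
from itertools import count, islice, product

def strings_over(alphabet, min_length=0):
    """Yield the strings over a non-empty alphabet, shortest first.

    min_length=0 enumerates the Kleene Star Closure (starting with Λ),
    min_length=1 enumerates the Plus Operation (no Λ).
    """
    letters = sorted(alphabet)
    for n in count(min_length):
        for combo in product(letters, repeat=n):
            yield "".join(combo)

# Take only the first few words, since the enumeration never ends.
print(list(islice(strings_over({"0", "1"}), 7)))
# ['', '0', '1', '00', '01', '10', '11']
print(list(islice(strings_over({"0", "1"}, min_length=1), 6)))
# ['0', '1', '00', '01', '10', '11']
print(list(islice(strings_over({"a"}), 4)))
# ['', 'a', 'aa', 'aaa']
```

The generator never terminates on its own, which mirrors the fact that these languages contain infinitely many words, each of finite length.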