FLAT
Automata Theory
(ITPC-210)
Nisha Chaurasia
FORMAL LANGUAGES
• An abstraction of the notion of a “problem”.
• Problems are cast as languages (= sets of “strings”).
• “Solutions” determine whether a given “string” is in the set or not,
e.g., is a given integer n prime?
• Memory is crucial:
• Finite Memory
• Infinite Memory
• Limited Access
• Unlimited Access
• We have different types of automata for different classes of languages.
• They differ in
• the amount of memory they have (finite vs infinite)
• what kind of access to the memory they allow.
• What does computational power depend on? (it turns out, not “speed”)
• What does it mean for a problem to be computable ?
• Are there any uncomputable functions or unsolvable problems?
• What does this mean?
• Why do we care?
FORMAL LANGUAGE
• A language can be seen as a system suitable for expression of certain ideas, facts and concepts.
• For formalizing the notion of a language one must cover all the varieties of languages such as
natural (human) languages and programming languages.
• One may broadly see that a language is a collection of sentences; a sentence is a sequence of
words; and a word is a combination of syllables.
• If one considers a language that has a script, then it can be observed that a word is a sequence
of symbols of its underlying alphabet.
• It is observed that a formal learning of a language has the following three steps.
• Learning its alphabet - the symbols that are used in the language.
• Its words - as various sequences of symbols of its alphabet.
• Formation of sentences - sequence of various words that follow certain rules of the language.
Strings
• We formally define an alphabet as a non-empty finite set.
• We normally use the symbols a, b, c, . . . with or without subscripts or 0, 1, 2, . . ., etc. for the elements
of an alphabet.
• A string over an alphabet Σ is a finite sequence of symbols of Σ. Although one writes a sequence as
(a1, a2, . . . , an), in the present context we prefer to write it as a1a2 · · · an, i.e. by putting together the
symbols in that order. A string is also known as a word or a sentence.
• Normally, we use lower case letters towards the end of English alphabet, namely z, y, x, w, etc., to
denote strings.
• Let Σ = {a, b} be an alphabet; then aa, ab, bba, baaba, . . . are some examples of strings over Σ.
• We use ε to denote the empty string.
• The set of all strings over an alphabet Σ is denoted by Σ∗. For example, if Σ = {0, 1}, then Σ∗ = {ε, 0,
1, 00, 01, 10, 11, 000, 001, . . .}.
• Although the set Σ∗ is infinite, it is a countable set. In fact, Σ∗ is countably infinite for any
alphabet Σ.
• One of the most fundamental operations used for string manipulation is concatenation.
• The binary operation concatenation on Σ∗ is associative. Also, since ε is the empty string, it
satisfies the property εx = xε = x, for any string x ∈ Σ∗.
• For a string x and an integer n ≥ 0,
we write x^{n+1} = x^n x with the base condition x^0 = ε.
• Let x be a string over an alphabet Σ. For a ∈ Σ, the number of occurrences of a in x shall be
denoted by |x|_a. The length of a string x, denoted by |x|, is defined as
|x| = ∑_{a ∈ Σ} |x|_a.
Essentially, the length of a string is obtained by counting the number of symbols in the
string.
For example, |aab| = 3, |a| = 1. Note that |ε| = 0.
• If we denote by A_n the set of all strings of length n over Σ, then one can easily ascertain
that
Σ∗ = ⋃_{n ≥ 0} A_n.
Hence, since each A_n is a finite set, Σ∗ is a countably infinite set.
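The decomposition of Σ∗ into the finite sets A_n suggests a way to list every string exactly once, which is what countability amounts to here. A minimal sketch in Python; the function names `strings_of_length` and `all_strings` are illustrative and do not come from the text:

```python
from itertools import count, islice, product

def strings_of_length(alphabet, n):
    """Return A_n, the list of all strings of length n over the alphabet."""
    return ["".join(t) for t in product(sorted(alphabet), repeat=n)]

def all_strings(alphabet):
    """Yield the strings of Σ* as A_0, A_1, A_2, ... (an endless generator)."""
    for n in count(0):
        yield from strings_of_length(alphabet, n)

sigma = {"0", "1"}
first_nine = list(islice(all_strings(sigma), 9))
print(first_nine)  # ε appears as the empty string ''
```

Each A_n has |Σ|^n elements, so every string shows up at some finite position of this listing.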
• We say that x is a substring of y if x occurs in y, that is y = uxv for some strings u and v. The
substring x is said to be a prefix of y if u = ε. Similarly, x is a suffix of y if v = ε.
• Generalizing the notation used for the number of occurrences of a symbol a in a string x, we adopt
the notation |y|_x for the number of occurrences of a string x in y.
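The string operations of this section can be tried out directly on Python strings, with ε represented by the empty string. This is only a sketch: the helper names are made up for illustration, and counting overlapping occurrences in `occurrences` is an assumption, since the text does not say how overlaps are treated.

```python
def power(x, n):
    """x^n, using x^0 = ε and x^(n+1) = x^n x."""
    return "" if n == 0 else power(x, n - 1) + x

def occurrences(y, x):
    """|y|_x for a non-empty x; overlapping occurrences are counted (assumption)."""
    return sum(1 for i in range(len(y) - len(x) + 1) if y.startswith(x, i))

def is_prefix(x, y):
    """True if y = xv for some string v."""
    return y.startswith(x)

def is_suffix(x, y):
    """True if y = ux for some string u."""
    return y.endswith(x)

print(power("ab", 3))
print(occurrences("aab", "a"), occurrences("aab", "b"))
print(is_prefix("ba", "baaba"), is_suffix("ba", "baaba"))
```

For x = aab the two symbol counts add up to |x| = 3, in line with the definition of length.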
Languages
• In order to define the notion of a language in a broad spectrum, it is felt that it can be any
collection of strings over an alphabet. Thus we define a language over an alphabet Σ as a
subset of Σ∗.
For example,
1. The empty set ∅ is a language over any alphabet. Similarly, {ε} is also a language over any
alphabet.
2. The set of all strings over {0, 1} that start with 0.
3. The set of all strings over {a, b, c} having ac as a substring.
• Note that ∅ ≠ {ε}, because the language ∅ does not contain any string but {ε} contains a string,
namely ε. Also it is evident that |∅| = 0; whereas, |{ε}| = 1.
• Since languages are sets, we can apply various well known set operations such as union,
intersection, complement, difference on languages.
• The notion of concatenation of strings can be extended to languages as follows.
The concatenation of a pair of languages L1, L2 is
L1L2 = {xy | x ∈ L1 ∧ y ∈ L2}.
For example,
1. If L1 = {0, 1, 01} and L2 = {1, 00}, then L1L2 = {01, 11, 011, 000, 100, 0100}.
2. For L1 = {b, ba, bab} and L2 = {ε, b, bb, abb}, we have
L1L2 = {b, ba, bb, bab, bbb, babb, baabb, babbb, bababb}.
• Since concatenation of strings is associative, so is the concatenation of languages. That is, for all
languages L1, L2 and L3,
(L1L2)L3 = L1(L2L3).
Hence, (L1L2)L3 may simply be written as L1L2L3.
• The number of strings in L1L2 is always less than or equal to the product of individual numbers,
i.e.
|L1L2| ≤ |L1||L2|.
• If ε ∈ L2, then L1 ⊆ L1L2; the converse also holds when L1 ≠ ∅.
• If ε ∈ L1, then L2 ⊆ L1L2; the converse also holds when L2 ≠ ∅.
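For finite languages, concatenation can be checked mechanically. The sketch below represents a language as a Python set of strings (ε is the empty string) and recomputes the two examples given above; the function name `concat` is illustrative.

```python
def concat(l1, l2):
    """L1L2 = {xy | x ∈ L1 and y ∈ L2} for finite languages given as sets."""
    return {x + y for x in l1 for y in l2}

first = concat({"0", "1", "01"}, {"1", "00"})
second = concat({"b", "ba", "bab"}, {"", "b", "bb", "abb"})
print(sorted(first, key=lambda w: (len(w), w)))
print(len(second))
```

In the second example the 12 pairs (x, y) give only 9 distinct strings, because for instance bab arises both as ba·b and as bab·ε; this is why |L1L2| can be strictly smaller than |L1||L2|.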
We write L^n to denote the language which is obtained by concatenating n copies of L. More
formally,
L^0 = {ε} and L^n = L^{n−1} L, for n ≥ 1.
• In the context of formal languages, another important operation is Kleene star. Kleene star
or Kleene closure of a language L, denoted by L*, is defined as
L∗ = ⋃_{n ≥ 0} L^n.
For example,
1. Kleene star of the language {01} is {ε, 01, 0101, 010101, . . .} = {(01)^n | n ≥ 0}.
2. If L = {0, 10}, then L ∗ = {ε, 0, 10, 00, 010, 100, 1010, 000, . . .}
Since an arbitrary string in L^n is of the form x1x2 · · · xn, for xi ∈ L, and L∗ = ⋃_{n ≥ 0} L^n, one can
easily observe that
L∗ = {x1x2 · · · xn | n ≥ 0 and xi ∈ L, for 1 ≤ i ≤ n}
Thus, a typical string in L∗ is a concatenation of finitely many strings of L.
• Note that, the Kleene star of the language L = {0, 1} over the alphabet Σ = {0, 1} is
L∗ = L^0 ∪ L^1 ∪ L^2 ∪ · · ·
= {ε} ∪ {0, 1} ∪ {00, 01, 10, 11} ∪ · · ·
= {ε, 0, 1, 00, 01, 10, 11, · · · }
= the set of all strings over Σ.
Thus, the earlier introduced notation Σ∗ is consistent with the notation of Kleene star by
considering Σ as a language over Σ.
• The positive closure of a language L, denoted by L+, is defined as L+ = ⋃_{n ≥ 1} L^n.
Thus, L∗ = L+ ∪ {ε}.
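Since L∗ is usually infinite, a program can only enumerate it up to a length bound. The following sketch does this for a finite language L given as a set of strings; the names `star_up_to` and `plus_up_to` and the idea of a length cut-off are choices made for this illustration.

```python
def concat(l1, l2):
    """L1L2 for finite languages given as sets of strings."""
    return {x + y for x in l1 for y in l2}

def star_up_to(lang, max_len):
    """Strings of L* of length at most max_len, for a finite language L."""
    result = {""}                       # L^0 = {ε}
    frontier = {""}
    while frontier:
        longer = {w for w in concat(frontier, lang) if len(w) <= max_len}
        frontier = longer - result      # keep only strings not seen before
        result |= frontier
    return result

def plus_up_to(lang, max_len):
    """Strings of L+ = L L* of length at most max_len."""
    return {w for w in concat(lang, star_up_to(lang, max_len)) if len(w) <= max_len}

print(sorted(star_up_to({"0", "10"}, 3), key=lambda w: (len(w), w)))
print(sorted(star_up_to({"01"}, 6), key=len))
```

With L = {0, 10} and a bound of 3, the result contains ε, 0, 00, 10, 000, 010 and 100, which matches the beginning of the listing in the example above.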
Consider the set of all strings over {0, 1} that start with 0. Note that each such string can be
seen as 0x for some x ∈ {0, 1}∗ . Thus the language can be represented by {0x | x ∈ {0, 1}∗}.
A few examples:
1. The set of all strings over {a, b, c} that have ac as substring can be written as
{xacy | x, y ∈ {a, b, c}∗}.
This can also be written as
{x ∈ {a, b, c}∗ | |x|_ac ≥ 1},
that is, the set of all strings over {a, b, c} in which the number of occurrences of the substring ac is at
least 1.
2. The set of all strings over some alphabet Σ with an even number of a’s is
{x ∈ Σ∗ | |x|_a = 2n, for some n ∈ N}.
Equivalently,
{x ∈ Σ∗ | |x|_a ≡ 0 mod 2}.
3. The set of all strings over some alphabet Σ with an equal number of a’s and b’s can be written as
{x ∈ Σ∗ | |x|_a = |x|_b}.
4. The set of all palindromes over an alphabet Σ can be written as
{x ∈ Σ∗ | x = x^R},
where xR is the string obtained by reversing x.
5. The set of all strings over some alphabet Σ that have an a in the 5th position from the right can be
written as
{xay | x, y ∈ Σ∗ and |y| = 4}.
6. The set of all strings over some alphabet Σ with no consecutive a’s can be written as
{x ∈ Σ∗ | |x|_aa = 0}.
7. The set of all strings over {a, b} in which every occurrence of b is not before an occurrence of a
can be written as
{a^m b^n | m, n ≥ 0}.
Note that, this is the set of all strings over {a, b} which do not contain ba as a substring.
Properties
The usual set theoretic properties with respect to union, intersection, complement, difference, etc. hold
even in the context of languages.
We now list certain properties of languages with respect to the newly introduced operations concatenation, Kleene
closure, and positive closure. In what follows, L, L1, L2, L3 and L4 are languages.
P1 Recall that concatenation of languages is associative.
P2 Since concatenation of strings is not commutative, we have L1L2 ≠ L2L1, in general.
P3 L{ε} = {ε}L = L.
P4 L∅ = ∅L = ∅.
P5 Distributive Properties:
1. (L1 ∪ L2)L3 = L1L3 ∪ L2L3.
2. L1(L2 ∪ L3) = L1L2 ∪ L1L3.
P6 If L1 ⊆ L2 and L3 ⊆ L4, then L1L3 ⊆ L2L4.
P7 ∅∗ = {ε}.
P8 {ε}∗ = {ε}.
P9 If ε ∈ L, then L∗ = L+.
P10 L∗L = LL∗ = L+.
P11 (L∗)∗ = L∗.
P12 L∗L∗ = L∗.
P13 (L1L2)∗L1 = L1(L2L1)∗.
P14 (L1 ∪ L2)∗ = (L1∗L2∗)∗.
FINITE REPRESENTATION
Proficiency in a language does not expect one to know all the sentences of the language; rather with
some limited information one should be able to come up with all possible sentences of the language.
Even in case of programming languages, a compiler validates a program - a sentence in the
programming language - with a finite set of instructions incorporated in it.
Thus, we are interested in a finite representation of a language - that is, by giving a finite amount of
information, all the strings of a language shall be enumerated/validated.
Given an alphabet Σ, to start with, the languages with single string {x} and ∅ can have finite
representation, say x and ∅, respectively. In this way, finite languages can also be given a finite
representation; say, by enumerating all the strings.
For example, the infinite language {ε, ab, abab, ababab, …} can be considered as the Kleene star of
the language {ab}, that is {ab}*. Thus, using Kleene star operation we can have finite representation
for some infinite languages.
To construct {x}, for x ∈ Σ* , we can use the operation concatenation over the basis elements. For
example, if x = aba then choose {a} and {b}; and concatenate {a}{b}{a} to get {aba}. Any finite
language over Σ, say {x1, . . . , xn} can be obtained by considering the union {x1} ∪ · · · ∪ {xn}.
Regular Expressions
We consider the class of languages obtained by applying union, concatenation, and Kleene star finitely many times
on the basis elements. These languages are known as regular languages and the corresponding finite
representations are known as regular expressions.
Definition 1-Regular Expression
We define a regular expression over an alphabet Σ recursively as follows.
1. ∅, ε, and a, for each a ∈ Σ, are regular expressions representing the languages ∅, {ε}, and
{a}, respectively.
2. If r and s are regular expressions representing the languages R and S, respectively, then
so are
a. (r + s) representing the language R ∪ S,
b. (rs) representing the language RS, and
c. (r∗) representing the language R∗.
In a regular expression we keep only the minimum number of parentheses required to avoid
ambiguity in the expression. For example, we may simply write r + st in case of (r + (st)).
Similarly, r + s + t for ((r + s) + t).
Definition 2-Regular Expression
If r is a regular expression, then the language represented by r is denoted by L(r). Further, a
language L is said to be regular if there is a regular expression r such that L = L(r).
Note that,
1. A regular language over an alphabet Σ is one that can be obtained from the
empty set ∅, {ε}, and {a}, for a ∈ Σ, by finitely many applications of union, concatenation
and Kleene star.
2. The smallest class of languages over an alphabet Σ which contains ∅, {ε}, and {a} and is
closed with respect to union, concatenation, and Kleene star is the class of all regular
languages over Σ.
Few Examples of Regular Expressions
• Example 1: As we observed earlier, the languages ∅, {ε}, {a}, and all finite sets are regular.
• Example 2: {a^n | n ≥ 0} is regular as it can be represented by the expression a∗.
• Example 3: Σ∗ , the set of all strings over an alphabet Σ, is regular. For instance, if Σ = {a1, a2, . . . ,
an}, then Σ∗ can be represented as (a1 + a2 + · · · + an)∗ .
• Example 4: The set of all strings over {a, b} which contain ab as a substring is regular. For instance, the
set can be written as {x ∈ {a, b}∗ | ab is a substring of x}
= {yabz | y, z ∈ {a, b}∗}
= {a, b}∗{ab}{a, b}∗
Hence, the corresponding regular expression is (a + b)∗ab(a + b)∗.
• Example 5: The language L over Σ = {0, 1} consisting of the strings that contain 01 or 10 as a substring is regular.
L = {x | 01 is a substring of x} ∪ {x | 10 is a substring of x}
= {y01z | y, z ∈ Σ∗} ∪ {u10v | u, v ∈ Σ∗}
= Σ∗{01}Σ∗ ∪ Σ∗{10}Σ∗
Since Σ∗, {01}, and {10} are regular, L is regular. In fact, at this point, one can easily
notice that
(0 + 1)∗01(0 + 1)∗ + (0 + 1)∗10(0 + 1)∗
is a regular expression representing L.
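The expression of Example 5 can be cross-checked with Python's `re` module, which writes union as `|` where these notes write `+`. The sketch compares the expression against the direct substring test on all strings of length below 8; the bound and the function names are arbitrary choices for this check.

```python
import re
from itertools import product

# (0 + 1)*01(0 + 1)* + (0 + 1)*10(0 + 1)* in Python's regex syntax
PATTERN = re.compile(r"(0|1)*01(0|1)*|(0|1)*10(0|1)*")

def matches_expression(x):
    """True if the whole string x is described by the regular expression."""
    return PATTERN.fullmatch(x) is not None

def has_01_or_10(x):
    """Direct membership test taken from the definition of L."""
    return "01" in x or "10" in x

words = ["".join(t) for n in range(8) for t in product("01", repeat=n)]
agree = all(matches_expression(w) == has_01_or_10(w) for w in words)
print(agree)
```

A finite check like this does not prove that the expression represents L, but a disagreement would expose a wrong expression quickly.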
If L is represented by a regular expression r, i.e. L(r) = L, then we may simply use r instead of L(r) to
indicate the language L. As a consequence, for two regular expressions r and r′, r ≈ r′ and r = r′
are equivalent.
GRAMMARS
• In the context of natural languages, the grammar of a language is a set of rules which are used to
construct/validate sentences of the language.
Consider the English sentence
The students study automata theory.
In order to observe that the sentence is grammatically correct, one may attribute certain rules of
English grammar to the sentence and validate it. For instance, the Article the followed by the Noun students
forms a Noun-phrase, and similarly the Noun automata theory forms a Noun-phrase. Further, study is a Verb.
Now, choose the Sentential form “Subject Verb Object” of English grammar. As a Subject or an Object can be
a Noun-phrase, by plugging in the above words one may conclude that the given sentence is a
grammatically correct English sentence.
In this process, we observe that two types of words are in the discussion.
1. The words like the, study, students.
2. The words like Article, Noun, Verb.
The main difference is that if you arrive at a stage where type (1) words appear, then you need not
say anything more about them. If you arrive at a stage where you find a word of type (2), then you
are expected to say something more about the word. For example, if the word Article comes up, then one should
say which article needs to be chosen among a, an and the. Let us call the type (1) and type (2) words
terminals and nonterminals, respectively, as per their features.
Thus, a grammar should include terminals and nonterminals along with a set of rules which attribute some
information regarding nonterminal symbols.
Components Of Grammar
Definition 2:
Let G = (N, Σ, P, S) be a grammar with V = N ∪ Σ.
1. We define a binary relation ⇒_G on V∗ by
α ⇒_G β if and only if α = α1Aα2, β = α1γα2 and A → γ ∈ P (for some α1, α2 ∈ V∗), for all α, β ∈ V∗.
2. The relation ⇒_G is called the one-step relation on G. If α ⇒_G β, then we say α yields β in one step in G.
3. The reflexive-transitive closure of ⇒_G is denoted by ⇒∗_G. That is, for α, β ∈ V∗,
α ⇒∗_G β if and only if ∃ n ≥ 0 and α0, α1, . . . , αn ∈ V∗ such that
α = α0 ⇒_G α1 ⇒_G · · · ⇒_G αn−1 ⇒_G αn = β.
4. For α, β ∈ V∗, if α ⇒∗_G β, then we say β is derived from α or α derives β. Further, α ⇒∗_G β is called a derivation in G.
5. If α = α0 ⇒_G α1 ⇒_G · · · ⇒_G αn−1 ⇒_G αn = β is a derivation, then the length of the derivation is n and it may be written as α ⇒^n_G β.
6. In a given context, if we deal with only one grammar G, then we may simply write ⇒ instead of ⇒_G.
7. If α ⇒∗ β is a derivation, then we say β is the yield of the derivation.
8. A string α ∈ V∗ is said to be a sentential form in G if α can be derived from the start symbol S of G. That is, S ⇒∗ α.
9. In particular, if α ∈ Σ∗, then the sentential form α is known as a sentence. In which case, we say α is generated by G.
10. The language generated by G, denoted by L(G), is the set of all sentences generated by G. That is,
L(G) = {x ∈ Σ∗ | S ⇒∗ x}.
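The one-step relation and the language L(G) can be explored with a small breadth-first search over sentential forms. This is a sketch under stated assumptions: every symbol is a single character, the grammar S → aSb | ε is a hypothetical example that does not appear in these notes, and the pruning rule (discard forms longer than the bound plus one) is only adequate for grammars like this one.

```python
from collections import deque

def one_step(alpha, productions):
    """All β with α ⇒ β: rewrite one occurrence of A in α using a rule A → γ."""
    result = set()
    for lhs, rhs in productions:
        i = alpha.find(lhs)
        while i != -1:
            result.add(alpha[:i] + rhs + alpha[i + len(lhs):])
            i = alpha.find(lhs, i + 1)
    return result

def sentences(productions, start, terminals, max_len):
    """Sentences x with S ⇒* x and |x| <= max_len, found breadth-first."""
    seen, queue, found = {start}, deque([start]), set()
    while queue:
        alpha = queue.popleft()
        if all(c in terminals for c in alpha):
            if len(alpha) <= max_len:
                found.add(alpha)        # α ∈ Σ*, so α is a sentence
            continue
        for beta in one_step(alpha, productions):
            # Assumed pruning rule: allow one extra symbol for a nonterminal.
            if beta not in seen and len(beta) <= max_len + 1:
                seen.add(beta)
                queue.append(beta)
    return found

# Hypothetical grammar: S → aSb | ε
P = [("S", "aSb"), ("S", "")]
print(one_step("aSb", P))
print(sorted(sentences(P, "S", {"a", "b"}, 4), key=len))
```

For this grammar the search finds ε, ab and aabb, the sentences of length at most 4; each one corresponds to a derivation S ⇒ aSb ⇒ · · · ending with the rule S → ε.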
Types of Grammars
CHOMSKY HIERARCHY
Type-3 grammars (restricted grammars)
Type-3 grammars generate regular languages. Type-3 grammars must have a single non-terminal on the left-hand
side and a right-hand side consisting of a single terminal or a single terminal followed by a single
non-terminal.
The productions must be in the form X → a or X → aY
where X, Y ∈ N (Non-terminal)
and a ∈ T (Terminal)
The rule S → ε is allowed if S does not appear on the right side of any rule.
Example: X → ε
X → a | aY
Y → b
https://www.tutorialspoint.com/automata_theory/chomsky_classification_of_grammars.htm
Type-2 grammars (context-free grammars)
Type-2 grammars generate context-free languages.
The productions must be in the form A → γ
where A ∈ N (Non terminal)
and γ ∈ (T ∪ N)* (String of terminals and non-terminals).
The languages generated by these grammars are recognized by a non-deterministic pushdown automaton.
Example: S → Xa
X → a
X → aX
X → abc
X → ε
Type-1 grammars (context-sensitive grammars)
Type-1 grammars generate context-sensitive languages.
The productions must be in the form α A β → α γ β
where A ∈ N (Non-terminal)
and α, β, γ ∈ (T ∪ N)* (Strings of terminals and non-terminals)
The strings α and β may be empty, but γ must be non-empty.
The rule S → ε is allowed if S does not appear on the right side of any rule. The languages generated by these
grammars are recognized by a linear bounded automaton.
Example: AB → AbBc
A → bcA
B → b
Type-0 grammars (unrestricted grammars)
Type-0 grammars generate recursively enumerable languages. The productions have no restrictions. They
are any phrase structure grammar, including all formal grammars.
They generate the languages that are recognized by a Turing machine.
The productions can be in the form α → β, where α is a string of terminals and non-terminals with at
least one non-terminal (so α cannot be null) and β is a string of terminals and non-terminals.
Example, S → ACaB
Bc → acB
CB → DB
aD → Db
FINITE AUTOMATA
• Regular grammars, as language generating devices, are intended to generate regular languages - the class of
languages that are represented by regular expressions. Finite automata, as language accepting devices, are
important tools to understand the regular languages better.
Example: Let us consider the regular language consisting of all strings over {a, b} having an odd number of a’s.
A digraph representation of the grammar is given as:
• In a digraph that models a system which understands a language, the nodes hold some information about the traversal.
As each node holds some information, it can be considered as a state of the system, and hence a state can be
considered as a memory-creating unit.
• As we are interested in languages having a finite representation, we restrict ourselves to those systems with a finite
number of states only. In such a system we have transitions between the states on symbols of the alphabet. Thus, we
may call them finite state transition systems. As the transitions are predefined in a finite state transition system, it
automatically changes states based on the symbols given as input. Thus a finite state transition system can also be
called a finite state automaton or simply a finite automaton - a device that works automatically. The plural form
of automaton is automata.
DEFINITION
• 5-tuple: (Q, Σ, δ, q0, F)
• Q: Finite set of states
• Σ: Finite input alphabet
• δ: Transition function
• Q × Σ → Q
• q0 ∈ Q: Initial state
• F ⊆ Q: Set of final (accepting) states
SOME EXAMPLES
[Figure: state diagram of the FSM, with arcs labeled 0 and 1]
What does this FSM do?
It accepts the empty string or any string that ends with 0.
The set of strings that take the FSM to its accepting states
is often called the language of the automaton.
ANOTHER EXAMPLE
[Figure: state diagram of the FSM, with arcs labeled 0 and 1]
• Accepts strings that start and end with the same bit.
DESIGNING FSMS
• It’s an art.
• Pretend to be an FSM and imagine the symbols of the string are coming one by one.
• Remember that there are finite states.
• So, you cannot store the entire string, but only crucial information.
• Also, you do not know when the string ends, so you should always be ready with an answer.
EXAMPLE
• Design an FSM which accepts strings over {0, 1} that have an odd number of 1’s.
• You need to remember whether there is an odd or an even number of 1’s so far.
[Figure: state diagram with states even and odd, with arcs labeled 0 and 1]
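The design idea, remembering only the parity of the 1's seen so far, translates directly into a few lines of code. A minimal sketch, with the state names even and odd taken from the slide and the function name chosen for illustration:

```python
def accepts_odd_ones(bits):
    """Simulate the two-state machine: the state records the parity of 1's seen."""
    state = "even"                      # no 1's seen yet
    for bit in bits:
        if bit == "1":                  # a 1 flips the parity; a 0 keeps it
            state = "odd" if state == "even" else "even"
    return state == "odd"               # 'odd' is the only accepting state

print(accepts_odd_ones("0100"), accepts_odd_ones("0110"))
```

Because the machine has an answer ready after every symbol, it never needs to know in advance where the string ends.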
EXAMPLE
• Design an FSM that accepts strings that contain 001 as a substring.
• There are 4 possibilities
• seen no part of 001 yet
• seen a 0
• seen a 00
• seen a 001
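The four possibilities become the four states of a table-driven machine. The transition table below is one way to fill in the details and is an assumption of this sketch (the slide's diagram is not available in the text); it is checked against a direct substring test on all short strings.

```python
from itertools import product

# Hypothetical state names: the part of 001 that has just been seen.
DELTA = {
    ("none", "0"): "0",   ("none", "1"): "none",
    ("0", "0"): "00",     ("0", "1"): "none",
    ("00", "0"): "00",    ("00", "1"): "001",
    ("001", "0"): "001",  ("001", "1"): "001",
}

def accepts(s, start="none", finals=frozenset({"001"})):
    """Run the machine on s and report whether it ends in an accepting state."""
    state = start
    for symbol in s:
        state = DELTA[(state, symbol)]
    return state in finals

words = ["".join(t) for n in range(9) for t in product("01", repeat=n)]
agree = all(accepts(w) == ("001" in w) for w in words)
print(agree)
```

Note that reading a 0 in state 00 keeps the machine in 00, since the last two symbols are still 00, and that state 001 is never left once it is reached.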
FINITE AUTOMATA
NFA with ϵ
MEALY MOORE MACHINE
FINITE AUTOMATA WITH OUTPUT
Mealy and Moore machines consist of a SIX-TUPLE:
(Q, Ʃ, δ, q0, Δ, λ)
Q = Set of states
Ʃ = Input alphabet
δ = Transition function
q0 = Initial state
Δ = Output alphabet (set of output symbols)
λ = Output mapping function
Moore machine: λ : Q → Δ
Mealy machine: λ : Q × Ʃ → Δ
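The difference between the two output functions shows up clearly in a simulation. In the sketch below, both machines are hypothetical examples (they report the parity of the 1's read so far) and are not the machines drawn on the slides; the Moore output depends on the state alone, the Mealy output on the state and the current input symbol.

```python
# Shared transition function δ of the hypothetical parity machines.
DELTA = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}
MOORE_OUT = {"even": "0", "odd": "1"}                        # λ : Q → Δ
MEALY_OUT = {(q, a): MOORE_OUT[DELTA[(q, a)]] for (q, a) in DELTA}  # λ : Q × Σ → Δ

def run_moore(s, q0="even"):
    """Output of the Moore machine: one symbol per visited state."""
    state, outputs = q0, [MOORE_OUT[q0]]
    for a in s:
        state = DELTA[(state, a)]
        outputs.append(MOORE_OUT[state])
    return "".join(outputs)

def run_mealy(s, q0="even"):
    """Output of the Mealy machine: one symbol per transition taken."""
    state, outputs = q0, []
    for a in s:
        outputs.append(MEALY_OUT[(state, a)])
        state = DELTA[(state, a)]
    return "".join(outputs)

print(run_moore("1101"), run_mealy("1101"))
```

For an input of length n the Moore machine emits n + 1 symbols (the first belongs to the initial state), while the Mealy machine emits n; here the Mealy output equals the Moore output with its first symbol dropped.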
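[Figure: state diagrams of a Moore machine with states A/0, B/0, C/0, D/1, E/1 and a Mealy machine with states A, B, C; arcs are labeled with inputs 0 and 1 (Mealy arcs as input/output)]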
MOORE MACHINE TO MEALY MACHINE
MEALY MACHINE TO MOORE MACHINE
Description of a DFA
• As shown in the figure, there are mainly three components, namely the input tape, the reading head, and the finite control. It is
assumed that a DFA has a left-justified infinite tape to accommodate an input of any length. The input tape is
divided into cells such that each cell accommodates a single input symbol. The reading head, which can read one symbol
at a time, connects the finite control to the input tape. The finite control has the states and the information of
the transition function along with a pointer that points to exactly one state.
• At a given point of time, the DFA will be in some internal state, say p, called the current state, pointed to by the pointer,
and the reading head will be reading a symbol, say a, from the input tape, called the current symbol. If δ(p, a) = q,
then at the next point of time the DFA will change its internal state from p to q (now the pointer will point to q) and
the reading head will move one cell to the right.
• Initializing a DFA with an input string x ∈ Σ∗ means that x is placed on the input tape from the leftmost (first) cell of
the tape, with the reading head placed on the first cell and the initial state set as the current state. By the time
the input is exhausted, if the current state of the DFA is a final state, then the input x is accepted by the DFA.
Otherwise, x is rejected by the DFA.
Deterministic Finite Automata
• Deterministic finite automaton is a type of finite automaton in which the transitions are deterministic, in the sense that there will be
exactly one transition from a state on an input symbol. Formally,
a deterministic finite automaton (DFA) is a quintuple A = (Q, Σ, δ, q0, F),
where
Q is a finite set called the set of states,
Σ is a finite set called the input alphabet,
q0 ∈ Q, called the initial/start state, F ⊆ Q, called the set of final/accept states, and
δ : Q × Σ → Q is a function called the transition function or next-state function.
Note that, for every state and an input symbol, the transition function δ assigns a unique next state.
Example: Let Q = {p, q, r}, Σ = {a, b}, F = {r} and let δ be given by the following table:
Transition Table
• Instead of explicitly giving all the components of the quintuple of a DFA, we may simply point out the initial state
and the final states of the DFA in the table of transition function, called transition table. For instance, we use an
arrow to point the initial state and we encircle all the final states. Thus, we can have an alternative representation of
a DFA, as all the components of the DFA now can be interpreted from this representation.
Transition Diagram
• Normally, we associate some graphical representation to understand abstract concepts better. In the
present context also we have a digraph representation for a DFA, (Q, Σ, δ, q0, F), called a state
transition diagram or simply a transition diagram which can be constructed as follows:
1. Every state in Q is represented by a node.
2. If δ(p, a) = q, then there is an arc from p to q labeled a.
3. If there are multiple arcs labeled a1, . . . , ak−1, and ak from one state to another state, then we simply put only
one arc labeled a1, . . . , ak−1, ak.
4. There is an arrow with no source into the initial state q0.
5. Final states are indicated by double circles.
The transition diagram for the DFA is given as:
Note that there are two transitions from the state r to itself on symbols a and b. As indicated in the point 3
above, these are indicated by a single arc from r to r labeled a, b.
Extended Transition Function
• Note that the transition function δ assigns a state for each state and an input symbol. This naturally can be extended
to all strings in Σ∗, i.e. assigning a state for each state and an input string.
The extended transition function δ̂ : Q × Σ∗ → Q is defined recursively as follows: For all q ∈ Q, x ∈ Σ∗ and a ∈ Σ,
δ̂(q, ε) = q and δ̂(q, xa) = δ(δ̂(q, x), a).
For example, in the DFA given, δ̂(p, aba) is q because
δ̂(p, aba) = δ(δ̂(p, ab), a)
= δ(δ(δ̂(p, a), b), a)
= δ(δ(δ(δ̂(p, ε), a), b), a)
= δ(δ(δ(p, a), b), a)
= δ(δ(q, b), a)
= δ(p, a) = q
Given p ∈ Q and x = a1a2 · · · ak ∈ Σ∗, δ̂(p, x) can be evaluated easily using the transition diagram by identifying the
state that can be reached by traversing from p via the sequence of arcs labeled a1, a2, . . . , ak.
For instance, the above case can easily be seen by traversing the path labeled aba from p to reach q,
shown as:
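The recursive definition of the extended transition function can be written down almost verbatim. In the sketch below, the entries δ(p, a) = q, δ(q, b) = p and the two loops at r are taken from the worked example and the remark on the transition diagram; the remaining two entries are assumptions made only to complete the table, since the table itself is not reproduced in the text.

```python
DELTA = {
    ("p", "a"): "q",   # used in the worked example
    ("p", "b"): "p",   # assumed entry
    ("q", "a"): "r",   # assumed entry
    ("q", "b"): "p",   # used in the worked example
    ("r", "a"): "r",   # loop at r mentioned with the transition diagram
    ("r", "b"): "r",   # loop at r mentioned with the transition diagram
}

def delta_hat(q, x):
    """Extended transition function: (q, ε) ↦ q and (q, xa) ↦ δ(δ̂(q, x), a)."""
    if x == "":
        return q
    return DELTA[(delta_hat(q, x[:-1]), x[-1])]

print(delta_hat("p", "aba"))
```

The call for aba unfolds exactly like the chain of equalities above and only touches the two entries taken from the worked example.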
• A configuration or an instantaneous description of a DFA gives the information about the current state and the
portion of the input string that is on and to the right of the reading head, i.e. the portion yet to be read. Formally, a
configuration is an element of Q × Σ∗.
• Observe that for a given input string x the initial configuration is (q0, x) and a final configuration of a DFA is of the
form (p, ε). The notion of computation in a DFA A can be described through configurations.
Definition: Let C = (p, x) and C’ = (q, y) be two configurations. If δ(p, a) = q and x = ay for some a ∈ Σ, then we say that the DFA A
moves from C to C’ in one step, denoted as C ⊢ C’.
Here, ⊢ is a binary relation on the set of configurations of A.
Definition: A computation of A on the input x is of the form C ⊢∗ C’ where C = (q0, x) and C’ = (p, ε), for some p.
Remark: Given a DFA A = (Q, Σ, δ, q0, F), x ∈ L(A) if and only if (q0, x) ⊢∗ (p, ε) for some p ∈ F.
Language of a DFA
• A string x ∈ Σ∗ is said to be accepted by a DFA A = (Q, Σ, δ, q0, F) if δ̂(q0, x) ∈ F, where δ̂ is the extended transition function. That is, when you apply the string
x in the initial state, the DFA will reach a final state.
• The set of all strings accepted by the DFA A is said to be the language accepted by A and is denoted by L(A). That
is, L(A) = {x ∈ Σ∗ | δ̂(q0, x) ∈ F}
Example: Consider the following DFA
The only way to reach the final state q2 from the initial state q0 is through the string ab, and the only way to reach
the other final state q3 is through abb. Thus, the language accepted by the DFA is {ab, abb}.
NONDETERMINISTIC FINITE AUTOMATA
• In contrast to a DFA, where we have a unique next state for a transition from a state on an input symbol, now we
consider a finite automaton with nondeterministic transitions. A transition is nondeterministic if there are several
(possibly zero) next states from a state on an input symbol or without any input. A transition without input is called an
ε-transition. A nondeterministic finite automaton is defined along the lines of a DFA, except that transitions may be
nondeterministic.
• Formally, a nondeterministic finite automaton (NFA) is a quintuple N = (Q, Σ, δ, q0, F), where Q, Σ, q0 and F are as
in a DFA; whereas, the transition function δ is as below:
δ : Q × (Σ ∪ {ε}) → ℘(Q) or δ : Q × Σ → 2^Q
is a function so that, for a given state and an input symbol (possibly ε), δ assigns a set of next states,
possibly the empty set.
Remark: Clearly, every DFA can be treated as an NFA.
• Example, Let Q = {q0, q1, q2, q3, q4}, Σ = {a, b}, F = {q1, q3} and δ be given by the following transition table.
The quintuple N = (Q, Σ, δ, q0, F) is an NFA. Along the lines of a DFA, an NFA can be represented by a state
transition diagram. For instance, the present NFA can be represented as follows:
Note the following few nondeterministic transitions in this NFA.
1. There is no transition from q0 on input symbol b.
2. There are multiple (two) transitions from q2 on input symbol a.
3. There is a transition from q0 to q4 without any input, i.e. ε-transition.
Consider the traces for the string ab from the state q0. Clearly, the following four are the possible traces.
Note that three distinct states, viz. q1, q2 and q3 are reachable from q0 via the string ab. That means, while tracing a
path from q0 for ab we consider possible insertion of ε in ab, wherever ε-transitions are defined. For example, in trace
(ii) we have included an ε-transition from q0 to q4, considering ab as εab, as it is defined. Whereas, in trace (iii) we
consider ab as aεb. It is clear that, if we process the input string ab at the state q0, then the set of next states is {q1,
q2, q3}.
Definition. Let N = (Q, Σ, δ, q0, F) be an NFA. Given an input string x = a1a2 · · · ak and a state p of N, the set of
next states δ̂(p, x) can be easily computed using a tree structure, called a computation tree of δ̂(p, x), which is
defined in the following way:
1. p is the root node
2. Children of the root are precisely those nodes which are having transitions from p via ε or a1.
3. For any node, whose branch (from the root) is labeled a1a2 · · · ai (as a resultant string by possible insertions of ε), its children are
precisely those nodes having transitions via ε or ai+1.
4. If there is a final state whose branch from the root is labeled x (as a resultant string), then mark the node by a tick mark √.
5. If the label of the branch of a leaf node is not the full string x, i.e. some proper prefix of x, then it is marked by a cross X –
indicating that the branch has reached to a dead-end before completely processing the string x.
The computation tree of δ̂(q0, abb) in the NFA is given as
Notice that the branch q0 − q4 − q4 − q3 has the label ab, as a resultant string, and as there are no further transitions
at q3, the branch is terminated without completely processing the string abb. Thus, it is indicated by marking a
cross X at the end of the branch.
ε-NFA/NFA’S WITH ε-TRANSITIONS
We extend the class of NFAs by allowing instantaneous (ε) transitions:
1. The automaton may be allowed to change its state without reading the input symbol.
2. In diagrams, such transitions are depicted by labeling the appropriate arcs with ε.
3. Note that this does not mean that ε has become an input symbol. On the contrary, we assume that the symbol ε
does not belong to any alphabet.
Definition. An ε-closure of a state p, denoted by E(p) is defined as the set of all states that are reachable from p via
zero or more ε-transitions.
As the automaton given in the example is nondeterministic, if a string is processed at a state, then there may be multiple
next states (unlike a DFA), possibly none. For example, if we apply the string bba at the state q0, then the only possible
way to process the first b is going via the ε-transition from q0 to q4 and then from q4 to q3 via b. As there are no transitions
from q3, the string cannot be processed further. Hence, the set of next states for the string bba at q0 is empty. Thus,
given a string x = a1a2 · · · an and a state p, by treating x as εa1εa2ε · · · εanε and by looking at the possible complete
branches starting at p, we find the set of next states for p via x. To introduce the notion of δ̂ in an NFA, we introduce
the notion called the ε-closure of a state.
The ε-closure of the state q, denoted E(q) or ECLOSE(q), is the set that contains q, together with all states that can
be reached starting at q by following only ε-transitions.
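Computing an ε-closure is a plain graph search that follows only ε-arcs. A minimal sketch; the dictionary `EPS` of ε-transitions is a hypothetical example and does not describe the NFA of these notes.

```python
def epsilon_closure(state, eps):
    """E(state): all states reachable from `state` via zero or more ε-transitions."""
    closure, stack = {state}, [state]
    while stack:
        p = stack.pop()
        for q in eps.get(p, set()):
            if q not in closure:        # visit each state only once
                closure.add(q)
                stack.append(q)
    return closure

# Hypothetical ε-transitions: q0 → q1 → q2 and q3 → q0.
EPS = {"q0": {"q1"}, "q1": {"q2"}, "q3": {"q0"}}
print(sorted(epsilon_closure("q0", EPS)))
print(sorted(epsilon_closure("q2", EPS)))
```

The state itself always belongs to its closure, which corresponds to following zero ε-transitions.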
Definition. A string x ∈ Σ∗ is said to be accepted by an NFA N = (Q, Σ, δ, q0, F), if
δ̂(q0, x) ∩ F ≠ ∅.
That is, in the computation tree of (q0, x) there should be a final state among the nodes marked with √. Thus, the language accepted by N
is
L(N) = {x ∈ Σ∗ | δ̂(q0, x) ∩ F ≠ ∅}
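Acceptance by an NFA can be tested by tracking the set of all states reachable on the input read so far, taking ε-closures after every step. The sketch below assumes the usual way of extending δ to strings through ε-closures; the NFA used (for the language a∗b∗) is hypothetical, and ε-transitions are stored under the empty string as input symbol.

```python
def eclose(states, delta):
    """Union of the ε-closures of the given states."""
    closure, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for q in delta.get((p, ""), set()):
            if q not in closure:
                closure.add(q)
                stack.append(q)
    return closure

def delta_hat(delta, p, x):
    """Set of states reachable from p on the string x."""
    current = eclose({p}, delta)
    for a in x:
        moved = set()
        for q in current:
            moved |= delta.get((q, a), set())
        current = eclose(moved, delta)
    return current

def accepts(delta, q0, finals, x):
    """x is accepted if some reachable state is final."""
    return bool(delta_hat(delta, q0, x) & finals)

# Hypothetical NFA for a*b*: loop on a at q0, ε-move to q1, loop on b at q1.
N = {("q0", "a"): {"q0"}, ("q0", ""): {"q1"}, ("q1", "b"): {"q1"}}
print(accepts(N, "q0", {"q1"}, "aabb"), accepts(N, "q0", {"q1"}, "ba"))
```

An empty set of reachable states plays the role of the branches marked with a cross in the computation tree: no run can continue, so the string is rejected.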
Example, Consider the following NFA
1. From the initial state q0 one can reach back to q0 via strings from a∗ or from aεb∗b, i.e. ab^+, or via a string which is a mixture of
strings from the above two sets. That is, the strings of (a + ab^+)∗ will lead us from q0 to q0.
2. Also, note that the strings of ab∗ will lead us from the initial state q0 to the final state q2.
3. Thus, any string accepted by the NFA can be of the form: a string from the set (a + ab^+)∗ followed by a string from the set ab∗.
Hence, the language accepted by the NFA can be represented by
(a + ab^+)∗ab∗
ELIMINATION OF ε-TRANSITIONS
Given an ε-NFA N, this construction produces an NFA N' such that L(N')=L(N).
Equivalence of NFA and DFA
• Two finite automata A and A’ are said to be equivalent if they accept the same language, i.e. L(A) = L(A’). In the
present context, although an NFA appears more general with a lot of flexibility, we prove that NFAs and DFAs accept the
same class of languages.
• Since every DFA can be treated as an NFA, one side is obvious. The converse, that given an NFA there exists an
equivalent DFA, is proved through the following two lemmas:
Lemma 1. Given an NFA in which there are some ε-transitions, there exists an equivalent NFA without ε-transitions.
Lemma 2. For every NFA N’ without ε-transitions, there exists a DFA A such that L(N’) = L(A).
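One standard way to obtain the DFA of Lemma 2 is the subset (powerset) construction, in which each DFA state is a set of NFA states. These notes do not spell out the proof, so the following is a sketch of that general technique and not necessarily the argument intended here; the NFA for strings ending with ab is a hypothetical example.

```python
from itertools import product

def subset_construction(delta, q0, finals, alphabet):
    """Build a DFA whose states are sets of NFA states (NFA without ε-transitions)."""
    start = frozenset({q0})
    dfa_delta, seen, todo = {}, {start}, [start]
    while todo:
        subset = todo.pop()
        for a in alphabet:
            target = frozenset(r for q in subset for r in delta.get((q, a), ()))
            dfa_delta[(subset, a)] = target
            if target not in seen:      # a new DFA state was discovered
                seen.add(target)
                todo.append(target)
    dfa_finals = {s for s in seen if s & finals}
    return dfa_delta, start, dfa_finals

def dfa_accepts(dfa_delta, start, dfa_finals, x):
    state = start
    for a in x:
        state = dfa_delta[(state, a)]
    return state in dfa_finals

# Hypothetical NFA for strings over {a, b} that end with ab.
NFA = {("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"}, ("q1", "b"): {"q2"}}
dfa_delta, start, dfa_finals = subset_construction(NFA, "q0", {"q2"}, "ab")
words = ["".join(t) for n in range(8) for t in product("ab", repeat=n)]
agree = all(dfa_accepts(dfa_delta, start, dfa_finals, w) == w.endswith("ab") for w in words)
print(agree, len({s for (s, _) in dfa_delta}))
```

Only subsets reachable from the start state are generated, so the resulting DFA here has 3 states instead of all 8 subsets of {q0, q1, q2}.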
Heuristics to Convert NFA to DFA
• We define certain heuristics to convert an NFA without ε-transitions to its equivalent DFA.
• Nondeterminism in a finite automaton without ε-transitions is clearly because of multiple transitions at a state for an
input symbol. That is, for a state q and an input symbol a, if δ(q, a) = P with |P| = 0 or |P| > 1, then such a
situation needs to be handled to avoid nondeterminism at q.
• To do so, the techniques, with examples, are:
Case 1: |P| = 0.
In this case, there is no transition from q on a. We create a new (trap) state t and give transition from q to t via a. For
all input symbols, the transitions from t will be assigned to itself. For all such nondeterministic transitions in the finite
automaton, creating only one new trap state will be sufficient.
Case 2: |P| > 1. Let P = {p1, p2, . . . , pk}.
If q ∉ P, then choose a new state p and assign a transition from q to p via a. Now all the transitions from p1, p2, . . . ,
pk are assigned to p. This avoids nondeterminism at q; but, there may be an occurrence of nondeterminism on the new
state p. One may successively apply this heuristic to avoid nondeterminism further.
If q ∈ P, then the heuristic mentioned above can be applied for all the transitions except for the one reaching to q, i.e.
first remove q from P and work as mentioned above. When there is no other loop at q, the resultant scenario with the
transition from q to q is depicted as under:
82
• Example, for the language over {a, b} which contains all those strings starting with ba, it is easy to construct an NFA
as given below.
• Clearly, the language accepted by the above NFA is {bax | x ∈ {a, b}∗}. Now, by applying the heuristics one may
propose the following DFA for the same language.
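For this language only Case 1 of the heuristics is needed: the automaton has missing transitions but no state with several next states on one symbol. The sketch below completes such a partial transition table with a single trap state t; the state names q0, q1, q2 and the table itself are assumptions, chosen to fit the description of strings starting with ba.

```python
from itertools import product

def add_trap_state(delta, states, alphabet, trap="t"):
    """Case 1: send every missing transition to one new trap state."""
    complete = dict(delta)
    for q in list(states) + [trap]:
        for a in alphabet:
            complete.setdefault((q, a), trap)   # the trap state loops to itself
    return complete

# Hypothetical partial table: q0 -b-> q1 -a-> q2, and q2 loops on a and b.
PARTIAL = {("q0", "b"): "q1", ("q1", "a"): "q2", ("q2", "a"): "q2", ("q2", "b"): "q2"}
DELTA = add_trap_state(PARTIAL, ["q0", "q1", "q2"], "ab")

def accepts(x):
    state = "q0"
    for a in x:
        state = DELTA[(state, a)]
    return state == "q2"                # q2 is the only final state

words = ["".join(t) for n in range(8) for t in product("ab", repeat=n)]
agree = all(accepts(w) == w.startswith("ba") for w in words)
print(agree)
```

This helper always adds the trap state, even when no transition is missing; in that case the extra state is simply unreachable.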