Compiler Construction
CS-4207
Instructor Name: Atif Ishaq
Lecture 6
Today’s Lecture
Recognition of Tokens
Regular Expression and FSM
Transition Diagram Construction
Recognition of Tokens : Transition Diagram
A language defined by a grammar is a (possibly infinite) set of strings
An automaton is a device that determines, by reading a string (word) one
character at a time, whether the string belongs to a specific language
A finite state automaton (FSA), for example an NFA, is an automaton that recognizes regular
languages (the languages described by regular expressions)
Simplest kind of automaton : its memory is an element of a finite set (the current state)
Recognition of Tokens : Transition Diagram
Graphically, a finite state automaton is represented by
A set of labeled states, represented as nodes in a digraph
Directed edges labeled with a character, drawn between states
One or more states designated as terminal (accepting)
One or more states designated as initial
On reading a character a ∈ ∑, the automaton may move from state S1 to state S2 if
there exists an a-labeled edge connecting S1 to S2.
A string belongs to the language if, while reading the string, the automaton
may move from an initial state to an accepting state.
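The acceptance rule above can be sketched in C by tracking the set of states the automaton may currently be in. The automaton used here is a hypothetical example that does not come from the lecture: a three-state NFA over {a, b} that accepts strings ending in "ab". The state numbering, the `delta` table and the name `nfa_accepts` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 3-state NFA over {a, b} accepting strings that end in "ab".
   State sets are bitmasks: bit i set means "may be in state i". */
enum { NUM_STATES = 3 };
static const unsigned INITIAL   = 1u << 0;   /* initial states: {0}   */
static const unsigned ACCEPTING = 1u << 2;   /* accepting states: {2} */

/* delta[s][k]: set of states reachable from s on symbol k (0 = 'a', 1 = 'b'). */
static const unsigned delta[NUM_STATES][2] = {
    { (1u << 0) | (1u << 1), 1u << 0 },  /* state 0: a -> {0,1}, b -> {0} */
    { 0,                     1u << 2 },  /* state 1: b -> {2}             */
    { 0,                     0       },  /* state 2: no outgoing edges    */
};

/* Returns 1 if some path from an initial state to an accepting state
   is labeled by the whole input string, 0 otherwise. */
int nfa_accepts(const char *input)
{
    assert(input != NULL);
    unsigned current = INITIAL;
    for (const char *p = input; *p != '\0'; p++) {
        if (*p != 'a' && *p != 'b')
            return 0;                    /* symbol outside the alphabet */
        int k = (*p == 'b');
        unsigned next = 0;
        for (int s = 0; s < NUM_STATES; s++)
            if (current & (1u << s))
                next |= delta[s][k];     /* follow every k-labeled edge */
        current = next;
    }
    return (current & ACCEPTING) != 0;
}
```

Keeping a set of states rather than a single state is what lets the same loop handle nondeterminism: the string is accepted if at least one of the possible moves ends in an accepting state.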
Recognition of Tokens : Transition Diagram
The following diagram is an NFA that recognizes the language of all strings over ∑ =
{a, b} that have an even number of a’s and an even number of b’s
[Figure: transition diagram for even a’s and b’s]
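Since the diagram itself is not reproduced in this text, here is a minimal C sketch of the usual four-state automaton for this language, reading the condition as "even number of a's and even number of b's". The state numbering and the function name `accepts_even_ab` are assumptions and may not match the slide's drawing.

```c
#include <assert.h>
#include <stddef.h>

/* States encode (parity of a's, parity of b's):
   0 = even/even (initial and accepting), 1 = odd/even,
   2 = even/odd,                          3 = odd/odd.
   This numbering is an assumption and may differ from the slide's diagram. */
int accepts_even_ab(const char *input)
{
    assert(input != NULL);
    int state = 0;
    for (const char *p = input; *p != '\0'; p++) {
        switch (*p) {
        case 'a': state ^= 1; break;   /* flip the parity of a's */
        case 'b': state ^= 2; break;   /* flip the parity of b's */
        default:  return 0;            /* symbol outside {a, b}  */
        }
    }
    return state == 0;
}
```

This particular automaton happens to be deterministic, which is a special case of an NFA: every state has exactly one outgoing edge per input symbol.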
Recognition of Tokens : Transition Diagram
relop → < | > | <= | >= | <> | =
id → letter ( letter | digit )*
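The two definitions above can be turned into small recognizers that mirror a transition diagram: branch on the first character, then look one character ahead. The sketch below is illustrative only; the token names (`LT`, `LE`, `NE`, `EQ`, `GT`, `GE`, `RELOP_NONE`) and the functions `match_relop` and `match_id` are assumptions, and "letter" is taken to mean whatever `isalpha` accepts.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes; the lecture does not fix these names. */
enum relop { RELOP_NONE, LT, LE, NE, EQ, GT, GE };

/* Matches the longest relop at the start of s.
   Stores the lexeme length in *len (0 if nothing matched). */
enum relop match_relop(const char *s, size_t *len)
{
    assert(s != NULL && len != NULL);
    *len = 0;
    switch (s[0]) {
    case '<':
        if (s[1] == '=') { *len = 2; return LE; }
        if (s[1] == '>') { *len = 2; return NE; }
        *len = 1; return LT;          /* lookahead is not part of the lexeme */
    case '=':
        *len = 1; return EQ;
    case '>':
        if (s[1] == '=') { *len = 2; return GE; }
        *len = 1; return GT;
    default:
        return RELOP_NONE;
    }
}

/* Matches letter (letter | digit)* at the start of s; returns its length,
   or 0 if s does not start with a letter. */
size_t match_id(const char *s)
{
    assert(s != NULL);
    if (!isalpha((unsigned char)s[0]))
        return 0;
    size_t n = 1;
    while (isalnum((unsigned char)s[n]))
        n++;
    return n;
}
```

For `<` and `>` the recognizer has to read one extra character before it can decide; when that character does not extend the operator, it is left in the input for the next token.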
Recognition of Tokens : Transition Diagram
A transition diagram for unsigned digits
A transition diagram for white spaces
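The two diagrams are not reproduced in this text, so the following C sketch rests on assumptions: "unsigned digits" is read as a nonempty sequence of decimal digits (no fraction or exponent), and white space is read as blanks, tabs and newlines. The names `skip_whitespace` and `match_unsigned` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Length of the run of blanks, tabs and newlines at the start of s. */
size_t skip_whitespace(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] == ' ' || s[n] == '\t' || s[n] == '\n')
        n++;
    return n;
}

/* Matches digit+ at the start of s. Returns the lexeme length (0 if s does
   not start with a digit) and stores the numeric value in *value.
   Overflow is not handled in this sketch. */
size_t match_unsigned(const char *s, unsigned long *value)
{
    assert(s != NULL && value != NULL);
    size_t n = 0;
    unsigned long v = 0;
    while (isdigit((unsigned char)s[n])) {
        v = v * 10 + (unsigned long)(s[n] - '0');
        n++;
    }
    *value = v;
    return n;
}
```

Both functions are single-state loops: the automaton stays in the same state as long as the next character belongs to the class, and stops at the first character that does not.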
What Else Does a Lexical Analyzer Do?
All keywords / reserved words are matched as ids
After the match, the symbol table or a special keyword table is consulted
The keyword table contains the string version of every keyword along with its
associated token value
When a match is found, the token is returned along with its symbolic
value, e.g., “then”, 16
If no match is found, it is assumed that an id has been discovered
if 15
then 16
begin 17
... ...
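The keyword lookup described above can be sketched as a linear search over a small table. The three entries come from the table on this slide; the elided rows are left out. The struct layout, the function name `classify` and the value `TOKEN_ID` used for plain identifiers are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOKEN_ID 1   /* hypothetical token value for identifiers */

struct keyword { const char *lexeme; int token; };

/* Entries taken from the slide's table; the elided rows are left out. */
static const struct keyword keywords[] = {
    { "if",    15 },
    { "then",  16 },
    { "begin", 17 },
};

/* Called after a lexeme has been matched as an id: returns the keyword's
   token value if the lexeme is in the table, TOKEN_ID otherwise. */
int classify(const char *lexeme)
{
    assert(lexeme != NULL);
    size_t count = sizeof keywords / sizeof keywords[0];
    for (size_t i = 0; i < count; i++)
        if (strcmp(lexeme, keywords[i].lexeme) == 0)
            return keywords[i].token;
    return TOKEN_ID;
}
```

Matching keywords as ids first keeps the transition diagrams small: no keyword needs its own states, at the cost of one table lookup per identifier.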
Transition Diagram : Code
token nexttoken()
{   while (1) {
        switch (state) {
        case 0: c = nextchar();
            if (c==blank || c==tab || c==newline) {
                state = 0;
                lexeme_beginning++;
            }
            else if (c=='<') state = 1;
            else if (c=='=') state = 5;
            else if (c=='>') state = 6;
            else state = fail();
            break;
        case 1:
            …
        case 9: c = nextchar();
            if (isletter(c)) state = 10;
            else state = fail();
            break;
        case 10: c = nextchar();
            if (isletter(c)) state = 10;
            else if (isdigit(c)) state = 10;
            else state = 11;
            break;
        …

/* fail() decides the next start state to check */
int fail()
{   forward = token_beginning;
    switch (start) {
    case 0: start = 9; break;
    case 9: start = 12; break;
    case 12: start = 20; break;
    case 20: start = 25; break;
    case 25: recover(); break;
    default: /* error */ break;
    }
    return start;
}
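The role of fail() in the code above (rewind the forward pointer, then try the next transition diagram) can be shown in a small self-contained C sketch. This is not the lecture's implementation: instead of numbered start states it tries an array of diagram functions in a fixed order, and the names `struct scanner`, `next_token`, `relop_diagram`, `id_diagram`, `num_diagram` and the `K_*` token kinds are all hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum kind { K_EOF, K_RELOP, K_ID, K_NUM, K_ERROR };   /* hypothetical names */

struct scanner {
    const char *input;
    size_t begin;     /* start of the current lexeme */
    size_t forward;   /* scanning position           */
};

/* Each diagram returns 1 and advances forward past the lexeme on success. */
static int relop_diagram(struct scanner *sc)
{
    char c = sc->input[sc->forward];
    if (c != '<' && c != '=' && c != '>')
        return 0;
    sc->forward++;
    char d = sc->input[sc->forward];
    if ((c == '<' && (d == '=' || d == '>')) || (c == '>' && d == '='))
        sc->forward++;
    return 1;
}

static int id_diagram(struct scanner *sc)
{
    if (!isalpha((unsigned char)sc->input[sc->forward]))
        return 0;
    while (isalnum((unsigned char)sc->input[sc->forward]))
        sc->forward++;
    return 1;
}

static int num_diagram(struct scanner *sc)
{
    if (!isdigit((unsigned char)sc->input[sc->forward]))
        return 0;
    while (isdigit((unsigned char)sc->input[sc->forward]))
        sc->forward++;
    return 1;
}

enum kind next_token(struct scanner *sc)
{
    static int (*const diagrams[])(struct scanner *) =
        { relop_diagram, id_diagram, num_diagram };
    static const enum kind kinds[] = { K_RELOP, K_ID, K_NUM };

    assert(sc != NULL && sc->input != NULL);
    while (sc->input[sc->forward] == ' ' || sc->input[sc->forward] == '\t' ||
           sc->input[sc->forward] == '\n')
        sc->forward++;                   /* skip white space */
    sc->begin = sc->forward;
    if (sc->input[sc->begin] == '\0')
        return K_EOF;
    for (size_t i = 0; i < sizeof kinds / sizeof kinds[0]; i++) {
        sc->forward = sc->begin;         /* rewind, as fail() does */
        if (diagrams[i](sc))
            return kinds[i];
    }
    sc->forward = sc->begin + 1;         /* no diagram matched: skip one char */
    return K_ERROR;
}
```

After each call the lexeme is the text between `begin` and `forward`. Trying the diagrams in a fixed order plays the same part as the chain of start states in fail(): when one diagram cannot match, scanning restarts from the beginning of the lexeme with the next one.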
Lecture Outcome
Significance of context free grammar in compiler construction
How to resolve associativity and precedence issues in arithmetic
expressions
Focusing on unambiguous grammar for parsing
Lecture Outcome
Token Recognition
Transition Diagram Construction