CH 3
Attributes of Tokens
<id, “y”> <assign, > <num, 31> <‘+’, > <num, 28> <‘*’, > <id, “x”>
[Figure: the lexical analyzer passes token (the lookahead) and tokenval (the token attribute) to the parser.]
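A minimal sketch of how a scanner might produce the <token, attribute> pairs shown above. It assumes the source statement was y := 31 + 28*x (the slide shows only the token stream); the names TOKEN_SPEC, MASTER, and tokenize are hypothetical.

```python
import re

# Hypothetical token rules: (token name, regular expression).
TOKEN_SPEC = [
    ("num",    r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("assign", r":="),
    ("op",     r"[+*]"),
    ("ws",     r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token, attribute) pairs; attribute is None when unused."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "ws":
            continue                          # whitespace is not a token
        if kind == "num":
            tokens.append(("num", int(lexeme)))
        elif kind == "id":
            tokens.append(("id", lexeme))
        elif kind == "op":
            tokens.append((lexeme, None))     # the operator is its own token
        else:
            tokens.append((kind, None))
    return tokens

print(tokenize("y := 31 + 28*x"))
```

Under these assumptions the result has seven pairs, in the same order as the stream on the slide.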
Examples of tokens
Lexical Errors
It is hard for a lexical analyzer to tell, without
the aid of other components, that there is a
source-code error. e.g.
fi (x == f(x)) …..
A lexical analyzer cannot tell whether fi is a
misspelling of the keyword if or an
undeclared function identifier.
In this case it is probably the parser that handles
the error, which is due to a transposition of the letters.
The lexical analyzer can detect characters
where:
Each d_i is a new symbol, not in Σ and not the
same as any other of the d's, and
Specification of Patterns for
One or more instances: r* = r+ | ϵ and r+ = r r* = r* r
Zero or one instance: r? = r | ϵ
Character classes: [a-z] = a | b | c | … | z
[abc] = a | b | c
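The shorthand identities above can be spot-checked with Python's re module. This is only a sketch: it compares patterns on a small, hand-picked sample list (an assumption, not a proof of equivalence), and same_language is a hypothetical helper name.

```python
import re

def same_language(p, q, samples):
    """True if patterns p and q accept exactly the same sample strings."""
    return all(bool(re.fullmatch(p, s)) == bool(re.fullmatch(q, s))
               for s in samples)

samples = ["", "a", "aa", "aaa", "b", "ab", "c", "z", "abc"]

print(same_language(r"a+", r"aa*", samples))          # r+ = r r*
print(same_language(r"a*", r"(a+|)", samples))        # r* = r+ | empty string
print(same_language(r"a?", r"(a|)", samples))         # r? = r | empty string
print(same_language(r"[abc]", r"(a|b|c)", samples))   # [abc] = a | b | c
```

Each call is expected to report that the two patterns agree on every sample.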
Grammars
Token ws is different from the other tokens in
that, when we recognize it, we do not return it to
the parser, but rather restart the lexical analysis
from the character that follows the whitespace.
It is the following token that gets returned to
the parser.
blank → b
tab → ^T
newline → ^M
delim → blank | tab | newline
ws → delim+
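A small sketch of the behaviour described for ws: when whitespace is recognized it is not returned, and scanning restarts at the character after it. The TOKEN pattern is a hypothetical catch-all, and next_token is an illustrative name, not part of any tool.

```python
import re

# ws -> delim+ with delim -> blank | tab | newline
WS = re.compile(r"[ \t\n]+")
# Hypothetical catch-all: identifiers, numbers, or any single other character.
TOKEN = re.compile(r"[A-Za-z]\w*|\d+|[^ \t\n]")

def next_token(text, pos):
    """Return (lexeme, new_pos), or (None, pos) at end of input."""
    m = WS.match(text, pos)
    if m:                      # ws recognized: do not return it,
        pos = m.end()          # restart from the character after it
    if pos >= len(text):
        return None, pos
    m = TOKEN.match(text, pos)
    return m.group(), m.end()

text = "  \t\nif x"
print(next_token(text, 0))
```

The first call skips the four leading whitespace characters and returns the token that follows them.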
Analyzer Do?
All keywords / reserved words are matched as ids.
After the match, the symbol table or a special
keyword table is consulted.
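The lookup described above might be sketched as follows. The contents of KEYWORDS and the identifier pattern are assumptions made for illustration; classify is a hypothetical name.

```python
import re

KEYWORDS = {"if", "then", "else"}           # hypothetical keyword table
ID = re.compile(r"[A-Za-z][A-Za-z0-9]*")    # assumed pattern for ids

def classify(lexeme):
    """Match the lexeme as an id first, then consult the keyword table."""
    if not ID.fullmatch(lexeme):
        raise ValueError(f"{lexeme!r} is not an identifier")
    if lexeme in KEYWORDS:
        return (lexeme, None)               # keyword token, no attribute
    return ("id", lexeme)                   # ordinary identifier

print(classify("if"))
print(classify("fi"))
```

Note that fi is classified as an ordinary id here, which is why the scanner alone cannot flag it as a misspelled keyword.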
Generators
Scanner generators are tools that automatically
create lexical analyzers (scanners) from
user-provided rules defined using regular
expressions.
Lex is the original Unix tool, while Flex is its
faster, modern, and free open-source
counterpart, which generates C source code
for the scanner.
These scanners then read input streams,
Finite Automata
These are essentially graphs, like transition diagrams,
with a few differences:
They are recognizers.
They simply answer "yes" or "no" for each input string.
Nondeterministic Finite automata (NFA)
Have no restrictions on the labels of their edges. A
symbol can label several edges out of the same
state, and ϵ, the empty string, is a possible label.
Deterministic Finite automata (DFA)
Have, for each state and for each symbol of its
input alphabet, exactly one edge with that symbol
leaving that state.
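The two definitions above can be turned into a simple check. This sketch assumes a transition function stored as a dictionary from (state, symbol) to a set of next states, with the empty string standing for ϵ; the representation and the name is_deterministic are illustrative choices.

```python
def is_deterministic(states, alphabet, delta):
    """delta maps (state, symbol) to a set of next states; "" stands for ϵ."""
    for s in states:
        if delta.get((s, ""), set()):
            return False                  # an ϵ-edge is only possible in an NFA
        for a in alphabet:
            if len(delta.get((s, a), set())) != 1:
                return False              # a DFA needs exactly one edge per symbol
    return True

# Two tiny hypothetical automata over the alphabet {a}.
nfa_delta = {(0, "a"): {0, 1}}            # two a-edges leave state 0
dfa_delta = {(0, "a"): {1}, (1, "a"): {1}}

print(is_deterministic({0, 1}, {"a"}, nfa_delta))
print(is_deterministic({0, 1}, {"a"}, dfa_delta))
```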
[Figure: regular expressions, NFA, DFA; one part of the diagram is marked optional.]
Transition Graph
An NFA can be diagrammatically
represented by a labeled directed
graph called a transition graph.
S = {0,1,2,3}
Σ = {a,b}
s0 = 0
F = {3}
[Figure: transition graph with start state 0, edges 0 -a-> 1 -b-> 2 -b-> 3, and loops labeled a and b on state 0.]
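The transition graph above can be written as a transition table and simulated by tracking the set of states the NFA could be in. The two loops on state 0 are read off the diagram and should be treated as an assumption; MOVE and accepts are illustrative names.

```python
# The NFA components from the slide.
S = {0, 1, 2, 3}
SIGMA = {"a", "b"}
S0 = 0
F = {3}
# Transition table: (state, symbol) -> set of next states.
# The loops on state 0 are an assumption read from the diagram.
MOVE = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}

def accepts(x):
    """Track the set of states the NFA could be in after each input symbol."""
    current = {S0}
    for c in x:
        current = {t for s in current for t in MOVE.get((s, c), set())}
    return bool(current & F)

for x in ["abb", "babb", "ab", "abba"]:
    print(x, accepts(x))
```

Under this reading the NFA accepts exactly the strings over {a,b} that end in abb.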
Transition Table
The Language Defined by an NFA
Subset construction
DFA
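From Regular Expression to NFA (Thompson’s Construction)
[Figure: one NFA fragment per case, each with start state i and accepting state f:
ϵ: i to f
a: i to f on a
r1 | r2: i branches to N(r1) and N(r2), which both lead to f
r1 r2: i, then N(r1), then N(r2), then f
r*: i, then N(r), then f]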
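A compact sketch of the cases above, written as combinators rather than a regular-expression parser. It makes two simplifications that are not on the slide: an NFA is a plain tuple (i, f, edges), and concatenation links the fragments with an ϵ-edge instead of merging states. All function names are hypothetical.

```python
from itertools import count

_fresh = count()   # supplies unused state numbers

def symbol(a):
    """NFA for a single symbol a (use "" for ϵ): i -a-> f."""
    i, f = next(_fresh), next(_fresh)
    return i, f, [(i, a, f)]

def union(n1, n2):
    """NFA for r1 | r2: new i and f joined to both fragments by ϵ-edges."""
    (i1, f1, e1), (i2, f2, e2) = n1, n2
    i, f = next(_fresh), next(_fresh)
    return i, f, e1 + e2 + [(i, "", i1), (i, "", i2), (f1, "", f), (f2, "", f)]

def concat(n1, n2):
    """NFA for r1 r2: an ϵ-edge stands in for merging f1 with i2."""
    (i1, f1, e1), (i2, f2, e2) = n1, n2
    return i1, f2, e1 + e2 + [(f1, "", i2)]

def star(n1):
    """NFA for r*: skip the fragment or loop back through it."""
    i1, f1, e1 = n1
    i, f = next(_fresh), next(_fresh)
    return i, f, e1 + [(i, "", i1), (f1, "", f), (f1, "", i1), (i, "", f)]

def closure(states, edges):
    """ϵ-closure: every state reachable through ϵ-edges alone."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for p, a, q in edges:
            if p == s and a == "" and q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def accepts(nfa, x):
    """Simulate the NFA on x by tracking the current set of states."""
    i, f, edges = nfa
    current = closure({i}, edges)
    for c in x:
        moved = {q for p, a, q in edges if p in current and a == c}
        current = closure(moved, edges)
    return f in current

# (a|b)*abb built bottom-up from the cases above.
nfa = concat(concat(concat(star(union(symbol("a"), symbol("b"))),
                           symbol("a")), symbol("b")), symbol("b"))
print(accepts(nfa, "abb"), accepts(nfa, "ab"))
```

Linking with an ϵ-edge keeps concat to a single line at the cost of a few extra states; the accepted language is the same.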
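a { action1 }
abb { action2 }
a*b+ { action3 }
[Figure: one NFA per pattern: 1 -a-> 2 for a; 3 -a-> 4 -b-> 5 -b-> 6 for abb; 7 -b-> 8 for a*b+, with a loop on a at state 7 and a loop on b at state 8. In the combined NFA, a new start state 0 leads to states 1, 3, and 7.]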
[Figure: the combined NFA, with state 6 labeled action2 and state 8 labeled action3.]
Input: a a b a
{0,1,3,7} -a-> {2,4,7} -a-> {7} -b-> {8} -a-> none
action3 is executed.
Must find the longest match:
Continue until no further moves are possible.
When last state is accepting: execute action.
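[Figure: the combined NFA, with state 6 labeled action2 and state 8 labeled action3.]
Input: a b b a
{0,1,3,7} -a-> {2,4,7} -b-> {5,8} -b-> {6,8} -a-> none
State 6 gives action2 and state 8 gives action3.
When two or more accepting states are reached, the
first action given in the Lex specification is executed.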
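The two rules above (longest match, then first action in the specification) can be sketched by simulating the combined NFA for the three patterns a, abb, and a*b+. The edge list is transcribed from the diagrams, with ϵ-edges from the new start state 0 assumed; EDGES, ACCEPT, and longest_match are illustrative names.

```python
# Combined NFA for the three patterns; "" marks an ϵ-edge.
EDGES = [
    (0, "", 1), (0, "", 3), (0, "", 7),
    (1, "a", 2),                              # a
    (3, "a", 4), (4, "b", 5), (5, "b", 6),    # abb
    (7, "a", 7), (7, "b", 8), (8, "b", 8),    # a*b+
]
# Accepting state -> (rule order in the specification, action name).
ACCEPT = {2: (1, "action1"), 6: (2, "action2"), 8: (3, "action3")}

def closure(states):
    """ϵ-closure of a set of NFA states."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for p, a, q in EDGES:
            if p == s and a == "" and q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def longest_match(text):
    """Return (lexeme, action) for the longest prefix matched, or None."""
    current = closure({0})
    best = None
    for n, c in enumerate(text, start=1):
        current = closure({q for p, a, q in EDGES if p in current and a == c})
        if not current:
            break                              # no further moves are possible
        hits = [ACCEPT[s] for s in current if s in ACCEPT]
        if hits:
            best = (text[:n], min(hits)[1])    # earliest rule wins a tie
    return best

print(longest_match("aaba"))
print(longest_match("abba"))
```

For the two inputs traced on the slides this selects action3 for the lexeme aab and action2 for the lexeme abb.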
Example DFA
[Figure: DFA with start state 0 and states 0 to 3:
0 -a-> 1, 0 -b-> 0
1 -a-> 1, 1 -b-> 2
2 -a-> 1, 2 -b-> 3
3 -a-> 1, 3 -b-> 0]
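A sketch of simulating the example DFA with a transition table. The edges are read off the diagram, and taking state 3 as the only accepting state is an assumption (so that the DFA recognizes strings ending in abb); DTRAN and dfa_accepts are illustrative names.

```python
# Transition table: state -> {symbol: next state}.
DTRAN = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}
START = 0
ACCEPTING = {3}          # assumption: state 3 is the accepting state

def dfa_accepts(x):
    """Follow exactly one move per input symbol, then test the final state."""
    s = START
    for c in x:
        s = DTRAN[s][c]  # raises KeyError for symbols outside {a, b}
    return s in ACCEPTING

for x in ["abb", "ababb", "abba"]:
    print(x, dfa_accepts(x))
```

Unlike the NFA simulation, each step here is a single table lookup, which is the source of the DFA's time advantage.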
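[Figure: NFA with states 0 to 10; labeled edges include 2 -a-> 3, 4 -b-> 5, 7 -a-> 8, 8 -b-> 9, and 9 -b-> 10.]
Dstates
A = {0,1,2,4,7}
B = {1,2,3,4,6,7,8}
C = {1,2,4,5,6,7}
D = {1,2,4,5,6,7,9}
[Figure: resulting DFA over states A, B, C, D, E, with start state A and the path A -a-> B -b-> D -b-> E.]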
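The Dstates above can be reproduced with a short subset-construction sketch. The ϵ-edges of the NFA did not survive on the slide, so the edge list below is an assumption (the usual NFA for (a|b)*abb with states 0 to 10); all names are illustrative.

```python
EPS = ""
# Assumed NFA with states 0..10; the ϵ-edges are not shown on the slide.
EDGES = [
    (0, EPS, 1), (0, EPS, 7), (1, EPS, 2), (1, EPS, 4),
    (2, "a", 3), (4, "b", 5), (3, EPS, 6), (5, EPS, 6),
    (6, EPS, 1), (6, EPS, 7), (7, "a", 8), (8, "b", 9), (9, "b", 10),
]

def eps_closure(states):
    """All states reachable from `states` through ϵ-edges alone."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for p, a, q in EDGES:
            if p == s and a == EPS and q not in seen:
                seen.add(q)
                stack.append(q)
    return frozenset(seen)

def move(states, symbol):
    """States reachable from `states` on one edge labeled `symbol`."""
    return {q for p, a, q in EDGES if p in states and a == symbol}

def subset_construction(start, alphabet):
    """Return (dstates in discovery order, dtran) for the equivalent DFA."""
    dstates = [eps_closure({start})]
    dtran = {}
    unmarked = [dstates[0]]
    while unmarked:
        T = unmarked.pop(0)
        for a in alphabet:
            U = eps_closure(move(T, a))
            if not U:
                continue                 # no transition (dead state omitted)
            if U not in dstates:
                dstates.append(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    return dstates, dtran

dstates, dtran = subset_construction(0, "ab")
for name, d in zip("ABCDE", dstates):
    print(name, "=", sorted(d))
```

With this edge list the first four sets printed agree with A to D as listed above, and a fifth set corresponds to state E in the DFA diagram.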
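Subset Construction Example 2
[Figure: combined NFA with start state 0: 1 -a-> 2 (a1); 3 -a-> 4 -b-> 5 -b-> 6 (a2); 7 -b-> 8 (a3), with a loop on a at state 7 and a loop on b at state 8.]
Dstates
A = {0,1,3,7}
B = {2,4,7}
C = {8}
D = {7}
E = {5,8}
[Figure: resulting DFA over states A to F, with start state A; B is labeled a1, C a3, E a3, and F a2 and a3.]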
[Figure: a DFA with states A, B, C, D, E beside a second DFA in which A and C are replaced by a single state AC.]
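[Figure: syntax tree fragment with a star node (*) above an alternation node (|) whose leaves a and b have position numbers 1 and 2; a further leaf a has position number 3. Position numbers are given to leaves.]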
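From Regular Expression to
Leaf true
[Figure: syntax tree annotations; the | node has {1, 2} on its left and {1, 2} on its right.]
Node  followpos
2     {1, 2, 3}
3     {4}
4     {5}
5     {6}
6     -
[Figure: DFA with start state {1,2,3} and the path {1,2,3} -a-> {1,2,3,4} -b-> {1,2,3,5} -b-> {1,2,3,6}.]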
Time-Space Tradeoffs
Automaton   Space (worst case)   Time (worst case)
NFA         O(|r|)               O(|r| × |x|)
DFA         O(2^|r|)             O(|x|)