Lecture 04

The document discusses the implementation of lexical analysis in programming, focusing on the conversion of regular expressions to finite automata (both deterministic and non-deterministic). It emphasizes the importance of simplicity in software design, outlines the steps for creating lexical specifications, and addresses ambiguities and error handling in the process. Additionally, it covers the execution and implementation of finite automata, including the use of tables for efficient DFA execution.


Implementation of Lexical Analysis

CS143
Lecture 4

Instructor: Fredrik Kjolstad


Slide design by Prof. Alex Aiken, with modifications
Written Assignments

• WA1 assigned today

• Due in one week


– 11:59pm
– Electronic hand-in on Gradescope

Tips on Building Large Systems

• KISS (Keep It Simple, Stupid!)

• Don’t optimize prematurely

• Design systems that can be tested

• It is easier to modify a working system than to get a system working

Value simplicity

“It's not easy to write good software. […] it has a lot to do with valuing
simplicity over complexity.”
- Barbara Liskov

“Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it.”
- Brian Kernighan

“There are two ways of constructing a software design: One way is to
make it so simple that there are obviously no deficiencies, and the other
way is to make it so complicated that there are no obvious deficiencies.
The first method is far more difficult.”
- Tony Hoare

“Simplicity does not precede complexity, but follows it.”


- Alan Perlis
Outline

• Specifying lexical structure using regular expressions

• Finite automata
– Deterministic Finite Automata (DFAs)
– Non-deterministic Finite Automata (NFAs)

• Implementation of regular expressions

Convert Regular Expressions to Finite Automata

• High-level sketch:

  Lexical Specification → Regular expressions → NFA → DFA → Table-driven Implementation of DFA

Lexer → Regex → NFA → DFA → Tables


Notation

• There is variation in regular expression notation

• Union: A + B ≡ A | B
• Option: A + ε ≡ A?
• Range: ‘a’ + ‘b’ + … + ‘z’ ≡ [a-z]
• Excluded range: complement of [a-z] ≡ [^a-z]
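As a concrete check on these equivalences, here is a small sketch using Python’s re module, whose syntax uses the right-hand forms of the table above (the specific strings tested are illustrative):

```python
import re

# The notation table above, checked against Python's re syntax:
# union A + B is written A|B, option A + epsilon is A?, and
# ranges use character classes.
assert re.fullmatch(r"a|b", "b")       # union
assert re.fullmatch(r"a?", "")         # option: zero or one 'a'
assert re.fullmatch(r"[a-z]", "q")     # range
assert re.fullmatch(r"[^a-z]", "Q")    # excluded range
```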



Regular Expressions in Lexical Specification

• Last lecture: a specification for the predicate s ∈ L(R)

• But a yes/no answer is not enough!


• Instead: partition the input into tokens

• We will adapt regular expressions to this goal



Lexical Specification → Regex in five steps

1. Write a regex for each token


• Number = digit+
• Keyword = ‘if’ + ‘else’ + …
• Identifier = letter (letter + digit)*
• OpenPar = ‘(’
• …
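In a tool-independent sketch, token classes like the ones above can be written as Python regexes; the names and exact patterns here are illustrative, not a full language specification:

```python
import re

# Hypothetical token spec mirroring the slide's examples.
TOKEN_SPEC = [
    ("Keyword",    r"if|else"),
    ("Identifier", r"[A-Za-z][A-Za-z0-9]*"),
    ("Number",     r"[0-9]+"),          # digit+
    ("OpenPar",    r"\("),
]
# One compiled regex per token class:
patterns = {name: re.compile(rx) for name, rx in TOKEN_SPEC}
assert patterns["Number"].fullmatch("42")
assert patterns["Identifier"].fullmatch("x1")
assert patterns["OpenPar"].fullmatch("(")
```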



Lexical Specification → Regex in five steps

2. Construct R, matching all lexemes for all tokens

R = Keyword + Identifier + Number + …
  = R1 + R2 + …

(This step is done automatically by tools like flex)



Lexical Specification → Regex in five steps

3. Let the input be x1…xn
   For 1 ≤ i ≤ n, check whether
   x1…xi ∈ L(R)
4. If success, then we know that
   x1…xi ∈ L(Rj) for some j
5. Remove x1…xi from the input and go to (3)



Ambiguity 1

• There are ambiguities in the algorithm

• How much input is used? What if
• x1…xi ∈ L(R) and also
• x1…xk ∈ L(R)?

• Rule: Pick the longest possible string in L(R)
– Pick k if k > i
– The “maximal munch”



Ambiguity 2

• Which token is used? What if


• x1…xi ∈ L(Rj) and also
• x1…xi ∈ L(Rk)

• Rule: use the rule listed first


– Pick j if j < k
– E.g., treat “if” as a keyword, not an identifier



Error Handling

• What if no rule matches a prefix of the input?
• Problem: we can’t just get stuck …
• Solution:

– Write a rule matching all “bad” strings


– Put it last (lowest priority)
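The matching loop of steps 3–5, together with the longest-match rule, the first-rule-wins rule, and a last-resort error rule, can be sketched in Python. The token names and patterns are hypothetical, not a real language’s lexical specification:

```python
import re

# Order = priority: on equal-length matches, the earlier rule wins.
TOKEN_SPEC = [
    ("Keyword",    r"if|else"),
    ("Identifier", r"[A-Za-z][A-Za-z0-9]*"),
    ("Number",     r"[0-9]+"),
    ("Space",      r"[ \t\n]+"),
    ("Error",      r"."),       # matches any single "bad" character; listed last
]
COMPILED = [(name, re.compile(rx)) for name, rx in TOKEN_SPEC]

def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, rx in COMPILED:
            m = rx.match(text, pos)
            # Maximal munch: a strictly longer match wins; on a tie the
            # earlier-listed rule keeps its claim.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens
```

For example, tokenize("if x1") yields a Keyword, a Space, and an Identifier, while tokenize("ifx") yields a single Identifier because the longer match beats the higher-priority Keyword rule.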



Summary

• Regular expressions provide a concise notation for string patterns

• Use in lexical analysis requires small extensions


– To resolve ambiguities
– To handle errors

• Good algorithms known


– Require only single pass over the input
– Few operations per character (table lookup)



Finite Automata

• Regular expressions = specification


• Finite automata = implementation

• A finite automaton consists of


– An input alphabet Σ
– A set of states S
– A start state n
– A set of accepting states F ⊆ S
– A set of transitions state →input state



Finite Automata

• Transition
  s1 →a s2
• Read as: in state s1, on input “a”, go to state s2

• If end of input and in accepting state => accept

• Otherwise => reject
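The acceptance rule above can be sketched directly: run the transition function over the input and accept iff the machine ends in an accepting state. The example machine (accepting exactly the string “1”) and its state names are illustrative:

```python
# Transitions as a dict: (state, symbol) -> next state.
def run_dfa(delta, start, accepting, text):
    state = start
    for ch in text:
        if (state, ch) not in delta:
            return False            # stuck: reject
        state = delta[(state, ch)]
    return state in accepting       # end of input: accept iff accepting

# A DFA accepting only the string "1" (missing edges act as a trap state).
delta = {("s0", "1"): "s1"}
assert run_dfa(delta, "s0", {"s1"}, "1")
assert not run_dfa(delta, "s0", {"s1"}, "11")
assert not run_dfa(delta, "s0", {"s1"}, "")
```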



Finite Automata State Graphs

• A state

• The start state

• An accepting state

• A transition (an arrow labeled with an input symbol, e.g. “a”)



A Simple Example

• A finite automaton that accepts only “1”

[State diagram: on “1” the start state moves to the accepting state; a “0”, or any input after the first, leads to a non-accepting trap state]


Another Simple Example

• A finite automaton accepting any number of 1’s followed by a single 0
• Alphabet: {0,1}

[State diagram: the start state loops on 1; on 0 it moves to the accepting state; any further input leads to a non-accepting trap state]



And Another Example

• Alphabet {0,1}
• What language does this recognize?

[State diagram over {0,1} omitted]


Epsilon Moves in NFAs

• Another kind of transition: ε-moves


  A →ε B

• The machine can move from state A to state B without reading any input

• Only exist in NFAs



Deterministic and Nondeterministic Automata

• Deterministic Finite Automata (DFA)


– Exactly one transition per input per state
– No ε-moves

• Nondeterministic Finite Automata (NFA)


– Can have zero, one, or multiple transitions for one input in a given state
– Can have ε-moves



Execution of Finite Automata

• A DFA can take only one path through the state graph
– Completely determined by the input

• NFAs can choose


– Whether to make ε-moves
– Which of multiple transitions for a single input to take



Acceptance of NFAs

• An NFA can get into multiple states


[NFA state diagram omitted: transitions labeled 1, 0, 0]
• Input: 1 0 0

Rule: NFA accepts if it can get to a final state
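Running an NFA means tracking the whole *set* of states it could be in, as described above. A minimal sketch, using an illustrative machine of my own (accepting strings ending in “00”, with ε-moves omitted for brevity):

```python
# Transitions map (state, symbol) -> a *set* of successor states.
def run_nfa(delta, start, accepting, text):
    current = {start}
    for ch in text:
        # Take every transition from every state we might be in.
        current = {t for s in current for t in delta.get((s, ch), set())}
    return bool(current & accepting)   # accept if ANY reachable state accepts

delta = {
    ("A", "0"): {"A", "B"},   # nondeterministic: two choices on 0
    ("A", "1"): {"A"},
    ("B", "0"): {"C"},
}
assert run_nfa(delta, "A", {"C"}, "100")      # some path reaches C
assert not run_nfa(delta, "A", {"C"}, "10")   # no path reaches C
```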



NFA vs. DFA (1)

• NFAs and DFAs recognize the same set of languages (the regular languages)

• DFAs are faster to execute


– There are no choices to consider



NFA vs. DFA (2)

• For a given language, an NFA can be simpler than the corresponding DFA

[Diagrams omitted: an NFA and the corresponding, larger DFA for the same language over {0,1}]
• DFA can be exponentially larger than NFA



Convert Regular Expressions to NFA (1)

• For each kind of rexp, define an NFA


– Notation: NFA for rexp M

• For ε: a single ε-transition from the start state to the accepting state

• For input a: a single a-transition from the start state to the accepting state



Convert Regular Expressions to NFA (2)

• For AB: the machine for A, followed by an ε-move from A’s accepting state to B’s start state

• For A + B: a new start state with ε-moves into both A and B, and ε-moves from each machine’s accepting state into a new common accepting state


Convert Regular Expressions to NFA (3)

• For A*: a new start state with an ε-move into A and an ε-move directly to a new accepting state; from A’s accepting state, ε-moves back to A’s start state and on to the new accepting state


Example of RegExp to NFA conversion

• Consider the regular expression (1+0)*1

• The NFA:

[NFA diagram omitted: ten states A–J; C →1 E, D →0 F, I →1 J, and the remaining edges are ε-moves]



NFA to DFA: The Trick

• Simulate the NFA


• Each state of DFA
= a non-empty subset of states of the NFA
• Start state
= the set of NFA states reachable through ε-moves from
NFA start state
• Add a transition S →a S’ to DFA iff
– S’ is the set of NFA states reachable from any state in
S after seeing the input a, considering ε-moves as well
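The subset construction described above can be sketched directly: compute the ε-closure of the start state, then repeatedly expand reachable state sets. The representation (transition dict with None as the ε label) is illustrative:

```python
EPS = None

def eps_closure(delta, states):
    """All NFA states reachable from `states` via epsilon-moves alone."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def nfa_to_dfa(delta, start, alphabet):
    d_start = eps_closure(delta, {start})
    d_delta, worklist, seen = {}, [d_start], {d_start}
    while worklist:
        S = worklist.pop()
        for a in alphabet:
            # States reachable from any state in S on input a ...
            moved = {t for s in S for t in delta.get((s, a), set())}
            if not moved:
                continue
            Sp = eps_closure(delta, moved)   # ... considering eps-moves too
            d_delta[(S, a)] = Sp
            if Sp not in seen:
                seen.add(Sp)
                worklist.append(Sp)
    return d_start, d_delta

# Tiny NFA: a --eps--> b, b --x--> c
nfa_delta = {("a", EPS): {"b"}, ("b", "x"): {"c"}}
d_start, d_delta = nfa_to_dfa(nfa_delta, "a", {"x"})
assert d_start == frozenset({"a", "b"})
```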



NFA to DFA: Remark

• An NFA may be in many states at any time

• How many different states ?

• If there are N states, the NFA must be in some subset of those N states

• How many subsets are there?


– 2^N − 1 = finitely many



NFA → DFA Example

[NFA diagram for (1+0)*1 omitted: states A–J, as on the previous slide]

The resulting DFA has three states, each a set of NFA states:

  {A,B,C,D,H,I}:        0 → {F,G,H,I,A,B,C,D},  1 → {E,J,G,H,I,A,B,C,D}
  {F,G,H,I,A,B,C,D}:    0 → itself,             1 → {E,J,G,H,I,A,B,C,D}
  {E,J,G,H,I,A,B,C,D}:  0 → {F,G,H,I,A,B,C,D},  1 → itself


Implementation

• A DFA can be implemented by a 2D table T


– One dimension is “states”
– Other dimension is “input symbol”
– For every transition Si →a Sk define T[i,a] = k

• DFA “execution”
– If in state Si and input a, read T[i,a] = k and skip to state Sk
– Very efficient

Example table (states a–d, input symbols 0 and 1):

      0  1
  a   a  b
  b   a  b
  c   b  b
  d   a  b



Table Implementation of a DFA

[DFA diagram omitted: states S, T, U; from every state, input 0 goes to T and input 1 goes to U]

      0  1
  S   T  U
  T   T  U
  U   T  U
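Executing the table above takes one lookup per input character. A minimal sketch; the slide does not mark which states accept, so the accepting set is a parameter here (the choice {U} below is an assumption for illustration):

```python
# The S/T/U transition table from the slide, as nested dicts.
TABLE = {
    "S": {"0": "T", "1": "U"},
    "T": {"0": "T", "1": "U"},
    "U": {"0": "T", "1": "U"},
}

def run_table(table, start, accepting, text):
    state = start
    for ch in text:
        state = table[state][ch]    # one table lookup per character
    return state in accepting

assert run_table(TABLE, "S", {"U"}, "01")       # S -0-> T -1-> U
assert not run_table(TABLE, "S", {"U"}, "010")  # ends in T
```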



Implementation (Cont.)

• NFA → DFA conversion is at the heart of tools such as flex

• But DFAs can be huge

• In practice, flex-like tools trade off speed for space in the choice of NFA and DFA representations

