
Lecture 04

The document discusses the implementation of lexical analysis in programming, focusing on the conversion of regular expressions to finite automata (both deterministic and non-deterministic). It emphasizes the importance of simplicity in software design, outlines the steps for creating lexical specifications, and addresses ambiguities and error handling in the process. Additionally, it covers the execution and implementation of finite automata, including the use of tables for efficient DFA execution.



Implementation of Lexical Analysis

CS143
Lecture 4

Instructor: Fredrik Kjolstad

Slide design by Prof. Alex Aiken, with modifications

Written Assignments

• WA1 assigned today

• Due in one week

– 11:59pm
– Electronic hand-in on Gradescope

Tips on Building Large Systems

• KISS (Keep It Simple, Stupid!)

• Don’t optimize prematurely

• Design systems that can be tested

• It is easier to modify a working system than to get a system working

Value simplicity

“It's not easy to write good software. […] it has a lot to do with valuing simplicity over complexity.”
- Barbara Liskov

“Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.”
- Brian Kernighan

“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.”
- Tony Hoare

“Simplicity does not precede complexity, but follows it.”
- Alan Perlis

Outline

• Specifying lexical structure using regular expressions

• Finite automata
– Deterministic Finite Automata (DFAs)
– Non-deterministic Finite Automata (NFAs)

• Implementation of regular expressions

Convert Regular Expressions to Finite Automata

• High-level sketch

Lexical Specification → Regular expressions → NFA → DFA → Table-driven Implementation of DFA

Lexer → Regex → NFA → DFA → Tables

Notation

• There is variation in regular expression notation

• Union: A + B ≡ A|B
• Option: A + ε ≡ A?
• Range: ‘a’ + ‘b’ + … + ‘z’ ≡ [a-z]
• Excluded range: complement of [a-z] ≡ [^a-z]
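
The shorthand forms can be tried out directly. The sketch below uses Python's `re` module purely as an illustration of the notation; the pattern names and the helper `matches` are made up for this example and are not part of the lecture.

```python
import re

# Lecture-style form on the right, shorthand (Python `re` syntax) on the left.
union = re.compile(r"a|b")         # 'a' + 'b'        ≡ a|b
option = re.compile(r"ab?")        # 'a' ('b' + ε)    ≡ ab?
lower = re.compile(r"[a-z]")       # 'a'+'b'+…+'z'    ≡ [a-z]
not_lower = re.compile(r"[^a-z]")  # complement of [a-z]

def matches(pattern, s):
    """True iff the whole string s is in the language of pattern."""
    return pattern.fullmatch(s) is not None

print(matches(union, "b"), matches(option, "a"), matches(not_lower, "q"))
```
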


Regular Expressions in Lexical Specification

• Last lecture: a specification for the predicate

s ∈ L(R)

• But a yes/no answer is not enough!

• Instead: partition the input into tokens

• We will adapt regular expressions to this goal


Lexical Specification → Regex in five steps

1. Write a regex for each token

• Number = digit⁺ (one or more digits)
• Keyword = ‘if’ + ‘else’ + …
• Identifier = letter (letter + digit)*
• OpenPar = ‘(’
• …


Lexical Specification → Regex in five steps

2. Construct R, matching all lexemes for all tokens

R = Keyword + Identifier + Number + …

= R1 + R2 + …

(This step is done automatically by tools like flex)


Lexical Specification → Regex in five steps

3. Let input be x1…xn
   For 1 ≤ i ≤ n check whether x1…xi ∈ L(R)
4. If success, then we know that x1…xi ∈ L(Rj) for some j
5. Remove x1…xi from input and go to (3)
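
Step 3 can be sketched as a brute-force prefix check. This is only an illustration, assuming Python's `re` module as the membership test for L(R); the function name `matching_prefix_lengths` and the sample R are hypothetical.

```python
import re

def matching_prefix_lengths(R, text):
    """Step 3: return every i (1 <= i <= n) such that text[:i] is in L(R)."""
    pattern = re.compile(R)
    return [i for i in range(1, len(text) + 1)
            if pattern.fullmatch(text[:i])]

# Hypothetical R = Number + Identifier, written in Python `re` syntax.
R = r"[0-9]+|[a-z][a-z0-9]*"
print(matching_prefix_lengths(R, "ab1+c"))  # prints [1, 2, 3]
```

Several values of i can succeed for the same input, so the algorithm still has to decide which prefix to remove in step 5.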


Ambiguity 1

• There are ambiguities in the algorithm

• How much input is used? What if

• x1…xi ∈ L(R) and also
• x1…xk ∈ L(R)

• Rule: Pick longest possible string in L(R)

– Pick k if k > i
– The “maximal munch”
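
A minimal sketch of the maximal munch rule, assuming Python's `re` module as the membership test for L(R). The function `longest_match` is a hypothetical helper; it rescans prefixes and is meant to show the rule, not an efficient implementation.

```python
import re

def longest_match(R, text):
    """Return the longest prefix of text that is in L(R), or None."""
    pattern = re.compile(R)
    for i in range(len(text), 0, -1):   # try the longest prefix first
        if pattern.fullmatch(text[:i]):
            return text[:i]
    return None

# Both "=" and "==" are in L(R); maximal munch picks "==".
print(longest_match(r"=|==", "==x"))  # prints ==
```
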


Ambiguity 2

• Which token is used? What if

• x1…xi ∈ L(Rj) and also
• x1…xi ∈ L(Rk)

• Rule: use rule listed first

– Pick j if j < k
– E.g., treat “if” as a keyword, not an identifier
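
The priority rule can be sketched by keeping the token rules in an ordered list and returning the first one that matches. The rule names and regexes below are hypothetical, and Python's `re` module stands in for the membership test.

```python
import re

# Hypothetical rule list; order encodes priority (earlier rules win).
RULES = [
    ("Keyword", r"if|else"),
    ("Identifier", r"[a-z][a-z0-9]*"),
    ("Number", r"[0-9]+"),
]

def classify(lexeme):
    """Return the name of the first rule whose language contains lexeme."""
    for name, regex in RULES:
        if re.fullmatch(regex, lexeme):
            return name
    return None

print(classify("if"))    # prints Keyword
print(classify("iffy"))  # prints Identifier
```
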


Error Handling

• What if no rule matches a prefix of the input?
• Problem: Can’t just get stuck …
• Solution:

– Write a rule matching all “bad” strings

– Put it last (lowest priority)
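
The catch-all rule can be sketched inside a small tokenizer loop. Everything here is illustrative: the rule names, the regexes, and the `tokenize` function are assumptions, and Python's `re` matching is used in place of a generated DFA (for these simple rules its greedy match coincides with the longest match, which is not true of every regex).

```python
import re

# Hypothetical rules in priority order; the catch-all "Error" rule is last.
RULES = [(name, re.compile(rx)) for name, rx in [
    ("Keyword", r"if|else"),
    ("Identifier", r"[a-z][a-z0-9]*"),
    ("Number", r"[0-9]+"),
    ("Whitespace", r"[ \t\n]+"),
    ("Error", r"(?s)."),   # matches any single character
]]

def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when lengths tie.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len   # the Error rule guarantees best_len >= 1
    return tokens

print([t for t in tokenize("if x1 $ 42") if t[0] != "Whitespace"])
```

Because the error rule matches only one character and comes last, it wins only when no other rule matches at the current position, so the lexer never gets stuck.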


Summary

• Regular expressions provide a concise notation for string patterns

• Use in lexical analysis requires small extensions

– To resolve ambiguities
– To handle errors

• Good algorithms are known

– Require only a single pass over the input
– Few operations per character (table lookup)


Finite Automata

• Regular expressions = specification

• Finite automata = implementation

• A finite automaton consists of

– An input alphabet Σ
– A set of states S
– A start state n
– A set of accepting states F ⊆ S
– A set of transitions: state → state, each labeled with an input symbol


Finite Automata

• Transition s1 →a s2 is read: in state s1 on input “a” go to state s2

• If end of input and in accepting state => accept

• Otherwise => reject
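
The five components and the accept/reject rule can be sketched as a small class. The class and field names are assumptions made for this example, and the sample machine (one that accepts only the string "1") is an illustrative choice. A missing transition is treated as rejection.

```python
from dataclasses import dataclass

@dataclass
class FiniteAutomaton:
    alphabet: set       # input alphabet Σ
    states: set         # set of states S
    start: str          # start state n
    accepting: set      # accepting states F ⊆ S
    transitions: dict   # (state, input) -> state

    def accepts(self, text):
        state = self.start
        for symbol in text:
            key = (state, symbol)
            if key not in self.transitions:
                return False            # stuck: reject
            state = self.transitions[key]
        # End of input: accept iff we are in an accepting state.
        return state in self.accepting

# Hypothetical machine that accepts only the string "1".
only_one = FiniteAutomaton(
    alphabet={"0", "1"},
    states={"q0", "q1"},
    start="q0",
    accepting={"q1"},
    transitions={("q0", "1"): "q1"},
)
print(only_one.accepts("1"), only_one.accepts("11"))  # prints True False
```
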


Finite Automata State Graphs

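• A state: a circle

• The start state: a circle with an incoming arrow that has no source

• An accepting state: a double circle

• A transition: an arrow between states labeled with an input symbol, e.g. a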


A Simple Example

• A finite automaton that accepts only “1”

[Figure: state graph of the automaton; surviving edge labels: 0; 0,1; 0,1]


Another Simple Example

• A finite automaton accepting any number of 1’s followed by a single 0
• Alphabet: {0,1}

[Figure: state graph of the automaton; surviving edge labels: 1; 0; 0; 0,1]


And Another Example

• Alphabet {0,1}
• What language does this recognize?

[Figure: state graph of the automaton; edges labeled 0 and 1]


Epsilon Moves in NFAs

• Another kind of transition: ε-moves

A →ε B

• Machine can move from state A to state B without reading input

• ε-moves exist only in NFAs


Deterministic and Nondeterministic Automata

• Deterministic Finite Automata (DFA)

– Exactly one transition per input per state
– No ε-moves

• Nondeterministic Finite Automata (NFA)

– Can have zero, one, or multiple transitions for one input in a given state
– Can have ε-moves


Execution of Finite Automata

• A DFA can take only one path through the state graph
– Completely determined by input

• NFAs can choose

– Whether to make ε-moves
– Which of multiple transitions for a single input to take


Acceptance of NFAs

• An NFA can get into multiple states

[Figure: state graph of an NFA; surviving edge labels: 0; 0]

• Input: 1 0 0

Rule: NFA accepts if it can get to a final state
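
The acceptance rule can be sketched by tracking the set of all states the NFA could be in. The NFA below is hypothetical (it accepts strings over {0,1} that end in "00") and is not claimed to be the one in the figure; the function and variable names are likewise assumptions.

```python
def epsilon_closure(states, eps):
    """All states reachable from `states` using only ε-moves."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def nfa_accepts(text, start, accepting, delta, eps):
    current = epsilon_closure({start}, eps)
    for symbol in text:
        moved = set()
        for s in current:
            moved |= delta.get((s, symbol), set())
        current = epsilon_closure(moved, eps)
    # Accept if at least one possible state is a final state.
    return bool(current & accepting)

# Hypothetical NFA: state p guesses when the final "00" begins.
DELTA = {
    ("p", "0"): {"p", "q"},
    ("p", "1"): {"p"},
    ("q", "0"): {"r"},
}
EPS = {}   # this example has no ε-moves

print(nfa_accepts("100", "p", {"r"}, DELTA, EPS))  # prints True
```

On input 1 0 0 the set of possible states goes {p}, {p}, {p, q}, {p, q, r}; the last set contains the final state r, so the input is accepted.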


NFA vs. DFA (1)

• NFAs and DFAs recognize the same set of languages (regular languages)

• DFAs are faster to execute

– There are no choices to consider


NFA vs. DFA (2)

• For a given language, an NFA can be simpler than a DFA

[Figure: state graphs of an NFA and a larger DFA for the same language; edges labeled 0 and 1]

• DFA can be exponentially larger than NFA


Convert Regular Expressions to NFA (1)

• For each kind of rexp, define an NFA

– Notation: NFA for rexp M

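• For ε: a start state with an ε-move to an accepting state

• For input a: a start state with a transition on a to an accepting state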


Convert Regular Expressions to NFA (2)

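• For AB
[Figure: the NFA for A, followed by an ε-move into the NFA for B]

• For A + B
[Figure: a new start state with ε-moves into the NFAs for A and B, and ε-moves from both into a new accepting state]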


Convert Regular Expressions to NFA (3)

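• For A*
[Figure: the NFA for A with three added ε-moves, so that A can be skipped entirely or repeated any number of times]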


Example of RegExp to NFA conversion

• Consider the regular expression

(1+0)*1
• The NFA is
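[Figure: NFA with states A to J; A is the start state and J the accepting state. Transitions: A →ε B, A →ε H, B →ε C, B →ε D, C →1 E, D →0 F, E →ε G, F →ε G, G →ε A, G →ε H, H →ε I, I →1 J]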
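
The per-operator constructions can be sketched as functions that build NFA fragments and glue them together with ε-moves. This is one possible encoding, not the lecture's code: the tuple representation, the function names, and the numeric state names are assumptions, and the exact placement of ε-moves may differ from the figures.

```python
import itertools

_counter = itertools.count()

def new_state():
    return next(_counter)

# A fragment is (start, accept, edges); an edge is (source, label, target),
# where the label None stands for ε.

def symbol(a):
    s, t = new_state(), new_state()
    return s, t, [(s, a, t)]

def concat(A, B):
    # ε-move from A's accepting state to B's start state.
    return A[0], B[1], A[2] + B[2] + [(A[1], None, B[0])]

def union(A, B):
    s, t = new_state(), new_state()
    edges = A[2] + B[2] + [(s, None, A[0]), (s, None, B[0]),
                           (A[1], None, t), (B[1], None, t)]
    return s, t, edges

def star(A):
    s, t = new_state(), new_state()
    # Enter A, skip A, or loop back after A to repeat it.
    edges = A[2] + [(s, None, A[0]), (s, None, t), (A[1], None, s)]
    return s, t, edges

def closure(states, edges):
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for src, label, dst in edges:
            if src == s and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(nfa, text):
    start, accept, edges = nfa
    current = closure({start}, edges)
    for ch in text:
        moved = {dst for src, label, dst in edges
                 if src in current and label == ch}
        current = closure(moved, edges)
    return accept in current

# (1+0)*1
nfa = concat(star(union(symbol("1"), symbol("0"))), symbol("1"))
print(accepts(nfa, "0101"), accepts(nfa, "10"))  # prints True False
```
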


NFA to DFA: The Trick

• Simulate the NFA

• Each state of DFA = a non-empty subset of states of the NFA
• Start state = the set of NFA states reachable through ε-moves from the NFA start state
• Add a transition S →a S’ to DFA iff
– S’ is the set of NFA states reachable from any state in S after seeing the input a, considering ε-moves as well
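
The trick can be sketched as a worklist algorithm over sets of NFA states. The edge list below is an assumption: it is a reading of the lecture's example NFA for (1+0)*1 with states A to J, and the function names are made up for this sketch.

```python
from collections import deque

EPS = None  # label used for ε-moves

# Assumed edge list for an NFA for (1+0)*1 (A is the start, J is accepting).
EDGES = [
    ("A", EPS, "B"), ("A", EPS, "H"), ("B", EPS, "C"), ("B", EPS, "D"),
    ("C", "1", "E"), ("D", "0", "F"), ("E", EPS, "G"), ("F", EPS, "G"),
    ("G", EPS, "A"), ("G", EPS, "H"), ("H", EPS, "I"), ("I", "1", "J"),
]

def eps_closure(states):
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for src, label, dst in EDGES:
            if src == s and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return frozenset(seen)

def move(states, symbol):
    return {dst for src, label, dst in EDGES
            if src in states and label == symbol}

def subset_construction(start, alphabet):
    start_set = eps_closure({start})
    table, queue = {}, deque([start_set])
    while queue:
        S = queue.popleft()
        if S in table:
            continue
        table[S] = {}
        for a in alphabet:
            T = eps_closure(move(S, a))
            if T:                  # only non-empty subsets become DFA states
                table[S][a] = T
                queue.append(T)
    return start_set, table

start_set, table = subset_construction("A", "01")
for S, row in table.items():
    print("".join(sorted(S)), {a: "".join(sorted(T)) for a, T in row.items()})
```

For this NFA only three of the possible subsets are reachable, so the resulting DFA has three states.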


NFA to DFA. Remark

• An NFA may be in many states at any time

• How many different states ?

• If there are N states, the NFA must be in some subset of those N states

• How many subsets are there?

– 2^N - 1 non-empty subsets = finitely many


NFA -> DFA Example
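[Figure: the NFA for (1+0)*1 with states A to J, and the DFA built from it. DFA states: ABCDHI (start), FGHIABCD, EJGHIABCD. From every DFA state, input 0 leads to FGHIABCD and input 1 leads to EJGHIABCD]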


Implementation

• A DFA can be implemented by a 2D table T

– One dimension is “states”
– Other dimension is “input symbol”
– For every transition Si →a Sk define T[i,a] = k

• DFA “execution”

– If in state Si and input a, read T[i,a] = k and skip to state Sk
– Very efficient

          input symbols
            0   1
states  a   a   b
        b   a   b
        c   b   b
        d   a   b


Table Implementation of a DFA

[Figure: state graph of a DFA with states S, T, and U; edges labeled 0 and 1]

     0   1
S    T   U
T    T   U
U    T   U
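
The table from this slide can be executed with one lookup per input character. In the sketch below the numeric encoding of states and symbols is an assumption, and so is the choice of U as the only accepting state, since the state graph does not survive in the text.

```python
# Rows are states, columns are input symbols.
STATES = {"S": 0, "T": 1, "U": 2}
SYMBOLS = {"0": 0, "1": 1}
TABLE = [
    [1, 2],  # S: on 0 go to T, on 1 go to U
    [1, 2],  # T: on 0 go to T, on 1 go to U
    [1, 2],  # U: on 0 go to T, on 1 go to U
]
ACCEPTING = {STATES["U"]}  # assumption: U is the accepting state

def run(text):
    state = STATES["S"]
    for ch in text:
        state = TABLE[state][SYMBOLS[ch]]   # one table lookup per character
    return state in ACCEPTING

print(run("0101"), run("10"))  # prints True False
```

Under that assumption the machine accepts exactly the strings over {0,1} that end in 1.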


Implementation (Cont.)

• NFA -> DFA conversion is at the heart of tools such as flex

• But, DFAs can be huge

• In practice, flex-like tools trade off speed for space in the choice of NFA and DFA representations
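
One way to picture such a trade-off is to store identical table rows only once and pay for an extra indirection on every step. This is a hypothetical sketch of the general idea and is not a description of what flex actually does; the function names are made up.

```python
def compress_rows(table):
    """Store each distinct row once; map every state to a shared row index."""
    unique_rows, row_index, state_to_row = [], {}, []
    for row in table:
        key = tuple(row)
        if key not in row_index:
            row_index[key] = len(unique_rows)
            unique_rows.append(list(row))
        state_to_row.append(row_index[key])
    return state_to_row, unique_rows

def step(state, symbol, state_to_row, unique_rows):
    # Two lookups instead of one: slower, but the table is smaller.
    return unique_rows[state_to_row[state]][symbol]

# A 3-state table whose rows are all identical shrinks to a single row.
state_to_row, unique_rows = compress_rows([[1, 2], [1, 2], [1, 2]])
print(state_to_row, unique_rows)  # prints [0, 0, 0] [[1, 2]]
```
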
