
Lexical Analysis

The document discusses lexical analysis and the design of lexical analyzers. It describes how lexical analysis works, specifies patterns for tokens using regular expressions and regular definitions, and discusses how to code the specifications as transition diagrams and generate lexical analyzers using tools like Lex and Flex.

Uploaded by

anjali meela


Lexical Analysis and Design of Lexical Analyzer

Lexical Analysis
• Input is scanned completely to identify the tokens
• Tokens (logical units)
– Identifiers, keywords, operators, etc.
Specification of Tokens
– Strings and Languages
• A finite sequence of symbols is called a string
• A set of strings over some alphabet is called a language
– Operations on Languages
• Concatenation:
– $L_1L_2 = \{ s_1s_2 \mid s_1 \in L_1 \text{ and } s_2 \in L_2 \}$
• Union
– $L_1 \cup L_2 = \{ s \mid s \in L_1 \text{ or } s \in L_2 \}$
• Kleene Closure
– $L^* = \bigcup_{i=0}^{\infty} L^i$
• Positive Closure
– $L^+ = \bigcup_{i=1}^{\infty} L^i$
– Regular Expressions

Regular Expression
• Notation for representing tokens
• Ex: Identifiers in Pascal
letter → A | B | ... | Z | a | b | ... | z
digit → 0 | 1 | ... | 9
id → letter ( letter | digit )*
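The regular definition for id can also be checked by hand in C. This is a minimal sketch. The name is_identifier is hypothetical. The sketch assumes the "C" locale, where isalpha and isalnum accept exactly the ASCII letters and digits listed under letter and digit.

```c
#include <ctype.h>
#include <stdbool.h>

/* Checks the whole string s against
 *   letter -> A | ... | Z | a | ... | z
 *   digit  -> 0 | ... | 9
 *   id     -> letter ( letter | digit )*
 * Assumes the "C" locale for isalpha/isalnum. */
bool is_identifier(const char *s)
{
    /* The first character must be a letter. */
    if (!isalpha((unsigned char)s[0]))
        return false;
    /* Every following character must be a letter or a digit. */
    for (const char *p = s + 1; *p != '\0'; p++) {
        if (!isalnum((unsigned char)*p))
            return false;
    }
    return true;
}
```

For example, is_identifier("x1") is true, while is_identifier("1x") and is_identifier("") are false.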

The Reason Why Lexical Analysis is a Separate Phase
• Simplifies the design of the compiler
– LL(1) or LR(1) parsing with one token of lookahead would not be possible if the parser worked on raw characters, since recognizing a single token may require examining several characters
• Provides efficient implementation
– Systematic techniques to implement lexical analyzers by hand or automatically from specifications
– Stream buffering methods to scan input
• Improves portability
– Non-standard symbols and alternate character encodings can be normalized (e.g. trigraphs)

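Interaction of the Lexical Analyzer with the Parser
[Figure: the parser asks the lexical analyzer to "get next token"; the lexical analyzer reads the source program and returns a token together with its attribute (tokenval). Both components report errors and use the symbol table.]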

Attributes of Tokens
y := 31 + 28*x → Lexical analyzer → <id, “y”> <assign, > <num, 31> <+, > <num, 28> <*, > <id, “x”> → Parser
(in each pair, the first component is the token and the second is tokenval, the token attribute)
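The sketch below shows one way a scanner might produce the (token, tokenval) pairs above for y := 31 + 28*x. The token codes, struct token, and next_token are hypothetical names chosen for this example. The sketch recognizes only identifiers, unsigned integers, :=, + and *.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes for the example statement. */
enum token_kind { TOK_ID, TOK_ASSIGN, TOK_NUM, TOK_PLUS, TOK_TIMES, TOK_EOF, TOK_ERROR };

/* A token plus its attribute (tokenval): the name for ids,
 * the numeric value for nums, nothing for operators. */
struct token {
    enum token_kind kind;
    char name[32];   /* attribute of TOK_ID (truncated to 31 chars) */
    long value;      /* attribute of TOK_NUM */
};

/* Scans one token starting at *pp, stores it in *t, and advances *pp. */
void next_token(const char **pp, struct token *t)
{
    const char *p = *pp;
    while (*p == ' ' || *p == '\t' || *p == '\n')   /* skip white space */
        p++;
    t->name[0] = '\0';
    t->value = 0;
    if (*p == '\0') {
        t->kind = TOK_EOF;
    } else if (isalpha((unsigned char)*p)) {        /* id */
        size_t n = 0;
        while (isalnum((unsigned char)*p)) {
            if (n < sizeof t->name - 1)
                t->name[n++] = *p;
            p++;
        }
        t->name[n] = '\0';
        t->kind = TOK_ID;
    } else if (isdigit((unsigned char)*p)) {        /* num */
        while (isdigit((unsigned char)*p))
            t->value = t->value * 10 + (*p++ - '0');
        t->kind = TOK_NUM;
    } else if (p[0] == ':' && p[1] == '=') {        /* assign */
        t->kind = TOK_ASSIGN;
        p += 2;
    } else if (*p == '+') {
        t->kind = TOK_PLUS;
        p++;
    } else if (*p == '*') {
        t->kind = TOK_TIMES;
        p++;
    } else {                                        /* unexpected character */
        t->kind = TOK_ERROR;
        p++;
    }
    *pp = p;
}
```

Calling next_token repeatedly on "y := 31 + 28*x" yields these tokens in order:

- id with name "y"
- assign
- num with value 31
- +
- num with value 28
- *
- id with name "x"
- TOK_EOF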

Tokens, Patterns, and Lexemes

• A token is a classification of lexical units
– For example: id and num
• Lexemes are the specific character strings that
make up a token
– For example: abc and 123
• Patterns are rules describing the set of lexemes
belonging to a token
– For example: “letter followed by letters and digits” and
“non-empty sequence of digits”

Specification of Patterns for Tokens: Definitions
• An alphabet $\Sigma$ is a finite set of symbols (characters)
• A string $s$ is a finite sequence of symbols from $\Sigma$
– $|s|$ denotes the length of string $s$
– $\epsilon$ denotes the empty string, thus $|\epsilon| = 0$
• A language is a specific set of strings over some fixed alphabet $\Sigma$

Specification of Patterns for Tokens: String Operations
• The concatenation of two strings $x$ and $y$ is denoted by $xy$
• The exponentiation of a string $s$ is defined by
$s^0 = \epsilon$
$s^i = s^{i-1}s$ for $i > 0$
note that $s\epsilon = \epsilon s = s$
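This is a direct transcription of the exponentiation definition into C, as a minimal sketch. The helper str_power is hypothetical. It starts from the empty string (s^0) and appends s once per step, mirroring s^i = s^(i-1)s.

```c
#include <stdlib.h>
#include <string.h>

/* Computes s^i:  s^0 = epsilon (empty string),  s^i = s^(i-1) s  for i > 0.
 * Returns a newly allocated string that the caller must free, or NULL
 * if allocation fails. */
char *str_power(const char *s, unsigned i)
{
    size_t len = strlen(s);
    char *result = malloc(len * i + 1);
    if (result == NULL)
        return NULL;
    result[0] = '\0';                            /* s^0 = epsilon */
    for (unsigned k = 1; k <= i; k++)
        /* append s (with its terminator) after the k-1 copies so far */
        memcpy(result + len * (k - 1), s, len + 1);
    return result;
}
```

For example, str_power("ab", 3) returns "ababab" and str_power("ab", 0) returns the empty string.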

Specification of Patterns for Tokens: Language Operations
• Union
$L \cup M = \{ s \mid s \in L \text{ or } s \in M \}$
• Concatenation
$LM = \{ xy \mid x \in L \text{ and } y \in M \}$
• Exponentiation
$L^0 = \{\epsilon\}$; $L^i = L^{i-1}L$
• Kleene closure
$L^* = \bigcup_{i=0}^{\infty} L^i$
• Positive closure
$L^+ = \bigcup_{i=1}^{\infty} L^i$
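For finite languages, union, concatenation and exponentiation can be computed directly. The sketch below assumes small languages: at most MAX_STRINGS strings of fewer than MAX_LEN characters, with longer results truncated. struct language and the lang_* functions are hypothetical names. L* and L+ are infinite whenever L contains a non-empty string, so they cannot be enumerated this way. They can only be approximated by a finite union of powers.

```c
#include <stdio.h>
#include <string.h>

#define MAX_STRINGS 64
#define MAX_LEN 32

/* A finite language: a set of at most MAX_STRINGS strings. */
struct language {
    int count;
    char str[MAX_STRINGS][MAX_LEN];
};

/* Adds s to L unless it is already a member (sets have no duplicates). */
static void add_string(struct language *L, const char *s)
{
    for (int k = 0; k < L->count; k++)
        if (strcmp(L->str[k], s) == 0)
            return;
    if (L->count < MAX_STRINGS)
        snprintf(L->str[L->count++], MAX_LEN, "%s", s);
}

/* Union: { s | s in L or s in M } */
struct language lang_union(const struct language *L, const struct language *M)
{
    struct language R = { 0 };
    for (int k = 0; k < L->count; k++) add_string(&R, L->str[k]);
    for (int k = 0; k < M->count; k++) add_string(&R, M->str[k]);
    return R;
}

/* Concatenation: { xy | x in L and y in M } */
struct language lang_concat(const struct language *L, const struct language *M)
{
    struct language R = { 0 };
    char buf[2 * MAX_LEN];
    for (int i = 0; i < L->count; i++)
        for (int j = 0; j < M->count; j++) {
            snprintf(buf, sizeof buf, "%s%s", L->str[i], M->str[j]);
            add_string(&R, buf);   /* truncated to MAX_LEN - 1 chars if longer */
        }
    return R;
}

/* Exponentiation: L^0 = { epsilon },  L^i = L^(i-1) L */
struct language lang_power(const struct language *L, unsigned i)
{
    struct language R = { 0 };
    add_string(&R, "");            /* L^0 = { epsilon } */
    for (unsigned k = 0; k < i; k++)
        R = lang_concat(&R, L);
    return R;
}
```

With L = {a, b} and M = {0, 1}, the sketch gives:

- lang_concat: {a0, a1, b0, b1}
- lang_power(&L, 2): {aa, ab, ba, bb}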

Specification of Patterns for Tokens: Regular Expressions
• Basis symbols:
– $\epsilon$ is a regular expression denoting the language $\{\epsilon\}$
– $a \in \Sigma$ is a regular expression denoting $\{a\}$
• If $r$ and $s$ are regular expressions denoting languages $L(r)$ and $L(s)$ respectively, then
– $r \mid s$ is a regular expression denoting $L(r) \cup L(s)$
– $rs$ is a regular expression denoting $L(r)L(s)$
– $r^*$ is a regular expression denoting $L(r)^*$
– $(r)$ is a regular expression denoting $L(r)$
• A language defined by a regular expression is called a regular set

Specification of Patterns for Tokens: Regular Definitions
• Regular definitions introduce a naming convention:
$d_1 \to r_1$
$d_2 \to r_2$
…
$d_n \to r_n$
where each $r_i$ is a regular expression over $\Sigma \cup \{d_1, d_2, \ldots, d_{i-1}\}$
• Any $d_j$ in $r_i$ can be textually substituted in $r_i$ to obtain an equivalent set of definitions

Specification of Patterns for Tokens: Regular Definitions
• Example:
letter → A | B | … | Z | a | b | … | z
digit → 0 | 1 | … | 9
id → letter ( letter | digit )*
• Regular definitions are not recursive:
digits → digit digits | digit   wrong!

Specification of Patterns for Tokens: Notational Shorthand
• The following shorthands are often used:
r+ = rr*
r? = r | ε
[a-z] = a | b | c | … | z
• Examples:
digit → [0-9]
num → digit+ (. digit+)? ( E (+ | -)? digit+ )?
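The num definition can be checked with one small helper per sub-expression. The names is_num and digits are hypothetical. This sketch tests whether a whole string matches num. A real scanner would instead stop at the longest matching prefix, so an input such as "1." would yield the num 1 followed by the character ".".

```c
#include <ctype.h>
#include <stdbool.h>

/* Consumes digit+ at *pp; returns false (and consumes nothing)
 * if no digit is present. */
static bool digits(const char **pp)
{
    const char *p = *pp;
    if (!isdigit((unsigned char)*p))
        return false;
    while (isdigit((unsigned char)*p))
        p++;
    *pp = p;
    return true;
}

/* Returns true if the whole string s matches
 *   num -> digit+ ( . digit+ )? ( E ( + | - )? digit+ )?  */
bool is_num(const char *s)
{
    const char *p = s;
    if (!digits(&p))                 /* digit+ */
        return false;
    if (*p == '.') {                 /* ( . digit+ )? */
        p++;
        if (!digits(&p))
            return false;
    }
    if (*p == 'E') {                 /* ( E (+|-)? digit+ )? */
        p++;
        if (*p == '+' || *p == '-')
            p++;
        if (!digits(&p))
            return false;
    }
    return *p == '\0';               /* nothing may follow */
}
```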

Regular Definitions and Grammars
Grammar:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | ε
expr → term relop term
     | term
term → id
     | num
Regular definitions:
if → if
then → then
else → else
relop → < | <= | <> | > | >= | =
id → letter ( letter | digit )*
num → digit+ (. digit+)? ( E (+ | -)? digit+ )?

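Coding Regular Definitions in Transition Diagrams
relop → < | <= | <> | > | >= | =
state 0 (start): on < go to 1; on = go to 5; on > go to 6
state 1: on = go to 2, return(relop, LE)
         on > go to 3, return(relop, NE)
         on other go to 4*, return(relop, LT)
state 5: return(relop, EQ)
state 6: on = go to 7, return(relop, GE)
         on other go to 8*, return(relop, GT)
id → letter ( letter | digit )*
state 9 (start): on letter go to 10
state 10: on letter or digit stay in 10; on other go to 11*, return(gettoken(), install_id())
(* marks a state in which the last character read, "other", is pushed back onto the input)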
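The relop diagram translates almost state by state into C. In this sketch, scan_relop and the enum names are hypothetical. The starred states 4 and 8 are modeled by not counting the lookahead character in the returned length. That has the same effect as retracting the input by one character.

```c
#include <stddef.h>

/* Attribute values for the relop token, as in the diagram. */
enum relop { LT, LE, EQ, NE, GT, GE, NOT_RELOP };

/* Simulates the relop transition diagram (states 0-8) on the input at s.
 * Stores the number of characters consumed in *len. The "other" edges
 * (states 4 and 8) do not consume the lookahead character. */
enum relop scan_relop(const char *s, size_t *len)
{
    int state = 0;
    size_t pos = 0;
    for (;;) {
        char c = s[pos];
        switch (state) {
        case 0:
            if (c == '<')      { state = 1; pos++; }
            else if (c == '=') { state = 5; pos++; }
            else if (c == '>') { state = 6; pos++; }
            else { *len = 0; return NOT_RELOP; }   /* fail(): try another diagram */
            break;
        case 1:
            if (c == '=')      { *len = pos + 1; return LE; }  /* state 2 */
            else if (c == '>') { *len = pos + 1; return NE; }  /* state 3 */
            else               { *len = pos;     return LT; }  /* state 4, retract */
        case 5:
            *len = pos;
            return EQ;
        case 6:
            if (c == '=')      { *len = pos + 1; return GE; }  /* state 7 */
            else               { *len = pos;     return GT; }  /* state 8, retract */
        }
    }
}
```

Some example calls:

- "<=x" gives LE with length 2.
- "<1" gives LT with length 1, leaving "1" for the next token.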
Coding Regular Definitions in Transition Diagrams: Code

token nexttoken()
{ while (1) {
    switch (state) {
    case 0: c = nextchar();
      if (c == blank || c == tab || c == newline) {
        state = 0;
        lexeme_beginning++;
      }
      else if (c == '<') state = 1;
      else if (c == '=') state = 5;
      else if (c == '>') state = 6;
      else state = fail();
      break;
    case 1:
      …
    case 9: c = nextchar();
      if (isletter(c)) state = 10;
      else state = fail();
      break;
    case 10: c = nextchar();
      if (isletter(c)) state = 10;
      else if (isdigit(c)) state = 10;
      else state = 11;
      break;
    …
    }
  }
}

/* fail() decides the next start state to check */
int fail()
{ forward = lexeme_beginning;   /* retract to the start of the lexeme */
  switch (start) {
    case 0:  start = 9;  break;
    case 9:  start = 12; break;
    case 12: start = 20; break;
    case 20: start = 25; break;
    case 25: recover();  break;
    default: /* error */ break;
  }
  return start;
}
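The id diagram (states 9 to 11) can be sketched the same way. The slide only names gettoken() and install_id(), so their bodies below are hypothetical, and here they take the lexeme as an argument:

- Keywords if, then and else are checked in a small table.
- install_id() appends new identifiers to a fixed-size array that stands in for the symbol table.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum token_kind { TOK_IF, TOK_THEN, TOK_ELSE, TOK_ID, TOK_NONE };

/* Hypothetical symbol table: a fixed array of installed names. */
#define MAX_SYMS 100
static char symtab[MAX_SYMS][32];
static int nsyms = 0;

/* Hypothetical install_id(): returns the index of name in the table,
 * inserting it first if it is new; returns -1 if the table is full. */
static int install_id(const char *name)
{
    for (int k = 0; k < nsyms; k++)
        if (strcmp(symtab[k], name) == 0)
            return k;
    if (nsyms == MAX_SYMS)
        return -1;
    strncpy(symtab[nsyms], name, sizeof symtab[nsyms] - 1);
    return nsyms++;
}

/* Hypothetical gettoken(): reserved words become keyword tokens,
 * every other lexeme is an id. */
static enum token_kind gettoken(const char *lexeme)
{
    if (strcmp(lexeme, "if") == 0)   return TOK_IF;
    if (strcmp(lexeme, "then") == 0) return TOK_THEN;
    if (strcmp(lexeme, "else") == 0) return TOK_ELSE;
    return TOK_ID;
}

/* Simulates states 9, 10, 11 on s. Stores the lexeme length in *len and,
 * for ids, the symbol-table index in *attr (-1 otherwise). */
enum token_kind scan_id(const char *s, size_t *len, int *attr)
{
    char lexeme[32];
    size_t pos = 0;
    *len = 0;
    *attr = -1;
    if (!isalpha((unsigned char)s[0]))      /* state 9: a letter is required */
        return TOK_NONE;                    /* where fail() would try the next diagram */
    while (isalnum((unsigned char)s[pos]))  /* state 10: letters and digits */
        pos++;
    /* state 11 (*): the "other" character at s[pos] is not consumed */
    size_t n = pos < sizeof lexeme - 1 ? pos : sizeof lexeme - 1;
    memcpy(lexeme, s, n);                   /* lexeme, truncated to 31 chars */
    lexeme[n] = '\0';
    *len = pos;
    enum token_kind kind = gettoken(lexeme);
    if (kind == TOK_ID)
        *attr = install_id(lexeme);         /* attribute: symbol-table index */
    return kind;
}
```

Reserving keywords in gettoken() keeps a single id diagram for both keywords and identifiers. Otherwise each keyword would need its own diagram.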

The Lex and Flex Scanner Generators
• Lex and its newer cousin flex are scanner generators
• They systematically translate regular definitions into C source code for efficient scanning
• The generated code is easy to integrate into C applications

Creating a Lexical Analyzer with Lex and Flex
lex source program lex.l → lex or flex compiler → lex.yy.c
lex.yy.c → C compiler → a.out
input stream → a.out → sequence of tokens

Design of a Lexical Analyzer Generator
• Translate regular expressions to NFA
• Translate NFA to an efficient DFA (optional)
regular expressions → NFA → DFA
Either simulate the NFA or simulate the DFA to recognize tokens

Nondeterministic Finite Automata
• An NFA is a 5-tuple $(S, \Sigma, \delta, s_0, F)$ where
$S$ is a finite set of states
$\Sigma$ is a finite set of symbols, the alphabet
$\delta$ is a mapping from $S \times (\Sigma \cup \{\epsilon\})$ to a set of states
$s_0 \in S$ is the start state
$F \subseteq S$ is the set of accepting (or final) states

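Transition Graph
• An NFA can be diagrammatically represented by a labeled directed graph called a transition graph
[Figure: transition graph with start state 0 and edges 0 -a-> 0, 0 -b-> 0, 0 -a-> 1, 1 -b-> 2, 2 -b-> 3]
$S = \{0,1,2,3\}$, $\Sigma = \{a,b\}$, $s_0 = 0$, $F = \{3\}$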

Transition Table
• The mapping $\delta$ of an NFA can be represented in a transition table
$\delta(0,a) = \{0,1\}$
$\delta(0,b) = \{0\}$
$\delta(1,b) = \{2\}$
$\delta(2,b) = \{3\}$

State   Input a   Input b
0       {0, 1}    {0}
1       ∅         {2}
2       ∅         {3}

The Language Defined by an NFA
• An NFA accepts an input string x if and only if there is some path with edges labeled with symbols from x in sequence from the start state to some accepting state in the transition graph
• A state transition from one state to another on the path is called a move
• The language defined by an NFA is the set of input strings it accepts, such as (a|b)*abb for the example NFA
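One way to decide acceptance without guessing a path is to track every state the NFA could be in. The sketch below encodes the example NFA's transition table as bit sets, where bit t stands for state t. The name nfa_accepts and this encoding are assumptions of the example, not part of the slides.

```c
#include <stdbool.h>

/* Example NFA: S = {0,1,2,3}, Sigma = {a,b}, s0 = 0, F = {3}.
 * delta[s][x] is the set of successor states as a bit mask,
 * with input column 0 for 'a' and 1 for 'b'. */
enum { NSTATES = 4 };
static const unsigned delta[NSTATES][2] = {
    /*        a                      b        */
    /* 0 */ { (1u << 0) | (1u << 1), 1u << 0 },
    /* 1 */ { 0,                     1u << 2 },
    /* 2 */ { 0,                     1u << 3 },
    /* 3 */ { 0,                     0       },
};
static const unsigned accepting = 1u << 3;   /* F = {3} */

/* Accepts x iff some path labeled x leads from state 0 to a state in F.
 * Tracks the set of all states reachable after each symbol
 * (this NFA has no epsilon-edges). */
bool nfa_accepts(const char *x)
{
    unsigned current = 1u << 0;               /* {s0} */
    for (; *x != '\0'; x++) {
        int col;
        if (*x == 'a') col = 0;
        else if (*x == 'b') col = 1;
        else return false;                    /* symbol not in Sigma */
        unsigned next = 0;
        for (int s = 0; s < NSTATES; s++)
            if (current & (1u << s))
                next |= delta[s][col];
        current = next;
    }
    return (current & accepting) != 0;
}
```

For example, on "abba" the reachable sets are {0}, {0,1}, {0,2}, {0,3} and finally {0,1}, so the string is rejected.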

Design of a Lexical Analyzer Generator: RE to NFA to DFA
Lex specification with regular expressions:
p1 { action1 }
p2 { action2 }
…
pn { actionn }
NFA: a new start state s0 with ε-transitions to N(p1), N(p2), …, N(pn), where reaching the accepting state of N(pi) selects actioni
Subset construction converts this NFA into a DFA

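From Regular Expression to NFA (Thompson's Construction)
• ε: start state i with an ε-edge to accepting state f
• a: start state i with an a-edge to accepting state f
• r1 | r2: new start state i with ε-edges to the start states of N(r1) and N(r2), and ε-edges from their accepting states to a new accepting state f
• r1r2: i is the start state of N(r1), the accepting state of N(r1) is merged with the start state of N(r2), and f is the accepting state of N(r2)
• r*: new states i and f, with ε-edges from i to N(r) and from N(r) to f, an ε-edge from i directly to f, and an ε-edge from the accepting state of N(r) back to its start state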
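Thompson's construction can be sketched in C with one function per rule. The function names, the array-based state pool, and the EPS label are all hypothetical. One deliberate deviation: for r1r2, this sketch links the accepting state of N(r1) to the start state of N(r2) with an ε-edge instead of merging the two states. The resulting NFA accepts the same language.

```c
#include <stdbool.h>
#include <string.h>

/* State pool: each state has at most two outgoing edges.
 * An edge label is an input character, or EPS for an epsilon-edge. */
#define MAX_STATES 64
#define EPS 0

static int label[MAX_STATES][2];
static int target[MAX_STATES][2];
static int nedges[MAX_STATES];
static int nstates = 0;

/* N(r) is described by its start state i and its accepting state f. */
struct frag { int start, accept; };

static int new_state(void) { nedges[nstates] = 0; return nstates++; }

static void add_edge(int from, int lab, int to)
{
    label[from][nedges[from]] = lab;
    target[from][nedges[from]] = to;
    nedges[from]++;
}

/* Basis: i --a--> f */
struct frag sym(int a)
{
    int i = new_state(), f = new_state();
    add_edge(i, a, f);
    return (struct frag){ i, f };
}

/* r1 | r2 */
struct frag alt(struct frag r1, struct frag r2)
{
    int i = new_state(), f = new_state();
    add_edge(i, EPS, r1.start);
    add_edge(i, EPS, r2.start);
    add_edge(r1.accept, EPS, f);
    add_edge(r2.accept, EPS, f);
    return (struct frag){ i, f };
}

/* r1 r2 (epsilon-edge instead of merging states) */
struct frag cat(struct frag r1, struct frag r2)
{
    add_edge(r1.accept, EPS, r2.start);
    return (struct frag){ r1.start, r2.accept };
}

/* r*: enter or skip N(r), and loop back from its accepting state */
struct frag star(struct frag r)
{
    int i = new_state(), f = new_state();
    add_edge(i, EPS, r.start);
    add_edge(i, EPS, f);
    add_edge(r.accept, EPS, r.start);
    add_edge(r.accept, EPS, f);
    return (struct frag){ i, f };
}

/* Adds s and every state reachable from s by epsilon-edges to set. */
static void closure(bool set[], int s)
{
    if (set[s])
        return;
    set[s] = true;
    for (int k = 0; k < nedges[s]; k++)
        if (label[s][k] == EPS)
            closure(set, target[s][k]);
}

/* Simulates N(r) on x; true iff its accepting state is reachable. */
bool run(struct frag n, const char *x)
{
    bool cur[MAX_STATES] = { false };
    closure(cur, n.start);
    for (; *x != '\0'; x++) {
        bool next[MAX_STATES] = { false };
        for (int s = 0; s < nstates; s++)
            for (int k = 0; cur[s] && k < nedges[s]; k++)
                if (label[s][k] == *x)
                    closure(next, target[s][k]);
        memcpy(cur, next, sizeof cur);
    }
    return cur[n.accept];
}
```

To try it, build (a|b)*abb as cat(cat(cat(star(alt(sym('a'), sym('b'))), sym('a')), sym('b')), sym('b')). Calling run on the result accepts "abb" and "aababb" but rejects "ab".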

Combining the NFAs of a Set of Regular Expressions
a { action1 }       NFA: 1 -a-> 2
abb { action2 }     NFA: 3 -a-> 4 -b-> 5 -b-> 6
a*b+ { action3 }    NFA: 7 -a-> 7, 7 -b-> 8, 8 -b-> 8
Combined NFA: a new start state 0 with ε-edges to 1, 3, and 7

Simulating the Combined NFA: Example 1
Combined NFA as above, with accepting states 2 (action1), 6 (action2), 8 (action3)
Input a a b a:
{0,1,3,7} -a-> {2,4,7} -a-> {7} -b-> {8} -a-> none
The last state reached is 8, so action3 is executed
• Must find the longest match:
– Continue until no further moves are possible
– When the last state is accepting, execute its action

Simulating the Combined NFA: Example 2
Combined NFA as above, with accepting states 2 (action1), 6 (action2), 8 (action3)
Input a b b a:
{0,1,3,7} -a-> {2,4,7} -b-> {5,8} -b-> {6,8} -a-> none
The last set {6,8} contains the accepting states 6 (action2) and 8 (action3)
• When two or more accepting states are reached, the first action given in the Lex specification is executed, here action2
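Both examples can be reproduced with a small simulation of the combined NFA. This is a sketch using bit-mask sets, and the function names are hypothetical. longest_match remembers the most recent input position at which the current set contained an accepting state. That lets it fall back to that prefix once the set becomes empty. Among accepting states reached at the same position, it picks the pattern listed first.

```c
#include <stddef.h>

/* Combined NFA for  a {action1}  abb {action2}  a*b+ {action3}.
 * Sets of states are bit masks (bit s = state s). */
enum { NSTATES = 9 };
static const unsigned eps[NSTATES]  = { [0] = (1u << 1) | (1u << 3) | (1u << 7) };
static const unsigned on_a[NSTATES] = { [1] = 1u << 2, [3] = 1u << 4, [7] = 1u << 7 };
static const unsigned on_b[NSTATES] = { [4] = 1u << 5, [5] = 1u << 6,
                                        [7] = 1u << 8, [8] = 1u << 8 };
/* Accepting states in the order of the Lex specification. */
static const int accept_state[3] = { 2, 6, 8 };   /* action1, action2, action3 */

static unsigned eps_closure(unsigned T)
{
    unsigned result = T, prev;
    do {                                   /* add epsilon-successors until stable */
        prev = result;
        for (int s = 0; s < NSTATES; s++)
            if (result & (1u << s))
                result |= eps[s];
    } while (result != prev);
    return result;
}

static unsigned move(unsigned T, char c)
{
    unsigned result = 0;
    for (int s = 0; s < NSTATES; s++)
        if (T & (1u << s))
            result |= (c == 'a') ? on_a[s] : (c == 'b') ? on_b[s] : 0u;
    return result;
}

/* Returns the action number (1..3) for the longest prefix of x matched by
 * some pattern and stores its length in *len; returns 0 if none matches. */
int longest_match(const char *x, size_t *len)
{
    unsigned S = eps_closure(1u << 0);
    int last_action = 0;
    *len = 0;
    for (size_t pos = 0; S != 0; pos++) {
        /* Record the first-listed accepting state in the current set. */
        for (int k = 0; k < 3; k++)
            if (S & (1u << accept_state[k])) {
                last_action = k + 1;
                *len = pos;
                break;
            }
        if (x[pos] == '\0')
            break;
        S = eps_closure(move(S, x[pos]));
    }
    return last_action;
}
```

The results match the two examples:

- On "aaba", longest_match returns action 3 for the prefix "aab" (Example 1).
- On "abba", it returns action 2 for the prefix "abb" (Example 2).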

Deterministic Finite Automata
• A deterministic finite automaton is a special case of an NFA
– No state has an ε-transition
– For each state s and input symbol a there is at most one edge labeled a leaving s
• Each entry in the transition table is a single state
– At most one path exists to accept a string
– Simulation algorithm is simple

Example DFA
A DFA that accepts (a|b)*abb
start state 0, accepting state 3
0 -a-> 1, 0 -b-> 0
1 -a-> 1, 1 -b-> 2
2 -a-> 1, 2 -b-> 3
3 -a-> 1, 3 -b-> 0
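Because each table entry is a single state, simulating a DFA needs only one table lookup per input symbol. This is a minimal sketch for (a|b)*abb, assuming the transitions encoded in dtran below. The names dtran and dfa_accepts are hypothetical.

```c
#include <stdbool.h>

/* Assumed transition table for the DFA accepting (a|b)*abb:
 * dtran[s][0] is the move on 'a', dtran[s][1] the move on 'b'. */
static const int dtran[4][2] = {
    /*        a  b */
    /* 0 */ { 1, 0 },
    /* 1 */ { 1, 2 },
    /* 2 */ { 1, 3 },
    /* 3 */ { 1, 0 },
};

/* One table lookup per input symbol: no sets of states are needed. */
bool dfa_accepts(const char *x)
{
    int s = 0;                       /* start state */
    for (; *x != '\0'; x++) {
        if (*x != 'a' && *x != 'b')
            return false;            /* no edge on this symbol: reject */
        s = dtran[s][*x == 'b'];
    }
    return s == 3;                   /* F = {3} */
}
```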

Conversion of an NFA into a DFA
• The subset construction algorithm converts an NFA into a DFA using:
$\epsilon\text{-closure}(s) = \{s\} \cup \{t \mid s \xrightarrow{\epsilon} \cdots \xrightarrow{\epsilon} t\}$
$\epsilon\text{-closure}(T) = \bigcup_{s \in T} \epsilon\text{-closure}(s)$
$\mathit{move}(T,a) = \{t \mid s \xrightarrow{a} t \text{ and } s \in T\}$
• The algorithm produces:
Dstates is the set of states of the new DFA, consisting of sets of states of the NFA
Dtran is the transition table of the new DFA
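Below is a sketch of the subset construction, applied to the example NFA with states 0 to 3 from the transition graph and transition table slides. Sets of NFA states are bit masks. Dstates and Dtran follow the slide's names. Everything else is an assumption of the example, including the epsilon table, which is empty for this NFA.

```c
/* Example NFA for (a|b)*abb: states 0-3, no epsilon-edges.
 * Sets of NFA states are bit masks (bit s = state s). */
enum { N = 4, MAX_D = 16 };
static const unsigned nfa_eps[N] = { 0 };           /* no epsilon-edges here */
static const unsigned nfa_move[N][2] = {            /* [state][a = 0, b = 1] */
    { (1u << 0) | (1u << 1), 1u << 0 },
    { 0, 1u << 2 },
    { 0, 1u << 3 },
    { 0, 0 },
};

unsigned Dstates[MAX_D];   /* each DFA state is a set of NFA states */
int Dtran[MAX_D][2];       /* DFA transition table; -1 means no edge */
int ndstates = 0;

static unsigned eps_closure(unsigned T)
{
    unsigned result = T, prev;
    do {                                 /* add epsilon-successors until stable */
        prev = result;
        for (int s = 0; s < N; s++)
            if (result & (1u << s))
                result |= nfa_eps[s];
    } while (result != prev);
    return result;
}

static unsigned move(unsigned T, int a)
{
    unsigned result = 0;
    for (int s = 0; s < N; s++)
        if (T & (1u << s))
            result |= nfa_move[s][a];
    return result;
}

/* Returns the index of set U in Dstates, adding it (unmarked) if new. */
static int find_or_add(unsigned U)
{
    for (int k = 0; k < ndstates; k++)
        if (Dstates[k] == U)
            return k;
    Dstates[ndstates] = U;
    return ndstates++;
}

void subset_construction(void)
{
    ndstates = 0;
    find_or_add(eps_closure(1u << 0));       /* start: eps-closure({s0}) */
    /* Dstates[k] is "marked" once the loop has processed index k. */
    for (int k = 0; k < ndstates; k++)
        for (int a = 0; a < 2; a++) {
            unsigned U = eps_closure(move(Dstates[k], a));
            Dtran[k][a] = (U == 0) ? -1 : find_or_add(U);
        }
}
```

Running it yields four DFA states: {0}, {0,1}, {0,2} and {0,3}. Their transitions coincide with the example DFA for (a|b)*abb, with the states numbered 0 to 3 in that order.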

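ε-closure and move Examples
(combined NFA for a, abb, a*b+)
$\epsilon\text{-closure}(\{0\}) = \{0,1,3,7\}$
$\mathit{move}(\{0,1,3,7\}, a) = \{2,4,7\}$
$\epsilon\text{-closure}(\{2,4,7\}) = \{2,4,7\}$
$\mathit{move}(\{2,4,7\}, a) = \{7\}$
$\epsilon\text{-closure}(\{7\}) = \{7\}$
$\mathit{move}(\{7\}, b) = \{8\}$
$\epsilon\text{-closure}(\{8\}) = \{8\}$
$\mathit{move}(\{8\}, a) = \emptyset$
Input a a b a: {0,1,3,7} -a-> {2,4,7} -a-> {7} -b-> {8} -a-> none
• Also used to simulate NFAs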
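The closure and move computations above can be checked mechanically. This sketch uses bit masks, where BIT(s) stands for state s, and the names are hypothetical. It computes ε-closure with an explicit stack of states that still need to be expanded, which is a common way to implement it.

```c
/* Combined NFA for a, abb, a*b+ (states 0-8); sets are bit masks. */
#define BIT(s) (1u << (s))
enum { NSTATES = 9 };
static const unsigned eps[NSTATES]  = { [0] = BIT(1) | BIT(3) | BIT(7) };
static const unsigned on_a[NSTATES] = { [1] = BIT(2), [3] = BIT(4), [7] = BIT(7) };
static const unsigned on_b[NSTATES] = { [4] = BIT(5), [5] = BIT(6), [7] = BIT(8), [8] = BIT(8) };

/* eps-closure(T), using a stack of states whose epsilon-edges
 * have not been followed yet. Each state is pushed at most once. */
unsigned eps_closure(unsigned T)
{
    int stack[NSTATES], top = 0;
    unsigned closure = T;
    for (int s = 0; s < NSTATES; s++)          /* push all states of T */
        if (T & BIT(s))
            stack[top++] = s;
    while (top > 0) {
        int t = stack[--top];
        for (int u = 0; u < NSTATES; u++)
            if ((eps[t] & BIT(u)) && !(closure & BIT(u))) {
                closure |= BIT(u);             /* u is new: add it and expand it later */
                stack[top++] = u;
            }
    }
    return closure;
}

/* move(T, c): states reachable from some state in T on one edge labeled c. */
unsigned move(unsigned T, char c)
{
    unsigned result = 0;
    for (int s = 0; s < NSTATES; s++)
        if (T & BIT(s))
            result |= (c == 'a') ? on_a[s] : (c == 'b') ? on_b[s] : 0u;
    return result;
}
```

For instance, eps_closure(BIT(0)) yields the mask for {0,1,3,7}, and move(BIT(8), 'a') yields 0, the empty set.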

Simulating an NFA using ε-closure and move
S := ε-closure({s0})
Sprev := ∅
a := nextchar()
while S ≠ ∅ do
  Sprev := S
  S := ε-closure(move(S, a))
  a := nextchar()
end do
if Sprev ∩ F ≠ ∅ then
  execute action in Sprev
  return "yes"
else return "no"

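Minimizing the Number of States of a DFA
[Figure: a five-state DFA with states A, B, C, D, E and the equivalent minimized DFA with states A, B, D, E. In both, the path from start state A through B and D to E is labeled a b b; state C does not appear in the minimized DFA.]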
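The figure's edges did not survive extraction, so the transition table below is an assumption. It is chosen so that C behaves exactly like A, as the disappearance of C in the minimized DFA suggests. With that caveat, this sketch shows partition refinement. It starts from the split {accepting, non-accepting} and keeps splitting any group whose members move to different groups on some symbol. The name minimize is hypothetical.

```c
#include <string.h>

/* Hypothetical transition table for the five-state DFA
 * (A-E = 0-4, start A, accepting E); column 0 is 'a', column 1 is 'b'. */
enum { NS = 5 };
static const int dtran[NS][2] = {
    /* A */ { 1, 2 },
    /* B */ { 1, 3 },
    /* C */ { 1, 2 },
    /* D */ { 1, 4 },
    /* E */ { 1, 2 },
};
static const int accepting[NS] = { 0, 0, 0, 0, 1 };

/* On return, group[s] is the state of the minimized DFA that contains s;
 * the return value is the number of states of the minimized DFA. */
int minimize(int group[NS])
{
    int next[NS], prev_count = -1;
    for (int s = 0; s < NS; s++)
        group[s] = accepting[s];       /* initial split: non-accepting / accepting */
    for (;;) {
        int count = 0;
        for (int s = 0; s < NS; s++) {
            next[s] = -1;
            /* s joins an earlier state t iff both were in the same group and
             * both move to the same groups on 'a' and on 'b'. */
            for (int t = 0; t < s && next[s] == -1; t++)
                if (group[t] == group[s] &&
                    group[dtran[t][0]] == group[dtran[s][0]] &&
                    group[dtran[t][1]] == group[dtran[s][1]])
                    next[s] = next[t];
            if (next[s] == -1)
                next[s] = count++;     /* s starts a new group */
        }
        memcpy(group, next, sizeof next);
        if (count == prev_count)       /* no group was split: done */
            return count;
        prev_count = count;
    }
}
```

Under the assumed table, minimize returns 4 and puts A and C in the same group, which matches the four states A, B, D, E of the minimized DFA.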
