Intro To Compilers Lecture 2

Lexical analysis partitions a program's source code string into tokens. It defines a set of token types like identifiers, integers, keywords, and whitespace. A lexical analyzer recognizes substrings that correspond to each token type and returns the lexeme (substring) and token type. Regular expressions provide a notation for specifying the patterns that define each token type. A lexical analyzer implementation uses these regular expression patterns to efficiently scan the source code and classify its substrings into tokens.

Uploaded by

fikadu


Compilers

Lexical Analysis
Lexical Analysis
• What is the goal?
if (i ==0)
z=0;
else
z=1;

• The input is just a string of characters:

• if (i==0)\n\tz=0;\nelse\n\tz=1;
• Goal: Partition input string into substrings
• where the substrings are tokens
Token
• A token is a word: the smallest unit above letters.
• It is the minimal syntactic category.
• English: noun, verb, adjective …
• Programming language: Identifier, integer, keyword, whitespace, …
• Tokens correspond to sets of strings
• Identifier: strings of letters or digits, starting with a letter
• Integer: a non-empty string of digits
• Keyword: “else” or “if” …
• Whitespace: a non-empty sequence of blanks, newlines and tabs.
Contd…
• Tokens classify program substrings according to their role
• The output of lexical analysis is a stream of tokens.
• The parser relies on token distinctions.
• An identifier is treated differently from a keyword
Designing a lexical analyser
• Define a finite set of tokens
• Tokens describe all items of interest
• Choice of tokens depends on language, design of parser …
• Recall
• \tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;
• Useful tokens for this expression:
• Integer, Keyword, Relation, Identifier, Whitespace, (, ), =, ;
• N.B., (, ), =, ; are tokens, not characters, here
• The next step is to describe which substrings belong to each token.
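The token set above can be tried out with a small tokenizer. The following is a minimal sketch using Python's standard `re` module, not a definitive implementation: the names `TOKEN_SPEC`, `MASTER`, and `tokenize`, and the token names such as `ASSIGN` and `LPAREN`, are assumptions made for illustration and do not come from the lecture.

```python
import re

# Hypothetical token names. Order matters: earlier alternatives win,
# so KEYWORD is tried before IDENTIFIER and "==" before "=".
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:if|else)\b"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),
    ("INTEGER",    r"[0-9]+"),
    ("RELATION",   r"=="),
    ("ASSIGN",     r"="),
    ("LPAREN",     r"\("),
    ("RPAREN",     r"\)"),
    ("SEMICOLON",  r";"),
    ("WHITESPACE", r"[ \n\t]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Partition source into a list of (token type, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        # lastgroup is the name of the alternative that matched.
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


source = "\tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;"
for token_type, lexeme in tokenize(source):
    print(token_type, repr(lexeme))
```

Concatenating the lexemes in order gives back the original string, which is what "partitioning the input into substrings" means here.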
Implementation
• An implementation is responsible for two things.
• Recognize substrings corresponding to tokens accurately
• Return the value or lexeme (substring) of the token.
• First, it discards tokens that do not contribute to parsing
• Whitespace and comments.

if (i ==0) //if clause
z=0;
else /*else clause is located here*/
z=1;

if (i == 0)\n\tz=0;\nelse\n\tz=1;
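Discarding whitespace and comments can be sketched as a scanning loop that skips those substrings and keeps everything else. This is a minimal illustration that assumes C-style `//` and `/* */` comments, as in the example above. The names `SKIP`, `TOKEN`, and `significant_lexemes` are hypothetical.

```python
import re

# Substrings the parser never needs: whitespace and C-style comments (assumed syntax).
SKIP = re.compile(r"[ \n\t]+|//[^\n]*|/\*.*?\*/", re.DOTALL)
# A deliberately small set of lexemes, enough for the running example.
TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9]*|[0-9]+|==|[()=;]")


def significant_lexemes(source):
    """Return the lexemes of source, dropping whitespace and comments."""
    lexemes = []
    pos = 0
    while pos < len(source):
        skipped = SKIP.match(source, pos)
        if skipped:
            pos = skipped.end()  # discard and move on
            continue
        token = TOKEN.match(source, pos)
        if token is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        lexemes.append(token.group())
        pos = token.end()
    return lexemes


source = "if (i ==0) //if clause\n\tz=0;\nelse /*else clause is located here*/\n\tz=1;"
print(significant_lexemes(source))
```

The commented and uncommented versions of the program produce the same list of lexemes, so the parser sees no difference between them.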
Some examples
• C++
• Most tokens are easy to recognize.
• Template syntax: Foo<Bar>
• Stream syntax: cin >> var;
• When templates are nested, there is a conflict: Foo<Bar<Bazz>> (the closing >> looks like the stream operator)
• Is if two variables i and f?
• Is == two equal signs = = or one relation operator?
Solution
• Left-to-right scan
• Lookahead is sometimes required.
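A hand-written left-to-right scan shows where lookahead comes in. In the sketch below, which is one possible approach rather than a prescribed one, a word is read to its end before being classified, so `if` is never split into `i` and `f`. On seeing `=`, the scanner peeks one character ahead to choose between `=` and `==`. The function name `scan` and the token names are assumptions for illustration.

```python
import string

KEYWORDS = {"if", "else"}
LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)


def scan(source):
    """Scan source left to right and return (token type, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        ch = source[pos]
        if ch in " \n\t":
            pos += 1  # whitespace is skipped in this sketch
        elif ch in LETTERS:
            # Read the longest run of letters and digits before classifying it.
            end = pos
            while end < len(source) and (source[end] in LETTERS or source[end] in DIGITS):
                end += 1
            word = source[pos:end]
            tokens.append(("KEYWORD" if word in KEYWORDS else "IDENTIFIER", word))
            pos = end
        elif ch in DIGITS:
            end = pos
            while end < len(source) and source[end] in DIGITS:
                end += 1
            tokens.append(("INTEGER", source[pos:end]))
            pos = end
        elif ch == "=":
            # One character of lookahead decides between "=" and "==".
            if pos + 1 < len(source) and source[pos + 1] == "=":
                tokens.append(("RELATION", "=="))
                pos += 2
            else:
                tokens.append(("ASSIGN", "="))
                pos += 1
        elif ch in "();":
            tokens.append((ch, ch))
            pos += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {pos}")
    return tokens


print(scan("if (i == 0) z = 1;"))
print(scan("iffy = 1;"))
```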
Regular languages
• Regular languages are one of several formalisms for specifying tokens.
• Regular languages are a simple and useful theory
• Easy to understand
• Efficient implementation
• Definition: Let Σ be a set of characters. A language over Σ is a set of
strings of characters drawn from Σ.
Examples of languages

• English: alphabet = characters, language = sentences
• Programming language: alphabet = ASCII, language = programs
Notations
• Languages are sets of strings.

• Need some notation for specifying which sets we want

• The standard notation for regular languages is regular expressions.

Regular expressions
• Single character: ‘c’ = {“c”}
• Epsilon: ε = {“”}
• Union: A + B = {s | s ∈ A or s ∈ B}
• Concatenation: AB = {ab | a ∈ A and b ∈ B}
• Iteration: A* = ⋃_{i ≥ 0} A^i, where A^i = AA…A (i times)
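Because each regular expression denotes a set of strings, the operations can be tried directly on small Python sets. The sketch below is illustrative only. A* is infinite whenever A contains a non-empty string, so `star_up_to` computes just the union of A^i for i up to a chosen limit. All function names are hypothetical.

```python
def union(a, b):
    """A + B: strings in A or in B."""
    return a | b


def concat(a, b):
    """AB: every string of A followed by every string of B."""
    return {x + y for x in a for y in b}


def power(a, i):
    """A^i: A concatenated with itself i times; A^0 is {""}."""
    result = {""}
    for _ in range(i):
        result = concat(result, a)
    return result


def star_up_to(a, limit):
    """Finite approximation of A*: the union of A^i for 0 <= i <= limit."""
    result = set()
    for i in range(limit + 1):
        result |= power(a, i)
    return result


A = {"a"}
B = {"b", "c"}
print(sorted(union(A, B)))       # ['a', 'b', 'c']
print(sorted(concat(A, B)))      # ['ab', 'ac']
print(sorted(star_up_to(A, 3)))  # ['', 'a', 'aa', 'aaa']
```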
Regular expressions
• Definition: The regular expressions over Σ are the smallest set of
expressions including
• ε
• ‘c’ where c ∈ Σ
• A + B where A, B are rexp over Σ
• AB where A, B are rexp over Σ
• A* where A is a rexp over Σ
Examples
• Keywords: “else” or “if” or …
• ‘else’ + ‘if’ + …
• ‘else’ abbreviates ‘e’ ‘l’ ‘s’ ‘e’
• Integer: a non-empty string of digits
• Digit = ‘0’ + ‘1’ + ‘2’ + ‘3’ + ‘4’ + ‘5’ + ‘6’ + ‘7’ + ‘8’ + ‘9’
• Integer = Digit Digit*
• Abbreviation: A^+ = AA*
• Identifier: strings of letters or digits, starting with a letter
• Letter = ‘A’ + … + ‘Z’ + ‘a’ + … + ‘z’
• Identifier = Letter (Letter + Digit)*
• Whitespace: a non-empty sequence of blanks, newlines, and tabs
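These definitions translate almost symbol for symbol into the syntax of Python's `re` module, where union is written `|` instead of `+`. The sketch below assumes that translation. The whitespace pattern is an assumed rendering of "a non-empty sequence of blanks, newlines, and tabs", and `classes_of` is a hypothetical helper that reports every token class a lexeme belongs to.

```python
import re

DIGIT = "[0-9]"       # '0' + '1' + ... + '9'
LETTER = "[A-Za-z]"   # 'A' + ... + 'Z' + 'a' + ... + 'z'

PATTERNS = {
    "keyword":    re.compile(r"else|if"),
    "integer":    re.compile(f"{DIGIT}{DIGIT}*"),
    "identifier": re.compile(f"{LETTER}({LETTER}|{DIGIT})*"),
    "whitespace": re.compile(r"( |\n|\t)( |\n|\t)*"),
}


def classes_of(lexeme):
    """Return the names of all token classes whose language contains lexeme."""
    return [name for name, pattern in PATTERNS.items() if pattern.fullmatch(lexeme)]


print(classes_of("else"))   # ['keyword', 'identifier']
print(classes_of("x1"))     # ['identifier']
print(classes_of("42"))     # ['integer']
print(classes_of(" \n\t"))  # ['whitespace']
print(classes_of("1x"))     # []
```

Note that `else` belongs to both the keyword and the identifier language. A real lexer needs a tie-breaking rule for such cases, which this sketch does not implement.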
Examples
• Phone Number
• +251-911-00 00 00
• Σ = digits ∪ { -, +, ‘ ’ }
• Email Address
• [email protected]
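For the phone number example, a pattern over Σ = digits ∪ { -, +, ' ' } might look like the sketch below. It assumes the format is exactly that of the sample (a plus sign, two three-digit groups separated by hyphens, then three space-separated digit pairs). Real phone number formats vary, and the names `PHONE` and `is_phone_number` are hypothetical.

```python
import re

# Assumed shape, taken only from the sample +251-911-00 00 00.
PHONE = re.compile(r"\+[0-9]{3}-[0-9]{3}-[0-9]{2} [0-9]{2} [0-9]{2}")


def is_phone_number(text):
    """Return True if the whole of text matches the assumed phone format."""
    return PHONE.fullmatch(text) is not None


print(is_phone_number("+251-911-00 00 00"))  # True
print(is_phone_number("251-911-000000"))     # False
```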

• There are regular expressions everywhere.

• Everything discussed so far is syntax, not semantics (meaning).
