

CD 2

The document provides an overview of lexical analysis, which is the first phase of a compiler that converts source code into tokens. It discusses the role of the lexical analyzer, issues in lexical analysis, and input buffering techniques, including one and two buffering schemes. Additionally, it covers the specification of tokens using regular expressions and the types of symbols found in high-level programming languages.

Uploaded by

hirof18524

1

COMPILER DESIGN

J.NEELASARASWATHI M.C.A.,MPHIL,SET
ASSISTANT PROFESSOR
DEPARTMENT OF COMPUTER APPLICATIONS
J . J . COLLEGE OF ARTS AND SCIENCE(AUTONOMOUS)
2
UNIT-1

►Lexical analysis
►Issues in lexical analysis
►Input Buffering
►Specification of token
3
LEXICAL ANALYSIS

► Lexical analysis is the first phase of a compiler. It takes the
modified source code from language preprocessors.
► The source code is written in the form of sentences.
► It is the process of converting a sequence of characters into a
sequence of tokens.
► It breaks the source text into a series of tokens, removing any white
space in the source code.
► The program that does this is termed a lexer, tokenizer, or scanner.
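
The conversion from characters to tokens described above can be sketched with Python's `re` module. This is a minimal illustration, not a complete lexer: the token class names (`NUMBER`, `IDENT`, `SYMBOL`, `SKIP`), the keyword set, and the function name `tokenize` are assumptions made for this example.

```python
import re

# Hypothetical token classes, chosen only for this illustration.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("SYMBOL", r"[=+\-*/;]"),
    ("SKIP", r"\s+"),  # white space is matched but never emitted
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
KEYWORDS = {"int"}  # assumed keyword set for the example input


def tokenize(source):
    """Convert a sequence of characters into a list of (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, lexeme = match.lastgroup, match.group()
        if kind == "IDENT" and lexeme in KEYWORDS:
            kind = "KEYWORD"
        if kind != "SKIP":  # drop white space
            tokens.append((kind, lexeme))
        pos = match.end()
    return tokens


print(tokenize("int a = 10;"))
```

The white space between lexemes is consumed by the `SKIP` pattern and never appears in the output, which mirrors the "removing any white space" point above.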
4
THE ROLE OF LEXICAL ANALYZER

► Reads the input characters
► Groups them into tokens
► Produces a sequence of tokens as output
► Interacts with the symbol table
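
One way the interaction with the symbol table might look is sketched below. The table layout (identifier name mapped to an entry index), the keyword set, and the function name `scan_with_symbol_table` are assumptions for illustration; real compilers store much more per entry.

```python
KEYWORDS = {"int"}  # assumed keyword set for this sketch


def scan_with_symbol_table(lexemes):
    """Turn lexemes into tokens, entering each identifier into a symbol table once."""
    symbol_table = {}  # identifier name -> index of its table entry
    tokens = []
    for lexeme in lexemes:
        if lexeme.isidentifier() and lexeme not in KEYWORDS:
            # setdefault returns the existing index or stores a new one.
            index = symbol_table.setdefault(lexeme, len(symbol_table))
            tokens.append(("id", index))
        else:
            tokens.append((lexeme, None))  # keywords and symbols stand for themselves
    return tokens, symbol_table


tokens, table = scan_with_symbol_table(["int", "a", "=", "a", "+", "b", ";"])
print(tokens)
print(table)
```

Both occurrences of `a` yield the same `("id", 0)` token, so later phases can look up the single shared entry.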
5
THE ROLE OF LEXICAL ANALYZER
6
ISSUES IN THE LEXICAL ANALYSIS

Reasons for separating lexical analysis from parsing:

► Simpler design

► Improved compiler efficiency

► Enhanced compiler portability
7
INPUT BUFFERING

► Speed of lexical analysis is a concern

► Lexical analysis often needs to look ahead several
characters before a match can be announced

► Two buffer scheme


8
INPUT BUFFERING

The lexical analyzer scans the input from left
to right, one character at a time. It uses two
pointers, the begin pointer (bp) and the forward pointer (fp), to
keep track of the portion of the input scanned.
9
INPUT BUFFERING

Initially, both pointers point to the first
character of the input string, as shown above.
10
INPUT BUFFERING
11
INPUT BUFFERING

The forward pointer (fp) moves ahead to search for the end of the lexeme. As
soon as a blank space is encountered, it indicates the end of the
lexeme. In the above example, as soon as fp encounters a blank
space, the lexeme “int” is identified.

When fp encounters white space, it ignores it and moves ahead. Then both the begin
pointer (bp) and the forward pointer (fp) are set at the start of the next token.
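
The pointer movement described above can be sketched as follows. The names `bp` and `fp` come from the slides; the function name `scan_lexemes` is an assumption. The sketch follows the blank-space rule only, so it treats every lexeme as delimited by white space; a real lexer would also end a lexeme at an operator or punctuation character.

```python
def scan_lexemes(text):
    """Split text into white-space-delimited lexemes using two indices."""
    lexemes = []
    bp = 0  # begin pointer: start of the current lexeme
    fp = 0  # forward pointer: moves ahead looking for the end of the lexeme
    n = len(text)
    while bp < n:
        # fp ignores white space and moves ahead.
        while fp < n and text[fp].isspace():
            fp += 1
        bp = fp  # both pointers now sit at the start of the next lexeme
        # fp advances until a blank space (or the end of input) ends the lexeme.
        while fp < n and not text[fp].isspace():
            fp += 1
        if fp > bp:
            lexemes.append(text[bp:fp])
        bp = fp
    return lexemes


print(scan_lexemes("int a = 10 ;"))
```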
12
INPUT BUFFERING

The input characters are thus read from secondary
storage, but reading one character at a time from secondary
storage is costly. Hence a buffering technique is used: a
block of data is first read into a buffer and then
scanned by the lexical analyzer. There are two methods
used:
* One Buffer Scheme, and
* Two Buffer Scheme.
13
INPUT BUFFERING
14
ONE BUFFERING SCHEME

In this scheme, only one buffer is used to store the input
string. The problem with this scheme is that if a lexeme
is very long, it crosses the buffer boundary. To scan the
rest of the lexeme the buffer has to be refilled, which
overwrites the first part of the lexeme.
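
The boundary problem can be illustrated with a small model. It does not implement a real buffer. It only computes which part of a lexeme is still held in memory when the forward pointer reaches the lexeme's end, assuming that only the most recently loaded blocks are retained. The function names, the 8-character block size, and the sample input are assumptions for illustration.

```python
def split_into_blocks(source, size):
    """Cut the input into fixed-size blocks, as successive buffer loads would."""
    return [source[i:i + size] for i in range(0, len(source), size)]


def recover_lexeme(source, size, start, end, buffers):
    """Return the part of source[start:end] still in memory when fp reaches `end`,
    given that only the last `buffers` blocks are kept."""
    last_block = (end - 1) // size              # block holding the lexeme's last character
    first_kept = max(0, last_block - buffers + 1)
    kept_from = first_kept * size               # earliest input position still in memory
    return source[max(start, kept_from):end]


SOURCE = "int counter = 10;"
print(split_into_blocks(SOURCE, 8))
# The lexeme "counter" occupies positions 4..10 and crosses the first block boundary.
print(recover_lexeme(SOURCE, 8, 4, 11, buffers=1))
print(recover_lexeme(SOURCE, 8, 4, 11, buffers=2))
```

With one buffer the refill discards the beginning of `counter`. Keeping two buffers and reloading them alternately preserves the whole lexeme, provided the lexeme is no longer than one buffer.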
15
ONE BUFFERING SCHEME
16
TWO BUFFERING SCHEME
17
SPECIFICATION OF TOKENS

Regular expressions are an important notation
for specifying lexeme patterns.
An alphabet is a finite set of symbols.
Symbols are letters, digits and punctuation.
The set {0,1} is the binary alphabet.
18
SPECIFICATION OF TOKENS

A string is a finite sequence of symbols drawn
from an alphabet.
The length of a string s is denoted as |s|.
The empty string is denoted by ε.
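
These definitions map directly onto ordinary string operations. The sketch below uses the binary alphabet from the earlier slide. The names `BINARY_ALPHABET`, `EPSILON`, and `is_string_over` are assumptions for illustration.

```python
import re

BINARY_ALPHABET = {"0", "1"}  # the binary alphabet {0,1}
EPSILON = ""                  # the empty string ε


def is_string_over(s, alphabet):
    """True if every symbol of s is drawn from the given alphabet."""
    return all(symbol in alphabet for symbol in s)


s = "0110"
print(len(s))                                    # |s|
print(len(EPSILON))                              # |ε|
print(is_string_over(s, BINARY_ALPHABET))
print(is_string_over("0120", BINARY_ALPHABET))

# The same set of strings can be specified by the regular expression [01]*.
print(re.fullmatch(r"[01]*", s) is not None)
```

The empty string counts as a string over any alphabet, since it contains no symbol that could fall outside it.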
19
SPECIFICATION OF TOKENS

Special Symbols
A typical high-level language contains the following symbols:
Arithmetic Symbols: Addition(+), Subtraction(-), Modulo(%), Multiplication(*), Division(/)
Punctuation: Comma(,), Semicolon(;), Dot(.), Arrow(->)
Assignment: =
Special Assignment: +=, /=, *=, -=
Comparison: ==, !=, <, <=, >, >=
Preprocessor: #
Location Specifier: &
Logical: &, &&, |, ||, !
Shift Operator: >>, >>>, <<, <<<
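
Several of these symbols are prefixes of others (`<` and `<=`, `>>` and `>>>`), so a pattern for them must prefer the longest match. The sketch below builds one regular expression from the symbols listed above. Python's `re` module tries alternatives in order, so the longest symbols are placed first. The names `OPERATORS`, `OPERATOR_RE`, and `find_symbols` are assumptions for illustration.

```python
import re

# The symbols from the list above.
OPERATORS = [
    "+", "-", "%", "*", "/",
    ",", ";", ".", "->",
    "=", "+=", "/=", "*=", "-=",
    "==", "!=", "<", "<=", ">", ">=",
    "#", "&", "&&", "|", "||", "!",
    ">>", ">>>", "<<", "<<<",
]

# Longest symbols first, so that "<=" is tried before "<".
OPERATOR_RE = re.compile(
    "|".join(re.escape(op) for op in sorted(OPERATORS, key=len, reverse=True))
)


def find_symbols(text):
    """Return the special symbols found in text, preferring the longest match."""
    return OPERATOR_RE.findall(text)


print(find_symbols("a += b >>> 2 && c != d -> e;"))
print(find_symbols("<="))
```

Without the sort, a pattern such as `<|<=` would match only `<` in the input `<=` and leave the `=` behind as a separate symbol.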
20

THANK YOU
