1
COMPILER DESIGN
J.NEELASARASWATHI M.C.A.,MPHIL,SET
ASSISTANT PROFESSOR
DEPARTMENT OF COMPUTER APPLICATIONS
J.J. COLLEGE OF ARTS AND SCIENCE (AUTONOMOUS)
2
UNIT-1
►Lexical analysis
►Issues in lexical analysis
►Input Buffering
►Specification of tokens
3
LEXICAL ANALYSIS
► Lexical analysis is the first phase of a compiler. It takes the
modified source code from language preprocessors.
► The source code is written in the form of sentences
► It is the process of converting a sequence of characters into a
sequence of tokens
► It breaks the source text into a series of tokens, removing any
white space in the source code.
► The program that does this is termed a lexer, tokenizer, or scanner
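The conversion from characters to tokens can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the token classes, the `KEYWORDS` set, and the function name `tokenize` are hypothetical and do not come from the slides.

```python
import re

# Hypothetical token classes for a tiny C-like fragment (an assumption).
KEYWORDS = {"int", "float", "return"}
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[=+\-*/;]"),
    ("SKIP", r"\s+"),  # white space is matched but never emitted
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (kind, lexeme) pairs, dropping white space."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        if kind == "ID" and lexeme in KEYWORDS:
            kind = "KEYWORD"  # reserved words look like identifiers
        if kind != "SKIP":
            tokens.append((kind, lexeme))
        pos = match.end()
    return tokens
```

Calling `tokenize("int a = 10;")` yields five tokens: a keyword, an identifier, two operators, and a number. The white space between them is discarded.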
4
THE ROLE OF LEXICAL ANALYZER
► Read the input characters
► Group them into tokens
► Produce a sequence of tokens as output
► Interact with the symbol table
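One way to picture the interaction with the symbol table is the sketch below. It assumes the input has already been split into lexemes, and the names `scan_with_symbol_table` and `keywords` are hypothetical. Each new identifier gets the next free table index, and repeated identifiers reuse their existing entry.

```python
def scan_with_symbol_table(lexemes, keywords=frozenset({"int", "float"})):
    """Classify lexemes and record each identifier once in a symbol table."""
    symbol_table = {}  # identifier name -> table index
    tokens = []
    for lexeme in lexemes:
        if lexeme in keywords:
            tokens.append(("KEYWORD", lexeme))
        elif lexeme.isidentifier():
            # setdefault keeps the existing index for a repeated identifier
            index = symbol_table.setdefault(lexeme, len(symbol_table))
            tokens.append(("ID", index))
        else:
            tokens.append(("OTHER", lexeme))
    return tokens, symbol_table
```

The identifier token carries the table index instead of the name, so later phases can look up attributes of the identifier through the table.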
5
THE ROLE OF LEXICAL ANALYZER
6
ISSUES IN LEXICAL ANALYSIS
► Simpler design
► Compiler efficiency
► Compiler portability
7
INPUT BUFFERING
► Speed of lexical analysis is a concern
► Lexical analysis may need to look ahead several
characters before a match can be announced
► Two buffer scheme
8
INPUT BUFFERING
The lexical analyzer scans the input from left
to right one character at a time. It uses two
pointers, the begin pointer (bp) and the forward
pointer (fp), to keep track of the portion of the
input scanned.
9
INPUT BUFFERING
Initially both the pointers point to the first
character of the input string as shown above
10
INPUT BUFFERING
11
INPUT BUFFERING
The forward pointer (fp) moves ahead to search for the end of the
lexeme. A blank space indicates the end of the lexeme. In the above
example, as soon as fp encounters a blank space, the lexeme “int”
is identified.
When fp encounters white space, it ignores it and moves ahead.
Then both the begin pointer (bp) and the forward pointer (fp) are
set at the start of the next token.
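The movement of the two pointers can be sketched as follows. This is a minimal illustration that, like the description above, treats white space as the only lexeme delimiter. A real scanner would also end a lexeme at operators and punctuation. The function name `scan_lexemes` is hypothetical.

```python
def scan_lexemes(text):
    """Split text into lexemes using a begin pointer and a forward pointer."""
    lexemes = []
    n = len(text)
    bp = 0
    while bp < n:
        # Skip white space so bp rests on the first character of a lexeme.
        while bp < n and text[bp].isspace():
            bp += 1
        if bp == n:
            break
        # Move fp ahead until white space or the end of the input.
        fp = bp
        while fp < n and not text[fp].isspace():
            fp += 1
        lexemes.append(text[bp:fp])  # the lexeme lies between bp and fp
        bp = fp                      # continue from where fp stopped
    return lexemes
```

For the input `"int a = 10 ;"` the first pass leaves bp at index 0 and fp at index 3, which identifies the lexeme `int`.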
12
INPUT BUFFERING
The input characters are thus read from secondary
storage, but reading one character at a time from
secondary storage is costly. Hence a buffering technique
is used. A block of data is first read into a buffer, and
then scanned by the lexical analyzer. There are two
methods used:
* One Buffer Scheme, and
* Two Buffer Scheme.
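Reading a block at a time can be sketched as below. The helper name `read_blocks` and the default block size are assumptions made for illustration. `io.StringIO` stands in for a file on secondary storage.

```python
import io

def read_blocks(stream, block_size=4096):
    """Yield the input in fixed-size blocks instead of single characters."""
    while True:
        block = stream.read(block_size)  # one read call per block
        if not block:                    # empty string means end of input
            break
        yield block

# Example with a small block size so the blocks are easy to see.
blocks = list(read_blocks(io.StringIO("int a = 10;"), block_size=4))
```

With a block size of 4 the eleven input characters arrive in three blocks, so the scanner makes three read calls instead of eleven.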
13
INPUT BUFFERING
14
ONE BUFFERING SCHEME
In this scheme, only one buffer is used to store the input
string. The problem with this scheme is that a very long
lexeme may cross the buffer boundary. To scan the rest of
the lexeme the buffer has to be refilled, which overwrites
the first part of the lexeme.
15
ONE BUFFERING SCHEME
16
TWO BUFFERING SCHEME
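As the scheme is commonly described, two buffers are filled alternately: when the forward pointer reaches the end of one buffer, the other one is refilled, so the first part of a lexeme is not overwritten. The sketch below is a simplified illustration of that idea. The class name `TwoBufferReader`, the helper `lexemes_from`, and the tiny half size are assumptions. It also assumes that `read` returns a short block only at the end of the input, which holds for `io.StringIO`.

```python
import io

class TwoBufferReader:
    """Serve characters from two buffer halves that are refilled alternately."""

    def __init__(self, stream, half_size=8):
        self.stream = stream
        self.half_size = half_size
        self.halves = [stream.read(half_size), ""]
        self.active = 0  # index of the half being scanned
        self.pos = 0     # forward position inside the active half

    def next_char(self):
        """Return the next character, or "" at the end of the input."""
        current = self.halves[self.active]
        if self.pos == len(current):
            if len(current) < self.half_size:
                return ""  # a short half means the input is exhausted
            # Refill the other half; the half just scanned stays in memory.
            other = 1 - self.active
            self.halves[other] = self.stream.read(self.half_size)
            self.active, self.pos = other, 0
            if not self.halves[other]:
                return ""
        ch = self.halves[self.active][self.pos]
        self.pos += 1
        return ch

def lexemes_from(reader):
    """Collect white-space-separated lexemes from a TwoBufferReader."""
    lexemes, current = [], []
    while True:
        ch = reader.next_char()
        if ch == "" or ch.isspace():
            if current:
                lexemes.append("".join(current))
                current = []
            if ch == "":
                return lexemes
        else:
            current.append(ch)
```

With `half_size=4` the lexeme `counter` in `"int counter = 10;"` spans two halves and is still recovered whole. Implementations described in textbooks usually add a sentinel character at the end of each half to reduce the number of boundary tests, which this sketch omits.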
17
SPECIFICATION OF TOKENS
Regular expressions are an important notation
for specifying lexeme patterns
An alphabet is a finite set of symbols
Symbols are letters, digits and punctuation
The set {0,1} is the binary alphabet
18
SPECIFICATION OF TOKENS
A string is a finite sequence of symbols drawn
from an alphabet
The length of a string s is denoted as |s|
The empty string is denoted by ε
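These definitions map directly onto ordinary string operations. The sketch below uses hypothetical names (`BINARY`, `EPSILON`, `is_string_over`) and represents the empty string ε by Python's `""`.

```python
BINARY = {"0", "1"}  # the binary alphabet {0,1}
EPSILON = ""         # the empty string

def is_string_over(s, alphabet):
    """True if every symbol of s is drawn from the given alphabet."""
    return all(symbol in alphabet for symbol in s)

length_of_0110 = len("0110")      # |s| for s = 0110
length_of_epsilon = len(EPSILON)  # |ε|
```

Here `length_of_0110` is 4 and `length_of_epsilon` is 0. The empty string counts as a string over every alphabet, because it contains no symbol outside the alphabet.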
19
SPECIFICATION OF TOKENS
Special Symbols
A typical high-level language contains the following symbols:
Symbol type           Symbols
Arithmetic Symbols    Addition (+), Subtraction (-), Modulo (%), Multiplication (*), Division (/)
Punctuation           Comma (,), Semicolon (;), Dot (.), Arrow (->)
Assignment            =
Special Assignment    +=, /=, *=, -=
Comparison            ==, !=, <, <=, >, >=
Preprocessor          #
Location Specifier    &
Logical               &, &&, |, ||, !
Shift Operator        >>, >>>, <<, <<<
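Several of these symbols are prefixes of others (for example `<`, `<<` and `<=`), so a scanner usually prefers the longest symbol that matches. The sketch below shows one simple way to do that. The names `OPERATORS` and `match_operator` are hypothetical, and the list simply copies the symbols given above.

```python
OPERATORS = [
    "+", "-", "%", "*", "/", ",", ";", ".", "->", "=",
    "+=", "/=", "*=", "-=", "==", "!=", "<", "<=", ">", ">=",
    "#", "&", "&&", "|", "||", "!", ">>", ">>>", "<<", "<<<",
]
# Try longer symbols first so that ">=" is not read as ">" followed by "=".
BY_LENGTH = sorted(OPERATORS, key=len, reverse=True)

def match_operator(text, pos=0):
    """Return the longest listed symbol starting at text[pos], or None."""
    for op in BY_LENGTH:
        if text.startswith(op, pos):
            return op
    return None
```

For example, `match_operator(">= b")` returns `>=` rather than `>`, and `match_operator("x+=1", 1)` returns `+=`.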
20
THANK YOU