CHAPTER – I
INTRODUCTION TO COMPILERS
1.1 INTRODUCTION:
1.1.1 Translator:
★ A translator is system software which converts program written in one
language to another language.
★ During translation, if any syntax errors are encountered, they are reported to the user.
Fig. 1.1: A translator converts a source program in one language into an object program in another language, issuing error messages.
Machine Language:
Computers are made of electronic components and can understand only two states of those electronic devices:
1. ON state
2. OFF state
In the early days of computing, programming was done only in the machine-understandable language, i.e. directly in terms of the two states: 1 (ON state) and 0 (OFF state).
Assembly Language:
Here mnemonics are used in place of binary opcodes.
Example:
ADD 10, 2
★ But in the case of large programs, coding with mnemonics is still difficult. This resulted in the development of the next higher level of programming.
In High Level Language coding, programs are written using English-like statements, which makes programming easier.
Example:
C = 10 + 2
Fig. 1.2
★ A program can be written either in Assembly Language or in a High Level Language.
★ However, the machine can understand only 0's and 1's (Low Level Language). Thus, software is necessary to convert programs written in a High Level Language or Assembly Language into Low Level Language. Such software is called a Translator.
COMPILER:
Fig. 1.3: A compiler translates a source program in a HLL (e.g. C, C++) into an object program in LLL, issuing error messages.
The source program is analyzed by the compiler to check (for syntax errors) whether the program conforms to the standard of the corresponding programming language or not.
The analysis of the source program consists of three phases:
1. Lexical Analysis
2. Hierarchical Analysis
3. Semantic Analysis
Lexical Analysis: In this stage, the source program is read character by character
from left to right and is grouped into collection of characters called TOKENS.
Hierarchical Analysis: In this stage, tokens are grouped hierarchically into a nested
structure called SYNTAX TREE for checking them for their syntax.
Semantic Analysis: In this stage, the hierarchical structure is checked for its
meaning (like verifying the data types of the variables).
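For instance, for the statement position = initial + rate ∗ 60 (the running example used again in Fig. 1.11 below): lexical analysis yields the tokens <id, 1> = <id, 2> + <id, 3> ∗ 60; hierarchical analysis groups them into a tree with = at the root; and semantic analysis notices that the constant 60 must be converted to a floating-point value before the multiplication.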
1.1.2 Language Processing System:
We have learnt that any computer system is made of hardware and software.
The hardware understands a language, which humans cannot understand. So we
write programs in high-level language, which is easier for us to understand and
remember. These programs are then fed into a series of tools and OS components to obtain the desired code that can be used by the machine. This is known as a Language Processing System.
Fig. 1.4: Language Processing System — High Level Language → Pre-Processor → Pure HLL → Compiler → Assembly Language → Assembler → Relocatable Machine Code → Loader/Linker → Absolute Machine Code
★ A linker tool is used to link all the parts of the program together for
execution (executable machine code).
★ A loader loads all of them into memory and then the program is executed.
Preprocessor:
A preprocessor produces input for the compiler; it performs tasks such as macro expansion and file inclusion.
Interpreter:
An interpreter executes the source program statement by statement instead of translating it as a whole.
Assembler:
An assembler translates an assembly-language program into relocatable machine code.
Linker:
A linker is a computer program that links and merges various object files together in order to make an executable file. All these files might have been compiled by separate assemblers. The major task of a linker is to search for and locate referenced modules/routines in a program and to determine the memory locations where these codes will be loaded, making the program instructions have absolute references.
Loader:
A loader loads the executable machine code into memory for execution.
Cross-compiler:
A compiler that runs on one platform and produces executable code for another platform is called a cross-compiler.
A compiler that takes the source code of one programming language and translates it into the source code of another programming language is called a source-to-source compiler.
The analysis phase reads the source program, splits it into tokens and constructs an intermediate representation of the source program. It also checks for and reports the syntax and semantic errors of the source program.
It collects information about the source program and prepares the symbol table. The symbol table is used throughout the compilation process. The analysis phase is also called the front end of a compiler.
The synthesis phase takes the output of the analysis phase (the intermediate representation and symbol table) and produces the target machine-level code. It is also called the back end of a compiler.
Fig. 1.5: Source Program → Analysis Phase → Synthesis Phase → Machine Code; the analysis phase constructs (1) the Intermediate Representation and (2) the Symbol Table, with error checking throughout.
Fig. 1.6: Source Program (Character Stream) → Scanner → Tokens → Parser → Syntactic Structure → Semantic Routines → Intermediate Representation → Optimizer → Code Generator → Target Machine Code
Scanner:
★ The scanner begins the analysis of the source program by reading the input character by character and grouping characters into individual words and symbols (tokens).
★ Tokens are typically specified by RE (Regular Expressions).
★ Tools such as LEX and YACC automate the construction of scanners and parsers.
1.4 SEMANTIC ROUTINES:
★ Semantic routines check the meaning of the constructs and generate IR (Intermediate Representation) code.
Optimizer:
★ The IR code generated by the semantic routines is analyzed and transformed
into functionally equivalent but improved IR code.
★ This phase can be very complex & slow.
★ Peephole optimization.
★ Loop optimization, register allocation, code scheduling.
Fig. 1.7: Phases grouped — Lexical analyzer → Syntax analyzer → Semantic analyzer (Analysis phase); Code optimizer → Code generator (Synthesis phase)
Fig. 1.8: Source program → Preprocessor → Compiler → Assembler → Linker (with libraries and relocatable object files) → Absolute machine code
Compilers:
★ Code quality is judged by comparing the size of hand-written code with the compiled machine code for the same program; a better compiler generates smaller code.
★ If a compiler produces code which is 20 − 30% slower than the hand-written code, it is considered acceptable.
★ The compiled code should also be maintainable.
Fig. 1.9: Source program → Preprocessor → Compiler → Assembler → Linker/Loader (with library files and relocatable object files)
Compiler Vs Interpreter:
★ Compiler: errors are displayed after the entire program is checked.
★ Interpreter: errors are displayed for every instruction interpreted (if any).
Structure of a compiler:
★ Two parts (front end and Back end) − Analysis & Synthesis
★ Analysis part → breaks source program into constituent pieces and imposes
grammatical structure on them.
Phases of a compiler:
Character stream
Lexical Analyzer
Token stream
Syntax Analyzer
Syntax tree
Semantic
Analyzer
Syntax tree
Intermediate
Code Generator
Intermediate Representation
Machine-Independent
Code Optimizer
Intermediate Representation
Code Generator
Machine-Dependent
Code Optimizer
Fig. 1.10
★ Symbol table − stores information about the entire source program and used
by all phases of the compiler.
★ Syntax tree:
• Each interior node represents an operation.
Fig. 1.11: Syntax tree for position = initial + rate ∗ 60 — root =, left child <id, 1>, right child +; the + node has children <id, 2> and ∗; the ∗ node has children <id, 3> and 60.
★ Semantic Analysis
• It checks the source program for semantic consistency with the language
definition.
• Semantic analysis uses syntax tree and information in the symbol table.
• Gathers type information and saves it in syntax tree (or) symbol table
for subsequent use during intermediate code generation.
• If position, initial and rate are floating-point numbers, then the lexeme 60 is converted to a floating-point value.
★ Code optimization
• Improves the intermediate code so that better target code results: faster, shorter and consuming less power.
t1 : = id3 ∗ 60.0
id1 : = id2 + t1
★ Code Generation
• Registers or memory locations are selected for each variable used by the program.
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1
Symbol Table
1 position ...
2 initial ...
3 rate
• The back end consists of the passes that generate code for a particular target machine.
• Compilers for different source languages targeting one machine can be produced by combining different front ends with the same back end.
Phases of Compiler:
Consider the statement a = b + c ∗ 60.
Lexical analyzer → token stream id1 = id2 + id3 ∗ 60 (a, b, c are entered in the symbol table as id1, id2, id3).
Syntax analyzer → syntax tree: = at the root with children id1 and +; the + node has children id2 and ∗; the ∗ node has children id3 and 60.
Semantic analyzer → semantic tree: the constant 60 becomes inttofloat (60).
Intermediate code generator →
t1 := inttofloat (60)
t2 := id3 ∗ t1
t3 := id2 + t2
id1 := t3
Code optimizer → improves the intermediate code.
Code generator → machine code:
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
Fig. 1.13
→ Preprocessors
→ Assemblers
→ Two-pass assemblers
★ Two-pass assembler
• First pass: identifiers are found, stored in the symbol table and assigned storage locations.
• Second pass: operations and operands are translated into machine code, using the symbol table to resolve identifiers.
Fig. 1.14: Source program → Lexical analyzer ⇄ Parser (token / get next token); both consult the Symbol table.
★ Secondary task − stripping out comments and white space (blanks, tabs, new lines).
• Correlating error messages from the compiler with the source program
(eg. associating error message with its line no.)
Examples of tokens, lexemes and patterns:
const — lexeme const, as in const pi = 3.1416
if — lexeme if — pattern if
relop — lexemes <, <=, =, <>, >, >= — pattern < or <= or = or <> or > or >=
★ When more than one pattern matches a lexeme, the lexical analyzer must
provide additional information about the particular lexeme that matched to
the subsequent phases of the compiler.
★ Line number on which the identifier first appears and its lexeme are stored
in symbol table.
★ Buffer pairs
Algorithm (advancing the forward pointer):
if forward at end of first half then begin
    reload second half;
    forward := forward + 1
end
else if forward at end of second half then begin
    reload first half;
    move forward to beginning of first half
end
else forward := forward + 1
★ Lookahead code with sentinels (an eof sentinel is placed after each buffer half):
forward := forward + 1;
if forward↑ = eof then begin
    if forward at end of first half then begin
        reload second half;
        forward := forward + 1
    end
    else if forward at end of second half then begin
        reload first half;
        move forward to beginning of first half
    end
    else /* eof within a buffer signifying end of input */
        terminate lexical analysis
end
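To make the sentinel idea concrete, here is a minimal C sketch (not from the text; the buffer size N, the fillBuffer helper and the EOF_SENTINEL marker are illustrative assumptions; fillBuffer(buf) is assumed to have been called once at startup):

#include <stdio.h>

#define N 64                  /* assumed size of each buffer half */
#define EOF_SENTINEL '\0'     /* assumed sentinel written after each half */

static char buf[2 * N + 2];   /* two halves, each followed by a sentinel slot */
static char *forward = buf;
static FILE *src;

/* Hypothetical helper: read up to N characters into one half and
   terminate it with the sentinel. */
static void fillBuffer(char *half) {
    size_t n = fread(half, 1, N, src);
    half[n] = EOF_SENTINEL;
}

/* Advance 'forward' one character; on the common path the single sentinel
   test replaces the two end-of-half tests of the unbuffered version. */
static int nextChar(void) {
    char c = *forward++;
    if (c == EOF_SENTINEL) {
        if (forward == buf + N + 1) {            /* end of first half */
            fillBuffer(buf + N + 1);             /* reload second half */
        } else if (forward == buf + 2 * N + 2) { /* end of second half */
            fillBuffer(buf);                     /* reload first half */
            forward = buf;
        } else {
            return EOF;    /* sentinel inside a half: real end of input */
        }
        c = *forward++;
        if (c == EOF_SENTINEL) return EOF;       /* empty refill */
    }
    return c;
}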
• alphabet (or) character class denotes any finite set of symbols (letters &
characters).
• { 0, 1 } − binary alphabet
1.26 COMPILER DESIGN
• strings over some alphabet − finite sequence of symbols drawn from that
alphabet.
• |S| − the length of a string S.
eg: S = banana, |S| = 6
• x = dog, y = house
xy → concatenation of x and y → doghouse
• Exponentiation of a string: s0 = ∈ (the empty string) and si = si−1 s; thus s1 = s, s2 = ss, s3 = sss, …
★ Operations on languages
Union of L and M: L ∪ M = { s | s is in L or s is in M }
Concatenation of L and M: LM = { st | s is in L and t is in M }
Kleene closure of L: L∗ = ∪ i = 0 to ∞ of Li (zero or more concatenations)
Positive closure of L: L+ = ∪ i = 1 to ∞ of Li (one or more concatenations)
★ Regular Expressions
• A regular expression r denotes a language L (r).
• Let Σ = { a, b }. The regular expression (a∗ b∗)∗ denotes the set of all strings over { a, b }.
• The regular expression a | a∗b denotes the set containing the string a and all strings consisting of zero or more a's followed by a b.
Basis of the definition:
• If r and s are regular expressions denoting the languages L (r) and L (s), the following algebraic laws hold:
Axiom — Description
r | s = s | r — | is commutative
r | (s | t) = (r | s) | t — | is associative
(r s) t = r (s t) — concatenation is associative
r (s | t) = r s | r t — concatenation distributes over |
r∗∗ = r∗ — ∗ is idempotent
★ Regular definitions
A regular definition is a sequence of definitions
d1 → r1
d2 → r2
…
dn → rn
where each di is a distinct name and each ri is a regular expression over Σ ∪ { d1, d2, …, di−1 }.
eg: letter → A | B | … | Z | a | b | … | z
digit → 0 | 1 | … | 9
optional-fraction → . digits | ∈
★ Notational shorthands
digit → 0 | 1 | … | 9
digits → digit+
optional-fraction → ( . digits )?
optional-exponent → ( E ( + | − )? digits )?
• Character classes:
[a−z] denotes the regular expression a | b | … | z
[A−Za−z] [A−Za−z0−9]∗ ⇒ identifiers
term → id | num
if, then, else, relop, id and num are terminals which generate the sets of strings given by the following regular definitions.
if → if
then → then
else → else
ws → delim+
Regular expression — Token — Attribute value:
ws — (none) — (none)
if — if — (none)
then — then — (none)
else — else — (none)
< — relop — LT
<= — relop — LE
= — relop — EQ
<> — relop — NE
> — relop — GT
>= — relop — GE
★ Transition diagrams
• An intermediate step in the construction of a lexical analyzer.
• A transition diagram is a stylized flowchart.
• It depicts the actions that take place when the lexical analyzer is called by the parser to get the next token.
Fig. 1.15: Transition diagram for relational operators — from start state 0: on < go to state 1 (then = → state 2, return (relop, LE); > → state 3, return (relop, NE); other → state 4∗, return (relop, LT)); on = go to state 5, return (relop, EQ); on > go to state 6 (then = → state 7, return (relop, GE); other → state 8∗, return (relop, GT)). A ∗ marks states where the extra input character must be retracted.
Transition diagram for identifiers and keywords:
Fig. 1.16: Start state 9 — on letter go to state 10; state 10 loops on letter or digit; on other go to state 11∗ and return (gettoken (), install_id ()), where install_id () returns a pointer to the symbol table entry.
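The identifier diagram translates almost mechanically into code. A small C sketch (illustrative only; the function name and return convention are assumptions, using the standard ctype classifiers):

#include <ctype.h>

/* States 9, 10, 11 of Fig. 1.16: start, in-identifier, done (retract). */
int scanIdentifier(const char *input, int *pos, char *lexeme) {
    int i = *pos, n = 0;
    if (!isalpha((unsigned char)input[i]))
        return 0;                    /* state 9: must begin with a letter */
    while (isalnum((unsigned char)input[i]))   /* state 10: letter or digit */
        lexeme[n++] = input[i++];
    lexeme[n] = '\0';
    *pos = i;   /* state 11 is starred: the 'other' character is not consumed */
    return 1;
}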
Fig. 1.17: Transition diagrams for unsigned numbers — states 12–19 recognize digits, an optional fraction (. digits) and an optional exponent (E (+ | −)? digits); a simpler diagram (states 20–24) recognizes numbers without an exponent part.
Non-Regular Languages:
(i) The construction of a scanner for a new language takes much time; hence, to automate the construction of scanners for new programming languages, several built-in tools have been developed.
(ii) ‘LEX’ is such a tool used to generate lexical analyzer for a variety of
languages.
Fig. 1.18: Creating a lexical analyzer with LEX —
LEX source program (lex.l) → LEX compiler → lex.yy.c (a tabular representation of the transition diagram and a routine to recognize lexemes)
lex.yy.c → C compiler → a.out
Input stream → a.out → sequence of tokens
(iv) LEX Specifications:
A LEX program consists of 3 parts:
Declarations
% %
Translation rules
% %
Auxiliary procedures
(v) Declarations:
This part includes declarations of variables, regular definitions and manifest constants.
Note: A manifest constant is an identifier that is declared to represent a constant.
(vi) Translation Rules:
These are statements of the form
p1 { ACTION1 }
p2 { ACTION2 }
…
pn { ACTIONn }
where each pi is a pattern (regular expression) and each ACTIONi is a program fragment executed when pi matches.
(vii) Auxiliary Procedures:
Procedures that are needed by the actions are specified in this part.
Eg: With a lookahead pattern, the lexical analyzer will look ahead in its input buffer for a sequence of letters and digits followed by an equal sign, followed by letters and digits, followed by a comma; thus in
DO5I = 1, 25
the DO is recognized as a keyword.
(ix) LEX program to identify constants, variables, keywords & relational operators.
Fig. 1.19: A LEX-generated lexical analyzer is a finite-automaton (FA) simulator driven by a transition table.
%{ /* definitions of manifest constants
      LT, LE, EQ, NE, GT, GE, IF, THEN, ELSE, ID, NUMBER, RELOP */
%}
/* regular definitions */
delim   [ \t\n]
ws      {delim}+
letter  [A-Za-z]
digit   [0-9]
id      {letter}({letter}|{digit})*
%%
{ws}    { /* no action and no return */ }
if      { return (IF); }
then    { return (THEN); }
else    { return (ELSE); }
{id}    { yylval = install_id (); return (ID); }
"<"     { yylval = LT; return (RELOP); }
"<="    { yylval = LE; return (RELOP); }
"="     { yylval = EQ; return (RELOP); }
"<>"    { yylval = NE; return (RELOP); }
">"     { yylval = GT; return (RELOP); }
">="    { yylval = GE; return (RELOP); }
%%
install_id ()
{
    /* procedure to install the identifier lexeme into the symbol table */
}
install_num ()
{
    /* similar procedure to install a number into the symbol table */
}
Fig. 1.20: Combined NFA — a new start state s0 with ∈-transitions to N(p1), N(p2) and N(p3), the NFAs for the individual patterns.
★ At each step, the set of states that the combined NFA can be in after seeing each input character is constructed.
★ If more than one match occurs, then return the pattern that appears first.
Convert the NFA into DFA. If more than one accepting state occurs, then the
pattern that appears first has the priority.
Declaration part:
%{
int c = 0, w = 0, l = 0, s = 0;  /* characters, words, lines, spaces */
%}
%%
/* translation rules */
[\n]        { l++; s++; }
[\t ]       { s++; }
[^ \t\n]+   { w++; c += yyleng; }
%%
int main (int argc, char *argv[])
{
    if (argc == 2) {
        yyin = fopen (argv[1], "r");
        yylex ();
    }
    else
        printf ("error");
}
%{
/* declarations */
%}
/* regular definitions */
letter    [a-zA-Z]
digit     [0-9]
operators [+*/=]
%%
/* translation rules */
%%
main ()
{
    yylex ();
}
yylex () :
This function has to be invoked to start the process; lex takes its input from the file pointed to by yyin.
yyleng :
Returns the length of the matched string held in yytext.
yyout :
After scanning the whole file, the output is written to the file pointed to by yyout.
A minimal skeleton:
%{
int n = 0;
%}
/* rule section */
%%
.    ;
%%
int main ()
{
    yylex ();
}
★ At the time of transition, the automata can either move to the next state
or stay in the same state.
★ A finite automaton gives one of two verdicts on an input string: accept or reject. When the input string is processed successfully and the automaton reaches a final state, the string is accepted.
Formal Definition of FA:
A finite automaton is a 5-tuple (Q, Σ, δ, q0, F), where:
1. Q: finite set of states
2. Σ: finite set of input symbols
3. q0: initial state
4. F: set of final states
5. δ: transition function
Finite Automata Model: Finite automata can be represented by input tape and
finite control.
Input tape: It is a linear tape having some number of cells. Each input symbol is
placed in each cell.
Finite control: The finite control decides the next state on receiving particular input
from input tape. The tape reader reads the cells one by one from left to right, and
at a time only one input symbol is read.
Fig. 1.21: Finite automata model — an input tape (e.g. a b c a b b a) whose cells are read one by one, left to right, by the tape reader feeding the finite control.
Fig. 1.22: The compiler front end and its symbol table.
★ Lexical Analysis: Identify atomic language constructs.
Each type of construct is represented by a token.
(e.g. 3.14 → FLOAT, if → IF, a → ID).
Fig. 1.23: Token specification table — e.g. WS: (blank | tab | newline)+ → skip; IF: if → genToken().
Fig. 1.24: A finite automaton with initial state 0, final state 4, and transitions on a and b through intermediate states 1, 2 and 3.
Fig. 1.25: Classification of finite automata.
1. DFA: Deterministic finite automaton — for each state and input symbol there is exactly one transition.
2. NFA: Non-deterministic finite automaton — a state may have zero, one or several transitions on the same input symbol.
δ → transition function.
q0 → beginning (start) state.
F → set of final states.
NFA with ∈ (null) move: If any finite automaton contains an ε (null) move or transition, then that finite automaton is called an NFA with ∈ moves.
Transition table of an example ∈-NFA:
STATES — 0 — 1 — epsilon
A — B, C — A — B
B — − — B — C
C — C — C — −
Epsilon (∈)-closure: The epsilon closure of a given state X is the set of states which can be reached from X with only ε (null) moves, including the state X itself. In other words, the ε-closure of a state is obtained by taking the union of the ε-closures of the states reachable from X with a single ε move. For the NFA above:
∈-closure (A) = { A, B, C }
∈-closure (B) = { B, C }
∈-closure (C) = { C }
where
1. Q : finite set of states
2. Σ : finite set of input symbols
3. q0 : initial state
4. F : final state(s)
5. δ : transition function
NFA with ∈ move: If any FA contains an ε transition or move, the finite automaton is called an NFA with ∈ moves.
ε-closure: ε-closure for a given state A means a set of states which can be reached
from the state A with only ε (null) move including the state A itself.
Steps for converting an NFA with ε to a DFA:
Step 1: We will take the ε-closure for the starting state of NFA as a starting state
of DFA.
Step 2: Find the states reachable for each input symbol from the present state. That is, take the union of the transition values and their ε-closures for each state of the NFA present in the current state of the DFA.
Step 3: If we found a new state, take it as current state and repeat step 2.
Step 4: Repeat Step 2 and Step 3 until there is no new state present in the transition
table of DFA.
Step 5: Mark the states of DFA as a final state which contains the final state of
NFA.
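These steps can be exercised with a compact C sketch, assuming state sets are represented as bitmasks and the NFA is small and hard-coded (the nfa/eps tables below are placeholders to be filled in for a concrete automaton):

#include <stdio.h>

#define NSTATES 8           /* assumption: at most 8 NFA states */
#define NSYMS   2           /* input alphabet {0, 1} */

/* nfa[s][c] = bitmask of states reachable from s on symbol c;
   eps[s]    = bitmask of states reachable from s on epsilon. */
int nfa[NSTATES][NSYMS];
int eps[NSTATES];

/* epsilon-closure of a set of states T (helper for Steps 1 and 2). */
int eclosure(int T) {
    int closure = T, changed = 1;
    while (changed) {
        changed = 0;
        for (int s = 0; s < NSTATES; s++)
            if ((closure >> s) & 1) {
                int add = eps[s] & ~closure;
                if (add) { closure |= add; changed = 1; }
            }
    }
    return closure;
}

/* move(T, c): states reachable from any state of T on symbol c. */
int move(int T, int c) {
    int result = 0;
    for (int s = 0; s < NSTATES; s++)
        if ((T >> s) & 1) result |= nfa[s][c];
    return result;
}

int main(void) {
    int dstates[1 << NSTATES];                /* DFA states found so far */
    int ndstates = 0;
    dstates[ndstates++] = eclosure(1 << 0);   /* Step 1: closure of NFA start */
    /* Steps 2-4: process each DFA state until no new state appears. */
    for (int i = 0; i < ndstates; i++)
        for (int c = 0; c < NSYMS; c++) {
            int U = eclosure(move(dstates[i], c));
            if (U == 0) continue;
            int j;
            for (j = 0; j < ndstates; j++)
                if (dstates[j] == U) break;
            if (j == ndstates) dstates[ndstates++] = U;   /* Step 3 */
            printf("Dtran[%d, %d] = %d\n", i, c, j);
        }
    return 0;                 /* Step 5 (marking finals) is left as a check */
}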
Fig. 1.27: NFA with ∈ moves — start state q0 with ∈-transitions to q1 and q2; q1 on 0 → q3; q2 on 1 → q3; q3 on 1 → q4 (final).
Solution:
A = ∈-closure { q0 } = { q0, q1, q2 }, the start state of the DFA.
δ′ (A, 0) = ∈-closure { δ (q0, 0) ∪ δ (q1, 0) ∪ δ (q2, 0) } = ∈-closure { q3 } = { q3 }, call it state B.
δ′ (A, 1) = ∈-closure { q3 } = { q3 } = B.
Fig. 1.28: Partial DFA — A → B on inputs 0 and 1.
Now,
δ′ (B, 0) = ∈-closure { δ (q3, 0) } = φ
δ′ (B, 1) = ∈-closure { δ (q3, 1) } = ∈-closure { q4 } = { q4 }, call it state C.
For state C:
δ′ (C, 0) = ∈-closure { δ (q4, 0) } = φ
δ′ (C, 1) = ∈-closure { δ (q4, 1) } = φ
Fig. 1.29: Resulting DFA — A → B on 0 and 1; B → C on 1.
Outline:
First Algorithm:
Second Algorithm:
★ Combines states having the same future behavior − O(n log n).
Third Algorithm:
★ Used when computing ε-closure (move (T, a)) − the set of states reachable from T on input a.
★ Important states
• An NFA state is important if it has a non-∈ out-transition; the initial states of the basis part correspond to particular symbol positions in the regular expression.
Fig. 1.30: Basis NFA — start state i with an a-transition to final state F.
• The accepting state of the NFA for (r) # becomes an important state.
Syntax Tree:
Fig. 1.31: Syntax tree for (a | b)∗ a b b # — cat nodes are represented as circles; the leaves are a (position 1) and b (position 2) under the ∗ node, followed by a (3), b (4), b (5) and the endmarker # (6).
Representation Rules:
Fig. 1.32: ∈-NFA built from the syntax tree — start state A with ∈-moves (via B) to the a-leaf (state C, position 1) and the b-leaf (state D, position 2) of the ∗, then states 3 →a 4 →b 5 →b 6 →∈ 12 for a b b #.
★ There is a correspondence between the numbered states in the NFA and the positions in the syntax tree.
Nullable (n), firstpos, lastpos and followpos:
followpos (i) is the set of positions j such that there is some string x = a1 a2 … an in L ((r) #) in which position i corresponds to some ak and position j corresponds to ak+1.
Example:
1. nullable (n) = false for every leaf labeled with a position; only the ∗ node is nullable.
Fig. 1.33: Fragment of the syntax tree — the ∗ node over the leaves a (1) and b (2), and the leaf b (5).
Fig. 1.34: The syntax tree annotated with firstpos (left of each node) and lastpos (right of each node) — every leaf i has firstpos = lastpos = { i }; the ∗ node has firstpos = lastpos = { 1, 2 }; the cat node above it has firstpos { 1, 2, 3 }; and so on up to the root, whose lastpos is the endmarker position { 6 }.
1. If n is a cat-node with left child c1 and right child c2 (rule 1), and i is a position in lastpos (c1), then all positions in firstpos (c2) are in followpos (i).
2. If n is a star-node (rule 2), and i is a position in lastpos (n), then all positions in firstpos (n) are in followpos (i).
Applying rules 1 and 2 to the previous syntax tree gives followpos:
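For the running example (a | b)∗ a b b #, applying these two rules gives the standard result (stated here for completeness):
followpos (1) = { 1, 2, 3 }
followpos (2) = { 1, 2, 3 }
followpos (3) = { 4 }
followpos (4) = { 5 }
followpos (5) = { 6 }
followpos (6) = φ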
Input:
A regular expression r.
Output:
A DFA D that recognizes L (r).
Method:
★ Build the syntax tree T for (r) # and compute nullable, firstpos, lastpos and followpos.
• Dstates, the set of DFA states, initially contains the unmarked state firstpos (n0), where n0 is the root of T.
Dtran [A, a] = followpos (1) ∪ followpos (3) = { 1, 2, 3, 4 } = B
Dtran [A, b] = followpos (2) = { 1, 2, 3 } = A
Dtran [B, a] = followpos (1) ∪ followpos (3) = B
Dtran [B, b] = followpos (2) ∪ followpos (4) = { 1, 2, 3, 5 } = C
Dtran [C, a] = B
Dtran [C, b] = followpos (2) ∪ followpos (5) = { 1, 2, 3, 6 } = D
Minimized DFA:
Fig. 1.35: DFA for (a | b)∗abb — states 123 (start), 1234, 1235, 1236 (accepting); on a every state goes to 1234; on b, 123 → 123, 1234 → 1235, 1235 → 1236, 1236 → 123.
Dtran [D, a] = followpos (1) ∪ followpos (3) = B
Dtran [D, b] = followpos (2) ∪ followpos (6) = { 1, 2, 3 } = A
Fig. 1.36
Solution:
r = aa∗ | bb∗, decomposed into sub-expressions r1 = a, r2 = a∗, r3 = b, r4 = b∗, r5 = r1 r2, r6 = r3 r4, r7 = r5 | r6.
R1 :
Fig. 1.37: Decomposition tree of r — r7 at the root over r5 (= r1 r2 = a a∗) and r6 (= r3 r4 = b b∗); R1 is the basic NFA for a.
R2 :
Fig. 1.38: Thompson NFA for r2 = a∗ (∈-moves around and across an a-transition).
R3 :
Fig. 1.39: NFA for r3 = b.
R4 :
Fig. 1.40: Thompson NFA for r4 = b∗.
R5 : R1 R2
Fig. 1.41: NFA for r5 = a a∗.
R6 : R3 R4
Fig. 1.42: NFA for r6 = b b∗.
R7 : R5 | R6
Fig. 1.43: Complete ∈-NFA for r — from the start state, ∈-moves enter the a-branch (1 →a 2 →∈ 3 →a 4, with an ∈-loop for a∗, →∈ 5) and the b-branch (6 →b 7 →∈ 8 →b 9, with an ∈-loop for b∗, →∈ 10); both branches join the accepting state through states 11/12.
NFA to DFA using subset construction:
Method:
Computation of ε-closures:
ε-closure (0) = { 0, 1, 6 }
ε-closure (1) = { 1 }
ε-closure (2) = { 2, 3, 5, 11 }
ε-closure (3) = { 3 }
ε-closure (4) = { 3, 4, 5, 11 }
ε-closure (5) = { 5, 11 }
ε-closure (6) = { 6 }
ε-closure (7) = { 7, 8, 10, 11 }
ε-closure (8) = { 8 }
ε-closure (9) = { 8, 9, 10, 11 }
ε-closure (10) = { 10, 11 }
ε-closure (11) = { 11 }
Construction of DFA:
ε-closure (0) = { 0, 1, 6 } → A
δ (A, a) = { 2 } ; ε-closure { 2 } = { 2, 3, 5, 11 } → B
δ (A, b) = { 7 } ; ε-closure { 7 } = { 7, 8, 10, 11 } → C
δ (B, a) = { 4 } ; ε-closure { 4 } = { 3, 4, 5, 11 } → D
δ (B, b) = { } = φ
δ (C, a) = { } = φ
δ (C, b) = { 9 } ; ε-closure { 9 } = { 8, 9, 10, 11 } → E
δ (D, a) = { 4 } ; ε-closure { 4 } = D
δ (D, b) = { } = φ
δ (E, a) = { } = φ
δ (E, b) = { 9 } ; ε-closure { 9 } = E
Transition Table:
State — a — b
A — B — C
B — D — −
C — − — E
D — D — −
E — − — E
DFA Diagram:
Fig. 1.44: A →a B →a D (D loops on a); A →b C →b E (E loops on b).
1.14. MINIMIZING DFA:
Start with the partition P = { A B C D E }, split into non-accepting and accepting states:
P = { {A}, {B C D E} }
Splitting further on the inputs a and b separates {B, D} from {C, E}; B and D are equivalent, as are C and E. The minimized DFA therefore has states A, B (= {B, D}) and C (= {C, E}).
Transition Table:
State — a — b
A — B — C
B — B — −
C — − — C
Fig. 1.46: Minimized DFA — A →a B (B loops on a); A →b C (C loops on b).
*********
CHAPTER – II
SYNTAX ANALYSIS
2.1. INTRODUCTION:
Syntax Analysis:
★ The syntax of a programming language can be described by context-free grammars or BNF (Backus–Naur Form).
★ A grammar gives a precise, easy-to-understand syntactic specification.
Fig. 2.1: Position of the parser — Source program → Lexical analyzer (token ⁄ get next token) ⇄ Parser → Parse tree → Rest of front end → Intermediate representation; both consult the Symbol table.
top-down parsers (LL): build the parse tree from the top (root) down to the leaves
bottom-up parsers (LR): build the parse tree from the leaves up to the root
★ Error handler
Error productions:
Global correction:
★ minimal sequence of changes to obtain globally least cost correction.
★ transform x to y with minimal changes (insert, delete, change).
Role of a parser:
★ Every programming language has rules that describe the syntactic structure
of structured programs.
★ Parsing or syntax analysis is the phase of the compiler that checks whether the statements of the program are as per the conventions of the language, and it converts the sequence of tokens into a syntax tree (or parse tree).
Fig. 2.2: Sequence of tokens → Syntax Analysis → Syntax (or parse) tree.
★ Given the tokens separated out by the lexical analysis phase, the syntax analyzer checks whether the statements conform to the constructs of the source programming language; if correct it constructs the syntax tree, otherwise it displays an error.
2. A mechanism to check whether the statements of the input stream are as per the constructs (parsing)
E − expression
S1 , S2 − statement
★ Grammar production (a CFG consists of such productions):
statement → if expression then statement else statement
★ Report the place in the source program where an error is detected; the actual error probably occurred within the previous few tokens.
Local correction:
• delete an extra semicolon;
• insert a missing semicolon;
• replace a prefix of the remaining input by some string that allows the parser to continue.
Panic mode: skip ahead to synchronizing tokens, e.g. delimiters such as ;
Eg:
expr → expr op expr
expr → (expr)
expr → − expr
expr → id
op → + | − | ∗ | ⁄ | ^
Terminals: id, +, −, ∗, ⁄, ^, (, )
Non-terminals: expr, op
Start symbol: expr
★ Notational Conventions
Terminals → lower-case letters such as a, b, c, …; digits 0, 1, …
Non-terminals → upper-case letters such as A, B, C, …
★ Shorthands
E → E A E | (E) | − E | id
A → + | − | ∗ | ⁄ | ↑
E is the start symbol.
★ Derivation
A production is treated as a rewriting rule in which the non-terminal on the left is replaced by the string on the right side of the production.
eg: E ⇒ − E ⇒ − (E) ⇒ − (id)
α1 ⇒ α2 means α1 derives α2.
⇒∗ means derives in zero or more steps; ⇒+ means derives in one or more steps.
α ⇒∗ α for any string α; and if α ⇒∗ β and β ⇒ γ, then α ⇒∗ γ.
★ G − Grammar
A string of terminals w is in L (G) if S ⇒+ w; w is called a sentence of G.
★ Parse tree
• Graphical representation for a derivation that filters out the choice
regarding replacement order.
• Each interior node − non-terminal A
• Children of node is labeled from left to right by the symbols in the right
side of the production by which this A was replaced in the derivative.
• Leaves are labeled by non-terminals (or) terminals and read from left to
right, they constitute a sentential form, called the yield (or) frontier of
the tree.
Parse tree for − (id + id):
Fig. 2.3: E at the root with children − and E; that E expands to ( E ); the inner E expands to E + E, and each of those E's derives id.
Eg: id + id ∗ id has two derivations:
E ⇒ E + E          E ⇒ E ∗ E
  ⇒ id + E           ⇒ E + E ∗ E
  ⇒ id + E ∗ E       ⇒ id + id ∗ E
  ⇒ id + id ∗ E      ⇒ id + id ∗ E
  ⇒ id + id ∗ id     ⇒ id + id ∗ id
Fig. 2.4: The two corresponding parse trees — one with + at the root (its right child expanding to E ∗ E), one with ∗ at the root (its left child expanding to E + E).
Context-Free Grammar:
Example:
expr → expr op expr
expr → id
op → +
op → −
★ Terminals – id, +, −
★ Non-Terminals – expr, op
Notational Rules:
1. Terminals:
(i) Lower − case letters that appear at the beginning of the alphabet sequence
such as a, b, c, ....
2. Non-Terminals:
(i) Upper-case letters that appear at the beginning of the alphabet sequence such
as A, B, C ....
(ii) The letter ‘S’, when appears is usually the start symbol.
3. Grammar symbols:
Upper − case letters that appear at the end in the alphabet sequence such as
X, Y, Z represent a grammar symbol that is either terminals or non-terminals.
4. Strings of Terminals:
Lower-case letters that appear at the end of the alphabet sequence, such as u, v, …, z, represent strings of terminals.
6. Rule:
A → α1 ⁄ α2 ⁄ .... ⁄ αn.
7. Start symbol:
Unless specified, the non-terminal on the left side of first production is the start
symbol.
Derivations:
★ Notations
• ⇒∗ : derives in zero or more steps
• ⇒+ : derives in one or more steps
★ Language
Given a grammar G with start symbol S, the ⇒+ relation can be used to define L (G), the language generated by G.
Strings:
Strings in L (G) contain only terminals; w is in L (G) if S ⇒∗ w, where w is a string of terminals, called a sentence of G.
If two CFGs generate the same language, then the grammars are said to be equivalent.
★ Sentential form:
If S ⇒∗ α, where α may contain non-terminals, then α is called a sentential form of G.
Derivations in which only the leftmost non-terminal in any sentential form is replaced at each step are called leftmost derivations.
Derivations in which only the rightmost non-terminal in any sentential form is replaced at each step are called rightmost derivations.
Example:
S → (L) ⁄ a
L → L, S ⁄ S
Construct the leftmost and rightmost derivations for the string (a, a).
Leftmost:  S ⇒lm (L) ⇒lm (L, S) ⇒lm (S, S) ⇒lm (a, S) ⇒lm (a, a)
Rightmost: S ⇒rm (L) ⇒rm (L, S) ⇒rm (L, a) ⇒rm (S, a) ⇒rm (a, a)
Parse Trees:
Fig. 2.5: Parse tree for (a, a) — S expands to ( L ); L expands to L , S; the inner L derives S ⇒ a, and the S derives a.
Ambiguity:
A grammar that produces more than one parse tree for same sentence is said
to be ambiguous. An ambiguous grammar is one that produces more than one left
most derivation or more than one rightmost derivation for the same sentence.
Example:
E → E + E
E → E * E
E → a
E ⇒lm E + E          E ⇒lm E ∗ E
  ⇒lm a + E            ⇒lm E + E ∗ E
  ⇒lm a + E ∗ E        ⇒lm a + E ∗ E
  ⇒lm a + a ∗ E        ⇒lm a + a ∗ E
  ⇒lm a + a ∗ a        ⇒lm a + a ∗ a
Since for a single input sentence 2 left most derivations are there, the grammar
is said to be ambiguous.
For each transition from state i to state j on input a, add the production Ai → a Aj; for each final state i, add Ai → ε.
Example:
1. Write the CFG for the following automaton:
Fig. 2.6: State 0 loops on a and b; 0 →a 1 →b 2 →b 3 (final).
Solution:
Corresponding Grammar is,
A0 → a A0 ⁄ b A0 ⁄ a A1
A1 → b A2
A2 → b A3
A3 → ε
A0 → A1 ⁄ A3
A1 → a A2
A2 → a A2 ⁄ ∈
A3 → b A4
A4 → b A4 ⁄ ε.
Fig. 2.8: Automaton — 0 →b 0, 0 →a 1; 1 →a 1, 1 →b 2; 2 →a 1, 2 →b 3; 3 →a 0 ⁄ 1 (state 3 final).
Solution:
A0 → b A0 ⁄ a A1
A1 → a A1 ⁄ b A2
A2 → b A3 ⁄ a A1
A3 → a A0 ⁄ a A1 ⁄ ε
Example:
The grammar E → E + E ⁄ … is left-recursive. In general, the immediate left recursion
A → A α1 ⁄ A α2 ⁄ … ⁄ A αm ⁄ β1 ⁄ β2 ⁄ … ⁄ βn
is eliminated by rewriting it as
A → β1 A′ ⁄ β2 A′ ⁄ … ⁄ βn A′
A′ → α1 A′ ⁄ α2 A′ ⁄ … ⁄ αm A′ ⁄ ε
provided the grammar has:
1. No cycles
2. No ε-productions.
Problems:
E → E +T⁄T
T → T∗ F⁄F
F → (E) ⁄ id
Solution:
E → E +T⁄T
⇓ replaced by
A → β A′
A′ → α A′ ⁄ ε
E → TE′
E′ → + T E′ ⁄ ε
T → T∗ F⁄F
T → FT′
T′ → ∗ FT′ ⁄ ε
∴ E → TE′
E′ → + TE′ ⁄ ε
T → FT′
T′ → ∗ FT′ ⁄ ε
F → (E) ⁄ id.
S → Aa ⁄ b .... (1)
A → Ac ⁄ Sd .... (2)
Solution:
Substituting (1) into (2): A → Ac ⁄ Aad ⁄ bd, so α1 = c, α2 = ad, β = bd.
Eliminating the left recursion:
S → Aa ⁄ b
A → bd A′
A′ → c A′ ⁄ ad A′ ⁄ ε
S → a ⁄ ^ ⁄ (T)
T → T, S ⁄ S
(here α = , S and β = S)
Solution:
S → a ⁄ ^ ⁄ (T)
T → S T′
T′ → , S T′ ⁄ ε
Left Factoring:
A → α β1 ⁄ α β2 is replaced by
A → α A′
A′ → β1 ⁄ β2
For each non-terminal A, find the longest prefix α common to two or more of its alternatives. Replace
A → α β1 ⁄ α β2 ⁄ .... ⁄ α βn ⁄ γ
by
A → α A′ ⁄ γ
A′ → β1 ⁄ β2 ⁄ … ⁄ βn
Problems:
S → iEt SeS ⁄ i E t s
E → b
A → α A′ ⁄ ν
A′ → β1 ⁄ β2 … ⁄ βn
Solution:
S → i E t SS′ ⁄ a
S′ → e S ⁄ ?
E →b
S → a Abc ⁄ a Ab ⁄ d
A → e
Solution:
S → a A b S′ ⁄ d
S′ → c ⁄ ∈
A → e
3. S → Ced ⁄ Cdb
   C → db
Solution:
S → C S′
S′ → ed ⁄ db
C → db
★ Grammars are capable of describing most, but not all, of the syntax of
programming languages.
eg: Requirement that identifiers be declared before they are used cannot be
described by a context − free grammar.
A0 → a A0 ⁄ b A0 ⁄ a A1
A1 → b A2
A2 → b A3
A3 → ∈
• The lexical rules of a language are frequently quite simple (there is no need of a notation as powerful as grammars).
S → (s) S ⁄ ∈
Ambiguity:
★ A grammar that produces more than one parse tree for some sentence is said to be ambiguous.
★ Equivalently, it produces more than one leftmost (or more than one rightmost) derivation for the same sentence.
★ Eliminating Ambiguity:
Consider: if E1 then if E2 then S1 else S2
Fig. 2.9 and Fig. 2.10: The two parse trees for this statement — one matches the else with the inner if, the other with the outer if.
• general rule is “match each else with closest previous unmatched then”.
• Unambiguous expression grammar (after eliminating left recursion):
eg: E → E + T | T
T → T ∗ F | F
F → (E) | id
becomes
E → T E′
E′ → + T E′ | ∈
T → F T′
T′ → ∗ F T′ | ∈
F → (E) | id
In general,
A → A α1 | A α2 | … | A αm | β1 | β2 | … | βn
becomes
A → β1 A′ | β2 A′ | … | βn A′
A′ → α1 A′ | α2 A′ | … | αm A′ | ∈
But this does not eliminate left recursion involving derivations of two or more steps:
eg: S → A a | b
A → A c | S d | ∈
S ⇒ Aa ⇒ Sda
For i := 1 to n do begin
    For j := 1 to i − 1 do begin
        replace each production of the form Ai → Aj γ
        by the productions Ai → δ1 γ ⁄ δ2 γ ⁄ … ⁄ δk γ,
        where Aj → δ1 ⁄ δ2 ⁄ … ⁄ δk are the current Aj-productions
    end;
    eliminate the immediate left recursion among the Ai-productions
end
eg: S → Aa ⁄ b
A → Ac ⁄ S
becomes
S → Aa ⁄ b
A → S A′
A′ → c A′ ⁄ ε
★ Left factoring
i.e. A → α β1 | α β2
left-factored becomes
A → α A′
A′ → β1 | β2
In general, for
A → α β1 | α β2 | … | α βn | γ
then
A → α A′ | γ
A′ → β1 | β2 | … | βn
eg: S → i E t S | i E t S e S | a
E → b
Left factored:
S → i E t S S′ | a
S′ → e S | ∈
E → b
★ An LR parser detects an error, when it consults the action table and finds
that there is no entry for the given state and input symbol.
★ Canonical LR parser will not make a single reduction before announcing the
error.
★ SLR and LALR can make several reductions before detecting an error, but will not make a single shift before detecting an error.
Types of Parsers:
Parsing is a technique that is used to check whether the statements of the input stream are as per the syntax of the programming language or not.
A parser is a program that takes a string of tokens as its input and produces a parse tree if the string is accepted by the grammar; otherwise it reports an error.
1. Top-down parsing
2. Bottom-up parsing.
Top-down parsing
For a given input string, these parsers construct the parse tree from the root to the leaves if the statement is accepted by the source language; otherwise they generate an error.
2. Predictive parsing.
★ Construct parse tree from root & create nodes of parse tree in pre-order.
eg: S → cAd
A → ab ⁄ a, and the input string w = cad
Fig. 2.11: Successive steps of top-down parsing for w = cad — first S alone; then S with children c, A, d; A is first expanded to a b and, after backtracking, to a.
The most general form of top-down parsing, which involves backtracking, is recursive descent parsing.
To check whether a given string is accepted or not i.e., to construct a parse
tree, create child node in preorder.
Example:
1. Consider the following grammar
S → c Ad
A → ab ⁄ a
Construct parse tree for the string cad.
Solution:
Step 1:
S (starting node) Input pointer : cad
↑
Step 2:
S
c A d
Fig. 2.12 Input pointer : cad
↑
Step 3:
S
c A d
a b
Fig. 2.13 Input pointer : cad
2.28 COMPILER DESIGN
Step 4:
‘b’ is compared against ‘d’ & hence found to be wrong. Hence go back to ‘A’ and
check for an alternative.
c A d
Fig. 2.14
Input pointer : cad
↑
Step 5:
S
c A d
a
Fig. 2.15
“String accepted”
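The parse just traced can be written as a small recursive-descent recognizer with backtracking; a minimal C sketch for S → cAd, A → ab ⁄ a (function and variable names are illustrative):

#include <stdio.h>

const char *input;   /* the string being parsed */
int pos;             /* the input pointer */

int A(void) {
    int save = pos;
    if (input[pos] == 'a' && input[pos + 1] == 'b') { pos += 2; return 1; }
    pos = save;                     /* backtrack, try the alternative A -> a */
    if (input[pos] == 'a') { pos += 1; return 1; }
    pos = save;
    return 0;
}

int S(void) {
    if (input[pos] == 'c') {        /* S -> c A d */
        pos++;
        if (A() && input[pos] == 'd') { pos++; return 1; }
    }
    return 0;
}

int main(void) {
    input = "cad";
    pos = 0;
    if (S() && input[pos] == '\0')
        printf("String accepted\n");
    else
        printf("Error\n");
    return 0;
}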
eg: Transition diagrams for the predictive parser of the expression grammar:
Fig. 2.16:
E:  0 →T 1 →E′ 2
E′: 3 →+ 4 →T 5 →E′ 6, plus an ∈-edge 3 → 6
T:  7 →F 8 →T′ 9
T′: 10 →∗ 11 →F 12 →T′ 13, plus an ∈-edge 10 → 13
F:  14 →( 15 →E 16 →) 17, and 14 →id 17
Simplification of E′:
Applying the ∈-alternative turns states 3–6 of E′ into a loop: from state 3, a + edge to 4 and a T edge back to 3, with an ∈ edge to 6. Applying E′ to E then gives E as 0 →T 3 with a +, T loop at 3 and an ∈ edge to the final state 6.
Similarly, simplifying T′ and applying T′ to T gives T as F followed by a ∗, F loop with an ∈ edge to its final state.
Fig. 2.17
Non-recursive predictive parsing:
★ Maintains the stack explicitly, rather than implicitly via recursive calls.
★ Key problem → determining the production to be applied for a non-terminal.
Fig. 2.18: Model of a non-recursive predictive parser — Input buffer (a + b $), Stack (X Y … $), Predictive Parsing Program, Parsing Table M, Output stream.
where A − non-terminal, a − terminal.
★ The program considers X, the symbol on top of the stack, and a, the current input symbol. These two symbols determine the action of the parser:
(i) If X = a = $, the parser halts and announces successful completion.
(ii) If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
(iii) If X is a non-terminal, the program consults entry M [X, a] of the parsing table M.
Parsing table M for the expression grammar (columns id, +, ∗, (, ), $):
E : M [E, id] = E → TE′ ; M [E, (] = E → TE′
E′ : M [E′, +] = E′ → + TE′ ; M [E′, )] = E′ → ∈ ; M [E′, $] = E′ → ∈
T : M [T, id] = T → FT′ ; M [T, (] = T → FT′
T′ : M [T′, +] = T′ → ∈ ; M [T′, ∗] = T′ → ∗ FT′ ; M [T′, )] = T′ → ∈ ; M [T′, $] = T′ → ∈
F : M [F, id] = F → id ; M [F, (] = F → (E)
Stack — Input — Output
$ E — id + id ∗ id $ —
$ E′ T — id + id ∗ id $ — E → TE′
$ E′ T′ F — id + id ∗ id $ — T → FT′
$ E′ T′ id — id + id ∗ id $ — F → id
$ E′ T′ — + id ∗ id $ —
$ E′ — + id ∗ id $ — T′ → ∈
$ E′ T + — + id ∗ id $ — E′ → + TE′
$ E′ T — id ∗ id $ —
$ E′ T′ F — id ∗ id $ — T → FT′
$ E′ T′ id — id ∗ id $ — F → id
$ E′ T′ — ∗ id $ —
$ E′ T′ F ∗ — ∗ id $ — T′ → ∗ FT′
$ E′ T′ F — id $ —
$ E′ T′ id — id $ — F → id
$ E′ T′ — $ —
$ E′ — $ — T′ → ∈
$ — $ — E′ → ∈
Let X be the top stack symbol and a the symbol pointed to by the input pointer ip:
if X is a terminal or $ then
    if X = a then
        pop X from the stack and advance ip
    else error ()
else /* X is a non-terminal */
    if M [X, a] = X → Y1 Y2 .... Yk then begin
        pop X from the stack;
        push Yk, Yk−1, …, Y1 onto the stack, with Y1 on top;
        output the production X → Y1 Y2 … Yk
    end
    else error ()
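This loop can be exercised with a toy table-driven driver; a C sketch hard-coding table M for the expression grammar above (the single-character encodings — i for id, H for E′, U for T′, "" for ∈ — are assumptions of the sketch):

#include <stdio.h>
#include <string.h>

/* M[X, a] for E -> TE', E' -> +TE' | e, T -> FT', T' -> *FT' | e,
   F -> (E) | i.  Returns the right-hand side, "" for epsilon, NULL for error. */
const char *table(char X, char a) {
    switch (X) {
    case 'E': if (a == 'i' || a == '(') return "TH"; break;
    case 'H': if (a == '+') return "+TH";
              if (a == ')' || a == '$') return "";
              break;
    case 'T': if (a == 'i' || a == '(') return "FU"; break;
    case 'U': if (a == '*') return "*FU";
              if (a == '+' || a == ')' || a == '$') return "";
              break;
    case 'F': if (a == 'i') return "i";
              if (a == '(') return "(E)";
              break;
    }
    return NULL;
}

int main(void) {
    const char *input = "i+i*i$";    /* id + id * id $ */
    char stack[64] = "$E";           /* $ at the bottom, start symbol on top */
    int top = 1, ip = 0;

    while (1) {
        char X = stack[top], a = input[ip];
        if (X == '$' && a == '$') { printf("accepted\n"); return 0; }
        if (X == a) { top--; ip++; continue; }       /* match terminal */
        const char *rhs = table(X, a);
        if (!rhs || X == '$') { printf("error\n"); return 1; }
        top--;                                       /* pop X */
        for (int k = (int)strlen(rhs) - 1; k >= 0; k--)
            stack[++top] = rhs[k];                   /* push RHS reversed */
        printf("%c -> %s\n", X, *rhs ? rhs : "e");   /* output production */
    }
}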
Predictive parsing:
Transition diagrams (TDs):
• A transition on a token means that the transition has to be made if that token is the next input symbol.
(a) For each production A → X1 X2 … Xn, create a path from the initial to the final state, with edges labeled X1, X2, …, Xn.
(b) If after some actions the parser is in state s with an edge labeled by terminal a to state t, and the next input symbol is a, then the parser moves the input pointer one position right and goes to state t.
(c) If the edge is labeled by a non-terminal A, the parser goes to the start state for A without moving the input pointer; if it reaches the final state for A, it immediately goes to state t.
(d) If there is an ∈-labeled edge from s to t, then from state s the parser goes to state t without advancing the input pointer.
S → L = R ⁄ R
L → ∗ R ⁄ id
R → L
FIRST (S) = FIRST (L) = FIRST (R) = { ∗, id }
FOLLOW (S) = { $ }
FOLLOW (L) = { =, $ }
FOLLOW (R) = { =, $ }
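A brief check of these sets by the standard FIRST/FOLLOW reasoning: FIRST (L) = { ∗, id } directly from L's two alternatives, and since S ⇒ L = R and S ⇒ R ⇒ L, FIRST (S) = FIRST (R) = FIRST (L). FOLLOW (L) contains = (from S → L = R) and everything in FOLLOW (R) (from R → L); FOLLOW (R) contains FOLLOW (S) = { $ } and, because R can appear where L does, the = as well — hence FOLLOW (L) = FOLLOW (R) = { =, $ }.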
Step 5:
Rule 1:
For each terminal a contained in FIRST (X), add A → X to M [A, a] in the parsing table if X derives a as its first symbol:
M [S, ∗] = S → L = R
M [S, id] = S → L = R
M [L, ∗] = L → ∗R
M [L, id] = L → id
M [R, ∗] = R → L
M [R, id] = R → L
(b) Rule 2:
If FIRST (A) contains the null production, then for each terminal b in FOLLOW (A), add that production (A → null) to M [A, b] in the parsing table.
Step 6:
Parsing Table:
Non-terminal — ∗ — id — = — $
S — S → L = R — S → L = R — − — −
L — L → ∗R — L → id — − — −
R — R → L — R → L — − — −
★ Uses a stack to hold information about the subtrees that have been parsed:
state — val
X — X.x
Y — Y.y
top → Z — Z.z
(a state entry is a pointer into the LR(1) parsing table)
L → E n
E → E + T ⁄ T
T → T ∗ F ⁄ F
F → digit
(on a reduction by A → β: top = top − r + 1, where r = | β |)
Input — Stack — val — Production used
3 ∗ 5 + 4 n — — — 
∗ 5 + 4 n — 3 — 3 —
∗ 5 + 4 n — F — 3 — F → digit
∗ 5 + 4 n — T — 3 — T → F
5 + 4 n — T ∗ — 3 − —
+ 4 n — T ∗ 5 — 3 − 5 —
+ 4 n — T — 15 — T → T ∗ F
+ 4 n — E — 15 — E → T
4 n — E + — 15 − —
n — E + 4 — 15 − 4 —
n — E + F — 15 − 4 — F → digit
n — E + T — 15 − 4 — T → F
n — E — 19 — E → E + T
— E n — 19 —
— L — 19 — L → E n
Bottom-up Parsing:
Parsers that construct the parse tree from the leaves to the root for a given
input string are said to be Bottom-up parsing i.e. the input string is reduced to the
start symbol. If it can be reduced to a start symbol then the string is said to be
accepted otherwise, not.
★ LR parsers
• SLR parser.
• CLR parser.
• LALR parser.
★ The general form of bottom-up parser is shift − reduce parser & hence
parsing of the input string is made from leaves to the root.
★ This parsing attempts to construct a parse tree for an input string beginning
at the leaves & working up towards the root.
Example:
S → a A B e
A → A b c ⁄ b
B → d
Reduction of the sentence abbcde:
a b b c d e
a A b c d e { A → b }
a A d e { A → A b c }
a A B e { B → d }
S { S → a A B e }
Handle:
Example:
E → E + E ⁄ E ∗ E
E → (E) ⁄ id
Identify the handles in the rightmost derivation of id + id ∗ id (handles underlined):
E ⇒rm E + E
  ⇒rm E + E ∗ E
  ⇒rm E + E ∗ id
  ⇒rm E + id ∗ id
  ⇒rm id + id ∗ id
HANDLE PRUNING:
Example:
E → E + E ⁄ E ∗ E
E → (E) ⁄ id
For the input string (id + id), obtain the handle pruning:
Right-sentential form — Handle — Reducing production
(id + id) — id — E → id
(E + id) — id — E → id
(E + E) — E + E — E → E + E
(E) — (E) — E → (E)
E — − — −
VIABLE PREFIXES:
The set of prefixes of right-sentential forms that appear on the stack of a shift-reduce parser are called viable prefixes.
★ Two problems must be solved:
(a) locating the substring to be reduced (the handle), and
(b) choosing a production when there is more than one production with that substring on the right side.
Initial configuration:
Stack — Input
$ — w $
★ The parser operates by shifting zero or more input symbols onto the stack until a handle β is on top of the stack.
★ The parser repeats this cycle until an error is detected, or the stack contains the start symbol and the input is empty, i.e. Stack: $ S, Input: $.
★ Shift − the next input symbol is shifted onto the top of the stack.
★ Reduce − the right end of the handle is at the top of the stack; the parser locates the left end of the handle within the stack and replaces the handle with the left side of the production.
$ E + E ∗ id3 — $ — reduce by E → id
$ E + E ∗ E — $ — reduce by E → E ∗ E
$ E + E — $ — reduce by E → E + E
$ E — $ — accept
★ Viable prefixes − the set of prefixes of right-sentential forms that can appear on the stack of a shift-reduce parser.
★ Sometimes the stack contents and the next input symbol are not sufficient to determine which production should be used in a reduction.
General configuration: ( $ X Y Z , ai ai+1 … an $ )
Initial configuration: ( $ , w $ )
Final configuration: ( $ S , $ )
ACTIONS:
Shift − The next input symbol is shifted on to the top of the stack.
Reduce − The parser knows the right end of the handle is on the top of the
stack. It then locates the left end of the handle within the stack
and decides the non-terminal for replacement.
Error − Discovers that a syntax error has occurred and calls an error-recovery routine.
1. The parser operates by shifting zero or more i/p symbols onto the stacks until
a handle β is on the top of the stack.
3. This is repeated until it has detected an error or until the stack reaches the
final configuration.
Problems:
1. Consider the grammar
S → C C
C → c C
C → d
Check whether the input string ccdd is accepted, using shift-reduce parsing:
Stack — Input — Action
$ — ccdd $ — shift
$ c — cdd $ — shift
$ cc — dd $ — shift
$ ccd — d $ — reduce C → d
$ ccC — d $ — reduce C → cC
$ cC — d $ — reduce C → cC
$ C — d $ — shift
$ Cd — $ — reduce C → d
$ CC — $ — reduce S → CC
$ S — $ — accept
2. Consider the grammar
E → E + T ⁄ T
T → T ∗ F ⁄ F
F → (E) ⁄ id
Check whether the input string (id ∗ id) + id is accepted or not.
Shift-reduce conflict: the parser reaches a configuration in which, knowing the stack contents and the next input symbol, it cannot decide whether to shift or to reduce.
Reduce-reduce conflict: the parser reaches a configuration in which it cannot decide which of several reductions to make.
Example:
At some stage the parser cannot decide whether the stack top has to be reduced or the next input symbol has to be shifted.
parameter → id
At this stage, the id on the stack has to be reduced, but since more than one production has id on its right side, this leads to confusion.
Precedence Relations:
Consider:
E → E op E | id
op → + | ∗
Relation — Meaning:
a <⋅ b — a yields precedence to b
a =⋅ b — a has the same precedence as b
a ⋅> b — a takes precedence over b
Precedence Table (row = left symbol, columns id, +, ∗, $):
id — − — ⋅> — ⋅> — ⋅>
+ — <⋅ — ⋅> — <⋅ — ⋅>
∗ — <⋅ — ⋅> — ⋅> — ⋅>
$ — <⋅ — <⋅ — <⋅ — −
Example: The input string id + id ∗ id with the precedence relations inserted:
$ <⋅ id ⋅> + <⋅ id ⋅> ∗ <⋅ id ⋅> $
Basic Principle:
Having precedence relations allows identifying handles as follows:
1. Scan the string from the left until seeing ⋅> and put a pointer there.
2. Scan backwards (right to left) from that point until seeing <⋅
3. Everything between the two relations <⋅ and ⋅> forms the handle.
Parsing algorithm:
Repeat: let b be the top stack symbol and a the input symbol pointed to by ip;
if (a is $ and b is $)
    return   /* accept */
else
    if b <⋅ a or b =⋅ a then shift a onto the stack and advance ip
    else if b ⋅> a then
        repeat pop the stack until the top stack terminal is related by <⋅ to the terminal most recently popped
    else error ()
end
The operator precedence parsers usually do not store the precedence table with
the relations; rather they are implemented in a special way.
1. Create functions fa for each grammar terminal a and for the end of string
symbol.
2. Partition the symbols in groups so that fa and gb are in the same group if
a = ⋅ b (there can be symbols in the same group even if they are not connected
by this relation).
3. Create a directed graph whose nodes are in the groups, next for each symbols
a and b do: place an edge from the group of gb to the group of fa if a < ⋅ b,
otherwise if a ⋅ > b place an edge from the group of fa to that of gb.
4. If the constructed graph has a cycle, then no precedence functions exist. When there are no cycles, let f (a) be the length of the longest path starting from the group of fa, and g (b) the length of the longest path starting from the group of gb.
Fig. 2.19: Precedence graph — nodes fid, gid, f+, g+, f∗, g∗, f$, g$, with edges determined by the precedence table.
— id — + — ∗ — $
f — 4 — 2 — 4 — 0
g — 5 — 1 — 3 — 0
2.9. LR PARSER:
★ LR parsing algorithm.
Fig. 2.20: Model of an LR parser — input buffer a1 … ai … an $; a stack holding s0 … Xm−1 sm−1 Xm sm (sm on top); the LR Parsing Program consults the action and goto tables and produces the output.
Xi − grammar symbol
Si − state symbol ⇒ summarizes the information contained in the stack below it.
★ Combination of state (sm) symbol contained in stack & (ai) current i/p symbol
are used to index the parsing table & determine shift − reduce parsing
decision [ sm , ai ].
• accept
• error
★ The function goto takes a state and a grammar symbol as arguments and produces a state.
A configuration of an LR parser is a pair
( s0 X1 s1 X2 s2 … Xm sm , ai ai+1 … an $ )
whose first component is the stack contents and whose second is the unexpended input.
(i) Shift: the parser shifts both the current symbol ai and the next state s, given in action [sm, ai], onto the stack; ai+1 becomes the current input symbol:
( s0 X1 s1 X2 s2 … Xm sm ai s , ai+1 … an $ )
(ii) Reduce A → β: the parser first pops 2r symbols off the stack (r state symbols and r grammar symbols, where r = | β |), exposing state sm−r. It then pushes both A, the left side of the production, and s, the entry for goto [sm−r, A], onto the stack:
( s0 X1 s1 X2 s2 … Xm−r sm−r A s , ai ai+1 … an $ )
(iii) If action [sm, ai] = accept, parsing is completed.
(iv) If action [sm, ai] = error, the parser has discovered an error and calls an error-recovery routine.
Input − An input string w and an LR parsing table with functions action and goto for a grammar G.
Output − If w is in L (G), a bottom-up parse for w; otherwise an error indication.
Method − Initially the parser has s0, the initial state, on its stack and w $ in the input buffer.
repeat forever begin
    let s be the state on top of the stack and a the symbol pointed to by ip;
    if action [s, a] = shift s′ then begin
        push a then s′ onto the stack; advance ip
    end
    else if action [s, a] = reduce A → β then begin
        pop 2 ∗ | β | symbols off the stack;
        let s′ be the state now on top of the stack;
        push A then goto [s′, A] onto the stack;
        output the production A → β
    end
    else if action [s, a] = accept then return
    else error ()
end
Follow (T) = { *, $, +, ) }
Follow (F) = { *, $, +, ) }
Grammar:
(1) E → E + T
(2) E → T
(3) T → T ∗ F
(4) T → F
(5) F → (E)
(6) F → id
State — action (id, +, ∗, (, ), $) — goto (E, T, F)
0 S5 S4 1 2 3
1 S6 acc
2 r2 S7 r2 r2
3 r4 r4 r4 r4
4 S5 S4 8 2 3
5 r6 r6 r6 r6
6 S5 S4 9 3
7 S5 S4 10
8 S6 S11
9 r1 S7 r1 r1
10 r3 r3 r3 r3
11 r5 r5 r5 r5
Stack — Input — Action
0 — id ∗ id + id $ — shift
0 id 5 — ∗ id + id $ — reduce F → id
0 F 3 — ∗ id + id $ — reduce T → F
0 T 2 — ∗ id + id $ — shift
0 T 2 ∗ 7 — id + id $ — shift
0 T 2 ∗ 7 id 5 — + id $ — reduce F → id
0 T 2 ∗ 7 F 10 — + id $ — reduce T → T ∗ F
0 T 2 — + id $ — reduce E → T
0 E 1 — + id $ — shift
0 E 1 + 6 — id $ — shift
0 E 1 + 6 id 5 — $ — reduce F → id
0 E 1 + 6 F 3 — $ — reduce T → F
0 E 1 + 6 T 9 — $ — reduce E → E + T
0 E 1 — $ — accept
Parse tree:
Fig. 2.21: Parse tree for id ∗ id + id — E at the root over E + T; the left E derives T ⇒ T ∗ F ⇒ id ∗ id, and the right T derives F ⇒ id.
★ LR parsers do not have to scan the entire stack to know where the handle appears on top; the state symbol on top of the stack contains all the information needed.
Comparison of LR Parsers:
Fig. 2.22: LR(0) ⊆ SLR ⊆ LALR ⊆ CLR.
LR Parser:
This is a bottom-up syntax analysis technique that can be used to parse a large class of CFGs. LR parsing is actually called LR(k) parsing, where L stands for left-to-right scanning of the input, R for constructing a rightmost derivation in reverse, and k for the number of lookahead symbols.
The stack holds
s0 X1 s1 X2 … Xm−1 sm−1 Xm sm
where each Xi is a grammar symbol and each si is a state symbol.
3. Parsing Table:
The table consists of a parsing action function and a goto function. The action entries are:
1. Shift s
2. Reduce by a production A → β
3. Accept.
4. Error.
The goto function takes a state and a grammar symbol as arguments and produces a state.
4. Configuration:
( s0 X1 s1 X2 … Xm sm , ai ai+1 … an $ )
Initial configuration:
( s0 , w $ )
5. Actions:
★ action [sm, ai] = shift s:
Shift ai onto the stack along with state s. The configuration becomes
( s0 X1 s1 X2 … Xm sm ai s , ai+1 … an $ )
★ action [sm, ai] = reduce A → β:
( s0 X1 s1 X2 … Xm−r sm−r A s , ai ai+1 … an $ )
where
s = goto [sm−r , A]
r = length of β
★ action [sm, ai] = accept:
parsing is completed
★ action [sm, ai] = error:
the parser calls an error-recovery routine
LR (0) item:
An LR (0) item of a grammar ‘G’ is a production of G with a dot at some position
of the right side.
A → • XYZ
A → X • YZ
A → XY • Z
A → XYZ •
★ Item can be represented by pair of integers, first giving the no. of production
& second the position of the dot.
E′ → E
E → E + T ⁄ T
T → T ∗ F ⁄ F
F → (E) ⁄ id
If I is the set containing only the item { [E′ → • E] }, then closure (I) contains the items constructed from I by two rules:
1. Initially, every item in I is added to closure (I).
2. If A → α • B β is in closure (I) and B → γ is a production, then add the item B → • γ to closure (I), if it is not already there.
Rule 1:
E′ → • E
Rule 2:
E → • E + T
E → • T
T → • T ∗ F
T → • F
F → • (E)
F → • id
Computation of closure:
function closure (I);
begin
    J := I;
    repeat
        for each item A → α • B β in J and each production B → γ of G
        such that B → • γ is not in J do
            add B → • γ to J
    until no more items can be added to J;
    return J
end
★ Kernel items
• Include the initial item S′ → • S, and all items whose dots are not at the left end.
★ Non-kernel items → items which have their dots at the left end.
goto operation:
goto (I, X) is defined to be the closure of the set of all items [A → α X • β] such that [A → α • X β] is in I.
eg: if I = { [E′ → E •], [E → E • + T] }, then goto (I, +) consists of:
E → E + • T
T → • T ∗ F
T → • F
F → • (E)
F → • id
procedure items (G′);
begin
    C := { closure ({ [S′ → • S] }) };
    repeat
        for each set of items I in C and each grammar symbol X
        such that goto (I, X) is not empty and not in C do
            add goto (I, X) to C
    until no more sets of items can be added to C
end
I0 : E′ → • E
E → •E+T
E → •T
T → •T∗F
T → •F
F → • (E)
F → • id
I1 = goto (I0 , E):
E′ → E •
E → E • + T
I2 = goto (I0 , T):
E → T •
T → T • ∗ F
I3 goto (I0 , F)
T → F•
I4 = goto (I0 , ( ):
F → ( • E)
E → •E+T
E → •T
T → •T∗F
T → •F
F → • (E)
F → • id
I5 = goto (I0 , id):
F → id •
I6 = goto (I1 , +):
E → E+•T
T → •T∗F
T → •F
F → • (E)
F → • id
I7 goto ( I2 , ∗ )
T → T∗•F
F → • (E)
F → • id
I8 goto ( I4 , E)
F → (E •)
E → E•+T
I9 goto (I6 , T)
E → E+T•
T → T•∗F
I10 goto ( I7 , F)
T → T∗F•
I11 goto ( I8 , )
F → (E) •
Fig. 2.23: Transition (goto) graph of the canonical LR(0) collection — I0 →E I1 →+ I6 →T I9 →∗ I7; I0 →T I2 →∗ I7 →F I10; I0 →F I3; I0 →( I4 →E I8 (then I8 →) I11 and I8 →+ I6); I0 →id I5; I4 and I6 also go to I4, I5, I2, I3 on (, id, T, F respectively.
goto (I7 , id) = I5
goto (I4 , T) = I2
goto (I4 , F) = I3
goto (I4 , () = I4
goto (I6 , F) = I3
goto (I6 , () = I4
goto (I7 , () = I4
goto (I8 , +) = I6
goto (I9 , ∗) = I7
Output − The SLR parsing table functions action and goto for G′.
Method:
1. Construct C = { I0, I1, …, In }, the collection of sets of LR(0) items for G′.
2. State i is constructed from Ii. The parsing actions for state i are determined as follows:
(a) If [A → α • a β] is in Ii and goto (Ii, a) = Ij, set action [i, a] = shift j (a is a terminal).
(b) If [A → α •] is in Ii, set action [i, a] = reduce A → α for all a in FOLLOW (A) (here A ≠ S′).
(c) If [S′ → S •] is in Ii, set action [i, $] = accept.
If any conflicting actions are generated by the above rules, then the grammar is not SLR(1).
3. The goto transitions for state i are constructed for all non-terminals A using the rule:
if goto (Ii, A) = Ij, then
goto [i, A] = j
4. All entries not defined by rules (2) & (3) are made “error”.
5. The initial state of the parser is the one constructed from the set of items
containing [S′ → • S].
1. S → L = R
2. S → R
3. L → ∗ R
4. L → id
5. R → L
Augmented grammar: S′ → S
I0 :
S′ → • S
S → • L = R
S → • R
L → • ∗ R
L → • id
R → • L
I1 goto (I0 , S)
S′ → S •
I2 goto (I0 , L)
S → L•=R
R → L•
I3 goto (I0 , R)
S → R•
I4 goto (I0 , ∗)
L → ∗•R
R → •L
L → •∗R
L → • id
I5 = goto (I0 , id):
L → id •
I6 goto (I2 , = )
S → L=•R
R → •L
L → •∗R
L → • id
I7 goto (I4 , R)
L → ∗R•
I8 goto (I4 , L)
R → L•
goto (I4 , ∗) = I4
goto (I4 , id) = I5
I9 goto (I6 , R)
S → L=R•
goto (I6 , L) = I8
goto (I6 , ∗) = I4
goto (I6 , id) = I5
Follow (S) = { $ }
Follow (L) = { = , $ }
Follow (R) = { $ , = }
States — action (=, ∗, id, $) — goto (S, L, R)
0 S4 S5 1 2 3
1 accept
2 S6 ⁄ r5 r5
3 r2
4 S4 S5 8 7
5 r4 r4
6 S4 S5 8 9
7 r3 r3
8 r5 r5
9 r1
action [2, =] is S6 ⁄ r5 — a multiply-defined entry.
Even though the given grammar is not ambiguous, there is a shift-reduce conflict; the SLR parser is not powerful enough for this grammar.
Tutorial:
S → A S ⁄ b
A → S A ⁄ a
Construct the SLR parse table for this grammar. Show the actions of the parser for the input string abab.
★ If A → α • is there in the itemset then the set of i/p symbols that can follow
a handle α for which there is a possible reduction to A will be added to
items in item set.
Closure (I) and items (G′):
Same as for SLR, but each item carries a lookahead: [A → α • β, a]. Initially
C = { closure ( { [S′ → • S, $] } ) };
Eg:
1. S → C C
2. C → b C
3. C → d
I0 :
S′ → • S, $
S → • C C, $
C → • b C, b ⁄ d
C → • d, b ⁄ d
I1 = goto (I0 , S):
S′ → S • , $
I2 = goto (I0 , C):
S → C • C, $
C → • b C, $
C → • d, $
I3 = goto (I0 , b):
C → b • C, b ⁄ d
C → • b C, b ⁄ d
C → • d, b ⁄ d
I4 = goto (I0 , d):
C → d • , b ⁄ d
I5 = goto (I2 , C):
S → C C • , $
I6 = goto (I2 , b):
C → b • C, $
C → • b C, $
C → • d, $
I7 = goto (I2 , d):
C → d • , $
I8 = goto (I3 , C):
C → b C • , b ⁄ d
goto (I3 , b) = I3
goto (I3 , d) = I4
I9 = goto (I6 , C):
C → b C • , $
2.9.3. LALR:
LALR Parser:
★ The CLR parser avoids conflicts in the parse table. But it produces more
no. of states when compared to SLR parser. Hence it occupies more space.
★ So the LALR parser can be used. The tables obtained are smaller than CLR parse tables, and LALR parsers are as efficient as CLR parsers.
SYNTAX ANALYSIS 2.71
★ LR(1) items that have same productions but different lookaheads are
combined to form a single set of items (It means that these items result in
the same state of the DFA)
eg:
I4 = goto (I0 , d) = { C → d • , b ⁄ d }
I7 = goto (I2 , d) = { C → d • , $ }
Combined as I47.
I3 = goto (I0 , b) = { C → b • C, b ⁄ d ; C → • b C, b ⁄ d ; C → • d, b ⁄ d }
I6 = goto (I2 , b) = { C → b • C, $ ; C → • b C, $ ; C → • d, $ }
Combined as I36.
I8 = goto (I3 , C) = { C → b C • , b ⁄ d }
I9 = goto (I6 , C) = { C → b C • , $ }
Combined as I89.
State — action (b, d, $) — goto (S, C)
0 S36 S47 1 2
1 acc
2 S36 S47 5
36 S36 S47 89
47 r3 r3 r3
5 r1
89 r2 r2 r2
★ This merger of states can never produce shift-reduce conflict. However it can
produce a reduce-reduce conflict.
Construction of the LALR table:
2. For each core present among the sets of LR(1) items, find all sets having that same core and replace these sets by their union.
5. All other entries are error entries.
Exercise: Construct the SLR, CLR and LALR parsers for the grammar given below:
S → a ⁄ ^ ⁄ (R)
T → S, T ⁄ S
R → T
It is time to compare SLR, LALR and LR parsers on common factors such as size, the class of CFGs handled, efficiency, and cost in terms of time and space.
Sr. No. — SLR parser — LALR parser — Canonical LR parser
5. — It requires less time and space complexity. — The time and space complexity is more in LALR, but efficient methods exist for constructing LALR parsers directly. — The time and space complexity is more for the canonical LR parser.
Fig.: Hierarchy of grammar classes — LR(0) ⊂ SLR ⊂ LALR(1) ⊂ LR(1).
★ It must recover to parse the rest of the input and check for subsequent
errors.
★ For one line input, the routine yyparse () can be made to return 1 on error
and then calls yyparse() again.
★ When the parser finds an error, it may need to reclaim parse tree storage,
delete or alter symbol table entries and set switches to avoid generating
further output.
★ Error handling routines are used to restart the parser to continue its process
even after the occurrence of error.
★ The YACC command uses a special token name error, for error handling.
The token is placed at places where error might occur so that it provides a
recovery subroutine.
★ The above rule tells the parser that when there is an error, it should ignore
the token and all following tokens until it finds the next semicolon.
★ It discards all the tokens after the error and before the next semicolon.
★ Once semicolon is found, the rule is reduced by parser and cleanup action
associated with that rule will be performed.
The statement
yyerrok;
leaves the error state and begins processing normally. A typical use in a rule:
input : error '\n'
        { yyerrok; }
        input
        { $$ = $4; };
★ When an error occurs, the lookahead token becomes the token at which the
error was detected.
★ The lookahead token must be changed if the error recovery action includes
code to find the correct place to start processing again.
★ To clear the lookahead token, the error-recovery action issues the following
statement: yyclearin;
2.12. YACC:
★ YACC translates any grammar that describes a language into a parser for that language.
★ YACC specification files use the .y extension.
S → a ⁄ ^ ⁄ (T)
T → T, S ⁄ S
After eliminating left recursion:
T → S T′
T′ → , S T′ ⁄ ∈
first (S) = { a, ^, ( }
first (T) = { a, ^, ( }
first (T′) = { , , ∈ }
follow (T) = { ) }
follow (T′) = { ) }
Fig. 2.25: Parse tree for an input of the form (a, a) using the transformed grammar — S expands to ( T ), T to S T′, and so on.
Predictive parsing table (input symbols a, ^, (, ), , , $):
Non-terminal — a — ^ — ( — ) — , — $
S — S → a — S → ^ — S → (T) — — —
T — T → ST′ — T → ST′ — T → ST′ — — —
T′ — — — — T′ → ∈ — T′ → , ST′ —
Moves of the predictive parser on the input (a, a):
Stack — Input — Action
$ ) T — a, a) $ —
$ ) T′ S — a, a) $ — T → S T′
$ ) T′ a — a, a) $ — S → a
$ ) T′ — , a) $ —
$ ) T′ S , — , a) $ — T′ → , S T′
$ ) T′ S — a) $ —
$ ) T′ a — a) $ — S → a
$ ) T′ — ) $ —
$ ) — ) $ — T′ → ∈
$ — $ — accept
Augmented grammar: S′ → S
1. S → a
2. S → ^
3. S → (R)
4. T → S, T
5. T → S
6. R → T
Consider the grammar
S → a S b S ⁄ b S a S ⁄ ∈
and the string ababbaab. A leftmost derivation begins:
S ⇒lm a S b S ⇒lm a b S a S b S ⇒lm a b a S b S a S b S ⇒lm …
(i) Manual construction of a parser takes much time; hence, to automate the construction, many built-in tools are available.
(ii) YACC is one such tool, used to generate a parser (syntax analyzer) for a variety of languages; it generates an LALR parsing table.
Fig. 2.26: Creating a parser with YACC —
YACC specification (translate.y) → YACC compiler → y.tab.c
y.tab.c → C compiler → a.out
input string → a.out → syntax tree
A YACC program consists of three parts:
Declarations
%%
Translation rules
%%
Supporting procedures
(v) Declarations:
Translation rules have the form
⟨left side⟩ : ⟨alt 1⟩ | ⟨alt 2⟩ | … | ⟨alt n⟩ ;
1. Terminals: quoted single characters, or token names declared with %token.
2. Non-terminals: unquoted strings of letters and digits that have not been declared to be tokens.
3. If a production has more than one alternative on its right-hand side, the alternatives are separated by a vertical bar.
5. The symbol on the left-hand side of the first translation rule is the start symbol.
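As an illustration of the three-part layout, here is a standard desk-calculator sketch (a minimal, hypothetical calc.y, not tied to any particular grammar in this chapter):

/* declarations */
%{
#include <stdio.h>
int yylex(void);
void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
%}
%token DIGIT
%%
/* translation rules */
line : expr '\n'      { printf("%d\n", $1); }
     ;
expr : expr '+' term  { $$ = $1 + $3; }
     | term
     ;
term : DIGIT
     ;
%%
/* supporting procedures */
int yylex(void) {
    int c = getchar();
    if (c >= '0' && c <= '9') { yylval = c - '0'; return DIGIT; }
    return c;
}
int main(void) { return yyparse(); }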
*********
CHAPTER – III
INTERMEDIATE CODE
GENERATION
★ Annotated parse tree − a parse tree showing the values of the attributes at each node.
In a semantic rule b := f (c1, c2, …, ck), f is a function and b depends on c1, c2, …, ck.
3.2. EITHER:
• b is a synthesized attribute and c1, …, ck are attributes of the children of the node, or
• b is an inherited attribute and c1, …, ck are attributes of the node's parent and siblings.
L → E n    print (E.val)
• Synthesized attributes are evaluated bottom-up.
• This adapts readily to an LR parser generator.
Annotated parse tree for 3 ∗ 5 + 4 n (= 19):
Fig. 3.1: L at the root over E.val = 19 and n; E.val = 19 has children E.val = 15, + and T.val = 4 (→ F.val = 4 → digit.lexval = 4); E.val = 15 → T.val = 15 with children T.val = 3 (→ F.val = 3 → digit.lexval = 3), ∗ and F.val = 5 (→ digit.lexval = 5).
eg: Syntax-directed definition with inherited attribute L.in:
D − Declaration, T − Type, in − inherited attribute
D → T L      L.in := T.type
L → L1 , id  L1.in := L.in
Fig. 3.2: Parse tree for real id1, id2, id3 — the inherited attribute L.in = real is passed down the list of identifiers.
★ Dependency graph:
for each node n in the parse tree do
    for each attribute a of the grammar symbol at n do
        construct a node in the dependency graph for a;
for each node n in the parse tree do
    for each semantic rule b := f (c1, …, ck) associated with the production used at n do
        for i := 1 to k do
            construct an edge from the node for ci to the node for b;
★ e.g. for A → X Y with rule A.a := f (X.x, Y.y), A.a depends on X.x and Y.y (Fig. 3.3); for the rule X.i := g (A.a, Y.y), X.i depends on A.a and Y.y (Fig. 3.4).
Fig. 3.5: Dependency graph for E → E1 + E2 with E.val := E1.val + E2.val.
Fig. 3.6: Dependency graph for the declaration real id1, id2, id3 — attribute nodes numbered 1–10: nodes 1, 2, 3 are the entry attributes of id1, id2, id3; node 4 is T.type; nodes 5, 7, 9 are the in attributes passed down the list of identifiers; nodes 6, 8, 10 are their uses.
A topological sort of the dependency graph gives a valid order in which the semantic rules can be evaluated, e.g.:
a4 := real
a5 := a4
addtype (id3.entry, a5)
a7 := a5
addtype (id2.entry, a7)
a9 := a7
addtype (id1.entry, a9)
★ The order in which the nodes in a parse tree are considered may not match
the order in which information about a construct becomes available.
★ Syntax tree
• Condensed form of parse tree
• Useful for representing language constructs.
• S → if B then S1 else S2.
Fig. 3.7: Syntax-tree node for if B then S1 else S2 — an if-then-else node with children B, S1 and S2.
• Operators and keywords do not appear as leaves.
Fig. 3.8: Parse tree (left) versus syntax tree (right) for 3 ∗ 5 + 4 — the syntax tree is + with children ∗ (over 3 and 5) and 4.
• Constructing a syntax tree for the expression a − 4 + c:
p1 := mkleaf (id, entry for a)
p2 := mkleaf (num, 4)
p3 := mknode ('−', p1, p2)
p4 := mkleaf (id, entry for c)
p5 := mknode ('+', p3, p4)
Fig. 3.9: The resulting syntax tree — + at the root with children − (over the leaf for a and num 4) and the leaf for c.
a − 4 + c
Fig. 3.10: Bottom-up construction of the syntax tree — each E and T node of the parse tree carries a synthesized attribute nptr; the nptr pointers assemble the syntax tree of Fig. 3.9 (− over the leaf for a and num 4, then + over that and the leaf for c).
General representation: a = b op c
where a, b and c represent operands such as names, constants or compiler-generated temporaries, and op represents the operator.
e.g. for a ∗ − (b + c):
t1 = b + c
t2 = uminus t1
t3 = a ∗ t2
For the statement a[i] = x ∗ 5; inside a loop over i:
i = 1
L : t1 = x ∗ 5
t2 = &a
t3 = sizeof (int)
t4 = t3 ∗ i
t5 = t2 + t4
∗t5 = t1
i = i + 1
if i <= 10 goto L
1. Quadruple
2. Triples
3. Indirect Triples
1. Quadruple:
A quadruple is a structure consisting of four fields: op, arg1, arg2 and result. op denotes the operator, arg1 and arg2 denote the two operands, and result stores the result of the operation.
Operator
Source 1
Source 2
Destination
Advantage: since the destination is named explicitly, the statements can easily be rearranged during optimization.
Example: for a = b ∗ − c + b ∗ − c:
t1 = uminus c
t2 = b ∗ t1
t3 = uminus c
t4 = b ∗ t3
t5 = t2 + t4
a = t5
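A quadruple maps naturally onto a record; a minimal C sketch (the struct and field names are illustrative):

#include <stdio.h>

/* One quadruple: op, arg1, arg2, result. */
struct quad {
    const char *op;      /* operator                */
    const char *arg1;    /* source 1                */
    const char *arg2;    /* source 2 (may be empty) */
    const char *result;  /* destination             */
};

int main(void) {
    /* Quadruples for a = b * -c + b * -c, as derived above. */
    struct quad code[] = {
        { "uminus", "c",  "",   "t1" },
        { "*",      "b",  "t1", "t2" },
        { "uminus", "c",  "",   "t3" },
        { "*",      "b",  "t3", "t4" },
        { "+",      "t2", "t4", "t5" },
        { "=",      "t5", "",   "a"  },
    };
    for (unsigned i = 0; i < sizeof code / sizeof code[0]; i++)
        printf("(%u)\t%s\t%s\t%s\t%s\n", i,
               code[i].op, code[i].arg1, code[i].arg2, code[i].result);
    return 0;
}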
2. Triples:
A triple has only three fields:
Operator
Source 1
Source 2
The result of an operation is referred to by its position number.
Disadvantage: because results are referred to by position, moving a statement during optimization requires changing all references to it.
Triples for a = b ∗ − c + b ∗ − c:
# — Op — Arg1 — Arg2
(0) — uminus — c —
(1) — ∗ — b — (0)
(2) — uminus — c —
(3) — ∗ — b — (2)
(4) — + — (1) — (3)
(5) — = — a — (4)
3. Indirect Triples:
Indirect triples use a list of pointers to triples rather than the triples themselves; reordering statements only requires rearranging this list.
Example: Write the quadruples, triples and indirect triples for the expression
(x + y) ∗ (y + z) + (x + y + z)
Three-address code:
t1 = x + y
t2 = y + z
t3 = t1 ∗ t2
t4 = t1 + z
t5 = t3 + t4
Quadruples:
# — Op — Arg1 — Arg2 — Result
(1) — + — x — y — t1
(2) — + — y — z — t2
(3) — ∗ — t1 — t2 — t3
(4) — + — t1 — z — t4
(5) — + — t3 — t4 — t5
Triples:
# — Op — Arg1 — Arg2
(1) — + — x — y
(2) — + — y — z
(3) — ∗ — (1) — (2)
(4) — + — (1) — z
(5) — + — (3) — (4)
1. S → id: = E
2. E → E1 + E2
3. E → E1 * E2
4. E → (E1)
5. E → id
S → id := E   {p = look_up(id.name);
               If p ≠ nil then
                   Emit (p '=' E.place)
               Else
                   Error;
              }
E → E1 + E2   {E.place = newtemp();
               Emit (E.place '=' E1.place '+' E2.place)
              }
E → E1 * E2   {E.place = newtemp();
               Emit (E.place '=' E1.place '*' E2.place)
              }
E → id        {p = look_up(id.name);
               If p ≠ nil then
                   E.place = p
               Else
                   Error;
              }
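The newtemp() and Emit() helpers used by the scheme can be sketched in C roughly as follows; the t1, t2, … naming convention and the function signatures are illustrative assumptions:

#include <stdio.h>

static int tempno = 0;

/* return a fresh temporary name t1, t2, ... */
const char *newtemp(void)
{
    static char buf[8][16];
    static int slot = 0;
    char *p = buf[slot = (slot + 1) % 8];
    sprintf(p, "t%d", ++tempno);
    return p;
}

/* append one three-address instruction to the output stream */
void emit(const char *result, const char *arg1, char op, const char *arg2)
{
    printf("%s = %s %c %s\n", result, arg1, op, arg2);
}

int main(void)
{
    /* action for E -> E1 + E2, for the source expression a + b */
    const char *t = newtemp();
    emit(t, "a", '+', "b");   /* prints: t1 = a + b */
    return 0;
}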
Boolean expressions
Boolean expressions have two primary purposes. They are used for computing
the logical values. They are also used as conditional expression using if-then-else or
while-do.
Consider the grammar
1. E → E OR E
2. E → E AND E
3. E → NOT E
4. E → (E)
5. E → id relop id
6. E → TRUE
7. E → FALSE
AND and OR are left-associative. NOT has the highest precedence, followed by AND,
and lastly OR.
E → E1 OR E2    {E.place = newtemp();
                 Emit (E.place ':=' E1.place 'OR' E2.place)
                }
E → E1 AND E2   {E.place = newtemp();
                 Emit (E.place ':=' E1.place 'AND' E2.place)
                }
The Emit function is used to generate the three-address code and the newtemp()
function is used to generate the temporary variables.
The production E → id1 relop id2 uses nextstat, which gives the index of the next
three-address statement in the output sequence.
Here is the example which generates the three address code using the above
translation scheme:
3. 101: t1:=0
5. 103: t1:=1
7. 105: t2:=0
9. 107: t2:=1
The goto statement alters the flow of control. If we implement goto statements
then we need to define a LABEL for a statement. A production can be added for
this purpose:
1. S → LABEL : S
2. LABEL → id
In this production system, semantic action is attached to record the LABEL and its
value in the symbol table.
1. S → if E then S
2. S → if E then S else S
3. S → while E do S
4. S → begin L end
5. S → A
6. L → L ; S
7. L → S
1. S → if E then M S
2. S → if E then M1 S1 N else M2 S2
3. S → while M1 E do M2 S1
4. S → begin L end
5. S → A
6. L → L ; M S
7. L → S
8. M → ∈
9. N → ∈
S → A      S.NEXT := makelist ()
L → S      L.NEXT := S.NEXT
M → ∈      M.QUAD := NEXTQUAD
Postfix Translation
In a production A → α, the translation rule for A.CODE consists of the concatenation
of the CODE translations of the non-terminals in α, in the same order as the
non-terminals appear in α. Productions can be factored to achieve postfix form.
For example, a while statement can be factored using C → W E do, with W → while.
Similarly,
1. S → for L = E1 step E2 to E3 do S1
can be factored as
1. F → for L
2. T → F = E1 by E2 to E3 do
3. S → T S1
Array references in arithmetic expressions
Elements of arrays can be accessed quickly if the elements are stored in a block
of consecutive location. Array can be one dimensional or two dimensional.
For a one-dimensional array:
1. A : array [low..high] of elements, each width wide; the i-th element is at:
2. base + (i − low) ∗ width = i ∗ width + (base − low ∗ width)
Multi-dimensional arrays:
Row major or column major forms
★ Row major: a[1,1], a[1,2], a[1,3], a[2,1], a[2,2], a[2,3]
★ Column major: a[1,1], a[2,1], a[1,2], a[2,2], a[1,3], a[2,3]
★ In row-major form, the address of a[i1, i2] is
★ base + ((i1 − low1) ∗ (high2 − low2 + 1) + i2 − low2) ∗ width
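As a concrete check of the formula, the following hedged C helper computes the row-major address of a[i1, i2]; the parameter names follow the formula above, and the function itself is only illustrative:

/* row-major address of a[i1, i2] */
int row_major_address(int base, int width,
                      int i1, int low1,
                      int i2, int low2, int high2)
{
    int n2 = high2 - low2 + 1;   /* number of elements per row */
    return base + ((i1 - low1) * n2 + (i2 - low2)) * width;
}

For a 2 × 3 array with base = 100, width = 4 and low1 = low2 = 1, high2 = 3, the element a[2, 1] is at 100 + ((2 − 1) ∗ 3 + 0) ∗ 4 = 112.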
The production:
1. S → L : = E
2. E → E+E
3. E → (E)
4. E → L
5. L → Elist ]
6. L → id
7. Elist → Elist , E
8. Elist → id [
Where:
ndim denotes the number of dimensions, and
limit (array, i) returns the number of elements along the i-th dimension of the array.
Procedure calls
The procedure is an important and frequently used programming construct, so it is
important for a compiler to generate good code for procedure calls and returns.
Calling sequence:
The translation for a call includes a sequence of actions taken on entry and
exit from each procedure. Following actions take place in a calling sequence:
★ When a procedure call occurs then space is allocated for activation record.
★ Evaluate the argument of the called procedure.
★ Establish the environment pointers to enable the called procedure to access
data in enclosing blocks.
★ Save the state of the calling procedure so that it can resume execution after
the call.
★ Also save the return address. It is the address of the location to which the
called routine must transfer after it is finished.
★ Finally generate a jump to the beginning of the code for the called procedure.
Let us consider a grammar for a simple procedure call statement
1. S → call id (Elist)
2. Elist → Elist, E
3. Elist → E
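Under this grammar, a call such as call p(x + y, z) would typically be translated by evaluating each argument, emitting one param statement per argument, and finally transferring control; a representative (assumed) translation is:

t1 = x + y
param t1
param z
call p, 2

where 2 is the number of parameters passed.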
3.4. DECLARATIONS:
When we encounter declarations, we need to lay out storage for the declared
variables. For every local name in a procedure, we create a ST(Symbol Table) entry
containing:
The production:
1. D → integer, id
2. D → real, id
3. D → D1, id
ENTER is used to make the entry into the symbol table, and ATTR is used to track
the data type.
Case Statements:
switch E
begin
    case V1 : S1
    case V2 : S2
    ...
    default : Sn
end
A common translation evaluates E into a temporary T, generates the code for each
Si followed by goto NEXT, and places the tests at the end:
       code to evaluate E into T
       goto TEST
L1 :   code for S1
       goto NEXT
L2 :   code for S2
       goto NEXT
  ...
Ln :   code for Sn
       goto NEXT
TEST : if T = V1 goto L1
       if T = V2 goto L2
        ...
       goto Ln
NEXT :
★ When the switch keyword is seen, a new temporary T and two new labels TEST
and NEXT are generated.
★ When a case keyword occurs, for each case keyword a new label Li is created
and entered into the symbol table. The value Vi of each case constant and a
pointer to this symbol-table entry are placed on a stack.
The rest of this chapter explores issues that arise during the translation of
expressions and statements. We begin in this section with the translation of
expressions into three-address code. An expression with more than one operator, like
a + b ∗ c, will translate into instructions with at most one operator per instruction.
An array reference A[i][j] will expand into a sequence of three-address instructions
that calculate an address for the reference. We shall consider type checking of
expressions and the use of boolean expressions to direct the flow of control through
a program.
S → id = E   { S.code = E.code || gen (top.get (id.lexeme) '=' E.addr) }
E → ( E1 )   { E.addr = E1.addr; E.code = E1.code }
E → id       { E.addr = top.get (id.lexeme); E.code = '' }
For convenience, we use the notation gen (x '=' y '+' z) to represent the
three-address instruction x = y + z. Expressions appearing in place of variables like x,
y and z are evaluated when passed to gen, and quoted strings like '=' are taken
literally. Other three-address instructions will be built up similarly. In syntax-directed
definitions, gen builds an instruction and returns it. In translation schemes, gen
builds an instruction and incrementally emits it by putting it into the stream of
generated instructions, by applying gen to a combination of expressions and strings.
When we translate the production E → E1 + E2, the semantic rules build up E.code
by concatenating E1.code, E2.code, and an instruction that adds the values of E1 and
E2. The instruction puts the result of the addition into a new temporary name for
E, denoted by E.addr. For example, a = b + (− c) translates to:
t1 = uminus c
t2 = b + t1
a = t2
2. Incremental Translation
Code attributes can be long strings, so they are usually generated incrementally.
Instead of building up E.code, we can arrange to generate only the new three-address
instructions, as in the translation scheme. In the incremental approach, gen not only
constructs a three-address instruction, it appends the instruction to the sequence of
instructions generated so far. The sequence may either be retained in memory for
further processing, or it may be output incrementally.
The approach can also be used to build a syntax tree. The new semantic action
for E → E1 + E2 creates a node by using a constructor, instead of appending an
instruction to the generated list:
E → E1 + E2    { E.addr = new Node ('+', E1.addr, E2.addr); }
Here, attribute addr represents the address of a node rather than a variable or
constant.
( E1 ) { E.addr = E1.addr; }
id { E.addr = top.get(id.lexeme); }
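In the incremental approach, gen can simply append each instruction to a growing list. A minimal C sketch (the array bound and instruction length are illustrative assumptions):

#include <stdio.h>
#include <string.h>

static char instr[100][32];   /* instructions generated so far */
static int ninstr = 0;

/* build an instruction and append it to the stream */
void gen(const char *text)
{
    strncpy(instr[ninstr], text, sizeof instr[0] - 1);
    instr[ninstr][sizeof instr[0] - 1] = '\0';
    ninstr++;
}

int main(void)
{
    int i;
    gen("t1 = b + c");   /* from E -> E1 + E2 */
    gen("a = t1");       /* from S -> id = E  */
    for (i = 0; i < ninstr; i++)
        printf("%s\n", instr[i]);
    return 0;
}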
where base is the relative address of the storage allocated for the array. That is,
base is the relative address of A [0].
The expressions (1) and (6) can both be rewritten as i × w + c, where the
subexpression c = base − low × w can be precalculated at compile time.
Note that c = base when low is 0. We assume that c is saved in the symbol
table entry for A, so the relative address of A [i] is obtained by simply adding
i × w to c.
Compile-time precalculation can also be applied to address calculations for
elements of multidimensional arrays. However, there is one situation where we cannot
use compile-time precalculation: when the array’s size is dynamic. If we do not know
the values of low and high (or their generalizations in many dimensions) at compile
time, then we cannot compute constants such as c. Then formulas like (6) must be
evaluated as they are written, when the program executes.
The above address calculations are based on row-major layout for arrays, which
is used in C and Java. A two-dimensional array is normally stored in one of two
forms, either row-major (row-by-row) or column-major (column-by-column). The layout
of a 2 × 3 array A in (a) row-major form and (b) column-major form is shown below.
Column-major form is used in the Fortran family of languages.
(a) Row major :    A[1, 1], A[1, 2], A[1, 3] (first row); A[2, 1], A[2, 2], A[2, 3] (second row)
(b) Column major : A[1, 1], A[2, 1] (first column); A[1, 2], A[2, 2] (second column); A[1, 3], A[2, 3] (third column)
The chief problem in generating code for array references is to relate the address
calculation formulas in Section to a grammar for array references. Let nonterminal
L generate an array name followed by a sequence of index expressions.
L → L [ E ] | id [ E ]
Let us calculate addresses based on widths, using the formula (3), rather than
on numbers of elements, as in (5). The translation scheme generates three-address
code for expressions with array references. It consists of the productions and semantic
actions, together with productions involving nonterminal L.
E → id         { E.addr = top.get (id.lexeme); }
E → L          { E.addr = new Temp ();
                 gen (E.addr '=' L.array.base '[' L.addr ']'); }
L → id [ E ]   { L.array = top.get (id.lexeme);
                 L.type = L.array.type.elem;
                 L.addr = new Temp ();
                 gen (L.addr '=' E.addr '*' L.type.width); }
L → L1 [ E ]   { L.array = L1.array;
                 L.type = L1.type.elem;
                 t = new Temp ();
                 L.addr = new Temp ();
                 gen (t '=' E.addr '*' L.type.width);
                 gen (L.addr '=' L1.addr '+' t); }
1. L.addr denotes a temporary that is used while computing the offset for the array
reference, by summing the terms ij × wj of formula (3).
2. L.array is a pointer to the symbol-table entry for the array name. The base
address of the array, L.array.base, is used to determine the actual l-value of an
array reference after all the index expressions are analyzed.
3. L.type is the type of the subarray generated by L. For any type t, we assume
that its width is given by t.width. We use types as attributes, rather than widths,
since types are needed anyway for type checking. For any array type t, suppose that
t.elem gives the element type.
A type checker verifies that the type of a construct matches that expected by
its context. For example: arithmetic operator mod in Pascal requires integer operands,
so a type checker verifies that the operands of mod have type integer. Type
information gathered by a type checker may be needed when code is generated.
Type Systems
The design of a type checker for a language is based on information about the
syntactic constructs in the language, the notion of types, and the rules for assigning
types to language constructs.
For example: “If both operands of the arithmetic operators of +, − and * are of
type integer, then the result is of type integer”.
1. Basic types such as boolean, char, integer, real are type expressions.
A special basic type, type_error, will signal an error during type checking;
void denoting “the absence of a value” allows statements to be checked.
Constructors include:
Arrays: If T is a type expression then array (I,T) is a type expression denoting the
type of an array with elements of type T and index set I.
Records: The difference between a record and a product is that the fields of a
record have names. The record type constructor is applied to a tuple formed from
field names and field types.
For example, the declarations
type row = record
        address : integer;
        lexeme : array [1..15] of char
     end;
var table : array [1..101] of row;
declare the type name row, representing the type expression record ((address ×
integer) × (lexeme × array (1..15, char))), and the variable table to be an array of
records of this type.
For example, var p: ↑ row declares variable p to have type pointer (row).
4. Type expressions may contain variables whose values are type expressions.
Fig. 3.22: Tree representation for char × char → pointer (integer)
Static and Dynamic Checking
Checking done by a compiler is said to be static, while checking done when the
target program runs is termed dynamic. Any check can be done dynamically, if the
target code carries the type of an element along with the value of that element.
A sound type system eliminates the need for dynamic checking of type errors,
because it allows us to determine statically that these errors cannot occur when the
target program runs. That is, if a sound type system assigns a type other than
type_error to a program part, then type errors cannot occur when the target code
for the program part is run.
A language is strongly typed if its compiler can guarantee that the programs
it accepts will execute without type errors.
Error Recovery
Since type checking has the potential for catching errors in program, it is
desirable for type checker to recover from errors, so it can check the rest of the
input. Error handling has to be designed into the type system right from the start;
the type checking rules must be prepared to cope with errors.
Coercions:
Conversion from one type to another is said to be implicit if it is done automatically
by the compiler; implicit conversions are also called coercions. Conversion is explicit
if the programmer must write something to cause it. For example:
int xyz,p;
p=(float)xyz;
The identifier xyz is type-casted and this is how explicit conversion from int to
float takes place.
All conversions in Ada are explicit, while C supports implicit conversions. (For
example, C converts ASCII characters to integers in arithmetic expressions.)
Type checking rules for coercion from integer to float are as given below:
E → num          E.type := integer
E → num . num    E.type := float
E → id           E.type := look_up (id.entry)
E → E1 op E2     E.type := if E1.type = int and E2.type = int
                               then int
                           else if E1.type = int and E2.type = float
                               then float
                           else if E1.type = float and E2.type = int
                               then float
                           else if E1.type = float and E2.type = float
                               then float
                           else type_error
The necessary conversion from int to float takes place implicitly. The function
look_up returns the type saved in the symbol table for the corresponding id entry.
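The rule for E → E1 op E2 is essentially a two-argument function on types. A hedged C sketch (the INT/FLOAT/TYPE_ERROR codes are illustrative assumptions, not part of the scheme above):

enum { INT, FLOAT, TYPE_ERROR };

/* type of E1 op E2 under the int/float coercion rules above */
int result_type(int t1, int t2)
{
    if (t1 == INT && t2 == INT)
        return INT;
    if ((t1 == INT || t1 == FLOAT) && (t2 == INT || t2 == FLOAT))
        return FLOAT;
    return TYPE_ERROR;
}

For example, result_type(INT, FLOAT) yields FLOAT, so the compiler inserts an int-to-float conversion for the integer operand.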
Consider an array A of floats initialized with the assignment A[i] = 1; which the
compiler implicitly coerces to A[i] = 1.0;. Since the conversion of the constant is done
once at compile time, rather than on every execution of the assignment, there is an
improvement in the run time of the object program.
Thus semantic analysis is a phase in which analysis of declarative statements
is done. The analysis of declarative statement involves two activities (i) Type analysis,
name and scope analysis (ii) Entry of type, length, access control information in
symbol table. We have seen how a type for corresponding statement is decided.
*********
CHAPTER – IV
RUN-TIME ENVIRONMENT
AND CODE GENERATION
4.1. INTRODUCTION:
Run-time environments (or systems) manage the activation of procedures during
execution.
★ Primary issues
• Is recursion possible?
• What parameter-passing mechanisms are used? (call by reference, call by value)
Activation tree:
★ Each node represents an activation of a procedure.
★ root − activation of main program.
★ The node for ‘a’ is a parent of the node for ‘b’ iff control flows from activation
‘a’ to activation ‘b’.
★ The node for ‘a’ is to the left of the node for ‘b’ iff the lifetime of ‘a’ occurs
before the lifetime of ‘b’.
Control stack: Keep track of currently active activations
Even if a name is declared once, the same name can denote different data
objects at run time.
Environment: A function mapping names (x) to storage locations (s). The
association is a binding (i.e. x is bound to s).
State: Function mapping storage locations to the values held in those locations.
eg: pi = 3.14
Procedures:
begin
for i : = 1 to 9 do read(a[i])
end;
Activation trees:
An activation tree is used to depict the way control enters and leaves activations.
In an activation tree,
1. Each node represents an activation of a procedure.
2. The root represents the activation of the main program.
3. The node for a is the parent of the node for b if and only if control flows
from activation a to b.
4. The node for a is to the left of the node for b if and only if the lifetime of
a occurs before the lifetime of b.
Control stack:
A control stack is used to keep track of live procedure activations. The idea is
to push the node for an activation onto the control stack as the activation begins and
to pop the node when the activation ends. The contents of the control stack are
related to paths to the root of the activation tree. When node n is at the top of
control stack, the stack contains the nodes along the path from n to the root.
1. Static allocation − lays out storage for all data objects at compile time.
2. Stack allocation − manages the run-time storage as a stack of activation records.
3. Heap allocation − allocates and deallocates storage as needed at run time
from a data area known as the heap.
That is, when control returns to a procedure the values of the locals are the
same as they were when control left the last time. From the type of a name, the
compiler decides the amount of storage for the name and decides where the activation
records go. At compile time, we can fill in the addresses at which the target code
can find the data it operates on.
All compilers for languages that use procedures, functions or methods as units
of user-defined actions manage at least part of their run-time memory as a stack.
Each time a procedure is called, space for its local variables is pushed onto a stack,
and when the procedure terminates, that space is popped off the stack.
Calling sequences:
When designing calling sequences and the layout of activation records, the
following principles are helpful:
★ Values communicated between caller and callee are generally placed at the
beginning of the callee’s activation record, so they are as close as possible
to the caller’s activation record.
Fixed-length items are generally placed in the middle of the activation record.
Such items include the control link, the access link and the machine-status fields.
Items whose size may not be known early enough are placed at the end of
the activation record. The most common example is a dynamically sized array, where
the value of one of the callee’s parameters determines the length of the array.
The calling sequence and its division between caller and callee are as follows:
★ The caller stores a return address and the old value of top_sp into the
callee’s activation record. The caller then increments the top_sp to the
respective positions.
★ The callee saves the register values and other status information.
★ Using the information in the machine-status field, the callee restores top_sp
and other registers, and then branches to the return address that the caller
placed in the status field.
★ Although top_sp has been decremented, the caller knows where the return
value is, relative to the current value of top_sp; the caller therefore may
use that value.
The run-time memory management system must deal frequently with the
allocation of space for objects, the sizes of which are not known at the compile time,
but which are local to a procedure and thus may be allocated on the stack. The
reason to prefer placing objects on the stack is that we avoid the expense of garbage
collecting their space. The same scheme works for objects of any type if they are
local to the procedure called and have a size that depends on the parameters of the
call.
4.4.3. Heap Allocation:
Stack allocation cannot be used if either of the following is possible:
1. The values of local names must be retained when an activation ends.
2. A called activation outlives the caller.
Heap allocation parcels out pieces of contiguous storage, as needed for activation
records or other objects. Pieces may be deallocated in any order, so over time the
heap will consist of alternating areas that are free and in use.
Fig. 4.3: Records for live activations need not be adjacent in the heap. After control
returns from r to s, the activation record for r is retained in the heap (with its
control link), alongside the records for s and the new activation q(1, 9).
★ Therefore, the record for the new activation q (1, 9) cannot follow that for s
physically.
★ If the retained activation record for r is deallocated, there will be free space
in the heap between the activation records for s and q.
★ Block
★ Lexical scope
• Without nested procedures
• With nested procedures
★ Dynamic Scope
Block
★ A block is a statement containing its own local data declarations.
★ In C, a block has the syntax
{Declarations statements}
★ A characteristic of blocks is their nesting structure.
★ Delimiters mark the beginning and end of a block.
★ Delimiters ensure that one block is either independent of another, or is
nested inside the other.
Blocks in C Program:
main()                          /* B0 */
{
    int a = 0;
    int b = 0;
    {                           /* B1 */
        int b = 1;
        {                       /* B2 */
            int a = 2;
            printf("%d %d\n", a, b);
        }
        {                       /* B3 */
            int b = 3;
            printf("%d %d\n", a, b);
        }
        printf("%d %d\n", a, b);
    }
    printf("%d %d\n", a, b);
}
Declaration      Scope
int a = 0        B0 − B2
int b = 0        B0 − B1
int b = 1        B1 − B3
int a = 2        B2
int b = 3        B3
Nesting depth:
★ The nesting depth of a procedure is used to implement lexical scope.
★ Let the name of the main program be at nesting depth 1, and add 1 to the
nesting depth as we go from an enclosing to an enclosed procedure.
Access links:
Fig. 4.4: Access links in the stack of activation records, in four snapshots: (a) s
(holding a, x) with q(1, 9) (holding k, v) on top; (b) q(1, 3) pushed with its access
link; (c) p(1, 3) (holding i, j) pushed; (d) e(1, 3) pushed. Each access link points to
the record of the closest enclosing lexical scope.
Dynamic Scope:
★ Deep access − dispense with access links; use the control link to search down
into the stack, looking for the first activation record containing storage for
the nonlocal name.
★ The term deep access comes from the fact that the search may go deep into
the stack.
★ The depth to which the search may go depends on the input to the program
and cannot be determined at compile time.
Shallow access:
★ The idea is to hold the current value of each name in statically allocated
storage.
★ The previous value of n can be saved in the activation record for p and
must be restored when the activation of p ends.
An activation record typically holds:
★ Local variables
★ Parameters / temporaries
★ Return address
★ Returned value
eg:
int g_var;
main ( )
{
    int a [100];
    int sum = 0;
    ...
    sum += a [i];
    ...
    return avg;
}
...
int sum = 0;
...
return st;
}
→ Pass by value
→ Pass by reference
→ Pass by name
Pass by value:
★ A copy of the value of the actual parameter is passed to the formal parameter.
★ The caller copies the r-value of the actual into the called method’s activation
record.
void f (int a)
{
    a = 10;       /* changes only the local copy */
}
void g ( )
{
    int b;
    b = 5;
    f (b);        /* b is still 5 after the call */
}
Pass by reference:
★ Also known as call-by-address (or) call-by-location.
void g ( )
{
    :
    f (& b);
    :
}
void f (int * a)
{
    :
    * a = 10;     /* changes the caller’s b */
    :
}
★ Side effects on actuals that do not have l-values can be avoided in this method.
Pass by name:
★ Every call statement is replaced by the body of the called method.
★ Each occurrence of a formal parameter in the called method is replaced with
the corresponding argument. It is replaced by the actual text of the
argument, not its value.
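C has no pass-by-name, but preprocessor macros give a rough feel for the textual substitution involved. The example below is only an analogy (macros substitute text before compilation, whereas pass-by-name re-evaluates the argument at each use inside the callee):

#include <stdio.h>

#define SQUARE(x) ((x) * (x))   /* the argument text is substituted */

int main(void)
{
    int i = 3;
    /* SQUARE(i + 1) expands textually to ((i + 1) * (i + 1)) */
    printf("%d\n", SQUARE(i + 1));   /* prints 16 */
    return 0;
}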
Code Generation:
The final phase in compiler model is the code generator. It takes as input an
intermediate representation of the source program and produces as output an
equivalent target program. The code generation techniques presented below can be
used whether or not an optimizing phase occurs before code generation.
The main issues in the design of a code generator are:
1. Input to the code generator (the intermediate representation, together with
the symbol table)
2. Target program
3. Memory management
4. Instruction selection
5. Register allocation
6. Evaluation order
(e) Prior to code generation, the source program must have been scanned, parsed
and translated into an intermediate representation by the front end, along with
the necessary type checking. Therefore, the input to code generation is assumed
to be error-free.
2. Target program:
The output of the code generator is the target program. The output may be:
(a) Absolute machine language, (b) Relocatable machine language, or (c) Assembly
language.
★ If quadruple j jumps to quadruple i with i > j, the jump is forward. We must
store on a list, for quadruple i, the location of the first machine instruction
generated for quadruple j. When i is processed, the machine locations for all
instructions that jump forward to i are filled in.
4. Instruction selection:
★ Instruction speeds and machine idioms are important factors when efficiency
of target program is considered.
★ The quality of the generated code is determined by its speed and size.
★ For example, the three-address statements (a) can be translated into the
target code (b) as shown below:
(a)  a := b + c
     d := a + e
(b)  MOV b, R0
     ADD c, R0
     MOV R0, a
     MOV a, R0
     ADD e, R0
     MOV R0, d
Here the fourth instruction, MOV a, R0, is redundant, since a has just been stored
from R0, and can be eliminated.
5. Register allocation:
★ Instructions involving register operands are shorter and faster than those
involving operands in memory. The use of registers is subdivided into two
subproblems:
1. Register allocation − select the set of variables that will reside in registers
at each point in the program.
2. Register assignment − pick the specific register in which each variable will
reside.
★ Certain machines require even-odd register pairs for some operands and
results. For example, consider a division instruction of the form: D x, y
6. Evaluation order
The order in which the computations are performed can affect the efficiency of
the target code. Some computation orders require fewer registers to hold intermediate
results than others.
op source, destination
★ The source and destination fields are not long enough to hold memory
addresses, so certain bit patterns in these fields specify that words following
an instruction contain operands and/or addresses.
★ The contents (a) denotes the contents of the register or memory address
represented by a.
Mode        Form    Address    Added cost
absolute    M       M          1
register    R       R          0
For example, MOV R0, M stores the contents of register R0 into memory location M.
★ Indirect versions of the last two modes are indicated by the prefix *. Thus,
MOV * 4(R0), M
stores the value contents (contents (4 + contents (R0))) into memory location M.
★ A literal constant c is written #c; its address is c and its added cost is 1.
★ The cost of an instruction is one plus the costs associated with the source
and destination address modes.
★ For most instructions, the time taken to fetch an instruction from memory
exceeds the time spent executing the instruction.
★ Some examples:
MOV R0, R1 has cost 1 (register-to-register);
MOV R0, M has cost 2 (one memory address);
ADD #1, R3 has cost 2 (one literal).
If the name in a register is no longer needed, then we remove the name from
the register and the register can be used to store some other names.
Next-use Information:
For a three-address statement i : x := y op z, the symbol table is updated as follows:
1. Attach to statement i the information currently found in the symbol table
regarding the next use and liveness of x, y and z.
2. In the symbol table, set x to “not live” and “no next use”.
3. In the symbol table, set y and z to “live”, and set the next uses of y and z to i.
Symbol Table
Name    Liveness    Next use
y       Live        i
z       Live        i
z Live i
If b is already in register Ri, the statement a := b + c can be computed either as
ADD c, Ri          (cost = 2)
or as
MOV c, Rj
ADD Rj, Ri         (cost = 3)
★ An address descriptor stores the location where the current value of a name
can be found at run time.
★ A register descriptor keeps track of what is currently held in each register.
A code-generation algorithm:
1. Invoke a function getreg to determine the location L where the result of the
computation y op z should be stored.
2. Consult the address descriptor for y to determine y’, the current location of
y. Prefer the register for y’ if the value of y is currently both in memory and
a register. If the value of y is not already in L, generate the instruction MOV
y’, L to place a copy of y in L.
3. Generate the instruction OP z’, L where z’ is a current location of z. Prefer
a register to a memory location if z is in both. Update the address descriptor
of x to indicate that x is in location L. If L is a register, update its descriptor
to indicate that it contains the value of x, and remove x from all other
register descriptors.
4. If the current values of y or z have no next uses, are not live on exit from
the block, and are in registers, alter the register descriptor to indicate that,
after execution of x : = y op z, those registers will no longer contain y or z.
Statement     Code            Cost
a := *p       MOV *Rp, a      2
*p := a       MOV a, *Rp      2
x := y + z    MOV y, R0
              ADD z, R0
              MOV R0, x
*********
CHAPTER – V
CODE OPTIMIZATION
★ The optimization must be correct; it must not, in any way, change the
meaning of the program.
★ The optimization process should not delay the overall compiling process.
When to Optimize?
Optimization of the code is often performed at the end of the development stage
since it reduces readability and adds code that is used to increase the performance.
(ii) x = 12.4
y = x ⁄ 2.3 becomes y = 12.4 ⁄ 2.3 after the constant value of x is propagated.
2. Variable Propagation:
//Before Optimization
c = a ∗ b
x = a
…
d = x ∗ b + 4
//After Optimization
c = a ∗ b
x = a
…
d = a ∗ b + 4
Hence, after variable propagation, a ∗ b and x ∗ b will be identified as common
sub-expressions.
3. Dead Code Elimination:
//Before elimination
c = a ∗ b
x = a
…
d = a ∗ b + 4
//After elimination
c = a ∗ b
…
d = a ∗ b + 4
The copy statement x = a has become dead code (x is no longer used) and is removed.
4. Code Motion:
//Before optimization
a = 200;
while (a > 0)
{
    b = x + y;
    if (a % b == 0)
        printf("%d", a);
}
//After optimization
a = 200;
b = x + y;
while (a > 0)
{
    if (a % b == 0)
        printf("%d", a);
}
The loop-invariant assignment b = x + y is moved out of the loop.
★ Strength reduction means replacing a high-strength (costly) operator by a
low-strength (cheaper) one.
//Before Reduction
i = 1;
while (i < 10)
{
    y = i ∗ 4;
    i = i + 1;
}
//After Reduction
i = 1;
t = 4;
while (t < 40)
{
    y = t;
    t = t + 4;
}
Function-Preserving Transformations:
There are a number of ways in which a compiler can improve a program without
changing the function it computes.
★ For example
t1: = 4*i
t2: = a [t1]
t3: = 4*j
t4: = 4*i
t5: = n
t6: = b [t4] + t5
The above code can be optimized using the common sub-expression elimination
as
t1: = 4*i
t2: = a [t1]
t3: = 4*j
t5: = n
t6: = b [t1] + t5
The common sub-expression t4 := 4*i is eliminated, as its computation is already
available in t1 and the value of i has not been changed between the definition and
the use.
x = Pi;
A = x ∗ r ∗ r;
The optimization using copy propagation can be done as follows: A = Pi ∗ r ∗ r;
Here the variable x is eliminated.
5.2.4. Strength Reduction:
Strength reduction replaces expensive operations by equivalent cheaper ones
on the target machine. Certain machine instructions are considerably cheaper than
others and can often be used as special cases of more expensive operators. For
example, x² is invariably cheaper to implement as x ∗ x than as a call to an
exponentiation routine. Fixed-point multiplication or division by a power of two is
cheaper to implement as a shift. Floating-point division by a constant can be
implemented as multiplication by constant, which may be cheaper.
B1 :  i := m − 1
      j := n
      t1 := 4 ∗ n
      v := a[t1]
B2 :  i := i + 1
      t2 := 4 ∗ i
      t3 := a[t2]
      if t3 < v goto B2
B3 :  j := j − 1
      t4 := 4 ∗ j
      t5 := a[t4]
      if t5 > v goto B3
B4 :  if i >= j goto B6
B5 :  x := t3              B6 :  x := t3
      a[t2] := t5                t14 := a[t1]
      a[t4] := x                 a[t2] := t14
      goto B2                    a[t1] := x
Example:
i = 0;
if (i == 1)
    a = b + 5;
Here, the ‘if’ statement is dead code because the condition will never be satisfied.
Constant folding:
Deducing at compile time that the value of an expression is a constant, and
using the constant instead, is known as constant folding. One advantage of copy
propagation is that it often turns the copy statement into dead code.
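For instance, a folded expression is evaluated once by the compiler rather than at every execution of the statement:

//Before folding
x = 2 * 3.14;
//After folding
x = 6.28;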
For example,
B1 :  i := m − 1
      j := n
      t1 := 4 ∗ n
      v := a[t1]
B2 :  i := i + 1
      t2 := 4 ∗ i
      t3 := a[t2]
      if t3 < v goto B2
B3 :  j := j − 1
      t4 := 4 ∗ j
      t5 := a[t4]
      if t5 > v goto B3
B4 :  if i >= j goto B6
B5 :  t6 := 4 ∗ i          B6 :  t11 := 4 ∗ i
      x := a[t6]                 x := a[t11]
      t7 := 4 ∗ i                t12 := 4 ∗ i
      t8 := 4 ∗ j                t13 := 4 ∗ n
      t9 := a[t8]                t14 := a[t13]
      a[t7] := t9                a[t12] := t14
      t10 := 4 ∗ j               t15 := 4 ∗ n
      a[t10] := x                a[t15] := x
      goto B2
Code Motion:
A loop-invariant computation − one that yields the same result no matter how many
times the loop is executed − can be moved outside the loop. For example, the test
while (i <= limit − 2) can be rewritten as
t = limit − 2;
while (i <= t) …
Induction Variables:
Loops are usually processed inside out. For example consider the loop around
B3. Note that the values of j and t4 remain in lock-step; every time the value of j
decreases by 1, that of t4 decreases by 4 because 4*j is assigned to t4. Such identifiers
are called induction variables.
When there are two or more induction variables in a loop, it may be possible
to get rid of all but one, by the process of induction-variable elimination. For the
inner loop around B3 in Fig. 5.3 we cannot get rid of either j or t4 completely; t4
is used in B3 and j in B4.
Example:
As the relationship t4 := 4∗j surely holds after such an assignment to t4, and t4 is
not changed elsewhere in the inner loop around B3, it follows that just after the
statement j := j − 1 the relationship t4 = 4∗j + 4 must hold. We may therefore
replace the assignment t4 := 4∗j by t4 := t4 − 4. The only problem is that t4 does not
have a value when we enter block B3 for the first time. Since we must maintain the
relationship t4 = 4∗j on entry to block B3, we place an initialization of t4 at the
end of the block where j itself is initialized, shown by the dashed addition to block B1.
A simple but effective technique for improving the target code is peephole
optimization, a method for trying to improve the performance of the target program
by examining a short sequence of target instructions (called the peephole) and
replacing these instructions by a shorter or faster sequence, whenever possible.
The peephole is a small, moving window on the target program. The code in
the peephole need not be contiguous, although some implementations do require this.
It is characteristic of peephole optimization that each improvement may spawn
opportunities for additional improvements.
★ Redundant-instruction elimination
★ Flow-of-control optimizations
★ Algebraic simplifications
★ Unreachable code elimination
1. MOV R0, a
2. MOV a, R0
We can delete instruction (2), because whenever (2) is executed, (1) will ensure
that the value of a is already in register R0. If (2) had a label, we could not be sure
that (1) was always executed immediately before (2), and so we could not remove (2).
Unreachable Code:
#define debug 0
....
if ( debug ) {
    /* print debugging information */
}
In the intermediate representation this becomes:
    if debug = 1 goto L1
    goto L2
L1: print debugging information
L2:
One obvious peephole optimization is to eliminate jumps over jumps. Thus, no matter
what the value of debug, the code can be replaced by:
    if debug ≠ 1 goto L2
    print debugging information
L2:
Since debug is the constant 0, the test 0 ≠ 1 always succeeds, so the conditional
jump can be replaced by goto L2. Then all the statements that print debugging aids
are manifestly unreachable and can be eliminated one at a time.
Flows-of-Control Optimizations:
The unnecessary jumps can be eliminated in either the intermediate code or the
target code by the following types of peep-hole optimizations. We can replace the
jump sequence.
goto L1
....
L1 : goto L2
by the sequence
goto L2
....
L1 : goto L2
If there are now no jumps to L1, then it may be possible to eliminate the statement
L1: goto L2, provided it is preceded by an unconditional jump. Similarly, the sequence
if a < b goto L1
....
L1 : goto L2
can be replaced by
if a < b goto L2
....
L1 : goto L2
Finally, suppose there is only one jump to L1 and L1 is preceded by an unconditional
goto. Then the sequence
(e)  goto L1
     ......
L1 : if a < b goto L2
L3 :
may be replaced by
(f)  if a < b goto L2
     goto L3
     ......
L3 :
While the number of instructions in (e) and (f) is the same, we sometimes skip
the unconditional jump in (f), but never in (e). Thus (f) is superior to (e) in execution
time.
Algebraic Simplification:
Statements such as
x := x + 0 or
x := x ∗ 1
can be eliminated, since they do not change the value of x.
Reduction in Strength:
x² → x ∗ x
i := i + 1 → i++ (auto-increment idiom)
i := i − 1 → i− − (auto-decrement idiom)
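On most target machines a shift is cheaper than a multiplication, so a peephole pass may also rewrite multiplications by powers of two; a representative (machine-dependent, hence hedged) example:

//Before
x = y * 8;
//After reduction in strength
x = y << 3;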
★ Directed acyclic graph (DAGs) are useful data structures for implementing
transformations on basic blocks.
• determining which names are used inside the block but evaluated outside the
block, and
★ A DAG for a basic block is a directed acyclic graph with the following labels
on nodes:
1. Leaves are labeled by unique identifiers, either variable names or
constants.
2. Interior nodes are labeled by an operator symbol.
3. Nodes are also optionally given a sequence of identifiers for labels.
★ Each node of a flow graph can be represented by a dag, since each node of
the flow graph stands for a basic block.
Example:
(1) t1 := 4 ∗ i
(2) t2 := a [ t1 ]
(3) t3 := 4 ∗ i
(4) t4 := b [ t3 ]
(5) t5 := t2 ∗ t4
(6) t6 := prod + t5
(7) prod := t6
(8) t7 := i + 1
(9) i := t7
(10) if i <= 20 goto (1)
Fig.: DAG for the basic block. The leaves are a, b, 4, i0, 1 and 20; t1 and t3 share
a single ∗ node over 4 and i0; t2 and t4 are [ ] nodes over a and b; t5 is a ∗ node;
t6, prod is a + node with children prod0 and t5; t7, i is a + node with children i0
and 1; and a <= node compares t7, i with 20 for the jump to (1).
Output: A DAG for the basic block containing the following information:
1. A label for each node. For leaves the label is an identifier, and for interior
nodes, an operator symbol.
METHOD:
★ Assume function node (identifier), returns the most recently created node
associated with identifier.
★ The DAG construction process is to do the following steps (1) through (3)
for each statement of the block.
★ Initially, assume there are no nodes and node is undefined for all arguments.
BACKPATCHING:
★ The main problem with generating code in a single pass is that the target
labels of jumps are not known at the time the jump statements are generated.
★ Each such statement is put on a list of goto statements whose labels will be
filled in when the proper label can be determined. This subsequent filling in
of labels is called backpatching.
★ In this section we show how backpatching can be used to generate code for
boolean expressions and flow-of-control statements in one pass.
1. makelist (i) creates a new list containing only i, an index into the array
of quadruples; makelist returns a pointer to the list it has made.
2. merge (p1, p2) concatenates the lists pointed to by p1 and p2, and returns
a pointer to the concatenated list.
3. backpatch (p, i) inserts i as the target label for each of the statements on
the list pointed to by p.
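These list operations can be sketched in C with a linked list of quadruple indices. The representation below (including the target array holding each quadruple's label field) is an assumption for illustration:

#include <stdio.h>
#include <stdlib.h>

struct cell { int quad; struct cell *next; };

static char target[100][8];   /* label field of each quadruple */

struct cell *makelist(int i)
{
    struct cell *p = malloc(sizeof *p);
    p->quad = i; p->next = NULL;
    return p;
}

struct cell *merge(struct cell *p1, struct cell *p2)
{
    struct cell *p = p1;
    if (!p1) return p2;
    while (p->next) p = p->next;
    p->next = p2;
    return p1;
}

void backpatch(struct cell *p, int label)
{
    for (; p; p = p->next)
        sprintf(target[p->quad], "%d", label);  /* fill in the label */
}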
Boolean Expressions:
★ The grammar:
(1) E → E1 or M E2
(2) E → E1 and M E2
(3) E → not E1
(4) E → ( E1 )
(5) E → id1 relop id2
(6) E → true
(7) E → false
(8) M → ε
★ As code is generated for E, jumps to the true and false exits are left
incomplete, with the label field unfilled.
E → E1 and M E2
★ If E1 is true, the target for the statements on E1.truelist must be the beginning
of the code generated for E2. This is achieved by the semantic action
{ M.quad := nextquad }
where the variable nextquad holds the index of the next quadruple to follow.
★ This value will be backpatched onto E1.truelist when we reach the remainder
of the production E → E1 and M E2.
★ The semantic action (5) generates two statements, a conditional goto and an
unconditional one.
★ The index of the first generated statement is made into a list, and E.truelist
is given a pointer to that list.
★ The second generated statement, goto ___ (with its target unfilled), is also
made into a list and given to E.falselist.
★ Structure-Preserving Transformations.
★ Algebraic Transformations.
Structure-Preserving Transformations:
Common sub expressions need not be computed over and over again. Instead
they can be computed once and kept in store from where it’s referenced.
Example:
(1) a: = b + c
(2) b: = a − d
(3) c: = b + c
(4) d: = a − d
The 2nd and 4th statements compute the same expression a − d, so the 4th can
be replaced by d := b. (The 1st and 3rd statements both contain b + c, but b is
redefined in between, so b + c is not a common sub-expression.) The transformed
block is:
a := b + c
b := a − d
c := b + c
d := b
It is possible that a large amount of dead (useless) code may exist in the
program. This might be especially caused when introducing variables and procedures
as part of construction or error-correction of a program − once declared and defined,
one forgets to remove them in case they serve no purpose. Eliminating these will
definitely optimize the code.
★ Two statements
t1 := b + c
t2 := x + y
can be interchanged or reordered in the basic block when the value of t1 does not
affect the value of t2 (neither b nor c is t2, and neither x nor y is t1).
Algebraic Transformations:
The relational operators <=, >=, <, > and = sometimes generate unexpected
common sub-expressions. Associative laws may also be applied to expose common
sub-expressions. For example, if the source code has the assignments
a := b + c
e := c + d + b
the second can be evaluated as
t := c + d
e := t + b
Example:
★ To optimize the code efficiently, the compiler collects all the information about
the program and distributes this information to each block of the flow graph.
This process is known as data-flow analysis.
★ For this kind of optimization, use-definition (ud-) chaining is one particular
problem.
★ Here, using the value of the variable, we try to find out which definition of
a variable is applicable in a statement.
Based on the local information a compiler can perform some optimizations. For
example, consider the following code:
1. x = a + b;
2. x = 6 ∗ 3
★ In this code, the first assignment to x is useless: the value computed for x
is never used in the program.
★ At compile time the expression 6∗3 will be computed, simplifying the second
assignment statement to x = 18;
Some optimization needs more global information. For example, consider the following
code:
1. a = 1;
2. b = 2;
3. c = 3;
4. if (....) x = a + 5;
5. else x = b + 4;
6. c = x + 1;
In this code, the assignment at line 3 is useless (c is reassigned at line 6 before
being used), and the expression x + 1 can be simplified to 7. But it is less obvious
how a compiler can discover these facts by looking only at one or two consecutive
statements. A more global analysis is required, so that the compiler knows the
following things at each point in the program:
Data flow analysis is used to discover this kind of property. The data flow
analysis can be performed on the program’s control flow graph (CFG).
3. Replace each such evaluation x := b + c by the pair
m := b + c
x := m
4. Replace statement s by a := m.
Step 1
t5 := 4 ∗ k
t6 := a[t5]
Step 2 and 3
m := 4 ∗ k        (12)
t1 := m
t2 := a[t1]       (15)
Step 4
(12) := 4 ∗ k
(15) := a[(12)]
t5 := (12)
t6 := (15)
An assignment of the form a := b is called a copy statement. The idea behind the
copy-propagation transformation is to use b for a wherever possible after the copy
statement a := b. Let us see the algorithm for copy propagation.
The flow graph must be such that the copy s : x := y reaches block B along
every path, with no assignment to either x or y along any of those paths. We also
need du-chains (definition-use chains), so that all uses of every definition can be found.
3. If s satisfies the condition mentioned in step (2) then remove s and replace
all uses of x found in (1) by y.
For example:
Step 1 and 2:
x := t3            (Definition)
a[t1] := t2
a[t4] := x         (Use)
y := x + 3         (Use)
a[t5] := y
Since the values of t3 and x are not altered along the path from the definition to
the uses, we can replace x by t3 and then eliminate the copy statement.
Before eliminating the copy statement:      After eliminating it:
x := t3                                     a[t1] := t2
a[t1] := t2                                 a[t4] := t3
a[t4] := t3                                 y := t3 + 3
y := t3 + 3                                 a[t5] := y
a[t5] := y
For example:
While eliminating induction variables, first of all we have to identify all the
induction variables. A variable j in the family of a basic induction variable i is
recorded as a triple (i, c, d), meaning j = c ∗ i + d. Assignments of the forms
a := i ∗ b
a := b ∗ i
a := i ± b
a := b ± i
place a in the family of i; for instance, a := b ∗ i gives the triple (i, b, 0). We will
understand this concept of writing triples with the help of a block.
The block B2 has the basic induction variable i, because i gets incremented by
1 on each pass through loop L. The family of i contains t2, because of the assignment
t2 := 4 ∗ i; hence the triple for t2 is (i, 4, 0).
B2 :  i := i + 1
      t2 := 4 ∗ i
      t3 := a[t2]
      if t3 < 10 goto B2
Method:
1. Find each induction variable j in the family of i, with triple (i, c, d). For a
test of the form if i relop x goto B, introduce
t := c ∗ x
t := t + d
and replace the test by if j relop t goto B.
Finally delete all assignments to the eliminated induction variables from the
loop L because these induction variables will be useless.
For example:
B1 :  i := m + 3
      k := n
      t1 := 4 ∗ n
      v := a[t1]
B2 :  i := i + 1
      t2 := 4 ∗ i
      t3 := a[t2]
      if t3 < v goto B2
B3 :  k := k − 1
      t4 := 4 ∗ k
      t5 := a[t4] + 10
      if t5 < v goto B3
B4 :  if i >= k goto B6
B5 :        B6 :
Here i in B2 and k in B3 are two induction variables, because their values get
changed at each iteration. We will create new temporary variables r1 and r2 to
which the induction variables i and k are assigned.
Fig. 5.5
The transformed flow graph is:
B1 :  i := m + 3
      k := n
      t1 := 4 ∗ n
      v := a[t1]
      r1 := 4 ∗ i     } note these newly
      r2 := 4 ∗ k     } introduced variables
B2 :  i := i + 1
      r1 := r1 + 4    (now we simply perform r1 + 4 to get
      t2 := r1         the effect of the next value of 4 ∗ i)
      t3 := a[t2]
      if t3 < v goto B2
B3 :  k := k − 1
      r2 := r2 − 4    (similarly, r2 − 4 gives the effect of
      t4 := r2         the next value of 4 ∗ k)
      t5 := a[t4] + 10
      if t5 < v goto B3
B4 :  if i >= k goto B6
B5 :        B6 :
*********
LABORATORY
EX. NO: 01
DATE:
AIM:
To write a C program to develop a lexical analyzer that recognizes a few patterns
in a C source file.
ALGORITHM:
Step 6 : Separate all the file contents as tokens and match them with the
functions.
Step 7 : Define all the keywords in a separate file and name it key.c
Step 8 : Define all the operators in a separate file and name it oper.c
Step 10 : Finally print the output after recognizing all the tokens.
PROGRAM:
#include<stdio.h>
#include<conio.h>
#include<ctype.h>
#include<string.h>
void main()
{
    FILE *fi,*fo,*fop,*fk;
    int flag=0,i=1;
    char c,a[15],ch[15],file[20];
    clrscr();
    printf("\n Enter the File Name:");
    scanf("%s",file);
    fi=fopen(file,"r");
    fo=fopen("inter.c","w");     /* intermediate file of separated tokens */
    fop=fopen("oper.c","r");     /* operator table */
    fk=fopen("key.c","r");       /* keyword table */
    c=getc(fi);
    while(!feof(fi))
    {
        /* pass 1: copy identifier/number characters as they are,
           surround every other character with tabs; $ marks a new line */
        if(isalpha(c)||isdigit(c)||c=='['||c==']'||c=='.')
            fputc(c,fo);
        else
        {
            if(c=='\n')
                fprintf(fo,"\t$\t");
            else
                fprintf(fo,"\t%c\t",c);
        }
        c=getc(fi);
    }
    fclose(fi);
    fclose(fo);
    fi=fopen("inter.c","r");
    printf("\n Lexical Analysis");
    fscanf(fi,"%s",a);
    printf("\n Line: %d\n",i++);
    while(!feof(fi))
    {
        if(strcmp(a,"$")==0)
        {
            printf("\n Line: %d \n",i++);
            fscanf(fi,"%s",a);
        }
        fscanf(fop,"%s",ch);               /* pass 2: search the operator table */
        while(!feof(fop))
        {
            if(strcmp(ch,a)==0)
            {
                fscanf(fop,"%s",ch);       /* read the operator's name */
                printf("\t\t%s\t:\t%s\n",a,ch);
                flag=1;
            }
            fscanf(fop,"%s",ch);
        }
        rewind(fop);
        fscanf(fk,"%s",ch);                /* search the keyword table */
        while(!feof(fk))
        {
            if(strcmp(ch,a)==0)
            {
                printf("\t\t%s\t:\tKeyword\n",a);
                flag=1;
            }
            fscanf(fk,"%s",ch);
        }
        rewind(fk);
        if(flag==0)
        {
            if(isdigit(a[0]))
                printf("\t\t%s\t:\tConstant\n",a);
            else
                printf("\t\t%s\t:\tIdentifier\n",a);
        }
        flag=0;
        fscanf(fi,"%s",a);
    }
    getch();
}
Key.C:
int
void
main
char
if
for
while
else
printf
scanf
FILE
Include
stdio.h
conio.h
iostream.h
Oper.C:
( openpara
) closepara
{ openbrace
} closebrace
< lesser
> greater
“ doublequote
‘ singlequote
: colon
; semicolon
# preprocessor
= equal
== assign
% percentage
^ bitwise
& reference
* star
+ add
− sub
\ backslash
/ slash
Input.C:
#include “stdio.h”
#include “conio.h”
void main()
{
int a=10,b,c;
a=b*c;
getch();
}
OUTPUT:
LEXICAL ANALYSIS
Line : 1
# : preprocessor
include : keyword
“ : doublequote
stdio.h : keyword
“ : doublequote
Line : 2
# : preprocessor
include : keyword
“ : doublequote
conio.h : keyword
“ : doublequote
Line : 3
void : keyword
main : keyword
( : openpara
) : closepara
Line : 4
{ : openbrace
Line : 5
int : keyword
a : identifier
= : equal
10 : constant
, : identifier
b : identifier
, : identifier
c : identifier
; : semicolon
Line : 6
a : identifier
= : equal
b : identifier
* : star
c : identifier
; : semicolon
Line : 7
getch : identifier
( : openpara
) : closepara
; : semicolon
Line : 8
} : closebrace
Line : 9
$ : identifier
Result:
Thus the above program for developing the lexical analyzer and recognizing a few
patterns in C was executed successfully and the output was verified.
*********
EX. NO: 02
DATE:
AIM:
To write a C program to implement a symbol table.
ALGORITHM:
Step 1 : Start the program for performing insert, display, delete, search and
modify options in the symbol table.
Step 3 : Enter the choice for performing the operations in the symbol Table.
Step 4 : If the entered choice is 1, search the symbol table for the symbol to
be inserted. If the symbol is already present, it displays “Duplicate
Symbol”. Else, insert the symbol and the corresponding address in the
symbol table.
Step 5 : If the entered choice is 2, the symbols present in the symbol table
are displayed.
Step 7 : If it is not found in the symbol table it displays “Label Not found”.
Else, the symbol is deleted.
PROGRAM CODE:
#include<stdio.h>
#include<ctype.h>
#include<stdlib.h>
#include<string.h>
#include<math.h>
void main()
{
    int i=0,j=0,x=0,n;
    void *p,*add[15];            /* add[] enlarged to hold all symbols of the input */
    char ch,b[15],d[15],c;
    printf("Expression terminated by $:");
    while((c=getchar())!='$')
    {
        b[i]=c;
        i++;
    }
    n=i-1;
    printf("Given Expression:");
    i=0;
    while(i<=n)
    {
        printf("%c",b[i]);
        i++;
    }
    printf("\n Symbol Table\n");
    printf("Symbol \t addr \t type");
    while(j<=n)
    {
        c=b[j];
        if(isalpha(toascii(c)))
        {
            p=malloc(c);         /* allocated address identifies the symbol */
            add[x]=p;
            d[x]=c;
            printf("\n%c \t%d \t identifier\n",c,p);
            x++;
            j++;
        }
        else
        {
            ch=c;
            if(ch=='+'||ch=='-'||ch=='*'||ch=='=')
            {
                p=malloc(ch);
                add[x]=p;
                d[x]=ch;
                printf("\n %c \t%d \t operator\n",ch,p);
                x++;
                j++;
            }
            else
                j++;             /* skip any other character */
        }
    }
}
OUTPUT:
Expression terminated by $:A+B+C=D$
Given Expression: A+B+C=D
Symbol Table
RESULT:
Thus the program for symbol table has been executed successfully.
*********
EX. NO: 03
DATE:
AIM:
To write a program for implementing a Lexical analyzer using LEX tool in Linux
platform.
ALGORITHM:
Step 1 : Lex program contains three sections: definitions, rules, and user
subroutines. Each section must be separated from the others by a line
containing only the delimiter, %%. The general format for LEX tool is
as follows: definitions %% rules %% user_subroutines.
Step 2 : In definition section, the variables make up the left column, and their
definitions make up the right column. Any C statements should be
enclosed in %{..}%. Identifier is defined such that the first letter of an
identifier is alphabet and remaining letters are alphanumeric.
Step 3 : In rules section, the left column contains the pattern to be recognized
in an input file to yylex(). The right column contains the C program
fragment executed when that pattern is recognized. The various
patterns are keywords, operators, new line character, number, string,
identifier, beginning and end of block, comment statements,
preprocessor directive statements etc.
Step 4 : Each pattern may have a corresponding action, that is, a fragment of
C source code to execute when the pattern is matched.
Step 5 : When yylex() matches a string in the input stream, it copies the
matched text to an external character array, yytext, before it executes
any actions in the rules section.
Step 6 : In user subroutine section, main routine calls yylex(). yywrap() is used
to get more input.
Step 7 : The lex command uses the rules and actions contained in file to
generate a program, lexyy.c, which can be compiled with the cc
command. That program can then receive input, break the input into
the logical pieces defined by the rules in file, and run program
fragments contained in the actions in file.
PROGRAM:
%{
int COMMENT=0;
%}
identifier [a-zA-Z][a-zA-Z0-9]*
%%
#.* {printf("\n%s is a preprocessor directive",yytext);}
int |
float |
char |
double |
while |
for |
struct |
typedef |
do |
if |
break |
continue |
void |
switch|
return |
else |
goto {printf(“\n\t%s is a keyword”,yytext);}
"/*" {COMMENT=1;}
"*/" {COMMENT=0;}
{identifier}\( {if(!COMMENT)printf(“\nFUNCTION\n\t%s”,yytext);}
\{ {if(!COMMENT)printf(“\n BLOCK BEGINS”);}
\} {if(!COMMENT)printf("\n BLOCK ENDS");}
{identifier} (\[[0-9]*\])? {if(!COMMENT) printf(“\n %s IDENTIFIER”,yytext);}
\".*\" {if(!COMMENT)printf(“\n\t %s is a STRING”,YYTEXT);}
LABORATORY 13
\)(\:)? {if(!COMMENT)printf(“\n\t”);ECHO;printf(“\n”);}
\( ECHO;
= {if(!COMMENT)printf(“\n\t %s is an ASSIGNMENT OPERATOR”,yytext);}
\<= |
\>= |
\< |
== {if(!COMMENT)printf("\n\t%s is a RELATIONAL OPERATOR",yytext);}
[0-9]+ {if(!COMMENT)printf("\n\t%s is a NUMBER",yytext);}
%%
int main(int argc, char **argv)
{
    FILE *file;
    file=fopen("output.c","r");
    if(!file)
    {
        exit(0);
    }
    yyin=file;
    yylex();
    printf("\n");
    return(0);
}
int yywrap()
{
return(1);
}
INPUT:
/*output.c*/
#include<stdio.h>
#include<conio.h>
void main()
{
int a,b,c;
a=1;
b=2;
c=a+b;
printf(“Sum:%d”,c);
}
OUTPUT:
To Compile and Run:
lex filename.l
cc filename.yy.c
./a.out
BLOCK BEGINS
int is a keyword
a IDENTIFIER,
b IDENTIFIER,
c IDENTIFIER;
a IDENTIFIER
= is an ASSIGNMENT OPERATOR
1 is a NUMBER ;
b IDENTIFIER
= is an ASSIGNMENT OPERATOR
2 is a NUMBER ;
c IDENTIFIER
= is an ASSIGNMENT OPERATOR
a IDENTIFIER.
b IDENTIFIER;
FUNCTION
printf(
"Sum:%d" is a STRING,
c IDENTIFIER
BLOCK ENDS
RESULT:
Thus the program for implementation of Lexical Analyzer using LEX tool has
been executed successfully.
*********
EX. NO: 04
DATE:
AIM:
To write a program to implement a calculator using the LEX and YACC tools.
ALGORITHM:
PROGRAM:
LEX:
%{
#include<stdio.h>
#include “y.tab.h”
extern int yylval;
%}
%%
[0-9]+ {
yylval=atoi(yytext);
return NUMBER;
}
[ \t] ;
[\n] return 0;
. return yytext[0];
%%
int yywrap()
{
return 1;
}
YACC:
%{
#include<stdio.h>
int flag=0;
%}
%token NUMBER
%left ‘+’ ‘-’
%left ‘*’‘/’‘%’
%left ‘(’’)’
%%
ArithmeticExpression: E{
printf(“\nResult=%d\n”,$$);
return 0;
};
E:E‘+’E {$$=$1+$3;}
|E‘-’E {$$=$1-$3;}
|E‘*’E {$$=$1*$3;}
|E‘/’E {$$=$1/$3;}
|E‘%’E {$$=$1%$3;}
|‘(’E‘)’ {$$=$2;}
| NUMBER {$$=$1;};
%%
void main()
{
printf(“\nEnter Any Arithmetic Expression which can have operations Addition,
Subtraction, Multiplication, Division, Modulus and Round brackets:\n”);
yyparse();
if(flag==0)
printf(“\nEntered arithmetic expression is Valid\n\n”);
}
void yyerror()
{
    printf("\nEntered arithmetic expression is Invalid\n\n");
    flag=1;
}
OUTPUT:
Enter Any Arithmetic Expression which can have operations Addition, subtraction.
Multiplication, Divison, Modulus and Round brackets:
((5+6+10+4+5)/5)*2
Result=12
Entered arithmetic expression is Valid
./a.out
Enter Any Arithmetic Expression which can have operations Addition, Subtraction,
Multiplication, Division, Modulus and Round brackets:
(9=0)
Entered arithmetic expression is Invalid
Result:
The above C program to implement a calculator using LEX and YACC was
successfully executed and verified.
*********
EX. NO: 05
DATE:
AIM:
To Generate three address code for a simple program using Lex & YACC
ALGORITHM:
Step 4 : If match found then convert it into char and store it in yylval.p where
p is pointer declared in YACC.
PROGRAM:
%{
#include <stdio.h>     /* assumed header; the include targets were elided in the source */
#include <stdlib.h>    /* assumed header */
#include "y.tab.h"
%}
%%
%type S
%type E
%%
S : E {printf("\nx = %c\n",$1);}
;
E : NUM {}
| E '+' E {p++; printf("\n%c = %c + %c",p,$1,$3); $$=p;}
| E '-' E {p++; printf("\n%c = %c - %c",p,$1,$3); $$=p;}
| E '*' E {p++; printf("\n%c = %c * %c",p,$1,$3); $$=p;}
| E '/' E {p++; printf("\n%c = %c / %c",p,$1,$3); $$=p;}
| '(' E ')' {$$=$2;}
| '-' E %prec UMINUS {p++; printf("\n%c = -%c",p,$2); $$=p;}
;
%%
OUTPUT:
Enter Expression x => 1+2-3*3/1+4*5
A = 1+2
B = 3*3
C = B/1
D = A-C
E = 4*5
F = D+E
X = F
[a40@localhost ~]$ ./a.out
Enter Expression x => 1+2*(3+4)/5
A = 3+4
B = 2*A
C = B/5
D = 1+C
X = D
RESULT:
The above program to Generate three address code for a simple program using
Lex & Yacc was successfully executed and verified.
*********
EX. NO: 06
DATE:
AIM:
To write a C program to optimize intermediate code by eliminating dead code and
common sub-expressions.
ALGORITHM:
1. Start
9. Perform these steps 5 to 8 for all the input symbols in the file.
11. Get the operand before the operator from the three address code.
12. Check whether the operand is used in any other expression in the three
address code.
13. If the operand is not used, then eliminate the complete expression from the
three address code else go to 14.
14. Perform steps 11 to 13 for all the operands in the three address code till end
of file is reached.
15. Stop.
PROGRAM:
#include<stdio.h>
#include<conio.h>
#include<string.h>
struct op
{
    char l;          /* left-hand side of the expression  */
    char r[20];      /* right-hand side of the expression */
}
op[10],pr[10];
void main()
{
    int a,i,k,j,n,z=0,m,q;
    char *p,*l;
    char temp,t;
    char *tem;
    clrscr();
    printf("Enter the Number of Values:");
    scanf("%d",&n);
    for(i=0;i<n;i++)
    {
        printf("left: ");
        op[i].l=getche();
        printf("\tright: ");
        scanf("%s",op[i].r);
    }
    printf("Intermediate Code\n");
    for(i=0;i<n;i++)
    {
        printf("%c=",op[i].l);
        printf("%s\n",op[i].r);
    }
    /* dead code elimination: keep a statement only if its
       left-hand side is used in some right-hand side */
    for(i=0;i<n-1;i++)
    {
        temp=op[i].l;
        for(j=0;j<n;j++)
        {
            p=strchr(op[j].r,temp);
            if(p)
            {
                pr[z].l=op[i].l;
                strcpy(pr[z].r,op[i].r);
                z++;
            }
        }
    }
    pr[z].l=op[n-1].l;
    strcpy(pr[z].r,op[n-1].r);
    z++;
    printf("\nAfter Dead Code Elimination\n");
    for(k=0;k<z;k++)
    {
        printf("%c\t=",pr[k].l);
        printf("%s\n",pr[k].r);
    }
    /* common sub-expression elimination: a statement whose right-hand
       side matches an earlier one reuses the earlier left-hand side */
    for(m=0;m<z;m++)
    {
        tem=pr[m].r;
        for(j=m+1;j<z;j++)
        {
            p=strstr(tem,pr[j].r);
            if(p)
            {
                t=pr[j].l;
                pr[j].l=pr[m].l;
                for(i=0;i<z;i++)
                {
                    l=strchr(pr[i].r,t);
                    if(l)
                    {
                        a=l-pr[i].r;      /* position of the replaced name */
                        printf("pos: %d",a);
                        pr[i].r[a]=pr[m].l;
                    }
                }
            }
        }
    }
    printf("Eliminate Common Expression\n");
    for(i=0;i<z;i++)
    {
        printf("%c\t=",pr[i].l);
        printf("%s\n",pr[i].r);
    }
    /* remove duplicate statements produced by the previous pass */
    for(i=0;i<z;i++)
    {
        for(j=i+1;j<z;j++)
        {
            q=strcmp(pr[i].r,pr[j].r);
            if((pr[i].l==pr[j].l)&&!q)
            {
                pr[i].l='\0';
                strcpy(pr[i].r,"\0");
            }
        }
    }
    printf("Optimized Code\n");
    for(i=0;i<z;i++)
    {
        if(pr[i].l!='\0')
        {
            printf("%c=",pr[i].l);
            printf("%s\n",pr[i].r);
        }
    }
    getch();
}
OUTPUT:
Intermediate Code
a=9
b=c+d
e=c+d
f=b+e
r=f
After Dead Code Elimination
b =c+d
e =c+d
f =b+e
r =f
Optimized Code
b=c+d
f=b+b
r=f
*********
EX. NO: 07
DATE:
AIM:
To implement the back end of the compiler which takes the three address code
and produces the 8086 assembly language instructions that can be assembled and
run using a 8086 assembler. The target assembly instructions can be simple move,
add, sub, jump. Also simple addressing modes are used.
ALGORITHM:
Step 2 : Open the source file and store the contents as quadruples.
Step 4 : Write the generated code into output definition of the file in outp.c
PROGRAM:
#include<stdio.h>
//#include<conio.h>
#include<string.h>
void main()
{
char icode[10][30],str[20],opr[10];
int i=0;
//clrscr();
printf(“\n Enter the set of intermediate code (terminated by exit):\n”);
do
{
scanf(“%s”,icode[i]);
} while(strcmp(icode[i++], “exit”)!=0);
printf(“\n target code generation”);
printf(“\n****************”);
i=0;
do
{
strcpy(str,icode[i]);
switch(str[3])
{
case ‘+’:
strcpy(opr,"ADD");
break;
case ‘-’:
strcpy(opr,"SUB");
break;
case ‘*’:
strcpy(opr,"MUL");
break;
case ‘/’:
strcpy(opr,"DIV");
break;
}
printf(“\n\tMov %c,R%d”,str[2],i);
printf(“\n\t%s%c,R%d”,opr,str[4],i);
printf(“\n\tMov R%d,%c”,i,str[0]);
}while(strcmp(icode[++i],"exit")!=0);
//getch();
}
OUTPUT:
Enter the set of intermediate code (terminated by exit):
d=2/3
c=4/5
a=2*e
exit
target code generation
****************
Mov 2,R0
DIV3,R0
Mov R0,d
Mov 4,R1
DIV5,R1
Mov R1,c
Mov 2,R2
MULe,R2
Mov R2,a
RESULT:
Thus the program to implement the back end of the compiler for three address
code was successfully executed and verified.
*********
B.E./B.Tech. DEGREE EXAMINATION, APRIL/MAY 2017
Sixth Semester
Computer Science and Engineering
CS6660 − COMPILER DESIGN
(Common to : Information Technology)
(Regulations 2013)
PART – B (5 × 16 = 80 Marks)
11. (a) What are the phases of the compiler? Explain the phases in detail. Write
down the output of each phase for the expression a : = b + c ∗ 60. (16)
(Or)
12. (a) Convert the Regular Expression abb (a / b)* to DFA using direct method
and minimize it. (16)
(Or)
(iii) Draw the transition diagram for relational operators and unsigned
numbers. (6)
S → (L) | a
L → L, S | S
and show whether the following string will be accepted or not: (a,(a,(a,a))). (16)
(Or)
E → E+T | T
T → TF | F
F → F∗ | a | b
Construct the SLR parsing table for the above grammar. (16)
14. (a) What are the different storage allocation strategies? (16)
(Or)
(b) (i) Explain in detail about Specification of a simple type checker (10)
15. (a) Discuss the various issues in design of Code Generator. (16)
(Or)
(ii) Construct the DAG for the following Basic Block. (8)
1. t1 : = 4∗i
2. t2 : = a [t1]
3. t3 : = 4∗i
4. t4 : = b[t3]
5. t5 : = t2∗t4
6. t6 : = prod+t5
7. prod : = t6
8. t7 : = i+1
9. i : = t7
*********
1. What is an interpreter?
2. What do you mean by Cross-Compiler?
3. What is the role of lexical analysis phase?
4. Define Lexeme.
5. Draw the syntax tree for the expression a = b ∗ −c + b ∗ −c.
6. What are the three storage allocation strategies?
PART – B (5 × 16 = 80 Marks)
11. (a) What are compiler construction tools? Write a note on each compiler
construction tool.
(Or)
12. (a) (i) Discuss the issues involved in designing Lexical Analyzer.
(Or)
(b) Write an algorithm to convert NFA to DFA and minimize DFA. Give an
example.
(Or)
E → E+T | T
T → T∗F | F
F → (E) | id
14. (a) Explain the specification of simple type checker for statements,
expressions and functions.
(Or)
(Or)
*********
5. What are the different strategies by which a parser can recover from a syntactic error?
6. Define LR(0) item.
7. List three kinds of intermediate representation.
PART – B (5 × 16 = 80 Marks)
(Or)
(b) Discuss in detail the phases of the compiler which transform the
source program from one representation into another. Illustrate the output
for the input: a = (b + c) ∗ (b + c) ∗ 2. (13)
(i) The role of the lexical analyzer with the possible error recovery actions. (5)
(Or)
(Or)
14. (a) Write an S-attributed definition that constructs syntax trees for a
simple expression grammar involving only the binary operators + and −. As
usual, these operators are at the same precedence level and are jointly left
associative. All nonterminals have one synthesized attribute, node, which
represents a node of the syntax tree.
(Or)
(b) Explain in detail about issues in the design of a code generator. (13)
PART – C (1 × 15 = 15 Marks)
(Or)
*********
INDEX
A
Activation Record, 4.10
Ambiguous Grammar, 2.21
Analysis and Synthesis Model, 1.6
Automata, 1.42
B
Bottom up Parsing, 2.36
C
Canonical LR Parsing (CLR), 2.61
Comparison of LR Parsers, 2.73
Compiler Construction Tools, 1.10
Concept of Shift Reduce Parsing, 2.41
Construction of DAG, 5.15
Construction of LL(1) Parser, 2.34
Context-free Grammar, 2.3
Copy Propagation, 5.6, 5.25
D
Dead Code Eliminations, 5.7
Declarations, 3.23
Dependency Graphs, 3.4
Design of a Simple Code Generator, 4.19
E
Efficient Data Flow Analysis, 5.23
Either, 3.2
Error Handling, 2.25
F
Function preserving transformations examples, 5.5
G
Global Data Flow Analysis, 5.22
H
Heap Allocation, 4.7
I
Implementation of Three Address Code, 3.10
Induction Variable, 5.27
Input Buffering, 1.24
Instruction Costs, 4.18
Introduction to Code Optimization, 5.1
Introduction to Lexical Analysis, 1.11
Issues of Code Generator, 4.14
L
LALR, 2.70
Language Processing System, 1.4
Language for Specifying Lexical Analyzer, 1.33
Lex Program to Count Total Number of Tokens, 1.41
Lexical Errors, 1.23
Loop Optimizations, 5.7
LR Parser, 2.50
M
Minimizing DFA, 1.63
N
NFA with ∈ Closure, 1.46
O
Operator Precedence Parser, 2.46
Optimization of Basic Blocks, 5.19
S
Semantic Routines, 1.8
Simple LR Parsing (SLR), 2.58
Source Language Issues, 4.3
Specification of Tokens, 1.25
Stack Allocation of Space, 4.5
Static Allocation, 4.5
Storage Allocation Strategies, 4.5
Storage Organization, 4.4
Strength Reduction, 5.6
Syntax Directed Definitions, 3.1
Syntax Tree, 3.7
T
Target Machine Description, 4.17
The Phases of the Compiler, 1.7
Three Address Code, 3.9
Token, Patterns, Lexemes, 1.23
Top Down Parsing, 2.26
Translation of Expressions, 3.25
Translator, 1.1