Subject code: 6CS63/06IS662
NO. of lectures per week: 04
Total No. of lecture hrs: 52
IA marks : 25
Exam hrs: 03
Exam marks:100
Unit 6: Intermediate code generation
Syllabus:
Variants of syntax trees; three-address
code; types & declarations; translation
of expressions; type checking; control
flow; backpatching; switch statements;
intermediate code for procedures.
(8 hrs)
Introduction
Analysis-synthesis model:
The front end analyses the source program and
creates an intermediate representation
From this intermediate representation the
back end generates the object code
The front end depends on the source language and
the back end depends on the target machine
We assume that a compiler front end is
organized as in the figure shown below
Here parsing, static checking, and
intermediate-code generation are done
sequentially.
[Figure: logical structure of a compiler front end. Front end: source program -> Parser -> Static checker -> Intermediate code generator -> intermediate code. Back end: Code generator]
Directed Acyclic Graphs (DAG)
Leaves correspond to atomic operands
Interior nodes correspond to operators
A node N in a DAG can have more than
one parent if N represents a common
subexpression
Advantages:
Represents expressions more succinctly
Gives the compiler more clues for
generation of efficient code
Constructing a DAG
A syntax directed definition is used to
construct a DAG
The steps are similar to the construction of
syntax trees
But before creating a new node we need
to check whether an identical node already
exists
If such a node exists the existing node is
returned
else a new node is created.
The value-number method
Nodes of a syntax tree or DAG can be stored as an
array of records. Each row of the array represents a
node
In each record the first field represents an operation
code
For leaves one additional field holds the lexical value
For interior nodes there are two additional fields for
left and right children
We refer to each node by its integer index in the
array, called its value number
Algorithm for value-number
method
Suppose that nodes are stored in an array
and each node is referred to by its value
number.
Input: label op, node l and node r
Output: the value number of a node in the array with signature
<op, l, r>
Method: search the array for a node with label
op, left child l and right child r. If found, return its value
number. If not found, create in the array a new
node with label op, left child l and right child r, and
return its value number
DAG Construction
Nodes of the DAG for i = i + 10, stored in an array:
value number   op    fields
1              id    pointer to i
2              num   10
3              +     1   2
4              =     1   3
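The value-number method can be sketched in a few lines of Python. This is only an illustration: the names `nodes` and `value_number` are hypothetical, a leaf stores its lexical value in the left-child slot, and value numbers start at 1 to match the table for i = i + 10.

```python
# Minimal sketch of the value-number method (names are illustrative).
nodes = []   # array of records; position in the list + 1 is the value number


def value_number(op, left=None, right=None):
    """Return the value number of <op, left, right>, creating the node if needed."""
    signature = (op, left, right)
    for index, node in enumerate(nodes):   # linear search of the whole array
        if node == signature:
            return index + 1               # identical node exists: reuse it
    nodes.append(signature)                # otherwise create a new node
    return len(nodes)


# Build the DAG for i = i + 10
i_node = value_number("id", "i")
ten = value_number("num", 10)
plus = value_number("+", i_node, ten)
assign = value_number("=", value_number("id", "i"), plus)  # second use of i is shared
```

The second request for the leaf i returns the existing value number 1, so the array holds four nodes, not five.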
Hash table and buckets
Searching the whole array for every node is slow; a hash table is more efficient.
A hash table is an array of 'buckets'.
The index of the bucket is computed using
a hash function h for a signature <op,l,r>.
The buckets can be implemented as linked
lists. An array of pointers indexed by the
hash value points to the first nodes of the
buckets. Thus the node <op,l,r> can be
found on the list whose header index is
given by h(op,l,r).
[Figure: using hash tables and buckets. An array of bucket headers indexed by hash value; each header holds a pointer to a list of nodes (subtrees)]
Three-address code
In three-address code, there is at most one
operator on the right side of an instruction
An expression with more than one operator is
broken into a sequence of instructions
E.g.
x + y*z can be written as
t1 = y * z
t2 = x + t1
Three-address code is built from two
concepts: addresses and instructions
Addresses
Addresses are of three types
A name: source program names can
appear as addresses in three-address
code
A constant: the compiler must be able to
deal with different types of constants
A compiler-generated temporary: temporary
names can be used as addresses
Three address instructions
Assignment instructions: x=y op z;
Assignments of the form: x=op y;
Copy instructions: x=u;
An unconditional jump: goto L
Conditional jumps(1): if x goto L
Conditional jumps(2): if x relop y goto L
Procedure calls and returns: param x, call p,n, y = call p,n and return y
Indexed copy instructions: x = y[i] and x[i] = y
Address and pointer assignments: x = &y, x = *y and *x = y
Quadruples
Quadruples are used to implement three-address
instructions in compilers. They have
four fields: op, arg1, arg2 and result. Exceptions
are:
Instructions with unary operators, e.g. x = minus y,
and copy instructions x = y do not use arg2
Conditional and unconditional jumps put the
target label in result
Operators like param (used to pass a
parameter) use neither arg2 nor result
Quadruple representation
a = b*-c + b*-c
position   op      arg1   arg2   result
1          minus   c             t1
2          *       b      t1     t2
3          minus   c             t3
4          *       b      t3     t4
5          +       t2     t4     t5
6          =       t5            a
TRIPLES
Triples are also used to implement
three-address instructions, but they have only three
fields: op, arg1 and arg2. The result field is missing
With triples we refer to the result of
an operation by its position rather than by a
temporary name
When instructions are moved around, all
references to their results must be changed
Indirect Triples
Indirect triples consist of a listing of pointers to
triples.
Here we can move an instruction by
reordering the instruction list, without
affecting the triples themselves.
Triple representation
a = b*-c + b*-c
position   op      arg1   arg2
0          minus   c
1          *       b      (0)
2          minus   c
3          *       b      (2)
4          +       (1)    (3)
5          =       a      (4)
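The difference between the two representations can be seen by generating both from the same expression tree. The sketch below is illustrative only: the nested-tuple tree format and the names `to_quads` and `to_triples` are assumptions, not part of the course material.

```python
# Sketch: quadruples and triples for a = b * -c + b * -c (illustrative names).
TREE = ("=", "a", ("+", ("*", "b", ("minus", "c")),
                        ("*", "b", ("minus", "c"))))


def to_quads(tree):
    """Return a list of (op, arg1, arg2, result) with named temporaries."""
    quads, count = [], 0

    def walk(node):
        nonlocal count
        if isinstance(node, str):              # a name is its own address
            return node
        op, *args = node
        if op == "=":
            target, expr = args
            quads.append(("=", walk(expr), None, target))
            return target
        addrs = [walk(arg) for arg in args]
        count += 1
        temp = f"t{count}"                     # result goes into a new temporary
        quads.append((op, addrs[0], addrs[1] if len(addrs) > 1 else None, temp))
        return temp

    walk(tree)
    return quads


def to_triples(tree):
    """Return a list of (op, arg1, arg2); results are referenced by position."""
    triples = []

    def walk(node):
        if isinstance(node, str):
            return node
        op, *args = node
        if op == "=":
            target, expr = args
            triples.append(("=", target, walk(expr)))
            return target
        refs = [walk(arg) for arg in args]
        triples.append((op, refs[0], refs[1] if len(refs) > 1 else None))
        return f"({len(triples) - 1})"         # position of the instruction just added

    walk(tree)
    return triples


quads = to_quads(TREE)
triples = to_triples(TREE)
```

The quadruples carry temporaries t1 to t5 in the result field, while the triples refer to earlier results as (0) to (4), which is why moving a triple forces its references to be renumbered.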
Static single-assignment form
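SSA is an intermediate representation that facilitates
certain code optimisations.
All assignments are to variables with distinct names:
p1 = a + b
q1 = p1 - c
p2 = q1 * d
p3 = e - p2
q2 = p3 + q1
When a variable is assigned on two control-flow paths,
a φ-function combines the two definitions:
if (flag) x = -1; else x = 1;
becomes
if (flag) x1 = -1; else x2 = 1;
x3 = φ(x1, x2);
φ(x1, x2) has the value x1 if control passes through the
true part of the conditional and x2 otherwise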
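For straight-line code, the renaming into SSA form can be sketched as below. The function name `to_ssa` is hypothetical, and the unversioned input (p = a + b, q = p - c, ...) is assumed to be the code behind the p1, q1, p2, ... listing. The sketch does not handle branches, so it never inserts φ-functions.

```python
import re


def to_ssa(instructions):
    """Rename straight-line (target, expression) pairs so each name is assigned once."""
    version = {}    # variable -> number of its most recent definition
    result = []

    def rename_use(match):
        name = match.group()
        # A use refers to the latest definition; names never assigned stay unchanged.
        return f"{name}{version[name]}" if name in version else name

    for target, expr in instructions:
        renamed = re.sub(r"[A-Za-z_]\w*", rename_use, expr)
        version[target] = version.get(target, 0) + 1   # fresh name for each definition
        result.append(f"{target}{version[target]} = {renamed}")
    return result


code = [("p", "a + b"), ("q", "p - c"), ("p", "q * d"),
        ("p", "e - p"), ("q", "p + q")]
ssa = to_ssa(code)
```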
Types and Declarations
Example type: int[2][3]
D -> T id ; D | ε
T -> B C | record '{' D '}'
B -> int | float
C -> ε | [ num ] C
Example: parse tree for int[2][3]
T
  B
    int
  C
    [2] C
        [3] C
            ε
Storage layout
Computing types and widths
T -> B            { t = B.type; w = B.width }
     C
B -> int          { B.type = integer; B.width = 4 }
B -> float        { B.type = float; B.width = 8 }
C -> ε            { C.type = t; C.width = w }
C -> [ num ] C1   { C.type = array(num.value, C1.type);
                    C.width = num.value * C1.width }
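The effect of these actions can be checked with a small sketch. The function name `type_and_width` and the tuple encoding of array types are assumptions made for illustration; the widths 4 and 8 are the ones given in the rules for int and float.

```python
# Sketch: computing the type expression and width of a declaration such as int[2][3].
BASIC = {"int": ("integer", 4), "float": ("float", 8)}   # B.type and B.width


def type_and_width(basic, dims):
    """Return (type expression, width in bytes) for basic type `basic` with array sizes `dims`."""
    t, w = BASIC[basic]                # T -> B : t = B.type, w = B.width
    # C -> [num] C1 : the innermost dimension is completed first (C -> ε gives t, w).
    for num in reversed(dims):
        t = ("array", num, t)          # C.type = array(num.value, C1.type)
        w = num * w                    # C.width = num.value * C1.width
    return t, w


example = type_and_width("int", [2, 3])
```

For int[2][3] this gives the type array(2, array(3, integer)) and a width of 2 * 3 * 4 = 24 bytes.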
Translation of expressions
Production       Semantic rules
S -> id = E      S.code = E.code || gen(top.get(id.lexeme) '=' E.addr)
E -> E1 + E2     E.addr = new Temp()
                 E.code = E1.code || E2.code
                          || gen(E.addr '=' E1.addr '+' E2.addr)
   | - E1        E.addr = new Temp()
                 E.code = E1.code || gen(E.addr '=' 'minus' E1.addr)
   | ( E1 )      E.addr = E1.addr
                 E.code = E1.code
   | id          E.addr = top.get(id.lexeme)
                 E.code = ''
INCREMENTAL TRANSLATION
Code attributes can be long strings, so
code is usually generated incrementally
Consider:
production: E -> E1 + E2
semantic action: { E.addr = new Temp();
                   gen(E.addr '=' E1.addr '+' E2.addr); }
Here,
gen() creates an add instruction and appends it to the
previously generated instructions that compute
E1 into E1.addr and E2 into E2.addr
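A possible sketch of incremental translation is given below. The names `instructions`, `new_temp`, `gen` and `translate` are illustrative, expressions are assumed to arrive as nested tuples, and the symbol-table lookup top.get is omitted, so a name serves as its own address.

```python
# Sketch: incremental translation of expressions into three-address code.
instructions = []    # the single growing instruction stream
temp_count = 0


def new_temp():
    global temp_count
    temp_count += 1
    return f"t{temp_count}"


def gen(*parts):
    """Append one instruction to the stream instead of building a code string."""
    instructions.append(" ".join(parts))


def translate(expr):
    """Emit code for expr and return the address that holds its value."""
    if isinstance(expr, str):              # E -> id
        return expr
    if expr[0] == "minus":                 # E -> - E1
        operand = translate(expr[1])
        addr = new_temp()
        gen(addr, "=", "minus", operand)
        return addr
    op, left, right = expr                 # E -> E1 + E2 (any binary operator)
    a1 = translate(left)
    a2 = translate(right)
    addr = new_temp()
    gen(addr, "=", a1, op, a2)
    return addr


def translate_assign(name, expr):          # S -> id = E
    gen(name, "=", translate(expr))


translate_assign("a", ("+", "b", ("minus", "c")))
```

No code attribute is ever concatenated: by the time the add instruction is emitted, the instructions for both operands are already in the stream.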
ARRAY REFERENCES
Usually array elements are numbered from 0
to n-1
If the width of each element is w and base is the
relative address of the allocated storage, the
i-th element begins at location
base + i*w
In t dimensions, the address of a[i1][i2]...[it] is
base + i1*w1 + i2*w2 + ... + it*wt
where wj is the width of an element in the j-th dimension
This is implemented by corresponding
productions and semantic actions
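The address formula can be tried out with a short sketch. It assumes a row-major layout with indices starting at 0, so that the width in dimension j is the element width times the sizes of all later dimensions; the name `element_offset` is hypothetical.

```python
# Sketch: offset of a[i1][i2]...[it] relative to base (row-major layout assumed).
def element_offset(indices, dims, elem_width):
    """Return i1*w1 + i2*w2 + ... + it*wt for an array with sizes `dims`."""
    widths = []
    w = elem_width
    for n in reversed(dims):     # w_t = element width, w_j = n_(j+1) * w_(j+1)
        widths.append(w)
        w *= n
    widths.reverse()             # now widths[j] is the width in dimension j
    return sum(i * wj for i, wj in zip(indices, widths))


# For int a[2][3] with 4-byte integers: w1 = 12, w2 = 4
offset = element_offset([1, 2], [2, 3], 4)
```

The address of a[1][2] is then base + offset, where offset = 1*12 + 2*4.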
TYPE CHECKING
Type checking is done to catch type mismatches
Rule:
if f has type s -> t and x has type s, then
the expression f(x) has type t
TYPE CONVERSIONS
There is a hierarchy in type conversions
Different types have different machine representations and
machine instructions. Hence operands need to be converted to one
common type before the actual operation. Java has two types of
conversions:
1. Widening conversions: byte -> short -> int -> long -> float -> double,
   and char -> int
2. Narrowing conversions: double -> float -> long -> int -> char, short or byte;
   char, short and byte are pairwise convertible
TYPE CONVERSIONS (CONTD.)
Consider the production: E -> E1 + E2
Its semantic action can be explained with two functions:
max(t1,t2): takes two types t1 and t2 and returns the maximum of the two in the
widening hierarchy
widen(a,t,w): performs type conversion by widening address a of type t into
a value of type w
pseudocode:
widen(addr a, type t, type w)
{ if (t == w) return a;
  else if (t == int and w == float)
  { temp = new Temp();
    gen(temp '=' '(float)' a);
    return temp;
  }
  else error;
}
Here, a is returned if t and w are the same type;
otherwise, the conversion is done into a temporary, which is returned
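The two functions might be combined for E -> E1 + E2 as in the sketch below. It covers only int and float, as the pseudocode does; `HIERARCHY`, `max_type` and `add` are illustrative names, and the generated code is collected in a plain list.

```python
# Sketch: max and widen applied to E -> E1 + E2 (int and float only).
HIERARCHY = ["int", "float"]    # position in the list = rank in the widening hierarchy
code = []
temp_count = 0


def new_temp():
    global temp_count
    temp_count += 1
    return f"t{temp_count}"


def max_type(t1, t2):
    """Return the higher of two types in the widening hierarchy."""
    return max(t1, t2, key=HIERARCHY.index)


def widen(a, t, w):
    """Return an address holding the value of a (of type t) converted to type w."""
    if t == w:
        return a
    if t == "int" and w == "float":
        temp = new_temp()
        code.append(f"{temp} = (float) {a}")
        return temp
    raise TypeError(f"cannot widen {t} to {w}")


def add(a1, t1, a2, t2):
    """Translate E1 + E2 and return (E.addr, E.type)."""
    t = max_type(t1, t2)
    x1 = widen(a1, t1, t)
    x2 = widen(a2, t2, t)
    temp = new_temp()
    code.append(f"{temp} = {x1} + {x2}")
    return temp, t


result = add("i", "int", "x", "float")
```

Only the int operand is converted; the float operand is passed through unchanged because its type already equals the result type.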
Flow of control statements
Consider the following statement:
S -> if (B) S1
where S represents statements and B represents boolean
expressions.
The translation of this statement consists of
B.code followed by S1.code as shown.
Based on the value of B, there are jumps
within B.code.
Code layout for the if block:
         B.code     (jumps to B.true or to B.false)
B.true:  S1.code
         ...
B.false:
Other flow-of-control statements are translated similarly
SDD for some control statements
production                Semantic rules
P -> S                    S.next = newlabel()
                          P.code = S.code || label(S.next)
S -> assign               S.code = assign.code
S -> if (B) S1            B.true = newlabel()
                          B.false = S1.next = S.next
                          S.code = B.code || label(B.true) || S1.code
S -> if (B) S1 else S2    B.true = newlabel()
                          B.false = newlabel()
                          S1.next = S2.next = S.next
                          S.code = B.code || label(B.true) || S1.code
                                   || gen('goto' S.next) || label(B.false) || S2.code
SDD for some control statements
(contd.)
production                Semantic rules
S -> while (B) S1         begin = newlabel()
                          B.true = newlabel()
                          B.false = S.next
                          S1.next = begin
                          S.code = label(begin) || B.code
                                   || label(B.true) || S1.code
                                   || gen('goto' begin)
S -> S1 S2                S1.next = newlabel()
                          S2.next = S.next
                          S.code = S1.code || label(S1.next) || S2.code
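The rules for if and while can be exercised with the sketch below. It is a simplification: statements are assumed to be tuples, a boolean expression is assumed to be a single relational test given as a string, and `gen_bool`, `gen_stmt` and `gen_program` are hypothetical names.

```python
# Sketch: labels as inherited attributes for if and while statements.
label_count = 0


def new_label():
    global label_count
    label_count += 1
    return f"L{label_count}"


def gen_bool(cond, true_label, false_label):
    """B.code for a simple test: jump to B.true or B.false."""
    return [f"if {cond} goto {true_label}", f"goto {false_label}"]


def gen_stmt(stmt, next_label):
    """Return S.code, given S.next."""
    kind = stmt[0]
    if kind == "assign":                       # S -> assign
        return [stmt[1]]
    if kind == "if":                           # S -> if (B) S1
        _, cond, body = stmt
        b_true = new_label()                   # B.false = S1.next = S.next
        return (gen_bool(cond, b_true, next_label)
                + [f"{b_true}:"] + gen_stmt(body, next_label))
    if kind == "while":                        # S -> while (B) S1
        _, cond, body = stmt
        begin = new_label()
        b_true = new_label()                   # B.false = S.next, S1.next = begin
        return ([f"{begin}:"] + gen_bool(cond, b_true, next_label)
                + [f"{b_true}:"] + gen_stmt(body, begin)
                + [f"goto {begin}"])
    raise ValueError(f"unknown statement kind: {kind}")


def gen_program(stmt):                         # P -> S
    s_next = new_label()
    return gen_stmt(stmt, s_next) + [f"{s_next}:"]


program = gen_program(("while", "i < 10", ("assign", "i = i + 1")))
```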
BACKPATCHING
• While generating code for boolean expressions and flow-of-control
statements, lists of jumps are passed as synthesized attributes
• When a jump is generated, its target is temporarily left unspecified
• The jump is added to a list of jumps whose targets are not yet known
• All the jumps on a list have the same target
• When the proper label is determined, it is filled in for all the jumps
on the list
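A minimal sketch of the list operations follows, applied to the boolean expression x < 100 || x > 200. The helper names `emit`, `makelist`, `merge` and `backpatch`, the use of instruction indices as jump targets, and the `_` placeholder for an unfilled target are assumptions made for the illustration.

```python
# Sketch: backpatching jumps for B -> B1 || B2 (illustrative helpers).
instructions = []                  # list index = instruction number


def emit(text):
    """Append an instruction and return its index."""
    instructions.append(text)
    return len(instructions) - 1


def makelist(i):
    return [i]


def merge(p1, p2):
    return p1 + p2


def backpatch(jumps, target):
    """Fill in `target` for every unfinished jump on the list."""
    for i in jumps:
        assert instructions[i].endswith("_")
        instructions[i] = instructions[i][:-1] + str(target)


# B1 is x < 100, B2 is x > 200; every target starts out unspecified.
b1_true = makelist(emit("if x < 100 goto _"))
b1_false = makelist(emit("goto _"))
b2_start = len(instructions)       # index of the first instruction of B2
b2_true = makelist(emit("if x > 200 goto _"))
b2_false = makelist(emit("goto _"))

backpatch(b1_false, b2_start)      # if B1 is false, go on to test B2
b_true = merge(b1_true, b2_true)   # B is true if either operand is true
b_false = b2_false

true_start = emit("x = 0")         # code executed when B is true
backpatch(b_true, true_start)
backpatch(b_false, len(instructions))   # false exit: the next instruction to be emitted
```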
Switch Statements
A switch statement has a selector expression, which
is evaluated first, followed by a set of values
that it can take.
Depending on the value of the expression,
a particular set of statements is executed
A default set of statements is executed
if no other value
matches the expression
Translation of a switch statement
        code to evaluate E into t
        goto test
L1:     code for S1
        goto next
L2:     code for S2
        goto next
        ...
Ln:     code for Sn
        goto next
test:   if t = V1 goto L1
        if t = V2 goto L2
        ...
        goto Ln
next:
Here Ln labels the default statements, reached when t matches none of the values
Translation of switch-statement
If the number of cases is small, say at most 10,
a sequence of conditional jumps is used
If the number of values exceeds 10 it is
more efficient to construct a hash table for
the values, with the labels of the various
statements as entries.
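The sequence-of-conditional-jumps scheme can be sketched as a small generator. The function name `translate_switch` and its argument format are assumptions; the temporary t and the labels test and next follow the layout used for the translation of a switch statement, with the default statements taking the last label.

```python
# Sketch: translating a switch statement into a sequence of conditional jumps.
def translate_switch(selector_code, cases, default_body):
    """cases is a list of (value, body lines); returns three-address code lines."""
    lines = list(selector_code) + ["goto test"]
    tests = []
    for n, (value, body) in enumerate(cases, start=1):
        lines.append(f"L{n}:")
        lines += list(body) + ["goto next"]
        tests.append(f"if t = {value} goto L{n}")   # one conditional jump per case
    default_label = f"L{len(cases) + 1}"
    lines.append(f"{default_label}:")
    lines += list(default_body) + ["goto next"]
    lines.append("test:")
    lines += tests + [f"goto {default_label}", "next:"]
    return lines


out = translate_switch(["t = x + 1"],
                       [(1, ["a = 1"]), (2, ["a = 2"])],
                       ["a = 0"])
```

With a hash table, the list of tests would be replaced by a single lookup from the value of t to a label; that variant is not shown here.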
Intermediate code for procedures
In three-address code a function call is
unravelled into the evaluation of parameters in
preparation for a call, followed by the call itself.
The statement n = f(a[i]); is translated to
(array elements are 4 bytes wide):
t1 = i * 4
t2 = a[t1]
param t2        /* makes t2 an actual parameter */
t3 = call f, 1
n = t3