CODE GENERATION
LAST PHASE OF COMPILER CONSTRUCTION
CODE GENERATION
➢ Code generation is the task of mapping
intermediate code to machine code.
➢ Requirements:
➢ Correctness
➢ Must preserve semantic meaning of source program
➢ Make effective use of available resources
➢ Must run efficiently
INPUT TO THE CODE GENERATOR
➢ We assume the front end has
➢ Scanned, parsed, and translated the source program into a
reasonably detailed intermediate representation
➢ Performed type checking and type conversion, and detected
obvious semantic errors
➢ Built a symbol table that can provide the run-time addresses of
data objects
➢ Intermediate representations may be
➢ Postfix notation
➢ Three-address representation
➢ Syntax tree
➢ DAG
TARGET PROGRAMS
➢ The output of the code generator is the target program.
➢ Target architecture:
➢ Must be well understood
➢ Significantly influences the difficulty of code generation
➢ E.g., RISC, CISC
➢ Target program may be
➢ Absolute machine language
➢ It can be placed in a fixed location of memory and immediately executed
➢ Relocatable machine language
➢ Allows subprograms to be compiled separately
➢ A set of relocatable object modules can be linked together by a linker
and loaded for execution
ISSUES IN DESIGN OF CODE GENERATOR
➢ Instruction Selection
➢ Register Allocation
➢ Evaluation Order
INSTRUCTION SELECTION
➢ There may be a large number of ‘candidate’ machine
instructions for a given IR instruction
➢ Level of IR
➢ High: Each IR translates into many machine instructions
➢ Low: Reflects many low-level details of machine
➢ Nature of the instruction set
➢ Uniformity and completeness
➢ Each instruction has its own cost and constraints
➢ Accurate cost information is difficult to obtain
➢ Cost may be influenced by surrounding context
INSTRUCTION SELECTION
➢ For each type of three-address statement, a code
skeleton can be designed that outlines the target code to
be generated for that construct.
For example, x := y + z can be translated as
MOV y, R0
ADD z, R0
MOV R0, x
Statement-by-statement generation often
produces poor code
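The code skeleton idea above can be sketched in a few lines of Python. This is only an illustration: the function names, the opcode table, and the fixed use of R0 are assumptions made for the example, not part of the slides.

```python
# Sketch: expand each three-address statement with a fixed code skeleton.
# OPCODES, gen_statement and gen_program are hypothetical names.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}


def gen_statement(stmt, reg="R0"):
    """Translate 'x := y op z' into load / operate / store using one register."""
    target, expr = [part.strip() for part in stmt.split(":=")]
    left, op, right = expr.split()
    return [
        f"MOV {left}, {reg}",             # load the first operand
        f"{OPCODES[op]} {right}, {reg}",  # combine it with the second operand
        f"MOV {reg}, {target}",           # store the result
    ]


def gen_program(stmts):
    """Generate code statement by statement, with no knowledge of context."""
    code = []
    for stmt in stmts:
        code.extend(gen_statement(stmt))
    return code


if __name__ == "__main__":
    for line in gen_program(["a := b + c", "d := a + e"]):
        print(line)
```

Because every statement is expanded in isolation, the generator cannot notice that a value it needs is already sitting in R0, which is why this approach tends to produce poor code.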
INSTRUCTION SELECTION
a := b + c
d := a + e
MOV b, R0
ADD c, R0
MOV R0, a    (needed only if a is subsequently used)
MOV a, R0    (redundant: a is already in R0)
ADD e, R0
MOV R0, d
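In the sequence above, loading a straight after storing it re-fetches a value that is already in R0. A minimal sketch of a pass that drops such a reload is shown here. The helper names are hypothetical, and the sketch assumes two-operand instructions written as `OP src, dst` and that the second instruction of the pair is not a jump target.

```python
def parse(instr):
    """Split 'OP src, dst' into (opcode, src, dst)."""
    opcode, operands = instr.split(None, 1)
    src, dst = [o.strip() for o in operands.split(",")]
    return opcode, src, dst


def is_copy_back(prev, cur):
    """True if cur is a MOV that exactly reverses the MOV just before it."""
    p_op, p_src, p_dst = parse(prev)
    c_op, c_src, c_dst = parse(cur)
    return p_op == c_op == "MOV" and p_src == c_dst and p_dst == c_src


def remove_redundant_moves(code):
    """Drop a MOV that copies back the value moved by the previous MOV."""
    result = []
    for instr in code:
        if result and is_copy_back(result[-1], instr):
            continue  # both locations already hold the same value
        result.append(instr)
    return result


if __name__ == "__main__":
    naive = ["MOV b, R0", "ADD c, R0", "MOV R0, a",
             "MOV a, R0", "ADD e, R0", "MOV R0, d"]
    for line in remove_redundant_moves(naive):
        print(line)
```

The store `MOV R0, a` is kept because the sketch has no liveness information; removing it would additionally require knowing that a is not used later.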
INSTRUCTION SELECTION
MACHINE IDIOMS
REGISTER ALLOCATION
➢ How to best use the bounded number of registers.
➢ Use of registers
➢ Register allocation
➢ We select a set of variables that will reside in registers at each point in the
program
➢ Register assignment
➢ We pick the specific register that a variable will reside in.
➢ Complications:
➢ special purpose registers
➢ operators requiring multiple registers.
➢ Optimal assignment is NP-complete
EVALUATION ORDER
➢ Choosing the order of instructions to best utilize
resources
➢ Picking the optimal order is an NP-complete problem
➢ Simplest Approach
➢ Don’t mess with re-ordering.
➢ Target code will perform all operations in the same order as the
IR code
➢ Trickier Approach
➢ Consider re-ordering operations
➢ May produce better code
➢ ... Get operands into registers just before they are needed
➢ ... May use registers more efficiently
MOVING RESULTS BACK TO MEMORY
➢ When to move results from registers back into memory?
➢ After an operation, the result will be in a register.
➢ Immediately
➢ Move data back to memory just after it is computed.
➢ May make more registers available for use elsewhere.
➢ Wait as long as possible before moving it back
➢ Only move data back to memory “at the end”
➢ or “when absolutely necessary”
➢ May be able to avoid re-loading it later!
CODE GENERATION ALGORITHM #1
Simple code generation algorithm:
Define a target code sequence for each intermediate code statement type.
EXAMPLE TARGET MACHINE
EVALUATING A POTENTIAL CODE SEQUENCE
➢ Each instruction has a “cost”
Cost = Execution Time
➢ Execution Time is difficult to predict.
➢ Pipelining, Branches, Delay Slots, etc.
➢ Goal: Approximate the real cost
A “Cost Model”
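One way to approximate the real cost is a static cost model. The sketch below is an assumption made for illustration, not the model from the slides: each instruction costs 1, plus 1 for every operand that lives in memory rather than in a register. It also assumes that registers are the names R0, R1, ... and that instructions are written as `OP src, dst`.

```python
import re

# Assumption: any operand named R<digits> is a register; all others are memory.
REGISTER = re.compile(r"R\d+$")


def instruction_cost(instr):
    """Assumed cost: 1 for the instruction plus 1 per memory operand."""
    _, operands = instr.split(None, 1)
    names = [o.strip() for o in operands.split(",")]
    return 1 + sum(0 if REGISTER.match(name) else 1 for name in names)


def sequence_cost(code):
    """Total assumed cost of a straight-line instruction sequence."""
    return sum(instruction_cost(instr) for instr in code)


if __name__ == "__main__":
    with_reload = ["MOV b, R0", "ADD c, R0", "MOV R0, a",
                   "MOV a, R0", "ADD e, R0", "MOV R0, d"]
    without_reload = [i for i in with_reload if i != "MOV a, R0"]
    print(sequence_cost(with_reload), sequence_cost(without_reload))
```

Such a model ignores pipelining, branches, and delay slots, so it can rank candidate sequences but does not predict actual execution time.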
A BETTER COST MODEL
COST GENERATION EXAMPLE
BASIC BLOCKS
CONTROL FLOW GRAPH
SETHI-ULLMAN ALGORITHM
IDENTIFYING NO. OF REGISTERS REQUIRED
Intuition:
1. Label each node according to the number of
registers that are required to generate code for the node.
2. Generate code top-down, always generating code
first for the child that requires the most registers.
SETHI-ULLMAN ALGORITHM
(INTUITION)
[Figure: a node with a left leaf and right leaves]
Bottom-Up Labeling: visit a node after all its children are labeled.
LABELING ALGORITHM
(1) if n is a leaf then
(2)     if n is the leftmost child of its parent then
(3)         label(n) := 1
(4)     else label(n) := 0
    else begin /* n is an interior node */
(5)     let c1, c2, ..., ck be the children of n ordered by label
        so that label(c1) ≥ label(c2) ≥ ... ≥ label(ck)
(6)     label(n) := max over 1 ≤ i ≤ k of (label(ci) + i − 1)
    end
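The labeling steps (1) to (6) can be sketched in Python as follows. The `Node` class and the function name `label` are assumptions introduced for the example; a root with no parent is treated as a leftmost child.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


def label(node, is_leftmost=True):
    """Return the label of node, following steps (1)-(6) of the algorithm."""
    if not node.children:
        # Steps (1)-(4): a leftmost leaf gets 1, any other leaf gets 0.
        return 1 if is_leftmost else 0
    # Children are labeled first (bottom-up); only the first child is leftmost.
    child_labels = [label(c, i == 0) for i, c in enumerate(node.children)]
    # Step (5): order the labels so that label(c1) >= label(c2) >= ... >= label(ck).
    child_labels.sort(reverse=True)
    # Step (6): max of label(ci) + i - 1 for i = 1..k (i is 0-based here).
    return max(lab + i for i, lab in enumerate(child_labels))


if __name__ == "__main__":
    tree = Node("+", [Node("+", [Node("a"), Node("b")]),
                      Node("+", [Node("c"), Node("d")])])
    print(label(tree))
```

Sorting the child labels in descending order before applying step (6) is what makes the formula independent of the order in which the children appear in the tree.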
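LABELING ALGORITHM
If k = 2 (a node with two children), then the relation
$$\mathrm{label}(n) = \max_{1 \le i \le k} \bigl(\mathrm{label}(c_i) + i - 1\bigr), \qquad \mathrm{label}(c_1) \ge \mathrm{label}(c_2) \ge \dots \ge \mathrm{label}(c_k)$$
becomes:
$$\mathrm{label}(n) = \begin{cases} \max\bigl(\mathrm{label}(c_1), \mathrm{label}(c_2)\bigr) & \text{if } \mathrm{label}(c_1) \ne \mathrm{label}(c_2) \\ \mathrm{label}(c_1) + 1 & \text{if } \mathrm{label}(c_1) = \mathrm{label}(c_2) \end{cases}$$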
EXAMPLE
Assume we have the following set of instructions:
t1 = a + b
t2 = c + d
t3 = e + t2
t4 = t1 + t3
Draw the tree corresponding to the given instructions
EXAMPLE
               t4
             /    \
         t1          t3
        /  \        /  \
       a    b      e    t2
                       /  \
                      c    d
EXAMPLE
Labeling leaves: leftmost is 1, others are 0
               t4
             /    \
         t1          t3
        /  \        /  \
     a 1    b 0  e 1    t2
                       /  \
                    c 1    d 0
EXAMPLE
Labeling t2:
label(c) ≠ label(d), so label(t2) = max(label(c), label(d)) = max(1, 0)
Thus label(t2) = 1
               t4
             /    \
         t1          t3
        /  \        /  \
     a 1    b 0  e 1    t2
                       /  \
                    c 1    d 0
EXAMPLE
Labeling t3:
label(e) = label(t2), so label(t3) = label(e) + 1 = 2
               t4
             /    \
         t1          t3
        /  \        /  \
     a 1    b 0  e 1    t2 1
                       /  \
                    c 1    d 0
EXAMPLE
Labeling t1:
label(a) ≠ label(b), so label(t1) = max(label(a), label(b)) = max(1, 0)
Thus label(t1) = 1
               t4
             /    \
         t1          t3 2
        /  \        /  \
     a 1    b 0  e 1    t2 1
                       /  \
                    c 1    d 0
EXAMPLE
Labeling t4:
label(t1) ≠ label(t3), so label(t4) = max(label(t1), label(t3)) = max(1, 2)
Thus label(t4) = 2
               t4
             /    \
         t1 1        t3 2
        /  \        /  \
     a 1    b 0  e 1    t2 1
                       /  \
                    c 1    d 0
EXAMPLE
               t4 2
             /    \
         t1 1        t3 2
        /  \        /  \
     a 1    b 0  e 1    t2 1
                       /  \
                    c 1    d 0
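The worked example can be checked with a short Python sketch that labels the same tree and then generates code, visiting the child with the larger label first. The names `Node`, `assign_labels`, `gen`, and `build_example` are hypothetical. The sketch assumes binary nodes, ADD as the only operator, `ADD src, dst` meaning dst := dst + src, and enough free registers that no spilling is needed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: int = 0


def assign_labels(n, is_left=True):
    """Label the tree bottom-up using the two-child form of the rule."""
    if n.left is None:                      # leaf
        n.label = 1 if is_left else 0
    else:
        left_label = assign_labels(n.left, True)
        right_label = assign_labels(n.right, False)
        if left_label != right_label:
            n.label = max(left_label, right_label)
        else:
            n.label = left_label + 1
    return n.label


def gen(n, free, code):
    """Emit code for n into code; return the register that holds its value."""
    if n.left is None:                      # left leaf: load it into a register
        code.append(f"MOV {n.name}, {free[0]}")
        return free[0]
    if n.right.left is None:                # right leaf: use it from memory
        reg = gen(n.left, free, code)
        code.append(f"ADD {n.right.name}, {reg}")
        return reg
    if n.left.label >= n.right.label:       # evaluate the costlier child first
        lreg = gen(n.left, free, code)
        rreg = gen(n.right, [r for r in free if r != lreg], code)
    else:
        rreg = gen(n.right, free, code)
        lreg = gen(n.left, [r for r in free if r != rreg], code)
    code.append(f"ADD {rreg}, {lreg}")
    return lreg


def build_example():
    """Build the tree for t1=a+b, t2=c+d, t3=e+t2, t4=t1+t3."""
    t1 = Node("t1", Node("a"), Node("b"))
    t2 = Node("t2", Node("c"), Node("d"))
    t3 = Node("t3", Node("e"), t2)
    return Node("t4", t1, t3)


if __name__ == "__main__":
    root = build_example()
    print("label(t4) =", assign_labels(root))
    code = []
    result = gen(root, ["R0", "R1", "R2"], code)
    print("\n".join(code))
    print("result in", result)
```

With three registers offered, the generated sequence touches only R0 and R1, which matches label(t4) = 2.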