Code Generation
Code generation phase
Final phase in the compiler model
Takes as input:
◦ intermediate representation (IR) code
◦ symbol table information
Produces output:
◦ semantically equivalent target program
Compilers need to produce efficient target programs
Most compilers therefore include an optimization phase prior to code generation
The optimizer may make multiple passes over the IR before the target program is generated
Target Program Code
The back-end code generator of a compiler may generate different forms of code, depending on the requirements:
◦ Absolute machine code (executable code)
◦ Relocatable machine code (object files for the linker)
◦ Assembly language (facilitates debugging)
◦ Byte code forms for interpreters (e.g. …)
Code generation phase
The code generator has three primary tasks:
◦ Instruction selection
◦ Register allocation and assignment
◦ Instruction ordering
Instruction selection
◦ choose appropriate target-machine instructions to implement the IR statements
Register allocation and assignment
◦ decide which values to keep in registers, and in which registers
Instruction ordering
◦ decide in what order to schedule the execution of instructions
The design of every code generator involves the above three tasks
The details of code generation depend on the specifics of the IR, the target language, and the run-time system
The Target Machine
Implementing code generation requires a thorough understanding of the target machine architecture and its instruction set
Our (hypothetical) machine:
◦ Byte-addressable (word = 4 bytes)
◦ Has n general-purpose registers R0, R1, …, Rn-1
◦ Two-address instructions of the form
op source, destination
◦ Op – op-code
◦ Source, destination – data fields
The Target Machine: Op-codes
Op-codes (op), for example:
◦ MOV (move contents of source to destination)
◦ ADD (add contents of source to destination)
◦ SUB (subtract contents of source from destination)
There are also other op-codes (e.g. MUL, DIV, and INC, used in later examples)
The Target Machine: Address Modes
Mode               Form    Address                    Added Cost
Absolute           M       M                          1
Register           R       R                          0
Indexed            c(R)    c+contents(R)              1
Indirect register  *R      contents(R)                0
Indirect indexed   *c(R)   contents(c+contents(R))    1
Literal            #c      N/A                        1
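The Address column can be read as one evaluation rule per mode. The sketch below applies those rules to a toy machine state; the function names `operand_address` and `operand_value`, the dictionaries standing in for registers and memory, and the `symbols` table that maps a name such as M to a numeric address are all illustrative assumptions, not part of the machine description.

```python
import re

def operand_address(op, regs, mem, symbols):
    """Return the memory address an operand refers to, following the
    Address column of the table; None for register and literal operands."""
    if op.startswith("#"):                        # literal #c: no address
        return None
    if re.fullmatch(r"R\d+", op):                 # register R: not in memory
        return None
    m = re.fullmatch(r"\*(-?\d+)\((R\d+)\)", op)  # indirect indexed *c(R)
    if m:
        return mem[int(m.group(1)) + regs[m.group(2)]]
    m = re.fullmatch(r"(-?\d+)\((R\d+)\)", op)    # indexed c(R)
    if m:
        return int(m.group(1)) + regs[m.group(2)]
    m = re.fullmatch(r"\*(R\d+)", op)             # indirect register *R
    if m:
        return regs[m.group(1)]
    return symbols[op]                            # absolute M (hypothetical symbol table)

def operand_value(op, regs, mem, symbols):
    """Fetch the value an operand denotes."""
    if op.startswith("#"):
        return int(op[1:])                        # literal: the constant itself
    if re.fullmatch(r"R\d+", op):
        return regs[op]                           # register: its contents
    return mem[operand_address(op, regs, mem, symbols)]

# Hypothetical machine state used only for illustration.
regs = {"R0": 100, "R1": 200}
mem = {104: 7, 200: 5, 212: 300, 300: 42, 500: 9}
symbols = {"M": 500}
for op in ["M", "R0", "4(R0)", "*R1", "*12(R1)", "#1"]:
    print(op, operand_value(op, regs, mem, symbols))
```

Note how *12(R1) takes two memory lookups: first the word at 12+contents(R1) supplies an address, then that address is read.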
Instruction Costs
The machine is a simple processor with fixed instruction costs
The cost of an instruction is defined as
cost = 1 + added cost(source mode) + added cost(destination mode)
where the added cost of each mode is given in the address-mode table
Examples
Instruction          Operation                                                        Cost
MOV R0,R1            Store contents(R0) into register R1                              1
MOV R0,M             Store contents(R0) into memory location M                        2
MOV M,R0             Store contents(M) into register R0                               2
MOV 4(R0),M          Store contents(4+contents(R0)) into M                            3
MOV *4(R0),M         Store contents(contents(4+contents(R0))) into M                  3
MOV #1,R0            Store 1 into R0                                                  2
ADD 4(R0),*12(R1)    Add contents(4+contents(R0)) to the value at location
                     contents(12+contents(R1))                                        3
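The cost rule is mechanical enough to check against the examples table. This is a minimal sketch, assuming instructions are written as "OP src,dst" strings exactly as above; `added_cost` and `instruction_cost` are hypothetical helper names, and accepting one-operand instructions such as INC is an extension of the two-operand formula.

```python
import re

def added_cost(operand):
    """Added cost of an operand's addressing mode (from the mode table):
    register R and indirect register *R cost 0; M, c(R), *c(R), #c cost 1."""
    return 0 if re.fullmatch(r"\*?R\d+", operand.strip()) else 1

def instruction_cost(instr):
    """Cost = 1 + added cost of each operand."""
    _, _, operands = instr.strip().partition(" ")
    return 1 + sum(added_cost(o) for o in operands.split(",") if o.strip())

examples = {
    "MOV R0,R1": 1,
    "MOV R0,M": 2,
    "MOV M,R0": 2,
    "MOV 4(R0),M": 3,
    "MOV *4(R0),M": 3,
    "MOV #1,R0": 2,
    "ADD 4(R0),*12(R1)": 3,
}
for instr, expected in examples.items():
    assert instruction_cost(instr) == expected, instr
```

Summing `instruction_cost` over a sequence gives the cost of a whole translation, which is how the alternatives for a:=a+1 are compared.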
Instruction Selection
Instruction selection is important to obtain efficient code
Suppose we translate the three-address statement x:=y+z to:
  MOV y,R0
  ADD z,R0
  MOV R0,x
Applying this template statement by statement, a:=a+1 becomes:
  MOV a,R0
  ADD #1,R0
  MOV R0,a
  Cost = 6
Better:
  ADD #1,a
  Cost = 3
Best (using an increment instruction):
  INC a
  Cost = 2
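The three translations of a:=a+1 differ only in which pattern the selector recognizes. The sketch below hard-codes those special cases; `select_add` is a hypothetical name, it assumes the machine provides a one-operand INC (as the "Best" option does), and it is a toy rule list rather than a general instruction-selection algorithm.

```python
def select_add(x, y, z):
    """Pick target code for the three-address statement x := y + z.

    Operands are variable names or literals written '#c'. Only the
    special cases from the a:=a+1 example are recognised."""
    if x == y and z == "#1":
        return [f"INC {x}"]                    # best: one-operand increment
    if x == y:
        return [f"ADD {z},{x}"]                # better: add straight into x
    return [f"MOV {y},R0", f"ADD {z},R0", f"MOV R0,{x}"]   # generic template

print(select_add("a", "a", "#1"))   # ['INC a']
print(select_add("a", "a", "b"))    # ['ADD b,a']
print(select_add("x", "y", "z"))    # ['MOV y,R0', 'ADD z,R0', 'MOV R0,x']
```

Checking the most specific pattern first is what lets the cheaper instruction win whenever it applies.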
Instruction Selection: Utilizing Addressing Modes
Suppose we translate a:=b+c into
  MOV b,R0
  ADD c,R0
  MOV R0,a
  Cost = 6
Assuming the addresses of a, b, and c are stored in R0, R1, and R2:
  MOV *R1,*R0
  ADD *R2,*R0
  Cost = 2
Assuming R1 and R2 contain the values of b and c, and b is not needed afterwards:
  ADD R2,R1
  MOV R1,a
  Cost = 3
Need for Global Machine-Specific Code Optimizations
Suppose we translate the three-address statement x:=y+z to:
  MOV y,R0
  ADD z,R0
  MOV R0,x
Then, statement by statement, we translate
  a:=b+c
  d:=a+e
to:
  MOV b,R0
  ADD c,R0
  MOV R0,a
  MOV a,R0     Redundant: R0 already holds a
  ADD e,R0
  MOV R0,d
MOV R0,a is also redundant if a is not used later
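One cheap way to catch the redundant load is a local peephole pass that looks at adjacent instruction pairs. This is narrower than the global optimizations the slide title refers to: it only removes a load that immediately follows the matching store, and it cannot remove MOV R0,a because it knows nothing about later uses of a. The function name and the "OP src,dst" string format are assumptions of this sketch.

```python
def remove_redundant_loads(code):
    """Peephole sketch: drop 'MOV x,R' when it immediately follows 'MOV R,x',
    because R still holds the value just stored into x.

    Assumes instructions are strings of the form 'OP src,dst' (or
    one-operand 'OP x'), with no spaces around the comma."""
    out = []
    for instr in code:
        op, _, args = instr.partition(" ")
        if out and op == "MOV":
            prev_op, _, prev_args = out[-1].partition(" ")
            if prev_op == "MOV":
                prev_src, prev_dst = prev_args.split(",")
                src, dst = args.split(",")
                if src == prev_dst and dst == prev_src:
                    continue                      # redundant load: skip it
        out.append(instr)
    return out

code = ["MOV b,R0", "ADD c,R0", "MOV R0,a",
        "MOV a,R0", "ADD e,R0", "MOV R0,d"]
print(remove_redundant_loads(code))
# ['MOV b,R0', 'ADD c,R0', 'MOV R0,a', 'ADD e,R0', 'MOV R0,d']
```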
Register Allocation and Assignment
Efficient utilization of the limited set of registers is important to generate good code
Register use is divided into two subproblems:
◦ Register allocation: select the set of variables that will reside in registers at a point in the code
◦ Register assignment: pick the specific register that a variable will reside in
Finding an optimal assignment of registers to variables is NP-complete in general
Example
Three-address code (same for both):
  t:=a*b
  t:=t+a
  t:=t/d
Register assignment { R1=t }:
  MOV a,R1
  MUL b,R1
  ADD a,R1
  DIV d,R1
  MOV R1,t
Register assignment { R0=a, R1=t }:
  MOV a,R0
  MOV R0,R1
  MUL b,R1
  ADD R0,R1
  DIV d,R1
  MOV R1,t
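Both register assignments should leave the same value in t. The sketch below runs each sequence on a toy interpreter to confirm this; it assumes MUL and DIV follow the same convention as ADD and SUB (destination := destination op source), uses integer division, supports only register and absolute operands, and the helper names and sample values of a, b, d are hypothetical.

```python
def is_reg(x):
    """True for register operands R0, R1, ..."""
    return x.startswith("R") and x[1:].isdigit()

def run(code, mem):
    """Execute two-address code 'OP src,dst' on a toy machine and return
    the final memory. Only register (Rn) and absolute (name) operands
    are supported; DIV is integer division in this sketch."""
    regs, mem = {}, dict(mem)          # copy so the caller's dict is untouched
    ops = {
        "ADD": lambda s, d: d + s,
        "SUB": lambda s, d: d - s,     # subtract source from destination
        "MUL": lambda s, d: d * s,
        "DIV": lambda s, d: d // s,    # divide destination by source
    }

    def read(x):
        return regs[x] if is_reg(x) else mem[x]

    for instr in code:
        op, _, args = instr.partition(" ")
        src, dst = args.split(",")
        value = read(src) if op == "MOV" else ops[op](read(src), read(dst))
        if is_reg(dst):
            regs[dst] = value
        else:
            mem[dst] = value
    return mem

only_t = ["MOV a,R1", "MUL b,R1", "ADD a,R1", "DIV d,R1", "MOV R1,t"]
a_and_t = ["MOV a,R0", "MOV R0,R1", "MUL b,R1", "ADD R0,R1", "DIV d,R1", "MOV R1,t"]
values = {"a": 2, "b": 3, "d": 4}
print(run(only_t, values)["t"], run(a_and_t, values)["t"])   # 2 2
```

The second sequence reads a from memory once and reuses R0, at the price of an extra register-to-register move.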
Choice of Evaluation Order
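When instructions are independent, their evaluation order can be changed
Expression: a+b-(c+d)*e
Three-address code:
  t1:=a+b
  t2:=c+d
  t3:=e*t2
  t4:=t1-t3
Generated code:
  MOV a,R0
  ADD b,R0
  MOV R0,t1
  MOV c,R1
  ADD d,R1
  MOV e,R0
  MUL R1,R0
  MOV t1,R1
  SUB R0,R1
  MOV R1,t4
After reordering:
  t2:=c+d
  t3:=e*t2
  t1:=a+b
  t4:=t1-t3
Generated code:
  MOV c,R0
  ADD d,R0
  MOV e,R1
  MUL R0,R1
  MOV a,R0
  ADD b,R0
  SUB R1,R0
  MOV R0,t4
Computing t1 last keeps it in a register, so the store and reload of t1 disappear and the code shrinks from 10 to 8 instructions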
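The condition "when instructions are independent" can be made concrete as a def-use check on the three-address statements: a reordering is legal if every statement still comes after the statements that define the names it reads. This sketch assumes each name is assigned at most once, as in the example, so read-after-write dependences are the only ones to check; `parse` and `is_valid_order` are hypothetical names.

```python
import re

def parse(stmt):
    """Split a statement like 't3:=e*t2' into ('t3', {'e', 't2'})."""
    target, expr = stmt.replace(" ", "").split(":=")
    return target, set(re.findall(r"[A-Za-z_]\w*", expr))

def is_valid_order(original, reordered):
    """True if `reordered` is a permutation of `original` in which every
    statement still follows the statements defining the names it reads.
    Assumes each name is assigned at most once, so read-after-write
    dependences are the only ones to check."""
    if sorted(original) != sorted(reordered):
        return False
    defined_by = {parse(s)[0]: s for s in original}
    position = {s: i for i, s in enumerate(reordered)}
    for s in reordered:
        _, uses = parse(s)
        for name in uses:
            if name in defined_by and position[defined_by[name]] > position[s]:
                return False
    return True

original = ["t1:=a+b", "t2:=c+d", "t3:=e*t2", "t4:=t1-t3"]
reordered = ["t2:=c+d", "t3:=e*t2", "t1:=a+b", "t4:=t1-t3"]
print(is_valid_order(original, reordered))   # True
```

Moving t3:=e*t2 before t2:=c+d, by contrast, would read t2 before it is defined and is rejected.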
Two Classes of Storage in a Processor
Registers
◦ Fast access, but only a few of them
◦ Address space not visible to the programmer
Doesn’t support pointer access!
Memory
◦ Slow access, but large
◦ Supports pointers
4 Distinct Regions of Memory
Code space – Instructions to be executed
◦ Best if read-only
Static (or Global) – Variables that retain their value over the lifetime of the program
Stack – Variables that live only as long as the block within which they are defined (local)
Heap – Variables that are allocated by calls to the system storage allocator
Memory Organization
Layout (top to bottom):
  Code
  Static Data
  Stack   (grows downward)
  ...
  Heap    (grows upward)
Code and static data sizes are determined by the compiler
Stack and heap sizes vary at run-time
Some machines have the stack and heap positions switched
Storage Class Selection
Standard (simple) approach
◦ Globals/statics – memory
◦ Locals
    Composite types (structs, arrays, etc.) – memory
    Scalars
      Preceded by register keyword? – register
      Rest – memory
All memory approach
◦ Put all variables into memory
◦ Compiler register allocation relocates some memory variables to registers later
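The standard (simple) approach amounts to a short decision procedure. In this sketch the `Var` record and `storage_class` function are hypothetical, and the memory areas named in the results (static area for globals/statics, stack for locals) are taken from the regions of memory listed earlier.

```python
from dataclasses import dataclass

@dataclass
class Var:
    """Hypothetical record describing one declared variable."""
    name: str
    scope: str                 # "global", "static", or "local"
    composite: bool = False    # struct, array, etc.
    register_kw: bool = False  # declared with the register keyword

def storage_class(v):
    """Standard (simple) approach: where does variable v live?"""
    if v.scope in ("global", "static"):
        return "memory (static area)"
    if v.composite:
        return "memory (stack)"       # composite locals always go to memory
    if v.register_kw:
        return "register"             # scalar local marked register
    return "memory (stack)"           # any other scalar local

print(storage_class(Var("counter", "global")))                # memory (static area)
print(storage_class(Var("k", "local", register_kw=True)))     # register
```

Under the all-memory approach the function would simply return memory for every variable, leaving the choice of registers to a later allocation pass.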
Class Problem
Specify whether each variable is stored in a register or in memory.
If in memory, which area of memory?
int a;
float j;
void foo(int b, double c)
{
int d;
int h[10];
register char i = 5;
int *m = new int[10];
}