Code Generation
Position of a Code Generator in the Compiler Model
[Figure: Front-End -> intermediate code -> Code Optimizer -> intermediate code -> Code Generator. Other labels in the figure: lexical error, syntax error, semantic error, Symbol Table.]
Requirements
• Output code must be correct
• Must be of high quality; must make
effective use of resources
• Code generator must run efficiently
• Good vs. optimal: generating optimal code is undecidable in general, so the practical goal is good code
Issues in the design of a Code Generator
• Input to the Code Generator
• Target Programs
• Memory Management
• Instruction Selection
• Register Allocation
• Choice of evaluation order
Input to the Code Generator
• Input
– Intermediate representation produced by the
front end
• Postfix form, quads, syntax trees, DAGs
– Symbol table
• Assumptions
– Intermediate code is available and sufficiently detailed
– Type checking has been done
– Input is free of errors
Target Programs
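• Absolute machine language
– Can be placed at a fixed memory location and executed immediately
• Relocatable machine language
– More expensive: requires linking and loading
– Flexible: subprograms can be compiled separately
• Assembly language
– Easier to generate, but needs an assembly step afterwards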
Memory management
• A name in a three-address statement refers to its symbol
table entry (width, relative address)
• Static allocation Vs Stack allocation
• Labels have to be converted to addresses
• For each quad, the address of its first machine
instruction is tracked
• Backpatching
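The label-to-address conversion above can be sketched in Python. This is only an illustration under stated assumptions, not a full code generator: the quad format (tuples such as `("goto", "L1")`), the fixed instruction size `WORD`, the `JMP` opcode, and the name `assign_addresses` are all hypothetical.

```python
# Sketch of backpatching: a jump whose target address is not yet known is
# emitted with a hole, and the hole is filled once the label is reached.
# Assumptions (not from the slides): every quad becomes exactly one
# instruction of WORD bytes, and quads are plain tuples.

WORD = 4  # assumed size of one emitted instruction, in bytes


def assign_addresses(quads):
    """Return a list of (address, opcode_or_text, jump_target)."""
    label_addr = {}   # label -> machine address, once known
    pending = {}      # label -> indices of jumps still waiting for it
    code = []         # emitted instructions as mutable [text, target] pairs

    for quad in quads:
        addr = len(code) * WORD
        if quad[0] == "label":
            # The label names the next instruction; patch earlier jumps.
            label_addr[quad[1]] = addr
            for i in pending.pop(quad[1], []):
                code[i][1] = addr
        elif quad[0] == "goto":
            target = label_addr.get(quad[1])
            if target is None:
                # Forward jump: leave a hole and remember to patch it.
                pending.setdefault(quad[1], []).append(len(code))
            code.append(["JMP", target])
        else:
            code.append([" ".join(quad), None])

    if pending:
        raise ValueError(f"undefined labels: {sorted(pending)}")
    return [(i * WORD, text, target) for i, (text, target) in enumerate(code)]


quads = [("goto", "L1"), ("x := y + z",), ("label", "L1"),
         ("a := b + c",), ("goto", "L1")]
for row in assign_addresses(quads):
    print(row)
```

The first `goto` is a forward jump, so its target is filled in only when `L1` is reached; the last one is a backward jump and is resolved immediately.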
Instruction Selection
• Uniformity and completeness of instruction set
• Instruction speed
• For each type of statement a skeleton can be
maintained
x := y + z
MOV y, R0
ADD z, R0
MOV R0, x
• May produce redundant code. Ex: a := b + c followed by
d := a + e
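A minimal Python sketch of skeleton-based translation shows where the redundancy comes from. The tuple format `(dest, left, op, right)` and the function names are assumptions made for this example; only the MOV/ADD/MOV skeleton itself comes from the slide.

```python
# Statement-by-statement translation with a fixed skeleton per statement.
# The statement format and all names here are hypothetical.

OPCODES = {"+": "ADD", "-": "SUB"}


def translate(stmt):
    """Expand one three-address statement with the MOV/op/MOV skeleton."""
    dest, left, op, right = stmt
    return [f"MOV {left}, R0", f"{OPCODES[op]} {right}, R0", f"MOV R0, {dest}"]


def translate_all(stmts):
    """Translate each statement independently and concatenate the results."""
    code = []
    for stmt in stmts:
        code.extend(translate(stmt))
    return code


program = [("a", "b", "+", "c"), ("d", "a", "+", "e")]
for line in translate_all(program):
    print(line)
```

The fourth instruction printed is `MOV a, R0`, which is redundant because R0 already holds the value of a from the previous statement. Each skeleton is expanded without looking at its neighbours, so the translator cannot notice this.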
Instruction Selection
• There may be several ways of
implementing an instruction
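– Ex: increment, a := a + 1, may be a single INC a on a machine
that has one, instead of MOV a,R0 / ADD #1,R0 / MOV R0,a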
• Accurate timing information may not be
available
Register Allocation
• Efficient utilization of registers is needed
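• Register allocation: select the variables that will reside in
registers at each point in the program
• Register assignment: pick the specific register for each such
variable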
– Optimal assignment is an NP-complete
problem
• Register pairs may be required for some
instructions
Choice of Computation order
• Some computation orders require fewer registers than others
• Choosing the best order is also NP-complete
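The effect of evaluation order on register need can be illustrated with a small Python sketch. It uses a deliberately simplified model that is an assumption of this example, not something stated in the slides: every leaf is loaded into a register, and an operator leaves its result in one of its operand registers. The second function follows the idea behind Sethi-Ullman numbering for a single expression tree, which the slides do not cover; it does not contradict the NP-completeness of the general problem.

```python
# Registers needed to evaluate an expression tree under two orders.
# Trees are nested (op, left, right) tuples with variable names as leaves.

def need_left_first(node):
    """Registers needed when the left operand is always evaluated first."""
    if isinstance(node, str):      # leaf: one register to hold its value
        return 1
    _, left, right = node
    # The left result occupies one register while the right is evaluated.
    return max(need_left_first(left), 1 + need_left_first(right))


def need_best_order(node):
    """Registers needed when the costlier operand is evaluated first."""
    if isinstance(node, str):
        return 1
    _, left, right = node
    a, b = need_best_order(left), need_best_order(right)
    # Doing the larger subtree first holds just one register across the
    # smaller one; if both need the same amount, one extra is required.
    return max(a, b) if a != b else a + 1


# t4 = (a + b) - (e - (c + d))
tree = ("-", ("+", "a", "b"), ("-", "e", ("+", "c", "d")))
print(need_left_first(tree), need_best_order(tree))   # prints: 4 3
```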
Target machine
• Byte addressable
• 4 bytes/word
• n general purpose registers
• 2 address instruction format
– op source, destination
• Opcodes
– MOV
– ADD
– SUB
Target Machine Addressing modes
Mode               Form    Address                      Added Cost
Absolute           M       M                            1
Register           R       R                            0
Indexed            c(R)    c + contents(R)              1
Indirect register  *R      contents(R)                  0
Indirect indexed   *c(R)   contents(c + contents(R))    1
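The Address column can be read as a small computation. The Python sketch below resolves an operand to the location it denotes; the dictionaries standing in for registers, memory, and the symbol table, together with their sample values, are assumptions made for illustration.

```python
import re

REGS = {"R0": 100, "R1": 200}   # assumed register contents
MEM = {104: 300}                # assumed memory contents
SYMBOLS = {"M": 500}            # assumed address of the name M


def effective_address(operand):
    """Return the address an operand denotes, or the register name itself."""
    indirect = operand.startswith("*")
    body = operand[1:] if indirect else operand
    if body in REGS:
        # Register mode names the register; *R names location contents(R).
        return REGS[body] if indirect else body
    indexed = re.fullmatch(r"(\d+)\((R\d+)\)", body)
    if indexed:
        # Indexed mode: c + contents(R)
        addr = int(indexed.group(1)) + REGS[indexed.group(2)]
    else:
        # Absolute mode: the address of the name M
        addr = SYMBOLS[body]
    # *c(R) takes one more step: contents(c + contents(R)).
    return MEM[addr] if indirect else addr


for op in ["M", "R0", "4(R0)", "*R0", "*4(R0)"]:
    print(op, "->", effective_address(op))
```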
Instruction Cost
• Define the cost of instruction
= 1 + cost(source-mode) + cost(destination-mode)
• MOV R0, M
– stores contents of R0 into M
• MOV 4(R0), M
– Stores value contents(4 + contents(R0)) into M
• MOV *4(R0),M
– Stores value contents(contents(4 + contents(R0))) into M
• MOV #1, R0
– Loads the constant 1 into R0 (literal mode #c, added cost 1)
Instruction costs
Instruction         Operation                                        Cost
MOV R0,R1           Store contents(R0) into register R1              1
MOV R0,M            Store contents(R0) into memory location M        2
MOV M,R0            Store contents(M) into register R0               2
MOV 4(R0),M         Store contents(4+contents(R0)) into M            3
MOV *4(R0),M        Store contents(contents(4+contents(R0))) into M  3
MOV #1,R0           Store 1 into R0                                  2
ADD 4(R0),*12(R1)   Add contents(4+contents(R0)) to
                    contents(contents(12+contents(R1)))              3
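The cost rule can be checked mechanically against the instructions listed. The Python sketch below is an illustration only: the function names are hypothetical, and the literal form `#c` is treated as added cost 1, an assumption that agrees with the cost of 2 given for `MOV #1,R0`.

```python
import re


def mode_cost(operand):
    """Added cost of one operand under the addressing-mode table."""
    body = operand.lstrip("*")      # indirection itself adds no cost
    if re.fullmatch(r"R\d+", body):
        return 0                    # register and indirect register
    return 1                        # absolute, indexed, indirect indexed, literal


def instruction_cost(instr):
    """cost = 1 + cost(source-mode) + cost(destination-mode)"""
    _opcode, operands = instr.split(None, 1)
    source, dest = (part.strip() for part in operands.split(","))
    return 1 + mode_cost(source) + mode_cost(dest)


TABLE = ["MOV R0,R1", "MOV R0,M", "MOV M,R0", "MOV 4(R0),M",
         "MOV *4(R0),M", "MOV #1,R0", "ADD 4(R0),*12(R1)"]
for instr in TABLE:
    print(f"{instr:20} {instruction_cost(instr)}")
```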
Instruction Selection
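a := b + c can be translated in several ways
• Through a register (cost 6)
MOV b,R0
ADD c,R0
MOV R0,a
• Directly in memory (cost 6)
MOV b,a
ADD c,a
• Assuming the addresses of a, b, and c are stored in R0, R1, and R2 (cost 2)
MOV *R1,*R0
ADD *R2,*R0
• Assuming R1 and R2 contain the values of b and c, and b is not needed afterwards (cost 3)
ADD R2,R1
MOV R1,a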
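To compare these alternatives, the per-instruction cost rule from the target machine model can be summed over each sequence. The Python sketch below is a rough illustration; the dictionary keys and the helper `cost` are names invented for this example, and a, b, c are assumed to be absolute memory operands.

```python
import re


def cost(instr):
    """1 + added cost of each operand (0 for R or *R, 1 otherwise)."""
    operands = instr.split(None, 1)[1].split(",")
    return 1 + sum(0 if re.fullmatch(r"\*?R\d+", o.strip()) else 1
                   for o in operands)


SEQUENCES = {
    "via register":         ["MOV b,R0", "ADD c,R0", "MOV R0,a"],
    "memory to memory":     ["MOV b,a", "ADD c,a"],
    "addresses in R0-R2":   ["MOV *R1,*R0", "ADD *R2,*R0"],
    "values in R1 and R2":  ["ADD R2,R1", "MOV R1,a"],
}

totals = {name: sum(cost(i) for i in seq) for name, seq in SEQUENCES.items()}
for name, total in totals.items():
    print(f"{name:22} {total}")
```

Under this cost model the first two sequences are equally expensive, and the cheaper ones depend on values or addresses already being in registers, which is why instruction selection and register allocation interact.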