Module 6
Code Generation
• The final phase of a compiler is the code generator
• It receives an intermediate representation (IR) together with supplementary
information in the symbol table
• Produces a semantically equivalent target program
• Main tasks of the code generator:
• Instruction selection
• Register allocation and assignment
• Instruction ordering
Register and Address Descriptors
• A register descriptor keeps track of which variable is currently stored
in each register.
• Initially, the register descriptors show that all the registers are empty.
• An address descriptor keeps track of the location(s) where the current value
of a variable is stored. A location may be a register, a memory address or a stack location.
Code-generation algorithm
• The algorithm takes a sequence of three-address statements as input. For each three-address statement of the form
x := y op z it performs the following actions:
1. Invoke a function getreg to find the location L where the result of the computation y op z should be stored.
2. Consult the address descriptor for y to determine y', the current location of y. If the value of y is currently both in
memory and in a register, prefer the register for y'. If the value of y is not already in L, generate the instruction
MOV y', L to place a copy of y in L.
3. Generate the instruction OP z', L where z' is the current location of z. If z is in both a register and memory, prefer
the register. Update the address descriptor of x to indicate that x is in location L. If L is a register, update its
descriptor to indicate that it contains x, and remove x from all other register descriptors.
4. If the current values of y or z have no next uses, are not live on exit from the block, and are in registers, alter the
register descriptors to indicate that after execution of x := y op z those registers no longer contain y or z.
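The four steps can be sketched in Python as follows. This is a minimal illustration, not a complete implementation: the class name CodeGen, its methods, and the dead argument are hypothetical, getreg is simplified (it reuses y's register when y is dead, otherwise takes an empty register, and never spills), and the caller is assumed to supply next-use information by listing the variables that are dead after each statement.

```python
class CodeGen:
    """Toy generator for x := y op z on a two-address target (no spilling)."""

    OPS = {"+": "ADD", "-": "SUB"}

    def __init__(self, num_regs=2):
        self.reg_desc = {f"R{i}": None for i in range(num_regs)}  # register -> variable
        self.addr_desc = {}  # variable -> set of locations (registers or memory name)
        self.code = []

    def location(self, var):
        """Current location of var, preferring a register over memory."""
        locs = self.addr_desc.get(var, {var})  # unseen variables live in memory
        regs = sorted(loc for loc in locs if loc in self.reg_desc)
        return regs[0] if regs else var

    def getreg(self, y, dead):
        """Simplified getreg: reuse y's register if y is dead, else an empty one."""
        y_loc = self.location(y)
        if y_loc in self.reg_desc and y in dead:
            return y_loc
        for reg, held in self.reg_desc.items():
            if held is None:
                return reg
        raise RuntimeError("no free register (spilling is not modelled)")

    def gen(self, x, y, op, z, dead=()):
        L = self.getreg(y, dead)                                      # step 1
        y_loc = self.location(y)                                      # step 2
        if y_loc != L:
            self.code.append(f"MOV {y_loc}, {L}")
        self.code.append(f"{self.OPS[op]} {self.location(z)}, {L}")   # step 3
        # Clear L's old occupant, stale copies of x, and dead variables (step 4).
        for reg, held in self.reg_desc.items():
            if held is not None and (reg == L or held == x or held in dead):
                self.addr_desc[held].discard(reg)
                self.reg_desc[reg] = None
        self.reg_desc[L] = x
        self.addr_desc[x] = {L}

    def store(self, var):
        """At block exit, save a live variable that exists only in a register."""
        locs = self.addr_desc.get(var, {var})
        if locs and var not in locs:
            self.code.append(f"MOV {self.location(var)}, {var}")
            locs.add(var)


cg = CodeGen(num_regs=2)
cg.gen("t", "a", "-", "b")
cg.gen("u", "a", "-", "c")
cg.gen("v", "t", "+", "u", dead=("t",))
cg.gen("d", "v", "+", "u", dead=("v", "u"))
cg.store("d")  # d is live on exit from the block
print("\n".join(cg.code))
```

With two registers, this run emits the sequence MOV a, R0; SUB b, R0; MOV a, R1; SUB c, R1; ADD R1, R0; ADD R1, R0; MOV R0, d for the statements t := a - b, u := a - c, v := t + u, d := v + u.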
Generating Code for Assignment Statements
• The assignment statement d := (a-b) + (a-c) + (a-c) can be translated
into the following sequence of three-address code:
t := a - b
u := a - c
v := t + u
d := v + u
Statement     Code Generated   Register descriptor   Address descriptor
                               Registers empty
t := a - b    MOV a, R0        R0 contains t         t in R0
              SUB b, R0
u := a - c    MOV a, R1        R0 contains t         t in R0
              SUB c, R1        R1 contains u         u in R1
v := t + u    ADD R1, R0       R0 contains v         u in R1
                               R1 contains u         v in R0
d := v + u    ADD R1, R0       R0 contains d         d in R0
              MOV R0, d                              d in R0 and memory
Generate code for the following three-address statements, assuming all variables
are stored in memory locations.
1. x = 1
2. x = a
3. x = a + 1
4. x = a + b
Answer
1. LD R1, #1
   ST x, R1
2. LD R1, a
   ST x, R1
3. LD R1, a
   ADD R1, R1, #1
   ST x, R1
4. LD R1, a
   LD R2, b
   ADD R1, R1, R2
   ST x, R1
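The four answers follow one mechanical pattern: load the first operand, optionally add the second, then store. The Python sketch below illustrates that pattern under stated assumptions: the function name gen_assign is hypothetical, only copies and additions are handled, integer constants are written as #n, the first operand of an addition is a variable, and registers R1 and R2 are always free.

```python
def gen_assign(stmt):
    """Generate LD/ADD/ST code for 'x = 1', 'x = a', 'x = a + 1' or 'x = a + b'."""
    target, rhs = (s.strip() for s in stmt.split("="))
    parts = rhs.split()
    if len(parts) not in (1, 3):
        raise ValueError(f"unsupported statement: {stmt!r}")

    def src(tok):
        # constants become immediate operands, variables are memory names
        return f"#{tok}" if tok.isdigit() else tok

    code = [f"LD R1, {src(parts[0])}"]       # first operand into R1
    if len(parts) == 3:
        _, op, right = parts
        if op != "+":
            raise ValueError("only addition is modelled in this sketch")
        if right.isdigit():
            code.append(f"ADD R1, R1, #{right}")
        else:
            code.append(f"LD R2, {right}")   # second operand into R2
            code.append("ADD R1, R1, R2")
    code.append(f"ST {target}, R1")          # result back to memory
    return code


for s in ["x = 1", "x = a", "x = a + 1", "x = a + b"]:
    print(s, "->", gen_assign(s))
```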
• The two statements
x = b * c
y = a + x
Answer
LD R1, b
LD R2, c
MUL R1, R1, R2
LD R3, a
ADD R3, R3, R1
ST y, R3
• The three-statement sequence
x = a[i]
y = b[i]
z = x * y
Answer
LD R1, i
MUL R1, R1, #4
LD R2, a(R1)
LD R1, b(R1)
MUL R1, R2, R1
ST z, R1
Issues in the Design of Code Generation
• Input to the code generator
• Target program
• Memory management
• Instruction selection
• Register allocation
• Evaluation order
Input to the code generator
• The input to the code generator consists of the intermediate representation of the source program and
the information in the symbol table. The intermediate representation is produced by the front end.
• There are several choices for the intermediate representation:
a) Postfix notation
b) Syntax tree
c) Three address code
• We assume the front end produces a low-level intermediate representation, i.e. the values of names in it can
be directly manipulated by machine instructions.
• The code generation phase requires complete, error-free intermediate code as input.
Target Program
• The target program is the output of the code generator. The output can be:
a) Assembly language: It makes the process of code generation
easier.
b) Relocatable machine language: It allows subprograms to be
compiled separately.
c) Absolute machine language: It can be placed in a fixed location in
memory and can be executed immediately.
Memory Management
• During code generation, the symbol table entries have to be mapped to
actual addresses.
• Mapping names in the source program to addresses of data is done cooperatively
by the front end and the code generator.
• Local variables are stack-allocated in the activation record, while global
variables are in the static area.
Instruction Selection
• The instruction set of the target machine should be complete and
uniform.
• When the efficiency of the target program matters, instruction
speeds and machine idioms are important factors.
• The quality of the generated code is determined by its speed and size.
Register Allocation
• Registers can be accessed faster than memory. Instructions with
register operands are shorter and faster than those with
memory operands.
• The following subproblems arise when we use registers:
1. Register allocation: we select the set of
variables that will reside in registers.
2. Register assignment: we pick the specific register
in which each variable will reside.
Evaluation order
• The efficiency of the target code can be affected by the order in which
the computations are performed.
• Some computation orders need fewer registers to hold intermediate
results than others.
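One way to see the effect of evaluation order is to count the registers an expression tree needs under two orders. The sketch below is an illustration under assumptions not stated in the slides: every operand must be in a register before it is used, no spilling is allowed, and the function name regs_needed is hypothetical. Evaluating the subtree that needs more registers first corresponds to the Ershov (Sethi-Ullman) numbering.

```python
def regs_needed(tree, heavier_first=True):
    """Registers needed to evaluate an expression tree without spilling.

    tree is either a leaf name (str) or a tuple (op, left, right).
    """
    if isinstance(tree, str):
        return 1  # a leaf must be loaded into one register
    _, left, right = tree
    l = regs_needed(left, heavier_first)
    r = regs_needed(right, heavier_first)
    if heavier_first:
        first, second = max(l, r), min(l, r)  # costlier subtree first
    else:
        first, second = l, r                  # strict left-to-right order
    # While the second subtree is evaluated, one register holds the first result.
    return max(first, second + 1)


# a + (b + (c + d)): the right operand is the expensive one
tree = ("+", "a", ("+", "b", ("+", "c", "d")))
print(regs_needed(tree, heavier_first=False))  # left-to-right order
print(regs_needed(tree, heavier_first=True))   # costlier subtree first
```

For this tree, strict left-to-right evaluation needs 4 registers, while evaluating the deeper right operand first needs only 2.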
Target Machine
• The target computer is a byte-addressable machine with 4 bytes to a word.
• The target machine has n general-purpose registers, R0, R1, ..., Rn-1. It has two-address
instructions of the form: op source, destination
where op is the op-code and source and destination are data fields.
• It has the following op-codes:
ADD (add source to destination)
SUB (subtract source from destination)
MOV (move source to destination)
• The source and destination of an instruction are specified by combining registers and
memory locations with addressing modes.
MODE               FORM    ADDRESS                     EXAMPLE
absolute           M       M                           ADD temp, R1
register           R       R                           ADD R0, R1
indexed            c(R)    c + contents(R)             ADD 100(R2), R1
indirect register  *R      contents(R)                 ADD *R2, R1
indirect indexed   *c(R)   contents(c + contents(R))   ADD *100(R2), R1
literal            #c      c                           ADD #3, R1
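The ADDRESS column can be read as a small lookup procedure. The following Python sketch resolves an operand under each mode. It assumes a toy machine model in which registers and memory are plain dictionaries; the function name operand_value and its parameters are hypothetical.

```python
def operand_value(mode, *, regs, memory, c=0, r=None):
    """Return the value an operand denotes under the given addressing mode."""
    if mode == "literal":            # #c    : the constant c itself
        return c
    if mode == "register":           # R     : contents of register R
        return regs[r]
    if mode == "absolute":           # M     : memory word at address c
        return memory[c]
    if mode == "indexed":            # c(R)  : address c + contents(R)
        return memory[c + regs[r]]
    if mode == "indirect register":  # *R    : address contents(R)
        return memory[regs[r]]
    if mode == "indirect indexed":   # *c(R) : address contents(c + contents(R))
        return memory[memory[c + regs[r]]]
    raise ValueError(f"unknown addressing mode: {mode}")


regs = {"R2": 8}
memory = {8: 40, 100: 5, 108: 200, 200: 7}
print(operand_value("indexed", regs=regs, memory=memory, c=100, r="R2"))
print(operand_value("indirect indexed", regs=regs, memory=memory, c=100, r="R2"))
```

With R2 holding 8, the indexed operand 100(R2) reads address 108 and yields 200, and the indirect indexed operand *100(R2) follows that value as an address and yields 7.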
Next-Use Information
• In compiler design, next-use information is obtained by a data-flow analysis
and can be used to improve the allocation of registers in a computer’s
central processing unit (CPU).
• The goal of next use analysis is to determine which variables in a program
are needed in the immediate future and should therefore be stored in a
register for faster access, rather than in main memory.
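• Example:
x = y + z;
a = x + b;
c = x + d;
Here x is defined in the first statement and has next uses in the second and
third statements, so it is a good candidate to keep in a register.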
• To perform the next-use analysis, the compiler examines each instruction
in the program and determines the next time that each variable is used. If a
variable is not used again until much later in the program, it may not be
worth keeping in a register and could be stored in the main memory
instead. On the other hand, if a variable is used multiple times in quick
succession, it may be more efficient to keep it in a register and avoid the
overhead of repeatedly loading and storing it in the main memory.
• Next use analysis can be combined with other optimization techniques,
such as register allocation and live range analysis, to further improve the
performance of a compiled program.
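Within a single basic block, next-use information is usually computed with one backward scan. The Python sketch below shows the idea under simplifying assumptions: each statement is a hypothetical (x, y, z) triple standing for x = y op z, and every variable is treated as having no use after the end of the block.

```python
def next_use(block):
    """For each statement (x, y, z), record the index of the next use of x, y, z.

    Returns a list of dicts: variable -> index of its next use after that
    statement, or None if it is not used again in the block.
    """
    table = {}                      # variable -> index of its next use so far
    info = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):   # scan backwards
        x, y, z = block[i]
        info[i] = {v: table.get(v) for v in (x, y, z)}
        table[x] = None             # x is (re)defined here, so earlier code sees no use
        table[y] = i                # y and z are used by statement i
        table[z] = i
    return info


block = [("x", "y", "z"),   # x = y + z
         ("a", "x", "b"),   # a = x + b
         ("c", "x", "d")]   # c = x + d
for i, uses in enumerate(next_use(block)):
    print(i, uses)
```

For the example, x has its next use at statement 1 after statement 0, at statement 2 after statement 1, and no further use after statement 2.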
Register Allocation and Assignment
• Local register allocation
• Registers are allocated only within a basic block, following a top-down
approach.
• Assign registers to the most heavily used variables
• Traverse the block
• Use count as a priority function
• Assign registers to higher priority variables first
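The top-down scheme can be sketched as follows, assuming the same hypothetical (x, y, z) representation of three-address statements, a priority equal to the number of times a variable is defined or used in the block, and alphabetical order to break ties. The function name is illustrative.

```python
from collections import Counter


def allocate_by_use_count(block, num_regs):
    """Give the num_regs most frequently referenced variables a register each."""
    counts = Counter(v for stmt in block for v in stmt)      # traverse the block
    ranked = sorted(counts, key=lambda v: (-counts[v], v))   # highest priority first
    return {v: f"R{i}" for i, v in enumerate(ranked[:num_regs])}


block = [("x", "y", "z"),   # x = y + z
         ("a", "x", "b"),   # a = x + b
         ("c", "x", "d")]   # c = x + d
print(allocate_by_use_count(block, num_regs=2))
```

Here x is referenced three times and every other variable once, so x receives the first register. The remaining variables stay in memory unless another register is available.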
Need for global register allocation
• Local allocation does not take into account that some instructions (e.g. those in loops) execute
more frequently. It forces us to store/load at basic block endpoints, since each block has no
knowledge of the context of the others.
• Global allocation is needed to find the live range(s) of each variable and the area(s) where the
variable is used or defined. The cost of spilling depends on the frequencies and locations of uses.
• Register allocation depends on:
• Size of live range
• Number of uses/definitions
• Frequency of execution
• Number of loads/stores needed.
Global Register Allocation
• Global register allocation can be seen as a graph coloring problem.
• Basic idea:
1. Identify the live range of each variable
2. Build a register interference graph (RIG) that represents conflicts
between live ranges (two nodes are connected if the variables they
represent are live at the same moment)
3. Try to color the nodes of the graph using no more colors than there are
registers, so that no two neighbors have the same color
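The three steps can be illustrated with a small Python sketch. It assumes the live ranges have already been summarized as a list of sets of variables that are live at the same moment, and it uses a simple greedy coloring (highest degree first) rather than any particular production algorithm. Function names are hypothetical, and variables that cannot be colored are simply reported as spilled.

```python
def interference_graph(live_sets):
    """Connect every pair of variables that are live at the same moment."""
    graph = {}
    for live in live_sets:
        for v in live:
            graph.setdefault(v, set()).update(u for u in live if u != v)
    return graph


def color(graph, num_regs):
    """Greedy coloring: returns (variable -> register number, spilled variables)."""
    assignment, spilled = {}, []
    for v in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        used = {assignment[n] for n in graph[v] if n in assignment}
        free = [r for r in range(num_regs) if r not in used]
        if free:
            assignment[v] = free[0]
        else:
            spilled.append(v)      # no color left: keep this variable in memory
    return assignment, spilled


graph = interference_graph([{"a", "b"}, {"b", "c"}, {"c", "d"}])
print(color(graph, num_regs=2))
```

In this example a and c can share one register and b and d the other, because neither pair is live at the same time.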
Run time Organization
• The run-time environment is the structure of the target
computer's registers and memory that serves to manage
memory and maintain the information needed to guide a
program's execution.
1. Fully Static
• A fully static run-time environment is suitable for languages that have no
pointers, no dynamic allocation and no support for
recursive function calls.
• Every procedure will have only one activation record which is allocated
before execution.
• Variables are accessed directly via fixed address.
2. Stack Based
• In this, activation records are allocated (push of the activation record)
whenever a function call is made.
• The necessary memory is taken from the stack portion of the program.
• When program execution returns from the function, the memory used
by the activation record is deallocated (pop of the activation record).
Thus, the stack grows and shrinks with the chain of function calls.
3. Fully Dynamic
• Functional languages use this style of call stack management.
• An activation record is deallocated only when all references to it
have disappeared, which requires activation records to be
freed dynamically at arbitrary times during execution.
• A memory manager (garbage collector) is needed.
• The data structure that handles such management is the heap, so this is
also called heap management.
Activation Records
• The information needed by a single execution of a procedure is managed
using a contiguous block of storage called an “activation record”.
• An activation record is allocated when a procedure is entered and
deallocated when that procedure exits.
• It contains temporary data, local data, machine status, an optional access
link, an optional control link, actual parameters and the returned value.
Contents of Activation Records
• Return Value: It is used by the called procedure to return a value to the calling
procedure.
• Actual Parameters: They are used by the calling procedure to supply parameters to
the called procedure.
• Control Link: It points to the activation record of the caller.
• Access Link: It is used to refer to non-local data held in other activation
records.
• Saved Machine Status: It holds information about the status of the machine
before the procedure is called.
• Local Data: It holds the data that is local to the execution of the procedure.
• Temporaries: They store the values that arise in the evaluation of an expression.
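The fields listed above can be modelled as a record type, and the stack-based discipline as a list of such records. The Python sketch below is only an illustration: the class and field names mirror the list but are hypothetical, the saved machine status is left as an empty dictionary, and a recursive factorial stands in for an arbitrary procedure.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ActivationRecord:
    procedure: str
    actual_params: dict
    control_link: Optional["ActivationRecord"] = None  # caller's record
    access_link: Optional["ActivationRecord"] = None   # for non-local data
    saved_status: dict = field(default_factory=dict)   # e.g. return address, registers
    local_data: dict = field(default_factory=dict)
    temporaries: list = field(default_factory=list)
    return_value: object = None


stack = []    # run-time stack of activation records
depths = []   # stack depth observed on each procedure entry


def fact(n):
    """Factorial with explicit push and pop of activation records."""
    caller = stack[-1] if stack else None
    record = ActivationRecord("fact", {"n": n}, control_link=caller)
    stack.append(record)                       # allocated when the procedure is entered
    depths.append(len(stack))
    result = 1 if n <= 1 else n * fact(n - 1)
    record.return_value = result               # value handed back to the caller
    stack.pop()                                # deallocated when the procedure exits
    return result
```

Calling fact(3) pushes three records, one per active call, and leaves the stack empty again once the outermost call returns.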