Introduction to Compilers
SS 2012
Jun.-Prof. Dr. Christian Plessl
Custom Computing
University of Paderborn
Version 1.1.2 2012-05-01
Outline
compiler structure, intermediate code
code generation
code optimization
retargetable compiler
Translation Process
skeletal source program
  → preprocessor → source program
  → compiler → assembler program
  → assembler → relocatable machine code
  → linker / loader (with library) → absolute machine code
Compiler Phases
source program
  analysis:
    lexical analysis
    syntactic analysis
    semantic analysis
  synthesis:
    intermediate code generation
    code optimization
    code generation
  target program
(all phases interact with the symbol table and with error handling)
Overview Analysis Phase
lexical analysis
scanning of the source program and splitting into symbols
regular expressions: recognition by finite automata
syntactic analysis
parsing symbol sequences and construction of sentences
sentences are described by a context-free grammar
A → Identifier := E
E → E + E | E * E | Identifier | Number
semantic analysis
make sure the program parts "reasonably" fit together,
e.g. implicit type conversions
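The lexical-analysis step above can be sketched in a few lines: regular expressions, one per symbol class, split the source text into symbols. This is an illustrative sketch, not from the slides; the names `TOKEN_SPEC` and `tokenize` are made up for the example.

```python
import re

# one regular expression per symbol class, tried in this order
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ASSIGN", r":="),
    ("OP",     r"[+*]"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("SKIP",   r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(source):
    """Yield (kind, text) pairs for every symbol in the source string."""
    for m in TOKEN_RE.finditer(source):
        if m.lastgroup != "SKIP":          # drop whitespace
            yield (m.lastgroup, m.group())

tokens = list(tokenize("position := initial + rate * 60"))
```

Running this on the example of the following slides yields the symbol sequence id, :=, id, +, id, *, number.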
Overview Synthesis Phase
generation of intermediate code
machine independent → simplifies retargeting
should be easy to generate
should be easily translatable into target program
optimization
goals for GP processors: fast code, fast translation
goals for specialized processors: fast code, short code (low memory
requirements), low power consumption
both intermediate and target code can be optimized
code generation
translate intermediate representation into assembler code for target
architecture
apply target specific optimizations
Example (1)
source program:
  position := initial + rate * 60
lexical analysis:
  id1 := id2 + id3 * 60
  (id1, id2, id3: identifiers; := : assignment symbol; +, * : operators; 60: number)
Example (2)
syntactic analysis builds the syntax tree:
  :=
  ├── id1
  └── +
      ├── id2
      └── *
          ├── id3
          └── 60
semantic analysis inserts an implicit type conversion:
  :=
  ├── id1
  └── +
      ├── id2
      └── *
          ├── id3
          └── IntToReal
              └── 60
Example (3)
intermediate code generation:
  tmp1 := IntToReal(60)
  tmp2 := id3 * tmp1
  tmp3 := id2 + tmp2
  id1  := tmp3
code optimization:
  tmp1 := id3 * 60.0
  id1  := id2 + tmp1
code generation:
  ld.s  $f1, id3
  li.s  $f2, 60.0
  mul.s $f2, $f2, $f1
  ld.s  $f1, id2
  add.s $f2, $f2, $f1
  st.s  $f2, id1
Syntax Tree and DAG
a := b*(c-d) + e*(c-d)
syntax tree (the subexpression c-d appears twice):
  :=
  ├── a
  └── +
      ├── *
      │   ├── b
      │   └── -
      │       ├── c
      │       └── d
      └── *
          ├── e
          └── -
              ├── c
              └── d
DAG (directed acyclic graph): both * nodes share a single - node for c-d
3 Address Code (1)
3 address instructions
  at most 3 addresses (2 operands, 1 result)
  at most 2 operators
assignment instructions:
  x := y op z
  x := op y
  x := y
  x := y[i]
  x[i] := y
  x := &y
  y := *x
  *x := y
control flow instructions:
  goto L
  if x relop y goto L
subroutines:
  param x
  x := call p,n
  return y
3 Address Code (2)
Generation of 3 address code from the DAG of a := b*(c-d) + e*(c-d)
(valid but not optimal):
  t1 := c - d
  t2 := e * t1
  t3 := b * t1
  t4 := t2 + t3
  a  := t4
3 Address Code (3)
advantages of 3 address code
dissection of long arithmetic expressions
temporary names facilitate reordering of instructions
forms a valid schedule
definition: A 3 address instruction
x := y op z
defines x and
uses y and z
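The define/use relation can be computed mechanically from the instruction text. A minimal sketch (not from the slides, restricted to the "x := y op z" form; the helper name `def_use` is made up):

```python
def def_use(instr):
    """Return (defined, used) variable sets for 'x := y op z' style code."""
    lhs, rhs = (s.strip() for s in instr.split(":="))
    # every identifier on the right-hand side is used ...
    used = {t for t in rhs.replace("+", " ").replace("-", " ")
                         .replace("*", " ").replace("/", " ").split()
            if not t.isdigit()}
    # ... and the left-hand side is defined
    return {lhs}, used

assert def_use("t4 := t2 + t3") == ({"t4"}, {"t2", "t3"})
```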
Basic Blocks (1)
definition: A basic block is a sequence of instructions where the
control flow enters at the beginning and exits at the end, without
stopping in-between or branching (except at the end).
example:
  t1 := c - d
  t2 := e * t1
  t3 := b * t1
  t4 := t2 + t3
  if t4 < 10 goto L
Basic Blocks (2)
determining the basic blocks from a sequence of 3 address
instructions:
1. determine the block beginnings:
the first instruction
targets of conditional and unconditional jumps
instructions that immediately follow conditional or unconditional jumps
2. determine the basic blocks:
there is a basic block for each block beginning
the basic block consists of the block beginning and runs until the
next block beginning (exclusive) or until the program ends
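The two steps above can be sketched directly in code. This is an illustrative sketch assuming jump targets are written as "goto (n)" with 1-based instruction numbers, as in the example that follows on the next slides; the name `basic_blocks` is made up.

```python
import re

def basic_blocks(code):
    """Split a list of 3 address instructions into basic blocks."""
    # step 1: determine the block beginnings (leaders)
    leaders = {1}                                   # the first instruction
    for i, instr in enumerate(code, start=1):
        m = re.search(r"goto \((\d+)\)", instr)
        if m:
            leaders.add(int(m.group(1)))            # target of the jump
            if i + 1 <= len(code):
                leaders.add(i + 1)                  # instruction after the jump
    # step 2: each block runs until the next leader (exclusive) or the end
    starts = sorted(leaders)
    return [code[s - 1:e - 1] for s, e in zip(starts, starts[1:] + [len(code) + 1])]

code = ["prod := 0", "i := 0", "t1 := 4 * i", "t2 := a[t1]",
        "t3 := 4 * i", "t4 := b[t3]", "t5 := t2 * t4", "t6 := prod + t5",
        "prod := t6", "t7 := i + 1", "i := t7", "if i < 20 goto (3)"]
blocks = basic_blocks(code)
```

On the do-while example of the later slides this yields two blocks, B1 = (1)-(2) and B2 = (3)-(12).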
Control Flow Graphs
"degenerated" control flow graph (CFG)
  shows possible control flows in a program
  degenerated means that the nodes of the CFG are basic blocks
  (instead of instructions)
example:
  B1:    i := 0
         t2 := 0
  B2: L: t2 := t2 + i
         i := i + 1
         if i < 10 goto L
  B3:    x := t2
edges: B1 → B2; B2 → B2 (i < 10); B2 → B3 (i >= 10)
DAG of a Basic Block
Definition: A DAG of a basic block is a directed acyclic
graph with following node markings:
Leaves are marked with a variable / constant name. Variables with
initial values are assigned the index 0.
Inner nodes are marked with an operator symbol. From the operator
we can conclude whether the value or the address of the variable is
being used.
Optionally, a node can be marked with a sequence of variable names.
Then, all variables are assigned the computed value.
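The DAG construction above can be sketched by hashing (operator, operand-node) keys; identical right-hand sides such as "4 * i" then map to the same node. A simplified sketch (binary operations and copies only; the name `build_dag` is made up):

```python
def build_dag(block):
    """Map DAG node ids to the variable names attached to them."""
    nodes = {}    # (op, operand node ids) -> node id
    var = {}      # variable name -> id of the node holding its current value
    labels = {}   # node id -> sequence of attached variable names
    def leaf(x):  # leaf: initial value of a variable, or a constant
        return nodes.setdefault(("leaf", x), len(nodes))
    for instr in block:
        lhs, rhs = (s.strip() for s in instr.split(":="))
        parts = rhs.split()
        if len(parts) == 3:                          # binary: y op z
            y, op, z = parts
            key = (op, var.get(y, leaf(y)), var.get(z, leaf(z)))
            node = nodes.setdefault(key, len(nodes))  # reuse an existing node
        else:                                        # plain copy / leaf
            node = var.get(rhs, leaf(rhs))
        var[lhs] = node
        labels.setdefault(node, []).append(lhs)
    return labels

labels = build_dag(["t1 := 4 * i", "t3 := 4 * i", "t7 := i + 1"])
```

As in the example on the following slides, t1 and t3 end up attached to one shared node for 4 * i.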
Example (1)
C program:
  int i, prod, a[20], b[20];
  ...
  prod = 0;
  i = 0;
  do {
    prod = prod + a[i]*b[i];
    i++;
  } while (i < 20);
3 address code:
  (1)  prod := 0
  (2)  i := 0
  (3)  t1 := 4 * i
  (4)  t2 := a[t1]
  (5)  t3 := 4 * i
  (6)  t4 := b[t3]
  (7)  t5 := t2 * t4
  (8)  t6 := prod + t5
  (9)  prod := t6
  (10) t7 := i + 1
  (11) i := t7
  (12) if i < 20 goto (3)
Example (2)
basic blocks:
  B1: (1)  prod := 0
      (2)  i := 0
  B2: (3)  t1 := 4 * i
      (4)  t2 := a[t1]
      (5)  t3 := 4 * i
      (6)  t4 := b[t3]
      (7)  t5 := t2 * t4
      (8)  t6 := prod + t5
      (9)  prod := t6
      (10) t7 := i + 1
      (11) i := t7
      (12) if i < 20 goto (3)
control flow graph: B1 → B2; B2 → B2
Example (3)
basic block B2:
  t1 := 4 * i
  t2 := a[t1]
  t3 := 4 * i
  t4 := b[t3]
  t5 := t2 * t4
  t6 := prod + t5
  prod := t6
  t7 := i + 1
  i := t7
  if i < 20 goto (3)
DAG for B2:
  t1, t3   : 4 * i0          (one shared node for "4 * i")
  t2       : a[t1]           ([] node)
  t4       : b[t1]           ([] node)
  t5       : t2 * t4
  t6, prod : prod0 + t5
  t7, i    : i0 + 1
  branch   : i < 20
Compiler and Code Generation
compiler structure, intermediate code
code generation
code optimization
code generation for specialized processors
retargetable compiler
Code Generation
requirements
correct code
efficient code
fast code generation
code generation = software synthesis
allocation:
mostly the components are fixed (registers, ALUs)
binding:
register binding (register allocation, register assignment)
instruction selection
scheduling:
instruction sequencing
Register Binding
goal: efficient use of registers
minimize number of LOAD/STORE instructions (RISC)
instructions with register operands are generally shorter and faster
than instructions with memory operands (CISC)
register allocation, register assignment
allocation: determine for each point in time the set of variables that
should be held in registers
assignment: assign these variables to registers
optimal register binding
NP-complete problem
additionally: restrictions for register use by the processor
architecture, compiler, and operating system
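Since optimal register binding is NP-complete, practical compilers use heuristics. A common formulation (not on the slides) models it as coloring an interference graph: variables that are live at the same time must get different registers. A minimal greedy sketch:

```python
def greedy_assign(interference, k):
    """interference: var -> set of vars live at the same time; k registers."""
    assignment = {}
    for v in sorted(interference):                  # fixed order for the sketch
        taken = {assignment[u] for u in interference[v] if u in assignment}
        free = [r for r in range(k) if r not in taken]
        if not free:
            return None                             # would require a spill
        assignment[v] = free[0]                     # pick the first free register
    return assignment

# t1 and t3 interfere with t2 but not with each other: 2 registers suffice
graph = {"t1": {"t2"}, "t2": {"t1", "t3"}, "t3": {"t2"}}
regs = greedy_assign(graph, 2)
```

Real register allocators (e.g. Chaitin-style coloring) additionally handle spilling and the architectural register restrictions mentioned above.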
Instruction Selection
naive approach: use a code pattern for each 3 address instruction
  x := y + z:
    lw  $r1, y
    lw  $r2, z
    add $r1, $r1, $r2
    sw  $r1, x
  u := x - w:
    lw  $r1, x
    lw  $r2, w
    sub $r1, $r1, $r2
    sw  $r1, u
problems
  often inefficient code is generated → requires code optimization
  there might be several matching target instructions
  some instructions work only with particular registers
  exploitation of special processor features is difficult,
  e.g. auto-increment / auto-decrement addressing
Instruction Scheduling (1)
Optimal instruction sequence
  minimal number of instructions for the given number of registers
  NP-complete problem
DAG: t2 = c + d; t3 = e - t2; t1 = a + b; t4 = t1 - t3
two valid schedules for this DAG:
  t2 := c + d        t1 := a + b
  t3 := e - t2       t2 := c + d
  t1 := a + b        t3 := e - t2
  t4 := t1 - t3      t4 := t1 - t3
Instruction Scheduling (2)
machine model (here): CPU with memory operands
  register/register instructions
    ADD R0, R1 (R1 = R1 + R0)
  register/memory instructions
    MOV e, R0 (R0 = *e): load contents of address e into register R0
    ADD a, R0 (R0 = R0 + *a): add contents of address a to register R0
schedule t2, t3, t1, t4 with 2 registers (R0, R1): 8 instructions
  t2 := c + d     MOV c, R0      R0:c
                  ADD d, R0      R0:t2
  t3 := e - t2    MOV e, R1      R0:t2  R1:e
                  SUB R0, R1     R0:t2  R1:t3
  t1 := a + b     MOV a, R0      R0:a   R1:t3
                  ADD b, R0      R0:t1  R1:t3
  t4 := t1 - t3   SUB R1, R0     R0:t4  R1:t3
                  MOV R0, t4
Instruction Scheduling (3)
machine model (here): CPU with memory operands
  register/register instructions
    ADD R0, R1 (R1 = R1 + R0)
  register/memory instructions
    MOV e, R0 (R0 = *e): load contents of address e into register R0
    ADD a, R0 (R0 = R0 + *a): add contents of address a to register R0
schedule t1, t2, t3, t4 with 2 registers (R0, R1): 10 instructions
  t1 := a + b     MOV a, R0      R0:a
                  ADD b, R0      R0:t1
  t2 := c + d     MOV c, R1      R0:t1  R1:c
                  ADD d, R1      R0:t1  R1:t2
                  MOV R0, t1     R0:t1  R1:t2
  t3 := e - t2    MOV e, R0      R0:e   R1:t2
                  SUB R1, R0     R0:t3  R1:t2
  t4 := t1 - t3   MOV t1, R1     R0:t3  R1:t1
                  SUB R0, R1     R0:t3  R1:t4
                  MOV R1, t4
t1 and t2 are both still needed when e must be loaded, so a register has
to be temporarily saved to memory; this is denoted as register spill.
Compiler and Code Generation
compiler structure, intermediate code
code generation
code optimization
retargetable compiler
Code Optimization
transformations on the intermediate code and on the
target code
peephole optimization
small window (peephole) is moved over the code
several passes, because an optimization can generate new optimization
opportunities
local optimization
transformations inside basic blocks
global optimization
transformations across several basic blocks
Peephole Optimizations (1)
deletion of unnecessary instructions
  (1) lw $r1, a
  (2) sw $r1, a
  (2) may be deleted if (1) and (2) are in the same basic block
algebraic simplifications
  x := x * 1;                →  delete
  x := x + 0;                →  delete
  x := y + 0*(z**4/(y-1));   →  x := y;
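A peephole pass for the redundant store case above can be sketched with a window of two instructions; a real pass makes several such passes over the code until nothing changes. Illustrative sketch, names made up:

```python
def peephole(code):
    """One pass of a 2-instruction peephole over assembler-like code."""
    out, i = [], 0
    while i < len(code):
        a = code[i]
        b = code[i + 1] if i + 1 < len(code) else None
        # "lw $r, x" directly followed by "sw $r, x": the store is redundant
        if b and a.startswith("lw") and b == a.replace("lw", "sw", 1):
            out.append(a)           # keep the load, drop the store
            i += 2
        else:
            out.append(a)
            i += 1
    return out
```

Note that this transformation is only safe inside one basic block, as stated above.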
Peephole Optimizations (2)
strength reductions
  x := y*8;    →  x := y << 3;
  x := y**2;   →  x := y * y;
Local Optimizations
common sub-expression elimination
  (1) a := b + c          (1) a := b + c
  (2) b := a - d     →    (2) b := a - d
  (3) c := b + c          (3) c := b + c
  (4) d := a - d          (4) d := b
  ((3) cannot reuse (1) because b is redefined in (2); (4) reuses (2))
variable renaming
  t := b + c   →   u := b + c
  normal form of a basic block: each variable is defined only once
instruction interchange
  t1 := b + c        t2 := x + y
  t2 := x + y   →    t1 := b + c
Global Optimizations (1)
dead code elimination
  an instruction that defines x can be deleted if x is not used afterward
copy propagation
  (1) x := t1              (1) x := t1
  (2) a[t2] := t3     →    (2) a[t2] := t3
  (3) a[t4] := x           (3) a[t4] := t1
  (4) goto L               (4) goto L
if x is not used after (1), (1) becomes dead code
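Copy propagation within a straight-line piece of code can be sketched as follows. This is a deliberately simplified illustration (name made up): it only rewrites a use that stands alone on the right-hand side and ignores redefinitions of x or t1.

```python
import re

def propagate_copies(block):
    """One simplified pass of copy propagation over 3 address code."""
    copies, out = {}, []
    for instr in block:
        # replace a known copy used alone on the right-hand side
        for x, y in copies.items():
            instr = re.sub(rf"(?<=:= ){re.escape(x)}$", y, instr)
        if ":=" in instr:
            lhs, rhs = (s.strip() for s in instr.split(":="))
            if lhs.isidentifier() and rhs.isidentifier():
                copies[lhs] = rhs          # remember plain copies like x := t1
        out.append(instr)
    return out

result = propagate_copies(["x := t1", "a[t2] := t3", "a[t4] := x", "goto L"])
```

On the example above this rewrites (3) to a[t4] := t1; a subsequent dead code elimination pass could then remove (1).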
Global Optimizations (2)
control flow optimizations
  (1)     goto L1          (1)     goto L2
      .              →         .
  (2) L1: goto L2          (2) L1: goto L2
if L1 is not reachable: delete (2) (dead code elimination)
Global Optimizations (3)
code motion
  while (i <= limit*4+2)       t = limit*4+2;
  {                       →    while (i <= t)
    ....                       {
  }                              ....
                               }
  valid if limit is not modified in the loop body
induction variables and strength reduction
     j := n                   j := n
  L: j := j - 1               t4 := 4 * j
     t4 := 4 * j        →  L: j := j - 1
     t5 := a[t4]              t4 := t4 - 4
     if t5 > v goto L         t5 := a[t4]
                              if t5 > v goto L
Compiler and Code Generation
compiler structure, intermediate code
code generation
code optimization
retargetable compiler
Retargetable Compiler
portable compiler
developer retargetable
code generation by tree pattern matching
compiler-compiler
user retargetable (semi-automatic)
compiler is generated from a description of the target architecture
(processor model)
machine independent compiler
automatically retargetable
compiler generates code for several processors / processor variants
for parametrizable architectures
Tree Pattern Matching
rules for transforming a syntax tree (DAG) are given as tree patterns:
  pattern → replacement   { action }
example:
  +              →   reg i   { ADD Rj, Ri }
  ├── reg i
  └── reg j
stepwise replacement by tree pattern matching until the tree contains
only one node
Target Instructions (1)
  (1) const c                   →  reg i   { MOV #c, Ri }
  (2) mem a                     →  reg i   { MOV a, Ri }
  (3) :=(mem a, reg i)          →  mem     { MOV Ri, a }
  (4) :=(ind(reg i), reg j)     →  mem     { MOV Rj, *Ri }
  (5) ind(+(const c, reg j))    →  reg i   { MOV c(Rj), Ri }
Target Instructions (2)
  (6) +(reg i, ind(+(const c, reg j)))  →  reg i   { ADD c(Rj), Ri }
  (7) +(reg i, reg j)                   →  reg i   { ADD Rj, Ri }
  (8) +(reg i, const 1)                 →  reg i   { INC Ri }
Tree Pattern Matching - Example (0)
a[i] := b + 1
  :=
  ├── ind
  │   └── +
  │       ├── +
  │       │   ├── const _a
  │       │   └── reg SP
  │       └── ind
  │           └── +
  │               ├── const _i
  │               └── reg SP
  └── +
      ├── mem b
      └── const 1
Tree Pattern Matching - Example (0)
a[i] := b + 1
  b is a global variable
    stored on the heap
    compiler knows its address (absolute addressing)
  a and i are local variables
    stored on the stack
    compiler knows their offsets from the SP (relative addressing)
    the offsets are stored in the constants _a and _i
  how to compute the address of a[i]?
    get the value of i (read memory at address SP+_i)
    a[i] is located at address SP+_a+i
memory layout (stack pointer SP = 0xF00, _i = 0x8, _a = 0xC):
  heap:   b at its absolute address
  stack:  0xF08 (SP+_i): i
          0xF0C (SP+_a): a[0]
          0xF10: a[1]
          0xF14: a[2]
          0xF18: a[3]
Tree Pattern Matching - Example (1)
a[i] := b + 1
  :=
  ├── ind
  │   └── +
  │       ├── +
  │       │   ├── const _a
  │       │   └── reg SP
  │       └── ind
  │           └── +
  │               ├── const _i
  │               └── reg SP
  └── +
      ├── mem b
      └── const 1
rule (1) matches const _a:   { MOV #_a, R0 }
Tree Pattern Matching - Example (2)
a[i] := b + 1
tree after step 1 (const _a replaced by reg 0):
  :=(ind(+(+(reg 0, reg SP), ind(+(const _i, reg SP)))), +(mem b, const 1))
rule (7) matches +(reg 0, reg SP):   { ADD SP, R0 }
Tree Pattern Matching - Example (3)
a[i] := b + 1
tree after step 2:
  :=(ind(+(reg 0, ind(+(const _i, reg SP)))), +(mem b, const 1))
rule (6) matches +(reg 0, ind(+(const _i, reg SP))):   { ADD _i(SP), R0 }
Tree Pattern Matching - Example (4)
a[i] := b + 1
tree after step 3:
  :=(ind(reg 0), +(mem b, const 1))
rule (2) matches mem b:   { MOV b, R1 }
Tree Pattern Matching - Example (5)
a[i] := b + 1
tree after step 4:
  :=(ind(reg 0), +(reg 1, const 1))
rule (8) matches +(reg 1, const 1):   { INC R1 }
Tree Pattern Matching - Example (6)
a[i] := b + 1
tree after step 5:
  :=(ind(reg 0), reg 1)
rule (4) matches the remaining tree:   { MOV R1, *R0 }
complete generated code:
  MOV #_a, R0
  ADD SP, R0
  ADD _i(SP), R0
  MOV b, R1
  INC R1
  MOV R1, *R0
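The stepwise replacement of examples (1)-(6) can be sketched as a recursive code generator over tuple-encoded trees. This sketch (names made up) implements only the rules needed for the example, (1), (2), (4), (6), (7) and (8), trying the larger patterns first:

```python
def munch(t, code, nreg=[0]):
    """Reduce tree t to a register name, appending target code to `code`."""
    def fresh():
        r = f"R{nreg[0]}"; nreg[0] += 1; return r
    kind = t[0]
    if kind == "reg":                       # already a register (e.g. SP)
        return t[1]
    if kind == "const":                     # rule (1): MOV #c, Ri
        r = fresh(); code.append(f"MOV #{t[1]}, {r}"); return r
    if kind == "mem":                       # rule (2): MOV a, Ri
        r = fresh(); code.append(f"MOV {t[1]}, {r}"); return r
    if kind == ":=":                        # rule (4): store Rj to *Ri
        ri = munch(t[1][1], code)
        rj = munch(t[2], code)
        code.append(f"MOV {rj}, *{ri}")
        return None
    if kind == "+":
        left, right = t[1], t[2]
        if right == ("const", 1):           # rule (8): INC Ri
            ri = munch(left, code); code.append(f"INC {ri}"); return ri
        if right[0] == "ind" and right[1][0] == "+" and right[1][1][0] == "const":
            ri = munch(left, code)          # rule (6): ADD c(Rj), Ri
            rj = munch(right[1][2], code)
            code.append(f"ADD {right[1][1][1]}({rj}), {ri}")
            return ri
        ri = munch(left, code)              # rule (7): ADD Rj, Ri
        rj = munch(right, code)
        code.append(f"ADD {rj}, {ri}")
        return ri
    raise ValueError(f"no rule matches {t!r}")

# address of a[i]: (_a + SP) + i, with i loaded from memory at SP + _i
addr = ("+", ("+", ("const", "_a"), ("reg", "SP")),
             ("ind", ("+", ("const", "_i"), ("reg", "SP"))))
tree = (":=", ("ind", addr), ("+", ("mem", "b"), ("const", 1)))
code = []
munch(tree, code)
```

On the tree of example (0), this reproduces exactly the six instructions derived on the slides.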
Compiler Compiler
front-end:
  source program (DFL) → parsing, flow graph generation
  → pattern matching → optimization → executable code
back-end:
  processor model (HDL) → instruction set extraction
  → pattern matcher generator → pattern matcher (used by the front-end)
RECORD Compiler Compiler:
R. Leupers, Retargetable Generator of Code Selectors from HDL Processor
Models, European Design and Test Conference, 1997.
Instruction Set Extraction
instruction word:  xx 011 zz   (xx, zz: control bits)
operation:  reg[zz] <- reg[xx] + acc
extracted pattern:
  +              →   reg zz
  ├── reg xx
  └── acc
Changes
v1.1.2 (2012-05-01)
tree pattern matching: show DAG to be matched before explanation
v1.1.1 (2012-04-27)
fixed semantics of the ADD R0,R1 operation on slide 26 and added a new
slide 27 illustrating the differences between the generated code
moved "control flow optimization" to slide 27 because it is not a local
but a global optimization
v1.1.0 (2012-04-24)
updated for SS2012, minor corrections
v1.0.3 (2010-05-05)
fix minor typos in explanation of a[i]= b + 1 memory layout description
v1.0.2 (2010-05-02)
add discussion of how a[i]= b + 1 is stored in memory
v1.0.1 (2010-04-27)
slide 11: clarified that call instruction in 3 addr code returns a value