Compiler Construction
Group Name
Ahmed Raza 13031519-039
Rehman Nazir 13031519-121
Single Pass compiler with example
What is a Pass?
What is a Single Pass compiler?
Difference between Single and Multi Pass Compiler?
Example
Issue with Single Pass Compiler
The Major Phases of a Compiler
Source Program
Syntax Analysis
Error Reports
Abstract Syntax Tree
Semantic Analysis
Decorated Abstract Syntax Tree
Code Generation
Object Code
Error Reports
Compiler Passes
A pass is a complete traversal of the source program, or a complete
traversal of some internal representation of the source program (such
as an AST).
A pass can correspond to a phase but it does not have to!
Sometimes a single pass corresponds to several phases that are
interleaved in time.
What and how many passes a compiler does over the source program
is an important design decision.
Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and
generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver
calls
Syntactic Analyzer
calls
Contextual Analyzer
5
calls
Code Generator
Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding
phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver
calls
calls
calls
Syntactic Analyzer
input
Source Text
output
AST
Contextual Analyzer
input
output
Decorated AST
Code Generator
input
output
Object Code
Example: Single Pass Compilation of ...
Source Program
Let var n: integer;
var c: char
in begin
c := &;
n := n+1
end
Ident
n
c
Type
int
char
Address
0[SB]
1[SB]
Assembly Code
PUSH 2
LOADL 38
STORE 1[SB]
LOAD 0[SB]
LOADL 1
CALL add
STORE 0[SB]
POP 2
HALT
Compiler Design Issues
Single Pass
Speed
better
worse
Memory
better for
large programs
(potentially) better for
small programs
Modularity
worse
better
worse
better
impossible
possible
Flexibility
Global optimization
Source Language
Multi Pass
single pass compilers are not possible for many
programming languages
Language Issues
Example Pascal:
Pascal was explicitly designed to be easy to implement with a
single pass compiler:
Every identifier must be declared before its first use.
?
var n:integer;
procedure inc;
begin
n:=n+1
end
procedure inc;
begin
n:=n+1
end;
Undeclared Variable!
var n:integer;
Language Issues
Example Pascal:
Every identifier must be declared before it is used.
How to handle mutual recursion then?
procedure ping(x:integer)
begin
... pong(x-1); ...
end;
procedure pong(x:integer)
begin
... ping(x1); ...
end;
10
Language Issues
Example Pascal:
Every identifier must be declared before it is used.
How to handle mutual recursion then?
forward procedure pong(x:integer)
procedure ping(x:integer)
begin
... pong(x-1); ...
OK!
end;
procedure pong(x:integer)
begin
... ping(x1); ...
end;
11
Virtual Machine for Compiler
What is Compiler?
Phases of Compiler
What is a Virtual Machine?
Java Virtual Machine
Structure of JVM
Loaders
Linkers
Assembler
What is a Compiler?
A compiler is a piece of software that translates a source program
from source language to equivalent program in target language. An
important feature of the compiler is to identify error in source program
during compilation/translation process.
Phases of Compiler
Lexical Analyzer
The lexical analyzer reads the stream of characters making up the
source program and groups the characters into meaningful sequences
called lexem
For each lexeme, the lexical analyzer produces as output a token of
the form (token-name, attribute-value)
It is also called Scanner.
Syntax Analyzer
analyzes the source code (token stream) against the production rules
to detect any errors in the code.
The output of this phase is a parse tree
Goal: Recover the structure described by that series of tokens.
Goal: Report errors if those tokens do not properly encode a structure
Semantic Analyzer
The semantic analyzer uses the syntax tree and the information in the
symbol table to check the source program for semantic consistency.
Functions
Scope checking: verify that all applied occurrences of identifiers are
declared
Type checking: verify that all operations in the program are used
according to their type rules
Outputs semantically verified parse tree
Intermediate Code Generations
This a phase where compiler produces and low level intermediate
representation of the which can have a variety of forms
Postfix
Syntax Tree
Three Address Code
Three Address Code
Three address code is the sequence of statements of the general form
as given
a := b op c
Where a, b and c are identifiers names and op represents operator. And
the symbol := stands for assignment operator
This representation is called three address code because it has three
addresses, two for operands and one for the result
Quadruples
three address code with four fields
Op (operator)
Arg1(argument1)
Arg2(argument2)
Result
t1 := i + j
Quadruple representation of
i := t1 + k
Op
ARG1
ARG2
Result
(0)
t1
(1)
t1
Triples
three address code with three fields
Op (operator)
Arg1(argument1)
Arg2(argument2)
Op
ARG1
ARG2
(0)
(1)
(0)
Code Optimization
It is a program transformation technique, which tries to improve the
code by making it consume less resources (i.e. CPU, Memory) and
deliver high speed
Optimization depends on various factors like
Memory
Algorithm
Execution Time
Programming Language and
Others
Code Generation
The code generator takes as input an intermediate representation of
the source program and maps it into the target language.
The target program may be in
Assembly Language
Relocatable machine code or
Absolute machine code
What is a Virtual Machine?
A virtual machine (VM) is a software implementation of a machine
that executes programs like a physical machine. It shares physical
hardware resources with the other users but isolates the OS or
application to avoid changing the end-user experience.
Why use Virtual Machines?
compilers for languages like Java and C#
To make language platform independent
Security ( Sand Boxing)
strong syntactic and structural constraints
Garbage Collection
Robustness
other
Architecture of JVM
Class Loader Subsystem
Load is the part responsible for loading bytecode into the memory. Class loader loads files from different sources
using different loader such as
Bootstrap Class Loader responsible for loading java internal classes from rt.jar which is distributed with JVM.
Extension class loader responsible for loading additional application jars that reside in jre/lib/ext
Application class loader loads classes from valued specified in your CLASSPATH environment variables and
from cp parameterized folder.
Link is the phase where much of the work is done. It consists of three
parts
Verify This is the part where the bytecode is verified according to the JVM
class specifications.
Prepare This is the part where the memory is allocated for the static variables
inside the class file. The memory locations are than initialized with the default
values.
Resolve In this part all the symbolic references to the current classes are
resolved with actual reference. For example one class has reference to other
class.
Initialization This is the phase where the actual values of the static
variable define in source code are set unlike prepare where the
default value are set.
Runtime Data Areas
Method Areas The place where metadata corresponding to
class is stored.
Heap Areas The place where object data is stored
PC Registers They are called program counter registers i.e.
point to the next instruction to be executed per thread.
Stack Areas Contain stack frame corresponding to the
current method execution per thread
Native Method Stacks Contains stack for the native method
execution per thread. The stack contains memory portions
for different parts of functions like parameters or local
variables etc
Execution Engine
Assembler
It is a program which converts assembly language into machine code.
Assembler performs the translation in similar way as compiler. But
assembler is used to translate low-level programming language
whereas compiler is used to translate high-level programming
language.
An assembler performs the following functions
Convert mnemonic operation codes to their machine language codes
Convert symbolic (e.g., jump labels, variable names) operands to their
machine addresses
Use proper addressing modes and formats to build efficient machine
instructions
Translate data constants into internal machine representations
Output the object program and provide other information (e.g., for linker and
loader)
Two Pass Assembler
Pass 1
Assign addresses to all statements in the program
Save the values (addresses) assigned to all labels (including label and
variable names) for use in Pass 2 (deal with forward references)
Perform some processing of assembler directives (e.g., BYTE, RESW,
these can affect address assignment)
Pass 2
Assemble instructions (generate opcode and look up addresses)
Generate data values defined by BYTE, WORD
Perform processing of assembler directives not done in Pass 1
Write the object program and the assembly listing
Linker
A programming tool which combines one or more partial Object Files
and libraries into a (more) complete executable object file
Linker (cont.)
Three tasks
Searches the program to find library routines used by program, e.g. printf(),
math routines,
Determines the memory locations that code from each module will occupy
and relocates its instructions by adjusting absolute references
Resolves references among files
Loader
Part of the OS that brings an executable file residing on disk into
memory and starts it running
Steps
Read executable files header to determine the size of text and data segments
Create a new address space for the program
Copies instructions and data into address space
Copies arguments passed to the program on the stack
Initializes the machine registers including the stack ptr
Jumps to a startup routine that copies the programs arguments from the
stack to registers and calls the programs main routine
Loader (cont.)
References
http://www.slideshare.net/tousifirshad/virtual-machine-45872339
http://docs.oracle.com/javase/specs/jvms/se8/html/jvms-1.html#jvms-1.2
https://www.tutorialspoint.com/compiler_design/compiler_design_intermediate_code_generati
ons.htm
https://www.quora.com/Why-dont-C-and-Java-compile-to-machine-code
http://www.artima.com/insidejvm/ed2/jvmP.html
https://www.youtube.com/watch?v=ZBJ0u9MaKtM
https://en.wikipedia.org/wiki/Source_code_editor
http://www.slideshare.net/BinYang7/linker-and-loader-upload?qid=a64e8e07-b14a-445b-95078b4a5dbccf30&v=&b=&from_search=6
http://www.slideshare.net/akshaykhatri125/linker-and-loader-18359420
Compilers : principles, techniques, and tools 1 Alfred V. Aho ... [et al.]. -- 2nd ed.
M._Joseph_-_Elements_of_Compiler_Design