23CS2204
Compiler Design
Dr. Sadu Chiranjeevi
Assistant Professor
Department of Computer Science and Engineering
[email protected]
Acknowledgements
• Most of the text in these slides is based on
the classic text Compilers: Principles,
Techniques, and Tools by Aho, Lam, Sethi,
and Ullman
Motivation
• Language processing is an important
component of programming
• A large number of systems software
and application programs require
structured input
– Operating Systems (command line processing)
– Databases (Query language processing)
– Typesetting systems such as LaTeX
• Software quality assurance and
software testing
Motivation
• Wherever input has a structure,
one can think of language
processing
• Why study compilers?
– Compilers use the whole spectrum of
language processing technology
Expectations?
• What will we learn in the course?
Objectives
• Be able to build a compiler for a (simplified)
(programming) language
• Know how to use compiler construction tools,
such as generators of scanners and parsers
• Be able to define LL(1), LR(1), and LALR(1)
grammars
• Be familiar with compiler analysis and
optimization techniques
• … learn how to work on a larger software project!
Bit of History
• How are programming languages implemented? Two
major strategies:
– Interpreters (old and much less studied)
– Compilers (very well understood with
mathematical foundations)
• Some environments provide both an interpreter and a
compiler. Lisp, Scheme, etc. provide
– Interpreter for development
– Compiler for deployment
• Java
– Java compiler: Java to interpretable bytecode
– Java JIT: bytecode to executable image
Some early machines and
implementations
• IBM developed the 704 in 1954. All
programming was done in assembly
language. The cost of software
development far exceeded the cost of
hardware, and productivity was low.
• Speedcoding interpreter: programs ran
about 10 times slower than hand-written
assembly code
• John Backus (in 1954): proposed a
program that translated high-level
expressions into native machine code.
Skepticism all around; most people thought
it was impossible
• Fortran I project (1954-1957): the
first compiler was released
Fortran I
• The first compiler had a huge impact on programming
languages and computer science. The whole new field of
compiler design was started
• More than half of all programmers were using Fortran by 1958
• Development time was cut in half
• Led to an enormous amount of theoretical work (lexical
analysis, parsing, optimization, structured programming,
code generation, error recovery, etc.)
• Modern compilers preserve the basic structure of the
Fortran I compiler
The big picture
• The compiler is part of the program
development environment
• The other typical components of this
environment are the editor, assembler, linker,
loader, debugger, profiler, etc.
• The compiler (and all other tools) must
support each other for easy program
development
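[Figure: the program development cycle]
• Programmer → Editor → Source Program
• Source Program → Compiler → Assembly code
• Assembly code → Assembler → Machine Code
• Machine Code → Linker → Resolved Machine Code
• Resolved Machine Code → Loader → Executable Image
• Executable Image → Execution on the target machine, which normally ends up with an error
• Execute under control of debugger → Debugger → Debugging results
• Programmer does manual correction of the code, and the cycle repeats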
Preprocessors, Compilers,
Assemblers, and Linkers
• Skeletal Source Program → Preprocessor → Source Program
• Source Program → Compiler → Target Assembly Program
• Target Assembly Program → Assembler → Relocatable Object Code
• Relocatable Object Code (plus Libraries and Relocatable Object Files) → Linker/Loader → Absolute Machine Code
Compilers and Interpreters
• “Compilation”
– Translation of a program written in a source
language into a semantically equivalent
program written in a target language
[Figure: Source Program → Compiler → Target Program, with the compiler also producing error messages; the Target Program then maps Input to Output]
Compilers and Interpreters
(cont’d)
• “Interpretation”
– Performing the operations implied by the
source program
[Figure: Source Program and Input → Interpreter → Output, with the interpreter also producing error messages]
The Analysis-Synthesis Model of
Compilation
• There are two parts to compilation:
– Analysis determines the operations implied by
the source program which are recorded in a tree
structure
– Synthesis takes the tree structure and translates
the operations therein into the target program
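The two parts can be sketched in a few lines of Python. This is only an illustration, not how a production compiler is organized: the analysis half borrows Python's own `ast` module to build the tree, and the synthesis half targets a hypothetical stack machine whose instruction names (`PUSH`, `LOAD`, `ADD`, ...) are assumptions made for this example.

```python
import ast

def analyze(source: str) -> ast.expr:
    """Analysis: turn source text into a tree recording the implied operations."""
    return ast.parse(source, mode="eval").body

# Hypothetical stack-machine mnemonics (assumed for illustration).
OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL", ast.Div: "DIV"}

def synthesize(node: ast.expr) -> list:
    """Synthesis: walk the tree and emit target instructions in postfix order."""
    if isinstance(node, ast.Constant):
        return [f"PUSH {node.value}"]
    if isinstance(node, ast.Name):
        return [f"LOAD {node.id}"]
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return synthesize(node.left) + synthesize(node.right) + [OPS[type(node.op)]]
    raise ValueError(f"unsupported construct: {type(node).__name__}")

tree = analyze("a + b * 2")
code = synthesize(tree)
print("\n".join(code))
```

The tree is the only thing the two halves share, which is why the same analysis front end can in principle feed different synthesis back ends.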
Other Tools that Use the
Analysis-Synthesis Model
• Editors (syntax highlighting)
• Pretty printers (e.g. Doxygen)
• Static checkers (e.g. Lint and Splint)
• Interpreters
• Text formatters (e.g. TeX and LaTeX)
• Silicon compilers (e.g. VHDL)
• Query interpreters/compilers (Databases)
Phases of Compiler
Lexical Analysis
• It is the first phase of compilation
process.
• It takes source code as input.
• Reads the source program one
character at a time and converts it into
meaningful lexemes.
• Lexical analyzer represents these
lexemes in the form of tokens.
• The lexical analyzer is also referred
to as a scanner.
Lexical Analysis: Functions
• Identification of illegal tokens.
• Identification of lexical units in source
code.
• Classification of lexical units (e.g., constants,
keywords) into different tables. Comments are
ignored.
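The functions above can be sketched as a small regular-expression scanner. It is a minimal sketch under stated assumptions: the keyword set, the token class names, and the `//` comment syntax are all hypothetical choices for a C-like toy language, not part of any particular compiler.

```python
import re

KEYWORDS = {"if", "else", "while", "int", "return"}  # assumed keyword set

# Order matters: comments must be tried before the "/" operator,
# and ILLEGAL is a catch-all for any character nothing else matched.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=<>();{}]"),
    ("SKIP", r"\s+"),
    ("ILLEGAL", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return (tokens, errors); each token is a (class, lexeme) pair."""
    tokens, errors = [], []
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind in ("SKIP", "COMMENT"):
            continue  # white space and comments are ignored
        if kind == "ILLEGAL":
            errors.append(f"illegal character {lexeme!r} at offset {m.start()}")
            continue
        if kind == "ID" and lexeme in KEYWORDS:
            kind = "KEYWORD"  # classify reserved words separately from identifiers
        tokens.append((kind, lexeme))
    return tokens, errors

tokens, errors = tokenize("int x = 42 @ y; // note")
print(tokens)
print(errors)
```

Here the scanner records the illegal character and keeps going, so that one bad character does not hide later errors; a real scanner might choose a different recovery policy.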
Lexical Analysis: Challenges
• We must know what the word
separators are
• The language must define rules for
breaking a sentence into a sequence of
words.
• Normally white space and
punctuation are word separators in
languages.
Syntax Analysis
• Converts the stream of tokens into a parse
tree.
• A syntax analyzer can also be referred to as
the parser.
• All tokens are checked against the grammar
of the source language to ensure correctness.
Syntax Analysis: Functions
• Obtaining tokens from the lexical analyzer.
• Checking the token stream for syntax errors.
• Construction of a parse tree.
• Reporting syntax errors.
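These functions can be illustrated with a small recursive-descent parser for arithmetic expressions. This is a sketch only: the grammar (`expr → term (('+'|'-') term)*`, `term → factor (('*'|'/') factor)*`, `factor → NUMBER | ID | '(' expr ')'`), the tuple-based tree, and the helper names are assumptions for this example, and the one-line `lex` helper silently skips characters it does not recognize.

```python
import re

def lex(text):
    """Toy scanner: numbers, identifiers, operators, and parentheses."""
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        """Consume one token, reporting a syntax error if it is not the expected one."""
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'an operand'}, found {tok!r}")
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            node = (self.take(), node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            node = (self.take(), node, self.factor())
        return node

    def factor(self):
        if self.peek() == "(":
            self.take("(")
            node = self.expr()
            self.take(")")
            return node
        tok = self.take()
        if tok in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"expected an operand, found {tok!r}")
        return tok

def parse(text):
    parser = Parser(lex(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree

print(parse("a + b * 2"))
```

Each grammar rule becomes one method, and operator precedence falls out of the nesting of `expr`, `term`, and `factor`. An input such as `(a + b` raises `SyntaxError` because the closing parenthesis is missing.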
Semantic Analysis
• A semantic analyzer checks that the parse tree
is semantically valid.
• Its output is an annotated syntax tree.
Semantic Analysis: Functions
• Type checking.
• Checking whether the source language permits
the given operands for each operator.
• Collection of type information.
• Saving gathered information to the symbol
table or syntax tree.
• Checking for and reporting semantic errors.
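Type checking can be sketched as a bottom-up walk over the tree. The node shapes (`("num", v)`, `("str", s)`, `("var", name)`, `(op, left, right)`), the type names, and the `ALLOWED` table are all hypothetical choices for this illustration; a real language defines its own typing rules.

```python
# Which operand types each operator permits, and the result type (assumed rules).
ALLOWED = {
    ("+", "int", "int"): "int",
    ("+", "int", "float"): "float",
    ("+", "float", "int"): "float",
    ("+", "float", "float"): "float",
    ("+", "string", "string"): "string",
    ("*", "int", "int"): "int",
    ("*", "int", "float"): "float",
    ("*", "float", "int"): "float",
    ("*", "float", "float"): "float",
}

def type_of(node, symbols):
    """Return the type of an expression tree, raising TypeError on a semantic error."""
    kind = node[0]
    if kind == "num":
        return "float" if isinstance(node[1], float) else "int"
    if kind == "str":
        return "string"
    if kind == "var":
        if node[1] not in symbols:
            raise TypeError(f"undeclared identifier {node[1]!r}")
        return symbols[node[1]]  # type information collected at declaration
    left = type_of(node[1], symbols)
    right = type_of(node[2], symbols)
    result = ALLOWED.get((kind, left, right))
    if result is None:
        raise TypeError(f"operator {kind!r} is not permitted for {left} and {right}")
    return result

symbols = {"x": "int", "rate": "float", "name": "string"}
print(type_of(("+", ("var", "x"), ("var", "rate")), symbols))
```

The sketch returns only the type of the root; a fuller version would store each computed type on its node to produce the annotated syntax tree.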
Intermediate Code Generation
• An intermediate code generator generates
three-address code (assembly-like
instructions with at most three operands per
instruction).
• Each operand acts like a register.
• The code is intermediate, that is, it is
neither high-level nor machine code.
• This phase acts as a bridge from analysis to
synthesis.
Intermediate Code Generation:
Functions
• Maintaining the precedence ordering of the
source language.
• Translation of the source program into
intermediate code.
• Holding values computed during the translation
process.
• Holding the operands of an instruction.
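As a rough illustration, three-address code can be produced by a post-order walk of an expression tree, creating one compiler-generated temporary per operator. The tuple tree shape `(op, left, right)` and the temporary naming scheme `t1, t2, ...` are assumptions for this sketch.

```python
import itertools

def generate_tac(tree):
    """Translate an expression tree into a list of three-address instructions."""
    code = []
    counter = itertools.count(1)

    def visit(node):
        if isinstance(node, str):      # leaf: identifier or constant
            return node
        op, left, right = node
        a = visit(left)                # operands are evaluated first, so the
        b = visit(right)               # tree's precedence ordering is preserved
        temp = f"t{next(counter)}"     # temporary that holds the computed value
        code.append(f"{temp} = {a} {op} {b}")
        return temp

    result = visit(tree)
    return code, result

# a + b * c, with the multiplication nested deeper because it binds tighter
code, result = generate_tac(("+", "a", ("*", "b", "c")))
print("\n".join(code))
```

For this input the multiplication is emitted before the addition, and `result` names the temporary that holds the value of the whole expression.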
Code Optimization
• It reduces the size of the program by
removing unnecessary lines of code from
the three-address code.
• This alteration preserves the
meaning of the code.
Code Optimization: Functions
• Removal of unused variables and
unreachable code.
• Establishing trade-offs between execution
and compilation speed.
• Generation of streamlined code from the
intermediate representation.
• Moving loop-invariant statements out of
loops.
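Two of the simplest optimizations, constant folding and removal of unused results, can be sketched on straight-line three-address code. The instruction format `(dest, op, arg1, arg2)`, the `"="` copy operator, and the function names are assumptions for this example, and the sketch assumes instructions have no side effects and there are no branches.

```python
import operator

FOLD = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(code):
    """Evaluate operations on known integer constants at compile time."""
    known, out = {}, []
    for dest, op, a, b in code:
        a, b = known.get(a, a), known.get(b, b)   # substitute known constants
        if op in FOLD and isinstance(a, int) and isinstance(b, int):
            value = FOLD[op](a, b)
            known[dest] = value
            out.append((dest, "=", value, None))
        else:
            known.pop(dest, None)                 # dest no longer holds a known constant
            out.append((dest, op, a, b))
    return out

def eliminate_dead_code(code, live_out):
    """Scan backwards, keeping only instructions whose result is still needed."""
    live, kept = set(live_out), []
    for dest, op, a, b in reversed(code):
        if dest in live:
            live.discard(dest)
            live.update(x for x in (a, b) if isinstance(x, str))
            kept.append((dest, op, a, b))
    kept.reverse()
    return kept

code = [
    ("t1", "*", 2, 3),
    ("t2", "+", "x", "t1"),
    ("t3", "-", "t1", 1),    # result is never used
    ("y", "+", "t2", 0),
]
optimized = eliminate_dead_code(fold_constants(code), live_out={"y"})
print(optimized)
```

After folding, `t1` is only a known constant, so its defining instruction becomes dead along with `t3`, and four instructions shrink to two without changing the value of `y`.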
Code Generation
• Assembly code is generated from the
optimized code.
• For each variable used by the program a
memory location is allocated.
• Functions:
– Converting intermediate code to target code.
– Selection and allocation of memory
locations and registers.
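A deliberately naive sketch of this phase is shown below. The target instruction set (`LOAD`, `STORE`, `ADD`, `SUB`, `MUL`), the two registers `R1` and `R2`, the `#` immediate and `[address]` memory notation, and the 4-byte slot size are all hypothetical; real code generators select instructions and registers far more carefully.

```python
MNEMONIC = {"+": "ADD", "-": "SUB", "*": "MUL"}  # hypothetical target mnemonics

def generate(code):
    """Translate (dest, op, arg1, arg2) instructions into toy assembly."""
    addresses = {}   # variable name -> memory offset

    def operand(x):
        if isinstance(x, int):
            return f"#{x}"                                   # immediate constant
        addr = addresses.setdefault(x, 4 * len(addresses))   # allocate on first use
        return f"[{addr}]"

    asm = []
    for dest, op, a, b in code:
        asm.append(f"LOAD R1, {operand(a)}")
        asm.append(f"LOAD R2, {operand(b)}")
        asm.append(f"{MNEMONIC[op]} R1, R1, R2")
        asm.append(f"STORE R1, {operand(dest)}")
    return asm, addresses

asm, addresses = generate([("t1", "*", "b", 2), ("x", "+", "a", "t1")])
print("\n".join(asm))
print(addresses)
```

Every variable, including the temporary `t1`, receives its own memory location, and every value is reloaded from memory before use. Keeping values in registers across instructions is exactly the kind of improvement that register allocation provides.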
Symbol Table
• It is a compiler data structure that stores
identifiers with their names and types,
enabling easier search and
retrieval.
• It interacts with the error handler and all
phases of the compiler for updates.
• It is responsible for scope management.
Symbol Table: Entries Stored
• Literal constants and strings.
• Compiler generated temporaries.
• Function names.
• Variable names and constants.
• Labels in source languages.
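One common way to combine name lookup with scope management is a stack of dictionaries, one per open scope. The class and method names below are assumptions for this sketch, and each entry stores only a type; a real symbol table would also record attributes such as addresses and line numbers.

```python
class SymbolTable:
    def __init__(self):
        self.scopes = [{}]            # the global scope is always present

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_):
        scope = self.scopes[-1]
        if name in scope:             # multiply declared identifier
            raise NameError(f"{name!r} is already declared in this scope")
        scope[name] = {"type": type_}

    def lookup(self, name):
        for scope in reversed(self.scopes):   # innermost scope wins
            if name in scope:
                return scope[name]
        raise NameError(f"{name!r} is not declared")

table = SymbolTable()
table.declare("x", "int")
table.enter_scope()
table.declare("x", "float")           # shadows the outer x
inner = table.lookup("x")["type"]
table.exit_scope()
outer = table.lookup("x")["type"]
print(inner, outer)
```

Leaving a scope discards its declarations in one step, so the outer `x` becomes visible again without any explicit cleanup.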
Error Handling Routine
• It is responsible for detecting an error,
reporting it, and implementing a recovery
strategy for handling the error.
• Common errors include:
– Invalid character sequences during scanning.
– Invalid token sequences.
– Scope error.
– Parsing in semantic analysis.
Error Handling Routine: Example
Errors
• Lexical analyzer - Wrong spelling of tokens.
• Syntax analyzer - Missing parenthesis.
• Intermediate code generator - Mismatch of
operands for an operator.
• Code optimizer - Unreachable statements.
• Code generator - Improper allocation of
registers or full memory.
• Symbol table - Multiple declared identifiers.
Symbol table functions in phases of the compiler
• Lexical analysis - Creation of new table entries.
• Syntax analysis - Adds information regarding types, scope,
etc.
• Semantic analysis - Uses already stored information to
check semantics and updates it accordingly.
• Intermediate code generation - Reference for run-time
allocation and storage of temporary variable information.
• Code optimization - Uses the symbol table for machine-dependent
optimization.
• Code generation - Uses the address information of
identifiers stored in the symbol table to generate code.
Phases of Compiler:
Example