ECS 142: Compilers
Administrative Matters

Course Objectives
After completing this course, you should be able to demonstrate your understanding of modern compilers for imperative languages by
- understanding the different phases of a compiler, and
- designing and implementing a compiler for a subset of Java.

This Course · · ·
- is not a "grand tour" of different compilers, interpreters, translators, pre-processors, etc.
- instead, it takes an in-depth look at (i) the theory behind program transformation, and (ii) the design and implementation of compilers.

An important course, as it integrates ideas from
- Language theory (regular expressions, CFGs, pushdown automata, parsing, attributed grammars, etc.)
- Programming languages (abstractions, semantics, and their implementations)
- Computer architecture (assembly programming)
- Software engineering (reusability, object-oriented software development)

Organizing Theme: Phases
- Study aspects of compilers in terms of the different phases, and how each phase is implemented.
- Both theoretical and pragmatic aspects are emphasized.

Instructor
Raju Pandey, 3041 Kemper Hall, 752-3584
Email: [email protected]
Office Hrs: Tu/Thu, 1:45 PM - 3:00 PM

TA
Saigopal Thota
Email: [email protected]
Office hours: To be announced.

Course Details
Midterm: 5 May (in class)
Final: 4 June, 10:30-12:30, 117 Olson

Communication
SmartSite newsgroups
Course home page: http://www.cs.ucdavis.edu/ pandey
Activities
- Programming Project (50%)
  - Java−−: implement a small subset of Java.
  - Four parts; more on this later.
- Homeworks (5%)
- Tests (45%)
  - Midterm (20%)
  - Two-hour final (25%)

Policies
- Re-grades within one week.
- NO MAKEUP EXAMS.
- NO CHEATING.

Attendance
- Not required.
- However, you are responsible for all material.

Textbooks and references
- Primary text book: Dragon book (Aho, Lam, Sethi, and Ullman)
- Reference books:
  - C++ text books (for instance, Lippman)
  - STL handbooks
  - Java (programming language specification)
  - Lex and Yacc
  - Assembly programming on MIPS machines

Lecture Notes and Reading Material
- Follow the text book mostly.
- Some additional material from other books, mostly related to the project.
- Read the text book for the hard material.
- Transparencies available from the course web site: PDF versions can be accessed from the course home page.

Computing Resources
- CSIF workstations (Basement EU II)
- Software tools: g++, lex, yacc, make, STL
- PCs/Macs are okay, but you need to be able to run your generated code on MIPS machines directly. (SPIM is not sufficient.)

Background
- Theory: regular expressions + context-free grammars
- Language: types, type equivalence, semantics of control constructs, object orientation. You will need to learn a (subset of) Java.
- Architecture: microprocessor architectures + assembly programming
- Software development: C++, make, gdb, STL, Lex, Yacc
Course Project
- Problem statement: write a compiler for a subset of Java.
- Output: MIPS assembly code.
- A substantial software development project.
- Four major parts:
  1. Symbol table (2%)
  2. Syntactic analysis (8%): use tools (lex and yacc) for writing the lexical analyzer and parser.
  3. Semantic analysis (25%): three subparts
     3.1 Processing of declarations
     3.2 Processing of expressions and statements
     3.3 Type checking and other semantic analysis
  4. Code generation (15%): use a MIPS simulator to debug and execute your generated code.
- C++: the language for writing the compiler.
- Notes/advice about the project:
  - Freely use existing libraries (e.g., STL, LEDA, etc.).
  - Develop software at a higher level of abstraction: focus more on putting software components together, less on building from first principles.
  - Be smart and methodical about (i) writing, (ii) maintaining, and (iii) testing your software.
  - Use tools such as make, rcs/sccs, GNU testing tools, etc. to help with software development.
  - Use encapsulation, abstraction, inheritance, genericity.

What is a compiler?
- Transforms a program represented in one notation (source language) into another program represented in another notation (target language).
- The generated program executes on a machine that can interpret the target language.
- Focus: must preserve the semantics of the original program.
- This is a very general definition of a compiler. Note that the compiler is independent of the application program that it compiles.
- Other examples of tools that transform a source program into another source program:
  - Pre-processors: transform macros.
  - Translators
  - Interpreters: execute the program without explicitly performing the translation. The program is like any other data for the interpreter.
  - Text formatters such as TeX, groff, etc.
- Differences arise in the view of the underlying machine that the translators assume. The higher the abstraction, the less work a compiler/translator has to do.
- Our focus: compilers that take a source program, perform analysis on it, and synthesize code that is closer to the bare machine (as exemplified by microprocessors).
Distinct Phases of Compilers
- Divided into two major components:
  1. Analysis of the source program, to uncover the structure of the program, and
  2. Synthesis of a target machine language program that, when executed, will correctly perform the operations of the source program.
- [Phase diagram: Source Program → Lexical Analyzer → (tokens) → Syntax Analyzer → (syntax tree) → Semantic Analyzer → (annotated tree) → Intermediate Code Generator → (IR) → Code Optimizer → (IR) → Code Generator → Target Program. The Symbol Table assists all phases.]
- Interaction among the different phases is more complex than shown here.

Lexical Analysis
- Break the source program into meaningful units, called tokens.
- Example:

      position := initial + rate * 60;

  The compiler sees the above as a sequence of characters, and categorizes them into meaningful units:
  - identifiers: position, initial, rate
  - constants: 60
  - operators: :=, +, *
  - punctuation: ;
- Each unit may be assigned a unique identifier (possibly an integer). The program is thus now a sequence of tokens.
- Other tasks:
  - Remove extra white space
  - Pass over comments
  - Convert all letters to one case (in languages that do not distinguish case)
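The token categories above can be illustrated with a small hand-written scanner. This is only a sketch: the `Token` struct and `TokKind` names are invented for the example, and a real lexer (such as one generated by lex) would be table-driven.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Illustrative token kinds for the example statement.
enum class TokKind { Identifier, Constant, Operator, Punctuation };

struct Token {
    TokKind kind;
    std::string lexeme;
};

// A minimal scanner for statements like "position := initial + rate * 60;".
std::vector<Token> scan(const std::string& src) {
    std::vector<Token> toks;
    size_t i = 0;
    while (i < src.size()) {
        char c = src[i];
        if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }  // drop white space
        if (std::isalpha(static_cast<unsigned char>(c))) {                   // identifier
            size_t j = i;
            while (j < src.size() && std::isalnum(static_cast<unsigned char>(src[j]))) ++j;
            toks.push_back({TokKind::Identifier, src.substr(i, j - i)});
            i = j;
        } else if (std::isdigit(static_cast<unsigned char>(c))) {            // constant
            size_t j = i;
            while (j < src.size() && std::isdigit(static_cast<unsigned char>(src[j]))) ++j;
            toks.push_back({TokKind::Constant, src.substr(i, j - i)});
            i = j;
        } else if (c == ':' && i + 1 < src.size() && src[i + 1] == '=') {    // := operator
            toks.push_back({TokKind::Operator, ":="});
            i += 2;
        } else if (c == '+' || c == '*') {                                   // arithmetic operators
            toks.push_back({TokKind::Operator, std::string(1, c)});
            ++i;
        } else {                                                             // ';' and the rest
            toks.push_back({TokKind::Punctuation, std::string(1, c)});
            ++i;
        }
    }
    return toks;
}
```

Running `scan` on the example statement produces the eight tokens listed above, in order.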
Syntax Analysis
- Determine the structure of programs, individual expressions, and statements; that is, determine how a program unit is constructed from other units.
- Language: a set of legal strings. A program is a string that may or may not belong to the language.
- A language is represented by a formalism called a context-free grammar. The grammar defines a set of rules that determine whether a sentence belongs to the language.
- A parser uses the grammar to create a representation of the program.
- An abstract tree, called a syntax tree, may be created whose interior nodes are operators and whose leaves are operands:

          :=
         /  \
      id1    +
            /  \
         id2    *
               /  \
            id3    60

- The parser also detects syntactic errors:
  - Did ';' end the sentence?
  - Do '(' and ')' match?

Semantic Analysis
- Analyze the source program for errors that cannot be checked in the earlier phases, mostly semantic errors:
  - Verification of scope rules: is the identifier defined in a scope? If so, where does its definition come from? Example: is position defined?
  - Type checking: check whether the types of an operator's operands are valid. Example: can rate be multiplied using *? Infer types from expressions, operators, and identifier declarations.
- Gather semantic information so that it can be used during the code generation phase.
- Summarize source program information by annotating the tree. The compiler may even create a different tree by removing redundant nodes from the parse tree.
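The tree drawn above can be represented with a very small node structure. This is a sketch, not the project's required interface: the `Node` type and the `leaf`/`op` helpers are invented for illustration.

```cpp
#include <memory>
#include <string>

// Interior nodes are operators, leaves are operands.
struct Node {
    std::string label;                  // ":=", "+", "*", "id1", "60", ...
    std::unique_ptr<Node> left, right;  // null for leaves
};

std::unique_ptr<Node> leaf(const std::string& s) {
    return std::make_unique<Node>(Node{s, nullptr, nullptr});
}

std::unique_ptr<Node> op(const std::string& s,
                         std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
    return std::make_unique<Node>(Node{s, std::move(l), std::move(r)});
}

// position := initial + rate * 60  becomes  :=(id1, +(id2, *(id3, 60)))
std::unique_ptr<Node> buildExample() {
    return op(":=", leaf("id1"),
              op("+", leaf("id2"),
                 op("*", leaf("id3"), leaf("60"))));
}

// A postorder walk; traversals like this drive the later phases
// (type checking, code generation).
std::string postorder(const Node* n) {
    if (!n) return "";
    return postorder(n->left.get()) + postorder(n->right.get()) + n->label + " ";
}
```

A postorder traversal of the example tree visits the operands before their operators, which is exactly the order a stack machine (or three-address code generator) wants.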
Symbol Table
- A major component of a compiler, used to assist various phases.
- Stores information about various symbols as the compiler goes through the different phases.
- Various attributes:
  - Storage allocation/address
  - Type
  - Scope
  - If a procedure: its name, number of parameters, etc.
  - All attributes that distinguish an identifier.
- Implemented as a data structure containing a record for each identifier. (It may have a more complicated structure because of scoping rules, but we will see that later.)
- The lexical analyzer phase may create an entry for an identifier in the table. Most information is added during the semantic analysis phase.

Intermediate Code Generation
- Some compilers generate an explicit intermediate representation (IR) of the source program.
- The intermediate representation forms an abstract machine. It is closer to the machine form, and is usually easy to produce. One such representation: "three-address code."
- Three-address code consists of a sequence of instructions, each with at most 3 operands. For position := initial + rate * 60:

      temp1 := inttoreal(60)
      temp2 := id3 * temp1
      temp3 := id2 + temp2
      id1 := temp3

- Why an intermediate form?
  - Portability of compilers: to port the compiler, change only the translator from the IR to each individual machine.
  - Only the front end needs to be written anew for different programming languages.
- Project: no intermediate code step.
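A first-cut symbol table of the kind described above can be a map from names to attribute records. The `SymbolInfo` fields and method names below are illustrative; nested scopes (a stack of maps, or chained tables) come later, as the slides note.

```cpp
#include <map>
#include <string>

// Attributes a compiler might record per identifier; fields are illustrative.
struct SymbolInfo {
    std::string type;      // e.g. "int", "float"
    int offset = -1;       // storage offset, filled in during allocation
    bool defined = false;
};

// A single-scope symbol table.
class SymbolTable {
    std::map<std::string, SymbolInfo> table_;
public:
    // Declaration processing creates an entry; returns false on
    // redeclaration in the same scope.
    bool declare(const std::string& name, const std::string& type) {
        auto result = table_.insert({name, SymbolInfo{type, -1, true}});
        return result.second;
    }
    // Later phases look identifiers up to check scope rules and types.
    const SymbolInfo* lookup(const std::string& name) const {
        auto it = table_.find(name);
        return it == table_.end() ? nullptr : &it->second;
    }
};
```

Returning a pointer (null on failure) lets the semantic analyzer both detect undeclared identifiers and read the attributes in one lookup.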
Code Optimization
- Improve the intermediate code:
  - Evaluate constant expressions at compile time: for instance, multiply two constants, etc.
  - Delete dead code: code that is never reached.
  - Remove constant (loop-invariant) expressions from loops.
- Example: the four instructions above become

      temp1 := id3 * 60.0
      id1 := id2 + temp1

- Can be very complex and slow, as it may involve many passes over the generated code. One optimization may lead to another...
- Can involve both local and global optimizations.

Code Generation
- Generation of code that can be executed directly (or after assembling) on the target machine.
- Issues:
  - Selection of memory locations for variables
  - Selection of instructions
  - Mapping between registers and variables
- Compilers may be distinguished by the kind of machine code that they generate:
  - Pure machine code: generate code for a specific machine.
  - Augmented machine code: generate code for a machine architecture augmented with operating system routines and language support routines. The combination of machine + operating system + other routines forms a virtual machine.
  - Virtual machine code: the generated code is composed entirely of virtual instructions. An attractive technique for producing a transportable compiler; programs can also be ported easily. Examples: the Java byte code representation of Java programs; also, the P-code representation of Pascal programs (first introduced by N. Wirth's group).
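The first optimization listed, evaluating constant expressions at compile time (constant folding), can be sketched as a pass over three-address instructions. The `Instr` representation is invented for this example and only handles a few operators.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// A three-address instruction: dst := lhs op rhs (representation illustrative).
struct Instr {
    std::string dst, lhs, op, rhs;
};

// True if s is entirely a numeric literal.
bool isNumber(const std::string& s) {
    if (s.empty()) return false;
    char* end = nullptr;
    std::strtod(s.c_str(), &end);
    return end == s.c_str() + s.size();
}

// Fold instructions whose operands are both constants, e.g.
//   t1 := 2 * 30   becomes   t1 := 60
void foldConstants(std::vector<Instr>& code) {
    for (auto& in : code) {
        if (in.op.empty() || !isNumber(in.lhs) || !isNumber(in.rhs)) continue;
        if (in.op != "+" && in.op != "-" && in.op != "*") continue;
        double a = std::stod(in.lhs), b = std::stod(in.rhs);
        double v = (in.op == "+") ? a + b
                 : (in.op == "-") ? a - b
                 :                  a * b;
        in.lhs = std::to_string(v);  // keep the folded result ...
        in.op.clear();               // ... and drop the operator and rhs
        in.rhs.clear();
    }
}
```

A real optimizer would iterate: folding one instruction can expose new constant operands elsewhere, which is the "one optimization may lead to another" point above.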
Formats of Generated Code
- Symbolic format: generate code in assembly format.
  - Advantage: cross-compilation; you can also check the correctness of the compiler by looking at the output.
  - Disadvantage: an extra step, as the output code must be processed by an assembler after the compiler.
- Relocatable binary format: a binary format in which external references and local instruction and data addresses are not yet bound. Addresses are assigned relative to the beginning of the module, or relative to symbolically named locations (the typical output of an assembler). A linkage step is required to add any support libraries and other precompiled routines.
- Memory image (load-and-go) format: the compiled output is loaded into the compiler's address space and executed.
  - Advantage: a single step to execution. Good for software development, where repeated changes are made to the program.
  - Disadvantage: the ability to interface with external code, libraries, and precompiled routines is limited.

Compiler Organization Alternatives
- Compiler components can be linked together in a variety of different ways.
- Pass: how many times does the compiler need to go over the source program?
- What can make a single pass difficult?
  - Space constraints
  - The language does not require declaration before use: forward gotos, function definitions, etc.
- Single pass for analysis and synthesis: scanning, parsing, checking, and translation to target machine code are interleaved.
  - The parser requests tokens from the lexical analyzer as it needs them during the creation of the parse tree.
  - As the parse tree is constructed, semantic analysis and code generation take place.
- One pass for analysis and IR synthesis + a code generation pass.
  - Advantage: flexible (can have multiple code generation phases).
- Multi-pass analysis: used when compilers were required to fit in constrained spaces, and where variables/functions are used before they are defined.
- Multi-pass synthesis: allows one or more optimizer passes to transform the IR before it is processed by the code generator.
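The single-pass interleaving described above, where the parser pulls tokens only as it needs them, can be sketched as a pull-style interface. The class and function names are illustrative (a yacc-generated parser does this by calling yylex); the "lexer" here just splits on whitespace.

```cpp
#include <sstream>
#include <string>

// The parser never sees the whole token stream up front; it asks the
// lexer for the next token on demand.
class Lexer {
    std::istringstream in_;
public:
    explicit Lexer(const std::string& src) : in_(src) {}
    // Return the next token, or "" at end of input.
    std::string nextToken() {
        std::string tok;
        return (in_ >> tok) ? tok : "";
    }
};

// A toy "parser" action: consume tokens one at a time until ';'.
// A real parser would build tree nodes and run semantic actions as it goes.
int parseStatement(Lexer& lex) {
    int n = 0;
    for (std::string t = lex.nextToken(); !t.empty(); t = lex.nextToken()) {
        ++n;
        if (t == ";") break;
    }
    return n;
}
```

Because the lexer holds only its input cursor and the parser holds only its current state, nothing requires the whole program (or the whole token stream) to be in memory at once, which is the point of the single-pass organization.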