Compiler Design Notes
Compiler design is the study of how to build programs that translate code written in a
high-level programming language into machine code or intermediate code. A compiler performs
several stages of analysis, translation, and optimization, reporting errors along the way,
to generate an efficient executable.
1. Introduction to Compiler Design
Definition: A compiler is a program that translates source code written in a high-level
language into machine code or an intermediate representation. The output is typically
an executable or bytecode.
Purpose of Compiler:
o Translation: Convert high-level code into machine code.
o Optimization: Enhance performance of the program.
o Error Checking: Identify syntax and semantic errors.
2. Phases of a Compiler
A compiler is divided into several phases, each responsible for a specific task.
2.1. Lexical Analysis (Scanner)
Function: Converts the raw source code (sequence of characters) into a sequence of
tokens (meaningful chunks).
Tokens: Basic units of syntax (keywords, identifiers, literals, operators).
Components:
o Lexer: The program that performs lexical analysis.
o Regular Expressions: Used to define token patterns.
o Finite Automata: Used to implement lexers; token patterns are converted into an NFA or DFA.
Output: Token stream.
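The scanner described above can be sketched with regular expressions. This is a minimal illustration, assuming a tiny expression language; the token names (NUMBER, IDENT, OP) and the tokenize function are hypothetical and not part of these notes.

```python
import re

# Hypothetical token patterns for a tiny expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=()]"),
    ("SKIP",   r"\s+"),          # whitespace is matched but not emitted
]
# One combined pattern with a named group per token kind.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (kind, text) tokens; raise on an unrecognized character."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            # Lexical error: no token pattern matches at this position.
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("x = y + 42"))
# [('IDENT', 'x'), ('OP', '='), ('IDENT', 'y'), ('OP', '+'), ('NUMBER', '42')]
```

Generated lexers (for example from Lex or Flex) typically compile such patterns into a single DFA instead of trying them through a regex engine, but the input and output are the same: characters in, token stream out.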
2.2. Syntax Analysis (Parser)
Function: Analyzes the token stream to ensure that the code adheres to the grammar of
the language. It constructs a parse tree (or syntax tree).
Grammar Types: Context-free grammar (CFG) is most commonly used.
Components:
o Parser: A program that performs syntax analysis.
o Parse Tree: A tree structure that represents the syntactic structure of the code.
o Context-Free Grammar: Defines the syntax rules for a language (productions).
Parsing Techniques:
o Top-Down Parsing: Recursive descent parser, LL parser.
o Bottom-Up Parsing: Shift-reduce, LR parser.
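As an illustration of top-down parsing, here is a small recursive descent parser. It is a sketch under assumptions: the grammar (expr, term, factor over integers with + - * / and parentheses) and the nested-tuple tree format are chosen for the example and do not come from these notes.

```python
def parse(tokens):
    """Parse a list of token strings into a nested-tuple syntax tree.

    Assumed grammar (hypothetical):
        expr   -> term (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> NUMBER | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())      # left-associative
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        tok = advance()
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse(["1", "+", "2", "*", "3"]))
# ('+', 1, ('*', 2, 3))
```

Each grammar rule becomes one function, and operator precedence falls out of which rule calls which: term binds tighter than expr because expr is built from terms.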
2.3. Semantic Analysis
Function: Checks that a syntactically valid program is also meaningful by detecting
semantic errors (e.g., type mismatches, undeclared variables).
Tasks:
o Symbol Table Construction: Stores information about variables, functions, and
objects.
o Type Checking: Verifies that operations are applied to compatible types.
o Scope Checking: Ensures that variables are declared before use and are in the
correct scope.
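The type and declaration checks listed above might look like the following sketch. The statement and expression encodings (tuples such as ("declare", name, type)) and the check function are assumptions made for the example, and only a single flat scope is modeled.

```python
def check(statements):
    """Return a list of semantic error messages.

    Hypothetical statement forms:
        ("declare", name, type)
        ("assign", name, expr)
    Expressions: int or str literals, ("var", name), or (op, left, right).
    """
    symbols = {}   # name -> declared type
    errors = []

    def type_of(expr):
        if isinstance(expr, int):
            return "int"
        if isinstance(expr, str):
            return "string"
        if expr[0] == "var":
            name = expr[1]
            if name not in symbols:
                errors.append(f"undeclared variable '{name}'")
                return None
            return symbols[name]
        op, left, right = expr
        lt, rt = type_of(left), type_of(right)
        if lt is None or rt is None:
            return None                      # error already reported
        if lt != rt:
            errors.append(f"type mismatch: {lt} {op} {rt}")
            return None
        return lt

    for stmt in statements:
        if stmt[0] == "declare":
            _, name, typ = stmt
            symbols[name] = typ
        elif stmt[0] == "assign":
            _, name, expr = stmt
            if name not in symbols:
                errors.append(f"undeclared variable '{name}'")
                continue
            t = type_of(expr)
            if t is not None and t != symbols[name]:
                errors.append(f"cannot assign {t} to {symbols[name]} '{name}'")
    return errors


program = [
    ("declare", "x", "int"),
    ("assign", "x", ("+", 1, "a")),   # int + string
    ("assign", "y", 2),               # y was never declared
]
print(check(program))
# ["type mismatch: int + string", "undeclared variable 'y'"]
```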
2.4. Intermediate Code Generation
Function: Translates the source code into an intermediate form, which is easier to
analyze and transform than source code and more abstract than machine code.
Intermediate Code (IC):
o Three-Address Code (TAC): Each instruction has at most three addresses, typically
two operands and one result (e.g., x = y + z).
o Abstract Syntax Tree (AST): Intermediate representation of code that retains the
structure.
o Benefits: Easier optimization, portability across target machines.
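To make the relationship between a syntax tree and three-address code concrete, the sketch below lowers a nested-tuple expression tree into TAC strings. The tree encoding, the temporary names (t1, t2, ...), and the to_tac function are illustrative assumptions.

```python
import itertools


def to_tac(tree):
    """Lower a nested-tuple expression tree to three-address code.

    Returns (instructions, name_holding_result).
    """
    code = []
    temps = itertools.count(1)           # t1, t2, ... (hypothetical naming)

    def lower(node):
        if not isinstance(node, tuple):
            return str(node)             # leaf: variable name or constant
        op, left, right = node
        l = lower(left)
        r = lower(right)
        temp = f"t{next(temps)}"
        code.append(f"{temp} = {l} {op} {r}")   # one operator per instruction
        return temp

    result = lower(tree)
    return code, result


code, result = to_tac(("+", "a", ("*", "b", "c")))
print(code, result)
# ['t1 = b * c', 't2 = a + t1'] t2
```

Because every instruction carries a single operator, later phases can inspect and rewrite instructions one at a time without walking a tree.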
2.5. Code Optimization
Function: Improves the intermediate code to make the final output more efficient in
terms of execution time, memory usage, etc.
Types of Optimization:
o Loop Optimization: Unrolling loops, reducing redundant calculations.
o Constant Folding: Precomputing constant expressions.
o Dead Code Elimination: Removing code that can never execute or whose results are never used.
o Inlining: Replacing function calls with the function’s body.
Machine-Independent Optimizations: Performed on the intermediate code.
Machine-Dependent Optimizations: Performed on the target code, using knowledge of the
machine's registers and instruction set.
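Constant folding, one of the machine-independent optimizations listed above, can be sketched on a nested-tuple expression tree. The tree encoding and the fold function are assumptions for illustration; division is deliberately left out so the sketch does not have to deal with division by zero at compile time.

```python
import operator

# Operators this sketch knows how to evaluate at compile time.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Replace subtrees whose operands are all integer constants with their value."""
    if not isinstance(node, tuple):
        return node                       # leaf: constant or variable name
    op, left, right = node
    left, right = fold(left), fold(right)
    if op in OPS and isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)       # precompute the constant expression
    return (op, left, right)


print(fold(("+", "x", ("*", 2, 3))))
# ('+', 'x', 6)
```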
2.6. Code Generation
Function: Converts the optimized intermediate code into target machine code (or
bytecode for virtual machines).
Tasks:
o Instruction Selection: Mapping intermediate operations to machine-level
instructions.
o Register Allocation: Assigning variables to machine registers.
o Code Emission: Generating the final machine code (or bytecode).
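Instruction selection and code emission can be illustrated for a stack-based virtual machine. This is a sketch, not a real instruction set: the opcodes (PUSH, LOAD, ADD, SUB, MUL) and the gen and run helpers are hypothetical, and register allocation does not arise because a stack machine has no registers to assign.

```python
# Mapping from source operators to hypothetical stack-machine opcodes.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}


def gen(node, out=None):
    """Emit stack-machine instructions for a nested-tuple expression (post-order)."""
    if out is None:
        out = []
    if isinstance(node, tuple):
        op, left, right = node
        gen(left, out)
        gen(right, out)
        out.append((OPCODES[op],))        # instruction selection
    elif isinstance(node, int):
        out.append(("PUSH", node))
    else:
        out.append(("LOAD", node))        # read a variable by name
    return out


def run(code, env):
    """Tiny interpreter used only to check the emitted code."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "LOAD":
            stack.append(env[instr[1]])
        else:
            right, left = stack.pop(), stack.pop()
            results = {"ADD": left + right, "SUB": left - right, "MUL": left * right}
            stack.append(results[instr[0]])
    return stack.pop()


code = gen(("+", "x", ("*", 2, 3)))
print(code)
# [('LOAD', 'x'), ('PUSH', 2), ('PUSH', 3), ('MUL',), ('ADD',)]
print(run(code, {"x": 4}))
# 10
```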
2.7. Code Linking and Assembly
Linking: Combines object files into a single executable, resolving references between
modules.
Assembly: When the compiler emits assembly language, an assembler translates it into
binary object code. This step comes before linking.
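The core of linking, resolving references between modules, can be sketched as a two-pass procedure. The module format here (a dict with "code" and "defs") and the CALL instruction are invented for the example and do not correspond to any real object file format.

```python
def link(modules):
    """Concatenate modules and resolve symbol references to absolute addresses.

    Each module is {"code": [...], "defs": {symbol: offset_in_module}}.
    A code entry ("CALL", symbol) may refer to a symbol defined in another module.
    """
    symbols = {}
    base = 0
    for mod in modules:                   # pass 1: assign absolute addresses
        for name, offset in mod["defs"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol '{name}'")
            symbols[name] = base + offset
        base += len(mod["code"])

    image = []
    for mod in modules:                   # pass 2: patch references
        for instr in mod["code"]:
            if instr[0] == "CALL":
                if instr[1] not in symbols:
                    raise ValueError(f"undefined symbol '{instr[1]}'")
                image.append(("CALL", symbols[instr[1]]))
            else:
                image.append(instr)
    return image


main = {"code": [("CALL", "helper"), ("HALT",)], "defs": {"main": 0}}
lib = {"code": [("NOP",), ("RET",)], "defs": {"helper": 0}}
print(link([main, lib]))
# [('CALL', 2), ('HALT',), ('NOP',), ('RET',)]
```

The "undefined symbol" and "duplicate symbol" failures mirror the two most common kinds of link-time error.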
3. Components of a Compiler
Lexical Analyzer (Lexer): Converts source code into tokens.
Syntax Analyzer (Parser): Validates the structure and builds a parse tree.
Semantic Analyzer: Performs type and scope checks.
Intermediate Code Generator: Converts syntax tree into intermediate code.
Optimizer: Enhances intermediate code for better performance.
Code Generator: Converts optimized intermediate code into machine code.
Error Handler: Detects and reports errors during various phases of compilation.
4. Symbol Table
Definition: A data structure used by the compiler to store information about variables,
functions, objects, types, scopes, and more.
Attributes in Symbol Table:
o Name: Identifier name (variable, function, etc.).
o Type: Data type of the variable or function.
o Scope: The region of the program where the symbol is valid.
o Address/Location: Memory location of the symbol (in case of variables).
Operations:
o Insert: Add symbols to the table.
o Lookup: Retrieve symbol information during analysis.
o Delete: Remove symbols that go out of scope.
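The attributes and operations above can be sketched as a stack of dictionaries, one per scope. This is one common design among several; the SymbolTable class and its method names are assumptions for the example.

```python
class SymbolTable:
    """Scoped symbol table: a stack of dicts, innermost scope last."""

    def __init__(self):
        self.scopes = [{}]                # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        # Delete: dropping a scope removes every symbol declared in it.
        if len(self.scopes) == 1:
            raise RuntimeError("cannot exit the global scope")
        self.scopes.pop()

    def insert(self, name, type_, address=None):
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"'{name}' already declared in this scope")
        scope[name] = {
            "type": type_,
            "scope": len(self.scopes) - 1,   # nesting depth, 0 = global
            "address": address,
        }

    def lookup(self, name):
        # Search from the innermost scope outward; return None if not found.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.insert("x", "int", address=0)
table.enter_scope()
table.insert("x", "float", address=4)     # shadows the outer x
print(table.lookup("x")["type"])          # float
table.exit_scope()
print(table.lookup("x")["type"])          # int
```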
5. Types of Errors
Lexical Errors: Invalid tokens or malformed strings.
Syntax Errors: Incorrect grammar or structure.
Semantic Errors: Mismatched types or undefined symbols.
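Runtime Errors: Errors that occur during execution (e.g., division by zero); the compiler
generally cannot detect them.
Logical Errors: The program compiles and runs but produces wrong results because its
logic is incorrect.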
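The difference between errors caught at compile time and errors that only appear later can be observed with Python's built-in compile and exec functions. Using Python as the example language is an assumption of this sketch, and the classify helper is hypothetical.

```python
def classify(source):
    """Report at which stage a Python snippet fails, if at all."""
    try:
        code = compile(source, "<example>", "exec")   # lexical and syntax analysis
    except SyntaxError:
        return "syntax error (caught at compile time)"
    try:
        exec(code, {})                                # execution
    except ZeroDivisionError:
        return "runtime error (caught during execution)"
    return "no error detected"


print(classify("x = (1 +"))            # syntax error (caught at compile time)
print(classify("x = 1 / 0"))           # runtime error (caught during execution)
print(classify("avg = (2 + 4) / 3"))   # no error detected
```

The last snippet divides by 3 where the average of two numbers needs 2. That is a logical error, and neither the compiler nor the runtime reports it.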
6. Examples of Compiler Tools
Lex (Lexical Analyzer Generator): Generates lexical analyzers from regular expressions.
Yacc/Bison (Parser Generator): Generates parsers from context-free grammar
specifications.
LLVM: A modular compiler framework that provides reusable optimization and code generation components.
GCC (GNU Compiler Collection): A widely used open-source compiler for C/C++, Fortran,
and other languages.
Java Compiler (javac): Compiles Java code into bytecode.
7. Advanced Topics in Compiler Design
Just-In-Time (JIT) Compilation: A technique used by runtime environments (e.g., the JVM,
the .NET CLR) to compile bytecode into machine code while the program is running.
Garbage Collection: Automatic memory management and reclamation of unused
memory.
Multi-pass Compilation: The compiler may use multiple passes over the code to
generate the final output (e.g., first pass for lexical and syntax analysis, second pass for
code generation).
Compiler Construction Tools:
o ANTLR: A powerful parser generator for reading, processing, and executing
structured text.
o Flex: A tool for generating lexical analyzers.