Unit-1
Introduction to compilers
Topics
Compilers, Analysis of the Source Program
The Phases of a Compiler
Compiler Construction Tools
Cousins of the Compiler
The Grouping of Phases
Compilers
A compiler is software that translates or converts a program written in
a high-level language (Source Language) into a low-level language
(Machine Language or Assembly Language).
History of Compilers
In the 1950s, Grace Hopper developed the first compiler, leading to
languages like FORTRAN (1957), LISP (1958), and COBOL (1959).
The 1960s saw innovations like ALGOL, and the 1970s introduced C
and Pascal.
Compilers
Source code to Machine code
Machine code
Compilers
Role of Compilers
• A program written in a high-level language cannot run directly on the hardware; it must first be translated, for example by a compiler.
• Each programming language has its own compiler, but the fundamental tasks performed by all compilers remain the same.
• Translating source code into machine code involves multiple stages, such as lexical analysis, syntax analysis, semantic analysis, intermediate code generation, optimization, and code generation.
History of Compilers
FORTRAN (1957): compiled here with gfortran (GNU Fortran)
Example:
program hello
print *, "Hello, Fortran!"
end program hello
To compile and run:
gfortran hello.f90 -o hello
./hello
History of Compilers
LISP (1958): List Processing
Example:
(defun hello-world ()
  (format t "Hello, world!~%"))
(hello-world)
To run it with SBCL (Steel Bank Common Lisp):
sbcl --script hello.lisp
History of Compilers
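COBOL (1959): Common Business-Oriented Language
Example:
IDENTIFICATION DIVISION.
PROGRAM-ID. Helloworld.
PROCEDURE DIVISION.
DISPLAY "Hello world".
STOP RUN.
Compilation (GnuCOBOL; -free tells cobc the source is in free format, because the code above does not follow COBOL's traditional fixed column layout):
cobc -x -free -o Helloworld Helloworld.cob
./Helloworld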
History of Compilers
ALGOL (1960s): ALGOrithmic Language
Example:
begin
integer i, sum;
sum := 0;
for i := 1 step 1 until 10 do
sum := sum + i;
print(sum)
end
1) Translate ALGOL to C: a60 sum.a60 > sum.c
2) Compile: gcc sum.c -o sum
3) Run: ./sum
History of Compilers
C and Pascal (1970s); Pascal source files use the .pas extension
Example:
program HelloWorld;
begin
writeln('Hello, World!');
end.
Compile with FPC (Free Pascal Compiler):
sudo apt install fpc
fpc hello.pas
./hello (Linux)
hello.exe (Windows)
Phases of Compilers
Phases of a Compiler
Compilation has two major phases, analysis (front end) and synthesis (back end), and each is made up of several parts. Each part takes the output of the previous part as its input, so the parts work together as a pipeline.
Figure: HLL (High-Level Language) program → characters → tokens → parse tree → SDT (Syntax-Directed Translation) → 3-address code → machine code. The front end covers the steps up to 3-address code; the back end turns it into machine code.
Phases of Compilers
Lexical Analysis:
The preprocessor expands header files (and macros) and passes the resulting stream of characters to the lexical analysis phase. This phase groups the characters into tokens and outputs them.
Ex: M = 3 / Y
• M and Y are identifiers
• = is the assignment operator
• / is the division operator
• 3 is a constant
Phases of Compilers
Ex: How many tokens are in M = 3 / Y?
Solution
M is Token 1
= is Token 2
3 is Token 3
/ is Token 4
Y is Token 5
So the statement contains 5 tokens.
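The token count can be checked mechanically. The sketch below is a minimal regular-expression tokenizer in Python. The token names (IDENTIFIER, CONSTANT, ASSIGN, OPERATOR, SEMICOLON) and their patterns are assumptions chosen for this example. They are not the lexical rules of any particular language.

```python
import re

# Hypothetical token categories for this small example. Order matters:
# the first alternative that matches at the current position wins.
TOKEN_SPEC = [
    ("CONSTANT",   r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN",     r"="),
    ("OPERATOR",   r"[+\-*/]"),
    ("SEMICOLON",  r";"),
    ("SKIP",       r"[ \t]+"),      # white space is discarded, not a token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token_type, lexeme) pairs for `text`."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = tokenize("M = 3 / Y")
for kind, lexeme in tokens:
    print(kind, lexeme)
print(len(tokens), "tokens")   # 5 tokens
```

Blanks match the SKIP pattern and are dropped, so they never count as tokens. This is the "remove white space" role of the lexical analyzer.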
Phases of Compilers
Lexical Analysis roles:
• Remove comments
• Remove white space (blanks, tabs, and newlines)
1. Lexical analysis is also called scanning or tokenizing. The sequence of characters that forms a token is called a lexeme.
2. Finite automata (NFA and DFA), usually built from regular expressions, are used to turn the character stream into tokens.
3. The identifiers found during tokenizing are saved in the symbol table.
4. The symbol table is therefore connected to the lexical phase from the start, and later phases read and extend it.
5. The symbol table stores information about each identifier, such as:
• its reference (e.g., its memory location or address)
• its size
• its data type: in M = 3 / Y, for example, the symbol table records whether M is declared as an integer, a float, or some other type.
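Two of the points above can be sketched together in a few lines of Python: a DFA that recognizes identifiers, and a symbol table filled in during scanning. The two-state DFA, the SymbolEntry fields, and the 4-byte sizes are assumptions for illustration. Real compilers record more attributes and take sizes from the target machine.

```python
from dataclasses import dataclass


def is_identifier(lexeme):
    """Hand-written DFA for identifiers: a letter or '_' followed by
    letters, digits or '_'. State 0 = start, state 1 = accepting."""
    state = 0
    for ch in lexeme:
        if state == 0 and (ch.isalpha() or ch == "_"):
            state = 1
        elif state == 1 and (ch.isalnum() or ch == "_"):
            state = 1
        else:
            return False          # no transition defined: reject
    return state == 1             # accept only if we end in state 1


@dataclass
class SymbolEntry:
    name: str
    type: str = "unknown"         # filled in later, e.g. from a declaration
    size: int = 0                 # size in bytes (depends on the type)
    address: int = -1             # hypothetical relative address; -1 = not assigned


class SymbolTable:
    def __init__(self):
        self.entries = {}
        self.next_address = 0

    def insert(self, name):
        """Called by the scanner: add an identifier the first time it is seen."""
        if name not in self.entries:
            self.entries[name] = SymbolEntry(name)
        return self.entries[name]

    def declare(self, name, type_, size):
        """Called when a declaration is processed: record type, size, address."""
        entry = self.insert(name)
        entry.type, entry.size, entry.address = type_, size, self.next_address
        self.next_address += size
        return entry


table = SymbolTable()
for lexeme in "M = 3 / Y".split():        # lexemes are blank-separated in this example
    if is_identifier(lexeme):
        table.insert(lexeme)

table.declare("M", "float", 4)            # e.g. from a declaration "float M;"
table.declare("Y", "int", 4)              # e.g. from a declaration "int Y;"
for entry in table.entries.values():
    print(entry)
```

Only M and Y are entered: the DFA rejects "=", "3" and "/", so they never reach the table.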
Phases of Compilers
Syntax Analysis:
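• Syntax analysis is performed by the parser.
• The parser checks whether the sequence of tokens follows the grammar of the language and reports syntax errors.
• It builds a parse tree. This can reveal whether the grammar is ambiguous, i.e., whether the same input has more than one parse tree.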
Phases of Compilers
Example: Parse tree for M = 3 / Y
S
├── id
│   └── M
├── =
└── E
    ├── 3
    ├── /
    └── id
        └── Y
Phases of Compilers
Ex 2: X = Y + Z + 5
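A parser for statements like these can be written by recursive descent, one function per grammar rule. The Python sketch below assumes the small grammar given in its docstring. It returns a syntax tree as nested tuples rather than the full parse tree, so nonterminals such as E and T do not appear as nodes. Because E loops over its T's from left to right, X = Y + Z + 5 is grouped as (Y + Z) + 5.

```python
import re


def tokenize(text):
    """Split text into identifiers, integer constants and one-character symbols.

    Blanks (and, in this sketch, any other unrecognized character) are skipped.
    """
    return re.findall(r"[A-Za-z_]\w*|\d+|[=+\-*/()]", text)


class Parser:
    """Recursive-descent parser for the assumed grammar:

        S -> id = E
        E -> T { (+|-) T }        (left-associative)
        T -> F { (*|/) F }        (left-associative)
        F -> id | num | ( E )
    """

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def statement(self):                       # S -> id = E
        target = self.advance()
        if not re.fullmatch(r"[A-Za-z_]\w*", target):
            raise SyntaxError(f"expected an identifier, found {target!r}")
        self.expect("=")
        tree = ("=", target, self.expr())
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):                            # E -> T { (+|-) T }
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):                            # T -> F { (*|/) F }
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):                          # F -> id | num | ( E )
        token = self.advance()
        if token == "(":
            node = self.expr()
            self.expect(")")
            return node
        if token in ("+", "-", "*", "/", "=", ")"):
            raise SyntaxError(f"unexpected {token!r}")
        return token


print(Parser("M = 3 / Y").statement())       # ('=', 'M', ('/', '3', 'Y'))
print(Parser("X = Y + Z + 5").statement())   # ('=', 'X', ('+', ('+', 'Y', 'Z'), '5'))
```

An input that breaks the grammar, such as M = / Y, raises SyntaxError. This is the error-reporting role of the parser described above.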
Phases of Compilers
Semantic Analysis:
• It checks for semantic (meaning-related) errors, for example whether every variable has been declared properly.
• It checks the use of names against the scope rules of the language (static or dynamic scope).
• This phase works at compile time.
• In this phase, SDT (Syntax-Directed Translation) is used to detect errors such as type mismatches. SDT means semantic rules attached to the grammar productions.
• SDT also decides which action is taken for each construct, such as printing a value or evaluating an arithmetic expression.
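A minimal type checker shows two typical semantic checks: undeclared variables and incompatible assignments. It works over a syntax tree written as nested tuples (op, left, right). The typing rules are assumptions for illustration: int op int gives int, anything involving float gives float, and a float may not be assigned to an int variable. Each real language defines its own rules.

```python
def check(node, symbols):
    """Return the type of `node`, raising TypeError on a semantic error.

    `node` is an identifier, an integer constant (a string of digits), or a
    tuple (op, left, right). `symbols` maps declared names to "int"/"float".
    """
    if isinstance(node, str):
        if node.isdigit():
            return "int"                              # integer constant
        if node not in symbols:
            raise TypeError(f"undeclared variable {node!r}")
        return symbols[node]
    op, left, right = node
    left_type, right_type = check(left, symbols), check(right, symbols)
    if op == "=":
        if left_type == "int" and right_type == "float":
            raise TypeError(f"cannot assign a float value to int variable {left!r}")
        return left_type
    # Arithmetic operators (assumed rule): float if either operand is float.
    return "float" if "float" in (left_type, right_type) else "int"


tree = ("=", "M", ("/", "3", "Y"))                    # M = 3 / Y
print(check(tree, {"M": "float", "Y": "int"}))        # float
for symbols in ({"M": "int", "Y": "float"}, {"M": "int"}):
    try:
        check(tree, symbols)
    except TypeError as err:
        print("semantic error:", err)
```

The tree passes the parser, so these are not syntax errors. Only the declarations stored in the symbol table (here the `symbols` dictionary) reveal them.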
Phases of Compilers
Intermediate Code Generation (Uses 3 Address Code)
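The parse tree (after SDT) can be converted into three-address code, for example by way of postfix notation. Intermediate code sits between the front end and the back end. It does not yet depend on the target platform (such as Linux or Windows on a particular processor); the back end handles that.
The following examples show how statements are converted into 3-address code: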
Phases of Compilers
Example 1: M = a + b
Solution:
q1 = a + b
M = q1
Example 2: S = P + Q * R
Solution:
m1 = Q * R
m2 = P + m1
S = m2
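The conversion can be sketched as a post-order walk of the syntax tree: each operator node gets a new temporary once both of its operands are available. In this Python sketch the tree is written by hand as nested tuples. The temporaries are named t1, t2, and so on; this is an arbitrary choice, since the examples above use q1 and m1, m2.

```python
import itertools


def gen_tac(tree):
    """Translate an assignment tree ("=", target, expr) into three-address code.

    Interior nodes are tuples (op, left, right); leaves are names or constants.
    """
    code = []
    temps = itertools.count(1)

    def emit(node):
        if isinstance(node, str):          # a leaf is used directly as an operand
            return node
        op, left, right = node
        left_addr = emit(left)             # post-order: operands first,
        right_addr = emit(right)
        temp = f"t{next(temps)}"           # then one instruction for this node
        code.append(f"{temp} = {left_addr} {op} {right_addr}")
        return temp

    _, target, expr = tree
    result = emit(expr)
    code.append(f"{target} = {result}")
    return code


# Example 2: S = P + Q * R  (Q * R is the inner node, so it is computed first)
for line in gen_tac(("=", "S", ("+", "P", ("*", "Q", "R")))):
    print(line)
```

This prints t1 = Q * R, t2 = P + t1, S = t2. A later statement in this unit is I = P * R * N / 100, whose tree is ("=", "I", ("/", ("*", ("*", "P", "R"), "N"), "100")). For that tree the same function gives t1 = P * R, t2 = t1 * N, t3 = t2 / 100, I = t3.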
Phases of Compilers
Code Optimization:
• The main goal is to make the code faster or smaller by removing unnecessary instructions or substituting cheaper ones.
• The code is divided into basic blocks. Optimization within a single block is called local optimization, and optimization across blocks is called global optimization.
Phases of Compilers
Example:
q1 = 2 * a can be rewritten as q1 = a + a
Multiplication usually takes more time than addition. Replacing an expensive operation with a cheaper equivalent one (strength reduction) therefore makes the code faster.
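A few such local rewrites can be sketched on three-address instructions written as tuples (target, left, op, right), with op = None for a plain copy. The Python sketch assumes integer constants only. It applies three textbook rewrites: constant folding, strength reduction of 2 * a to a + a, and removal of + 0. It is not the optimizer of any real compiler.

```python
import operator

# Operators that this sketch folds when both operands are integer constants.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def optimize(instr):
    """Apply one local rewrite to (target, left, op, right), if any applies."""
    target, left, op, right = instr
    if op in FOLDABLE and left.isdigit() and right.isdigit():
        value = FOLDABLE[op](int(left), int(right))   # constant folding: 3 * 4 -> 12
        return (target, str(value), None, None)
    if op == "*" and left == "2":                     # strength reduction: 2 * a -> a + a
        return (target, right, "+", right)
    if op == "*" and right == "2":                    # a * 2 -> a + a
        return (target, left, "+", left)
    if op == "+" and right == "0":                    # algebraic identity: x + 0 -> x
        return (target, left, None, None)
    if op == "+" and left == "0":                     # 0 + x -> x
        return (target, right, None, None)
    return instr


def optimize_block(block):
    """Optimize every instruction of a basic block; drop copies like x = x."""
    result = []
    for instr in block:
        new = optimize(instr)
        if new[2] is None and new[0] == new[1]:
            continue                                  # x = x has no effect
        result.append(new)
    return result


def show(instr):
    target, left, op, right = instr
    return f"{target} = {left}" if op is None else f"{target} = {left} {op} {right}"


block = [("q1", "2", "*", "a"), ("x", "x", "+", "0"), ("t", "3", "*", "4")]
for instr in optimize_block(block):
    print(show(instr))
```

Each rewrite looks at one instruction at a time, so these are local optimizations. Global optimization would also need information that flows between basic blocks.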
Phases of Compilers
Target Machine code
• After all the earlier phases, the final output is generated as machine code. Its form depends on the target machine's instruction set and operating system.
• The code generator must take the organization of the machine into account, e.g., whether it is a single-accumulator machine.
• Common organizations are single-accumulator, stack-based, and general-register machines.
• Throughout compilation, the error handler deals with every type of error that occurs in any phase.
Phases of Compilers
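Target Machine code
Single-accumulator machine (AC = accumulator):
LDA X         Load X into the accumulator
SHL           Shift the accumulator left by one bit (multiplies it by 2)
STORE Y, AC   Store the accumulator into Y
General-register machine:
MUL R1, R2    R1 = R1 * R2
STORE Y, R1   Store R1 into Y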
Phases of Compilers
1. Analysis Phase
An intermediate representation is created from the given source code by:
• Lexical Analyzer (scanner: groups characters into tokens)
• Syntax Analyzer (parser: checks the token sequence against the grammar and builds a parse tree)
• Semantic Analyzer (type checking: verifies that declarations and uses agree, e.g., that a float value is not used where an integer is required)
• Intermediate Code Generator (postfix notation, three-address code, etc.)
2. Synthesis Phase
• Code Optimizer
• Code Generator
Analysis v/s Synthesis Phase
Analysis phase of compiler
• Analysis phase reads the source program and splits it into multiple
tokens and constructs the intermediate representation of the source
program.
• It also checks for and reports syntax and semantic errors in the source program.
• It collects information about the source program and builds the symbol table, which is used throughout the compilation process.
• This is also called the front end of a compiler.
Analysis v/s Synthesis Phase
Synthesis phase of compiler
• It takes the output of the analysis phase (the intermediate representation and the symbol table) as input and produces the target machine code.
• This is also called the back end of a compiler.
COUSINS OF COMPILER
1. Macro processing:
A macro is a rule or pattern that specifies how a certain input sequence should be mapped to an output sequence according to a defined procedure. The process of replacing a macro with its output sequence is known as macro expansion.
2. File Inclusion:
The preprocessor includes header files into the program text. When it finds an #include directive, it replaces the directive with the entire content of the specified file.
3. Rational Preprocessors:
These processors augment older languages with more modern flow-of-control and data-structuring facilities.
4. Language extension:
These processors attempt to add capabilities to the language by what amounts to built-in macros. For example, the language Equel is a database query language embedded in C.
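Macro processing and file inclusion can be sketched as a toy preprocessor in Python. It assumes only object-like macros of the form #define NAME value and #include "name" with double quotes. Included files come from a dictionary instead of the file system. A real C preprocessor handles far more, such as function-like macros, conditional compilation, and <...> includes.

```python
import re


def preprocess(text, files, macros=None):
    """Toy preprocessor: handles '#define NAME value' and '#include "name"'.

    `files` maps include names to their contents (standing in for the disk).
    """
    macros = {} if macros is None else macros
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#include"):
            name = stripped.split('"')[1]                 # text between the quotes
            # File inclusion: the directive is replaced by the file's contents,
            # which are themselves preprocessed (they may define macros).
            out.append(preprocess(files[name], files, macros))
        elif stripped.startswith("#define"):
            _, name, value = stripped.split(maxsplit=2)
            macros[name] = value                          # remember the macro
        else:
            # Macro expansion: replace each whole-word use of a macro name.
            for name, value in macros.items():
                line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, v=value: v, line)
            out.append(line)
    return "\n".join(out)


files = {"config.h": "#define SIZE 10"}
source = '#include "config.h"\nint a[SIZE];'
print(preprocess(source, files))
```

The output is a blank line, where the #include directive stood, followed by int a[10];. The compiler proper only ever sees the expanded text.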
COUSINS OF COMPILER
Assembler
• An assembler creates object code by translating assembly-language instruction mnemonics into machine code. There are two types of assemblers:
• One-pass assemblers go through the source code once and assume that all symbols will be defined before any instruction that references them.
• Two-pass assemblers create a table with all symbols and their values in the first pass, and then use the table in a second pass to generate code.
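The two-pass idea can be sketched in Python for a tiny hypothetical machine. The mnemonics and opcode numbers in OPCODES are invented for this example. Every instruction is assumed to occupy one word, and an operand is either a label or an integer. The forward reference JMP end is exactly what a one-pass assembler cannot resolve without extra bookkeeping, because end has not yet been seen when JMP is read.

```python
# Hypothetical instruction set: mnemonic -> opcode; each instruction is one word.
OPCODES = {"LDA": 1, "ADD": 2, "STA": 3, "JMP": 4, "HLT": 5}


def assemble(lines):
    """Two-pass assembly. Returns (symbol_table, list of (opcode, operand))."""
    # Pass 1: assign an address to every instruction and record label addresses.
    symbols, program, address = {}, [], 0
    for line in lines:
        label, _, rest = line.rpartition(":")     # "end: HLT" -> ("end", ":", " HLT")
        if label:
            symbols[label.strip()] = address
        parts = rest.split()
        if parts:
            program.append(parts)
            address += 1
    # Pass 2: translate mnemonics; every label is now in the symbol table.
    code = []
    for mnemonic, *operands in program:
        if not operands:
            value = 0                              # no operand (e.g. HLT)
        elif operands[0] in symbols:
            value = symbols[operands[0]]           # label -> its address
        else:
            value = int(operands[0])               # numeric operand
        code.append((OPCODES[mnemonic], value))
    return symbols, code


source = [
    "start: LDA 5",
    "       JMP end",        # forward reference: 'end' is defined later
    "       ADD 1",
    "end:   HLT",
]
symbols, code = assemble(source)
print(symbols)   # {'start': 0, 'end': 3}
print(code)      # [(1, 5), (4, 3), (2, 1), (5, 0)]
```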
Language Processing System
A Language Processing System is a system that takes a program
written in a programming language and translates it into machine-
understandable form. This is essential because computers only
understand machine language (binary code), while humans write code
in high-level or low-level languages.
Language Processor
1. Editor
• Tool used to write and edit source code.
• Example: Visual Studio Code, Sublime Text
2. Preprocessor
• Handles preprocessor directives before compilation.
• Removes comments, expands macros, includes header files.
• Input: Source code (.c, .cpp)
• Output: Preprocessed code
3. Compiler
• Translates a high-level language (like C or C++) into assembly or machine code.
• Performs syntax and semantic analysis and optimization.
• Input: Preprocessed code
• Output: Assembly code, or object code (.o or .obj) when the compiler also performs the assembly step
4. Assembler
• Converts assembly code into machine code.
• Input: Assembly code
• Output: Object file (binary code)
5. Linker
• Combines multiple object files and libraries into a single executable
file.
• Resolves function calls and variable references across files.
6. Loader
• Loads the executable into memory for execution.
• Managed by the operating system.
Examples
• C/C++ → Uses compiler and linker
• Python → Uses interpreter (CPython)
• Java → Compiler (to bytecode) + JVM (interpreter or JIT)
Translator
A translator converts a program written in one language into another, usually into machine code. A compiler or an interpreter takes source code in a high-level language as input; an assembler takes assembly code.
Translator
Compiler vs. Interpreter vs. Assembler
Function
• Compiler: Translates high-level programming languages into machine code.
• Interpreter: Executes high-level programming code directly, line by line.
• Assembler: Converts assembly language into machine code.
Execution
• Compiler: Converts the entire program before execution.
• Interpreter: Translates and executes code line by line on the fly.
• Assembler: Processes assembly code into executable machine code, typically in a one-to-one translation.
Speed
• Compiler: Execution is fast after compilation because the program is directly in machine language.
• Interpreter: Slower than compiled code because translation occurs at runtime.
• Assembler: Fast, but writing code in assembly language is time-consuming and complex.
Error Detection
• Compiler: Errors are detected and must be corrected before execution.
• Interpreter: Errors are found and must be corrected at runtime, making it easier to debug specific lines of code.
• Assembler: Syntax errors are detected during the assembly process, but logical errors can only be found during execution.
Usage
• Compiler: Used for large applications where execution speed is critical.
• Interpreter: Suitable for scripting, small programs, and rapid development cycles.
• Assembler: Used for low-level programming tasks that require direct hardware manipulation.
Examples
• Compiler: C, C++, Rust
• Interpreter: Python, JavaScript, Ruby
• Assembler: x86 assembly language, ARM assembly language
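The difference between the compiler and interpreter columns can be made concrete with a tiny tree-walking interpreter. Instead of translating the syntax tree into machine code, it evaluates the tree directly while the program runs. The tuple tree format and the variable values are assumptions for this Python sketch.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def evaluate(node, env):
    """Evaluate an expression tree directly: a name, an integer, or (op, l, r)."""
    if isinstance(node, str):
        return int(node) if node.isdigit() else env[node]
    op, left, right = node
    return OPS[op](evaluate(left, env), evaluate(right, env))


def run(statements, env):
    """Interpret assignments ("=", name, expr) one at a time, in order."""
    for _, target, expr in statements:
        env[target] = evaluate(expr, env)    # executed immediately; no target code
    return env


program = [("=", "I", ("/", ("*", ("*", "P", "R"), "N"), "100"))]  # I = P * R * N / 100
print(run(program, {"P": 1000, "R": 5, "N": 2})["I"])                # 100.0
```

Every run repeats the tree walk. A compiler pays the translation cost once, which is why the table rates compiled code as faster.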
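Translator
Example: how each stage processes int a = 5;
• Lexical analysis. Input: int a = 5; Output: <KEYWORD, int>, <ID, a>, <ASSIGN, =>, <NUM, 5>
• Syntax analysis. Input: tokens. Output: parse tree (with rules like stmt → type id = num)
• Semantic analysis. Input: parse tree. Output: type-checked tree (e.g., int = int ✔)
• Intermediate code generation. Input: checked tree. Output: t1 = 5, a = t1
• Optimization. Input: IR. Output: IR with unused variables removed and expressions simplified
• Code generation. Input: optimized IR. Output: MOV R1, #5 followed by MOV a, R1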
Translator
The architecture of a translator is the sequence of compiler phases; the example above showed how a statement passes through each phase.
Now work through the phases for this statement:
I = P * R * N / 100
Translator
Lexical Analysis (Tokenizing), with the statement terminated as I = P * R * N / 100;
Lexeme → Token Type
I → Identifier
= → Assignment Operator
P → Identifier
* → Multiplication Operator
R → Identifier
* → Multiplication Operator
N → Identifier
/ → Division Operator
100 → Constant
; → Semicolon
Translator
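Parse tree for I = P * R * N / 100 (* and / are left-associative, so P * R is grouped first):
=
├── I
└── /
    ├── *
    │   ├── *
    │   │   ├── P
    │   │   └── R
    │   └── N
    └── 100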
Translator
3. Semantic Analysis
• Ensures all variables (I, P, R, N) are declared.
• Checks type compatibility (e.g., all numeric).
• Reports type mismatch if any.
4. Intermediate Code Generation
Generates 3-address code (TAC):
t1 = P * R
t2 = t1 * N
t3 = t2 / 100
I = t3
Translator
5. Code Optimization and Code Generation (the temporaries t1, t2 and t3 are kept in register R1 instead of being stored in memory):
MOV R1, P
MUL R1, R
MUL R1, N
MOV R2, #100
DIV R1, R2
MOV I, R1
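The step from the three-address code above to this target code can be sketched as a simple code generator. It targets the document's example register machine (MOV, MUL, DIV, registers R1 and R2, # for constants). The sketch assumes straight-line code in which each temporary is used only by the next instruction, so the running value can stay in R1. A real code generator needs register allocation for the general case.

```python
# Mnemonics of the hypothetical target machine used in the example.
OPS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def operand(x):
    """Integer constants become immediate operands (#100); names stay as they are."""
    return f"#{x}" if x.isdigit() else x


def gen_code(tac):
    """Translate TAC tuples (target, left, op, right) into target code.

    op is None for a copy (target = left). Assumes each temporary is consumed
    by the very next instruction, so it never has to leave register R1.
    """
    asm = []
    in_r1 = None                               # which value R1 currently holds
    for target, left, op, right in tac:
        if in_r1 != left:                      # load the left operand if needed
            asm.append(f"MOV R1, {operand(left)}")
        if op is None:                         # copy: store R1 into the target
            asm.append(f"MOV {target}, R1")
        elif right.isdigit():                  # constant operand goes through R2
            asm.append(f"MOV R2, #{right}")
            asm.append(f"{OPS[op]} R1, R2")
        else:
            asm.append(f"{OPS[op]} R1, {right}")
        in_r1 = target                         # R1 now holds the target's value
    return asm


tac = [("t1", "P", "*", "R"), ("t2", "t1", "*", "N"),
       ("t3", "t2", "/", "100"), ("I", "t3", None, None)]
for line in gen_code(tac):
    print(line)
```

For this TAC the sketch prints the six instructions listed above, from MOV R1, P to MOV I, R1. For the single copy a = 5, gen_code([("a", "5", None, None)]) gives MOV R1, #5 and MOV a, R1, the output in the earlier int a = 5; example.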
Compiler Construction Tools
Compiler construction tools are specialized software that help
developers create compilers more efficiently. Here are the key tools:
1. Parser Generators: These create syntax analyzers (parsers) from a grammatical description of a programming language (e.g., Yacc or Bison).
2. Scanner Generators: These produce lexical analyzers from regular expressions that define the tokens of a language (e.g., Lex or Flex).
3. Syntax-Directed Translation Engines: These produce routines that walk a parse tree and generate intermediate code, for example in three-address format.
4. Automatic Code Generators: These convert the intermediate language into machine language using template matching techniques.
5. Data-Flow Analysis Engines: These support code optimization by analyzing how values flow through different parts of the program.
6. Compiler Construction Toolkits: These provide integrated routines that facilitate the construction of various compiler components.
Types of Compilers
Types of Compiler
• Self Compiler: A compiler that runs on a machine and produces machine code for that same machine is called a self compiler or resident compiler.
• Cross Compiler: A compiler that runs on one machine and produces machine code for other computers is called a cross-compiler. It can create code for a platform other than the one on which the compiler is running.
• Source-to-Source Compiler: A source-to-source compiler (also called a transcompiler or transpiler) translates source code written in one programming language into source code of another programming language.
• Single Pass Compiler: A single-pass compiler reads the source program only once. All of its phases are combined in a single module that converts source code to machine code in that one pass.
• Two Pass Compiler: A two-pass compiler translates the program in two passes. The first pass (front end) analyzes the source and produces an intermediate representation, and the second pass (back end) generates the target code from it.
• Multi-Pass Compiler: A multi-pass compiler processes the program several times, creating several intermediate representations and traversing the syntax tree repeatedly. This breaks the compiler into smaller programs, one per pass.
• Just-in-Time (JIT) Compiler: It is a type of compiler that converts
code into machine language during program execution, rather than
before it runs. It combines the benefits of interpretation (real-time
execution) and traditional compilation (faster execution).
Operations of Compilers
Operations of Compiler
These are some operations that are done by the compiler.
• It breaks source programs into smaller parts.
• It enables the creation of symbol tables and intermediate
representations.
• It helps in code compilation and error detection.
• It keeps track of all variables and other identifiers in the code.
• It analyzes the full program and translates it.
• It converts source code to machine code.
Operations of Compilers
Features of Compiler
1. Lexical Analysis
• Breaks source code into tokens (keywords, identifiers, operators).
• Removes white spaces and comments.
2. Syntax Analysis
• Checks the grammar of the code using parse trees.
• Detects syntax errors (e.g., missing ; or braces).
3. Semantic Analysis
• Ensures logical correctness of the program.
• Checks type compatibility, undeclared variables, etc.
4. Intermediate Code Generation
• Converts code into a platform-independent intermediate
representation.
• Useful for portability and optimization.
5. Code Optimization
• Improves performance by reducing redundant or inefficient code.
• Example: replacing x + 0 with x, so the statement x = x + 0 can be removed entirely.
6. Code Generation
• Produces the target machine code (assembly or binary).
• Ensures correct memory and register usage.
7. Error Detection and Reporting
• Identifies and reports lexical, syntactic, and semantic errors. Runtime errors generally appear only when the program executes.
• Provides meaningful error messages to the programmer.
8. Symbol Table Management
• Maintains a symbol table for all identifiers (variables, functions).
• Stores scope, type, memory location, etc.
Compilers
Advantages of Compiler Design
• Efficiency: Compiled programs are generally more efficient than
interpreted programs because the machine code produced by the
compiler is optimized for the specific hardware platform on which it
will run.
• Portability: Once a program is compiled, the resulting machine code can run on any computer or device with compatible hardware and operating system, without needing the source code or the compiler. Supporting a different platform requires recompiling the source for that platform.
• Error Checking: Compilers perform comprehensive error checking during the compilation process, which can catch syntax and semantic errors before the code is run.
• Optimizations: Compilers can make various optimizations to the
generated machine code, such as eliminating redundant instructions or
rearranging code for better performance.
Compilers
Disadvantages of Compiler Design
• Longer Development Time: Developing a compiler is a complex and
time-consuming process that requires a deep understanding of both the
programming language and the target hardware platform.
• Debugging Difficulties: Debugging compiled code can be more
difficult than debugging interpreted code because the generated
machine code may not be easy to read or understand.
• Lack of Interactivity: Compiled programs are typically less
interactive than interpreted programs because they must be compiled
before they can be run, which can slow down the development and
testing process.
• Platform-Specific Code: If the compiler is designed to generate
machine code for a specific hardware platform, the resulting code may
not be portable to other platforms.