CSC501 (Autumn 2025-2026) L01 - L02


Compiler Design

CSC 501

SK Hafizul Islam

Department of CSE, IIIT Kalyani

July 16, 2025

(Department of CSE, IIIT Kalyani) Compiler Design July 16, 2025 1 / 35


Agenda I

1 Books
2 Time table
3 Evaluation Plan
4 Policies
5 Online Resources
6 Prerequisite
7 Objectives
8 Learning Outcomes
9 Course descriptions
10 Introduction
Computer Program
Machine Language
Assembly Language
11 Suggested Readings
Books

A. V. Aho, M. S. Lam, R. Sethi, J. D. Ullman, Compilers: Principles, Techniques, and
Tools, 2nd Ed., Addison-Wesley, 2006.
A. W. Appel, M. Ginsburg, Modern Compiler Implementation in C, Cambridge University
Press, 2004.
K. Cooper, L. Torczon, Engineering a Compiler, 2nd Ed., Morgan Kaufmann, 2011.
K. C. Louden, Compiler Construction: Principles and Practice, Cengage Learning, 1997.
D. Grune, H. Bal, C. Jacobs, K. Langendoen, Modern Compiler Design, Wiley, 2000.
A. A. Aaby, Compiler Construction using Flex and Bison.



Time table

Wednesday: 09:15 AM - 11:00 AM
Friday: 04:05 PM - 05:50 PM
Consultation hour: Monday, 3:00 PM - 4:00 PM
Contact me at [email protected]



Evaluation Plan

Component                       Weightage        Date and Time
Quiz Test/Assignment/Project    20% (10% × 2)    To be announced later
Mid Semester Examination        30%              To be announced by the Exam Cell
End Semester Examination        50%              To be announced by the Exam Cell



Academic Honesty and Integrity Policy

Academic honesty and integrity are to be maintained by all students throughout the
semester, and no academic dishonesty is acceptable. In particular, no form of plagiarism
shall be tolerated. A student found to have plagiarized shall be awarded ZERO marks, and
the case may be reported to the appropriate committee of the Institute for necessary action.
Do not be late to class.
Keep your mobile phones switched off.



Policies

Notices
☞ All notices related to the course will be posted on Google Classroom/Group email.
➠ Link: https://meet.google.com/dei-fwvu-age
➠ Password: hi7sm5bz
De-Registration Policy
☞ A student will be de-registered from this course if attendance at mid-semester is below
50% and overall attendance is below 75%.
☞ Higher attendance will help move borderline cases to the next higher grade.
Makeup Policy
☞ For Mid-Sem./End-Sem., as per Institute rules.
☞ No Makeup for Assignment/Surprise Quiz/Project



Online Resources

Compiler Design (Prof. S. Chattopadhyay, IIT Kharagpur)
https://nptel.ac.in/courses/106105190
Compiler Design (Prof. Sanjeev K. Aggarwal, IIT Kanpur)
https://nptel.ac.in/courses/106104123
Compiler Design (Prof. Rupesh Nasre, IIT Madras)
https://nptel.ac.in/courses/106106237
Compiler Design (Prof. Y. N. Srikant, IISc Bangalore)
https://nptel.ac.in/courses/106108052



Prerequisite

C/C++ Programming (CSC 101)
Data Structures and Algorithms (CSC 201)
Discrete Mathematics (CSC 303)
Formal Languages and Automata Theory (CSC 402)



Objectives

The objectives of this course are to
☞ Understand the basic concepts and phases of a compiler.
☞ Learn formal methods and tools used for lexical, syntax, and semantic analysis.
☞ Explore techniques for intermediate code generation, code optimization, and target code
generation.
☞ Gain knowledge of runtime environments and memory organization.
☞ Develop the ability to implement key components of a compiler using programming tools.



Learning Outcomes

Upon successful completion of this course, students will be able to:
☞ Describe the major phases of a compiler and the data structures used in each phase.
☞ Design lexical analyzers using regular expressions and finite automata.
☞ Construct parsers for CFGs and resolve syntactic and semantic errors.
☞ Generate intermediate code and apply basic optimization techniques.
☞ Explain memory management during program execution using runtime environments.
☞ Implement simplified compiler components using tools like Lex/Flex and Yacc/Bison.
☞ Analyze and debug compiler-generated errors and suggest corrections.



Course descriptions

Introduction
☞ What is Machine Language?
☞ What is Assembly Language?
☞ What is an Interpreter?
☞ What is a Compiler?
☞ Why is a Compiler needed?
☞ Structure of a Compiler



Course descriptions

Lexical Analysis
☞ The source code (program) is the input to the Lexical Analyzer, which:
✔ Removes whitespace and comments
✔ Recognizes lexemes and classifies them into token types
✔ Reports lexical errors
✔ Supplies tokens to the syntax analyzer
☞ A sequence of tokens (e.g., identifiers, keywords, operators) is the output of the Lexical
Analyzer.
☞ Techniques Used:
✔ Regular Expressions to define token patterns
✔ Finite Automata (DFA/NFA) to recognize patterns
☞ Tool used to implement the Lexical Analyzer:
✔ Lex/Flex
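
The lexical analysis steps above can be sketched in a few lines of Python. This is only an illustration: the token names and patterns in TOKEN_SPEC describe a hypothetical tiny C-like language and are not taken from the course or from Lex/Flex. Each token type is a regular expression, whitespace and comments are discarded, and any character that matches no pattern is reported as a lexical error.

```python
import re

# Hypothetical token patterns for a tiny C-like language (an assumption,
# not part of the slides). Order matters: earlier patterns win.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),
    ("WS", r"\s+"),
    ("KEYWORD", r"\b(?:int|if|else|while|return)\b"),
    ("ID", r"[A-Za-z_]\w*"),
    ("NUM", r"\d+"),
    ("OP", r"==|[+\-*/=<>]"),
    ("PUNCT", r"[;(){}]"),
    ("ERROR", r"."),  # any other single character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, errors) for the given source text."""
    tokens, errors = [], []
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind in ("WS", "COMMENT"):
            continue  # whitespace and comments are removed
        if kind == "ERROR":
            errors.append((lexeme, match.start()))
        else:
            tokens.append((kind, lexeme))
    return tokens, errors
```

Calling tokenize("int x = 5;") yields one KEYWORD, one ID, one OP, one NUM, and one PUNCT token. A tool such as Flex generates a DFA-based scanner from similar patterns instead of trying them one after another.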



Course descriptions

Syntax Analysis
☞ A sequence of tokens is the input to the Syntax Analyzer, which:
✔ Checks the grammatical correctness of the source code (structure/form)
✔ Detects syntax errors and reports them clearly
✔ Builds parse trees based on grammar rules
☞ The Parse Tree or Abstract Syntax Tree (AST) is the output of the Syntax Analyzer.
☞ Techniques Used:
✔ Context-Free Grammar (CFG)
✔ Pushdown Automata (PDA)
☞ Types of Parsers:
✔ Top-Down Parsing (e.g., Recursive Descent, Predictive, LL(1))
✔ Bottom-Up Parsing (e.g., Shift-Reduce, Operator Precedence, LR(0), SLR, LR(1), LALR(1))
☞ Tool used to implement the Syntax Analyzer:
✔ Yacc/Bison
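
As a rough illustration of top-down parsing, here is a minimal recursive-descent parser sketch in Python. The grammar (expr, term, factor for arithmetic expressions) and the tuple-based tree are assumptions chosen for brevity, not the course's own grammar. Each nonterminal becomes one function, and the result is a small abstract syntax tree.

```python
import re


def parse(text):
    """Parse an arithmetic expression into a nested-tuple AST.

    Assumed grammar (illustrative only):
        expr   -> term (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> NUM | ID | '(' expr ')'
    Characters outside this tiny alphabet are silently skipped by the
    simplistic tokenizer below.
    """
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())  # left-associative
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        tok = advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok in {"+", "-", "*", "/", ")"}:
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok  # a number or an identifier

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

For example, parse("a + 2 * (b - 3)") returns ("+", "a", ("*", "2", ("-", "b", "3"))), which shows that multiplication binds tighter than addition. Yacc/Bison would instead generate a bottom-up (LALR) parser from the grammar.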



Course descriptions

Semantic Analysis
☞ The Parse Tree is the input to the Semantic Analyzer, which handles:
✔ Type checking (e.g., int + string is an error)
✔ Scope resolution (ensuring variables/functions are declared)
✔ Function/procedure calls (number/type of arguments match)
✔ Name binding (connecting each variable use with its declaration)
✔ Symbol tables
✔ Reporting semantic errors such as undeclared variables, type mismatches, etc.
☞ The Annotated Parse Tree or Intermediate Representation (IR) is the output of the Semantic
Analyzer.
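
A minimal sketch of two of these checks, type checking and detection of undeclared variables, is given below. The node shapes ("num", "str", "var", "+") and the dictionary used as a symbol table are hypothetical choices made for this example only.

```python
def check_expr(node, symbols):
    """Return the type of an expression node, or raise on a semantic error.

    `node` is a tuple such as ("num", 1), ("str", "hi"), ("var", "x") or
    ("+", left, right); `symbols` maps declared names to their types.
    Both representations are assumptions made for this sketch.
    """
    kind = node[0]
    if kind == "num":
        return "int"
    if kind == "str":
        return "string"
    if kind == "var":
        name = node[1]
        if name not in symbols:  # scope resolution / name binding
            raise NameError(f"undeclared variable '{name}'")
        return symbols[name]
    if kind == "+":
        left = check_expr(node[1], symbols)
        right = check_expr(node[2], symbols)
        if left != right:  # type checking: int + string is an error
            raise TypeError(f"type mismatch: {left} + {right}")
        return left
    raise ValueError(f"unknown node kind {kind!r}")
```

With symbols = {"x": "int"}, the expression ("+", ("var", "x"), ("num", 1)) has type "int", while ("+", ("var", "x"), ("str", "a")) raises a type mismatch error.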



Course descriptions

Intermediate Code Generation
☞ The Annotated Parse Tree is the input to the Intermediate Code Generator.
☞ Machine-independent intermediate code of the program is the output of the Intermediate Code
Generator.
☞ Common Forms of Intermediate Code:
✔ Three-Address Code (TAC)
✔ Quadruples and Triples
✔ Static Single Assignment (SSA)
✔ Postfix notation (for expressions)
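
To make three-address code concrete, the sketch below walks an expression tree and emits one instruction per operator, each with at most one operator on the right-hand side. The tuple tree format and the temporary names t1, t2, ... are assumptions for illustration.

```python
def to_tac(expr):
    """Translate a nested-tuple expression into three-address code.

    `expr` is either a string (a variable or constant) or a tuple
    (op, left, right). Returns (list of instructions, name of the result).
    """
    code = []
    counter = 0

    def gen(node):
        nonlocal counter
        if isinstance(node, str):
            return node  # a leaf needs no instruction
        op, left, right = node
        left_name = gen(left)
        right_name = gen(right)
        counter += 1
        temp = f"t{counter}"  # fresh temporary for this sub-result
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = gen(expr)
    return code, result
```

For a + b * c, written as ("+", "a", ("*", "b", "c")), this produces the two instructions t1 = b * c and t2 = a + t1. Storing each instruction as a record (op, arg1, arg2, result) instead of a string would give the quadruple form.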



Course descriptions

Code Optimization
☞ Generate faster, smaller, and more efficient target code
☞ Types of Optimization:
✔ Machine-independent optimization (e.g., constant folding)
✔ Machine-dependent optimization (e.g., register allocation)
☞ Optimization Techniques
✔ Constant folding: Compute constant expressions at compile time
✔ Dead code elimination: Remove code that doesn’t affect output
✔ Common subexpression elimination: Avoid redundant calculations
✔ Loop optimization: Move invariant code outside loops
✔ Strength reduction: Replace expensive operations with cheaper ones
☞ Levels of Optimization:
✔ Local (within a basic block)
✔ Global (across basic blocks; across functions it is called interprocedural)
✔ Loop-level (targeting loops for performance)
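
Constant folding, the first technique listed, can be sketched as a bottom-up pass over an expression tree. The tuple representation and the restriction to +, - and * on integers are simplifying assumptions made here.

```python
import operator

# Only operators that are always safe to evaluate at compile time are
# folded in this sketch (division is left out to avoid division by zero).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Replace constant subexpressions by their values.

    `node` is an int constant, a str variable name, or (op, left, right).
    """
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)  # computed at "compile time"
    return (op, left, right)
```

For example, x + 2 * 3 written as ("+", "x", ("*", 2, 3)) folds to ("+", "x", 6), so the multiplication no longer happens at run time.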



Course descriptions

Code Generation
☞ Optimized Intermediate Representation (IR) is the input to the Code Generator.
✔ Instruction selection: Map IR operations to machine instructions
✔ Register allocation: Assign variables to CPU registers efficiently
✔ Instruction scheduling: Reorder instructions to minimize stalls/delays
✔ Address and offset computation for variables and memory locations
☞ Target Machine Code (assembly or binary) is the output of the Code Generator.



Course descriptions

Error Handling and Recovery
☞ Detect and report errors during compilation, and recover to continue processing.
☞ Types of Errors:
✔ Lexical errors: Invalid characters or tokens (e.g., a stray @ as in x = @5)
✔ Syntax errors: Violation of grammar rules (e.g., missing semicolon)
✔ Semantic errors: Meaning-related issues (e.g., type mismatch)
✔ Runtime errors: Detected during program execution (e.g., divide by zero)
☞ Error Recovery Strategies:
✔ Panic Mode: Skip tokens until a synchronizing token is found
✔ Phrase-Level Recovery: Make minimal changes to continue parsing
✔ Error Productions: Extend grammar to handle common errors
✔ Global Correction: Find a minimal set of changes that makes the input valid (rare in practice)
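
Panic-mode recovery can be illustrated with a toy parser. The statement form ID = NUM ; and the choice of ';' as the synchronizing token are assumptions made for this sketch. On an error, the parser records a message, discards tokens up to and including the next ';', and then continues with the following statement.

```python
def parse_statements(tokens):
    """Parse statements of the form  ID = NUM ;  with panic-mode recovery.

    `tokens` is a list of strings. Returns (statements, error messages).
    """
    stmts, errors = [], []
    i = 0
    while i < len(tokens):
        chunk = tokens[i:i + 4]
        if (len(chunk) == 4 and chunk[0].isidentifier() and chunk[1] == "="
                and chunk[2].isdigit() and chunk[3] == ";"):
            stmts.append((chunk[0], int(chunk[2])))
            i += 4
            continue
        errors.append(f"syntax error near token {i} ({tokens[i]!r})")
        # Panic mode: discard tokens until the synchronizing token ';'.
        while i < len(tokens) and tokens[i] != ";":
            i += 1
        i += 1  # also consume the ';' itself
    return stmts, errors
```

On the input a = 1 ; b = = 2 ; c = 3 ; the second statement is reported once and skipped, and the first and third statements are still parsed. This is why one mistake does not hide later errors from the programmer.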



Introduction



Computer Program

A computer program is a set of instructions (the smallest units of execution) that
perform a particular task to produce a particular result.
Programming languages are used to express the solution to a real-life problem as a computer
program.
Before a program can be run, it must be translated into machine code, which can
be executed by a computer.



Machine Language

Machine language, also known as machine code, is the most basic programming language
that a CPU can execute directly.
It consists of binary code, which is a series of 0s and 1s, and it is the lowest level of
programming language, closest to the hardware.
Each instruction in machine language performs a very specific task, such as a store,
load, jump, or arithmetic operation.

Example 1
Write a program to add two numbers in machine language.



Machine Language

We consider a simplified hypothetical 8-bit CPU with a small set of instructions:
➠ Load: Load a value from memory into a register.
➠ Add: Add the value in one register to another.
➠ Store: Store the value from a register into memory.
➠ Halt: Stop the execution of the program.
Let’s define some simple opcodes (operation codes) for these instructions:
➠ 0001 for Load
➠ 0010 for Add
➠ 0011 for Store
➠ 1111 for Halt



Machine Language

We assume that the CPU has two registers: R1 and R2.
The two numbers are stored in memory locations 0x10 and 0x11, and the result is stored in
memory location 0x12. The program performs the following steps:
➠ Load the value from memory location 0x10 into R1.
➠ Load the value from memory location 0x11 into R2.
➠ Add the value in R2 to R1.
➠ Store the result from R1 into memory location 0x12.
➠ Halt the program.
Let’s assume the following binary representations:
➠ R1 is represented as 0001
➠ R2 is represented as 0010
Memory addresses:
➠ 0x10 is 00010000
➠ 0x11 is 00010001
➠ 0x12 is 00010010
Machine Language

Here is the program in machine code:

0001 0001 00010000 ; Load memory[0x10] into R1
0001 0010 00010001 ; Load memory[0x11] into R2
0010 0001 0010     ; Add R2 to R1
0011 0001 00010010 ; Store R1 into memory[0x12]
1111               ; Halt
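
To see what the CPU does with these bit patterns, the following Python sketch decodes and executes the program. It is a toy simulator written for this hypothetical machine only. The opcodes, register codes, and addresses come from the slides above; the dictionary used as memory and the 8-bit wrap-around on addition are assumptions.

```python
# The machine-code program from the slide, one instruction per line.
PROGRAM = """
0001 0001 00010000
0001 0010 00010001
0010 0001 0010
0011 0001 00010010
1111
"""


def run(program, memory):
    """Execute the program on a dict-based memory and return the memory."""
    regs = {0b0001: 0, 0b0010: 0}  # R1 and R2
    for line in program.splitlines():
        fields = [int(field, 2) for field in line.split()]
        if not fields:
            continue  # skip blank lines
        opcode, operands = fields[0], fields[1:]
        if opcode == 0b0001:    # Load reg, addr
            reg, addr = operands
            regs[reg] = memory[addr]
        elif opcode == 0b0010:  # Add dst, src
            dst, src = operands
            # Assumption: results wrap around at 8 bits.
            regs[dst] = (regs[dst] + regs[src]) & 0xFF
        elif opcode == 0b0011:  # Store reg, addr
            reg, addr = operands
            memory[addr] = regs[reg]
        elif opcode == 0b1111:  # Halt
            break
        else:
            raise ValueError(f"unknown opcode {opcode:04b}")
    return memory
```

Running run(PROGRAM, {0x10: 7, 0x11: 5}) leaves the sum 12 in memory location 0x12.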



Problems with Machine Language

Difficult to Write and Read: Machine code is composed of binary digits (0s and 1s),
which are extremely difficult for humans to read, write, and understand.
No Abstraction: There is no abstraction; every instruction must be specified in terms of
the hardware’s instruction set.
High Error-Proneness: Writing code in machine language involves manually encoding
each instruction, which is highly error-prone. Identifying and fixing errors in machine code
is extremely challenging due to the lack of readability and the high level of detail.
Hardware Dependency: Machine code is specific to a particular CPU architecture. Code
written for one type of CPU will not run on another without modification.



Assembly Language

Low-level programming language, closer to hardware than high-level languages
Uses mnemonics instead of binary (e.g., MOV, ADD, SUB)
Uses symbols to represent operations, addresses, labels, macros, and constants, making it
more readable than binary machine code
Can be faster and more efficient than high-level language code when carefully hand-written
Still hardware-dependent: specific to a CPU architecture (e.g., 8086, ARM, MIPS)
Requires an assembler to convert assembly code into machine language
Requires a deep understanding of computer architecture and registers
Often used in embedded systems, bootloaders, and device drivers
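
The job of an assembler can be sketched for the hypothetical 8-bit CPU used in the machine language example. The mnemonics LOAD, ADD, STORE, HALT and the operand syntax (register names R1/R2, hexadecimal addresses) are invented here for illustration; only the opcode and register bit patterns come from the earlier slides.

```python
# Bit patterns from the machine language example; the mnemonic spellings
# are an assumption made for this sketch.
OPCODES = {"LOAD": "0001", "ADD": "0010", "STORE": "0011", "HALT": "1111"}
REGISTERS = {"R1": "0001", "R2": "0010"}


def assemble_line(line):
    """Translate one assembly line into its binary encoding (as text)."""
    parts = line.replace(",", " ").split()
    mnemonic, args = parts[0].upper(), parts[1:]
    fields = [OPCODES[mnemonic]]
    for arg in args:
        if arg.upper() in REGISTERS:
            fields.append(REGISTERS[arg.upper()])
        else:
            # Anything else is treated as a hexadecimal 8-bit address.
            fields.append(format(int(arg, 16), "08b"))
    return " ".join(fields)


def assemble(source):
    """Assemble a multi-line program, skipping blank lines."""
    return [assemble_line(line) for line in source.splitlines() if line.strip()]
```

For instance, assemble_line("LOAD R1, 0x10") gives "0001 0001 00010000", the first instruction of the earlier machine-code program. A real assembler also resolves labels and macros, which this sketch omits.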



Registers in 8086

The 8086 has 8 general-purpose registers, each 16 bits wide. Four of them (AX, BX, CX, DX)
can also be accessed as two 8-bit halves.

16-bit   High 8 bits   Low 8 bits
AX       AH            AL
BX       BH            BL
CX       CH            CL
DX       DH            DL
SP       -             -
BP       -             -
SI       -             -
DI       -             -
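
The relationship between a 16-bit register and its two 8-bit halves is plain bit arithmetic, as this small sketch shows. The function names are hypothetical helpers, not 8086 instructions.

```python
def split_halves(value16):
    """Split a 16-bit value (e.g., AX) into (high, low) bytes (AH, AL)."""
    value16 &= 0xFFFF
    return value16 >> 8, value16 & 0xFF


def join_halves(high, low):
    """Combine two bytes (e.g., AH and AL) into one 16-bit value (AX)."""
    return ((high & 0xFF) << 8) | (low & 0xFF)
```

If AX holds 1234H, then AH holds 12H and AL holds 34H; writing a new value to AL changes only the low byte of AX.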



Registers in 8086

General-Purpose Registers:
☞ AX (Accumulator Register): Divided into two 8-bit registers, AH and AL. It is used in
arithmetic, logic, and data transfer operations.
☞ BX (Base Register): Divided into two 8-bit registers, BH and BL. It is used to hold the
base address (offset) for memory access.
☞ CX (Count Register): Divided into two 8-bit registers, CH and CL. It is used as a
counter in loops and string operations.
☞ DX (Data Register): Divided into two 8-bit registers, DH and DL. It is used in
multiplication/division and I/O operations.



Registers in 8086

Segment Registers
☞ CS (Code Segment): Holds the starting address of the segment containing the executable
code.
☞ DS (Data Segment): Holds the starting address of the segment containing data.
☞ SS (Stack Segment): Holds the starting address of the stack segment.
☞ ES (Extra Segment): Used as an additional data segment for certain operations, particularly
string operations (MOVS, LODS, STOS).
The 8086 uses a segmented memory architecture, and these segment registers hold the base
addresses of various memory segments (each up to 64 KB).
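
On the 8086, a 16-bit segment register is combined with a 16-bit offset to form a 20-bit physical address: the segment value is shifted left by 4 bits (multiplied by 16) and the offset is added. The sketch below shows the arithmetic; the function name is a hypothetical helper.

```python
def physical_address(segment, offset):
    """Compute the 20-bit 8086 physical address for segment:offset."""
    # Shifting by 4 bits multiplies the segment value by 16; the mask
    # keeps 20 bits, so addresses wrap around at 1 MB.
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) & 0xFFFFF
```

For example, with a segment value of 1234H and an offset of 0010H, the physical address is 12340H + 0010H = 12350H. Because the offset is 16 bits, one segment spans at most 64 KB.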



Registers in 8086

Pointer and Index Registers
☞ SP (Stack Pointer): Points to the topmost item of the stack.
☞ BP (Base Pointer): Primarily used to access parameters passed on the stack. Its
offset is relative to the stack segment.
☞ SI (Source Index): Used for pointer addressing of data and as the source in some
string-related operations. Its offset is relative to the data segment.
☞ DI (Destination Index): Used for pointer addressing of data and as the destination in
some string-related operations. In string operations, its offset is relative to the extra segment.
Pointer and index registers are used to efficiently access memory and handle stack and
string operations.



Registers in 8086

Special-Purpose Registers
Special-purpose registers are used to control program execution, manage memory access, and
monitor the status of the processor.
☞ IP (Instruction Pointer): Holds the offset, within the code segment, of the next instruction
to be executed.
☞ Flags Register: Indicates the status of the processor. Each bit in this register represents a
different flag: Carry Flag (CF), Parity Flag (PF), Auxiliary Carry Flag (AF), Zero Flag (ZF),
Sign Flag (SF), Trap Flag (TF), Interrupt Enable Flag (IF), Direction Flag (DF), Overflow
Flag (OF).
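
As an illustration of how the six status flags relate to an arithmetic result, the sketch below computes them for a 16-bit addition. It is a simplified model written for this note, not an emulator, and the function name and dictionary layout are assumptions. The control flags (TF, IF, DF) are not affected by addition and are left out.

```python
def add16_flags(a, b):
    """Add two 16-bit values and return (result, status flags)."""
    a &= 0xFFFF
    b &= 0xFFFF
    full = a + b
    result = full & 0xFFFF
    flags = {
        "CF": full > 0xFFFF,                               # carry out of bit 15
        "ZF": result == 0,                                 # result is zero
        "SF": bool(result & 0x8000),                       # sign bit of result
        "OF": bool((a ^ result) & (b ^ result) & 0x8000),  # signed overflow
        "PF": bin(result & 0xFF).count("1") % 2 == 0,      # even parity, low byte
        "AF": bool((a ^ b ^ result) & 0x10),               # carry out of bit 3
    }
    return result, flags
```

Adding 7FFFH and 0001H gives 8000H with OF and SF set but CF clear: the signed result overflowed even though there was no unsigned carry. Adding FFFFH and 0001H gives 0000H with CF and ZF set.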



Registers in 8086

Write a program to add two numbers on the 8086 microprocessor.

MOV AX, [2000H]   ; Load the first number from memory location 2000H into AX
MOV BX, [2002H]   ; Load the second number from memory location 2002H into BX
ADD AX, BX        ; Add the value in BX to AX
MOV [2004H], AX   ; Store the result from AX into memory location 2004H
HLT               ; Halt the program
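
The effect of this program on memory can be traced with a small Python model. It assumes a flat byte array for memory (that is, DS is taken to be 0) and relies on the fact that the 8086 stores 16-bit words in little-endian order, low byte first. The helper names are hypothetical.

```python
def read_word(mem, addr):
    """Read a 16-bit little-endian word from memory."""
    return mem[addr] | (mem[addr + 1] << 8)


def write_word(mem, addr, value):
    """Write a 16-bit word to memory, low byte first."""
    mem[addr] = value & 0xFF
    mem[addr + 1] = (value >> 8) & 0xFF


def add_two_numbers(mem):
    """Mirror the five-instruction program, one line per instruction."""
    ax = read_word(mem, 0x2000)   # MOV AX, [2000H]
    bx = read_word(mem, 0x2002)   # MOV BX, [2002H]
    ax = (ax + bx) & 0xFFFF       # ADD AX, BX (16-bit wrap-around)
    write_word(mem, 0x2004, ax)   # MOV [2004H], AX
    return ax                     # HLT
```

If the words at 2000H and 2002H are 1234H and 0101H, the word written at 2004H is 1335H, stored as the bytes 35H and 13H.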



Suggested Readings

Chapter 1, A. V. Aho, M. S. Lam, R. Sethi, J. D. Ullman, “Compilers: Principles,
Techniques, and Tools”, 2nd Ed., Addison-Wesley, 2006.



Thank You
