COMPILER
Presented to: Sir Naeem
Presented by: Sahil (BSIT-2024-015)
COMPILER
A compiler is a computer program that transforms source code
written in a programming language (the source language) into
another computer language (the target language), the latter
often taking a binary form known as object code
The usual goal is to create an executable program
Cause
Software for early computers was written in
assembly language
The benefits of reusing software on different
CPUs started to become significantly greater
than the cost of writing a compiler
The first real compiler
FORTRAN compilers of the late 1950s
18 person-years to build
Structure of Compiler
Any compiler must perform two major tasks
Analysis of the source program
Synthesis of a machine-language program
THE STRUCTURE OF A COMPILER (2)
Source Program (Character Stream)
  -> Scanner -> Tokens
  -> Parser -> Syntactic Structure
  -> Semantic Routines -> Intermediate Representation
  -> Optimizer
  -> Code Generator
  -> Target machine code
Symbol and Attribute Tables (Used by all Phases of The Compiler)
THE STRUCTURE OF A COMPILER (3)
Scanner
The scanner begins the analysis of the source program by
reading the input, character by character, and grouping
characters into individual words and symbols (tokens)
RE ( Regular expression )
NFA ( Non-deterministic Finite Automata )
DFA ( Deterministic Finite Automata )
LEX
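As a minimal sketch of the scanning step, the Python snippet below groups a character stream into tokens using one regular expression per token class. The token class names (NUM, ID, OP, SKIP) and the `tokenize` helper are illustrative assumptions, not the project's actual LEX specification.

```python
import re

# Hypothetical token classes; each one is described by a regular expression (RE).
TOKEN_SPEC = [
    ("NUM",  r"\d+"),
    ("ID",   r"[A-Za-z][A-Za-z0-9_]*"),
    ("OP",   r"==|!=|>=|<=|[-+*/%=<>()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(text):
    """Read the input left to right and return a list of (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":   # whitespace separates tokens but is not one
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("while( flg == 1 )"))
```

In this sketch keywords such as `while` come out as ID tokens; a fuller scanner would typically look each ID lexeme up in a keyword table afterwards.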
THE STRUCTURE OF A COMPILER (4)
Parser
Given a formal syntax specification (typically as a
context-free grammar [CFG] ), the parser reads tokens and groups
them into units as specified by the productions of the CFG being used.
As syntactic structure is recognized, the parser either calls
corresponding semantic routines directly or builds a syntax tree.
CFG ( Context-Free Grammar )
BNF ( Backus-Naur Form )
GAA ( Grammar Analysis Algorithms )
LL, LR, SLR, LALR Parsers
YACC
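To illustrate how a parser groups tokens according to grammar productions, here is a small recursive-descent (LL(1)) sketch in Python that builds a syntax tree for arithmetic expressions. The grammar, the `lex` helper, and the tuple-based tree shape are assumptions chosen for the example; the project itself may use a different grammar or a YACC-generated parser.

```python
import re

def lex(text):
    # Simplified tokenizer for this sketch: numbers, names, single-character operators.
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/%()]", text)

class Parser:
    """Recursive-descent parser for the assumed grammar
         expr   -> term   { (+|-) term }
         term   -> factor { (*|/|%) factor }
         factor -> NUM | ID | ( expr )
    """
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/", "%"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok in ("+", "-", "*", "/", "%", ")"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok          # leaf: a number or an identifier

def parse(text):
    parser = Parser(lex(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree

print(parse("i+9*k+b"))
```

Each grammar rule becomes one method, so the call structure mirrors the productions, and `*` binds tighter than `+` because `term` is called from inside `expr`.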
THE STRUCTURE OF A COMPILER (5)
Semantic Routines
Perform two functions
Check the static semantics of each construct
Do the actual translation
The heart of a compiler
Syntax Directed Translation
Semantic Processing Techniques
IR (Intermediate Representation)
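The two functions of the semantic routines can be sketched together: the Python example below first checks that every variable in an assignment was declared as an integer, then translates the syntax tree into three-address intermediate code. The tree shape (nested tuples), the `symbols` table, the temporary names `t1`, `t2`, ... and the `translate` helper are all assumptions made for illustration, not the project's actual IR.

```python
class SemanticError(Exception):
    pass

def translate(stmt, symbols):
    """Check one assignment (target, tree) and return three-address IR lines.

    tree is a nested tuple (op, left, right) or a leaf string.
    symbols maps declared variable names to their type.
    """
    code = []
    temp_count = 0

    def visit(node):
        nonlocal temp_count
        if isinstance(node, tuple):
            op, left, right = node
            l, r = visit(left), visit(right)
            temp_count += 1
            temp = f"t{temp_count}"
            code.append(f"{temp} = {l} {op} {r}")
            return temp
        if node.isdigit():          # integer constant
            return node
        # Static semantic check: every variable used must be a declared int.
        if symbols.get(node) != "int":
            raise SemanticError(f"'{node}' is not a declared int variable")
        return node

    target, tree = stmt
    if symbols.get(target) != "int":
        raise SemanticError(f"'{target}' is not a declared int variable")
    result = visit(tree)
    code.append(f"{target} = {result}")
    return code

symbols = {"i": "int", "k": "int", "b": "int", "mes1": "string"}
tree = ("+", ("+", "i", ("*", "9", "k")), "b")   # i + 9*k + b
for line in translate(("i", tree), symbols):
    print(line)
```

Doing the check and the translation in the same tree walk is one simple form of syntax-directed translation.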
THE STRUCTURE OF A COMPILER (6)
Optimizer
The IR code generated by the semantic routines is analyzed and
transformed into functionally equivalent but improved IR code
This phase can be very complex and slow
Peephole optimization, loop optimization, register
allocation, code scheduling
Register and Temporary Management
Peephole Optimization
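As a rough illustration of peephole optimization, the Python sketch below slides a small window over a list of instructions and removes patterns that do no useful work. It operates on instruction strings written in the style of this deck's output listings (for example `MUL 1`), not on verified 8086 encodings, and it assumes that neither AX nor the flags are read again before being overwritten. The two rewrite rules and the `peephole` helper are assumptions for the example.

```python
def peephole(instrs):
    """Repeatedly apply small local rewrites until nothing changes."""
    changed = True
    while changed:
        changed = False
        out = []
        i = 0
        while i < len(instrs):
            cur = instrs[i]
            nxt = instrs[i + 1] if i + 1 < len(instrs) else None
            # Rule 1: multiplying by 1 or adding 0 leaves the value in AX unchanged.
            if cur in ("MUL 1", "ADD AX, 0"):
                i += 1
                changed = True
                continue
            # Rule 2: "MOV AX, x" followed by "MOV x, AX" stores x back into itself.
            if (nxt is not None and cur.startswith("MOV AX, ")
                    and nxt == f"MOV {cur[len('MOV AX, '):]}, AX"):
                i += 2
                changed = True
                continue
            out.append(cur)
            i += 1
        instrs = out
    return instrs

listing = ["MOV AX, k", "MUL 1", "MOV k, AX",      # k = k * 1
           "MOV AX, i", "ADD AX, 9", "MOV i, AX"]  # i = i + 9
print(peephole(listing))
```

Here the first pass drops `MUL 1`, which exposes the load/store pair for `k` to the second rule on the next pass; that is why the rewrites are repeated until a fixed point is reached.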
THE STRUCTURE OF A COMPILER (7)
Code Generator
Interpretive Code Generation
Generating Code from Tree/Dag
Grammar-Based Code Generator
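Generating code from a tree can be sketched as a post-order walk that leaves each subexpression's value in the accumulator. The Python example below emits accumulator-style instructions in the same notation as this deck's output listings; the mnemonic templates, the temporaries `T1`, `T2`, ... and the `gen_assign` helper are assumptions for illustration and are not checked against a real 8086 assembler.

```python
# Assumed instruction templates, written the way the deck's listings write them.
OPCODES = {"+": "ADD AX, {}", "-": "SUB AX, {}", "*": "MUL {}"}

def gen_assign(target, tree):
    """Emit code that evaluates tree into AX and then stores AX into target."""
    code = []
    temps = 0

    def gen(node):
        nonlocal temps
        if isinstance(node, str):           # leaf: variable name or constant
            code.append(f"MOV AX, {node}")
            return
        op, left, right = node
        if isinstance(right, str):          # simple right operand: use it directly
            gen(left)
            operand = right
        else:                               # compound right operand: spill to a temporary
            gen(right)
            temps += 1
            operand = f"T{temps}"
            code.append(f"MOV {operand}, AX")
            gen(left)
        code.append(OPCODES[op].format(operand))

    gen(tree)
    code.append(f"MOV {target}, AX")
    return code

for line in gen_assign("i", ("+", "i", "9")):   # i = i + 9
    print(line)
```

Spilling a compound right operand to a fresh temporary keeps the sketch correct for nested expressions at the cost of extra moves, which a later optimization pass could remove.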
THE STRUCTURE OF A COMPILER (8)
Scanner [Lexical Analyzer]
  -> Tokens
Parser [Syntax Analyzer]
  -> Parse tree
Semantic Process [Semantic analyzer]
  -> Abstract Syntax Tree w/ Attributes
Code Generator [Intermediate Code Generator]
  -> Non-optimized Intermediate Code
Code Optimizer
  -> Optimized Intermediate Code
Code Generator
  -> Target machine code
Language Description
Identifier Rules
•An identifier can be at most 6 characters long.
•Identifiers are not case sensitive.
•An identifier can contain only alphanumeric characters (a-z,
A-Z, 0-9) and the underscore (_).
•The first character of an identifier must be a letter
(a-z, A-Z).
•Keywords cannot be used as identifiers.
•No special characters, such as semicolon, period,
whitespace, slash or comma, are permitted in
or as an identifier.
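The identifier rules above can be sketched as a single regular expression plus a keyword check. In this Python example the keyword set is an assumption gathered from the statements and sample programs shown in this deck, and `is_valid_identifier` is a hypothetical helper name.

```python
import re

# Assumed keyword list, collected from the statements shown in this deck.
KEYWORDS = {
    "int", "string", "if", "then", "endif", "switch", "cases", "break",
    "endcase", "repeat", "until", "while", "endwhile", "for", "endfor",
    "input", "output", "declaration", "start", "end",
}

# A letter first, then up to 5 more letters, digits, or underscores (6 in total).
IDENT_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,5}")

def is_valid_identifier(name):
    """Return True if name satisfies the identifier rules listed above."""
    if IDENT_RE.fullmatch(name) is None:
        return False
    # Identifiers are not case sensitive, so compare keywords in lower case.
    return name.lower() not in KEYWORDS

for candidate in ["flg", "mes1", "counter1", "1abc", "While", "a;b"]:
    print(candidate, is_valid_identifier(candidate))
```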
Data Types:
Our language supports only 3 datatypes
•Integer
•String
•Character
Expressions
1.Arithmetic operators (+, -, *, /, %)
2.Unary operator
3.Parentheses
4.Only integer operands supported
5.Relational expressions to be supported (>, <, >=, <=, ==, !=)
6.Character string and integer constants
Statements
•Declaration statement : int a;
•Declaration and Initialisation : int a=5;
•Assignment Statement : a=6;
Conditional statement
Simple if (nesting not allowed)
if then
Endif
Switch Statement (nesting not allowed)
Switch()
Cases
Value 1:
Break;
Value n:
break;
Endcase
Repetition Statement (nesting not allowed)
a.Repeat
Until ()
b.While (relational expression)
Endwhile
c.For = start value, end value, inc/dec
………
Endfor
I/O Statement
•Input ;
•Output ;
Program Structure
Declaration:
Start
End
Sample Program I
#mode 10
declaration
int r
int c
int in
int flg
start
r=0
flg = 1
while( flg == 1 )
if( c == 0) then
flg = 0
endif
c = c-1
endwhile
end
OUTPUT 1
START:
MOV AX, @DATA
MOV DS, AX
MOV AX,
MOV r, AX
MOV AX,
MOV flg, AX
LB01:
MOV AX,
CMP AX,
JNE LB01
MOV AX,
CMP AX,
JNE LB01
MOV AX,
MOV flg, AX
LB02:
MOV AX,
SUB AX,
MOV c, AX
JMP LB01
LB03:
MOV AX, 4C00H
INT 21H
END START
Sample Program II
#mode 10
declaration
int a ; b
int i
int k
string mes1
start
k=k*1
if(i<9 )then
i=i+9
k=k*1
endif
i=i-45
repeat
i=i+9*k+b
k=k*1
output "Hello World"
input k
until(i<2 )
while(k>3 )
i=i+9
k=k*1
endwhile
end
OUTPUT
START:
MOV AX, @DATA
MOV DS, AX
MOV AX, k
MUL 1
MOV k, AX
MOV AX, i
CMP AX, 9
JGE LB01
MOV AX, i
ADD AX, 9
MOV i, AX
MOV AX, k
MUL 1
MOV k, AX
LB01:
MOV AX, i
SUB AX, 45
MOV i, AX
LB02:
MOV AX, i
ADD AX, 9
MUL k
ADD AX, b
MOV i, AX
MOV AX, k
MUL 1
MOV k, AX
LEA DX, "Hello World"
CALL MESSAGE
CALL INDEC
MOV k, AX
MOV AX, i
CMP AX, 2
JGE LB01
LB03:
MOV AX,
CMP AX, 3
JLE LB01
MOV AX, i
ADD AX, 9
MOV i, AX
MOV AX, k
MUL 1
MOV k, AX
JMP LB01
MOV AX, i
ADD AX, 9
MOV i, AX
MOV AX, k
MUL 1
MOV k, AX
JMP LB01
LB04:
MOV AX, 4C00H
INT 21H
END START
Feasibility and future scope
As technology advances, ease of use is given priority.
Programmers have moved from C and C++ to languages such as
Python and Ruby, which require fewer lines of code.
Our project can be extended into a new language that is
easy to learn, faster, has more built-in features, and has many
more qualities of a good programming language.
Conclusion
In a compiler, intermediate code generation is independent of
the target machine, and the conversion of intermediate code to
target code is independent of the source language.
Thus we have completed the front end of the compilation process.
It includes 3 phases of compilation
lexical analysis
syntax analysis
semantic analysis
Followed by intermediate code generation.