UNIT 1
COMPILER DESIGN
Prof. Seema Mahalungkar
WHY LEARN COMPILER DESIGN
Computers are a balanced mix of software and hardware.
Hardware is just a physical device whose functions are
controlled by compatible software. Hardware understands
instructions in the form of electrical signals, which
correspond to binary language in software programming.
Binary language has only two symbols, 0 and 1. To instruct
the hardware, code must be written in binary format, which
is simply a series of 1s and 0s. Writing such code by hand
would be difficult and cumbersome for programmers, which is
why we have compilers to generate it for us.
WHAT IS A COMPILER
A compiler is a program that reads a program written in one language, the source
language, and translates it into another language, the target language.
The source language is any programming language such as C, C++, Java, Python, PHP
and so on.
The source language is usually a high-level language.
The target language is usually the machine-level language that the computer understands.
NEED OF COMPILER
INTERPRETER
An interpreter is another way of implementing a programming language.
Interpretation shares many aspects with compiling (lexing, parsing and type-checking).
But instead of producing a target program as a translation, an interpreter appears to directly
execute the operations specified in the source program on inputs supplied by the user.
We generally write a computer program using a high-level language. A high-level language is
one that is understandable by us humans. A program written this way is called source code.
However, a computer does not understand a high-level language. It only understands
programs written in binary 0s and 1s, called machine code.
To run source code on a computer, we use either a compiler or an interpreter.
Both compilers and interpreters make a program written in a high-level
language executable by the computer. However, there are differences between
how an interpreter and a compiler work.
Difference Between Interpreter and Compiler
Interpreter | Compiler
Translates the program one statement at a time. | Scans the entire program and translates it as a whole into machine code.
Interpreters usually take less time to analyze the source code. However, the overall execution time is comparatively slower than with compilers. | Compilers usually take a large amount of time to analyze the source code. However, the overall execution time is comparatively faster than with interpreters.
No object code is generated, hence interpreters are memory efficient. | Generates object code which further requires linking, hence requires more memory.
Programming languages like JavaScript, Python, Ruby use interpreters. | Programming languages like C, C++, Java use compilers.
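The first row of the comparison can be illustrated with a small sketch. The toy language here (statements of the form "name = integer" or "print name"), the STORE/PRINT target instructions and all function names are assumptions made up for this example; real interpreters and compilers are far more involved.

```python
def interpret(source):
    """Execute statements one at a time; nothing is translated ahead of time."""
    env, output = {}, []
    for line in source.strip().splitlines():
        parts = line.split()
        if parts[0] == "print":
            output.append(env[parts[1]])
        else:                                   # "name = value"
            env[parts[0]] = int(parts[2])
    return output


def compile_program(source):
    """Translate the whole program into target instructions before running any of it."""
    target = []
    for line in source.strip().splitlines():
        parts = line.split()
        if parts[0] == "print":
            target.append(("PRINT", parts[1]))
        else:
            target.append(("STORE", parts[0], int(parts[2])))
    return target


def run(target):
    """Execute a previously compiled target program (hypothetical instruction set)."""
    env, output = {}, []
    for instr in target:
        if instr[0] == "STORE":
            env[instr[1]] = instr[2]
        else:
            output.append(env[instr[1]])
    return output


program = "x = 5\nprint x"
```

Here interpret(program) analyzes and executes each statement in one pass, while compile_program(program) produces a target program that run() can execute repeatedly without looking at the source again.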
LANGUAGE PROCESSING SYSTEM
Preprocessor
The preprocessor is a separate step in the compilation process. In simple terms, a C
preprocessor is just a text substitution tool that performs the required
pre-processing on the source before the actual compilation.
All preprocessor commands begin with a hash symbol (#).
The modified program is then passed as input to the compiler.
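The text-substitution idea can be sketched as follows. This is a deliberately simplified illustration written in Python, not how a real C preprocessor is implemented: it handles only object-like "#define NAME value" lines, and the function name preprocess is an assumption of this example.

```python
import re


def preprocess(source):
    """Expand object-like '#define NAME value' macros by plain text substitution."""
    macros, output = {}, []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#define"):
            _, name, value = stripped.split(maxsplit=2)
            macros[name] = value
            continue                            # the directive itself is removed
        for name, value in macros.items():
            # \b keeps MAX from matching inside a longer name such as MAXIMUM
            pattern = r"\b" + re.escape(name) + r"\b"
            line = re.sub(pattern, lambda m, v=value: v, line)
        output.append(line)
    return "\n".join(output)


expanded = preprocess("#define MAX 100\nint a[MAX];\nint MAXIMUM;")
```

After this step the compiler only ever sees "int a[100];", never the macro name. A real preprocessor also handles #include, conditional compilation and macros with arguments, and does not substitute inside string literals.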
Assembler
An assembler translates assembly language programs into machine code. The output
of an assembler is called an object file, which contains a combination of machine
instructions as well as the data required to place these instructions in memory.
Linker
A linker is a computer program that links and merges various object files together in
order to make an executable file. These files might have been produced by separate
assemblers. The major task of a linker is to search for and locate referenced
modules/routines in a program and to determine the memory locations where this code
will be loaded.
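The two linker tasks named above, locating referenced routines and deciding where code is placed, can be sketched with a toy model. The object-file layout used here (a dictionary with "defines" mapping symbol names to sizes, and a "references" list) is a hypothetical format invented for this example and does not correspond to any real object-file format.

```python
def link(object_files, base=0):
    """Assign an address to every defined symbol, then check each reference resolves."""
    addresses, offset = {}, base
    for obj in object_files:
        for name, size in obj["defines"].items():
            if name in addresses:
                raise ValueError(f"duplicate symbol {name!r}")
            addresses[name] = offset            # place the routine at the next free address
            offset += size
    for obj in object_files:
        for name in obj["references"]:
            if name not in addresses:
                raise ValueError(f"undefined reference to {name!r}")
    return addresses


main_obj = {"defines": {"main": 40}, "references": ["square"]}
math_obj = {"defines": {"square": 16}, "references": []}
layout = link([main_obj, math_obj])
```

With these inputs, main is placed at address 0 and square directly after it at address 40. Linking main_obj alone fails, because its reference to square cannot be located.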
Loader
The loader is a part of the operating system and is responsible for loading executable files into
memory and executing them. It calculates the size of a program (instructions and data) and
allocates memory space for it. It initializes various registers to initiate execution.
Structure of a Compiler :
There are two major parts of a compiler: Analysis and Synthesis.
Analysis :
Known as the front end of the compiler, the analysis phase reads the
source program, divides it into its constituent parts and then checks for lexical, syntactic
and semantic errors. It also generates an intermediate representation of the source program.
The analysis part also collects information about the source program and stores it in a data
structure called the symbol table, which is passed to the synthesis part.
Synthesis
The synthesis part constructs the desired target program from the intermediate
representation and from the symbol table.
Phases of a Compiler :
Lexical Analysis:
Lexical analysis is the first phase of the compilation process. It takes
source code as input. It reads the source program one character at a time and
groups the characters into meaningful lexemes. The lexical analyser represents
these lexemes in the form of tokens.
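A minimal sketch of a lexical analyser, assuming a tiny language of identifiers, integer literals and a few operators. The token names (NUMBER, ID, OP) and the function name tokenize are choices made for this example, not part of any standard.

```python
import re

# Each entry pairs a token name with the regular expression for its lexemes.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=()]"),
    ("SKIP",   r"\s+"),         # whitespace separates lexemes but is not a token
    ("ERROR",  r"."),           # any other character is a lexical error
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Group the characters of source into (token name, lexeme) pairs."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {lexeme!r}")
        tokens.append((kind, lexeme))
    return tokens


tokens = tokenize("count = count + 1")
```

For the input "count = count + 1" this yields two ID tokens, two OP tokens and one NUMBER token, in source order.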
Syntax Analysis
Syntax analysis is the second phase of the compilation process. It takes tokens
as input and generates a parse tree as output. In the syntax analysis phase, the parser
checks whether the expression formed by the tokens is syntactically correct.
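As an illustration, here is a small recursive-descent parser for arithmetic expressions. The grammar (expr, term, factor), the nested-tuple tree shape and the use of plain strings as tokens are all assumptions of this sketch. The tree it builds is a simplified syntax tree rather than a full parse tree with one node per grammar rule.

```python
def parse(tokens):
    """Return a nested-tuple tree for a list of token strings, or raise SyntaxError.

    Assumed grammar:
        expr   -> term (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> NAME | NUMBER | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        tok = peek()
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok is not None and tok.isalnum():
            return advance()
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


tree = parse(["a", "+", "2", "*", "b"])
```

Because term is parsed below expr, multiplication binds tighter than addition, so the tree for a + 2 * b has "+" at the root. An incomplete input such as ["a", "+"] is rejected with a SyntaxError.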
Semantic Analysis
Semantic analysis is the third phase of the compilation process. It checks
whether the parse tree follows the rules of the language. The semantic analyser keeps
track of identifiers, their types and expressions. The output of the semantic analysis
phase is the annotated syntax tree.
https://www.geeksforgeeks.org/semantic-analysis-in-compiler-design/
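One common semantic check is type checking. The sketch below walks an expression tree and computes the type of each node. The tree shape (nested tuples of the form (operator, left, right)), the type names and the rule that int combined with float gives float are assumptions chosen for this example, not the rules of any particular language.

```python
NUMERIC = {"int", "float"}


def check(node, symbols):
    """Return the type of an expression tree, or raise NameError / TypeError."""
    if isinstance(node, int):
        return "int"
    if isinstance(node, float):
        return "float"
    if isinstance(node, str):                   # an identifier: look up its declared type
        if node not in symbols:
            raise NameError(f"undeclared identifier {node!r}")
        return symbols[node]
    op, left, right = node
    left_type = check(left, symbols)
    right_type = check(right, symbols)
    if left_type not in NUMERIC or right_type not in NUMERIC:
        raise TypeError(f"operator {op!r} needs numeric operands, "
                        f"got {left_type} and {right_type}")
    return "float" if "float" in (left_type, right_type) else "int"


declared = {"x": "int", "y": "float", "name": "string"}
result_type = check(("+", "x", ("*", 2, "y")), declared)
```

The expression x + 2 * y is given type float because y is a float. Using the undeclared identifier z, or adding the string variable name to a number, is reported as a semantic error even though both are syntactically valid.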
Intermediate Code Generation
In intermediate code generation, the compiler translates the source code into
intermediate code. Intermediate code lies between the high-level
language and the machine language. It should be generated in
such a way that it can easily be translated into the target machine code.
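Three-address code is one widely used form of intermediate code, in which each instruction has at most one operator on its right-hand side. The sketch below flattens an expression tree into such instructions. The nested-tuple tree shape, the temporary names t1, t2, ... and the function name are assumptions of this example.

```python
def to_three_address(tree):
    """Flatten a nested (op, left, right) tree into three-address instructions."""
    code = []

    def visit(node):
        if not isinstance(node, tuple):
            return str(node)                    # identifier or constant: already an operand
        op, left, right = node
        left_operand, right_operand = visit(left), visit(right)
        temp = f"t{len(code) + 1}"              # one new temporary per instruction
        code.append(f"{temp} = {left_operand} {op} {right_operand}")
        return temp

    result = visit(tree)
    return code, result


code, result = to_three_address(("+", "a", ("*", "b", "c")))
```

For a + b * c the inner multiplication is emitted first as "t1 = b * c", followed by "t2 = a + t1", and t2 holds the final value.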
Code Optimization
Code optimization is an optional phase. It is used to improve the intermediate
code so that the resulting program runs faster and takes less space. It
removes unnecessary lines of code and rearranges the sequence of statements
in order to speed up program execution.
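Constant folding is one simple optimization of this kind: subexpressions whose operands are all known constants are evaluated at compile time. The sketch assumes expressions are nested (op, left, right) tuples with Python integers as constants and strings as variable names; this representation is made up for the example.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Replace constant subexpressions with their computed value."""
    if not isinstance(node, tuple):
        return node                             # a constant or a variable name
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int) and op in OPS:
        return OPS[op](left, right)             # both operands known: compute now
    return (op, left, right)


folded = fold(("+", "x", ("*", 2, 3)))
```

In x + 2 * 3 the product is replaced by 6 so it is no longer computed at run time, while the addition stays because x is only known when the program runs.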
Code Generation
Code generation is the final stage of the compilation process. It takes the
optimized intermediate code as input and maps it to the target machine language.
The code generator translates the intermediate code into the machine code of the
specified computer.
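The mapping step can be sketched for an imaginary single-accumulator machine. The instruction names LOAD, ADD, SUB, MUL, DIV and STORE are hypothetical and do not belong to any real processor, and the input is assumed to be three-address instructions of the form "dest = left op right".

```python
MNEMONICS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def generate(three_address_code):
    """Map 'dest = left op right' instructions to accumulator-machine assembly."""
    asm = []
    for instr in three_address_code:
        dest, _, left, op, right = instr.split()
        asm += [
            f"LOAD {left}",                     # accumulator <- left operand
            f"{MNEMONICS[op]} {right}",         # accumulator <- accumulator op right
            f"STORE {dest}",                    # dest <- accumulator
        ]
    return asm


assembly = generate(["t1 = b * c", "t2 = a + t1"])
```

Each intermediate instruction becomes three target instructions here. A real code generator would also choose registers and avoid redundant loads and stores.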
Symbol Table Management
- An essential function of a compiler is to record the
variable names used in the source program.
- It collects information about various attributes of each name.
- These attributes provide information about the storage
allocated for a name, its type and its scope.
- In the case of procedure names, they include the number and types of arguments,
the method of passing each argument (call by value, call by reference) and the return type.
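A minimal sketch of a symbol table with nested scopes, assuming attributes are stored as simple key/value pairs. The class name, method names and attribute keys (type, kind, params, returns) are illustrative choices, not a standard interface.

```python
class SymbolTable:
    """Records names and their attributes, with one dictionary per open scope."""

    def __init__(self):
        self.scopes = [{}]                      # scopes[0] is the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **attributes):
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"{name!r} already declared in this scope")
        scope[name] = attributes

    def lookup(self, name):
        for scope in reversed(self.scopes):     # innermost scope first
            if name in scope:
                return scope[name]
        raise NameError(f"{name!r} is not declared")


table = SymbolTable()
table.declare("x", type="int")
table.declare("area", kind="procedure",
              params=[("r", "float", "call by value")], returns="float")
table.enter_scope()
table.declare("x", type="float")                # shadows the outer x
inner_type = table.lookup("x")["type"]
table.exit_scope()
outer_type = table.lookup("x")["type"]
```

Inside the inner scope the lookup finds the float declaration of x, and after the scope is closed it finds the int declaration again. The entry for the procedure area stores its parameter types, passing method and return type.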
COMPILER CONSTRUCTION TOOLS
1. Parser generators
Automatically produce syntax analyzers from a grammatical
description of a programming language.
2. Scanner generators
Produce lexical analyzers from a regular-expression description of
the tokens of a language.
3. Syntax-directed translation engines
4. Code-generator generators
5. Data-flow analysis engines
6. Compiler-construction toolkits