CSN-352
Introduction
Lecture-1_Jan2021
Definition
●
A software system that translates a program written in a programming language into a
machine-executable format is called a compiler.
●
A compiler is a program that reads a program in one language (the source language) and translates it
into an equivalent program in another language (the target language).
●
The target program can then take input from the user and produce output, for example on the console.
●
The study of compilers encompasses:
– programming languages,
– machine architecture,
– language theory,
– algorithms, and
– software engineering.
Evolution of Programming Languages
●
Based on generation:
– First Generation (machine languages)
– Second Generation (Assembly languages)
– Third Generation (Fortran, Cobol, Lisp, C, C++, C#, and Java)
– Fourth Generation (designed for specific applications, e.g., SQL for database queries and PostScript for text formatting)
– Fifth Generation (logic- and constraint-based languages, e.g., Prolog and OPS5)
●
Imperative (how a computation is to be done) vs. Declarative (what computation is to be done)
– C and C++ are imperative; functional languages (e.g., Haskell) and logic languages (e.g., Prolog) are declarative
●
Based on Computer architecture
– Von Neumann languages (e.g., Fortran and C)
●
Based on programming style
– Object Oriented Programming languages (C++, Java)
●
Scripting languages with high level operators
– Perl, PHP, Ruby, JavaScript, Python
Interpreter
●
An interpreter directly executes the operations specified in the source program on inputs supplied by
the user.
●
Executes the program statement by statement, in sequence.
●
Usually slower in execution than a compiled target program.
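The statement-by-statement behaviour described above can be sketched with a toy interpreter. This is only an illustration under strong assumptions: every statement has the hypothetical form `target = left op right`, words are separated by blanks, and the names `interpret`, `OPS`, and `env` are made up for this example.

```python
import operator

# Operators supported by this toy language (an assumption of the sketch).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def interpret(program, inputs):
    """Execute 'target = left op right' statements one at a time."""
    env = dict(inputs)  # variable name -> current value

    def value(word):
        # A word is either a known variable or a numeric constant.
        return env[word] if word in env else float(word)

    for statement in program.splitlines():
        words = statement.split()
        if not words:
            continue  # skip blank lines
        target, _equals, left, op, right = words
        # The operation is carried out immediately; no target program is produced.
        env[target] = OPS[op](value(left), value(right))
    return env
```

For example, `interpret("t = rate * 60\nposition = initial + t", {"initial": 10.0, "rate": 2.0})` evaluates the first statement, stores `t`, and only then looks at the second statement.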
Overall Processing System
Several other programs are required to create an executable program.
Larger programs are compiled in pieces:
- The linker combines the separately compiled object files and library files into one executable
- The loader places the executable into memory for execution
Structure of a Functioning Compiler
Analysis + Synthesis
Analysis (front end)
- Checks syntax and semantics
- Raises informative errors to the user
- Creates an intermediate representation of the source program along with the symbol
table (a data structure holding information about the source program) and passes both to
the synthesis phase
Synthesis (back end)
- Constructs the machine-executable target program from the intermediate representation and the symbol table
Phases of a Compiler
Analysis
Synthesis
Lexical Analyzer
- Reads the stream of characters making up the source program.
- Groups the characters into meaningful sequences called lexemes.
- For each lexeme, the lexical analyzer produces as output a token of the form
<token-name, attribute-value>
- Token-name: an abstract symbol used during syntax analysis.
- Attribute-value: points to the symbol-table entry for this token.
- Blanks separating the lexemes are discarded by the lexical analyzer.
position = initial + rate * 60
position -> <id,1>   // 1 is the symbol-table entry for position
=        -> <=>      // no attribute value
initial  -> <id,2>
+        -> <+>
rate     -> <id,3>
*        -> <*>
60       -> <60>
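The token stream above can be reproduced with a small regular-expression scanner. This is a minimal sketch, not a full lexer: the names `TOKEN_SPEC`, `MASTER`, and `tokenize` are hypothetical, only the operators `=`, `+`, and `*` are recognized, and tokens are represented as Python tuples.

```python
import re

# Token patterns, tried in order; the category names are illustrative.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[=+*]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, symbol_table) for one source statement."""
    symbol_table = {}  # identifier lexeme -> entry number, starting at 1
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue  # blanks separating lexemes are discarded
        if kind == "id":
            # Reuse the existing entry if the identifier was seen before.
            entry = symbol_table.setdefault(lexeme, len(symbol_table) + 1)
            tokens.append(("id", entry))
        else:
            tokens.append((lexeme,))  # operators and numbers carry no attribute here
    return tokens, symbol_table
```

Calling `tokenize("position = initial + rate * 60")` yields seven tokens, with `position`, `initial`, and `rate` stored as symbol-table entries 1, 2, and 3.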
Syntax Analyzer
- Takes the tokens produced by the lexical analyzer as input.
- Creates a tree representing the grammatical structure of the token stream.
- Syntax tree: each interior node represents an operation, and the children of the node
represent the arguments of that operation.
- The tree follows the usual arithmetic precedence: multiplication has higher
precedence than addition.
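How precedence shapes the tree can be sketched with a tiny recursive-descent parser. The grammar here is an assumption chosen for the running example (one assignment, operators `+` and `*` only), and the function name `parse_assignment` and the tuple encoding of tokens and tree nodes are hypothetical.

```python
def parse_assignment(tokens):
    """Build a syntax tree (nested tuples) for: id = expr."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        # A factor is a leaf: an identifier or a number token.
        token = peek()
        if token is None or token[0] in ("=", "+", "*"):
            raise SyntaxError(f"expected an operand, got {token}")
        return advance()

    def term():
        # term -> factor ('*' factor)*   binds tighter than '+'
        node = factor()
        while peek() == ("*",):
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        # expr -> term ('+' term)*
        node = term()
        while peek() == ("+",):
            advance()
            node = ("+", node, term())
        return node

    target = advance()
    if target[0] != "id" or peek() != ("=",):
        raise SyntaxError("expected: id = expression")
    advance()  # consume '='
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()}")
    return tree
```

Because `term` is called from inside `expr`, the `*` node for `rate * 60` ends up below the `+` node, which is exactly the precedence rule stated above.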
Semantic Analyzer
- The semantic analyzer uses the syntax tree and the information in the symbol
table to check the source program for semantic consistency with the language
definition.
- Type checking: gathers type information and saves it in either the syntax tree
or the symbol table.
e.g., checks that each operator has matching operands
- Automatic coercions: e.g., when an integer is added to a float, the integer is
converted to a float.
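Type checking with coercion can be sketched as a recursive walk over the syntax tree. The details are assumptions made for illustration: leaves are written as `("id", entry)` or `("num", value)`, all three identifiers are declared `float` in a hypothetical `SYMBOL_TYPES` table, and the conversion node is called `inttofloat`.

```python
# Hypothetical symbol table: entry number -> declared type.
SYMBOL_TYPES = {1: "float", 2: "float", 3: "float"}


def check(node):
    """Return (typed_node, type), inserting inttofloat where int meets float."""
    if node[0] == "id":
        return node, SYMBOL_TYPES[node[1]]
    if node[0] == "num":
        return node, ("float" if isinstance(node[1], float) else "int")
    op, left, right = node
    left, left_type = check(left)
    right, right_type = check(right)
    if left_type != right_type:
        if op == "=" and left_type == "int":
            # Narrowing a float into an int variable is rejected in this sketch.
            raise TypeError("cannot assign a float value to an int variable")
        # Coercion: widen the int operand so that both operands are float.
        if left_type == "int":
            left = ("inttofloat", left)
        else:
            right = ("inttofloat", right)
        left_type = "float"
    return (op, left, right), left_type
```

For the running example, the only change to the tree is that the leaf for `60` gets wrapped in an `inttofloat` node, since `rate` is a float.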
Intermediate Code Generation
- Produces an explicit low-level, machine-like representation of the input received from semantic analysis.
- It can be thought of as a program for an abstract machine.
- Should be easy to produce and
- Should be easy to translate into the target machine language
// Three-address code representation: at most three operands per instruction,
and each operand can act like a register.
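Flattening a typed syntax tree into three-address code can be sketched as a post-order walk that creates one temporary per interior node. The function name `to_three_address`, the tuple encoding of the tree, and the temporary names `t1`, `t2`, ... are assumptions of this sketch.

```python
def to_three_address(tree):
    """Flatten a typed syntax tree for 'id = expr' into three-address instructions."""
    code = []
    temp_count = 0

    def operand(node):
        """Return the name holding node's value, emitting instructions if needed."""
        nonlocal temp_count
        if node[0] == "id":
            return f"id{node[1]}"
        if node[0] == "num":
            return str(node[1])
        if node[0] == "inttofloat":
            rhs = f"inttofloat({operand(node[1])})"
        else:
            op, left, right = node
            left_name = operand(left)    # children are evaluated first
            right_name = operand(right)
            rhs = f"{left_name} {op} {right_name}"
        temp_count += 1
        temp = f"t{temp_count}"          # a fresh temporary holds the result
        code.append(f"{temp} = {rhs}")
        return temp

    _, target, value = tree
    code.append(f"{operand(target)} = {operand(value)}")
    return code
```

Each emitted instruction has at most one operator on its right-hand side, and the order of the instructions fixes the order in which the operations are performed.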
Code Optimization
- Attempts to improve the intermediate code so that better target code results.
- Faster execution
- Lower power consumption
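Two simple improvements of this kind can be sketched on three-address instructions: converting an integer constant to a float at compile time, and removing a final copy from a temporary. The instruction encoding `(dest, op, arg1, arg2)`, the `"copy"` operator name, and the convention that names starting with `t` are single-use temporaries are all assumptions of this sketch; a real optimizer relies on data-flow analysis instead.

```python
def optimize(code):
    """Apply two simple improvements to (dest, op, arg1, arg2) instructions."""
    constants = {}  # temporary name -> constant text computed at compile time
    folded = []
    for dest, op, arg1, arg2 in code:
        # Substitute operands whose constant value is already known.
        arg1 = constants.get(arg1, arg1)
        arg2 = constants.get(arg2, arg2)
        if op == "inttofloat" and arg1.isdigit():
            constants[dest] = str(float(arg1))  # do the conversion once, now
            continue
        folded.append((dest, op, arg1, arg2))

    result = []
    for instr in folded:
        dest, op, arg1, _ = instr
        # "x = t" right after "t = a op b" becomes "x = a op b".
        # Safe only because temporaries are assumed to be used exactly once.
        if op == "copy" and result and result[-1][0] == arg1 and arg1.startswith("t"):
            _, prev_op, prev1, prev2 = result.pop()
            result.append((dest, prev_op, prev1, prev2))
        else:
            result.append(instr)
    return result
```

On the running example this shortens four instructions to two: the `inttofloat` instruction disappears, and the last temporary is replaced by a direct store into `id1`. Temporaries are not renumbered here.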
Code Generation
- The code generator takes as input an intermediate representation of the source
program and maps it into the target language.
- The intermediate instructions are translated into sequences of machine instructions that
perform the same task.
- Registers or memory locations are selected in this phase to hold each variable.
- The first operand of each instruction is the destination.
- F in an instruction name stands for floating-point number.
- # marks an immediate constant.
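The mapping from intermediate instructions to machine-like instructions can be sketched as follows. The opcode names (`LDF`, `MULF`, `ADDF`, `STF`) follow the conventions listed above, but the register numbering, the `(dest, op, arg1, arg2)` encoding, and the naive allocation strategy (a new register for every load, never freed) are assumptions of this sketch. It also assumes that `arg1` is always a variable or a temporary, never a constant.

```python
OPCODES = {"*": "MULF", "+": "ADDF"}


def generate(code):
    """Translate (dest, op, arg1, arg2) instructions into assembly-like text."""
    asm = []
    location = {}      # temporary name -> register currently holding it
    next_register = 1

    def operand(arg):
        """Return a register or #immediate; variables are loaded from memory."""
        nonlocal next_register
        if arg in location:
            return location[arg]
        if arg[0].isdigit():
            return f"#{arg}"  # immediate constant
        register = f"R{next_register}"
        next_register += 1
        asm.append(f"LDF {register}, {arg}")
        return register

    for dest, op, arg1, arg2 in code:
        left = operand(arg1)
        right = operand(arg2)
        # The first operand is the destination; the left register is reused.
        asm.append(f"{OPCODES[op]} {left}, {left}, {right}")
        if dest.startswith("t"):
            location[dest] = left             # temporaries stay in a register
        else:
            asm.append(f"STF {dest}, {left}")  # variables are stored to memory
    return asm
```

For the two optimized instructions `t2 = id3 * 60.0` and `id1 = id2 + t2`, this produces two loads, a multiply, an add, and a final store into `id1`.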
Symbol Table
- The symbol table is a data structure containing a record for each variable
name, with fields for the attributes of the name.
- A suitable data structure must be chosen so that the compiler can find, store, and retrieve
symbol-table records quickly.
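One common choice that gives quick access is a hash table. A minimal sketch using a Python dictionary follows; the class name `SymbolTable`, its methods, and the `entry` and `type` fields are hypothetical and only illustrate the idea of one record per name.

```python
class SymbolTable:
    """One record per name, stored in a hash table for fast lookup."""

    def __init__(self):
        self._entries = {}  # name -> record (a dict of attributes)

    def insert(self, name, **attributes):
        """Add a record for name if absent, update its fields, return its entry number."""
        if name not in self._entries:
            self._entries[name] = {"entry": len(self._entries) + 1}
        self._entries[name].update(attributes)
        return self._entries[name]["entry"]

    def lookup(self, name):
        """Return the record for name, or None if the name is unknown."""
        return self._entries.get(name)
```

For example, `insert("position", type="float")` returns entry number 1 on the first call, and a later `lookup("position")` returns the stored record in expected constant time.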
Compiler Construction Tool
- General tools: modern software development environments provide tools such as
language editors, debuggers, version managers, etc.
- Specialized tools use specialized languages for specifying and implementing a specific
component, and many use sophisticated algorithms.
- A compiler developer can use these tools in different phases of compiler development.
Compiler Construction Tool
- Parser generators that automatically produce syntax analyzers from a
grammatical description of a programming language.
- Scanner generators that produce lexical analyzers from a regular-expression description of
the tokens of a language.
- Syntax-directed translation engines that produce collections of routines for walking a parse
tree and generating intermediate code.
- Code-generator generators that produce a code generator from a collection of rules for translating
each operation of the intermediate language into the machine language for a target machine.
- Data-flow analysis engines that facilitate the gathering of information about how values
are transmitted from one part of a program to each other part. Data-flow analysis is a key
part of code optimization.
- Compiler-construction toolkits that provide an integrated set of routines for constructing
various phases of a compiler.