Compiler Construction (CS F363)
Prof.R.Gururaj
BITS Pilani CS&IS Dept.
Hyderabad Campus
Syntax Directed Translator-2
(Ch.2 of T1)
Introduction
Analysis phase: breaks up the source program into its constituent
pieces and produces an internal representation called intermediate code.
Synthesis phase: translates the intermediate code into the target
program.
The analysis is organized around the syntax of the language.
Syntax: the proper form of programs.
Semantics: what programs mean.
Context-free grammars (CFGs), also known as BNF (Backus-Naur Form),
are used to specify the syntax of languages.
Prof.R.Gururaj CSF363 Compiler Construction BITS Pilani, Hyderabad Campus
Program Code and Three-Address Code (TAC)
Syntax Directed Translation
Besides specifying the syntax of a language, a grammar can be
used to help guide the translation of programs.
This grammar-based translation technique is known as
Syntax Directed Translation.
Two forms of intermediate code
Syntax tree: represents the hierarchical syntactic structure of the
source program. The parser produces it.
Three-address code: a linear sequence of instructions. The
intermediate code generator produces it.
Syntax Definition
CFGs are used to specify the syntax of a language.
Ex: the if-else statement in Java.
stmt → if ( expr ) stmt else stmt
Grammar definition:
1. A set of terminals (tokens)
2. A set of nonterminals, NTs (syntactic variables)
3. A set of productions
4. A designated start NT
Ex: Expressions consisting of digits separated by + and - operators.
list → list + digit
list → list - digit
list → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Derivation: the process of deriving a string of terminals from the
start NT by repeatedly replacing an NT with the body of one of its
productions.
Parsing: taking a string of terminals and figuring out how it can
be derived from the start NT of the grammar.
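As a rough illustration of a derivation, the sketch below encodes the list grammar as a Python dictionary and replays a leftmost derivation of 9-5+2. The names GRAMMAR, START, STEPS and derive, and the encoding of a step as (nonterminal, production index), are assumptions made for this example only.

```python
# Productions of the list grammar; each body is a list of grammar symbols.
GRAMMAR = {
    "list": [["list", "+", "digit"], ["list", "-", "digit"], ["digit"]],
    "digit": [[str(d)] for d in range(10)],
}
START = "list"


def derive(steps):
    """Replay a leftmost derivation.

    Each step is (nonterminal, production_index). The leftmost nonterminal
    of the current sentential form must match; it is replaced by the chosen
    production body. Returns every sentential form, start symbol first.
    """
    form = [START]
    forms = [form]
    for nonterminal, index in steps:
        # Position of the leftmost nonterminal in the current form.
        pos = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        if form[pos] != nonterminal:
            raise ValueError(
                f"leftmost nonterminal is {form[pos]}, not {nonterminal}"
            )
        form = form[:pos] + GRAMMAR[nonterminal][index] + form[pos + 1:]
        forms.append(form)
    return forms


# list => list + digit => list - digit + digit => digit - digit + digit
#      => 9 - digit + digit => 9 - 5 + digit => 9 - 5 + 2
STEPS = [("list", 0), ("list", 1), ("list", 2),
         ("digit", 9), ("digit", 5), ("digit", 2)]

for sentential_form in derive(STEPS):
    print(" ".join(sentential_form))
```

Parsing is the inverse problem: given only the final line, recover a sequence of steps like STEPS.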
Parse Tree
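Root: labeled by the start NT.
Leaf: labeled by a terminal or by ε.
Interior node: labeled by an NT.
Yield of the tree: the string formed by reading the leaves from left
to right.
Parse tree for 9-5+2 according to the previous grammar for
expressions.
Ambiguity and ambiguous grammars: a grammar is ambiguous if some
string of terminals has more than one parse tree.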
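A minimal sketch of these ideas, assuming a parse tree is stored as a nested tuple (label, child, ...) with plain strings as leaves. The helper names digit, tree_yield and evaluate are hypothetical. The ambiguous grammar string → string + string | string - string | 0 | ... | 9 is the usual textbook example and is not taken from the slide; it is used here only to show two different trees with the same yield.

```python
def digit(d):
    """Subtree for the production digit -> d."""
    return ("digit", d)


# Parse tree for 9-5+2 under: list -> list + digit | list - digit | digit
PARSE_TREE = (
    "list",
    ("list", ("list", digit("9")), "-", digit("5")),
    "+",
    digit("2"),
)

# Two parse trees for the same string under the assumed ambiguous grammar.
LEFT_GROUPED = (
    "string",
    ("string", ("string", "9"), "-", ("string", "5")),
    "+",
    ("string", "2"),
)
RIGHT_GROUPED = (
    "string",
    ("string", "9"),
    "-",
    ("string", ("string", "5"), "+", ("string", "2")),
)


def tree_yield(node):
    """Concatenate the leaves of the tree from left to right."""
    if isinstance(node, str):
        return node
    return "".join(tree_yield(child) for child in node[1:])


def evaluate(node):
    """Evaluate a tree whose interior nodes have one child or (left, op, right)."""
    children = node[1:]
    if len(children) == 1:
        child = children[0]
        return int(child) if isinstance(child, str) else evaluate(child)
    left, op, right = children
    lhs, rhs = evaluate(left), evaluate(right)
    return lhs + rhs if op == "+" else lhs - rhs


print(tree_yield(PARSE_TREE))       # 9-5+2
print(evaluate(LEFT_GROUPED))       # 6, grouping (9-5)+2
print(evaluate(RIGHT_GROUPED))      # 2, grouping 9-(5+2)
```

Both ambiguous trees have the yield 9-5+2 but different values, which is why ambiguity matters for translation.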
Syntax Directed Translation
Syntax Directed Translation is done by attaching rules or program
fragments to productions in a grammar.
Ex: expr → expr1 + term
translate expr1;
translate term;
handle +;
Attributes
An attribute is any quantity associated with a programming
construct.
Ex: the data type of an expression, the number of instructions in
the generated code, the location of the first instruction, etc.
We extend the notion of attributes to the grammar symbols
(terminals and NTs) that represent programming constructs.
SDT schemes
A translation scheme is a notation for attaching program
fragments to the productions of a grammar.
The program fragments are executed when the production is
used during syntax analysis.
The combined result of all these fragment executions, in the
order induced by the syntax analysis, produces the
translation of the program.
Postfix notation
We look at an example that translates infix notation to postfix
notation, in which an operator follows its operands.
Ex: the postfix form of 9-5+2 is 95-2+.
Syntax Directed Definition
A syntax directed definition (SDD) associates:
1. With each grammar symbol, a set of attributes.
2. With each production, a set of semantic rules for computing
the values of the attributes associated with the symbols
appearing in the production.
3. In the infix-to-postfix example, we associate an attribute t with
each NT symbol.
An attribute is said to be synthesized if its value at a
parse-tree node is determined by the attribute values at the
children of that node. Synthesized attributes can be evaluated
during a single bottom-up traversal of the parse tree.
Here t is string-valued: it holds the postfix form of the expression
generated by the NT.
SDT for Infix to Postfix translation
Grammar: expr → expr + term
         expr → expr - term
         expr → term
         term → 0 | 1 | … | 9
String: 9-5+2
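The semantic rules are not listed on the slide, so the sketch below assumes the usual ones for this grammar: expr.t = expr1.t || term.t || op for the two binary productions, expr.t = term.t for expr → term, and term.t = the digit itself. The Node class and the helpers leaf, term and annotate are hypothetical names. The tree is built by hand (no parser is involved) and t is computed bottom-up as a synthesized attribute.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Parse-tree node; `t` is the synthesized attribute (a postfix string)."""
    label: str
    children: list = field(default_factory=list)
    t: str = ""


def leaf(symbol):
    return Node(symbol)


def term(d):
    """Subtree for the production term -> d."""
    return Node("term", [leaf(d)])


def annotate(node):
    """Compute node.t from the t values of its children (bottom-up)."""
    for child in node.children:
        annotate(child)
    kids = node.children
    if node.label == "term":
        # term -> d            : term.t = 'd'
        node.t = kids[0].label
    elif node.label == "expr" and len(kids) == 1:
        # expr -> term         : expr.t = term.t
        node.t = kids[0].t
    elif node.label == "expr":
        # expr -> expr1 op term : expr.t = expr1.t || term.t || op
        node.t = kids[0].t + kids[2].t + kids[1].label
    return node.t


# Hand-built parse tree for 9-5+2.
tree = Node("expr", [
    Node("expr", [Node("expr", [term("9")]), leaf("-"), term("5")]),
    leaf("+"),
    term("2"),
])

print(annotate(tree))  # 95-2+
```

After the call, every expr and term node carries its own t, for example 95- at the node for 9-5.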
Another SDT approach
In the previous SDT, the string representing the translation of
the NT at the head of each production is the
concatenation of the translations of the NTs in the production
body, in the same order as in the body, with some additional
strings (here, the operators) interleaved.
This can also be implemented by printing only the additional
strings, in the order they appear in the definition. It does not
need any manipulation of strings; it only requires that the program
fragments be executed in the right order.
We use a depth-first traversal of the parse tree.
procedure visit(Node N)
{
    for (each child C of N, from left to right)
    {
        visit(C);
    }
    evaluate the semantic rule at node N;
}
Ex: infix to postfix using the second approach.
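One possible rendering of the second approach, assuming a parse tree stored as (label, children) tuples and a semantic rule at each node that emits only the additional string: the digit for term → d, the operator for expr → expr1 op term, and nothing for expr → term. The names term, TREE and visit's `out` parameter are assumptions of this sketch; output is collected in a list instead of being printed directly so it can be inspected.

```python
def term(d):
    """Subtree for the production term -> d."""
    return ("term", [(d, [])])


# Hand-built parse tree for 9-5+2.
TREE = ("expr", [
    ("expr", [
        ("expr", [term("9")]),
        ("-", []),
        term("5"),
    ]),
    ("+", []),
    term("2"),
])


def visit(node, out):
    """Depth-first traversal: visit children left to right, then run the rule at node."""
    label, children = node
    for child in children:
        visit(child, out)
    # Semantic rule at this node: emit only the additional string.
    if label == "term":
        out.append(children[0][0])          # term -> d : print d
    elif label == "expr" and len(children) == 3:
        out.append(children[1][0])          # expr -> expr1 op term : print op


output = []
visit(TREE, output)
print("".join(output))  # 95-2+
```

No attribute strings are stored or concatenated; the translation is simply the order in which the rules fire.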
Parsing
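Parsing is the process of determining how a string of terminals
can be generated by a grammar.
Top-down parsing: the parse tree is constructed from the root
toward the leaves.
Bottom-up parsing: the parse tree is constructed from the leaves
toward the root.
Software tools for generating parsers directly from grammars
often use the bottom-up approach.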
Top-down parsing
Recursive-descent parsing is a top-down method of syntax
analysis in which a set of recursive procedures is used to
process the input.
One procedure is associated with each NT of the grammar.
A simple form of recursive-descent parsing is predictive
(look-ahead) parsing, in which the look-ahead symbol alone selects
the production to apply.
Look-ahead approach
The look-ahead symbol unambiguously determines the flow of
control through the procedure body for each NT.
The sequence of procedure calls during the analysis of an input
string implicitly defines a parse tree for the input.
Predictive parsing relies on information about the first
symbols that can be generated by a production body.
We use the notion of FIRST(r) for a production body r: the set of
terminals that appear as the first symbols of strings generated from r.
The FIRST sets must be considered if there are two productions
A → r and A → s.
Predictive parsing requires that FIRST(r) and FIRST(s)
be disjoint.
The look-ahead symbol can then be used to decide which production
to use: A → r if it is in FIRST(r), A → s if it is in FIRST(s).
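A minimal sketch of a predictive parser that translates infix to postfix. The expression grammar shown earlier is left-recursive, which would make a recursive-descent procedure call itself forever, so this example assumes the standard rewritten form expr → term rest, rest → + term rest | - term rest | ε, term → 0 | ... | 9. That rewritten grammar, the class name PredictiveParser and the function translate are not from the slides.

```python
class PredictiveParser:
    """One procedure per nonterminal; the look-ahead selects the production."""

    def __init__(self, text):
        # Single-character tokens; white space is dropped.
        self.tokens = [c for c in text if not c.isspace()]
        self.pos = 0
        self.out = []

    @property
    def lookahead(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, expected):
        if self.lookahead != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1

    def expr(self):
        # expr -> term rest
        self.term()
        self.rest()

    def rest(self):
        # FIRST(+ term rest) = {'+'} and FIRST(- term rest) = {'-'} are
        # disjoint, so the look-ahead picks the production unambiguously.
        if self.lookahead in ("+", "-"):
            op = self.lookahead
            self.match(op)
            self.term()
            self.out.append(op)     # action: emit the operator after its operands
            self.rest()
        # otherwise: rest -> epsilon, consume nothing

    def term(self):
        # term -> 0 | 1 | ... | 9
        d = self.lookahead
        if d is None or d not in "0123456789":
            raise SyntaxError(f"expected a digit, found {d!r}")
        self.match(d)
        self.out.append(d)          # action: emit the digit

    def parse(self):
        self.expr()
        if self.lookahead is not None:
            raise SyntaxError(f"unexpected input {self.lookahead!r}")
        return "".join(self.out)


def translate(text):
    return PredictiveParser(text).parse()


print(translate("9-5+2"))  # 95-2+
```

The chain of calls expr, term, rest, term, rest, ... traces out the parse tree implicitly; no tree is ever built.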
Syntax Tree
An (abstract) syntax tree (AST) is a data structure that helps in
designing a translator.
In an AST for an expression, each interior node represents an
operator; its children represent the operands of that operator.
More generally, any programming construct can be handled by
making up an operator for the construct and treating as
operands the syntactically meaningful components of that
construct.
Syntax tree: interior nodes represent programming constructs.
Parse tree: interior nodes represent NTs.
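To make the contrast concrete, the sketch below collapses a hand-built parse tree for 9-5+2 into an AST. The (label, children) tuple encoding of the parse tree, the classes Num and Op, and the function to_ast are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Num:
    """AST leaf: an operand."""
    value: int


@dataclass
class Op:
    """AST interior node: an operator with its two operands."""
    op: str
    left: object
    right: object


def term(d):
    """Parse subtree for the production term -> d."""
    return ("term", [(d, [])])


# Parse tree for 9-5+2: interior nodes are the NTs expr and term.
PARSE_TREE = ("expr", [
    ("expr", [
        ("expr", [term("9")]),
        ("-", []),
        term("5"),
    ]),
    ("+", []),
    term("2"),
])


def to_ast(node):
    """Collapse a parse tree into an AST whose interior nodes are operators."""
    label, children = node
    if label == "term":
        return Num(int(children[0][0]))
    if len(children) == 1:
        # expr -> term: the single-child chain node disappears in the AST.
        return to_ast(children[0])
    left, op, right = children       # expr -> expr1 op term
    return Op(op[0], to_ast(left), to_ast(right))


print(to_ast(PARSE_TREE))
```

The resulting AST has two interior nodes (+ and -) and three leaves, whereas the parse tree has an interior node for every NT used in the derivation.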
Lexical Analysis
A sequence of input characters that comprises a single token is
called a lexeme.
Ex: <id, “count”>
id.lexeme = “count”
Here “count” is the actual lexeme comprising this instance of
the token id.
Process
Removal of white space.
Grouping of digits into integers.
Storing strings (lexemes) in a string table.
String tables can be implemented using a hash table (e.g., Java's
Hashtable).
Reserved words are entered in the table initially, so that they can
be distinguished from identifiers.
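The steps above can be sketched as a small scanner. The keyword list RESERVED, the token names num and id, the token format (name, value) and the functions make_string_table and scan are assumptions chosen for this example; a Python dict plays the role of the hash table.

```python
RESERVED = ("if", "else", "while")   # assumed sample of reserved words


def make_string_table():
    """String table pre-loaded with reserved words: lexeme -> token name."""
    return {word: word for word in RESERVED}


def scan(text, table):
    """Turn a character string into a list of (token_name, value) pairs."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1                                   # remove white space
        elif ch.isdigit():
            start = i
            while i < len(text) and text[i].isdigit():
                i += 1                               # group digits into one integer
            tokens.append(("num", int(text[start:i])))
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(text) and (text[i].isalnum() or text[i] == "_"):
                i += 1
            lexeme = text[start:i]
            # Reserved words are already in the table; any other lexeme is
            # entered as an identifier the first time it is seen.
            name = table.setdefault(lexeme, "id")
            tokens.append((name, lexeme))
        else:
            tokens.append((ch, ch))                  # single-character token
            i += 1
    return tokens


table = make_string_table()
for token in scan("if (count < 31) count = 70", table):
    print(token)
```

Because if is found in the table before any identifier is entered, it comes out as the token if rather than as an id.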
Symbol table
Symbol tables are data structures used for holding
information about source program constructs.
The information is collected incrementally by the analysis phase
and used by the synthesis phase to generate the target code.
The role of a symbol table is to pass information from declarations
to uses.
Info stored: identifier, type, position in storage, and other
relevant information.
Symbol tables typically need to support multiple declarations of
the same identifier within a program.
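One common way to support multiple declarations of the same identifier is to keep one table per scope and chain each table to the table of the enclosing scope. The sketch below assumes that design; the class name Env, its methods put and get, and the stored fields are illustrative and not prescribed by the slides.

```python
class Env:
    """Symbol table for one scope, chained to the enclosing scope's table."""

    def __init__(self, prev=None):
        self.table = {}      # identifier -> information about its declaration
        self.prev = prev     # enclosing scope, or None for the outermost one

    def put(self, name, info):
        """Record a declaration in the current scope."""
        self.table[name] = info

    def get(self, name):
        """Find the innermost visible declaration of `name`, or None."""
        env = self
        while env is not None:
            if name in env.table:
                return env.table[name]
            env = env.prev
        return None


# { int x; int y; { float x; ... x ... y ... } ... x ... }
outer = Env()
outer.put("x", {"type": "int", "offset": 0})
outer.put("y", {"type": "int", "offset": 4})

inner = Env(prev=outer)
inner.put("x", {"type": "float", "offset": 8})

print(inner.get("x")["type"])   # float: the inner declaration hides the outer one
print(inner.get("y")["type"])   # int: found in the enclosing scope
print(outer.get("x")["type"])   # int: the outer scope is unaffected
```

Leaving a block then amounts to going back to the `prev` table, so declarations made inside the block stop being visible.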
Intermediate Code
There are two kinds of intermediate representations:
1. Trees: parse trees / syntax trees
2. Linear representations: three-address code
Both are representations of the programming constructs of the
source program.
Static Checking
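Static checks are consistency checks done during compilation.
Syntactic checking: constraints beyond the grammar, e.g., an
identifier is declared at most once in a scope.
Type checking: ensures that an operator is applied to the right
number and type of operands. It also involves coercion (implicit
type conversion) and overloading (a symbol with different meanings
depending on its context).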
Three-address code
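We walk the syntax tree to generate three-address
code.
Instruction format: x = y op z
where x, y and z are names, constants, or compiler-generated
temporaries, and op is an operator.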
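A minimal sketch of such a walk, assuming the syntax tree is given as nested tuples (op, left, right) with names or constants at the leaves. The function name gen_tac and the temporary naming scheme t1, t2, ... are assumptions of this example.

```python
from itertools import count


def gen_tac(tree):
    """Walk a syntax tree and return (list of instructions, result address)."""
    code = []
    temps = count(1)

    def walk(node):
        if not isinstance(node, tuple):
            return str(node)                 # leaf: a name or a constant
        op, left, right = node
        y = walk(left)                       # code for the operands comes first
        z = walk(right)
        x = f"t{next(temps)}"                # fresh compiler-generated temporary
        code.append(f"{x} = {y} {op} {z}")
        return x

    result = walk(tree)
    return code, result


# Syntax tree for 9-5+2: '+' at the root, '-' as its left child.
instructions, result = gen_tac(("+", ("-", 9, 5), 2))
for line in instructions:
    print(line)
# t1 = 9 - 5
# t2 = t1 + 2
```

Each interior node of the tree yields exactly one instruction of the form x = y op z, and the address returned for the root (here t2) holds the value of the whole expression.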
Summary
Grammar
Parsing
Three-address code
Syntax tree
SDT – rules, action body
Lexical analyzer
Symbol table
Intermediate code