
SET- 3 Reg. No.

IFET COLLEGE OF ENGINEERING


(An Autonomous Institution)
INTERNAL ASSESSMENT EXAMINATION-II
DEPARTMENT OF CSE & IT
SUB CODE: 19UCSPC601 MAX MARKS: 100
SUB NAME: COMPILER DESIGN DURATION: 180Min
DATE: 07.05.2024/AN YEAR/ SEMESTER: III/VI
ANSWER KEY
PART-A (10 × 2 = 20)
Answer All Questions
(Each answer should have a minimum of 7 lines)
1. What are the functions performed in synthesis phase? R CO1
• The synthesis phase of a compiler generates the target code from the
intermediate code produced during the analysis phase.
Functions performed in the synthesis phase:
• Code generation, code optimization, code organization, data allocation and
management, register allocation, and error handling.
2. List the phases that constitute the front end of a compiler U CO1
• The front end of the compiler depends primarily on the source language and is largely
independent of the target machine.
• Phases: lexical analysis, syntax analysis and semantic analysis.
• The front end produces an intermediate code.
3. What is handle pruning? R CO2
• In bottom-up parsing, the process of detecting handles and using them in reductions is
called handle pruning.
For example, consider the grammar
E -> E+E
E -> id
Now consider the string id+id+id. A rightmost derivation is
E => E+E
  => E+E+E
  => E+E+id
  => E+id+id
  => id+id+id
Handle pruning reverses this derivation; the handle at each step is shown below.
RIGHT SENTENTIAL FORM    HANDLE    PRODUCTION
id+id+id                 id        E -> id
E+id+id                  id        E -> id
E+E+id                   id        E -> id
E+E+E                    E+E       E -> E+E
E+E                      E+E       E -> E+E
E
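As an illustration, here is a minimal Python sketch of handle pruning for this grammar. The function name reduce_by_handles and the body-first search order are illustrative choices for this toy example, not part of any standard parser API.

def reduce_by_handles(tokens):
    """Repeatedly replace the leftmost handle until only 'E' remains."""
    # Production bodies tried in order: 'id' first, then 'E + E'.
    bodies = [(["id"], "E"), (["E", "+", "E"], "E")]
    form = list(tokens)
    steps = []
    while form != ["E"]:
        for body, head in bodies:
            # scan for the leftmost occurrence of this body
            for i in range(len(form) - len(body) + 1):
                if form[i:i + len(body)] == body:
                    steps.append((list(form), body))
                    form[i:i + len(body)] = [head]
                    break
            else:
                continue  # this body does not occur; try the next one
            break         # reduced once; rescan from the start
        else:
            raise ValueError("no handle found")
    return steps

for sentential_form, handle in reduce_by_handles(["id", "+", "id", "+", "id"]):
    print("".join(sentential_form), "  handle:", "".join(handle))

Running this prints exactly the sentential forms and handles of the table above.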
4. Why SLR and LALR are more economical to construct than canonical LR? S CO2
• SLR and LALR tables are easier and more economical to construct than canonical LR
tables because a canonical LR table usually has several thousand states for the
same-sized language.
• Look-ahead LR (LALR): the most powerful alternative, with much lower memory
requirements.
• Simple LR (SLR): has much lower memory requirements, but less language-recognition
power.
• LALR parsing attempts to reduce the number of states. The canonical LR (CLR) parser
has a large set of items, so the LALR parser merges states with the same core to
obtain far fewer states; for some grammars this merging can introduce reduce-reduce
conflicts.
5. Generate the associated semantic rules for the expressions. E→E1 mod E2 E→E1[E2] A CO3
E→*E1
The standard type-checking semantic rules for these productions are:
E → E1 mod E2 : E.type := if E1.type = integer and E2.type = integer
                then integer else type_error
E → E1 [ E2 ] : E.type := if E2.type = integer and E1.type = array(s, t)
                then t else type_error
E → * E1      : E.type := if E1.type = pointer(t) then t else type_error
6. Convert the assignment statement d:= (a-b) + (a-c) + (a-c) into three address code. A CO3
t := a - b
u := a - c
v := t + u
d := v + u
7. State the limitations of static allocation. U CO4
• Static allocation can be done only if the size of every data object is known at compile
time.
• Data structures cannot be created dynamically; static allocation cannot manage the
allocation of memory at run time.
• Recursive procedures are not supported by this type of allocation.
8. How would you solve the issues in the design of code generators? S CO4
• Code generation is the final activity of the compiler. Basically, code generation is the
process of creating assembly-language or machine-language statements that perform the
operations specified by the source program when they run. The design issues are
addressed by:
• Using an intermediate representation
• Using a target-specific code generator
• Using a garbage collector
• Using a code optimizer
• Using data-flow analysis
9. Applying basic block concepts, how would you represent the dummy blocks with no S CO5
statements indicated in global dataflow analysis?
• In global dataflow analysis in compiler design, dummy blocks are introduced to
represent certain program structures or behaviors that may not have a direct
equivalent in the original code.
(Figure: dummy-block representation for three statements.)
10. Consider the expressions and eliminate the common subexpressions A CO5
a : =b + c
b :=a –d
c :=b + c
Although the expression b + c appears twice, b is redefined by the second statement
(b := a - d) before the second occurrence, so the two occurrences do not compute the
same value. Hence c := b + c is not a common subexpression, and no elimination is
possible here.
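A small Python sketch of the bookkeeping behind this answer follows; the helper find_cse and the quadruple form (target, lhs, op, rhs) are illustrative assumptions, not a fixed representation.

def find_cse(block):
    available = {}  # (op, lhs, rhs) -> variable already holding that value
    for target, lhs, op, rhs in block:
        key = (op, lhs, rhs)
        if key in available:
            print(f"{target} := {available[key]}        (common subexpression)")
        else:
            print(f"{target} := {lhs} {op} {rhs}")
        # assignment to `target` kills expressions that use it or live in it
        available = {k: v for k, v in available.items()
                     if target not in (k[1], k[2]) and v != target}
        if target not in (lhs, rhs):
            available[key] = target

find_cse([("a", "b", "+", "c"),
          ("b", "a", "-", "d"),
          ("c", "b", "+", "c")])

The run prints the three statements unchanged: the redefinition of b kills the availability of b + c, confirming that nothing can be eliminated.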
PART-B (Total=40 Marks)


(2 × 16 = 32 & 1 × 8 = 8)
Answer All Questions
(Each answer should be written for a minimum of 5 pages with a minimum of 25 lines per page)
11. A Describe the various phases of compiler and trace it with the program U (16) CO1
segment: (position := initial + rate * 60).
A compiler is a translator that converts programs written in a high-level
language into a low-level language. It translates the entire program and also
reports the errors in the source program encountered during the translation.
PHASES OF A COMPILER: A compiler operates in various phases. A
phase is a logically interrelated operation that takes a source program in one
representation and produces output in another representation. There are two
parts of compilation:
• Analysis (machine independent / language dependent)
• Synthesis (machine dependent / language independent)
The compilation process is partitioned into a number of sub-processes called 'phases'.
Analysis part: The source program is read and broken down into constituent
pieces. The syntax and the meaning of the source string are determined, and
then an intermediate code is created from the input source program. It is also
termed the front end of the compiler. The analysis is carried out in four
phases:
• Lexical analysis • Syntax analysis • Semantic analysis • Intermediate
code generation
Synthesis part: The intermediate form of the source language is taken and
converted into an equivalent target program. During this process, if certain
code has to be optimized for efficient execution, then the required code is
optimized. It is also termed the back end of the compiler. The synthesis is
carried out in two phases:
• Code optimization • Code generation
The different phases of the compiler are as follows:
1. Lexical analysis 2. Syntax analysis 3. Semantic analysis 4. Intermediate
code generation 5. Code optimization 6. Code generation
All of the mentioned phases involve the following tasks:
• Symbol table management • Error handling

Lexical analysis
• Lexical analysis is the first phase of the compiler.
• Lexical analysis is also called scanning or linear analysis.
• The source program is scanned to read the stream of characters, and those
characters are grouped to form sequences called lexemes, for which tokens
are produced as output.
• A token is a sequence of characters that can be treated as a single
logical entity. The tokens are: 1) identifiers 2) keywords 3) operators 4)
special symbols 5) constants.
• Pattern: a set of strings in the input for which the same token is
produced as output. This set of strings is described by a rule called a
pattern associated with the token.
• Lexeme: a lexeme is a sequence of characters in the source program
that is matched by the pattern for a token. Once a token is generated,
the corresponding entry is made in the symbol table.
Input: stream of characters. Output: tokens.
For the input position := initial + rate * 60, the lexemes are position, :=,
initial, +, rate, * and 60, giving the token stream id1 := id2 + id3 * 60.
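A minimal Python sketch of this scanning step follows; the token names and regular expressions are illustrative assumptions for this toy input, and a production lexer would use a table-driven automaton rather than regexes.

import re

TOKEN_SPEC = [
    ("ASSIGN", r":="),            # tried before ID/OP so ":=" wins
    ("ID",     r"[A-Za-z_]\w*"),
    ("NUM",    r"\d+"),
    ("OP",     r"[+*]"),
    ("SKIP",   r"\s+"),           # whitespace, discarded below
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(source):
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

print(list(tokenize("position := initial + rate * 60")))
# [('ID', 'position'), ('ASSIGN', ':='), ('ID', 'initial'),
#  ('OP', '+'), ('ID', 'rate'), ('OP', '*'), ('NUM', '60')]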
Syntax analysis
• The second phase of the compiler is syntax analysis, also called parsing or
hierarchical analysis.
• The parser converts the tokens produced by the lexical analyzer into a
tree-like representation called a parse tree.
• The hierarchical tree structure generated in this phase is called a syntax
tree or parse tree. A parse tree describes the syntactic structure of the
input. Input: tokens. Output: syntax tree.
• In a syntax tree, each interior node represents an operation and the
children of the node represent the arguments of the operation.
Semantic Analysis
• The semantic analyzer uses the syntax tree and the information in the
symbol table to check the source program for semantic consistency with
the language definition.
• It ensures the correctness of the program; matching of parentheses
is also done in this phase.
• It also gathers type information and saves it in either the syntax tree or
the symbol table, for subsequent use during intermediate-code
generation.
• An important part of semantic analysis is type checking, where the
compiler checks that each operator has matching operands.
The compiler must report an error if a floating-point number is used to
index an array. The language specification may permit some type
conversions, such as integer to float for floating-point addition; these are
called coercions.
Intermediate code generation
• After syntax and semantic analysis of the source program, many
compilers generate an explicit low-level or machine-like intermediate
representation.
• Three-address code is one such intermediate representation; it
consists of a sequence of assembly-like instructions with three operands
per instruction. Each operand can act like a register.
• The output of the intermediate code generator is the three-address
code sequence for position := initial + rate * 60:
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
Properties of good intermediate representations:
• It should be easy to produce,
• and easy to translate into the target program.
• It should not include too much machine-specific detail.
• Convenient to produce in the semantic analysis phase.
• Convenient to translate into code for the desired target architecture.
Code optimization
• The machine-independent code-optimization phase attempts to
improve the intermediate code so that better target code will result.
• Optimization improves the efficiency of the code so that the target
program's running time and memory consumption can be reduced:
t1 = id3 * 60.0
id1 = id2 + t1
• The optimizer can deduce that the conversion of 60 from integer to
floating point can be done once and for all at compile time, so the
inttofloat operation can be eliminated by replacing the integer 60 by the
floating-point number 60.0.
Example:
Show how each of the following C source inputs can be optimized using
global optimization techniques:
a) if (x>0) {x = 2; y = 3;} else {y = 4; x = 2;}
b) if (x>0) x = 2; else if (x<=0) x = 3; else x = 4;
Solution:
a) if (x>0) {x = 2; y = 3;} else {y = 4; x = 2;}
The optimized code is: if (x>0) y = 3; else y = 4; x = 2;
b) if (x>0) x = 2; else if (x<=0) x = 3; else x = 4;
The optimized code is: if (x>0) x = 2; else x = 3;
Code Generation
• Code generation is the final phase of the compiler.
• The code generator takes the output of code optimization as input and
produces target code or object code as output.
• If the target language is machine code, then registers or memory
locations are selected for each of the variables used by the program.
• The intermediate instructions are translated into sequences of machine
instructions.
• For example, using registers R1 and R2, the intermediate code might
get translated into the machine code:
LDF  R2, id3
MULF R2, R2, #60.0
LDF  R1, id2
ADDF R1, R1, R2
STF  id1, R1
(OR)
11. B Prove that the following two regular expressions are equivalent by A (16) CO1
showing that the minimum state DFA's are same :
a) ( a | b ) *
b) ( a * | b * ) *
Solution:
The NFA with ɛ for (a | b) * will be

We will eliminate ɛ moves and convert it to DFA


ɛ-closure{q0} = {q0, q1, q2, q4, q7}
ɛ-closure{q1} = {q1, q2, q4}
ɛ-closure{q2} = {q2}
ɛ-closure{q3} = {q3, q6, q1, q2, q4, q7} = {q1, q2, q3, q4, q6, q7}
ɛ-closure{q4} = {q4}
ɛ-closure{q5} = {q1, q2, q4, q5, q6, q7}
ɛ-closure{q6} = {q1, q2, q4, q6, q7}
ɛ-closure{q7} = {q7}
Consider ɛ-closure{q0} = {q0, q1, q2, q4, q7}; call it state A.
δ'(A,a) = ɛ-closure{δ(A,a)}
        = ɛ-closure{δ({q0, q1, q2, q4, q7}, a)}
        = ɛ-closure{δ(q0,a) ∪ δ(q1,a) ∪ δ(q2,a) ∪ δ(q4,a) ∪ δ(q7,a)}
        = ɛ-closure{q3}
        = {q1, q2, q3, q4, q6, q7}; call it state B.
δ'(A,a) = B
δ'(A,b) = ɛ-closure{δ({q0, q1, q2, q4, q7}, b)}
        = ɛ-closure{q5}
        = {q1, q2, q4, q5, q6, q7}; call it state C.
δ'(A,b) = C
Now consider states B and C for input transitions.
δ'(B,a) = ɛ-closure{δ({q1, q2, q3, q4, q6, q7}, a)}
        = ɛ-closure{q3}, i.e. state B.
δ'(B,b) = ɛ-closure{δ({q1, q2, q3, q4, q6, q7}, b)}
        = ɛ-closure{q5}, i.e. state C.
Similarly,
δ'(C,a) = ɛ-closure{δ({q1, q2, q4, q5, q6, q7}, a)} = state B
δ'(C,b) = state C
The DFA will be:

The NFA with ɛ for (a * | b *) * will be

ɛ-closure{q0} = {q0, q1, q2, q3, q5, q6, q7, q9, q10, q11}; call it state A.
δ'(A,a) = ɛ-closure{δ(A,a)}
        = ɛ-closure{δ({q0, q1, q2, q3, q5, q6, q7, q9, q10, q11}, a)}
        = ɛ-closure{q4}
        = {q1, q2, q3, q4, q5, q6, q7, q9, q10, q11}; call it state B.
δ'(A,a) = B
δ'(A,b) = ɛ-closure{δ(A,b)}
        = ɛ-closure{δ({q0, q1, q2, q3, q5, q6, q7, q9, q10, q11}, b)}
        = ɛ-closure{q8}
        = {q1, q2, q3, q5, q6, q7, q8, q9, q10, q11}; call it state C.
δ'(A,b) = C
δ'(B,a) = ɛ-closure{δ(B,a)} = ɛ-closure{q4} = B
δ'(B,b) = ɛ-closure{q8} = C
δ'(C,a) = B
δ'(C,b) = C
After eliminating the ɛ-moves we get the following DFA:

Thus the DFAs obtained in (i) and (ii) are the same. Hence (a | b)* = (a* | b*)*
is proved.
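The ɛ-closure and subset construction used above can be sketched in Python as below. The state numbering follows the answer's NFA for (a|b)* (q0 start, q7 accepting); eps and delta encoding its transitions are read off Thompson's construction.

def eps_closure(states, eps):
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction(start, eps, delta, alphabet):
    start_set = eps_closure({start}, eps)
    dfa, work = {}, [start_set]
    while work:
        S = work.pop()
        if S in dfa:
            continue
        dfa[S] = {}
        for a in alphabet:
            move = {t for s in S for t in delta.get((s, a), ())}
            T = eps_closure(move, eps)
            dfa[S][a] = T
            work.append(T)
    return start_set, dfa

# NFA for (a|b)*: ɛ-edges and labeled edges; sets containing 7 accept
eps = {0: [1, 7], 1: [2, 4], 3: [6], 5: [6], 6: [1, 7]}
delta = {(2, "a"): [3], (4, "b"): [5]}
start, dfa = subset_construction(0, eps, delta, "ab")
for S, row in dfa.items():
    print(sorted(S), {a: sorted(T) for a, T in row.items()})

The three printed subsets are exactly states A, B and C computed by hand above.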
12. A Construct a predictive parsing table for the given grammar and check A (16) CO2
whether the given grammar is LL(1) or not for the sentence id+id*id.
E→E+T|T
T→T*F|F
F → (E) | id
Solution:
Step 1: The given grammar is left-recursive, so eliminate the left recursion first:
E  -> T E'
E' -> + T E' | ε
T  -> F T'
T' -> * F T' | ε
F  -> ( E ) | id
Step 2: Compute the FIRST and FOLLOW sets:
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }        FIRST(T') = { *, ε }
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
Step 3: Predictive parsing table M[non-terminal, terminal]:
        id         +           *           (          )          $
E       E->TE'                             E->TE'
E'                 E'->+TE'                           E'->ε      E'->ε
T       T->FT'                             T->FT'
T'                 T'->ε      T'->*FT'                T'->ε      T'->ε
F       F->id                              F->(E)
No cell of the table contains more than one production, so the grammar is
LL(1), and the sentence id+id*id is accepted by the table-driven parse.
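A Python sketch of the table-driven parse of id+id*id using the table above follows; TABLE maps (non-terminal, lookahead) to a production body, and the names are illustrative choices for this answer, not a standard API.

TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens):
    stack, pos = ["$", "E"], 0
    tokens = tokens + ["$"]
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:
                raise SyntaxError(f"no rule for ({top}, {look})")
            print(f"{top} -> {' '.join(body) or 'epsilon'}")
            stack.extend(reversed(body))   # push body, leftmost on top
        elif top == look:
            pos += 1                       # match terminal (or the $ marker)
        else:
            raise SyntaxError(f"expected {top}, got {look}")
    print("accepted" if pos == len(tokens) else "leftover input")

ll1_parse(["id", "+", "id", "*", "id"])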
(OR)
12. B (i) Explain in detail about the Context-free Grammar R (8) CO2
Definition
A context-free grammar G is defined by the 4-tuple G = (V, T, P, S), where
1. V is a finite set of non-terminals (variables).
2. T is a finite set of terminals.
3. P is a finite set of production rules of the form A -> α,
where A is a non-terminal and α is a string of terminals
and/or non-terminals; P is a relation from V to (V ∪ T)*.
4. S is the start symbol (S ∈ V).
Notational Conventions
The following notational conventions for grammars can be used:
1. These symbols are terminals:
(a) Lowercase letters early in the alphabet: a, b, c, ...
(b) Operator symbols such as +, *, etc.
(c) Punctuation symbols such as parentheses, comma, etc.
(d) The digits 0, 1, ..., 9.
(e) Boldface strings such as id or if, each a single terminal symbol.
2. These symbols are non-terminals:
(a) Uppercase letters early in the alphabet: A, B, C, ...
(b) The letter S, which is usually the start symbol.
(c) Lowercase, italic names such as expr or stmt.
3. Uppercase letters late in the alphabet, such as X, Y, Z, represent
either non-terminals or terminals.
4. Lowercase letters late in the alphabet, chiefly u, v, ..., z, represent
strings of terminals.
5. Lowercase Greek letters α, β, γ represent strings of grammar symbols;
a generic production can be written as A -> α.
6. A set of productions A -> α1, A -> α2, ..., A -> αk with a common
non-terminal may be written as A -> α1 | α2 | ... | αk.
7. Unless stated otherwise, the left side of the first production is
the start symbol.
Derivations
A derivation uses productions to generate a string (a set of
terminals). Derivation is used to find whether a string
belongs to a given grammar. The derivation is formed by
replacing a non-terminal in the sentential form by the right-hand
side of a suitable production rule.
Classification of Derivations
1. Left-most derivation
In a leftmost derivation, at each and every step the leftmost
non-terminal is expanded by substituting its corresponding
production to derive the string.
2. Right-most derivation
In a rightmost derivation, at each and every step the rightmost
non-terminal is expanded by substituting its corresponding
production to derive the string.
Example:
Consider the context-free grammar (CFG) G = ({S}, {a, b, c}, P, S)
where P = {S -> SbS | ScS | a}. Derive the string "abaca" by leftmost
derivation and rightmost derivation.
Solution:
Leftmost derivation for "abaca":
S ⇒ SbS
  ⇒ abS      (using rule S -> a)
  ⇒ abScS    (using rule S -> ScS)
  ⇒ abacS    (using rule S -> a)
  ⇒ abaca    (using rule S -> a)
Rightmost derivation for "abaca":
S ⇒ ScS
  ⇒ Sca      (using rule S -> a)
  ⇒ SbSca    (using rule S -> SbS)
  ⇒ Sbaca    (using rule S -> a)
  ⇒ abaca    (using rule S -> a)
(ii) Describe about the parse tree with suitable example R (8) CO2
Parse Tree:
• A parse tree is the hierarchical representation of terminals and non-
terminals.
• These symbols (terminals and non-terminals) represent the
derivation of the grammar used to yield the input string.
• In parsing, the string is derived starting from the start symbol.
• The start symbol of the grammar must be used as the root of
the parse tree.
• Leaves of a parse tree represent terminals.
• Each interior node represents a production of the grammar.
Rules to Draw a Parse Tree:
• All leaf nodes need to be terminals.
• All interior nodes need to be non-terminals.
• In-order traversal gives the original input string.
• Example 1: Let us take an example of a grammar (production
rules):
S -> sAB
A -> a
B -> b
The input string is "sab". The parse tree has root S with children s, A and B,
where A derives a and B derives b.
Uses of Parse Tree:
• It helps in syntax analysis by reflecting the syntax of the
input language.
• It uses an in-memory representation of the input with a structure
that conforms to the grammar.
• An advantage of using parse trees rather than immediate semantic actions:
you can make multiple passes over the information without having to re-
parse the input.
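A tiny recursive-descent sketch in Python that builds this parse tree follows; nodes are (symbol, children) pairs, and the function names are illustrative for this three-rule grammar.

def parse_S(inp, i):
    assert inp[i] == "s", "expected 's'"
    a_node, i = parse_A(inp, i + 1)
    b_node, i = parse_B(inp, i)
    return ("S", [("s", []), a_node, b_node]), i

def parse_A(inp, i):
    assert inp[i] == "a", "expected 'a'"
    return ("A", [("a", [])]), i + 1

def parse_B(inp, i):
    assert inp[i] == "b", "expected 'b'"
    return ("B", [("b", [])]), i + 1

def show(node, depth=0):
    sym, kids = node
    print("  " * depth + sym)
    for k in kids:
        show(k, depth + 1)

tree, _ = parse_S("sab", 0)
show(tree)   # an in-order reading of the leaves gives back "sab"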
13. A Construct a translation scheme for obtaining the Three Address code A (16) CO3
for constructing an Annotated parse tree for the assignment statement
and Input string is x=(a+b)*(c+d). The grammar is:
S -> id := E
E -> E1 + E2
E -> E1 * E2
E -> -E1
E -> ( E )
E -> id
The standard translation scheme (using E.place for the name holding the value
of E and newtemp() for fresh temporaries) is:
S -> id := E   { gen(id.place ':=' E.place); }
E -> E1 + E2   { E.place := newtemp(); gen(E.place ':=' E1.place '+' E2.place); }
E -> E1 * E2   { E.place := newtemp(); gen(E.place ':=' E1.place '*' E2.place); }
E -> - E1      { E.place := newtemp(); gen(E.place ':=' 'uminus' E1.place); }
E -> ( E1 )    { E.place := E1.place; }
E -> id        { E.place := id.place; }
For the input x := (a+b) * (c+d), the annotated parse tree yields the
three-address code:
t1 := a + b
t2 := c + d
t3 := t1 * t2
x  := t3
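A Python sketch of this scheme follows, assuming the expression has already been parsed into nested tuples (the tree a parser for the grammar above would produce); newtemp() plays the role of the scheme's temporary generator.

count = 0
def newtemp():
    global count
    count += 1
    return f"t{count}"

def gen(node, code):
    """node is an id string or an (op, left, right) tuple; returns E.place."""
    if isinstance(node, str):
        return node                      # E -> id: E.place is the id itself
    op, left, right = node
    p1, p2 = gen(left, code), gen(right, code)
    place = newtemp()
    code.append(f"{place} := {p1} {op} {p2}")
    return place

code = []
expr = ("*", ("+", "a", "b"), ("+", "c", "d"))   # (a+b)*(c+d)
code.append(f"x := {gen(expr, code)}")
print("\n".join(code))
# t1 := a + b
# t2 := c + d
# t3 := t1 * t2
# x := t3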
(OR)
13. B How could you generate the intermediate code for the flow of control S (16) CO3
statements? Explain with an example.
The translation of boolean expressions into three-address code is considered in
the context of statements such as those generated by the following grammar:
S -> if ( B ) S1
S -> if ( B ) S1 else S2
S -> while ( B ) S1
In these productions, nonterminal B represents a boolean expression and
nonterminal S represents a statement. Both B and S have a synthesized
attribute code, which gives the translation into three-address instructions.
For simplicity, we build up the translations B.code and S.code as strings,
using syntax-directed definitions. The semantic rules defining the code
attributes could be implemented instead by building up syntax trees and
then emitting code during a tree traversal.

The translation of if (B) S1 consists of B.code followed by S1.code, as
illustrated in Fig. 3.29. Within B.code are jumps based on the value of B. If
B is true, control flows to the first instruction of S1.code, and if B is false,
control flows to the instruction immediately following S1.code.

Fig. 3.29: The translation of if (B) S1 consists of B.code followed by S1.code
The labels for the jumps in B.code and S.code are managed using
inherited attributes. With a boolean expression B, we associate two labels:
B.true, the label to which control flows if B is true, and B.false, the label to
which control flows if B is false. With a statement S, we associate an
inherited attribute S.next denoting a label for the instruction immediately
after the code for S. In some cases, the instruction immediately following
S.code is a jump to some label L. A jump to a jump to L from within S.code
is avoided by using S.next.
The syntax-directed definition in Fig. 3.30 produces three-address
code for boolean expressions in the context of if-, if-else- and while-
statements.
Figure 3.30: Syntax-directed definition for flow-of-control statements

We assume that newlabel() creates a new label each time it is called, and that
label(L) attaches label L to the next three-address instruction to be generated.
If implemented literally, the semantic rules will generate many labels and
may attach more than one label to a three-address instruction.

A program consists of a statement generated by P -> S. The semantic rules
associated with this production initialize S.next to a new label. P.code consists
of S.code followed by the new label S.next. The token assign in the production
S -> assign is a placeholder for assignment statements; for this discussion of
control flow, S.code is simply assign.code.

In translating S -> if (B) S1, the semantic rules in Fig. 3.30 create a new label
B.true and attach it to the first three-address instruction generated for the
statement S1. Thus, jumps to B.true within the code for B will go to the code
for S1. Further, by setting B.false to S.next, we ensure that control will skip
the code for S1 if B evaluates to false.

In translating the if-else-statement S -> if (B) S1 else S2, the code for the
boolean expression B has jumps out of it to the first instruction of the code for
S1 if B is true, and to the first instruction of the code for S2 if B is false.

Further, control flows from both S1 and S2 to the three-address instruction
immediately following the code for S; its label is given by the inherited
attribute S.next. An explicit goto S.next appears after the code for S1 to skip
over the code for S2. No goto is needed after S2, since S2.next is the same as
S.next. The code for S -> while (B) S1 is formed from B.code and S1.code.

We use a local variable begin to hold a new label attached to the first instruction
for this while-statement, which is also the first instruction for B. We use a
variable rather than an attribute because begin is local to the semantic rules for
this production. The inherited label S.next marks the instruction that control
must flow to if B is false; hence, B.false is set to S.next.
A new label B.true is attached to the first instruction for S1; the code for B
generates a jump to this label if B is true. After the code for S1 we place the
instruction goto begin, which causes a jump back to the beginning of the code
for the boolean expression.

Note that S1.next is set to this label begin, so jumps from within S1.code can go
directly to begin. The code for S -> S1 S2 consists of the code for S1 followed by
the code for S2. The semantic rules manage the labels; the first instruction after
the code for S1 is the beginning of the code for S2, and the instruction after the
code for S2 is also the instruction after the code for S.
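A Python sketch of the while-translation just described follows; newlabel(), gen_while and the b_code callback are illustrative stand-ins for the semantic rules, not a standard interface.

count = 0
def newlabel():
    # stands in for the rules' newlabel(): a fresh label on each call
    global count
    count += 1
    return f"L{count}"

def gen_while(b_code, s1_code, s_next):
    """S -> while (B) S1: begin: B.code; B.true: S1.code; goto begin."""
    begin = newlabel()        # local variable begin, as in the rules
    b_true = newlabel()       # inherited attribute B.true
    return ([f"{begin}:"] +
            b_code(b_true, s_next) +      # B.false is set to S.next
            [f"{b_true}:"] +
            s1_code +
            [f"goto {begin}"])

# B.code for "x < 100", parameterized by its inherited true/false labels
b = lambda true, false: [f"if x < 100 goto {true}", f"goto {false}"]
s_next = newlabel()
for line in gen_while(b, ["x = x + 1"], s_next) + [f"{s_next}:"]:
    print(line)

The output is the familiar loop shape: begin label, conditional jump into the body, goto begin after the body, and S.next after the loop.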

14. A Explain in detail about the different storage allocation techniques. R (16) CO4
From the perspective of the compiler writer, the executing target program
runs in its own logical address space in which each program value has a
location. The management and organization of this logical address space is
shared between the compiler, operating system, and target machine. The
operating system maps the logical addresses into physical addresses, which
are usually spread throughout memory. Run-time storage comes in blocks,
where a byte is the smallest unit of addressable memory and four consecutive
bytes form a machine word. Multibyte objects are stored in consecutive
bytes and addressed by their first byte. Run-time storage can be subdivided
to hold the different components of an executing program, as shown in Fig. 4.1:
• Generated executable code
• Static data objects
• Dynamic data objects (heap)
• Automatic data objects (stack)
STORAGE ALLOCATION STRATEGIES
The different storage allocation strategies are:
1. Static allocation: lays out storage for all data objects at compile time.
2. Stack allocation: manages the run-time storage as a stack.
3. Heap allocation: allocates and deallocates storage as needed at run
time from a data area known as the heap.
Static allocation
Static allocation is a procedure used for allocating all data objects at
compile time. Static allocation is possible only when the compiler knows
the size of each data object at compile time. In this type of allocation,
data objects cannot be created at run time under any circumstances. In
static allocation, the compiler decides the amount of storage for each data
object and binds the name of each data object to the allocated storage;
names are bound to storage locations. If memory is created at compile
time, then the memory is created in the static area and only once.

Advantages
• It is easy to implement.
• It allows type checking during compilation.
• It eliminates the possibility of running out of memory at run time.
Disadvantages
• It is incompatible with recursive subprograms.
• It is not possible to use variables whose size has to be
determined at run time.
• Static allocation can be performed only if the size of each data
object is known at compile time.
Stack allocation
a. Stack allocation is a strategy in which the storage is organized as a
stack. This stack is also called the control stack.
b. As an activation begins, its activation record is pushed onto the stack,
and on completion of the activation the corresponding activation
record is popped.
c. The locals are stored in each activation record, so locals are bound to
the corresponding activation record on each fresh activation.
d. Data structures can be created dynamically with stack allocation.
Limitations of stack allocation
Memory addressing must be done using pointers and index registers,
so this type of allocation is slower than static allocation.
Heap Allocation
a. If the values of non-local variables must be retained even after an
activation ends, such retention is not possible with stack allocation
because of its last-in first-out nature. For retaining such variables,
the heap allocation strategy is used.
b. Heap allocation allocates a contiguous block of memory when required
for storage of activation records or other data objects. This allocated
memory can be deallocated when the activation ends, and the
deallocated (free) space can be reused by the heap manager.
c. Efficient heap management can be done by:
i) Creating a linked list of the free blocks; when any memory is
deallocated, that block of memory is appended to the linked list.
ii) Allocating the most suitable block of memory from the linked list,
i.e. using the best-fit technique for allocation of blocks.
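A toy Python simulation of the control stack follows; the explicit control_stack list is an illustrative stand-in, since a real stack allocator works on machine memory, but it shows why recursion needs one activation record per active call.

control_stack = []

def fact(n):
    control_stack.append({"proc": "fact", "n": n})   # push activation record
    print("enter:", control_stack[-1], "depth", len(control_stack))
    result = 1 if n <= 1 else n * fact(n - 1)
    control_stack.pop()                              # pop on return
    return result

print("fact(3) =", fact(3))

Each recursive call gets its own record with its own binding of n, which is exactly what static allocation (one fixed location per name) cannot provide.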

(OR)
14. B Explain the use of symbol table in compilation process. List out the R (16) CO4
various attributes for implementing the symbol table.
Symbol Table:
A symbol table is an important data structure created and maintained by
the compiler in order to keep track of the semantics of variables, i.e. it
stores information about the scope and binding of names, and information
about instances of various entities such as variable and function names,
classes, objects, etc. It is built in the lexical and syntax analysis phases.
The information is collected by the analysis phases of the compiler and is
used by the synthesis phases of the compiler to generate code. It is used
by the compiler to achieve compile-time efficiency.
A symbol table may serve the following purposes depending upon the
language in hand:
• To store the names of all entities in a structured form in one place.
• To verify that a variable has been declared.
• To implement type checking, by verifying that assignments and
expressions in the source code are semantically correct.
• To determine the scope of a name (scope resolution).
It is used by the various phases of the compiler as follows:
• Lexical analysis: creates new entries in the table, for example
entries for tokens.
• Syntax analysis: adds information regarding attribute type, scope,
dimension, line of reference, use, etc. to the table.
• Semantic analysis: uses the available information in the table to
check semantics, i.e. to verify that expressions and assignments are
semantically correct (type checking), and updates it accordingly.
• Intermediate code generation: refers to the symbol table to know
how much and what type of run-time storage is allocated; the table
helps in adding temporary-variable information.
• Code optimization: uses information present in the symbol table
for machine-dependent optimization.
• Target code generation: generates code by using the address
information of identifiers present in the table.
Symbol Table entries
Each entry in the symbol table is associated with attributes that support
the compiler in different phases.
Items stored in the symbol table:
• Variable names and constants
• Procedure and function names
• Literal constants and strings
• Compiler-generated temporaries
• Labels in source languages
Information used by the compiler from the symbol table:
• Data type and name
• Declaring procedure
• Offset in storage
• If a structure or record, a pointer to the structure table
• For parameters, whether passing is by value or by reference
• Number and type of arguments passed to a function
• Base address
A symbol table can be either a linear list or a hash table. Using the
following format, it maintains an entry for each name:
<symbol name, type, attribute>
For example, suppose the table stores information about the following
variable declaration: static int salary
Then it stores an entry in the following format:
<salary, int, static>
Operations of Symbol table
The core operations of a symbol table are allocate, free, insert, lookup,
set attribute and get attribute. The allocate operation creates a new empty
symbol table. The free operation is used to remove all records and free the
storage of a symbol table. As the name implies, the insert operation puts a
name into the symbol table and returns a pointer to its entry. The lookup
operation looks up a name and returns a pointer to the corresponding entry.
The set- and get-attribute operations associate an attribute with a given
entry and retrieve an attribute associated with a given entry. Other
operations may be introduced depending upon the requirements; for example,
a delete operation deletes a previously entered name.
The basic operations defined on a symbol table include:
Operation      Function
allocate       To allocate a new empty symbol table
free           To remove all entries and free the storage of a symbol table
lookup         To search for a name and return a pointer to its entry
insert         To insert a name in the symbol table and return a pointer to its entry
set_attribute  To associate an attribute with a given entry
get_attribute  To get an attribute associated with a given entry
insert()
• The insert() operation is used more frequently in the analysis phase,
when the tokens are identified and names are stored in the table.
• The insert() operation is used to insert information into the symbol
table, such as each unique name occurring in the source code.
• In the source code, the attributes of a symbol are the information
associated with that symbol. This information contains the state, value,
type and scope of the symbol.
• The insert() function takes the symbol and its value as arguments.
For example, int x;
should be processed by the compiler as:
insert(x, int)
lookup()
In the symbol table, the lookup() operation is used to search for a name.
It is used to determine:
• the existence of the symbol in the table;
• the declaration of the symbol before it is used;
• whether the name is used in the scope;
• the initialization of the symbol;
• whether the name is declared multiple times.
The basic format of the lookup() function is as follows:
lookup(symbol)

Implementation of Symbol table
The symbol table can be implemented as an unordered list if the compiler
handles only a small amount of data.
A symbol table can be implemented with one of the following techniques:
• Linear (sorted or unsorted) list
• Hash table
• Binary search tree
Symbol tables are mostly implemented as hash tables.
The following are commonly used data structures for implementing a
symbol table (a minimal sketch of the hash-table variant follows this list):
1. List
• In this method, an array is used to store names and associated
information.
• A pointer "available" is maintained at the end of all stored records,
and new names are added in the order in which they arrive.
• To search for a name we start from the beginning of the list up to
the available pointer; if it is not found we get an error, "use of an
undeclared name".
• While inserting a new name we must ensure that it is not already
present; otherwise an error occurs, "multiply defined name".
• Insertion is fast, O(1), but lookup is slow for large tables: O(n) on
average.
• The advantage is that it takes a minimum amount of space.
2. Linked List
• This implementation uses a linked list: a link field is added to each
record.
• Searching of names is done in the order pointed to by the link field.
• A pointer "First" is maintained to point to the first record of the
symbol table.
• Insertion is fast, O(1), but lookup is slow for large tables: O(n) on
average.
3. Hash Table
• In the hashing scheme two tables are maintained, a hash table and a
symbol table; this is the most commonly used method of implementing
symbol tables.
• A hash table is an array with index range 0 to table size - 1. Its
entries are pointers pointing to the names of the symbol table.
• To search for a name we use a hash function that yields an integer
between 0 and table size - 1.
• Insertion and lookup can be made very fast: O(1).
• The advantage is that quick search is possible; the disadvantage is
that hashing is complicated to implement.
4. Binary Search Tree
• Another approach to implementing a symbol table is to use a binary
search tree, i.e. we add two link fields, left and right child.
• All names are created as children of the root node and always
maintain the binary search tree property.
• Insertion and lookup are O(log2 n) on average.
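As promised above, here is a minimal Python sketch of a hash-table symbol table with the core insert/lookup operations; Python's dict supplies the hashing, and the scope chain via a parent pointer is an illustrative design choice.

class SymbolTable:
    def __init__(self, parent=None):
        self.entries = {}
        self.parent = parent               # enclosing scope, if any

    def insert(self, name, **attributes):  # e.g. insert("salary", type="int")
        if name in self.entries:
            raise KeyError(f"'{name}' multiply defined")
        self.entries[name] = attributes    # attributes double as set_attribute
        return self.entries[name]

    def lookup(self, name):
        table = self
        while table is not None:           # search enclosing scopes too
            if name in table.entries:
                return table.entries[name]
            table = table.parent
        return None                        # use of an undeclared name

globals_ = SymbolTable()
globals_.insert("salary", type="int", storage="static")
locals_ = SymbolTable(parent=globals_)
print(locals_.lookup("salary"))   # found in the enclosing scope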
15. A Describe about the Peephole Optimization in detail R (16) CO5
Most compilers produce good code through careful instruction selection
and register allocation. A few use an alternative strategy: they generate
naive code and then improve the quality of the target code by applying
"optimizing" transformations to the target program. The term "optimizing"
is somewhat misleading because there is no guarantee that the resulting
code is optimal under any mathematical measure. Nevertheless, many simple
transformations can significantly improve the running time or space
requirement of the target program.
Definition:
A simple but effective technique for locally improving the target code is
peephole optimization, which is done by examining a sliding window of
target instructions (called the peephole) and replacing instruction
sequences within the peephole by a shorter or faster sequence whenever
possible. Peephole optimization can also be applied directly after
intermediate code generation to improve the intermediate representation.
The peephole is a small, sliding window on a program. The code in the
peephole need not be contiguous, although some implementations do require
this. It is characteristic of peephole optimization that each improvement
may spawn opportunities for additional improvements.
Characteristic peephole optimizations:
• Redundant-instruction elimination
• Flow-of-control optimizations
• Algebraic simplifications
• Use of machine idioms
Redundant-instruction elimination
At the source-code level, the following can be done by the user; each
version of add_ten computes the same result with fewer redundant steps:
int add_ten(int x) { int y, z; y = 10; z = x + y; return z; }
int add_ten(int x) { int y; y = 10; y = x + y; return y; }
int add_ten(int x) { int y = 10; return x + y; }
int add_ten(int x) { return x + 10; }
At the compilation level, the compiler searches for instructions that are
redundant. Multiple load and store instructions may carry the same meaning
even if some of them are removed. For example:
MOV x, R0
MOV R0, R1
We can delete the first instruction and rewrite the sequence as:
MOV x, R1
Eliminating Unreachable Code
Unreachable code is a part of the program that is never executed because
of programming constructs. Programmers may have accidentally written a
piece of code that can never be reached.
Example:
int add_ten(int x)
{
return x + 10;
printf("value of x is %d", x);
}
In this code segment, the printf statement will never be executed, as the
program control returns before it can execute; hence printf can be removed.
Example 2:
if debug == 1 goto L1
goto L2
L1: print debugging information
L2:
One obvious peephole optimization is to eliminate jumps over jumps. Thus,
no matter what the value of debug, the code sequence above can be replaced
by:
if debug != 1 goto L2
print debugging information
L2:
Flow-of-Control Optimizations:
Simple intermediate code-generation algorithms frequently produce jumps
to jumps, jumps to conditional jumps, or conditional jumps to jumps. These
unnecessary jumps can be eliminated in either the intermediate code or the
target code by peephole optimization.
Consider the following chunk of code:
...
MOV R1, R2
GOTO L1
...
L1: GOTO L2
L2: INC R1
In this code, label L1 can be removed as it merely passes control to L2.
So instead of jumping to L1 and then to L2, control can directly reach L2,
as shown below:
...
MOV R1, R2
GOTO L2
...
L2: INC R1
Algebraic expression simplification
Algebraic identities can also be used by a peephole optimizer to eliminate
three-address statements such as x = x + 0 or x = x * 1 in the peephole.
There are occasions where algebraic expressions can be simplified. For
example, the expression x = x + 0 can be replaced by x itself, and the
expression x = x + 1 can simply be replaced by INC x.
Strength reduction
There are operations that consume more time and space. Their "strength"
can be reduced by replacing them with other operations that consume less
time and space but produce the same result. For example, x² is invariably
cheaper to implement as x*x than as a call to an exponentiation routine.
Fixed-point multiplication or division by a power of two is cheaper to
implement as a shift. Floating-point division by a constant can be
implemented as multiplication by a constant, which may be cheaper.
Examples:
Initial code:    y = x * 2;
Optimized code:  y = x + x;  or  y = x << 1;
Initial code:    y = x / 2;
Optimized code:  y = x >> 1;
Use of Machine Idioms
The target machine may have hardware instructions that implement certain
specific operations efficiently. For example, some machines have auto-
increment and auto-decrement addressing modes. These add or subtract one
from an operand before or after using its value. The use of these modes
greatly improves the quality of code when pushing or popping a stack, as in
parameter passing. These modes can also be used in code for statements
like i := i + 1.
i := i + 1  →  i++
i := i - 1  →  i--
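A toy Python peephole pass over the patterns above follows; it handles only two patterns (redundant MOV pairs, plus one algebraic identity and one strength reduction), and the string instruction forms are illustrative assumptions.

def peephole(code):
    out, i = [], 0
    while i < len(code):
        ins = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        # redundant load:  MOV a, R0 ; MOV R0, R1  =>  MOV a, R1
        if nxt and ins.startswith("MOV ") and nxt.startswith("MOV "):
            src, dst = (p.strip() for p in ins[4:].split(","))
            src2, dst2 = (p.strip() for p in nxt[4:].split(","))
            if dst == src2:
                out.append(f"MOV {src}, {dst2}")
                i += 2
                continue
        if "=" in ins:
            lhs, rhs = (p.strip() for p in ins.split("="))
            # algebraic identity: x = x + 0 is a no-op, delete it
            if rhs == f"{lhs} + 0":
                i += 1
                continue
            # strength reduction: y = x * 2  =>  y = x << 1
            if rhs.endswith(" * 2"):
                out.append(f"{lhs} = {rhs[:-4]} << 1")
                i += 1
                continue
        out.append(ins)
        i += 1
    return out

print(peephole(["MOV x, R0", "MOV R0, R1", "a = a + 0", "y = x * 2"]))
# ['MOV x, R1', 'y = x << 1']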
(OR)
15. B (i) Construct an algorithm for building Dominator Tree for flow graph. U (8) CO5

In a flow graph, a node d dominates node n if every path from the initial
node of the flow graph to n goes through d. This is denoted d dom n. The
initial node dominates all the remaining nodes in the flow graph, and the
entry of a loop dominates all nodes in the loop. Similarly, every node
dominates itself.
Example:
In the flow graph below,
• the initial node, node 1, dominates every node;
• node 2 dominates only itself;
• node 3 dominates all but 1 and 2;
• node 4 dominates all but 1, 2 and 3;
• nodes 5 and 6 dominate only themselves, since the flow of control can
skip around either by going through the other;
• node 7 dominates 7, 8, 9 and 10;
• node 8 dominates 8, 9 and 10;
• nodes 9 and 10 dominate only themselves.

Fig: Flow graph and dominator tree


The usual way of presenting dominator information is a tree, called the
dominator tree, in which:
• the initial node is the root;
• the parent of each other node is its immediate dominator;
• each node d dominates only its descendants in the tree.
The existence of the dominator tree follows from a property of dominators:
each node n has a unique immediate dominator m that is the last dominator
of n on any path from the initial node to n. In terms of the dom relation,
the immediate dominator m has the property that if d ≠ n and d dom n, then
d dom m.
For the flow graph above, the dominator sets are:
D(1) = {1}
D(2) = {1, 2}
D(3) = {1, 3}
D(4) = {1, 3, 4}
D(5) = {1, 3, 4, 5}
D(6) = {1, 3, 4, 6}
D(7) = {1, 3, 4, 7}
D(8) = {1, 3, 4, 7, 8}
D(9) = {1, 3, 4, 7, 8, 9}
D(10) = {1, 3, 4, 7, 8, 10}
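The classic iterative dominator algorithm, D(n0) = {n0} and D(n) = {n} ∪ intersection of D(p) over all predecessors p, repeated to a fixed point, can be sketched in Python; the succ dictionary below encodes the standard 10-node flow graph assumed by this example.

def dominators(succ, entry):
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, ts in succ.items():
        for t in ts:
            pred[t].add(n)
    dom = {n: set(nodes) for n in nodes}   # initialize to all nodes
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:          # every non-entry node has a pred here
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

# the 10-node flow graph of the example
succ = {1: [2, 3], 2: [3], 3: [4], 4: [3, 5, 6], 5: [7], 6: [7],
        7: [4, 8], 8: [3, 9, 10], 9: [1], 10: [7]}
for n, d in sorted(dominators(succ, 1).items()):
    print(f"D({n}) = {sorted(d)}")

The printed sets match D(1) through D(10) above; taking each node's last dominator gives the dominator tree.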
15. B (ii) Explain the concept of data flow analysis in flow graphs with U (8) CO5
suitable example
The data-flow analysis equations are used to collect information about a
program block. The following is the data-flow equation for a statement s:
Out[s] = gen[s] ∪ (In[s] − Kill[s])
where
Out[s] is the information at the end of the statement s,
gen[s] is the information generated by the statement s,
In[s] is the information at the beginning of the statement s,
Kill[s] is the information killed or removed by the statement s.
The main aim of data-flow analysis is to find a set of constraints on the
In[s] and Out[s] values for each statement s. The constraints are of two
kinds: transfer functions and control-flow constraints.
Transfer Function
The semantics of a statement give the constraints between the data-flow
values before and after the statement.
For example, consider two statements x = y and z = x. After both statements
are executed, we can say that both x and z have the same value, namely y.
Thus, a transfer function depicts the relationship between the data-flow
values before and after a statement.
There are two directions of propagation:
1. Forward propagation
2. Backward propagation
Let us see both of these.
• Forward propagation
• In forward propagation, the transfer function is represented by Fs
for a statement s.
• This transfer function accepts the data-flow values before the
statement and outputs the new data-flow values after the statement.
• Thus the new data-flow value after the statement is
Out[s] = Fs(In[s]).
• Backward propagation
• Backward propagation is the converse of forward propagation.
• The data-flow value after the statement is converted to a new
data-flow value before the statement by the transfer function.
• Thus the new data-flow value is In[s] = Fs(Out[s]).
Control-Flow Constraints
The second set of constraints comes from the flow of control. If block B
contains statements S1, S2, ..., Sn, then the data-flow value out of Si
equals the data-flow value into Si+1, that is:
In[Si+1] = Out[Si], for all i = 1, 2, ..., n − 1.
Data Flow Properties
Some properties used in data-flow analysis are:
• Available expressions
• Reaching definitions
• Live variables
• Busy expressions
We will discuss these properties one by one.
Available Expression
An expression a + b is said to be available at a program point x if none of
its operands is modified between its evaluation and that point. Available
expressions are used to eliminate common subexpressions.
An expression is available at its evaluation point.
Example:
In the accompanying flow graph, the expression L1: 4 * i is an available
expression for blocks B2 and B3, since no operand of it is modified along
the way.
Reaching Definition
A definition D reaches a point x if D is not killed or redefined before that
point. Reaching definitions are generally used in variable/constant
propagation.
Example:
In the accompanying flow graph, D1 is a reaching definition for block B2,
since the value of x is not changed (it is still 2), but D1 is not a
reaching definition for block B3, because the value of x is changed to
x + 2; that is, D1 is killed or redefined by D2.
Live Variable
A variable x is said to be live at a point p if its value may be used after
p before being killed or redefined; if the variable is killed or redefined
before any use, it is said to be dead.
Liveness is generally used in register allocation and dead-code elimination.
Example:
In the accompanying flow graph, the variable a is live at blocks B1, B2, B3
and B4 but is killed at block B5, since its value is changed from 2 to
b + c. Similarly, variable b is live at block B3 but is killed at block B4.
Busy Expression
An expression is said to be busy along a path if its evaluation occurs along
that path, and none of its operand definitions appears before it. Busy
expressions are used for performing code-movement optimization.
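An iterative solver for the equations Out[B] = gen[B] ∪ (In[B] − kill[B]) and In[B] = union of Out[P] over predecessors can be sketched in Python; the three-block example below (B1 defining d1, B3 redefining it as d2) mirrors the reaching-definitions example above and is an illustrative assumption.

def reaching_definitions(blocks, succ):
    pred = {b: set() for b in blocks}
    for b, ts in succ.items():
        for t in ts:
            pred[t].add(b)
    IN = {b: set() for b in blocks}
    OUT = {b: set(blocks[b]["gen"]) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # IN[B] is the union of OUT over all predecessors
            IN[b] = set().union(*(OUT[p] for p in pred[b]))
            new_out = blocks[b]["gen"] | (IN[b] - blocks[b]["kill"])
            if new_out != OUT[b]:
                OUT[b], changed = new_out, True
    return IN, OUT

# B1: d1: x = 2     B2: uses x     B3: d2: x = x + 2
blocks = {
    "B1": {"gen": {"d1"}, "kill": {"d2"}},
    "B2": {"gen": set(),  "kill": set()},
    "B3": {"gen": {"d2"}, "kill": {"d1"}},
}
succ = {"B1": ["B2", "B3"], "B2": [], "B3": []}
IN, OUT = reaching_definitions(blocks, succ)
for b in blocks:
    print(b, "IN:", sorted(IN[b]), "OUT:", sorted(OUT[b]))

The run shows d1 in IN[B2] and IN[B3] but only d2 in OUT[B3]: d1 reaches B2 but is killed by d2 inside B3, as stated above.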

Staff In-charge HoD
