CD Model Set-3 Answer Key
4. Why SLR and LALR are more economical to construct than canonical LR? S CO2
SLR and LALR tables are easier and more economical to construct than canonical LR tables, because a canonical LR table usually has several thousand states for a language of the same size.
Look-Ahead LR (LALR): The most powerful alternative, with much lower memory
requirements.
Simple LR parser (SLR): Has much lower memory requirements, but less language-
recognition power.
LALR parsing attempts to reduce the number of states. The canonical LR (CLR) parser has a large set of items, so the LALR parser merges states with the same core to obtain far fewer items, although this merging can introduce some reduce-reduce conflicts.
5. Generate the associated semantic rules for the expressions: E → E1 mod E2, E → E1[E2], E → *E1. A CO3
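The answer itself is not shown in this copy. The standard textbook-style semantic rules for these productions can be sketched as follows; this is a hedged illustration in which newtemp() and emit() are assumed helper routines and the attribute names are illustrative:

```python
# Sketch of the usual syntax-directed translation rules, expressed as
# small Python helpers. newtemp() and emit() are assumed infrastructure,
# not part of the original answer key.
temps, code = [], []

def newtemp():
    t = f"t{len(temps) + 1}"
    temps.append(t)
    return t

def emit(instr):
    code.append(instr)

# E -> E1 mod E2 : E.place := newtemp; emit(E.place := E1.place mod E2.place)
def rule_mod(e1_place, e2_place):
    place = newtemp()
    emit(f"{place} := {e1_place} mod {e2_place}")
    return place

# E -> E1 [ E2 ] : indexed access; scale the index by the element width,
# then load from the base (width 4 is an illustrative assumption)
def rule_index(e1_place, e2_place, width=4):
    t = newtemp()
    emit(f"{t} := {e2_place} * {width}")
    place = newtemp()
    emit(f"{place} := {e1_place}[{t}]")
    return place

# E -> * E1 : pointer dereference
def rule_deref(e1_place):
    place = newtemp()
    emit(f"{place} := *{e1_place}")
    return place

rule_mod("a", "b")
rule_deref("p")
print(code)  # ['t1 := a mod b', 't2 := *p']
```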
6. Convert the assignment statement d := (a - b) + (a - c) + (a - c) into three address code. A CO3
t := a - b
u := a - c
v := t + u
d := v + u
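The sequence above can be produced mechanically. A minimal sketch of a three-address-code generator over nested tuples, which reuses a temporary when the same subexpression appears twice (as the answer does for a - c); the names here are illustrative:

```python
# Minimal three-address-code generator. An expression is either a name
# (string) or a tuple (op, lhs, rhs); repeated subexpressions reuse the
# temporary already holding their value.
def gen_tac(expr, code, seen, counter):
    if isinstance(expr, str):          # a leaf operand: just a name
        return expr
    op, lhs, rhs = expr
    l = gen_tac(lhs, code, seen, counter)
    r = gen_tac(rhs, code, seen, counter)
    key = (op, l, r)
    if key in seen:                    # common subexpression: reuse temp
        return seen[key]
    counter[0] += 1
    t = f"t{counter[0]}"
    seen[key] = t
    code.append(f"{t} := {l} {op} {r}")
    return t

code, seen, counter = [], {}, [0]
# d := (a - b) + (a - c) + (a - c)
root = ('+', ('+', ('-', 'a', 'b'), ('-', 'a', 'c')), ('-', 'a', 'c'))
res = gen_tac(root, code, seen, counter)
code.append(f"d := {res}")
print("\n".join(code))
```

With t1, t2, t3 playing the roles of t, u, v, this reproduces the answer's shape: a - c is computed once and reused.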
7. State the limitations of static allocation. U CO4
Static allocation can be done only if the size of a data object is known at compile time.
Data structures cannot be created dynamically; that is, static allocation cannot manage the allocation of memory at run time.
Recursive procedures are not supported by this type of allocation.
8. How would you solve the issues in the design of code generators? S CO4
Code generation is the final activity of the compiler. Basically, code generation is the process of creating assembly-language/machine-language statements that perform the operations specified by the source program when they run. The main design issues can be addressed by:
Using an intermediate representation
Using a target-specific code generator
Using a garbage collector
Using a code optimizer
Using data flow analysis
9. Apply the basic block concepts: how would you represent the dummy blocks with no S CO5
statements indicated in global dataflow analysis?
In global dataflow analysis in compiler design, dummy blocks are introduced to represent certain program structures or behaviors that may not have a direct equivalent in the original code.
Dummy Block Representation for Three Statements:
10. Consider the expression and eliminate the common sub expression A CO5
a := b + c
b := a - d
c := b + c
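The worked answer is not shown in this copy. One defensible reading, sketched below: although b + c appears twice, b is redefined by the second statement, so a straightforward available-expressions scan finds no common subexpression to eliminate here. The helper below is illustrative:

```python
# Sketch: an "available expressions" scan over straight-line code.
# An assignment to a variable kills every recorded expression that
# mentions it, which is exactly what rules out reusing b + c here.
def find_reuses(stmts):
    available = {}                     # expression text -> variable holding it
    reuses = []
    for target, expr in stmts:
        if expr in available:          # common subexpression: could be reused
            reuses.append((target, available[expr]))
        # the assignment to `target` kills expressions that mention it
        available = {e: v for e, v in available.items()
                     if target not in e.split()}
        if target not in expr.split():
            available[expr] = target
    return reuses

stmts = [("a", "b + c"), ("b", "a - d"), ("c", "b + c")]
print(find_reuses(stmts))  # [] : b is redefined, so the second b + c
                           # is not a common subexpression
```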
Lexical analysis
Lexical analysis is the first phase of the compiler.
Lexical analysis is also called scanning or linear analysis.
The source program is scanned to read the stream of characters, and those characters are grouped to form sequences called lexemes, which produce tokens as output.
Token is a sequence of characters that can be treated as a single
logical entity. The tokens are, 1) Identifiers 2) Keywords 3) Operators 4)
Special symbols 5) Constants
Pattern: A set of strings in the input for which the same token is
produced as output. This set of strings is described by a rule called a
pattern associated with the token.
Lexeme: A lexeme is a sequence of characters in the source program that is matched by the pattern for a token. Once a token is generated, the corresponding entry is made in the symbol table.
Input: stream of characters. Output: tokens.
Given input: position = initial + rate * 60
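The token separation for this input can be sketched with a minimal regular-expression scanner in the spirit of the description above; the token names are illustrative:

```python
import re

# A minimal scanner: group the character stream into lexemes and
# classify each as a token. NUMBER is tried before ID so "60" is
# classified as a number; whitespace is skipped.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("OP",     r"[+\-*/]"),
    ("SKIP",   r"\s+"),
]
pattern = "|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC)

def tokenize(src):
    tokens = []
    for m in re.finditer(pattern, src):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("position = initial + rate * 60"))
```

This yields the token stream id, =, id, +, id, *, number, matching the running example used through the rest of the phases.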
Syntax analysis
The second phase of the compiler is syntax analysis, also called parsing or hierarchical analysis. The parser converts the tokens produced by the lexical analyzer into a tree-like representation called a parse tree.
The hierarchical tree structure generated in this phase is called the syntax tree or parse tree. A parse tree describes the syntactic structure of the input.
Input: tokens. Output: syntax tree.
In a syntax tree, each interior node represents an operation and the
children of the node represent the arguments of the operation.
Semantic Analysis
The semantic analyzer uses the syntax tree and the information in the symbol table to check the source program for semantic consistency with the language definition.
It ensures the correctness of the program; matching of parentheses is also done in this phase.
It also gathers type information and saves it in either the syntax tree or
the symbol table, for subsequent use during intermediate-code
generation.
An important part of semantic analysis is type checking, where the compiler checks that each operator has matching operands. For example, the compiler must report an error if a floating-point number is used to index an array. The language specification may permit some type conversions, such as integer to float for a float addition; such conversions are called coercions.
Intermediate code generation
After syntax and semantic analysis of the source program, many compilers generate an explicit low-level or machine-like intermediate representation.
Three-address code is one of the intermediate representations, which
consists of a sequence of assembly-like instructions with three operands
per instruction. Each operand can act like a register.
The output of the intermediate code generator consists of the three-address code sequence for position = initial + rate * 60:
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
Properties of good intermediate representations
It should be easy to produce,
And easy to translate into target program.
It should not include too much machine specific detail.
Convenient to produce in the semantic analysis phase.
Convenient to translate into code for the desired target architecture
Code optimization
The machine-independent code-optimization phase attempts to improve the intermediate code so that better target code will result.
Optimization has to improve the efficiency of code so that the running time and memory consumption of the target program can be reduced. For the running example, the optimized code is:
t1 = id3 * 60.0
id1 = id2 + t1
The optimizer can deduce that the conversion of 60 from integer to
floating point can be done once and for all at compile time, so the int to
float operation can be eliminated by replacing the integer 60 by the
floating-point number 60.0.
Example:
Show how each of the following C source inputs can be optimized using
global optimization techniques: a)if (x>0) {x = 2; y = 3;} else {y = 4; x
= 2;}
b)if (x>0) x = 2; else if (x<=0) x = 3; else x = 4;
Solution:
a)if (x>0) {x = 2; y = 3;} else {y = 4; x = 2;}
The optimized code is, if (x>0) y = 3; else y = 4; x = 2;
b)if (x>0) x = 2; else if (x<=0) x = 3; else x = 4;
The optimized code is, if (x>0) x = 2; else x = 3;
Code Generation
Code generation is the final phase of the compiler.
The code generator takes the optimized intermediate code as input and produces target code or object code as output.
If the target language is machine code, then registers or memory locations are selected for each of the variables used by the program.
The intermediate instructions are translated into sequences of machine instructions.
For example, using registers R1 and R2, the intermediate code might get translated into the machine code:
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1
(OR)
11. B Prove that the following two regular expressions are equivalent by A (16) CO1
showing that the minimum-state DFAs are the same:
a) ( a | b ) *
b) ( a * | b * ) *
Solution:
The NFA with ɛ for (a | b) * will be
ɛ-closure{q0} = {q0, q1, q2, q3, q5, q6, q7, q9, q10, q11}; call it state A
δ'(A, a) = ɛ-closure{ δ(A, a) }
= ɛ-closure{ δ({q0, q1, q2, q3, q5, q6, q7, q9, q10, q11}, a) }
= ɛ-closure{q4}
= {q4, q5, q10, q1, q2, q3, q6, q7, q9, q11}; call it state B
δ'(A, a) = B
δ'(A, b) = ɛ-closure{ δ(A, b) }
= ɛ-closure{ δ({q0, q1, q2, q3, q5, q6, q7, q9, q10, q11}, b) }
= ɛ-closure{q8}
= {q8, q9, q10, q1, q2, q3, q5, q6, q7, q11}; call it state C
δ'(A, b) = C
δ'(B, a) = ɛ-closure{ δ(B, a) }
= ɛ-closure{ δ({q4, q5, q10, q1, q2, q3, q6, q7, q9, q11}, a) }
= ɛ-closure{q4} = B
δ'(B, a) = B
Similarly, δ'(B, b) = ɛ-closure{q8} = C
δ'(C, a) = B
δ'(C, b) = C
After eliminating ɛ-moves we get the following DFA.
Thus the DFA obtained for (a | b)* and the DFA obtained for (a* | b*)* are the same. Hence (a | b)* = (a* | b*)* is proved.
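The subset construction above can also be cross-checked mechanically. This is only a finite sanity check over short strings (not a proof), using Python's re engine as an assumed stand-in for the two automata:

```python
import re
from itertools import product

# Both regular expressions should accept exactly the strings over {a, b};
# here we compare them on every string up to length 6.
r1 = re.compile(r"(a|b)*")
r2 = re.compile(r"(a*|b*)*")

def agree(up_to=6):
    for n in range(up_to + 1):
        for tup in product("ab", repeat=n):
            s = "".join(tup)
            if bool(r1.fullmatch(s)) != bool(r2.fullmatch(s)):
                return False
    return True

print(agree())  # True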
12. A Construct a predictive parsing table for the given grammar, or check A (16) CO2
whether the given grammar is LL(1) or not, for the sentence id+id*id.
E→E+T|T
T→T*F|F
F → (E) | id
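The table itself is not reproduced in this copy. After eliminating left recursion (the standard prerequisite, giving E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → (E) | id), the well-known predictive parsing table can be sketched as a Python dict; the encoding below is illustrative, not the original answer:

```python
# LL(1) table for the left-recursion-free grammar; epsilon productions
# are empty lists, and missing (nonterminal, terminal) pairs are error
# cells. "$" marks end of input.
TABLE = {
    ("E", "id"): ["T", "E'"],      ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],      ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"], ("T'", "+"): [],
    ("T'", ")"): [],               ("T'", "$"): [],
    ("F", "id"): ["id"],           ("F", "("): ["(", "E", ")"],
}
NONTERMS = {"E", "E'", "T", "T'", "F"}

def parse(tokens):
    stack = ["$", "E"]
    toks = tokens + ["$"]
    i = 0
    while stack:
        top = stack.pop()
        if top == toks[i]:                 # terminal (or $) matches input
            i += 1
        elif top in NONTERMS:
            entry = TABLE.get((top, toks[i]))
            if entry is None:              # error cell
                return False
            stack.extend(reversed(entry))  # push production right-to-left
        else:
            return False                   # terminal mismatch
    return i == len(toks)

print(parse(["id", "+", "id", "*", "id"]))  # True
```

Since no table cell holds more than one production, the transformed grammar is LL(1), and the sentence id+id*id is accepted.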
(OR)
12. B (i) Explain in detail about the Context-free Grammar R (8) CO2
Definition
A context-free grammar G is defined by the 4-tuple G = (V, T, P, S), where
1. V is a finite set of non-terminals (variables).
2. T is a finite set of terminals.
3. P is a finite set of production rules of the form A → α, where A is a non-terminal and α is a string of terminals and/or non-terminals. P is a relation from V to (V ∪ T)*.
4. S is the start symbol (S ∈ V).
Notational Conventions
The following notational conventions for grammars can be used
1. These symbols are terminals:
(a) Lowercase letters early in the alphabet, a, b, c..
(b) Operator symbols such as +,*, etc..
(c) Punctuation symbols such as parentheses, comma, etc...
(d) The digits 0,1,. . .,9.
(e) Boldface strings such as id or if, each of which represents a single terminal symbol.
2. These symbols are non-terminals:
(a) Uppercase letters early in the alphabet, A, B, C..
(b) The letter S is usually the start symbol.
(c) Lowercase, italic names such as expr or stmt.
3. Uppercase letters in the later part of the alphabet,
such as X,Y,Z, represent either non- terminals or
terminals.
4. Lowercase letters in the later part of the alphabet,
primarily u, v, . . . , z, represent strings of terminals.
5. Lowercase Greek letters, α, β, γ, represent strings of grammar symbols; thus a generic production can be written as A → α.
6. A set of productions A → α1, A → α2, …, A → αk with a common non-terminal may be written as A → α1 | α2 | … | αk.
7. Unless stated otherwise, the left side of the first production is
the start symbol.
Derivations
A derivation uses productions to generate a string (a sequence of terminals). Derivation is used to find whether a string belongs to a given grammar. The derivation is formed by repeatedly replacing a non-terminal by the right-hand side of a suitable production rule.
Classification of Derivations
1. Left-most derivation
In leftmost derivation, at each and every step the leftmost
non-terminal is expanded by
substituting its corresponding production to derive a string.
2. Right-most derivation
In rightmost derivation, at each and every step the rightmost
non-terminal is expanded by substituting its corresponding
production to derive a string.
Example:
Consider the context-free grammar (CFG) G = ({S}, {a, b, c}, P, S), where P = {S → SbS | ScS | a}. Derive the string "abaca" by leftmost derivation and rightmost derivation.
Solution:
Leftmost derivation for "abaca"
S ⇒ SbS (using rule S → SbS)
⇒ abS (using rule S → a)
⇒ abScS (using rule S → ScS)
⇒ abacS (using rule S → a)
⇒ abaca (using rule S → a)
Rightmost derivation for "abaca"
S ⇒ ScS (using rule S → ScS)
⇒ Sca (using rule S → a)
⇒ SbSca (using rule S → SbS)
⇒ Sbaca (using rule S → a)
⇒ abaca (using rule S → a)
(ii) Describe about the parse tree with suitable example R (8) CO2
Parse Tree:
A parse tree is the hierarchical representation of terminals and non-terminals.
These symbols (terminals and non-terminals) represent the derivation of the grammar that yields the input string.
In parsing, the string is derived starting from the start symbol.
The start symbol of the grammar must be used as the root of the parse tree.
Leaves of the parse tree represent terminals.
Each interior node represents a non-terminal, i.e., a production of the grammar.
Rules to Draw a Parse Tree:
All leaf nodes need to be terminals.
All interior nodes need to be non-terminals.
In-order traversal gives the original input string.
Example 1: Let us take an example of a grammar (production rules):
S -> sAB
A -> a
B -> b
The input string is "sab"; the parse tree is:
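The parse-tree figure is missing from this copy, but its shape is forced by the grammar: the root S has children s, A, B, with A deriving a and B deriving b. A small sketch, using nested tuples as an illustrative encoding:

```python
# Parse tree for "sab" under S -> sAB, A -> a, B -> b.
# A node is either a terminal (string) or (label, children).
tree = ("S", ["s", ("A", ["a"]), ("B", ["b"])])

def leaves(node):
    """Concatenate the terminal leaves: the yield of the tree."""
    if isinstance(node, str):
        return node
    _, children = node
    return "".join(leaves(c) for c in children)

def render(node, depth=0):
    """Indented textual view of the tree."""
    if isinstance(node, str):
        return "  " * depth + node
    label, children = node
    lines = ["  " * depth + label]
    lines += [render(c, depth + 1) for c in children]
    return "\n".join(lines)

print(render(tree))
print(leaves(tree))  # sab : the yield of the tree is the input string
```

Reading the leaves left to right recovers the input string, which is exactly the in-order-traversal rule stated above.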
(OR)
13. B How could you generate the intermediate code for the flow of control S (16) CO3
statement? Explain with an example
The translation of boolean expressions into three-address code arises in the context of statements such as those generated by the following grammar:
S → if (B) S1
S → if (B) S1 else S2
S → while (B) S1
In these productions, non-terminal B represents a boolean expression and S represents a statement.
Figure 3.30: Syntax-directed definition for flow-of-control statements
We assume that newlabel() creates a new label each time it is called, and that label(L) attaches label L to the next three-address instruction to be generated. (If implemented literally, the semantic rules will generate many labels and may attach more than one label to a three-address instruction.)
In translating S → if (B) S1, the semantic rules in Fig. 3.29 create a new label B.true and attach it to the first three-address instruction generated for the statement S1. Thus, jumps to B.true within the code for B will go to the code for S1. Further, by setting B.false to S.next, we ensure that control will skip the code for S1 if B evaluates to false.
In translating the if-else statement S → if (B) S1 else S2, the code for the boolean expression B has jumps out of it to the first instruction of the code for S1 if B is true, and to the first instruction of the code for S2 if B is false.
We use a local variable begin to hold a new label attached to the first instruction
for this while-statement, which is also the first instruction for B. We use a
variable rather than an attribute, because begin is local to the semantic rules for
this production. The inherited label S.next marks the instruction that control
must flow to if B is false; hence, B.false is set to be S.next.
A new label B.true is attached to the first instruction for S1; the code for B generates a jump to this label if B is true. After the code for S1 we place the instruction goto begin, which causes a jump back to the beginning of the code for the boolean expression.
Note that S1.next is set to the label begin, so jumps from within S1.code can go directly to begin. The code for S → S1 S2 consists of the code for S1 followed by the code for S2. The semantic rules manage the labels; the first instruction after the code for S1 is the beginning of the code for S2, and the instruction after the code for S2 is also the instruction after the code for S.
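The while-statement layout described above can be sketched compactly. Here newlabel() stands in for "new label", and the condition result (cond_var) and the body's three-address code are assumed to be already generated:

```python
# Sketch of the while-statement translation: lay out begin, B.true and
# S.next labels exactly as in the prose above.
count = [0]

def newlabel():
    count[0] += 1
    return f"L{count[0]}"

def gen_while(cond_var, body_code):
    begin = newlabel()            # first instruction of the code for B
    btrue = newlabel()            # B.true: first instruction of S1
    snext = newlabel()            # S.next: where control goes if B is false
    code = [f"{begin}:",
            f"if {cond_var} goto {btrue}",
            f"goto {snext}",      # B.false = S.next
            f"{btrue}:"]
    code.extend(body_code)
    code.append(f"goto {begin}")  # jump back to the test
    code.append(f"{snext}:")
    return code

tac = gen_while("t1", ["i := i + 1"])
print("\n".join(tac))
```

The goto begin at the end of the body is the jump back to the beginning of the code for the boolean expression, as described above.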
14. A Explain in detail about the different storage allocation techniques. R (16) CO4
From the perspective of the compiler writer, the executing target program runs in its own logical address space in which each program value has a location. The management and organization of this logical address space is shared between the compiler, operating system, and target machine. The operating system maps the logical addresses into physical addresses, which are usually spread throughout memory. Run-time storage comes in blocks, where a byte is the smallest unit of addressable memory and four consecutive bytes form a machine word. A multibyte object is stored in consecutive bytes and addressed by its first byte. Run-time storage can be subdivided to hold the different components of an executing program, shown in Fig. 4.1:
Generated executable code
Static data objects
Dynamic data-object- heap
Automatic data objects- stack
Static allocation
Static allocation is a procedure used to allocate all data objects at compile time. Static allocation is possible only when the compiler knows the size of each data object at compile time. In this type of allocation, data objects cannot be created at run time under any circumstances. The compiler decides the amount of storage for each data object and binds the name of the data object to the allocated storage; that is, names are bound to storage locations. If memory is created at compile time, then it is created in the static area and only once.
Advantages
It is easy to implement.
It allows type checking during compilation.
There is no risk of running out of memory at run time.
Disadvantages
It is incompatible with recursive subprograms.
It is not possible to use variables whose size has to be determined at run time, since static allocation requires the size of every data object to be known at compile time.
Stack allocation
a. Stack allocation is a strategy in which the storage is organized as a stack, also called the control stack.
b. As an activation begins, its activation record is pushed onto the stack, and on completion of the activation the corresponding record is popped.
c. The locals are stored in each activation record, so locals are bound to the corresponding activation record on each fresh activation.
d. Data structures can be created dynamically with stack allocation.
Limitations of stack allocation
The memory addressing can be done using pointers and index registers. 1
Hence this type of allocation is slower than static allocation.
Heap Allocation
a. If the values of non-local variables must be retained even after the activation record is popped, such retention is not possible with stack allocation because of its Last-In First-Out (LIFO) nature. To retain such variables, the heap allocation strategy is used.
b. Heap allocation allocates a contiguous block of memory when required for storage of activation records or other data objects. This allocated memory can be deallocated when the activation ends, and the deallocated (free) space can be reused by the heap manager.
c. Efficient heap management can be done by:
i) Creating a linked list of free blocks; when any memory is deallocated, that block of memory is appended to the linked list.
ii) Allocating the most suitable block of memory from the linked list, i.e., using the best-fit technique for allocation of blocks.
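The free-list and best-fit ideas in (i) and (ii) can be sketched minimally. Free blocks are kept as (address, size) pairs; allocation picks the smallest block that fits and returns any leftover to the list (a simplified illustration, ignoring coalescing of adjacent free blocks):

```python
# Hedged sketch of a heap manager's free list with best-fit allocation.
class FreeList:
    def __init__(self, blocks):
        self.blocks = list(blocks)             # [(addr, size), ...]

    def allocate(self, size):
        fits = [b for b in self.blocks if b[1] >= size]
        if not fits:
            return None                        # out of memory
        best = min(fits, key=lambda b: b[1])   # best fit: smallest that fits
        self.blocks.remove(best)
        addr, bsize = best
        if bsize > size:                       # return the remainder
            self.blocks.append((addr + size, bsize - size))
        return addr

    def deallocate(self, addr, size):
        self.blocks.append((addr, size))       # append freed block to list

heap = FreeList([(0, 32), (100, 8)])
a = heap.allocate(8)    # best fit: the 8-byte block at 100
b = heap.allocate(16)   # carved out of the 32-byte block at 0
print(a, b)             # 100 0
```

Best fit leaves the large block free for larger later requests, which is why it is preferred over first fit here.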
(OR)
14. B Explain the use of symbol table in compilation process. List out the R (16) CO4
various attributes for implementing the symbol table.
Symbol Table:
The symbol table is an important data structure created and maintained by the compiler in order to keep track of the semantics of variables: it stores information about the scope and binding of names, and about instances of various entities such as variable and function names, classes, and objects. It is built during the lexical and syntax analysis phases. The information is collected by the analysis phases of the compiler and used by the synthesis phases to generate code. It is used by the compiler to achieve compile-time efficiency.
A symbol table may serve the following purposes depending upon the language in hand:
To store the names of all entities in a structured form at one place.
To verify if a variable has been declared.
To implement type checking, by verifying assignments and
expressions in the source code are semantically correct.
To determine the scope of a name (scope resolution).
The symbol table is used by various phases of the compiler as follows:
• Lexical Analysis: Creates new table entries in the table, for
example like entries about tokens
• Syntax Analysis: Adds information regarding attribute
type, scope, dimension, line of reference, use, etc in the
table.
• Semantic Analysis: Uses available information in the table
to check for semantics i.e. to verify that expressions and
assignments are semantically correct(type checking) and
update it accordingly.
• Intermediate Code generation: Refers to the symbol table to know how much and what type of run-time storage is allocated; the table also helps in adding temporary-variable information.
• Code Optimization: Uses information present in the
symbol table for machine-dependent optimization.
• Target Code generation: Generates code by using address
information of identifier present in the table.
Symbol Table entries
Each entry in the symbol table is associated with attributes that support the compiler in different phases.
Items stored in Symbol table:
Variable names and constants
Procedure and function names
Literal constants and strings
Compiler generated temporaries
Labels in source languages
Information used by the compiler from Symbol table:
Data type and name
Declaring procedures
Offset in storage
If structure or record then, a pointer to structure table.
For parameters, whether parameter passing by value or by
reference
Number and type of arguments passed to function
Base Address
A symbol table can either be linear or a hash table. Using the following
format, it maintains the entry for each name.
<symbol name, type, attribute>
For example, suppose the table stores information about the following variable declaration: static int salary
Then it stores an entry in the following format:
<salary, int, static>
Operations of Symbol table
The core operations of a symbol table are allocate, free, insert, lookup, set attribute, and get attribute. The allocate operation creates a new empty symbol table. The free operation removes all records and frees the storage of a symbol table. As the name implies, the insert operation puts a name into the symbol table and returns a pointer to its entry. The lookup operation looks up a name and returns a reference to the corresponding entry. The set- and get-attribute operations associate an attribute with a given entry and retrieve an attribute from a given entry. Other operations may be introduced depending on the requirements; for example, a delete operation deletes a previously entered name.
The basic operations defined on a symbol table include:
Operation Function
allocate To allocate a new empty symbol table
free To remove all entries and free storage of symbol table
lookup To search for a name and return a pointer to its entry
insert To insert a name in a symbol table and return a pointer to its
entry
set_attribute To associate an attribute with a given entry
get_attribute To get an attribute associated with a given entry
insert()
The insert() operation is used more frequently in the analysis phase, when the tokens are identified and names are stored in the table.
It is used to insert information into the symbol table, such as each unique name occurring in the source code.
In the source code, the attributes of a symbol are the information associated with that symbol: its state, value, type and scope.
The insert() function takes the symbol and its value as arguments. For example, int x;
should be processed by the compiler as:
insert(x, int)
lookup()
In the symbol table, the lookup() operation is used to search for a name. It is used to determine:
The existence of the symbol in the table.
The declaration of the symbol before it is used.
Whether the name is used in the current scope.
The initialization of the symbol.
Whether the name is declared multiple times.
The basic format of lookup() function is as follows:
lookup (symbol)
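The operations above can be sketched as a minimal hash-based symbol table; the attribute names used here are illustrative:

```python
# Minimal hash-based symbol table with the operations listed above.
class SymbolTable:
    def __init__(self):                  # allocate: a new empty table
        self.entries = {}

    def insert(self, name, **attrs):     # insert a name, return its entry
        self.entries[name] = dict(attrs)
        return self.entries[name]

    def lookup(self, name):              # search a name; None if absent
        return self.entries.get(name)

    def set_attribute(self, name, key, value):
        self.entries[name][key] = value

    def get_attribute(self, name, key):
        return self.entries[name].get(key)

    def free(self):                      # remove all entries
        self.entries.clear()

table = SymbolTable()
table.insert("salary", type="int", storage="static")   # <salary, int, static>
table.set_attribute("salary", "offset", 0)
print(table.lookup("salary"))
```

The entry mirrors the <salary, int, static> format from the example above, with an offset attribute added later, as a real compiler would during storage allocation.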
In a flow graph, a node d dominates a node n if every path from the initial node of the flow graph to n goes through d. This is denoted d dom n. The initial node dominates all the remaining nodes in the flow graph, and the entry of a loop dominates all nodes in the loop. Likewise, every node dominates itself.
Example:
In the flow graph below,
The initial node, node 1, dominates every node.
Node 2 dominates only itself.
Node 3 dominates all but 1 and 2.
Node 4 dominates all but 1, 2 and 3.
Nodes 5 and 6 dominate only themselves, since the flow of control can skip around either by going through the other.
Node 7 dominates 7, 8, 9 and 10.
Node 8 dominates 8, 9 and 10.
Nodes 9 and 10 dominate only themselves.
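The flow-graph figure is missing from this copy; the facts listed match the standard textbook example, reconstructed here as an assumption. Dominators can then be computed by the usual iterative set equations, D(n0) = {n0} and D(n) = {n} ∪ ⋂ over predecessors p of D(p):

```python
# Assumed reconstruction of the flow graph behind the dominance facts
# above (node 1 is the initial node), plus the iterative dominator
# computation.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 3), (4, 5), (4, 6),
         (5, 7), (6, 7), (7, 4), (7, 8), (8, 3), (8, 9), (8, 10),
         (9, 1), (10, 7)]
nodes = range(1, 11)
preds = {n: [u for u, v in edges if v == n] for n in nodes}

dom = {n: set(nodes) for n in nodes}     # start from "everything dominates"
dom[1] = {1}
changed = True
while changed:                           # iterate to a fixed point
    changed = False
    for n in nodes:
        if n == 1:
            continue
        new = {n} | set.intersection(*(dom[p] for p in preds[n]))
        if new != dom[n]:
            dom[n], changed = new, True

print(sorted(dom[7]))   # [1, 3, 4, 7]
print(sorted(dom[8]))   # [1, 3, 4, 7, 8]
```

dom[n] lists the dominators of n; reading the sets the other way round reproduces the facts above, e.g. node 7 dominates exactly 7, 8, 9 and 10.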
15. B (ii) Explain the concept of data flow analysis in flow graphs with U (8) CO5
suitable example
The data flow analysis equation is used to collect information about a program block. The following is the data flow analysis equation for a statement s:
Out[s] = gen[s] ∪ (In[s] − Kill[s])
where
Out[s] is the information at the end of the statement s.
gen[s] is the information generated by the statement s.
In[s] is the information at the beginning of the statement s.
Kill[s] is the information killed or removed by the statement s.
The main aim of data flow analysis is to find a set of constraints on the In[s]'s and Out[s]'s for the statements s. The constraints are of two types: transfer functions and control-flow constraints.
Transfer Function
The semantics of a statement constrain the data flow values before and after it.
For example, consider the two statements x = y and z = x, executed in order. After execution we can say that both x and z have the same value, namely y.
Thus, a transfer function depicts the relationship between the data flow values before and after a statement.
There are two types of transfer functions:
1. Forward propagation
2. Backward propagation
• Forward propagation
In forward propagation, the transfer function for a statement s is represented by Fs.
This transfer function accepts the data flow value before the statement and outputs the new data flow value after the statement.
Thus the new data flow value after the statement is Out[s] = Fs(In[s]).
• Backward propagation
Backward propagation is the converse of forward propagation.
The data flow value after the statement is converted by the transfer function to a new data flow value before the statement.
Thus the new data flow value before the statement is In[s] = Fs(Out[s]).
Control-Flow Constraints
The second set of constraints comes from the flow of control. If block B contains statements S1, S2, …, Sn, then the data flow value out of Si equals the value into Si+1:
IN[Si+1] = OUT[Si], for all i = 1, 2, …, n − 1.
Data Flow Properties
Some properties of data flow analysis are:
• Available expression
• Reaching definition
• Live variable
• Busy expression
We will discuss these properties one by one.
Available Expression
An expression a + b is said to be available at a program point x if none of its operands is modified before its use along any path reaching x. It is used to eliminate common subexpressions. An expression is available at its evaluation point.
Live Variable
A variable is live at a program point if its value is used along some path starting at that point; otherwise it is dead.
Example:
In the flow-graph example referred to above, the variable a is live in blocks B1, B2, B3 and B4, but is killed in block B5 since its value is changed from 2 to b + c. Similarly, variable b is live in block B3 but is killed in block B4.
Busy Expression
An expression is said to be busy along a path if its evaluation occurs along that path, and none of the definitions of its operands appears before that evaluation. This property is used for performing code-movement optimization.
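A small worked instance ties the equations and the liveness property together. Liveness runs backward, so the form used is In[s] = use[s] ∪ (Out[s] − def[s]), the backward counterpart of the Out[s] equation given earlier; the statements and variable names are illustrative:

```python
# Backward liveness over a straight-line block:
#   s1: a = b + c     use {b, c}   def {a}
#   s2: d = a + 1     use {a}      def {d}
#   s3: e = d + b     use {d, b}   def {e}
stmts = [("a", {"b", "c"}), ("d", {"a"}), ("e", {"d", "b"})]

live = set()                            # nothing is live after the block
for target, uses in reversed(stmts):
    # In[s] = use[s] U (Out[s] - def[s])
    live = (live - {target}) | uses
    print(target, sorted(live))

print(sorted(live))   # ['b', 'c'] are live on entry to the block
```

Each iteration removes the defined variable (the kill) and adds the used ones (the gen), mirroring the gen/kill structure of the forward equation above.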