CD
2 marks
a) Regular Expression (RE) for strings where the leftmost symbol differs from the
rightmost symbol over {a, b}
We want strings such that:
First character ≠ last character.
Over alphabet Σ = {a, b}
Possible cases:
Starts with a and ends with b
Starts with b and ends with a
Regular Expression:
a(a|b)*b | b(a|b)*a
This expression covers all strings:
That start with a and end with b
Or start with b and end with a
With any number of as and bs in between.
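A quick sanity check on the RE: the language is exactly the set of strings over {a, b} whose first and last characters differ. A minimal C sketch (illustrative, not part of the standard answer) that tests membership directly:
#include <stdio.h>
#include <string.h>

/* Returns 1 iff s is over {a,b} and its first and last symbols differ,
   i.e. iff s matches a(a|b)*b | b(a|b)*a. */
int in_language(const char *s) {
    size_t n = strlen(s);
    if (n < 2) return 0;                 /* need two distinct positions */
    for (size_t i = 0; i < n; i++)
        if (s[i] != 'a' && s[i] != 'b') return 0;
    return s[0] != s[n - 1];
}

int main(void) {
    printf("%d %d %d\n", in_language("ab"), in_language("babba"), in_language("aa"));
    return 0;   /* prints: 1 1 0 */
}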
b) Why is buffering used in lexical analysis? What are the commonly used buffering
methods?
Why buffering is used:
Lexical analysis involves reading characters from the source file.
To improve efficiency, characters are read in blocks instead of one-by-one.
Buffering reduces the number of I/O operations, which are expensive.
Common buffering methods:
Single Buffering:
One buffer holds a block of input.
Not efficient for backtracking, as you may need to reread input.
Double Buffering:
Uses two halves of a buffer.
When one half is exhausted, the other is loaded while scanning continues.
Reduces latency and supports lookahead and backtracking.
Sentinel Method (used with double buffering):
Each half of buffer ends with a sentinel (usually EOF).
Simplifies end-of-buffer detection without needing explicit checks.
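A minimal C sketch of the double-buffer-with-sentinel scheme, assuming a '\0' sentinel and an illustrative file name; real scanners use the EOF character as the sentinel and also maintain a lexemeBegin pointer alongside forward:
#include <stdio.h>

#define N 4096            /* size of each buffer half (illustrative) */
#define SENTINEL '\0'     /* stands in for eof; assumes no NUL bytes in input */

static char buf[2 * N + 2];   /* two halves; buf[N] and buf[2N+1] hold sentinels */
static char *forward;         /* scanning pointer */
static FILE *src;

static void load(char *half) {            /* refill one half, add the sentinel */
    size_t got = fread(half, 1, N, src);
    half[got] = SENTINEL;
}

static int next_char(void) {
    int c = (unsigned char)*forward++;
    if (c == SENTINEL) {
        if (forward == buf + N + 1) {            /* end of first half: reload second */
            load(buf + N + 1);
            c = (unsigned char)*forward++;
        } else if (forward == buf + 2 * N + 2) { /* end of second half: wrap around */
            forward = buf;
            load(buf);
            c = (unsigned char)*forward++;
        } else {
            return EOF;                          /* sentinel mid-half: real end of input */
        }
        if (c == SENTINEL) return EOF;           /* freshly loaded half was empty */
    }
    return c;
}

int main(void) {
    src = fopen("input.txt", "r");   /* hypothetical input file */
    if (!src) return 1;
    forward = buf;
    load(buf);
    for (int c; (c = next_char()) != EOF; )
        putchar(c);                  /* one comparison per character fetched */
    fclose(src);
    return 0;
}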
c) What is 'Handle Pruning' in bottom-up parsing?
Definition:
Handle pruning is the reduction process in bottom-up parsing where:
A handle (a substring matching the right-hand side of a production, whose reduction corresponds to one step of a rightmost derivation in reverse) of the sentential form is identified.
It is replaced with its corresponding non-terminal (reverse of production).
Example:
If we have production: E → E + T, and stack contains E + T, then:
E + T is a handle.
Replacing it with E is handle pruning.
This is the essence of Shift-Reduce Parsing.
d) Conflicts in Shift-Reduce Parser
Shift-Reduce Conflict:
Parser is unsure whether to shift the next symbol or reduce the stack content.
Common in ambiguous grammars.
Example:
S → if E then S else S | if E then S
While parsing if E then S else S, after seeing if E then S the parser cannot tell whether to shift else or to reduce.
Reduce-Reduce Conflict:
Two different productions could be reduced from the current input.
Parser cannot decide which one to apply.
Example for Reduce-Reduce:
Given:
A → α
B → α
If α is on the stack, the parser is confused whether to reduce it to A or B.
e) What is Three Address Code (TAC)? Mention the various representations.
Definition:
Three Address Code is an intermediate representation (IR) used in compilers.
Each instruction has:
At most three operands
Looks like: x = y op z
Representations of TAC:
Quadruples:
(operator, arg1, arg2, result)
Example: x = y + z → (+, y, z, x)
Triples:
(operator, arg1, arg2)
Result is implied by the index of the instruction.
Indirect Triples:
Similar to triples but uses a pointer table to the list of triples.
Facilitates easy code reordering.
Static Single Assignment (SSA):
Each variable is assigned only once.
Introduces φ-functions for merging control paths.
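A minimal C sketch of the quadruple form from the list above (the field names are illustrative, not a fixed compiler API):
#include <stdio.h>

/* A quadruple: (operator, arg1, arg2, result). */
typedef struct {
    char op;                           /* '+', '*', '=' ... */
    const char *arg1, *arg2, *result;
} Quad;

int main(void) {
    /* x = y + z as a single quadruple */
    Quad q = { '+', "y", "z", "x" };
    printf("(%c, %s, %s, %s)\n", q.op, q.arg1, q.arg2, q.result);
    return 0;
}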
f) What is type checking? When is it done?
Definition:
Type checking is the process of ensuring semantic correctness of a program:
Operands in an expression must be of compatible types.
Example: You cannot add a string to an integer.
When is type checking done?
Static Type Checking:
Done at compile-time.
Most common in statically-typed languages like C, Java.
Dynamic Type Checking:
Done at run-time.
Used in dynamically-typed languages like Python, JavaScript.
g) Machine Independent Code Optimization Techniques
These are optimizations applied before target machine code generation.
Common techniques:
Constant Folding:
Evaluate constant expressions at compile time.
Example: x = 2 + 3 → x = 5
Constant Propagation:
Replace variables known to have constant values.
Example: If x = 5, then a later y = x + 1 becomes y = 5 + 1, which folds to y = 6.
Dead Code Elimination:
Remove code that has no effect on program output.
Example: Unused assignments.
Common Subexpression Elimination (CSE):
Avoid recomputing expressions whose results are already known.
Example: Reuse result of a + b if already computed.
Loop Invariant Code Motion:
Move calculations that do not change inside loops to outside.
Strength Reduction:
Replace expensive operations with cheaper ones.
Example: x = i * 2 → x = i + i
Copy Propagation:
Replace variables that are simply copies of others.
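As an illustration of the first technique above, a minimal C sketch of constant folding over a toy expression node (the node layout is an assumption of this example):
#include <stdio.h>

/* A tiny expression node; op == 0 marks a constant leaf. */
typedef struct Expr {
    char op;                  /* 0 for a constant leaf, else '+', '-', '*' */
    int value;                /* meaningful only for leaves */
    struct Expr *lhs, *rhs;
} Expr;

/* Fold a binary node whose operands are both constants into a leaf. */
void fold(Expr *e) {
    if (e->op && e->lhs->op == 0 && e->rhs->op == 0) {
        switch (e->op) {
        case '+': e->value = e->lhs->value + e->rhs->value; break;
        case '-': e->value = e->lhs->value - e->rhs->value; break;
        case '*': e->value = e->lhs->value * e->rhs->value; break;
        }
        e->op = 0;            /* the node is now a constant leaf */
    }
}

int main(void) {
    Expr two = {0, 2}, three = {0, 3};
    Expr sum = {'+', 0, &two, &three};
    fold(&sum);               /* x = 2 + 3 becomes x = 5 */
    printf("%d\n", sum.value);
    return 0;
}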
h) What is a DAG? Applications in Code Optimization
DAG: Directed Acyclic Graph
A DAG is a graph with:
Directed edges
No cycle
In compiler design, DAG is used to represent basic blocks.
Applications of DAG:
Common Subexpression Elimination
Subexpressions with the same operands and operators share a node.
Dead Code Elimination
Nodes with no effect or usage can be pruned.
Code Generation
Generate optimized instruction sequences from DAG.
Expression Simplification
Identify algebraic identities or redundancies.
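A minimal C sketch of DAG construction with node reuse (value numbering): before creating a node we look for an identical existing one, which is exactly how the DAG exposes common subexpressions. Names and sizes are illustrative:
#include <stdio.h>
#include <string.h>

typedef struct { char op[8]; int left, right; } Node;   /* -1 = no child */
static Node nodes[64];
static int nnodes = 0;

static int find_or_add(const char *op, int l, int r) {
    for (int i = 0; i < nnodes; i++)
        if (!strcmp(nodes[i].op, op) && nodes[i].left == l && nodes[i].right == r)
            return i;                       /* common subexpression: reuse node */
    strcpy(nodes[nnodes].op, op);
    nodes[nnodes].left = l;
    nodes[nnodes].right = r;
    return nnodes++;
}

static int leaf(const char *name) { return find_or_add(name, -1, -1); }

int main(void) {
    /* a + b built twice maps to the same node */
    int n1 = find_or_add("+", leaf("a"), leaf("b"));
    int n2 = find_or_add("+", leaf("a"), leaf("b"));
    printf("same node: %s\n", n1 == n2 ? "yes" : "no");   /* yes */
    return 0;
}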
i) Evaluate E.value for the expression: 2#3&5#6&4 using given grammar
Grammar & Semantic Rules:
E → E1 # T { E.value = E1.value * T.value }
|T { E.value = T.value }
T → T1 & F { T.value = T1.value + F.value }
|F { T.value = F.value }
F → num { F.value = num.value }
Now compute step-by-step:
Expression: 2 # 3 & 5 # 6 & 4
Group by precedence: & (the T level) binds tighter than # (the E level), and the left-recursive rule E → E1 # T makes # left associative.
So group as:
((2 # (3 & 5)) # (6 & 4))
Compute:
3 & 5 = 3 + 5 = 8
6 & 4 = 6 + 4 = 10
2 # 8 = 2 * 8 = 16
16 # 10 = 16 * 10 = 160
Answer: E.value = 160
j) Various Data Structures for Symbol Table
Symbol table stores info about identifiers (variables, functions, etc.).
Data Structures used:
Linear List:
Simple list/array
Slow lookup: O(n)
Hash Table:
Most commonly used.
Fast average lookup time: O(1)
Binary Search Tree (BST):
Balanced trees like AVL, Red-Black Tree.
Lookup time: O(log n)
Trie (Prefix Tree):
Efficient for strings.
Used in compilers for fast prefix lookup.
Self-Organizing List:
Frequently used items are moved to the front.
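A minimal C sketch of the hash-table variant, using chaining for collisions (table size and hash function are illustrative choices):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUCKETS 211           /* a common table size; the choice is illustrative */

typedef struct Symbol {
    char *name;
    char *type;
    struct Symbol *next;      /* chaining resolves collisions */
} Symbol;

static Symbol *table[BUCKETS];

static unsigned hash(const char *s) {      /* simple string hash */
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % BUCKETS;
}

Symbol *lookup(const char *name) {
    for (Symbol *p = table[hash(name)]; p; p = p->next)
        if (strcmp(p->name, name) == 0) return p;
    return NULL;
}

Symbol *insert(const char *name, const char *type) {
    unsigned h = hash(name);
    Symbol *p = malloc(sizeof *p);
    p->name = strdup(name);
    p->type = strdup(type);
    p->next = table[h];       /* O(1) average insert and lookup */
    table[h] = p;
    return p;
}

int main(void) {
    insert("count", "int");
    Symbol *s = lookup("count");
    printf("%s : %s\n", s->name, s->type);
    return 0;
}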
a) Differentiate DFA and NFA
Feature | DFA (Deterministic Finite Automaton) | NFA (Nondeterministic Finite Automaton)
Transition | One transition per input symbol from a state | Can have multiple transitions for the same input symbol
Epsilon (ε) transitions | Not allowed | Allowed (can move without an input symbol)
Determinism | Fully deterministic (one path) | May have multiple paths or none
State transition | Exactly one transition | Zero, one, or more transitions
Implementation | Easier and more efficient | Simpler to construct but harder to simulate
Equivalence | Equivalent to NFA | Equivalent to DFA (can be converted)
b) Mention the Job of Lexical Analysis
Lexical analysis is the first phase of a compiler.
Jobs:
Tokenization: Convert input characters into tokens (like keywords, identifiers,
numbers).
Remove whitespace and comments
Recognize lexemes
Detect lexical errors (invalid tokens)
Pass tokens to the syntax analyzer
Communicate with the symbol table
Buffer input for efficiency
c) Explain the Specifications of LEX Programming
LEX is a lexical analyzer generator used to create scanners in C.
Specifications of a LEX Program:
A LEX file has three sections separated by %%:
1. Definition Section
2. Rules Section
3. User Code Section
Example:
%{
#include <stdio.h>
%}
%%
[0-9]+ { printf("NUMBER\n"); }
[a-zA-Z]+ { printf("IDENTIFIER\n"); }
%%
int main() {
yylex();
return 0;
}
Definition Section: Includes headers or global declarations.
Rules Section: Regex + corresponding action in C code.
User Code Section: Main function, calling yylex().
d) What Do You Mean by LL(1)?
LL(1) is a type of top-down parser.
First L: Scans input from Left to right.
Second L: Constructs Leftmost derivation.
1: Uses 1 lookahead symbol
Key Points:
Simple and fast parsing.
Requires grammar to be non-left-recursive and factored.
Predictive parsing table used.
e) What is Handle Pruning?
Handle pruning is part of bottom-up parsing.
Definition:
A handle is a substring that matches the right-hand side of a production.
Pruning replaces this handle with the corresponding non-terminal.
Repeated until the start symbol is produced.
Example:
For E → E + T, if stack has E + T, this is a handle, and it's replaced with E.
f) What is Backpatching?
Backpatching is a technique in intermediate code generation for managing jump targets
that are not known initially.
Why it's used:
For control flow statements like if, while, where jump addresses are determined later.
Example:
if (a < b)
x = x + 1;
While generating code for the if, the jump target isn't known.
Backpatching fills the jump target after parsing the block.
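A minimal C sketch of the idea, assuming a quad array indexed by instruction number; full backpatching keeps whole lists of unpatched jumps (truelist/falselist), while this sketch patches a single jump:
#include <stdio.h>

/* Emitted quads, reduced to just their jump targets; -1 means "not yet patched". */
static int target[100];
static int nextquad = 0;

static int emit_jump(void) {        /* emit "goto _" and return its index */
    target[nextquad] = -1;
    return nextquad++;
}

static void backpatch(int quad, int label) {
    target[quad] = label;           /* fill in the target once it is known */
}

int main(void) {
    /* translating: if (a < b) x = x + 1; */
    int jfalse = emit_jump();        /* jump past the body when the test fails */
    nextquad += 1;                   /* stand-in for the quad of "x = x + 1" */
    backpatch(jfalse, nextquad);     /* the end of the if is now known */
    printf("quad %d jumps to quad %d\n", jfalse, target[jfalse]);
    return 0;
}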
g) Explain Loop Invariant with an Example
A loop invariant is a piece of code inside a loop that yields the same result on every
iteration.
Purpose:
Move such code outside the loop for optimization.
Example:
for (int i = 0; i < n; i++) {
y = x + 2;
a[i] = y * i;
}
y = x + 2 is loop invariant.
Move it outside:
y = x + 2;
for (int i = 0; i < n; i++) {
a[i] = y * i;
}
h) What is S-Attributed Grammar?
An S-attributed grammar is a syntax-directed definition where:
All attributes are synthesized.
Attributes are computed from child nodes only
Used in:
Bottom-up parsing
Easy to evaluate in post-order traversal.
i) Differentiate Static and Dynamic Storage Allocation
Feature Static Allocation Dynamic Allocation
When allocated Compile-time Run-time
Flexibility Inflexible Flexible
Memory usage Fixed size Varies as needed
Examples Global/static variables Heap memory, dynamic arrays
Efficiency Fast access Slower due to allocation
6 marks
a) Define Regular Expression. Explain the properties of Regular Expressions.
Construct an FA equivalent to the regular expression (0+1)(00+11)(0+1).
Regular Expression (RE):
A regular expression is a formal notation used to describe patterns in strings over a given
alphabet. It defines a set of strings that belong to a particular language and is instrumental in
lexical analysis for pattern matching.
Properties of Regular Expressions:
Union ( + ): Represents the choice between expressions. For example, a + b denotes
either 'a' or 'b'.
Concatenation: Sequential arrangement of expressions. ab denotes 'a' followed by 'b'.
Kleene Star ( * ): Denotes zero or more repetitions of the preceding expression. a*
denotes '', 'a', 'aa', 'aaa', etc.
Precedence: Kleene Star has the highest precedence, followed by concatenation, and
then union.
Parentheses: Used to group expressions and override default precedence.
Constructing Finite Automaton (FA) for (0+1)(00+11)(0+1):
To construct an FA for the given regular expression, we can follow these steps:
Breakdown the Expression:
First part: (0+1) – Accepts either '0' or '1'.
Second part: (00+11) – Accepts either '00' or '11'.
Third part: (0+1) – Accepts either '0' or '1'.
Construct FA for Each Part:
(0+1): A simple FA with a start state transitioning to an accept state on input '0'
or '1'.
(00+11): An FA that accepts either '00' or '11'. This requires branching paths:
One path for '00': start → '0' → '0' → accept.
Another path for '11': start → '1' → '1' → accept.
(0+1): Similar to the first part.
Combine the FAs:
Concatenate the three FAs by connecting the accept state of the first to the start
state of the second, and similarly for the second to the third.
Ensure that the transitions are properly labeled and that the final accept state is
clearly defined.
The resulting FA will accept strings where the first and last symbols are either '0' or '1', and
the middle part is either '00' or '11'. Examples of accepted strings include '0000', '0110',
'1001', and '1111'.
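A minimal C sketch of the resulting automaton as a transition table (the state numbering is an assumption of this example; state 6 is a dead state):
#include <stdio.h>

/* DFA for (0+1)(00+11)(0+1): states 0..5, state 6 is dead.
   delta[state][symbol] with symbol 0 or 1. */
static const int delta[7][2] = {
    {1, 1},   /* 0: read the first symbol */
    {2, 3},   /* 1: branch on '00' vs '11' */
    {4, 6},   /* 2: saw '0', need another '0' */
    {6, 4},   /* 3: saw '1', need another '1' */
    {5, 5},   /* 4: middle done, read the last symbol */
    {6, 6},   /* 5: accepting; any further input dies */
    {6, 6},   /* 6: dead */
};

int accepts(const char *w) {
    int state = 0;
    for (; *w; w++) {
        if (*w != '0' && *w != '1') return 0;
        state = delta[state][*w - '0'];
    }
    return state == 5;
}

int main(void) {
    printf("%d %d %d %d\n",
           accepts("0000"), accepts("0110"), accepts("1001"), accepts("010"));
    return 0;   /* prints: 1 1 1 0 */
}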
b) Discuss the issues associated with grammars in top-down parsing.
Top-down parsing constructs a parse tree from the start symbol and proceeds by expanding
productions. However, certain grammar structures pose challenges:
Left Recursion:
Grammars with left-recursive rules (e.g., A → Aα) cause infinite recursion in
top-down parsers.
Such grammars need to be transformed to eliminate left recursion.
Ambiguity:
Ambiguous grammars have multiple parse trees for the same string, making it
difficult for parsers to decide which production to use.
Ambiguity must be resolved for deterministic parsing.
Backtracking:
Without predictive capabilities, parsers may need to backtrack to try alternative
productions, leading to inefficiency.
Left Factoring:
When multiple productions for a non-terminal share a common prefix, the
parser cannot decide which production to use based on the next input symbol.
Left factoring rewrites the grammar to defer the decision until enough input is
read.
Non-Determinism:
Grammars that require lookahead of more than one symbol to make parsing
decisions are non-deterministic and complicate top-down parsing.
Addressing these issues often involves transforming the grammar to a suitable form for top-
down parsing, such as eliminating left recursion and performing left factoring.
c) Construct the CLR parser for the following grammar:
S → (L) | a
L → L,S | S
Steps to Construct a CLR (Canonical LR) Parser:
Augment the Grammar:
Add a new start symbol: S' → S.
Compute the LR(1) Items:
Generate the collection of LR(1) items, which are sets of items with lookahead
symbols.
Each item is of the form [A → α·β, a], where '·' indicates the position in the
production, and 'a' is the lookahead symbol.
Construct the Canonical Collection of Sets of LR(1) Items:
Begin with the closure of the initial item [S' → ·S, $].
Use the GOTO function to compute transitions between item sets.
Build the Parsing Table:
For each state (item set), determine the ACTION and GOTO entries based on
the items:
If [A → α·aβ, b] is in the state, and 'a' is a terminal, then ACTION[state, a]
= shift to the state corresponding to GOTO(state, a).
If [A → α·, a] is in the state, then ACTION[state, a] = reduce by A → α.
If [S' → S·, $] is in the state, then ACTION[state, $] = accept.
GOTO[state, A] is defined for non-terminals A.
Due to the complexity and length of the parsing table, it's advisable to refer to detailed
examples or use parser generation tools for practical implementation.
d) Consider the following grammar:
S → AS | b
A → a
Construct the SLR parse table for the grammar.
Steps:
Augment the Grammar:
Add S' → S.
Compute the LR(0) Items:
Generate the canonical collection of LR(0) items.
Compute the FOLLOW Sets:
FOLLOW(S) = { $ }
FOLLOW(A) = { a, b } (A is always followed by S, and FIRST(S) = { a, b })
Construct the Parsing Table:
For each state, determine the ACTION and GOTO entries based on the items
and FOLLOW sets.
Use the standard SLR parsing table construction method, where reductions are
applied based on FOLLOW sets.
The resulting SLR parsing table will guide the parser in making shift and reduce decisions
based on the current state and input symbol.
e) Draw the annotated parse tree for:
i) int a, b, c
ii) float w, x, y, z
Given Syntax-Directed Definitions (SDD):
D → T L with semantic rule: L.inh = T.type
T → int with semantic rule: T.type = integer
T → float with semantic rule: T.type = float
L → L1 , id with semantic rule: L1.inh = L.inh; addType(id.entry, L.inh)
L → id with semantic rule: addType(id.entry, L.inh)
Annotated Parse Tree for int a, b, c:
Parse T → int, so T.type = integer.
Apply D → T L, setting L.inh = T.type = integer.
Parse L → L1 , id:
First, L1 → L2 , id:
L2 → id:
Apply addType(a.entry, integer).
Apply addType(b.entry, integer).
Apply addType(c.entry, integer).
Annotated Parse Tree for float w, x, y, z:
Parse T → float, so T.type = float.
Apply D → T L, setting L.inh = T.type = float.
Parse L → L1 , id:
Continue recursively for each identifier:
Apply addType(w.entry, float).
Apply addType(x.entry, float).
Apply addType(y.entry, float).
Apply addType(z.entry, float).
Each addType function associates the identifier with its type in the symbol table.
f) Explain:
(i) Common Subexpression Elimination (CSE):
CSE identifies expressions that are computed multiple times with the same operands
and eliminates redundant computations by computing the expression once and reusing
the result.
Example:
a = b + c;
d = b + c;
Optimized:
t = b + c;
a = t;
d = t;
(ii) Code Motion:
Code motion moves computations outside loops when the result does not change
across iterations, reducing redundant calculations.
Example:
for (i = 0; i < n; i++) {
y = a + b;
z[i] = y * i;
}
Optimized:
y = a + b;
for (i = 0; i < n; i++) {
z[i] = y * i;
}
These optimizations enhance performance by reducing unnecessary computations.
j) Explain the characteristics of peephole optimization.
Peephole optimization is a local optimization technique that examines a small set of
instructions (the "peephole") to identify and replace inefficient sequences with more efficient
ones.
Characteristics:
Local Scope: Operates on a small window of instructions, typically within a basic
block.
Pattern Matching: Identifies specific patterns that can be optimized.
Simplification: Replaces complex instruction sequences with simpler ones.
Redundancy Elimination: Removes unnecessary instructions, such as redundant
loads and stores.
Strength Reduction: Replaces expensive operations with cheaper ones (e.g.,
replacing multiplication with addition).
Example:
MOV R1, R2
MOV R2, R1
Optimized:
; Removed redundant instructions
Peephole optimization improves code efficiency and reduces code size.
k) Describe S-attributed and L-attributed grammars with suitable example
S-attributed Grammar:
An S-attributed grammar is a syntax-directed definition where only synthesized
attributes are used.
Synthesized attributes are those which are computed from the attribute values of
the children nodes in a parse tree (i.e., bottom-up).
It is typically used in bottom-up parsing (like LR parsers).
✅ Example:
Consider the grammar for arithmetic expressions:
E → E1 + T { E.val = E1.val + T.val }
E→T { E.val = T.val }
T → num { T.val = num.val }
Assume:
num.val is taken from the lexical value (say 3 for num).
The attribute val is synthesized and propagated upward.
Parse Tree:
For input 2 + 3, the attribute val is computed bottom-up:
T → num (val = 2)
E1 → T (val = 2)
T → num (val = 3)
E → E1 + T (val = 2 + 3 = 5)
L-attributed Grammar:
An L-attributed grammar may use:
Synthesized attributes, and
Inherited attributes — which are passed from parent or left sibling nodes to the
current node.
Used mostly in top-down parsing (like LL parsers).
✅ Example:
D→TL
T → int { T.type = "int" }
T → float { T.type = "float" }
L → L1 , id { L1.inh = L.inh; addType(id.entry, L.inh) }
L → id { addType(id.entry, L.inh) }
Here, L.inh is an inherited attribute passed from T to L.
The attribute type is inherited from T and passed to each id in L.
Input:
For int a, b, attributes are passed as:
T → int sets T.type = int
L.inh = T.type = int
L → L1 , id, both L1 and id get the type "int"
Key Differences:
Feature | S-attributed Grammar | L-attributed Grammar
Attribute types | Only synthesized | Both synthesized and inherited
Parsing style | Bottom-up (LR) | Top-down (LL)
Dependency | From children to parent | From parent and siblings to child
Use case | Postfix expressions, semantic actions in LR | Variable declarations, type propagation
L) Explain various storage allocation strategies with examples
✅ 1. Static Allocation:
Memory for all variables is allocated at compile time.
The memory location of each variable does not change during runtime.
Used in global/static variables and constants.
Example:
int x = 5;
Here, x gets a fixed location in data segment.
✅ Advantages:
No overhead of allocation/deallocation.
Efficient access.
❌ Disadvantages:
Inflexible; cannot handle recursive procedures or dynamic structures.
✅ 2. Stack Allocation
Used for local variables inside functions or blocks.
Memory is allocated/deallocated in LIFO (Last-In-First-Out) order.
Stack grows with function calls and shrinks on return.
Example:
void func() {
int a = 10; // stored in stack
}
Stack Frame: Includes local variables, return address, etc.
✅ Advantages:
Efficient for nested function calls.
Automatic deallocation.
❌ Disadvantages:
Lifetime tied to function calls.
No support for dynamic memory.
✅ 3. Heap Allocation:
Memory is allocated at runtime using functions like malloc(), new, etc.
It is suitable for dynamic data structures like linked lists, trees.
Example:
int* p = (int*)malloc(sizeof(int));
Here, memory for an integer is dynamically allocated from the heap.
✅ Advantages:
Flexible, can grow or shrink as needed.
Variables can outlive function calls.
❌ Disadvantages:
Slower than stack/static allocation.
Memory leaks if not freed properly.
Comparison:
Feature | Static | Stack | Heap
Allocation Time | Compile-time | Run-time | Run-time
Lifetime | Entire program | Duration of function | Until explicitly freed
Flexibility | Rigid | Moderate | High
Speed | Fastest | Fast | Slower
Use case | Global/static variables | Local variables | Dynamic data structures
a) Write the output of each phase of compilation for the statement:
a = (b + c) * (b + c) * 2;
Phase | Output
Lexical Analysis | Tokens: id(a), =, (, id(b), +, id(c), ), *, (, id(b), +, id(c), ), *, num(2), ;
Syntax Analysis | Parse tree showing the structure of the expression with correct operator precedence
Semantic Analysis | Type checking and validation of identifiers a, b, c, and constant 2
Intermediate Code | t1 = b + c ; t2 = t1 * t1 ; t3 = t2 * 2 ; a = t3
Optimization | Eliminates the repeated computation of (b + c)
Code Generation | Machine or assembly instructions generated for the final expression
b) Describe the structure of a LEX program. Write a LEX specification to remove
comments (both single line and multiple line) from C source code.
Answer:
Structure of a LEX Program:
%{
/* C declarations */
%}
%%
/* Rules: regex patterns and actions */
%%
/* User code: main function, utilities */
LEX Specification to Remove Comments:
%{
#include <stdio.h>
%}
%%
"//".* ; // Remove single-line comments
"/\\*"([^*]|\\*+[^*/])*"\\*/" ; // Remove multi-line comments
.|\n { ECHO; } // Print everything else
%%
int main() {
yylex();
return 0;
}
c) Define token, lexeme, and pattern. For the following program, identify lexemes,
tokens, and patterns:
Program:
int main()
{
int a, b;
printf("Enter two integers to swap\n");
scanf("%d%d", &a, &b);
a = a + b;
b = a - b;
a = a - b;
printf("a = %d\nb = %d\n", a, b);
return 0;
}
Answer:
Lexeme | Token | Pattern
int | Keyword | int
main | Identifier | [a-zA-Z_][a-zA-Z0-9_]*
(, ), {, } | Punctuation | Literal characters
a, b | Identifier | [a-zA-Z_][a-zA-Z0-9_]*
,, ; | Punctuation | Literal characters
printf, scanf | Identifier | [a-zA-Z_][a-zA-Z0-9_]*
"Enter two integers to swap\n" | String Literal | "[^"]*"
"%d%d" | String Literal | "[^"]*"
&a, &b | Operator + ID | &[a-zA-Z_][a-zA-Z0-9_]*
=, +, - | Operator | Literal characters
return | Keyword | return
0 | Numeric Literal | [0-9]+
d) Show that the following grammar is not SLR(1) but is CLR(1):
Grammar:
S → AaAb | BbBa
A → ε
B → ε
Answer:
SLR(1) Analysis:
FIRST(A) = {ε}, FIRST(B) = {ε}
FOLLOW(A) = {a, b} (in AaAb, the first A is followed by a and the second by b)
FOLLOW(B) = {a, b} (in BbBa, the first B is followed by b and the second by a)
In the initial state both A → ε and B → ε are candidate reductions, and FOLLOW(A) ∩ FOLLOW(B) = {a, b} ⇒ Reduce-Reduce conflict in the SLR(1) table.
CLR(1) Analysis:
CLR(1) uses lookahead in items, not FOLLOW sets.
Each ε-production is associated with its specific context:
A → ε , lookahead a in AaAb
B → ε , lookahead b in BbBa
No conflict due to distinct lookahead ⇒ Grammar is CLR(1)
e) Write algorithm to compute FIRST() and FOLLOW() for the following grammar:
Grammar:
S → ACB | CbB | Ba
A → da | BC
B → g | ε
C → h | ε
Algorithm for FIRST(X):
If X is a terminal, FIRST(X) = {X}
If X → ε, then ε ∈ FIRST(X)
If X → Y₁Y₂...Yₙ:
Add FIRST(Y₁) excluding ε to FIRST(X)
If ε ∈ FIRST(Y₁), add FIRST(Y₂), and so on
If ε ∈ all FIRST(Yᵢ), then ε ∈ FIRST(X)
Algorithm for FOLLOW(X):
Place $ in FOLLOW(start symbol)
For each production A → αBβ:
Add FIRST(β) (except ε) to FOLLOW(B)
If ε ∈ FIRST(β), add FOLLOW(A) to FOLLOW(B)
For each production A → αB:
Add FOLLOW(A) to FOLLOW(B)
FIRST Sets:
B → g | ε ⇒ FIRST(B) = {g, ε}
C → h | ε ⇒ FIRST(C) = {h, ε}
A → da | BC:
da contributes d; BC contributes FIRST(B) ∪ FIRST(C) = {g, h}, and since both B and C are nullable, ε ∈ FIRST(A)
⇒ FIRST(A) = {d, g, h, ε}
S → ACB | CbB | Ba:
FIRST(ACB) = {d, g, h}; A, C, B are all nullable, so ε is included as well
FIRST(CbB): C gives {h}; C is nullable, so add b ⇒ {h, b}
FIRST(Ba): B gives {g}; B is nullable, so add a ⇒ {g, a}
⇒ FIRST(S) = {d, g, h, b, a, ε}
FOLLOW Sets:
FOLLOW(S) = { $ }
A appears in S → ACB followed by CB: FIRST(CB) = {h, g, ε}; CB is nullable, so add FOLLOW(S) = {$}
⇒ FOLLOW(A) = {h, g, $}
B appears at the end of ACB and of CbB (add FOLLOW(S) = {$}), before a in Ba (add a), and before C in A → BC (add FIRST(C) = {h}; C is nullable, so add FOLLOW(A) = {h, g, $})
⇒ FOLLOW(B) = {a, g, h, $}
C appears before B in ACB (add FIRST(B) = {g}; B is nullable, so add FOLLOW(S) = {$}), before b in CbB (add b), and at the end of A → BC (add FOLLOW(A) = {h, g, $})
⇒ FOLLOW(C) = {g, b, h, $}
f) Describe the issues associated with grammars in top-down parsing with suitable
example.
Answer:
Top-down parsers, like recursive descent parsers, have two main issues with grammars:
Left Recursion
A grammar is left-recursive if a non-terminal appears on the leftmost side of its
own production.
Example (Problematic)
E → E + T | T
T → T * F | F
F → (E) | id
Issue: Recursive descent parser enters infinite recursion.
Solution: Convert left-recursion to right recursion:
E → T E'
E' → + T E' | ε
Left Factoring
When two or more productions for a non-terminal begin with the same prefix,
the parser cannot decide which one to choose.
Example:
S → if E then S else S
| if E then S
Issue: Parser gets confused after if E then S.
Solution (Left Factoring):
S → if E then S S'
S' → else S | ε
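After these transformations, the grammar can be parsed by recursive descent without backtracking. A minimal C sketch for the transformed expression grammar E → T E', E' → + T E' | ε, T → id, with the single letter 'i' standing in for an id token:
#include <stdio.h>

static const char *p;        /* input cursor */
static int T(void);
static int Eprime(void);

static int E(void) { return T() && Eprime(); }

static int Eprime(void) {
    if (*p == '+') { p++; return T() && Eprime(); }
    return 1;                /* epsilon: no backtracking needed */
}

static int T(void) {
    if (*p == 'i') { p++; return 1; }
    return 0;
}

int main(void) {
    p = "i+i+i";
    printf("%s\n", (E() && *p == '\0') ? "accepted" : "rejected");
    return 0;
}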
g) Compare local optimization with global optimization with suitable example.
Feature | Local Optimization | Global Optimization
Scope | Within a basic block | Across multiple basic blocks
Speed | Fast and simpler | Slower due to analysis of control flow
Example | Eliminate dead code inside one block | Move invariant code out of loops
Techniques | Constant folding, dead code elimination | Loop-invariant code motion, common subexpression elimination
Example:
Local Optimization:
a = 4 * 5; // constant folding → a = 20;
Global Optimization:
for(i = 0; i < 100; i++) {
x = y + z; // loop-invariant: move outside loop
a[i] = x * i;
}
h) Compare static, stack, and heap allocations.
Feature | Static Allocation | Stack Allocation | Heap Allocation
Lifetime | Entire program run | During function execution | Until manually deallocated
Storage Location | Data Segment | Stack Memory | Heap Memory
Speed | Fast | Faster | Slow
Flexibility | Fixed-size | Function scope | Dynamic, variable-size
Example | Global variables | Local variables | Dynamic memory (malloc/new)
i) Construct the DAG for the following basic block:
e := a + b
a := e - d
c := b * c
Answer:
Directed Acyclic Graph (DAG):
n1: [+] (a0, b0) with attached label e
n2: [-] (n1, d0) with attached label a
n3: [*] (b0, c0) with attached label c

        a: [-]
           /  \
      e: [+]   d0
         /  \
       a0    b0
              \
            c: [*]
                \
                 c0

(the leaf b0 has two parents: the [+] node and the [*] node)
Nodes represent operations (+, -, *)
Leaves are the initial values a0, b0, c0, d0
The computed value e is reused directly by the [-] node, so nothing is recomputed
j) Explain loop jamming and loop unrolling with example.
Loop Jamming:
Definition: Combining multiple loops that iterate over the same range into a single
loop.
Example:
for(i=0;i<n;i++) {
a[i] = b[i] + c[i];
}
for(i=0;i<n;i++) {
d[i] = e[i] * f[i];
}
After Loop Jamming
for(i=0;i<n;i++) {
a[i] = b[i] + c[i];
d[i] = e[i] * f[i];
}
Loop Unrolling:
Definition: Reducing the overhead of loop control by executing multiple iterations per
loop.
Example:
for(i=0;i<4;i++) {
a[i] = a[i] + 1;
}
After Unrolling:
a[0] = a[0] + 1;
a[1] = a[1] + 1;
a[2] = a[2] + 1;
a[3] = a[3] + 1;
k) Describe Peephole Optimization.
Answer:
Definition: A form of local optimization that examines and replaces short sequences
of instructions (a small "peephole") to improve performance or reduce code size.
Techniques:
Redundant instruction elimination
Example:
MOV R1, R2
MOV R2, R1 → Remove second
Strength reduction
Example:
x = y * 2 → x = y + y
Algebraic simplification
Example:
x = x + 0 → x = x
Jump optimization
Eliminate unnecessary GOTO or combine jumps.
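A toy C sketch of such a pass over a list of MOV instructions, collapsing the redundant pair shown above (the instruction representation is an assumption of this example):
#include <stdio.h>
#include <string.h>

typedef struct { char dst[4], src[4]; } Mov;

/* Collapse "MOV x, y ; MOV y, x" to the first instruction alone. */
int peephole(Mov *code, int n) {
    int out = 0;
    for (int i = 0; i < n; i++) {
        code[out++] = code[i];
        if (i + 1 < n &&
            strcmp(code[i].dst, code[i + 1].src) == 0 &&
            strcmp(code[i].src, code[i + 1].dst) == 0)
            i++;                    /* skip the redundant reverse copy */
    }
    return out;                     /* new instruction count */
}

int main(void) {
    Mov code[] = { {"R1", "R2"}, {"R2", "R1"}, {"R3", "R1"} };
    int n = peephole(code, 3);
    for (int i = 0; i < n; i++)
        printf("MOV %s, %s\n", code[i].dst, code[i].src);
    return 0;   /* MOV R1, R2 / MOV R3, R1 */
}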
l) How is scope information of variables stored in a symbol table? Explain.
Answer:
Symbol Table stores information about identifiers: name, type, scope, memory
location, etc.
Scope Handling Mechanisms:
Using Stack of Symbol Tables:
A new table is pushed when entering a block (function, loop).
Popped when exiting the block.
Ensures variables are visible only within their scope.
Linked List of Tables:
Each table points to its parent (enclosing) scope.
Lookup starts at the current scope and moves outward.
Hash Tables with Scope Information:
Each entry holds a scope level.
Helps handle variable shadowing
Example:
int x = 5;   // global scope
void func() {
    int x = 10; // local scope shadows global
}
The symbol table for func contains x=10, linked to the outer table where x=5.
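A minimal C sketch of the stack-of-tables mechanism: a linked list of scopes searched innermost-first, which is exactly how shadowing is resolved (names and sizes are illustrative):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct Entry { char name[32]; int value; struct Entry *next; } Entry;
typedef struct Scope { Entry *entries; struct Scope *parent; } Scope;

static Scope *current = NULL;

void enter_scope(void) {                 /* push on block entry */
    Scope *s = calloc(1, sizeof *s);
    s->parent = current;
    current = s;
}

void exit_scope(void) {                  /* pop on block exit */
    current = current->parent;           /* (entries leaked in this sketch) */
}

void define(const char *name, int value) {
    Entry *e = calloc(1, sizeof *e);
    strncpy(e->name, name, 31);
    e->value = value;
    e->next = current->entries;
    current->entries = e;
}

Entry *lookup(const char *name) {        /* walk outward through enclosing scopes */
    for (Scope *s = current; s; s = s->parent)
        for (Entry *e = s->entries; e; e = e->next)
            if (strcmp(e->name, name) == 0) return e;
    return NULL;
}

int main(void) {
    enter_scope(); define("x", 5);       /* global x = 5 */
    enter_scope(); define("x", 10);      /* func's x shadows it */
    printf("%d\n", lookup("x")->value);  /* 10 */
    exit_scope();
    printf("%d\n", lookup("x")->value);  /* 5 */
    return 0;
}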
16 marks
Q3. Construct an SLR parsing table for the following grammar: R → R + R | R R | (R) | a | b
Operator Precedence and Associativity:
() > Concatenation > +
All operators are left associative.
Step 1: Augmented Grammar
Let us add an augmented start symbol S′:
S' → R
R→R+R
R→RR
R → (R)
R→a
R→b
Step 2: Compute FIRST and FOLLOW sets
FIRST(R) = { (, a, b }
FOLLOW(R) = { ), $, +, (, a, b }
Step 3: Construct LR(0) items and DFA (Canonical Collection of LR(0) Items)
I0:
S' → .R
R → .R + R
R → .R R
R → .(R)
R → .a
R → .b
Transitions:
on a → I3
on b → I4
on ( → I5
on R → I1
(Similarly define I1 through I10 as per the DFA transitions)
Step 4: Build the SLR Parsing Table
We consider ACTION and GOTO tables based on the canonical LR(0) collection and
FOLLOW sets. For conflict resolution:
'+' has lowest precedence, is left associative.
Concatenation (RR) has higher precedence than '+', also left associative.
() has highest precedence.
Conflicts are resolved using precedence and associativity rules.
The final parsing table will reflect this precedence hierarchy and associativity to parse
expressions like:
a + b a → (a + (b a)), since concatenation binds tighter than +
Q4. Type Checking, Type Expression, and Type Conversion
Type Checking
Verifies that operations are semantically valid by ensuring operand types match.
Example:
int a = 5;
float b = 3.14;
a + b; // valid (type coercion may happen)
a + "hello"; // error (type mismatch)
Done at compile time (static) or run time (dynamic).
Type Expression:
A compact representation of types using constructors like arrays, records, pointers, etc.
Examples:
int[] → array(int)
pointer to int → ptr(int)
function taking float, returning int → float → int
Type Conversion:
Implicit Conversion (Coercion): Automatic type change.
int a = 5;
float b = 3.0;
float c = a + b; // a is coerced to float
Explicit Conversion (Casting): Manual conversion.
float f = 5.6;
int i = (int)f; // i becomes 5
Q5. Code Optimization Techniques
a) Copy Propagation:
Replaces occurrences of variables with known values.
Example:
a = b;
c = a + d; // becomes c = b + d
b) Dead Code Elimination:
Removes code that doesn’t affect program results.
Example:
a = 5;
a = 6; // 'a = 5' is dead
c) Code Motion:
Moves code outside loops if it does not change within loop.
Example:
for(int i=0;i<10;i++) {
x = y + z; // move out if y and z unchanged
}
d) Loop Invariant Code Motion:
A special case of code motion where expressions invariant within loop are moved out.
Example:
for(int i=0;i<n;i++) {
t = a * b;
arr[i] = t + i;
}
// move 't = a * b;' outside loop
Q6. What is an Activation Record? Draw a diagram of general activation record and
explain the fields.
Activation Record (AR):
It is a runtime data structure used to manage function calls. It stores all necessary
information for function execution.
General Activation Record Structure:
+---------------------+
| Actual Parameters |
+---------------------+
| Return Address |
+---------------------+
| Control Link (static/dynamic link) |
+---------------------+
| Access Link |
+---------------------+
| Saved Registers |
+---------------------+
| Local Variables |
+---------------------+
| Temporary Values |
+---------------------+
Fields Explained:
Actual Parameters: Values passed to the function.
Return Address: Location to return after function completes.
Control Link: Pointer to caller's activation record (dynamic link).
Access Link: Pointer for non-local variable access (static link).
Saved Registers: Caller-saved register values.
Local Variables: Variables declared in the function.
Temporary Values: Intermediate results during expression evaluation.
Activation records are maintained in the runtime stack, and they grow/shrink as function
calls and returns happen.
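A minimal C sketch of the record as a struct; the field names mirror the diagram, but actual sizes and layout are dictated by the target ABI, so everything here is illustrative:
#include <stdio.h>

struct activation_record {
    int   actual_params[4];     /* arguments passed by the caller */
    void *return_address;       /* where to resume in the caller */
    struct activation_record *control_link;  /* caller's AR (dynamic link) */
    struct activation_record *access_link;   /* enclosing scope (static link) */
    long  saved_registers[8];   /* registers preserved across the call */
    int   locals[8];            /* the callee's local variables */
    int   temporaries[4];       /* intermediate expression results */
};

int main(void) {
    printf("AR size: %zu bytes\n", sizeof(struct activation_record));
    return 0;
}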
Q3: Consider the following grammar
D → Type Tlist ;
Type → int | float
Tlist → Tlist, id | id
a) Find the SLR parser for the above grammar
To construct the SLR (Simple LR) parser, we first need to compute the LR(0) items and
then build the SLR parsing table.
Find the Canonical Collection of LR(0) Items
Let's start by constructing the LR(0) items for each step:
I0 (initial item set, closure of D' → • D):
D' → • D
D → • Type Tlist ;
Type → • int
Type → • float
I1 = GOTO(I0, D):
D' → D •
I2 = GOTO(I0, Type):
D → Type • Tlist ;
Tlist → • Tlist , id
Tlist → • id
I3 = GOTO(I0, int):
Type → int •
I4 = GOTO(I0, float):
Type → float •
I5 = GOTO(I2, Tlist):
D → Type Tlist • ;
Tlist → Tlist • , id
I6 = GOTO(I2, id):
Tlist → id •
I7 = GOTO(I5, ;):
D → Type Tlist ; •
I8 = GOTO(I5, ,):
Tlist → Tlist , • id
I9 = GOTO(I8, id):
Tlist → Tlist , id •
Construct the SLR Parsing Table
Using the canonical collection of items, we can build the SLR parsing table.
Action Table:
The action table tells whether to shift or reduce or whether to accept based on
the terminal symbol and current state. It is populated by checking the item sets.
Goto Table:
The GOTO table indicates which state to move to when a non-terminal is
encountered.
After constructing these tables, we check for conflicts. For this grammar the item sets produce no shift-reduce or reduce-reduce conflicts, so the grammar is SLR(1). Reductions are entered in the ACTION table under the symbols in FOLLOW of the left-hand side:
FOLLOW(D) = { $ }
FOLLOW(Type) = { id }
FOLLOW(Tlist) = { ',' , ';' }
b) Show the parsing of the string "int id, id, id;" using the parsing table constructed
above.
Step-by-step parsing of "int id , id , id ;":
1. Shift int; the lookahead is id ∈ FOLLOW(Type), so reduce by Type → int and take GOTO on Type.
2. Shift id; the lookahead is , ∈ FOLLOW(Tlist), so reduce by Tlist → id and take GOTO on Tlist.
3. Shift , and shift id; the lookahead is , again, so reduce by Tlist → Tlist , id.
4. Shift , and shift id; the lookahead is ;, so reduce by Tlist → Tlist , id.
5. Shift ;; the lookahead is $ ∈ FOLLOW(D), so reduce by D → Type Tlist ;.
6. The stack now holds the start symbol D with $ as input: accept.
Q4: List the commonly used intermediate representations. Write the following
expression in all types of intermediate representations you know:
(a-b) * (c + d) - (a + b)
Commonly Used Intermediate Representations (IR)
Abstract Syntax Tree (AST):
A tree structure that captures the hierarchical structure of the source code. It
abstracts away the syntactical details and represents the program's logical
structure.
Three-Address Code (TAC):
A form of intermediate code in which each instruction has at most three
addresses. These addresses can represent variables or temporary values.
Example for (a - b) * (c + d) - (a + b) in TAC:
t1 = a - b
t2 = c + d
t3 = t1 * t2
t4 = a + b
result = t3 - t4
Quadruples:
A type of intermediate representation where each instruction is represented as a
tuple with four fields: operator, operand1, operand2, and result.
Example for (a - b) * (c + d) - (a + b) in quadruples:
(−, a, b, t1)
(+, c, d, t2)
(*, t1, t2, t3)
(+, a, b, t4)
(−, t3, t4, result)
Triples:
Similar to quadruples, but they use references to results instead of explicitly
naming the result variables.
Example for (a - b) * (c + d) - (a + b) in triples, where operands refer to the index of an earlier triple:
(0) (−, a, b)
(1) (+, c, d)
(2) (*, (0), (1))
(3) (+, a, b)
(4) (−, (2), (3))
Static Single Assignment (SSA):
A form of IR where each variable is assigned exactly once, making it easier for
optimization algorithms to analyze the program.
Example in SSA for (a - b) * (c + d) - (a + b):
t1 = a - b
t2 = c + d
t3 = t1 * t2
t4 = a + b
result = t3 - t4
Q5
i) Explain the simple code generator with a suitable example.
A simple code generator translates intermediate representations such as three-address code
(TAC) into machine or assembly code. It works by generating instructions that the target
architecture can understand.
For example, consider the expression (a - b) * (c + d) - (a + b):
Intermediate Code (TAC):
t1 = a - b
t2 = c + d
t3 = t1 * t2
t4 = a + b
result = t3 - t4
Assembly Code:
SUB t1, a, b ; t1 = a - b
ADD t2, c, d ; t2 = c + d
MUL t3, t1, t2 ; t3 = t1 * t2
ADD t4, a, b ; t4 = a + b
SUB result, t3, t4 ; result = t3 - t4
The simple code generator would translate each intermediate operation into the
corresponding assembly instruction.
ii) Write detailed notes on basic blocks and flow graphs.
Basic Block: A basic block is a sequence of consecutive statements with no branches
(except at the end). Control enters the block at the beginning and exits at the end.
There is no ambiguity about the order of execution of statements within a basic block.
It forms the building blocks of control flow analysis.
Example:
a = b + c
d = e * f
Flow Graph (Control Flow Graph): A control flow graph (CFG) represents the flow
of control in a program. Each basic block is represented as a node, and edges between
nodes represent control flow. It is helpful for analyzing how a program behaves during
execution and is crucial for optimization.
Example:
Start → Basic Block 1 → Basic Block 2 → End
              ↘ Basic Block 3 ↗
The flow graph helps visualize loops, branches, and paths of execution.
Q6: Obtain the translation scheme for obtaining the three-address code for the
following grammar
S → id := E
E → E1 + E2 | E1 * E2 | -E1 | (E1) | id
Translation Scheme:
For the given grammar, the standard scheme gives each E a place attribute (the name that holds its value), uses newtemp() to create fresh temporaries, and emits instructions with gen():
S → id := E { gen(id.place ':=' E.place) }
E → E1 + E2 { E.place = newtemp(); gen(E.place ':=' E1.place '+' E2.place) }
E → E1 * E2 { E.place = newtemp(); gen(E.place ':=' E1.place '*' E2.place) }
E → -E1 { E.place = newtemp(); gen(E.place ':=' 'uminus' E1.place) }
E → (E1) { E.place = E1.place }
E → id { E.place = id.place }
Example for the input x := a + b:
E1.place = a, E2.place = b
The + rule creates t1 = newtemp() and emits: t1 := a + b
The S rule then emits: x := t1
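A minimal C sketch of the two helpers the scheme relies on, newtemp() and gen(), with an illustrative hand-driven run for id := a + b * c (the helper signatures are assumptions of this example):
#include <stdio.h>

static int temp_count = 0;

/* Return a fresh temporary name; the static buffer is reused, so callers copy it. */
static const char *newtemp(void) {
    static char name[16];
    snprintf(name, sizeof name, "t%d", ++temp_count);
    return name;
}

/* Emit one three-address instruction. */
static void gen(const char *dst, const char *a, char op, const char *b) {
    printf("%s = %s %c %s\n", dst, a, op, b);
}

int main(void) {
    /* id := a + b * c, evaluated bottom-up: the * node first, then + */
    char t1[16], t2[16];
    snprintf(t1, sizeof t1, "%s", newtemp());
    gen(t1, "b", '*', "c");          /* t1 = b * c */
    snprintf(t2, sizeof t2, "%s", newtemp());
    gen(t2, "a", '+', t1);           /* t2 = a + t1 */
    printf("id = %s\n", t2);         /* id := t2 */
    return 0;
}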