
Compiler Design 2022-23

By: Nitish Kumar Mohanty


1) a) Define regular expression. Give example.
A regular expression (or regex) is a sequence of characters that defines a search pattern. This pattern can be used for
string matching, validation, or searching and replacing in text. Regular expressions are commonly used in
programming and text processing to identify, extract, or manipulate specific substrings within larger strings.

1. Start by understanding the special characters used in regex, such as “.”, “*”, “+”, “?”, and more.
2. Choose a programming language or tool that supports regex, such as Python, Perl, or grep.
3. Write your pattern using the special characters and literal characters.
4. Use the appropriate function or method to search for the pattern in a string.
Examples:

1. To match a sequence of literal characters, simply write those characters in the pattern.

2. To match a single character from a set of possibilities, use square brackets, e.g. [0123456789] matches any
digit.

3. To match zero or more occurrences of the preceding expression, use the star (*) symbol.

4. To match one or more occurrences of the preceding expression, use the plus (+) symbol.

5. It is important to note that regex can be complex and difficult to read, so it is recommended to use tools like
regex testers to debug and optimize your patterns.

Regular expression for an email address:


^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
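As a rough illustration of step 4 above, here is a minimal C sketch using the POSIX <regex.h> API; the pattern is a slightly simplified version of the email expression above, and the sample strings are made up for the example:

#include <regex.h>
#include <stdio.h>

int main(void) {
    regex_t re;
    /* Simplified email pattern in POSIX extended syntax. */
    const char *pattern = "^[a-zA-Z0-9_.-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,5}$";
    const char *samples[] = { "user@example.com", "not-an-email" };

    if (regcomp(&re, pattern, REG_EXTENDED) != 0) {
        fprintf(stderr, "failed to compile pattern\n");
        return 1;
    }
    for (int i = 0; i < 2; i++) {
        /* regexec returns 0 when the string matches the pattern. */
        int match = regexec(&re, samples[i], 0, NULL, 0);
        printf("%-20s %s\n", samples[i], match == 0 ? "matches" : "does not match");
    }
    regfree(&re);
    return 0;
}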

b) What are the features of a Lexical analyzer?


A Lexical Analyzer (or Lexer) is a crucial part of a compiler or an interpreter. It processes the source code to break it
down into manageable pieces called tokens. Here are the main features of a lexical analyzer:
1. Tokenization
• Definition: The primary function of a lexical analyzer is to read the input source code and convert it into a
series of tokens.
• Example: Breaking the statement int a = 5; into tokens like int, a, =, 5, ;.
2. Pattern Recognition
• Definition: The lexer identifies patterns based on predefined rules and matches them with lexical categories
(e.g., keywords, identifiers, operators).
• Example: Recognizing int as a keyword and a as an identifier.
3. Elimination of White Spaces and Comments
• Definition: The lexical analyzer removes unnecessary white spaces and comments from the source code to
make the analysis easier for subsequent compiler stages.
• Example: Removing spaces and // This is a comment from the source.
4. Error Detection
• Definition: It can detect and report errors in the lexical structure, such as invalid tokens or illegal characters.
• Example: Reporting an error for an unsupported character like @ in int a@ = 5;.

5. Symbol Table Generation


• Definition: The lexer may contribute to the generation of a symbol table that records information about
identifiers (e.g., variable names, data types).
• Example: Storing information for a as an identifier in the symbol table.
6. Input Buffering and Efficiency
• Definition: To process the input efficiently, the lexical analyzer uses techniques like input buffering and two-
pointer scanning to handle characters without re-scanning the input multiple times.
• Benefit: This reduces the overhead of reading the input stream, making the analysis faster.
7. Lexeme Classification
• Definition: The lexer groups the input characters into lexemes, which are then categorized as tokens. Each
lexeme is the smallest sequence of characters that forms a valid token.
• Example: In the string int a = 5; , int and a are lexemes that are categorized as a keyword and an identifier,
respectively.
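To make the tokenization idea concrete, here is a minimal, illustrative C sketch (not a real compiler front end): it splits a statement such as int a = 5; into word, number and symbol tokens while skipping white space; keywords and identifiers are lumped together as WORD for simplicity:

#include <ctype.h>
#include <stdio.h>

/* Print each token of the input on its own line. */
static void tokenize(const char *src) {
    const char *p = src;
    while (*p) {
        if (isspace((unsigned char)*p)) {                      /* skip white space */
            p++;
        } else if (isalpha((unsigned char)*p) || *p == '_') {  /* keyword or identifier */
            const char *start = p;
            while (isalnum((unsigned char)*p) || *p == '_') p++;
            printf("WORD   : %.*s\n", (int)(p - start), start);
        } else if (isdigit((unsigned char)*p)) {               /* integer literal */
            const char *start = p;
            while (isdigit((unsigned char)*p)) p++;
            printf("NUMBER : %.*s\n", (int)(p - start), start);
        } else {                                               /* single-character symbol */
            printf("SYMBOL : %c\n", *p);
            p++;
        }
    }
}

int main(void) {
    tokenize("int a = 5;");
    return 0;
}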

c) What are the limitations of recursive descent parser?


Recursive descent parsers are a type of top-down parser that uses a set of recursive procedures to process the input.
While they are simple to implement and understand, they come with several limitations:

1. Limited to LL (1) Grammars:

o Recursive descent parsers can only handle LL(1) grammars, where parsing decisions are made with
one lookahead symbol. This limits their ability to parse more complex grammars that require greater
lookahead or cannot be parsed using a single left-to-right, leftmost derivation.

2. Left Recursion:

o One of the major limitations is their inability to handle left-recursive grammars. If a grammar rule
has left recursion (e.g., A → Aα | β), it can lead to infinite recursion during parsing. Such grammars
need to be transformed into equivalent non-left-recursive grammars to be parsed using a recursive
descent parser.

3. Ambiguity:

o Recursive descent parsers are not well-suited for handling ambiguous grammars, where multiple
parse trees are possible for the same input. The parser cannot decide which path to take without
additional disambiguation strategies.

4. Backtracking:

o If implemented without backtracking, recursive descent parsers may not be able to handle grammars
that require trying different paths when parsing. While backtracking can be added, it makes the
parser inefficient and can lead to exponential parsing times in the worst case.
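As a small sketch of how limitation 2 is worked around in practice, the left-recursive rule E → E + T | T can be rewritten as E → T { + T } and parsed with a loop instead of infinite recursion. The grammar (T is just a single digit here), the input string and the error handling are simplified assumptions for illustration:

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

static const char *p;                  /* current position in the input */

static int parse_term(void) {          /* T -> digit (simplified) */
    if (isdigit((unsigned char)*p)) {
        int v = *p - '0';
        p++;
        return v;
    }
    fprintf(stderr, "syntax error at '%c'\n", *p);
    exit(1);
}

/* E -> T { '+' T } : the non-left-recursive form of E -> E + T | T */
static int parse_expr(void) {
    int v = parse_term();
    while (*p == '+') {
        p++;                           /* consume '+' */
        v += parse_term();
    }
    return v;
}

int main(void) {
    p = "1+2+3";
    printf("value = %d\n", parse_expr());   /* prints: value = 6 */
    return 0;
}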

d) Define Boot strapping.


Bootstrapping is a process in which a simple language is used to translate a more complicated program, which in turn can handle a still more complicated program, and so on. Writing a compiler for any high-level language is a complicated process and takes a lot of time to do from scratch, so a simpler language is used to generate the target code in stages. To clearly understand the bootstrapping technique, consider the following scenario. Suppose we want to write a cross compiler for a new language X. The implementation language of this compiler is, say, Y and the target code being generated is in language Z; that is, we create XYZ. Now, if an existing compiler for Y runs on machine M and generates code for M, it is denoted YMM. If we run XYZ using YMM, we get the compiler XMZ, i.e., a compiler for source language X that generates target code in language Z and runs on machine M. The following schematic illustrates the scenario.
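A rough textual stand-in for the usual T-diagram:

XYZ : compiler for language X, written in Y, producing code in Z
YMM : existing compiler for Y, running on machine M, producing code for M

running XYZ through YMM  ==>  XMZ : compiler for X, running on M, producing code in Z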
e) What are the advantages of heap storage allocation?
Heap storage allocation is a dynamic memory allocation strategy that allows for flexible and efficient memory
management. Here are the key advantages:

1. Dynamic Memory Management:


o The heap allows programs to allocate and deallocate memory at runtime, which provides flexibility to
handle variable-sized data structures that aren't known at compile time.
2. Efficient Use of Memory:
o Heap allocation allows for efficient use of memory by allocating only the amount needed at runtime.
This helps in optimizing memory usage for programs that require memory management beyond
what static allocation offers.
3. Variable Lifetime:
o Memory allocated on the heap remains available until it is explicitly deallocated. This makes it ideal
for objects or data structures that need to persist beyond the function scope or for the entire
duration of the program.
4. Flexibility for Data Structures:
o The heap is essential for implementing complex data structures like linked lists, trees, and graphs,
where the size of the structure can grow or shrink dynamically during program execution.
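A small C sketch of points 1 and 3 above; the node structure and the values are illustrative only:

#include <stdio.h>
#include <stdlib.h>

struct Node { int value; struct Node *next; };

/* The node is allocated on the heap, so it outlives the call to make_node. */
struct Node *make_node(int value) {
    struct Node *n = malloc(sizeof *n);
    if (n == NULL) exit(1);
    n->value = value;
    n->next = NULL;
    return n;
}

int main(void) {
    struct Node *head = make_node(10);   /* size and lifetime decided at runtime */
    head->next = make_node(20);
    printf("%d %d\n", head->value, head->next->value);
    free(head->next);                    /* heap memory must be released explicitly */
    free(head);
    return 0;
}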

f) List out the rules for FIRST and FOLLOW.


Rules for FIRST Set
The FIRST set of a grammar symbol X is the set of terminals that begin the strings derivable from X. Here's how to
compute it:
1. For a terminal a:
o FIRST(a)={a}.
2. For a non-terminal A:
o Consider all productions of A: A→X1X2… Xn.
o Add FIRST(X1) (excluding ε) to FIRST(A).
o If ε ∈ FIRST(X1), then add FIRST(X2) (excluding ε).
o Repeat this process for X3, X4, …. until:
▪ A terminal is found, or
▪ All Xi derive ε, in which case ε is added to FIRST(A).
3. For ε:
o FIRST(ε)={ε}.

Rules for FOLLOW Set


The FOLLOW set of a non-terminal A is the set of terminals that can appear immediately to the right of A in some
derivation. Here's how to compute it:
1. For the start symbol S:
o Add $ (the end-of-input marker) to FOLLOW(S).
2. For a production A→αBβ:
o Add all elements of FIRST(β) (excluding ε) to FOLLOW(B).
3. If ε ∈ FIRST(β) or β is empty:
o Add all elements of FOLLOW(A) to FOLLOW(B).
4. Repeat until no changes occur:
o Continue applying the above rules iteratively until the FOLLOW sets stabilize.
g) What is common sub expression elimination?
Common sub-expression elimination is one of the code optimization techniques. Code optimization is an approach to enhance the
performance of the code by eliminating or rearranging code lines. Code optimization techniques are as follows:

1. Compile-time evaluation
2. Common Sub-expression elimination
3. Dead code elimination
4. Code movement
5. Strength reduction
Common Sub-expression Elimination:

An expression (or sub-expression) that has already appeared and been computed earlier, and that appears again later
in the code, is a common sub-expression. Eliminating its re-computation is known as common sub-expression
elimination.

The advantage of this method is that it makes the computation faster by avoiding re-computation of the expression;
it also uses memory more efficiently.
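A small three-address-code illustration (the variable and temporary names are assumed for the example):

Before elimination:
t1 = a + b
x = t1 * c
t2 = a + b        (a + b is computed a second time)
y = t2 * d

After common sub-expression elimination:
t1 = a + b
x = t1 * c
y = t1 * d        (the second computation of a + b is removed)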

h) Describe in brief about types of LR parsers.


LR parsers are a type of bottom-up parsers used for syntax analysis in compilers. They read the input from left to right
and produce a rightmost derivation in reverse. LR parsers can handle a wide range of context-free grammars and are
known for their efficiency. There are four main types of LR parsers:

1. Simple LR (SLR) Parser:


o The simplest type of LR parser.
o Constructs the parsing table using LR (0) items and a follow set.
o Has limited capability for handling conflicts and cannot parse all context-free grammars.
o Suitable for smaller and simpler grammars.
2. Canonical LR (CLR) Parser:
o The most powerful and comprehensive type of LR parser.
o Uses LR (1) items, which consist of an LR(0) item plus a lookahead symbol.
o Capable of parsing a wider range of grammars with minimal conflicts.
o Generates larger parsing tables, making it more complex and less efficient for practical use.
3. Lookahead LR (LALR) Parser:
o A simplified version of the CLR parser.
o Merges states with identical LR(0) items but different lookaheads to create smaller tables.
o Balances power and efficiency, making it popular for practical compilers like YACC (Yet Another
Compiler Compiler).
o Can handle most programming languages, but might fail on highly ambiguous grammars.
4. Incremental LR Parser:
o An extension used in environments where parsing is done incrementally, such as in IDEs.
o Allows partial re-parsing of only affected portions when changes are made, improving performance
in interactive applications.

i) What is semantic rule? How to evaluate the semantic rules?


Semantic Rules
Semantic rules are used in syntax-directed translation to specify how semantic information is computed and how
actions are taken during the parsing process. They are typically associated with a grammar for a programming
language and are used to define the meaning (or semantics) of syntactic constructs.
Semantic rules are associated with production rules in a grammar and can involve:
1. Synthesized Attributes: Attributes computed from the attributes of children nodes in a parse tree.
2. Inherited Attributes: Attributes passed down from parent nodes or siblings in a parse tree.
Semantic rules typically involve:
• Operations on attributes of grammar symbols.
• Generation of intermediate code or executable code.
• Construction of data structures (like symbol tables or abstract syntax trees).

Evaluation of Semantic Rules


The evaluation of semantic rules depends on the grammar and the associated attributes. Here are the steps:
1. Attribute Grammars:
o A formalism that pairs attributes with grammar symbols and semantic rules with grammar
productions.
o Each grammar symbol in a production may have one or more attributes, and semantic rules define
how these attributes are computed.
2. Evaluation Methods:
o Top-Down (Predictive Parsers):
▪ Inherited attributes are evaluated as the parse tree is constructed.
▪ Synthesized attributes are computed as the recursive calls return.
o Bottom-Up (Shift-Reduce Parsers):
▪ Attributes are computed after the right-hand side of a production is reduced.
▪ Suitable for LR parsers.
3. Dependency Graph:
o Construct a dependency graph for attributes, where nodes represent attributes, and edges represent
dependencies between them.
o Evaluate attributes in an order consistent with the dependency graph.
4. Order of Evaluation:
o S-Attributed Grammars: Only synthesized attributes; can be evaluated in a bottom-up manner.
o L-Attributed Grammars: Include both synthesized and inherited attributes but require specific
traversal orders (left-to-right).
5. Semantic Actions:
o Actions embedded within productions are executed as the parser processes the grammar.
o For example, in YACC or Bison, these actions may generate code or perform semantic checks.
6. Implementation Example:
o In a compiler, semantic rules may perform type checking, evaluate expressions, generate
intermediate code, or enforce scoping rules.
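As a small illustration, here is an S-attributed example (only synthesized attributes; the usual expression grammar is assumed) with a semantic rule attached to each production:

Production        Semantic rule
E → E1 + T        E.val = E1.val + T.val
E → T             E.val = T.val
T → T1 * F        T.val = T1.val * F.val
T → F             T.val = F.val
F → ( E )         F.val = E.val
F → digit         F.val = digit.lexval

For the input 3*4+5, a bottom-up evaluation first computes the leaf values 3, 4 and 5, then T.val = 12 for 3*4, and finally E.val = 17 at the root.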

j) Differentiate Parse tree and Syntax tree with an example.


Parse Tree:
A parse tree is a visual representation of the syntactic structure of a piece of source code, as produced by a parser. It
shows the hierarchy of the elements in the code and the relationships between them.

In compiler design, a parse tree is generated by the parser, which is a component of the compiler that processes the
source code and checks it for syntactic correctness. The parse tree is then used by other components of the compiler,
such as the code generator, to generate machine code or intermediate code that can be executed by the target
machine.

Parse trees can be represented in different ways, such as a tree structure with nodes representing the different
elements in the source code and edges representing the relationships between them, or as a graph with nodes and
edges representing the same information. Parse trees are typically used as an intermediate representation in the
compilation process, and are not usually intended to be read by humans.
Example:

Here is the Parse tree for the expression, 3*4+5
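The original figure is not reproduced here; a textual parse tree for 3*4+5, assuming the usual expression grammar E → E + T | T, T → T * F | F, F → digit, is:

E
├── E
│   └── T
│       ├── T
│       │   └── F
│       │       └── 3
│       ├── *
│       └── F
│           └── 4
├── +
└── T
    └── F
        └── 5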

Syntax Tree:

A syntax tree is a tree-like representation of the syntactic structure of a piece of source code. It is typically used in the
process of compiler design, to represent the structure of the code in a way that is easier to analyze and manipulate.

Syntax trees are constructed by parsing the source code, which involves analyzing the code and breaking it down into
its individual components, such as tokens, variables, and statements. The resulting tree is made up of nodes that
correspond to these various components, with the structure of the tree reflecting the grammatical structure of the
source code.

Syntax trees are useful for a variety of tasks in compiler design, such as type checking, optimization, and code
generation. They can also be used to represent the structure of other types of linguistic or logical structures, such as
natural language sentences or logical expressions.

Example:
Here is the Syntax tree for the expression, 3*4+5
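Again in place of the original figure, a textual syntax tree for 3*4+5 (operators become internal nodes, operands become leaves):

        +
       / \
      *   5
     / \
    3   4

Note how the syntax tree drops the intermediate non-terminals (E, T, F) of the parse tree and keeps only the operators and operands.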

PART-2
2) a) Describe how various phases could be combined as a pass in a compiler?

In a compiler, the different phases of compilation can be grouped and executed as a single pass or multiple passes,
depending on the compiler's design and optimization requirements. A pass refers to a single traversal of the source
code or an intermediate representation, during which specific tasks (from one or more phases) are performed.
Phases of a Compiler
The typical phases of a compiler include:
1. Lexical Analysis
2. Syntax Analysis
3. Semantic Analysis
4. Intermediate Code Generation
5. Code Optimization
6. Code Generation
Combining Phases into Passes
1. Single-Pass Compiler
• Combines all phases into a single pass.
• Ideal for simple languages or when performance (compilation speed) is a priority.
• Typically used for early compilers or just-in-time (JIT) compilation.
How it works:
• The compiler processes the source code linearly.
• As the lexical analyzer generates tokens, the syntax analyzer parses them, semantic checks are performed,
and intermediate or final code is generated in one go.
Advantages:
• Faster compilation as there’s only one traversal.
• Suitable for small or simple languages.
Limitations:
• Difficult to optimize the generated code.
• Less modular, harder to extend.

2. Two-Pass Compiler
• Breaks the compilation process into two major passes:
o First Pass: Lexical, syntax, and semantic analysis.
o Second Pass: Intermediate code optimization and final code generation.
How it works:
• The first pass processes the source code up to an intermediate representation.
• The second pass optimizes and translates the intermediate representation into machine code.
Advantages:
• Allows some level of optimization.
• Easier to debug and maintain compared to a single-pass compiler.
Example:
• Early FORTRAN compilers often used a two-pass design.

3. Multi-Pass Compiler
• Divides the compilation into several passes, each focusing on specific tasks or a group of tasks.
• Intermediate representations are used between passes to maintain the state.
Typical Pass Breakdown:
• Pass 1: Lexical Analysis, Syntax Analysis, Semantic Analysis.
• Pass 2: Intermediate Code Generation.
• Pass 3: Optimization (e.g., dead code elimination, loop unrolling).
• Pass 4: Code Generation.
Advantages:
• Highly modular and extensible.
• Allows sophisticated optimizations.
• Easier to debug and modify specific passes.
Limitations:
• Slower compilation due to multiple traversals.
• Higher memory usage for intermediate representations.
Example:
• Modern optimizing compilers like GCC and LLVM use multiple passes.

Trade-offs in Combining Phases


1. Combining Front-End Phases
• Lexical Analysis, Syntax Analysis, and Semantic Analysis are often combined in a single pass.
• Tools like YACC and Lex generate parsers that integrate these phases.
2. Combining Optimization and Code Generation
• In single-pass or two-pass compilers, optimization may be limited and done during or after code generation.
• In multi-pass compilers, separate passes are used for global and machine-specific optimizations.
3. Intermediate Code as a Bridge
• Using an intermediate representation allows separation of front-end (source-language-specific) and back-end
(target-machine-specific) phases, enabling modularity and reuse.

b) Eliminate left recursion in the following grammar


A → ABd | Aa | a
B → Be | b
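A worked answer, using the standard transformation A → Aα | β  ⇒  A → βA', A' → αA' | ε:

For A → ABd | Aa | a (β = a, α ∈ {Bd, a}):
A → aA'
A' → BdA' | aA' | ε

For B → Be | b (β = b, α = e):
B → bB'
B' → eB' | ε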
c) Differentiate between NFA and DFA.

d) Discuss in brief about LL (1) Grammars.


LL(1) Parsing: The first L indicates that the input is scanned from left to right, the second L indicates that the
parser uses the leftmost derivation, and the 1 is the number of lookahead symbols, i.e., how many input symbols are
examined when making a parsing decision.
Essential conditions to check first are as follows:
1. The grammar is free from left recursion.
2. The grammar should not be ambiguous.
3. The grammar has to be left factored so that it is deterministic.
These conditions are necessary but not sufficient for a grammar to be LL(1).

Algorithm to construct LL(1) Parsing Table:


Step 1: First check all the essential conditions mentioned above, then go to Step 2.
Step 2: Calculate First() and Follow() for all non-terminals.
1. First(): for a variable, the set of terminal symbols that can begin the strings derived from that variable.
2. Follow(): for a variable, the set of terminal symbols that can follow it in some derivation.
Step 3: For each production A –> α (A derives alpha):
1. Find First(α) and, for each terminal in First(α), make the entry A –> α in the table.
2. If First(α) contains ε (epsilon) as a terminal, then find Follow(A) and, for each terminal in Follow(A), make
the entry A –> ε in the table.
3. If First(α) contains ε and Follow(A) contains $ as a terminal, then make the entry A –> ε in the table for $.
The parsing table is constructed from these two functions: the rows contain the non-terminals and the columns contain
the terminal symbols. All the null (ε) productions of the grammar go under the Follow elements, and the remaining
productions lie under the elements of the First set.
Now, let’s understand with an example.
Example 1: Consider the Grammar:
E --> TE'
E' --> +TE' | ε
T --> FT'
T' --> *FT' | ε
F --> id | (E)

*ε denotes epsilon
Step 1: The grammar satisfies all properties in step 1.
Step 2: Calculate first() and follow().
Find their First and Follow sets:

Production          FIRST          FOLLOW

E → TE'             { id, ( }      { $, ) }

E' → +TE' | ε       { +, ε }       { $, ) }

T → FT'             { id, ( }      { +, $, ) }

T' → *FT' | ε       { *, ε }       { +, $, ) }

F → id | (E)        { id, ( }      { *, +, $, ) }


Step 3: Make a parser table.

Now, the LL(1) Parsing Table is:

            id          +            *            (           )          $

E           E → TE'                               E → TE'

E'                      E' → +TE'                             E' → ε     E' → ε

T           T → FT'                               T → FT'

T'                      T' → ε       T' → *FT'                T' → ε     T' → ε

F           F → id                                F → (E)

As you can see, all the null (ε) productions are put under the FOLLOW set of that symbol and all the remaining
productions lie under the FIRST set of that symbol.

Note: Not every grammar is suitable for an LL(1) parsing table; it is possible for one cell to contain more than
one production.

Advantages of Construction of LL(1) Parsing Table:

1. Deterministic Parsing: For a given non-terminal and lookahead token there is exactly one applicable production, so the
parsing process is unambiguous and predictable.

2. Efficiency: Once the parsing table is built, each parsing action is determined by a direct table lookup in constant time.
This is especially beneficial for large programs and significantly reduces parsing time.

3. Predictive Parsing: The next parsing action is determined solely by the current non-terminal and the lookahead token,
with no backtracking or guessing. This makes the LL(1) parsing algorithm straightforward to implement and reason about,
and it also helps with error handling and recovery.

4. Error Detection: Constructing the table lets the parser detect problems early. Conflicts, i.e. multiple entries for the
same non-terminal and lookahead combination, indicate ambiguities or errors in the grammar definition and can be resolved
before parsing begins.

5. Non-Left Recursion: Building an LL(1) parsing table requires the elimination of left recursion. Although left recursion
is common in grammars, removing it results in a more structured, unambiguous grammar and encourages clearer, more
efficient parsing.

6. Readability and Maintainability: The parsing table represents the entire parsing algorithm in a tabular form, with clear
mappings between non-terminals, lookahead tokens, and parsing actions. This improves readability and makes changes to the
grammar easier to carry out over time.

7. Language Design: Constructing an LL(1) parsing table plays an important role in designing programming languages. LL(1)
grammars are often preferred for their simplicity and predictability, so ensuring that a grammar is LL(1) and building its
table helps language designers shape the syntax and define the expected behaviour of the language more effectively.

e) Differentiate between Top down and bottom-up parsing techniques.

Advantages of Top-Down Parsing


• Simplicity: A top-down parser is easier to implement and understand than a bottom-up parser, particularly for
simple languages.
• Predictive Parsing: An efficient variant of top-down parsing, LL parsing, can predict which rule to apply next
by looking at the lookahead symbol.
• No Need for Backtracking: A deterministic top-down parser, such as an LL parser, does not need backtracking,
which keeps parsing efficient.
Advantages of Bottom-Up Parsing
• Handles Complex Grammars: Bottom-up parsers can handle left-recursive grammars, which top-down parsers cannot.
• Efficient: LR parsers, the most common bottom-up parsers, are efficient and powerful for parsing context-free
grammars; they handle large and complex grammars with fewer restrictions than other parsing techniques.
• No Backtracking: LR parsers do not require backtracking, which makes them efficient in terms of performance.
f) Construct FIRST and FOLLOW for the Grammar:
E→E+T / T
T→T*F / F
F→ (E) / id.
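A worked computation for this grammar (E is the start symbol; $ marks the end of input):

FIRST(F) = { (, id }
FIRST(T) = FIRST(F) = { (, id }
FIRST(E) = FIRST(T) = { (, id }

FOLLOW(E) = { +, ), $ }       ($ because E is the start symbol, + from E → E + T, ) from F → (E))
FOLLOW(T) = { +, *, ), $ }    (everything in FOLLOW(E), plus * from T → T * F)
FOLLOW(F) = { +, *, ), $ }    (everything in FOLLOW(T))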
g) Define Ambiguous Grammar? Check whether the grammar:
S→aAB
A→bC/cd
C→cd
B→c/d
is Ambiguous or not?
Ambiguous grammar: A CFG is said to be ambiguous if there exists more than one derivation tree for a given input
string, i.e., more than one LeftMost Derivation Tree (LMDT) or RightMost Derivation Tree (RMDT). Definition:
G = (V, T, P, S) is said to be ambiguous if and only if there exists a string in T* that has more than one parse tree,
where V is a finite set of variables, T is a finite set of terminals, P is a finite set of productions of the form
A -> α with A a variable and α ∈ (V ∪ T)*, and S is a designated variable called the start symbol.
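A brief check for the given grammar (taking the productions exactly as written): the grammar generates only the strings a(bcd | cd)(c | d), i.e., abcdc, abcdd, acdc and acdd. S and C each have a single production, and the two alternatives of A (bC and cd) and of B (c and d) begin with different terminals, so for every generated string each non-terminal has exactly one applicable production. Every string therefore has exactly one parse tree, and the grammar is not ambiguous.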
h) Define Intermediate code generator. Explain in brief about different forms of Intermediate code
generation.
Intermediate Code Generation is a stage in the process of compiling a program, where the compiler translates the
source code into an intermediate representation. This representation is not machine code but is simpler than the
original high-level code. Here’s how it works:

• Translation: The compiler takes the high-level code (like C or Java) and converts it into an intermediate form,
which can be easier to analyze and manipulate.

• Portability: This intermediate code can often run on different types of machines without needing major
changes, making it more versatile.

• Optimization: Before turning it into machine code, the compiler can optimize this intermediate code to make
the final program run faster or use less memory.

If we generate machine code directly from source code, then for n target machines we need n optimizers and n code
generators; but if we have a machine-independent intermediate code, we need only one optimizer.
Intermediate code can be either language-specific (e.g., bytecode for Java) or language-independent (e.g., three-address
code). The following are commonly used intermediate code representations:

Postfix Notation

• Also known as reverse Polish notation or suffix notation.


• In the infix notation, the operator is placed between operands, e.g., a + b. Postfix notation positions the
operator at the right end, as in ab +.
• For any postfix expressions e1 and e2 with a binary operator (+) , applying the operator yields e1e2+.
• Postfix notation eliminates the need for parentheses, as the operator’s position and arity allow unambiguous
expression decoding.
• In postfix notation, the operator consistently follows the operand.
Example 1: The postfix representation of the expression (a + b) * c is : ab + c *
Example 2: The postfix representation of the expression (a – b) * (c + d) + (a – b) is : ab – cd + *ab -+

Three-Address Code
• A three address statement involves a maximum of three references, consisting of two for operands and one
for the result.
• A sequence of three address statements collectively forms a three address code.
• The typical form of a three address statement is expressed as x = y op z, where x, y, and z represent memory
addresses.
• Each variable (x, y, z) in a three address statement is associated with a specific memory location.
While a standard three address statement includes three references, there are instances where a statement may
contain fewer than three references, yet it is still categorized as a three address statement.
Example: The three-address code for the expression a + b * c + d is:
T1 = b * c
T2 = a + T1
T3 = T2 + d
where T1, T2, T3 are temporary variables.

Syntax Tree

• A syntax tree serves as a condensed representation of a parse tree.

• The operator and keyword nodes present in the parse tree are moved up to become part of their respective parent
nodes in the syntax tree: the internal nodes are operators and the child nodes are operands.

• Creating a syntax tree involves strategically placing parentheses within the expression. This technique
contributes to a more intuitive representation, making it easier to discern the sequence in which operands
should be processed.

The syntax tree not only condenses the parse tree but also offers an improved visual representation of the program's
syntactic structure.
Example: x = (a + b * c) / (a – b * c)
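In place of the original figure, a textual sketch of the syntax tree for this assignment (usual operator precedence assumed):

=
├── x
└── /
    ├── +
    │   ├── a
    │   └── *
    │       ├── b
    │       └── c
    └── -
        ├── a
        └── *
            ├── b
            └── c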

i) Explain in brief about Type checking and Type Conversion.


Type checking is the process of verifying that the types of all variables, expressions, and functions are according to
the rules of the programming language. It means operations operate on compatible data types, which helps in
preventing errors and hence ensures correctness of the program. Type checking might be performed at compile time,
called static type checking, or it could be performed during execution, and this is termed dynamic type checking.
Both techniques have different advantages concerning error detection and flexibility of programs.

Every programming language has a collection of type rules that dictate how integers, floats, and characters can
legally be used. A compiler keeps track of this type information and performs a set of computations to verify that a
program is correct with respect to the type rules. Type checking therefore offers a guarantee that programs are type-
safe and compatible with the language specifications.

Types of Type Checking

There are two kinds of type checking:


• Static Type Checking.
• Dynamic Type Checking.
Static Type Checking

Static type checking is defined as type checking performed at compile time. It checks the type variables at compile-
time, which means the type of the variable is known at the compile time. It generally examines the program text
during the translation of the program. Using the type rules of a system, a compiler can infer from the source text that
a function (fun) will be applied to an operand (a) of the right type each time the expression fun(a) is evaluated.

Dynamic Type Checking

Dynamic Type Checking is defined as the type checking being done at run time. In Dynamic Type Checking, types are
associated with values, not variables. Implementations of dynamically type-checked languages runtime objects are
generally associated with each other through a type tag, which is a reference to a type containing its type
information. Dynamic typing is more flexible. A static type system always restricts what can be conveniently
expressed. Dynamic typing results in more compact programs since it is more flexible and does not require types to
be spelled out. Programming with a static type system often requires more design and implementation effort.

Type Conversion
Type conversion is the process of converting one data type to another. This can happen automatically or explicitly:
1. Implicit Type Conversion (Type Coercion):
o Performed by the compiler or interpreter automatically.
o Converts data types without explicit instruction from the programmer.
2. Explicit Type Conversion (Type Casting):
• Performed by the programmer manually using specific functions or syntax.
• Ensures precise control over type conversion.
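A small C illustration of both kinds of conversion (the variable names are illustrative only):

int i = 7;
double d = i;              /* implicit conversion (coercion): the int value is widened to double */
double x = 7 / 2;          /* integer division happens first, so x becomes 3.0 */
double y = (double)7 / 2;  /* explicit cast forces floating-point division, so y becomes 3.5 */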

j) Differentiate between Static and Dynamic Storage allocation Strategies.


Part-3
Q3 a) What is intermediate code? Translate the expression (a+b)/(c+d)*(a+b/c)-d into quadruples, triples
and indirect triples.
b) Define Symbol table? Explain about the data structures used for Symbol table.
Intermediate Code Generation is a stage in the process of compiling a program, where the compiler translates the
source code into an intermediate representation. This representation is not machine code but is simpler than the
original high-level code. Here’s how it works:

• Translation: The compiler takes the high-level code (like C or Java) and converts it into an intermediate form,
which can be easier to analyze and manipulate.

• Portability: This intermediate code can often run on different types of machines without needing major
changes, making it more versatile.

• Optimization: Before turning it into machine code, the compiler can optimize this intermediate code to make
the final program run faster or use less memory.

Given That: (a+b)/(c+d)*(a+b/c)-d
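A worked sketch (the temporary names T1–T7 and the indirect-triple list starting at 35 are arbitrary choices; the usual operator precedence, with / and * evaluated left to right before -, is assumed):

Three-address code:
T1 = a + b
T2 = c + d
T3 = T1 / T2
T4 = b / c
T5 = a + T4
T6 = T3 * T5
T7 = T6 - d

Quadruples (op, arg1, arg2, result):
(0)  +   a    b    T1
(1)  +   c    d    T2
(2)  /   T1   T2   T3
(3)  /   b    c    T4
(4)  +   a    T4   T5
(5)  *   T3   T5   T6
(6)  -   T6   d    T7

Triples (results referred to by statement number, no result field):
(0)  +   a     b
(1)  +   c     d
(2)  /   (0)   (1)
(3)  /   b     c
(4)  +   a     (3)
(5)  *   (2)   (4)
(6)  -   (5)   d

Indirect triples (a separate list of pointers into the triple table):
35 → (0), 36 → (1), 37 → (2), 38 → (3), 39 → (4), 40 → (5), 41 → (6)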


b)
Symbol Table:

A Symbol Table is a data structure used by a compiler or interpreter to store information about variables, functions,
objects, and other entities in the source code during the compilation or interpretation process. It is crucial for
semantic analysis, code generation, and optimization.

The symbol table stores attributes of identifiers, such as:


• Name: The identifier's name (e.g., a variable name).
• Type: The data type of the identifier (e.g., integer, float).
• Scope: The region of the program in which the identifier is valid (e.g., global or local scope).
• Memory Location: The location in memory where the variable is stored (for runtime).
• Other Attributes: These can include function signatures, parameter types, initialization values, and more,
depending on the type of symbol (e.g., functions, classes).
The symbol table is critical to performing semantic analysis (e.g., type checking), and it also helps the compiler
manage memory for variables and functions.

Data Structures for Symbol Table:

Various data structures can be used to implement a symbol table, each with advantages and disadvantages
depending on the requirements of the compiler, such as fast lookups, insertions, and deletions. The most commonly
used data structures are:

1. Hash Table:

A hash table is one of the most efficient data structures for symbol tables. It allows for average constant time
complexity O(1) for insertions, lookups, and deletions.

• How It Works:
o Each identifier (symbol) is hashed to a unique index using a hash function. The hash function takes
the identifier name and maps it to a specific location in an array.
o The symbol table can then store the symbol’s associated information at this hashed index.
o In case of hash collisions (when two different identifiers map to the same index), collision resolution
techniques like chaining or open addressing are used.
• Advantages:
o Fast lookups and insertions.
o Efficient for large programs.
• Disadvantages:
o Hash collisions can degrade performance.
o Requires a good hash function to minimize collisions.
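A minimal C sketch of a chained hash-table symbol table; the table size, hash function and stored attributes are simplified assumptions:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUCKETS 64

struct Symbol {
    char name[32];
    char type[16];
    struct Symbol *next;                /* chaining resolves collisions */
};

static struct Symbol *table[BUCKETS];

static unsigned hash(const char *s) {   /* simple string hash */
    unsigned h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h % BUCKETS;
}

static void insert(const char *name, const char *type) {
    struct Symbol *sym = malloc(sizeof *sym);
    if (sym == NULL) exit(1);
    strncpy(sym->name, name, sizeof sym->name - 1);
    sym->name[sizeof sym->name - 1] = '\0';
    strncpy(sym->type, type, sizeof sym->type - 1);
    sym->type[sizeof sym->type - 1] = '\0';
    unsigned h = hash(name);
    sym->next = table[h];               /* insert at the head of the chain */
    table[h] = sym;
}

static struct Symbol *lookup(const char *name) {
    for (struct Symbol *s = table[hash(name)]; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0) return s;
    return NULL;
}

int main(void) {
    insert("a", "int");
    insert("pi", "float");
    struct Symbol *s = lookup("a");
    if (s) printf("%s : %s\n", s->name, s->type);   /* prints: a : int */
    return 0;
}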
2. Binary Search Tree (BST):

A Binary Search Tree (BST) can also be used to implement a symbol table. This structure maintains the symbol table
in a sorted order, making it easy to perform search, insert, and delete operations efficiently.

• How It Works:
o Each node in the BST represents a symbol, and the tree is ordered according to the symbol's name.
o To insert or search for a symbol, we traverse the tree based on lexicographical ordering of the symbol
names.
• Advantages:
o The tree remains ordered, which allows for easy traversal and listing of symbols in alphabetical order.
o Efficient, with an average time complexity of O(log n) for operations.
• Disadvantages:
o Performance can degrade to O(n) in the worst case if the tree becomes unbalanced (i.e.,
resembling a linked list).
o Requires extra space for pointers to child nodes.
3. AVL Tree (Self-Balancing BST):

An AVL Tree is a type of self-balancing binary search tree. It automatically keeps the tree balanced after every
insertion and deletion to ensure O(log n) time complexity for all operations.
• How It Works:
o Each node in the tree stores a balance factor, which indicates whether the tree is balanced or needs
rebalancing. After any insertion or deletion, the tree rebalances itself.
• Advantages:
o Provides guaranteed O(log n) time complexity for insertion, deletion, and lookup
operations.
• Disadvantages:
o More complex to implement and maintain than a simple BST.
o Requires extra space to store balance factors and perform rotations during rebalancing.
4. Linked List:

A linked list is a simple data structure where each node points to the next node. It can be used for symbol tables, but
it is typically slower than other data structures for searching and inserting symbols.

• How It Works:
o Each node in the list contains a symbol (name and associated information).
o To search for a symbol, the list must be traversed from the beginning, making it inefficient for large
symbol tables.
• Advantages:
o Easy to implement and understand.
o Simple for small symbol tables or educational purposes.
• Disadvantages:
o Searching for symbols takes O(n) time, which is inefficient for larger programs.
o Insertion and deletion are O(1), but only if the position is already known.

5. Trie:

A Trie (or prefix tree) is a tree-like data structure that stores strings (like the identifiers in a symbol table) where
nodes represent characters, and paths represent prefixes of the identifiers.

• How It Works:
o Each path from the root to a leaf node represents a symbol, with each character of the symbol being
stored in a separate node.
o The Trie allows for efficient prefix-based searches and retrieval of all symbols starting with a
particular prefix.
• Advantages:
o Allows efficient retrieval and insertion of strings.
o Supports autocomplete and prefix searches.
• Disadvantages:
o Takes up more space than other data structures, as each node can require additional memory for
storing pointers and character data.
o Can be more complex to implement.
6. Array or List:

In simple cases, an array or list may be used to implement the symbol table, especially for small grammars where
symbols do not need to be accessed frequently.

• How It Works:
o Each element of the array represents a symbol, and the index or position in the array could
correspond to the symbol’s position or ordering.
• Advantages:
o Simple and easy to implement.
• Disadvantages:
o Search operations are linear (O(n)), which makes it inefficient for large symbol tables.
o Insertion and deletion may require shifting elements, making them O(n) operations.

Q4 a) What is an activation record? What is its content? When is it created? Explain with an example.
b) What do you mean by code optimization? Explain machine dependent and independent optimization
with suitable examples.

Activation Record:

An activation record is a contiguous block of storage that manages the information required by a single execution of a
procedure. When a procedure is entered, an activation record is allocated, and when the procedure exits, it is
de-allocated. Essentially, it stores the status of the current activation. Whenever a function call occurs, a new
activation record is created and pushed onto the top of the stack, and it remains on the stack for as long as that
function executes. Once the procedure completes and control returns to the calling function, its activation record is
popped off the stack.

Contents of an Activation Record

The typical fields in an activation record include:


1. Return Address:
o The address in the calling function where execution resumes after the current function completes.
2. Control Link:
o A pointer to the activation record of the calling function, facilitating proper return to the caller.
3. Access Link:
o A pointer to access non-local variables in nested or enclosing functions (used in languages supporting
nested functions).
4. Parameters:
o Stores the actual parameters (arguments) passed to the function.
5. Local Variables:
o Stores variables declared within the function.
6. Temporary Variables:
o Holds intermediate results computed within the function.
7. Saved Register Values:
o Saves the values of registers that need to be restored after the function call.

When is it Created?

An activation record is created when a function is called and is pushed onto the call stack. Once the function
completes execution, the activation record is popped off the stack.

Consider the following C program:
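The original listing is not reproduced here; a minimal C program consistent with the execution steps below (the values 5 and 10 are taken from the trace) would be:

#include <stdio.h>

int sum(int a, int b) {
    int total = a + b;        /* local variable in sum's activation record */
    return total;
}

int main(void) {
    int x = 5;                /* locals x and result live in main's activation record */
    int result = sum(x, 10);  /* the call pushes a new activation record for sum */
    printf("%d\n", result);   /* prints 15 */
    return 0;
}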

Execution Steps:

1. Call to main():
o An activation record for main is created.
o Local variables x and result are stored in main's activation record.
2. Call to sum(x, 10):
o An activation record for sum is created.
o Parameters a (set to x, i.e., 5) and b (set to 10) are stored.
o Local variable total is allocated.
3. Completion of sum():
o total is computed as a + b (i.e., 5 + 10 = 15).
o The activation record of sum is popped off the stack, and control returns to main.
4. Completion of main():
o The activation record of main is popped off, and the program ends.
b)
Code Optimization

Code optimization is the process of improving the efficiency of a program by making it run faster, use less memory,
or consume fewer resources without altering its functionality. The goal of code optimization is to improve the
performance of the compiled code in terms of execution speed, memory usage, and power consumption, among
other factors.

Code optimization is generally performed during the compilation process by optimizing intermediate code, machine
code, or assembly code. The optimization techniques can be broadly classified into two types:

1. Machine-Independent Optimization

2. Machine-Dependent Optimization

These optimizations help in making the program more efficient in a general sense (machine-independent) or tailored
to a specific type of machine architecture (machine-dependent).

Machine-Independent Optimization

Machine-independent optimizations are those that apply to any target machine architecture, i.e., they are not
dependent on the specific hardware the program will run on. These optimizations typically focus on improving the
high-level structure of the program, making it more efficient without considering the underlying machine details.

Examples of Machine-Independent Optimization

1. Constant Folding:
o This optimization involves evaluating constant expressions at compile time rather than at runtime.
o Example:
int x = 3 + 5; // Original code
After constant folding:
int x = 8; // Optimized code
o The result of 3 + 5 is calculated at compile time, reducing unnecessary runtime computation.
2. Constant Propagation:
o In constant propagation, known constant values are substituted throughout the code where
appropriate.
o Example:
int a = 5;
int b = a + 3; // 'a' is constant
After constant propagation:
int a = 5;
int b = 5 + 3; // 'a' is replaced with its value
3. Dead Code Elimination:
o This removes code that never gets executed or has no impact on the program’s outcome. If a variable
is assigned a value but never used, the assignment can be removed.
o Example:

int a = 5;
int b = a + 3; // Dead code if b is never used
After dead code elimination:
int a = 5; // Redundant code removed
4. Loop Invariant Code Motion:
o This optimization moves computations that do not change within a loop outside the loop, reducing
repeated evaluations.
o Example:
for (int i = 0; i < n; i++) {
a = b + c; // 'b + c' is invariant, it doesn't depend on i
// loop body
}
After optimization:

a = b + c; // Moved out of loop


for (int i = 0; i < n; i++) {
// loop body
}
5. Inlining of Functions:
o Function inlining replaces a function call with the actual body of the function, which reduces the
overhead of calling a function and can lead to further optimizations (like constant folding).
o Example:
int add(int x, int y) { return x + y; }
int result = add(3, 4); // Function call
After inlining:
int result = 3 + 4; // Function call replaced with code

Machine-Dependent Optimization

Machine-dependent optimization is tailored to the characteristics of the underlying hardware, like the processor
architecture, memory hierarchy, and instruction set. These optimizations are designed to take advantage of the
machine-specific features for better performance on a particular platform.

Examples of Machine-Dependent Optimization

1. Register Allocation:

o This optimization focuses on assigning frequently used variables to processor registers instead of
memory, as accessing registers is faster than accessing memory.

o Example: If the program has variables x and y, and the architecture has only 4 registers, the compiler
can allocate the most frequently used variables to registers to minimize memory accesses.

▪ Before optimization (using memory):

LOAD R1, x // Load variable x into register R1

ADD R1, R1, y // Add y to R1

▪ After optimization (using registers):

ADD R1, R2, R3 // Directly use registers R1, R2, R3 for computation
2. Peephole Optimization:

o This is a local optimization that looks at small sets of instructions and replaces them with more
efficient instruction sequences that accomplish the same task. It often targets the use of specific
machine instructions that are faster or more compact.

o Example: a store that is immediately followed by a reload of the same location can be simplified:

STORE a, R1   // store register R1 into variable a

LOAD R1, a    // redundant: R1 already holds the value of a

After peephole optimization:

STORE a, R1   // the redundant LOAD has been removed

3. Instruction Scheduling:

o This optimization involves rearranging the order of machine instructions to avoid pipeline stalls and
make better use of the CPU's instruction pipeline. It is especially important on modern processors
where multiple instructions can be executed concurrently.

o Example: If one instruction is dependent on the result of another, scheduling the independent
instructions first can improve performance.

▪ Before optimization:

LOAD R1, A

ADD R2, R1, B

▪ After optimization (scheduling independent instructions first):

LOAD R1, A

// Other independent instructions

ADD R2, R1, B

4. Loop Unrolling:

o Loop unrolling is the process of expanding a loop by performing multiple operations in each iteration,
thereby reducing the loop overhead (such as the cost of evaluating the loop condition).

o Example:

for (int i = 0; i < 4; i++) {

    a[i] = b[i] + 2;
}
After unrolling:

a[0] = b[0] + 2;

a[1] = b[1] + 2;

a[2] = b[2] + 2;

a[3] = b[3] + 2;
5. Cache Optimization:

o This optimization ensures that data is accessed in a way that minimizes cache misses. This is done by
ensuring that frequently accessed data is kept in the CPU cache, which is faster than main memory.

o Example: Accessing an array in a sequential manner can improve cache locality, as opposed to
random access:

▪ Before optimization (poor cache locality: the row-major array is traversed column by column):

for (int j = 0; j < m; j++) {
    for (int i = 0; i < n; i++) {
        // Accessing matrix[i][j]
    }
}
▪ After optimization (better cache locality: the array is traversed row by row):
for (int i = 0; i < n; i++) {
    for (int j = 0; j < m; j++) {
        // Accessing matrix[i][j]
    }
}

Q5 a) For the following grammar construct SLR parser and parse (a,a,^)
S→ a| ^|(R)
T →S,T|S
R →T

b) Show that the following grammar is CLR (1) but not SLR (1).

S → A a A b | B b B a
A → ε
B → ε
Q6 a) Consider the following grammar:
A → A & B/ B
B → B @ C/ C
C → C # D/ D
D → id
What can you say about the precedence and associativity of operator &, @ and #?
b) Show that following grammar is SLR(1) but not LL(1).

S →S A | A
A →a

a)
To determine the precedence and associativity of the operators &, @, and # in the given grammar:
Grammar
1. A→A&B ∣ B
2. B→B@C ∣ C
3. C→C#D ∣ D
4. D→id

Observations and Precedence


The structure of the grammar determines the precedence of the operators:
• A→A&B shows that & is applied between two sub-expressions of type A and B.
• B→B@C shows that @ is applied between two sub-expressions of type B and C.
• C→C#D shows that # is applied between two sub-expressions of type C and D.
• D is the terminal symbol id.
From the grammar:
1. Highest precedence: #
o C derives C#D, meaning # binds the tightest and applies to the smallest units first.
2. Middle precedence: @
o B derives B@C, meaning @ binds less tightly than # but more tightly than &.
3. Lowest precedence: &
o A derives A&B, meaning & binds the loosest and applies last in expressions.

Associativity
The associativity is determined by the recursive structure of the grammar:
1. Left-recursion (e.g., A→A&B, B→B@C, C→C#D) indicates left-associativity for all operators &, @, #.
o For example, A&B&C is parsed as (A&B)&C.

Summary
• Precedence (from highest to lowest): # > @ > &.
• Associativity: All operators (&, @, #) are left-associative.

b)
Checking for LL(1):
The given grammar S → S A | A, A → a is left-recursive (S → S A), and no left-recursive grammar can be LL(1). The
same conclusion follows directly from the LL(1) parsing table.
FIRST and FOLLOW Sets:
• FIRST(A): {a}
• FIRST(S): {a}, and FIRST(S A) = FIRST(A) = {a}
• FOLLOW(S): {a, $}
• FOLLOW(A): {a, $}
LL(1) Parsing Table:
• S → S A: placed under a, since FIRST(S A) = {a}.
• S → A: also placed under a, since FIRST(A) = {a}.
The table cell M[S, a] therefore contains both productions S → S A and S → A, so the parser cannot decide which
production to apply with a single symbol of lookahead.
This demonstrates that the grammar is not LL(1).
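For the SLR(1) side, here is a sketch (augmented grammar S' → S assumed) of the LR(0) item sets and the resulting SLR table entries:

I0: S' → .S, S → .SA, S → .A, A → .a
I1 = goto(I0, S): S' → S., S → S.A, A → .a
I2 = goto(I0, A): S → A.
I3 = goto(I0, a): A → a.
I4 = goto(I1, A): S → SA.     (goto(I1, a) is again I3)

With FOLLOW(S) = {a, $} and FOLLOW(A) = {a, $}:
• In I1 the parser shifts on a and accepts (reduces by S' → S) only on $, so there is no shift/reduce conflict.
• I2, I3 and I4 each contain a single reduce item and no shift item, so there are no conflicts there either.
Since no state has a shift/reduce or reduce/reduce conflict, the grammar is SLR(1), while the multiply defined entry M[S, a] above shows it is not LL(1).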
