CD 22-23 Answers
1. Start by understanding the special characters used in regex, such as “.”, “*”, “+”, “?”, and more.
2. Choose a programming language or tool that supports regex, such as Python, Perl, or grep.
3. Write your pattern using the special characters and literal characters.
4. Use the appropriate function or method to search for the pattern in a string.
Examples:
1. To match a sequence of literal characters, simply write those characters in the pattern.
2. To match a single character from a set of possibilities, use square brackets, e.g. [0123456789] matches any
digit.
3. To match zero or more occurrences of the preceding expression, use the star (*) symbol.
4. To match one or more occurrences of the preceding expression, use the plus (+) symbol.
5. It is important to note that regex can be complex and difficult to read, so it is recommended to use tools like
regex testers to debug and optimize your patterns.
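As a minimal, illustrative sketch of the steps above (the text mentions Python, Perl, and grep; this sketch uses C's POSIX <regex.h> API to match the other examples in this document, and the pattern and test string are arbitrary choices):
#include <regex.h>
#include <stdio.h>

int main(void) {
    regex_t re;
    const char *pattern = "[0-9]+";        /* one or more digits, as in the examples above */
    const char *text = "order 42 shipped"; /* string to search */

    /* Compile the pattern (REG_EXTENDED enables the usual +, ?, | operators). */
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) {
        fprintf(stderr, "bad pattern\n");
        return 1;
    }

    /* Search for the pattern in the string. */
    if (regexec(&re, text, 0, NULL, 0) == 0)
        printf("match found\n");
    else
        printf("no match\n");

    regfree(&re);
    return 0;
}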
1. Limited Lookahead:
o Recursive descent parsers can only handle LL(1) grammars, where parsing decisions are made with one lookahead symbol. This limits their ability to parse more complex grammars that require greater lookahead or cannot be parsed using a single left-to-right, leftmost derivation.
2. Left Recursion:
o One of the major limitations is their inability to handle left-recursive grammars. If a grammar rule has left recursion (e.g., A → Aα | β), it can lead to infinite recursion during parsing. Such grammars need to be transformed into equivalent non-left-recursive grammars before they can be parsed by a recursive descent parser (the standard transformation is shown after this list).
3. Ambiguity:
o Recursive descent parsers are not well-suited for handling ambiguous grammars, where multiple
parse trees are possible for the same input. The parser cannot decide which path to take without
additional disambiguation strategies.
4. Backtracking:
o If implemented without backtracking, recursive descent parsers may not be able to handle grammars
that require trying different paths when parsing. While backtracking can be added, it makes the
parser inefficient and can lead to exponential parsing times in the worst case.
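As referenced in the left-recursion point above, the standard transformation removes left recursion by rewriting the rule into a right-recursive form (α and β are arbitrary strings of grammar symbols, as in the example above):
A → Aα | β becomes A → βA′, A′ → αA′ | ε
The rewritten grammar generates the same language but can be parsed by a recursive descent parser without infinite recursion.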
1. Compile-time evaluation
2. Common Sub-expression elimination
3. Dead code elimination
4. Code movement
5. Strength reduction
Common Sub-expression Elimination:
A common sub-expression is an expression (or sub-expression) that has already been computed earlier and appears again later during the computation of the code. Eliminating the repeated occurrence by reusing the previously computed value is known as common sub-expression elimination.
The advantage of this elimination method is that it makes the computation faster by avoiding re-computation of the expression. In addition, it utilizes memory more efficiently.
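A small illustration in C, written as statements inside a function (the variable names are arbitrary):
int b = 2, c = 3, d = 4, e = 5;  // example values
int x, y, t;
// Before common sub-expression elimination: (b + c) is computed twice.
x = (b + c) * d;
y = (b + c) - e;
// After elimination: the shared value is computed once into the temporary t.
t = b + c;
x = t * d;
y = t - e;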
In compiler design, a parse tree is generated by the parser, which is a component of the compiler that processes the
source code and checks it for syntactic correctness. The parse tree is then used by other components of the compiler,
such as the code generator, to generate machine code or intermediate code that can be executed by the target
machine.
Parse trees can be represented in different ways, such as a tree structure with nodes representing the different
elements in the source code and edges representing the relationships between them, or as a graph with nodes and
edges representing the same information. Parse trees are typically used as an intermediate representation in the
compilation process, and are not usually intended to be read by humans.
Syntax Tree:
A syntax tree is a tree-like representation of the syntactic structure of a piece of source code. It is typically used in the
process of compiler design, to represent the structure of the code in a way that is easier to analyze and manipulate.
Syntax trees are constructed by parsing the source code, which involves analyzing the code and breaking it down into
its individual components, such as tokens, variables, and statements. The resulting tree is made up of nodes that
correspond to these various components, with the structure of the tree reflecting the grammatical structure of the
source code.
Syntax trees are useful for a variety of tasks in compiler design, such as type checking, optimization, and code
generation. They can also be used to represent the structure of other types of linguistic or logical structures, such as
natural language sentences or logical expressions.
Example:
Here is the syntax tree for the expression 3*4+5:
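The original figure is not reproduced here; a textual sketch of the same tree, with * evaluated before + according to standard precedence:
+
├── *
│   ├── 3
│   └── 4
└── 5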
PART-2
2) a) Describe how various phases could be combined as a pass in a compiler?
In a compiler, the different phases of compilation can be grouped and executed as a single pass or multiple passes,
depending on the compiler's design and optimization requirements. A pass refers to a single traversal of the source
code or an intermediate representation, during which specific tasks (from one or more phases) are performed.
Phases of a Compiler
The typical phases of a compiler include:
1. Lexical Analysis
2. Syntax Analysis
3. Semantic Analysis
4. Intermediate Code Generation
5. Code Optimization
6. Code Generation
Combining Phases into Passes
1. Single-Pass Compiler
• Combines all phases into a single pass.
• Ideal for simple languages or when performance (compilation speed) is a priority.
• Typically used for early compilers or just-in-time (JIT) compilation.
How it works:
• The compiler processes the source code linearly.
• As the lexical analyzer generates tokens, the syntax analyzer parses them, semantic checks are performed,
and intermediate or final code is generated in one go.
Advantages:
• Faster compilation as there’s only one traversal.
• Suitable for small or simple languages.
Limitations:
• Difficult to optimize the generated code.
• Less modular, harder to extend.
2. Two-Pass Compiler
• Breaks the compilation process into two major passes:
o First Pass: Lexical, syntax, and semantic analysis.
o Second Pass: Intermediate code optimization and final code generation.
How it works:
• The first pass processes the source code up to an intermediate representation.
• The second pass optimizes and translates the intermediate representation into machine code.
Advantages:
• Allows some level of optimization.
• Easier to debug and maintain compared to a single-pass compiler.
Example:
• Early FORTRAN compilers often used a two-pass design.
3. Multi-Pass Compiler
• Divides the compilation into several passes, each focusing on specific tasks or a group of tasks.
• Intermediate representations are used between passes to maintain the state.
Typical Pass Breakdown:
• Pass 1: Lexical Analysis, Syntax Analysis, Semantic Analysis.
• Pass 2: Intermediate Code Generation.
• Pass 3: Optimization (e.g., dead code elimination, loop unrolling).
• Pass 4: Code Generation.
Advantages:
• Highly modular and extensible.
• Allows sophisticated optimizations.
• Easier to debug and modify specific passes.
Limitations:
• Slower compilation due to multiple traversals.
• Higher memory usage for intermediate representations.
Example:
• Modern optimizing compilers like GCC and LLVM use multiple passes.
*ε denotes epsilon
Step 1: The grammar satisfies the preliminary requirements (no ambiguity, no left recursion, and it is left-factored).
Step 2: Calculate first() and follow().
Find their First and Follow sets:
Production               First        Follow
E’ –> +TE’ | ε           { +, ε }     { $, ) }
T’ –> *FT’ | ε           { *, ε }     { +, $, ) }
[The LL(1) parsing table itself, with terminal columns id, +, *, (, ), and $, is not reproduced here.]
As the table shows, each ε-production is placed in the cells given by the Follow set of its non-terminal, while every other production is placed in the cells given by the First set of its right-hand side.
Note: Not every grammar yields a valid LL(1) parsing table; a single cell may end up containing more than one production, which indicates a conflict.
1. Deterministic Parsing: An LL(1) parsing table gives a deterministic parsing process: for a given input program and grammar, the next parsing action is uniquely determined by the current non-terminal and the lookahead token. This determinism simplifies the parsing algorithm and makes the parsing process unambiguous and predictable.
2. Efficiency: LL(1) parsing tables allow efficient parsing of programming languages. Once the table is built, the parser determines the next action by directly indexing the table, which is a constant-time lookup. This efficiency is especially beneficial for large programs and can significantly reduce the time required for parsing.
3. Predictive Parsing: LL(1) parsing tables enable predictive parsing, where the parsing action is determined solely by the current non-terminal and the lookahead token, with no backtracking or guessing. This predictive nature makes the LL(1) parsing algorithm straightforward to implement and reason about, and it also supports better error handling and recovery during parsing.
4. Error Detection: Constructing an LL(1) parsing table enables the parser to detect errors efficiently. By examining the entries of the table, the parser can identify conflicts, such as multiple entries for the same non-terminal and lookahead combination. Such conflicts indicate ambiguities or errors in the grammar definition, allowing problems to be detected and resolved early.
5. Non-Left Recursion: Building an LL(1) parsing table requires the elimination of left recursion from the grammar. Although left recursion is a common feature of grammars, removing it results in a more structured and unambiguous grammar. The construction therefore encourages the use of non-left-recursive productions, which lead to clearer and more efficient parsing.
6. Readability and Maintainability: LL(1) parsing tables are generally easy to understand and maintain. The table represents the entire parsing algorithm in a tabular format, with clear mappings between non-terminals, lookahead tokens, and parsing actions. This tabular representation improves the readability of the parser and simplifies changes to the grammar, making it more maintainable over time.
7. Language Design: Constructing an LL(1) parsing table plays an important role in the design and development of programming languages. LL(1) grammars are often preferred for their simplicity and predictability. By ensuring that a grammar is LL(1) and building the corresponding parsing table, language designers can shape the syntax and define the expected behavior of the language more effectively.
• Translation: The compiler takes the high-level code (like C or Java) and converts it into an intermediate form,
which can be easier to analyze and manipulate.
• Portability: This intermediate code can often run on different types of machines without needing major
changes, making it more versatile.
• Optimization: Before turning it into machine code, the compiler can optimize this intermediate code to make
the final program run faster or use less memory.
If we generated machine code directly from source code, then for n target machines we would need n optimizers and n code generators; with a machine-independent intermediate code, only one optimizer is needed.
Intermediate code can be either language-specific (e.g., bytecode for Java) or language-independent (e.g., three-address code). The following are commonly used intermediate code representations:
Postfix Notation: the operator follows its operands; for example, a + b * c is written as a b c * +.
Three-Address Code
• A three address statement involves a maximum of three references, consisting of two for operands and one
for the result.
• A sequence of three address statements collectively forms a three address code.
• The typical form of a three address statement is expressed as x = y op z, where x, y, and z represent memory
addresses.
• Each variable (x, y, z) in a three address statement is associated with a specific memory location.
While a standard three address statement includes three references, there are instances where a statement may
contain fewer than three references, yet it is still categorized as a three address statement.
Example: The three address code for the expression a + b * c + d is:
T1 = b * c
T2 = a + T1
T3 = T2 + d
where T1, T2, T3 are temporary variables.
Syntax Tree
• The operator and keyword nodes of the parse tree are moved into their respective parent nodes in the syntax tree: the internal nodes are operators and the leaf nodes are operands.
• Creating a syntax tree involves strategically placing parentheses within the expression. This technique
contributes to a more intuitive representation, making it easier to discern the sequence in which operands
should be processed.
The syntax tree not only condenses the parse tree but also offers an improved visual representation of the program’s syntactic structure.
Example: x = (a + b * c) / (a – b * c)
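The original figure is not reproduced here; an indented textual sketch of the corresponding syntax tree (note that b * c appears twice, since a syntax tree, unlike a DAG, does not share common sub-expressions):
=
├── x
└── /
    ├── +
    │   ├── a
    │   └── *
    │       ├── b
    │       └── c
    └── -
        ├── a
        └── *
            ├── b
            └── c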
Every programming language has a collection of type rules that dictate how integers, floats, and characters can
legally be used. A compiler keeps track of this type information and performs a set of computations to verify that a
program is correct with respect to the type rules. Type checking therefore offers a guarantee that programs are type-
safe and compatible with the language specifications.
Static type checking is defined as type checking performed at compile time. It checks the type variables at compile-
time, which means the type of the variable is known at the compile time. It generally examines the program text
during the translation of the program. Using the type rules of a system, a compiler can infer from the source text that
a function (fun) will be applied to an operand (a) of the right type each time the expression fun(a) is evaluated.
Dynamic type checking is type checking performed at run time. In dynamic type checking, types are associated with values, not variables. In implementations of dynamically type-checked languages, runtime objects generally carry a type tag, which is a reference to a type containing that object's type information. Dynamic typing is more flexible: a static type system always restricts what can be conveniently expressed, and dynamic typing results in more compact programs since types do not have to be spelled out. Programming with a static type system often requires more design and implementation effort.
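A small C sketch of static type checking (the function square and the struct Point are illustrative, not part of the original answer): the compiler verifies at compile time that every call supplies an argument of the declared parameter type.
#include <stdio.h>

struct Point { int x, y; };

int square(int n) { return n * n; }    // parameter declared with type int

int main(void) {
    int v = 5;
    struct Point p = {1, 2};
    printf("%d\n", square(v));         // OK: the argument type matches the declared parameter type
    /* square(p); */                   // if uncommented, rejected at compile time: a struct Point
                                       // cannot be passed where an int is expected
    (void)p;                           // keeps this sketch free of unused-variable warnings
    return 0;
}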
Type Conversion
Type conversion is the process of converting one data type to another. This can happen automatically or explicitly:
1. Implicit Type Conversion (Type Coercion):
o Performed by the compiler or interpreter automatically.
o Converts data types without explicit instruction from the programmer.
2. Explicit Type Conversion (Type Casting):
o Performed by the programmer manually using specific functions or syntax.
o Ensures precise control over type conversion.
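A brief C illustration of both kinds of conversion, written as statements inside a function (the values are arbitrary):
int i = 7;
double d = i;               // implicit conversion (coercion): the int is widened to double automatically
double ratio = i / 2.0;     // the int operand is implicitly converted to double before the division
int truncated = (int) 3.9;  // explicit conversion (cast): the programmer forces double to int, giving 3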
A Symbol Table is a data structure used by a compiler or interpreter to store information about variables, functions,
objects, and other entities in the source code during the compilation or interpretation process. It is crucial for
semantic analysis, code generation, and optimization.
Various data structures can be used to implement a symbol table, each with advantages and disadvantages
depending on the requirements of the compiler, such as fast lookups, insertions, and deletions. The most commonly
used data structures are:
1. Hash Table:
A hash table is one of the most efficient data structures for symbol tables. It allows average constant time complexity, O(1), for insertions, lookups, and deletions.
• How It Works:
o Each identifier (symbol) is hashed to a unique index using a hash function. The hash function takes
the identifier name and maps it to a specific location in an array.
o The symbol table can then store the symbol’s associated information at this hashed index.
o In case of hash collisions (when two different identifiers map to the same index), collision resolution
techniques like chaining or open addressing are used.
• Advantages:
o Fast lookups and insertions.
o Efficient for large programs.
• Disadvantages:
o Hash collisions can degrade performance.
o Requires a good hash function to minimize collisions.
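A minimal C sketch of a chained hash table for a symbol table (the struct layout, hash function, and table size are illustrative assumptions rather than a prescribed implementation):
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 211                   /* a small prime; chosen arbitrarily for illustration */

struct symbol {
    char *name;                          /* identifier name */
    char *type;                          /* associated information, e.g. "int" */
    struct symbol *next;                 /* chaining resolves collisions */
};

static struct symbol *table[TABLE_SIZE];

/* Simple string hash: maps an identifier name to a bucket index. */
static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* Insert a symbol at the head of its bucket's chain. */
void insert(const char *name, const char *type) {
    unsigned i = hash(name);
    struct symbol *sym = malloc(sizeof *sym);
    sym->name = strdup(name);
    sym->type = strdup(type);
    sym->next = table[i];
    table[i] = sym;
}

/* Look up a symbol by name; returns NULL if it is not present. */
struct symbol *lookup(const char *name) {
    for (struct symbol *s = table[hash(name)]; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}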
2. Binary Search Tree (BST):
A Binary Search Tree (BST) can also be used to implement a symbol table. This structure maintains the symbol table
in a sorted order, making it easy to perform search, insert, and delete operations efficiently.
• How It Works:
o Each node in the BST represents a symbol, and the tree is ordered according to the symbol's name.
o To insert or search for a symbol, we traverse the tree based on lexicographical ordering of the symbol
names.
• Advantages:
o The tree remains ordered, which allows for easy traversal and listing of symbols in alphabetical order.
o Efficient, with an average time complexity of O(log n) for operations.
• Disadvantages:
o Performance can degrade to O(n) in the worst case if the tree becomes unbalanced (i.e., resembling a linked list).
o Requires extra space for pointers to child nodes.
3. AVL Tree (Self-Balancing BST):
An AVL Tree is a type of self-balancing binary search tree. It automatically keeps the tree balanced after every insertion and deletion to ensure O(log n) time complexity for all operations.
• How It Works:
o Each node in the tree stores a balance factor, which indicates whether the tree is balanced or needs
rebalancing. After any insertion or deletion, the tree rebalances itself.
• Advantages:
o Provides guaranteed O(log n) time complexity for insertion, deletion, and lookup operations.
• Disadvantages:
o More complex to implement and maintain than a simple BST.
o Requires extra space to store balance factors and perform rotations during rebalancing.
4. Linked List:
A linked list is a simple data structure where each node points to the next node. It can be used for symbol tables, but
it is typically slower than other data structures for searching and inserting symbols.
• How It Works:
o Each node in the list contains a symbol (name and associated information).
o To search for a symbol, the list must be traversed from the beginning, making it inefficient for large
symbol tables.
• Advantages:
o Easy to implement and understand.
o Simple for small symbol tables or educational purposes.
• Disadvantages:
o Searching for symbols takes O(n) time, which is inefficient for larger programs.
o Insertion and deletion are O(1), but only if the position is already known.
5. Trie:
A Trie (or prefix tree) is a tree-like data structure that stores strings (like the identifiers in a symbol table) where
nodes represent characters, and paths represent prefixes of the identifiers.
• How It Works:
o Each path from the root to a leaf node represents a symbol, with each character of the symbol being
stored in a separate node.
o The Trie allows for efficient prefix-based searches and retrieval of all symbols starting with a
particular prefix.
• Advantages:
o Allows efficient retrieval and insertion of strings.
o Supports autocomplete and prefix searches.
• Disadvantages:
o Takes up more space than other data structures, as each node can require additional memory for
storing pointers and character data.
o Can be more complex to implement.
6. Array or List:
In simple cases, an array or list may be used to implement the symbol table, especially for small programs where symbols do not need to be accessed frequently.
• How It Works:
o Each element of the array represents a symbol, and the index or position in the array could
correspond to the symbol’s position or ordering.
• Advantages:
o Simple and easy to implement.
• Disadvantages:
o Search operations are linear, O(n), which makes this inefficient for large symbol tables.
o Insertion and deletion may require shifting elements, making them O(n) operations.
Q4 a) What is an activation record? What is its content? When is it created? Explain with an example.
b) What do you mean by code optimization? Explain machine dependent and independent optimization
with suitable examples.
Activation Record:
An activation record is a contiguous block of storage that holds the information required by a single execution (activation) of a procedure. Typical contents include the return address, actual parameters, saved machine status (registers), local variables, temporaries, and the control and access links. When a procedure is entered, an activation record is allocated, and when that procedure exits, it is de-allocated. In effect, it stores the state of the current activation. Whenever a function call occurs, a new activation record is created and pushed onto the top of the stack, where it remains for the duration of that function's execution. Once the procedure completes and control returns to the calling function, the activation record is popped off the stack.
When is it Created?
An activation record is created when a function is called and is pushed onto the call stack. Once the function
completes execution, the activation record is popped off the stack.
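The example code itself is not shown in the original; the following minimal C sketch is consistent with the execution steps listed below (the names x, result, sum, a, b, and total follow that description):
#include <stdio.h>

int sum(int a, int b) {       /* a new activation record for sum is pushed on each call */
    int total = a + b;        /* local variable stored in sum's activation record */
    return total;             /* sum's activation record is popped on return */
}

int main(void) {
    int x = 5;                /* locals x and result live in main's activation record */
    int result = sum(x, 10);  /* the call creates sum's activation record with a = 5, b = 10 */
    printf("%d\n", result);   /* prints 15 */
    return 0;                 /* main's activation record is popped; the program ends */
}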
Execution Steps:
1. Call to main():
o An activation record for main is created.
o Local variables x and result are stored in main's activation record.
2. Call to sum(x, 10):
o An activation record for sum is created.
o Parameters a (set to x, i.e., 5) and b (set to 10) are stored.
o Local variable total is allocated.
3. Completion of sum():
o total is computed as a + b (i.e., 5 + 10 = 15).
o The activation record of sum is popped off the stack, and control returns to main.
4. Completion of main():
o The activation record of main is popped off, and the program ends.
b)
Code Optimization
Code optimization is the process of improving the efficiency of a program by making it run faster, use less memory,
or consume fewer resources without altering its functionality. The goal of code optimization is to improve the
performance of the compiled code in terms of execution speed, memory usage, and power consumption, among
other factors.
Code optimization is generally performed during the compilation process by optimizing intermediate code, machine
code, or assembly code. The optimization techniques can be broadly classified into two types:
1. Machine-Independent Optimization
2. Machine-Dependent Optimization
These optimizations help in making the program more efficient in a general sense (machine-independent) or tailored
to a specific type of machine architecture (machine-dependent).
Machine-Independent Optimization
Machine-independent optimizations are those that apply to any target machine architecture, i.e., they are not
dependent on the specific hardware the program will run on. These optimizations typically focus on improving the
high-level structure of the program, making it more efficient without considering the underlying machine details.
1. Constant Folding:
o This optimization involves evaluating constant expressions at compile time rather than at runtime.
o Example:
int x = 3 + 5; // Original code
After constant folding:
int x = 8; // Optimized code
o The result of 3 + 5 is calculated at compile time, reducing unnecessary runtime computation.
2. Constant Propagation:
o In constant propagation, known constant values are substituted throughout the code where
appropriate.
o Example:
int a = 5;
int b = a + 3; // 'a' is constant
After constant propagation:
int a = 5;
int b = 5 + 3; // 'a' is replaced with its value
3. Dead Code Elimination:
o This removes code that never gets executed or has no impact on the program’s outcome. If a variable
is assigned a value but never used, the assignment can be removed.
o Example:
int a = 5;
int b = a + 3; // Dead code if b is never used
After dead code elimination:
int a = 5; // Redundant code removed
4. Loop Invariant Code Motion:
o This optimization moves computations that do not change within a loop outside the loop, reducing
repeated evaluations.
o Example:
for (int i = 0; i < n; i++) {
a = b + c; // 'b + c' is invariant, it doesn't depend on i
// loop body
}
After optimization:
a = b + c; // computed once, outside the loop
for (int i = 0; i < n; i++) {
// loop body
}
Machine-Dependent Optimization
Machine-dependent optimization is tailored to the characteristics of the underlying hardware, like the processor
architecture, memory hierarchy, and instruction set. These optimizations are designed to take advantage of the
machine-specific features for better performance on a particular platform.
1. Register Allocation:
o This optimization focuses on assigning frequently used variables to processor registers instead of
memory, as accessing registers is faster than accessing memory.
o Example: If the program has variables x and y, and the architecture has only 4 registers, the compiler
can allocate the most frequently used variables to registers to minimize memory accesses.
ADD R1, R2, R3 // Directly use registers R1, R2, R3 for computation
2. Peephole Optimization:
o This is a local optimization that looks at small sets of instructions and replaces them with more
efficient instruction sequences that accomplish the same task. It often targets the use of specific
machine instructions that are faster or more compact.
o Example: if one instruction stores register R0 into memory location a and the very next instruction loads a back into R0, the second instruction is redundant and can be deleted.
3. Instruction Scheduling:
o This optimization involves rearranging the order of machine instructions to avoid pipeline stalls and
make better use of the CPU's instruction pipeline. It is especially important on modern processors
where multiple instructions can be executed concurrently.
o Example: If one instruction is dependent on the result of another, scheduling the independent
instructions first can improve performance.
▪ Before optimization (an illustrative sequence):
LOAD R1, A
ADD R2, R1, R3 // stalls, waiting for R1 from the previous LOAD
▪ After optimization, an independent instruction (e.g., LOAD R4, B) is scheduled between the two so the pipeline stays busy.
4. Loop Unrolling:
o Loop unrolling is the process of expanding a loop by performing multiple operations in each iteration,
thereby reducing the loop overhead (such as the cost of evaluating the loop condition).
o Example:
for (int i = 0; i < 4; i++) {
a[i] = b[i] + 2;
}
After unrolling:
a[0] = b[0] + 2;
a[1] = b[1] + 2;
a[2] = b[2] + 2;
a[3] = b[3] + 2;
5. Cache Optimization:
o This optimization ensures that data is accessed in a way that minimizes cache misses. This is done by
ensuring that frequently accessed data is kept in the CPU cache, which is faster than main memory.
o Example: Accessing an array sequentially improves cache locality, as opposed to random (or strided) access; a small sketch follows.
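A minimal C sketch of the locality point above (the array size is an arbitrary choice): traversing a 2-D array row by row matches C's row-major memory layout, while column-by-column traversal strides across memory and causes far more cache misses.
#include <stdio.h>
#define N 1024

static double a[N][N];

int main(void) {
    double total = 0.0;

    /* Cache-friendly: C stores rows contiguously, so row-by-row access hits the cache. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            total += a[i][j];

    /* Cache-unfriendly: column-by-column access strides across rows and misses the cache often. */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            total += a[i][j];

    printf("%f\n", total);
    return 0;
}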
Q5 a) For the following grammar construct SLR parser and parse (a,a,^)
S → a | ^ | (R)
T → S,T | S
R → T
b) Show that the following grammar is CLR (1) but not SLR (1).
S → AaAb | BbBa
A → ε
B → ε
Q6 a) Consider the following grammar:
A → A & B | B
B → B @ C | C
C → C # D | D
D → id
What can you say about the precedence and associativity of operator &, @ and #?
b) Show that the following grammar is SLR(1) but not LL(1).
S → S A | A
A → a
a)
To determine the precedence and associativity of the operators &, @, and # in the given grammar:
Grammar
1. A→A&B ∣ B
2. B→B@C ∣ C
3. C→C#D ∣ D
4. D→id
Precedence
Precedence is determined by how deeply each operator is introduced in the grammar: # appears at the lowest level (C → C # D, operating directly on D → id), @ at the next level (B → B @ C), and & at the top level (A → A & B). Operators introduced lower in the grammar bind more tightly, so # has the highest precedence, followed by @, then &.
Associativity
The associativity is determined by the recursive structure of the grammar:
1. Left-recursion (e.g., A→A&B, B→B@C, C→C#D) indicates left-associativity for all operators &, @, #.
o For example, A&B&C is parsed as (A&B)&C.
Summary
• Precedence (from highest to lowest): # > @ > &.
• Associativity: All operators (&, @, #) are left-associative.
b)
Checking for LL(1)
The given grammar contains left recursion (S → SA), and a left-recursive grammar can never be LL(1): since FIRST(SA) = FIRST(A) = {a}, both productions of S would fall into the same LL(1) table cell. To see what an LL(1)-parsable version would look like, the left recursion can be eliminated:
1. Rewrite S by eliminating the left recursion:
S→AS′
S′→AS′∣ϵ
2. New Grammar:
o S→AS′
o S′→AS′ ∣ ϵ
o A→a
FIRST and FOLLOW Sets:
• FIRST(A): {a}
• FIRST(S'): {a,ϵ}
• FIRST(S): {a}
• FOLLOW(S): {$}
• FOLLOW(S'): {$}
• FOLLOW(A): {a, $}
LL(1) Parsing Table:
• S→AS′: Uses a.
• S′→AS′: Uses a.
• S′→ϵ: Uses $.
In this transformed grammar there is no conflict: S′ → AS′ is selected when the lookahead is a, and S′ → ε when the lookahead is $. The original grammar, however, cannot be handled this way: both S → SA and S → A are applicable when the lookahead is a, so its LL(1) table cell for (S, a) contains two productions. This conflict (and the left recursion itself) demonstrates that the given grammar is not LL(1).