UNIT 2 Review of CFG Ambiguity and Parsing Techniques
1. Ambiguity in Context-Free Grammars (CFG)
• Definition:
A CFG is ambiguous if there exists at least one string in the language generated by the grammar that has two or
more distinct parse trees or derivations.
• Implications of Ambiguity:
o Multiple parse trees imply multiple interpretations of the same string, which complicates parsing and
semantic analysis.
o Ambiguity makes it difficult for a parser to decide which derivation to choose.
• Example:
Consider a grammar for arithmetic expressions in which a string such as a + a * a can be derived in two
different ways, yielding two distinct parse trees (see the worked example below).
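For instance, with the classic ambiguous grammar (a standard textbook example, not specific to these notes)
E → E + E | E * E | a
the string a + a * a has two leftmost derivations:
E ⇒ E + E ⇒ a + E ⇒ a + E * E ⇒ a + a * E ⇒ a + a * a (groups as a + (a * a))
E ⇒ E * E ⇒ E + E * E ⇒ a + E * E ⇒ a + a * E ⇒ a + a * a (groups as (a + a) * a)
Each derivation corresponds to a different parse tree, so the grammar is ambiguous.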
2. Bottom-Up Parsing
• Concept:
Bottom-up parsing starts from the input string and tries to construct the start symbol by reducing substrings
(handles) to non-terminals.
• Goal:
To trace out a rightmost derivation in reverse: each reduction undoes one step of the derivation.
• Process:
o Shift symbols onto a stack until a handle is recognized.
o Reduce the handle to a non-terminal using a production rule.
o Repeat until the start symbol is derived or an error is detected.
• Advantages:
o Can handle a larger class of grammars compared to top-down.
o Works efficiently with LR parsers.
3. Shift-Reduce Parsing
A common form of bottom-up parsing.
• Actions:
o Shift: Push the next input symbol onto the stack.
o Reduce: Replace the handle on the top of the stack with the corresponding non-terminal.
o Accept: Successfully parse the input.
o Error: Parsing error detected.
• Handle:
The substring that matches the right-hand side of a production and whose reduction corresponds to one step of
the reverse rightmost derivation (a worked shift-reduce trace follows this section).
• Conflicts:
o Shift-Reduce Conflict: The parser cannot decide whether to shift the next symbol or reduce the handle.
o Reduce-Reduce Conflict: The parser cannot decide which of several productions to reduce by.
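As a worked trace (using the standard unambiguous expression grammar E → E + T | T, T → T * F | F, F → id,
a textbook example not taken from these notes), parsing id + id * id proceeds:

Stack | Remaining Input | Action
$ | id + id * id $ | shift
$ id | + id * id $ | reduce F → id
$ F | + id * id $ | reduce T → F
$ T | + id * id $ | reduce E → T
$ E | + id * id $ | shift
$ E + | id * id $ | shift
$ E + id | * id $ | reduce F → id
$ E + F | * id $ | reduce T → F
$ E + T | * id $ | shift (shifting * rather than reducing E → E + T gives * its higher precedence)
$ E + T * | id $ | shift
$ E + T * id | $ | reduce F → id
$ E + T * F | $ | reduce T → T * F
$ E + T | $ | reduce E → E + T
$ E | $ | accept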
4. LR Parsers
• LR Parsing:
Stands for Left-to-right scanning of the input, producing a Rightmost derivation in reverse.
• Types:
o LR(0): No lookahead symbols.
o SLR(1): Simple LR with one symbol of lookahead, using Follow sets for reduce decisions.
o CLR(1) (Canonical LR): full LR, built from LR(1) items.
o LALR(1): Look-Ahead LR, formed by merging CLR states with identical cores.
• Key Feature:
Uses a stack and an LR parsing table to decide parsing actions (see the driver sketch below).
• LR Parsing Table:
o Action Table: Dictates shift, reduce, accept, or error based on the current state and lookahead symbol.
o Goto Table: Used for transitions after reductions.
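To make the table-driven mechanism concrete, here is a minimal Python sketch of the LR driver loop for the toy
grammar S → a (augmented with S' → S). The hand-built ACTION and GOTO tables below are illustrative; a
generator such as YACC derives them from the grammar automatically.

ACTION = {
    (0, 'a'): ('shift', 2),
    (1, '$'): ('accept', None),
    (2, '$'): ('reduce', ('S', 1)),          # reduce by S -> a (RHS length 1)
}
GOTO = {(0, 'S'): 1}

def lr_parse(tokens):
    stack = [0]                              # stack of parser states
    tokens = tokens + ['$']                  # end-of-input marker
    i = 0
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:                      # empty table entry: syntax error
            raise SyntaxError('no action for state/input pair')
        kind, arg = act
        if kind == 'shift':
            stack.append(arg)
            i += 1
        elif kind == 'reduce':
            lhs, rhs_len = arg
            del stack[len(stack) - rhs_len:]      # pop one state per RHS symbol
            stack.append(GOTO[(stack[-1], lhs)])  # goto on the LHS non-terminal
        else:
            return True                      # accept

print(lr_parse(['a']))                       # True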
5. Construction of Parsing Tables
a. SLR Parsing Table Construction
• Items: LR(0) items (productions with dot positions).
• Follow Sets: Used to decide reduce actions.
• Steps:
1. Compute LR(0) item sets (canonical collection).
2. Compute Follow sets for non-terminals.
3. Construct action and goto tables:
▪ Shift on a terminal appearing immediately after the dot.
▪ Reduce by A → α when an item A → α· is complete and the lookahead is in Follow(A) (a small example follows).
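For instance (a hypothetical fragment): with productions E → E + T | T and T → id, Follow(E) = { +, $ }, so a
state containing the complete item E → T· gets reduce-by-E → T entries only in the + and $ columns of the
action table; all other columns remain error entries.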
b. Canonical LR (CLR) Parsing Table
• Items: LR(1) items (items with 1 lookahead symbol).
• Lookahead: Each item carries its own lookahead set, which distinguishes reduce actions more precisely than SLR's Follow sets.
• More powerful: Can parse a broader class of grammars without conflicts.
• Construction:
1. Compute LR(1) item sets.
2. Build action and goto tables using lookahead.
c. LALR Parsing Table
• Idea: Merge LR(1) states with the same core (LR(0) items), combining lookaheads.
• Benefit: Tables as small as SLR's (same number of states), yet nearly as powerful as CLR.
• Widely used in practice (e.g., YACC).
• Trade-off: Slight loss of power compared to CLR.
6. Parsing with Ambiguous Grammars
• Ambiguous grammars lead to conflicts in parsing tables.
• Handling ambiguity:
o Modify grammar to remove ambiguity.
o Use precedence and associativity rules.
o Use specialized parsing techniques.
• LR parsers require either an unambiguous grammar or explicit conflict-resolution strategies (such as precedence declarations).
7. Operator Precedence Parsing
• Purpose: Efficiently parse expressions with operators having precedence and associativity.
• Key Idea:
Use precedence relations (<·, ≐, ·>) between terminals to guide parsing.
• Precedence Table:
Encodes precedence and associativity rules for operators.
• Method:
o Shift when the relation between the terminal on top of the stack and the incoming terminal is <· or ≐.
o Reduce when the relation is ·>, i.e., the incoming terminal has lower precedence (see the worked illustration below).
• Limitation:
o Only suitable for operator precedence grammars (a subset of CFGs).
• Advantages:
Simple and efficient for arithmetic expressions.
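As a worked illustration (assuming the usual conventions: * binds tighter than +, and both are left-associative):
while parsing id + id * id, when + is the topmost stack terminal and * arrives, + <· * holds, so the parser
shifts; when * is on top and + arrives, * ·> + holds, so the multiplication is reduced first; and when + is on
top and another + arrives, + ·> + (left associativity), so the pending addition is reduced before the new one is
shifted.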
8. Introduction to Automatic Parser Generators: YACC
• YACC (Yet Another Compiler Compiler):
Tool that generates LR parsers automatically from grammar specifications.
• Input:
Grammar rules with embedded semantic actions.
• Output:
C code implementing an LR parser.
• Features:
o Supports LALR(1) parsing.
o Allows specification of operator precedence and associativity to resolve conflicts.
o Integrates with lexical analyzers like lex.
9. Error Handling in LR Parsers
• Error Detection:
When the parser cannot find a valid action for the current state and input symbol (an error entry in the action table).
• Error Recovery Strategies:
o Panic mode: Discard input tokens until a synchronizing token (e.g., semicolon) is found.
o Phrase-level recovery: Insert/delete tokens to repair errors and continue parsing.
• YACC Error Handling:
o Provides special error token in grammar rules.
o Allows custom error messages and recovery actions.
• Goal:
Recover from errors gracefully without stopping parsing immediately.
Summary Table of Parsing Methods
Parsing Method | Grammar Class | Lookahead | Table Size | Power | Common Use
SLR(1) | Subset of LR(1) grammars | 1 | Small | Least powerful of the LR variants | Simple LR parsing
CLR(1) (Canonical LR) | Full LR(1) grammars | 1 | Large | Most powerful | Theoretical basis
LALR(1) | Subset of CLR | 1 | Medium | Almost as powerful as CLR | Widely used (YACC)
Operator Precedence | Operator precedence grammars | N/A | Small | Limited | Arithmetic expression parsing
UNIT 3 Syntax-Directed Translation
1. Introduction
• Syntax-Directed Translation (SDT): A technique where translation of input is driven by the syntax structure of
the source program.
• Each grammar rule is associated with semantic actions that compute attributes or generate intermediate code.
• Widely used in compiler design for constructing intermediate representations during parsing.
2. Construction of Syntax Trees
• Syntax Tree (Parse Tree): A tree representation that shows the syntactic structure of the input according to the
grammar.
• Abstract Syntax Tree (AST): A simplified form of the parse tree that omits unnecessary syntactic detail (e.g., punctuation and chain productions).
Construction Methods:
• During parsing, semantic actions are invoked to construct the nodes and link subtrees (see the sketch below).
• Each node corresponds to a grammar construct (e.g., expressions, statements).
• Leaf nodes represent tokens (identifiers, constants).
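A minimal Python sketch of AST construction, with hypothetical node classes standing in for the semantic
actions a parser would invoke:

class Leaf:                                  # token node: identifier or constant
    def __init__(self, token):
        self.token = token

class BinOp:                                 # interior node for a binary operator
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

# Semantic actions fired during a parse of a + a * a link subtrees bottom-up,
# grouping * tighter than + :
tree = BinOp('+', Leaf('a'), BinOp('*', Leaf('a'), Leaf('a')))
print(tree.op, tree.right.op)                # + *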
3. Attribute Grammars
Attributes are values associated with grammar symbols to hold semantic information.
3.1 S-Attributed Definitions
• Attributes are synthesized: computed from the attributes of a node's children.
• Suitable for Bottom-Up parsing.
• Example: Evaluating arithmetic expressions from leaves up to the root.
3.2 L-Attributed Definitions
• Attributes can be synthesized or inherited:
o Synthesized attributes: computed from a node's children.
o Inherited attributes: computed from the node's parent and/or left siblings.
• Suitable for Top-Down parsing.
• Allows one-pass translation where attributes flow down or across the parse tree.
• Example: Type checking with context information passed down.
4. Top-Down Translation
• Semantic actions executed during Top-Down parsing (e.g., recursive descent).
• Attributes (especially inherited) are passed as parameters to parsing functions.
• Semantic rules are embedded in parsing procedures or invoked immediately after recognizing grammar
constructs.
• Enables immediate translation and code generation.
5. Intermediate Code Forms
Intermediate code is an abstraction between source and target machine code; easier to optimize and translate.
5.1 Postfix Notation (Reverse Polish Notation)
• Operators follow operands.
• Example: a + b → a b +
• Simple to generate and evaluate using a stack (see the evaluator sketch below).
• Useful as intermediate representation during syntax-directed translation.
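A minimal stack-evaluator sketch in Python, assuming single-token integer operands and the binary
operators +, -, *:

def eval_postfix(tokens):
    stack = []
    ops = {'+': lambda a, b: a + b,
           '-': lambda a, b: a - b,
           '*': lambda a, b: a * b}
    for t in tokens:
        if t in ops:
            b = stack.pop()                  # right operand is on top
            a = stack.pop()
            stack.append(ops[t](a, b))
        else:
            stack.append(int(t))             # operand: push it
    return stack.pop()

print(eval_postfix('2 3 4 * +'.split()))     # 2 + 3 * 4 = 14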
5.2 Directed Acyclic Graph (DAG)
• Used to represent expressions compactly by sharing common subexpressions.
• Nodes represent operators or operands.
• Reduces redundancy in expressions, aiding optimization.
6. Three-Address Code (TAC)
• A common form of intermediate code.
• Each instruction typically contains at most three addresses (operands).
• General format: x = y op z
• Includes assignments, arithmetic operations, and control-transfer instructions (a worked example follows).
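For example, a = b + c * d translates to:
t1 = c * d
t2 = b + t1
a = t2
Each instruction names at most three addresses: a result and up to two operands.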
7. TAC for Control Structures
Examples:
• If Statement:
if (B) S1 else S2
Translated to:
ifFalse B goto L1
S1
goto L2
L1: S2
L2:
• While Loop:
while (B) S
Translated to:
L1: ifFalse B goto L2
S
goto L1
L2:
8. Representing TAC Using Triples and Quadruples
8.1 Triples
• Each instruction is represented as a triple (op, arg1, arg2)
• The result is implicit in the instruction's position (index); there are no named temporaries.
• Example: the triple (+, b, c) at position 3 computes b + c; later triples refer to the result by its index, written (3).
8.2 Quadruples
• Explicitly stores the result in a fourth field: (op, arg1, arg2, result).
• Easier to manipulate and optimize, since instructions can be reordered without renumbering references (a side-by-side example follows).
• Example: (+, t1, t2, t3) means t3 = t1 + t2.
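Side by side for a = b + c * d:

Triples:
(0) (*, c, d)
(1) (+, b, (0))
(2) (=, a, (1))

Quadruples:
(0) (*, c, d, t1)
(1) (+, b, t1, t2)
(2) (=, t2, , a)

Note how triple (1) must refer to instruction (0) by position, while the quadruples name the temporaries
t1 and t2 explicitly.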
9. Boolean Expressions and Control Structures
• Boolean expressions control flow in programs.
• Common intermediate representation uses short-circuit evaluation with jump instructions.
Translation Approach:
• Use truelists and falselists to record the jump instructions whose targets are not yet known.
• The backpatching technique fills in those jump targets once they become known (see the sketch below).
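A minimal backpatching sketch in Python; the instruction encoding and label values are illustrative:

code = []                                    # growing list of [op, arg, target]

def emit(op, arg=None, target=None):         # append an instruction, return its index
    code.append([op, arg, target])
    return len(code) - 1

def makelist(i):                             # new list holding one incomplete jump
    return [i]

def merge(l1, l2):                           # combine two lists of incomplete jumps
    return l1 + l2

def backpatch(lst, target):                  # fill in each listed jump's target
    for i in lst:
        code[i][2] = target

# Translating "a < b": emit jumps with unknown targets, patch them later.
truelist = makelist(emit('if< goto', ('a', 'b')))
falselist = makelist(emit('goto'))
backpatch(truelist, 100)                     # e.g., start of the then-part
backpatch(falselist, 102)                    # e.g., start of the else-part
print(code)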
Summary Table
Concept | Description | Use Case
S-Attributed | Synthesized attributes only | Bottom-up parsing
L-Attributed | Synthesized + inherited attributes | Top-down parsing
Postfix Notation | Operators after operands | Stack-based evaluation
DAG | Shared common subexpressions | Expression optimization
Three-Address Code (TAC) | At most three addresses per instruction | General intermediate representation
Triples and Quadruples | Storage formats for TAC | Easy manipulation
Boolean Expressions | Short-circuit evaluation with jumps | Conditionals and loops
UNIT 4 Runtime Environments
1. Introduction
• Runtime Environment: The data structures and mechanisms used during the execution of a program to manage
memory, variables, control flow, and function calls.
• Crucial for implementing features like procedure calls, recursion, and block-structured languages.
2. Storage Allocation Strategies
Storage for variables and data structures must be managed during program execution.
2.1 Static Allocation
• Memory allocated at compile time.
• Fixed locations throughout program execution.
• Used for global variables, constants, static data.
• Advantage: Fast access.
• Limitation: No flexibility for dynamic memory needs.
2.2 Stack Allocation (Automatic Allocation)
• Memory allocated and deallocated in LIFO order.
• Used for local variables and activation records.
• Activation records pushed when a function is called and popped when it returns.
• Efficient and supports recursion naturally.
2.3 Heap Allocation (Dynamic Allocation)
• Memory allocated at runtime as needed.
• Used for dynamic data structures like linked lists, trees.
• Managed by heap managers or garbage collectors.
• More flexible but slower than stack allocation.
3. Heap Management
• The heap is a large pool of memory for dynamic allocation.
• Heap management strategies:
3.1 Free List Management
• Keeps track of free blocks using a linked list.
• On allocation, finds a suitable free block (first fit, best fit, etc.).
• On deallocation, returns memory to free list.
3.2 Garbage Collection
• Automatically reclaims memory no longer in use.
• Common in languages like Java, Python.
• Techniques: Reference counting, Mark-and-sweep.
4. Activation Records (AR) / Stack Frames
• Data structure used to manage information for each procedure call.
• Stored in the call stack.
Components of Activation Record:
• Return Address: Where to resume after procedure ends.
• Parameters: Arguments passed to the procedure.
• Local Variables: Variables declared inside the procedure.
• Saved Registers: For preserving CPU state.
• Control Link (Dynamic Link): Points to caller's AR.
• Access Link (Static Link): Points to AR of lexically enclosing scope (for non-local access).
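One common frame layout (illustrative; field order and the direction of stack growth vary by implementation):

| actual parameters        |  ← placed by the caller
| return address           |
| control (dynamic) link   |
| access (static) link     |
| saved registers          |
| local variables          |  ← addressed via offsets from FP/SP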
5. Accessing Local and Non-Local Names
In block-structured languages, variables can be local or non-local (declared in enclosing scopes).
5.1 Access to Local Variables
• Directly accessed via offsets from Stack Pointer (SP) or Frame Pointer (FP).
5.2 Access to Non-Local Variables
• Static/Access Links: Each activation record contains a pointer to the AR of its lexical parent.
• To access a variable declared in an outer scope:
o Follow static links up the chain until reaching the correct AR.
o Access variable at known offset.
6. Parameter Passing Methods
6.1 Pass by Value
• A copy of the actual parameter is passed.
• Changes inside the procedure do not affect the caller's variable.
6.2 Pass by Reference
• The address of the actual parameter is passed.
• Changes inside the procedure affect the caller's variable.
6.3 Pass by Value-Result (Copy-In Copy-Out)
• The parameter is passed by value on entry.
• On procedure exit, the updated value is copied back to the caller's variable.
6.4 Pass by Name
• Argument expression is re-evaluated each time it is used in the procedure.
• Used in Algol-like languages.
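A small Python sketch contrasting the first three modes, with the copying written out explicitly to mirror what
generated code would do (the helper names and the one-slot list standing in for a memory cell are illustrative):

def by_value(x):
    x = x + 1                                # operates on a copy; caller unaffected

def by_reference(cell):
    cell[0] = cell[0] + 1                    # operates through the variable's location

def by_value_result(cell):
    x = cell[0]                              # copy in on entry
    x = x + 1
    cell[0] = x                              # copy out on exit

v = [10]
by_value(v[0]);      print(v[0])             # 10: unchanged
by_reference(v);     print(v[0])             # 11
by_value_result(v);  print(v[0])             # 12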
7. Symbol Table Organization
• Symbol table stores information about identifiers (variables, functions, types).
• Used by lexical analyzer, parser, semantic analyzer, code generator.
7.1 Operations on Symbol Tables
• Insert: Add new identifier.
• Lookup: Find identifier details.
• Delete: Remove scope’s identifiers when scope ends.
7.2 Symbol Table Strategies
7.2.1 Linear List
• Simple but inefficient for large programs.
• O(n) lookup time.
7.2.2 Hash Table
• Efficient average-case O(1) lookup.
• Uses hash functions to index symbols.
7.2.3 Tree Structures
• For nested scopes. Each node represents a scope’s symbol table.
• Allows hierarchical lookups.
8. Data Structures Used in Symbol Tables
• Hash Tables: Most commonly used for fast access.
• Linked Lists: For chaining in hash tables or simple scopes.
• Trees (Scope Trees): Represent nested scopes for block-structured languages.
• Stacks: To manage active scopes, pushed/popped when entering/exiting blocks (see the combined sketch below).
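A minimal Python sketch combining these structures: a stack of hash tables (dicts), one per active scope
(class and method names are illustrative):

class SymbolTable:
    def __init__(self):
        self.scopes = [{}]                   # global scope

    def enter_scope(self):
        self.scopes.append({})               # push on entering a block

    def exit_scope(self):
        self.scopes.pop()                    # drop the scope's identifiers

    def insert(self, name, info):
        self.scopes[-1][name] = info

    def lookup(self, name):
        for scope in reversed(self.scopes):  # search innermost scope first
            if name in scope:
                return scope[name]
        return None                          # undeclared identifier

st = SymbolTable()
st.insert('x', {'type': 'int'})
st.enter_scope()
st.insert('x', {'type': 'float'})
print(st.lookup('x'))                        # {'type': 'float'}: inner x shadows outer
st.exit_scope()
print(st.lookup('x'))                        # {'type': 'int'}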
Summary Table
Topic | Description | Typical Use
Static Allocation | Fixed size and location, assigned at compile time | Global variables
Stack Allocation | LIFO discipline, efficient | Local variables, activation records
Heap Allocation | Dynamic, flexible | Dynamic data structures
Activation Records | Store per-call information | Procedure calls and recursion
Static Link | Access to non-local variables | Block-structured languages
Parameter Passing | Various modes (value, reference, name) | Procedure argument passing
Symbol Table Organization | Efficient lookup of identifiers | Compiler symbol management
Data Structures in Symbol Tables | Hash tables, trees, stacks | Implementing symbol tables
UNIT 5 Basic Blocks, Control Flow Graphs, and Code Optimization
1. Basic Blocks
• Definition: A basic block is a sequence of consecutive statements in a program with:
o Only one entry point (the first statement).
o Only one exit point (the last statement).
o No jumps or jump targets in the middle of the block.
• Purpose: Simplifies analysis and optimization by grouping instructions that execute strictly sequentially (a partitioning sketch follows).
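A sketch of the standard leader-based partitioning in Python, assuming TAC instructions are plain strings,
labels end with ':' and jumps begin with 'goto' or 'if':

def basic_blocks(code):
    leaders = {0}                            # the first instruction is a leader
    for i, instr in enumerate(code):
        if instr.startswith(('goto', 'if')):
            leaders.add(i + 1)               # the instruction after a jump is a leader
        if instr.endswith(':'):
            leaders.add(i)                   # a jump target (label) is a leader
    leaders = sorted(l for l in leaders if l < len(code))
    return [code[s:e] for s, e in zip(leaders, leaders[1:] + [len(code)])]

code = ['t1 = a + b', 'if t1 goto L1', 't2 = a - b', 'L1:', 'c = t1']
for block in basic_blocks(code):
    print(block)                             # three blocks, split at the jump and the label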
2. Control Flow Graph (CFG)
• Definition: A graph representing all paths that might be traversed through a program during execution.
• Nodes: Basic blocks.
• Edges: Control flow transfers (branches, jumps) between blocks.
3. DAG Representation of Basic Block
• Directed Acyclic Graph (DAG):
o Represents computations within a basic block.
o Nodes correspond to operators and operands.
o Edges represent dependencies.
o Common subexpressions are merged to avoid recomputation (see the construction sketch below).
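A minimal sketch of DAG construction by hashing (value numbering) in Python: each distinct
(op, left, right) combination gets exactly one node, so common subexpressions are shared automatically:

nodes = {}                                   # (op, left, right) -> node id

def node(op, left=None, right=None):
    key = (op, left, right)
    if key not in nodes:
        nodes[key] = len(nodes)              # create a node only for an unseen computation
    return nodes[key]

a, b = node('a'), node('b')
n1 = node('+', a, b)                         # first occurrence of a + b
n2 = node('+', a, b)                         # second occurrence maps to the same node
print(n1 == n2)                              # True: the common subexpression is shared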
4. Advantages of DAG
• Eliminates Redundant Computations: Common subexpressions appear once.
• Facilitates Code Optimization: Simplifies dead code elimination and expression simplification.
• Provides a clear view of data dependencies.
5. Sources of Optimization
• Local Optimization: Inside basic blocks (e.g., constant folding, algebraic simplification).
• Global Optimization: Across basic blocks (e.g., common subexpression elimination).
• Loop Optimization: Special techniques to improve loops.
6. Loop Optimization
• Goal: Increase performance by minimizing redundant computations inside loops.
6.1 Loop-Invariant Computation
• Computations that yield the same result each iteration.
• Move such computations outside the loop (loop-invariant code motion; example below).
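For example, assuming x, y, and n are not modified in the loop:

Before:
L1: t1 = x * y
    a[i] = t1
    i = i + 1
    if i < n goto L1

After (t1 = x * y hoisted out of the loop):
    t1 = x * y
L1: a[i] = t1
    i = i + 1
    if i < n goto L1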
7. Peephole Optimization
• Definition: Small-scale local optimization examining a few adjacent instructions.
• Examples:
o Removing redundant loads/stores.
o Eliminating algebraic identities such as x = x + 0 or x = x * 1, which are no-ops (see the sketch below).
• Implemented at the machine code level or intermediate code level.
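A tiny peephole sketch in Python over a one-instruction window; the five-token instruction format is
illustrative:

def peephole(code):
    out = []
    for instr in code:
        dest, _, lhs, op, rhs = instr.split()    # e.g. 'x = x + 0'
        if dest == lhs and ((op == '+' and rhs == '0') or
                            (op == '*' and rhs == '1')):
            continue                             # algebraic identity: a no-op, drop it
        out.append(instr)
    return out

print(peephole(['x = x + 0', 'y = x + 1', 'x = x * 1']))   # ['y = x + 1']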
8. Issues in Code Generator Design
• Register Allocation: Limited number of registers vs. variables.
• Instruction Selection: Mapping intermediate code to machine instructions.
• Instruction Scheduling: Ordering instructions to avoid stalls.
• Handling Different Architectures: Target machine dependencies.
• Preserving Semantics: Ensuring generated code behaves correctly.
9. Simple Code Generator
• Translates intermediate code (e.g., TAC or DAG) into machine code.
• Uses heuristics for register usage.
• Example approach:
o Traverse DAG in postorder.
o Generate code for children first.
o Use available registers or spill to memory.
10. Code Generation from DAG
• Traverse DAG nodes.
• For each operator node:
o Generate code to compute operands (recursively).
o Generate code for the operation.
• Store results in registers or memory.
• Benefits: avoids redundant computation and yields more efficient code (see the sketch below).
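A Python sketch of postorder code generation from a DAG, assuming unlimited registers and hypothetical
LOAD/ADD/MUL instructions (no register spilling is shown):

def gen(node, out, cache, fresh):
    if node in cache:                        # shared node: reuse its register
        return cache[node]
    op, left, right = node
    if left is None:                         # leaf: a variable or constant
        r = next(fresh)
        out.append(f'LOAD R{r}, {op}')
    else:                                    # interior node: children first (postorder)
        rl = gen(left, out, cache, fresh)
        rr = gen(right, out, cache, fresh)
        r = next(fresh)
        out.append(f"{'ADD' if op == '+' else 'MUL'} R{r}, R{rl}, R{rr}")
    cache[node] = r
    return r

a, b = ('a', None, None), ('b', None, None)
s = ('+', a, b)                              # a + b, used twice below
out = []
gen(('*', s, s), out, {}, iter(range(8)))    # (a + b) * (a + b)
print('\n'.join(out))                        # a + b is computed only once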
11. Machine Independent Optimization
Optimization techniques independent of the target machine.
11.1 Global Data Flow Analysis
• Analyzes how data values propagate through the program.
• Useful for optimizations such as dead-code elimination.
11.2 Constant Propagation
• Replace variables with known constant values.
• Simplifies expressions and reduces runtime computation (example below).
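For example, after x = 5, the statement y = x + 2 can be rewritten as y = 5 + 2 and then folded to y = 7.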
11.3 Liveness Analysis
• Determines which variables are live (needed in the future).
• Important for register allocation and dead code removal.
11.4 Common Subexpression Elimination (CSE)
• Identifies expressions computed multiple times.
• Eliminates the redundancy by reusing the previously computed value (example below).
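For example:

Before:
t1 = b * c
t2 = b * c
t3 = t1 + t2

After (the second b * c reuses t1):
t1 = b * c
t3 = t1 + t1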
Summary Table
Concept | Description | Purpose/Use
Basic Block | Sequence of code with one entry and one exit | Unit of optimization
Control Flow Graph (CFG) | Nodes: basic blocks; edges: control flow | Program flow analysis
DAG | Representation of expressions in a basic block | Optimizing computations
Loop-Invariant Computation | Move invariant code outside loops | Improving loop efficiency
Peephole Optimization | Local, small-window code improvements | Cleaning up generated code
Code Generator Design Issues | Register allocation, instruction selection, scheduling | Efficient code generation
Machine-Independent Optimization | Data-flow analysis, constant propagation, etc. | Platform-neutral optimization