CD Module3
YACC
o YACC stands for Yet Another Compiler Compiler.
o YACC provides a tool to produce a parser for a given grammar.
o YACC is a program designed to generate a parser from an LALR(1) grammar.
o It is used to produce the source code of the syntactic analyzer of the language
described by an LALR(1) grammar.
o The input of YACC is a set of grammar rules and the output is a C program.
These are some points about YACC:
Input: A CFG in file.y
Output: A parser y.tab.c (yacc)
o The output file "file.output" contains the parsing tables.
o The file "file.tab.h" contains declarations.
o The parser is a function called yyparse().
o The parser expects to use a function called yylex() to get tokens.
The basic operational sequence is as follows:
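gram.y: This file contains the desired grammar in YACC format.
yacc: Reads gram.y and generates the C source file y.tab.c.
C compiler: Compiles y.tab.c.
Executable: The resulting file parses input according to the grammar given in gram.y.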
Global Correction:
Ideally, a compiler should make as few changes as possible when turning an incorrect input
string into a correct one. Given an incorrect input string x and a grammar G, a global
correction algorithm finds a parse tree for a related string y (the expected string) such
that the number of token insertions, deletions, and changes required to transform x into y
is as small as possible. Global correction methods increase the time and space requirements
of parsing, so this remains largely a theoretical concept.
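The cost that global correction minimizes can be illustrated with a token-level edit distance. The sketch below is only an illustration: the function name token_edit_distance and the hand-picked candidate list are assumptions, and a real global correction algorithm would search over all strings generated by G rather than a fixed list.

```python
def token_edit_distance(x, y):
    """Minimum number of token insertions, deletions, and changes turning x into y."""
    rows, cols = len(x) + 1, len(y) + 1
    # dist[i][j] = cost of turning the first i tokens of x into the first j tokens of y
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i                      # delete i tokens
    for j in range(cols):
        dist[0][j] = j                      # insert j tokens
    for i in range(1, rows):
        for j in range(1, cols):
            change = 0 if x[i - 1] == y[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + change # change (or match)
            )
    return dist[-1][-1]


incorrect = ["id", "+", "*", "id"]                 # x: not a valid expression
candidates = [["id"], ["id", "+", "id"]]           # assumed valid strings y
best = min(candidates, key=lambda y: token_edit_distance(incorrect, y))
print(best)                                        # ['id', '+', 'id']
```

Here the second candidate wins because it needs one deletion, while the first needs three.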
Symbol Table:
For semantic errors, recovery uses the symbol table entry of the corresponding identifier;
if the data types of two operands are not compatible, the compiler performs an automatic
type conversion.
The parser uses a CFG (context-free grammar) to validate the input string and produce output
for the next phase of the compiler. The output can be either a parse tree or an abstract
syntax tree. To interleave semantic analysis with the syntax analysis phase of the compiler,
we use Syntax Directed Translation.
Conceptually, with both syntax-directed definition and translation schemes, we parse the
input token stream, build the parse tree, and then traverse the tree as needed to evaluate the
semantic rules at the parse tree nodes. Evaluation of the semantic rules may generate code,
save information in a symbol table, issue error messages, or perform any other activities.
The translation of the token stream is the result obtained by evaluating the semantic rules.
Definition
Syntax Directed Translation augments the grammar with rules that facilitate semantic
analysis. SDT involves passing information bottom-up and/or top-down through the parse tree
in the form of attributes attached to the nodes. Syntax-directed translation rules use
1) lexical values of nodes, 2) constants, and 3) attributes associated with the
non-terminals in their definitions.
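E -> E+T      { E.val = E.val + T.val }
E -> T        { E.val = T.val }
T -> T*F      { T.val = T.val * F.val }
T -> F        { T.val = F.val }
F -> INTLIT   { F.val = INTLIT.lexval }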
To understand translation rules further, take the first rule, the one attached to the
production [ E -> E+T ]. This translation rule uses val as an attribute for both
non-terminals, E and T. The right-hand side of the translation rule refers to the attribute
values of the right-hand-side nodes of the production, and its left-hand side to the
attribute of the left-hand-side node. Generalizing, an SDT augments a CFG with 1) a set of
attributes for every node of the grammar and 2) a set of translation rules for every
production, written using attributes, constants, and lexical values.
Let's take a string to see how semantic analysis happens: S = 2+3*4.
[Figure: parse tree corresponding to S]
To evaluate translation rules, we can employ one depth-first search traversal on the parse
tree. This is possible because, for a grammar having only synthesized attributes, SDT rules
don't impose any specific order on evaluation as long as children's attributes are computed
before their parents'. Otherwise, we would have to work out a suitable plan to traverse the
parse tree and evaluate all the attributes in one or more traversals. For better
understanding, we will move bottom-up, left to right, when computing the translation rules
of our example.
The above diagram shows how semantic analysis could happen. The flow of information
happens bottom-up and all the children’s attributes are computed before parents, as
discussed above. Right-hand side nodes are sometimes annotated with subscript 1 to
distinguish between children and parents.
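The bottom-up flow described above can be sketched in Python. This is a minimal illustration rather than part of the original notes: the Node class, the evaluate function, and the hand-built tree for 2+3*4 are assumptions made for the example, and the rules applied are the usual val rules for this expression grammar.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    symbol: str                          # "E", "T", "F", "INTLIT", "+", "*"
    children: list = field(default_factory=list)
    lexval: Optional[int] = None         # set by the lexer for INTLIT leaves
    val: Optional[int] = None            # synthesized attribute


def evaluate(node):
    """Postorder traversal: children's val is computed before the parent's."""
    for child in node.children:
        evaluate(child)
    kids = node.children
    if node.symbol == "INTLIT":
        node.val = node.lexval                      # lexical value
    elif len(kids) == 1:
        node.val = kids[0].val                      # E -> T, T -> F, F -> INTLIT
    elif len(kids) == 3 and kids[1].symbol == "+":
        node.val = kids[0].val + kids[2].val        # E -> E + T
    elif len(kids) == 3 and kids[1].symbol == "*":
        node.val = kids[0].val * kids[2].val        # T -> T * F
    return node.val


def factor(n):
    return Node("F", [Node("INTLIT", lexval=n)])


# Parse tree for 2+3*4, built by hand for the example.
tree = Node("E", [
    Node("E", [Node("T", [factor(2)])]),
    Node("+"),
    Node("T", [Node("T", [factor(3)]), Node("*"), factor(4)]),
])
print(evaluate(tree))   # 14
```

After the call, every node carries its val, so the tree is an annotated parse tree for the input.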
Additional Information
Synthesized attributes are attributes that depend only on the attribute values of child
nodes.
Thus [ E -> E+T { E.val = E.val + T.val } ] has a synthesized attribute val corresponding to
node E. If all the semantic attributes in an augmented grammar are synthesized, a single
depth-first traversal that visits children before their parent is sufficient for the
semantic analysis phase.
Inherited attributes are attributes that depend on the attributes of the parent and/or
siblings.
Thus [ Ep -> E+T { Ep.val = E.val + T.val, T.val = Ep.val } ], where E and Ep are the same
grammar symbol annotated to differentiate between parent and child, has an inherited
attribute val corresponding to node T.
Advantages of SDT
Separation of concerns: SDT keeps the translation rules separate from the parsing rules,
making the compiler easier to modify and maintain and allowing for more modular and
extensible compiler designs.
Efficient code generation: SDT supports the generation of efficient code by allowing
techniques such as intermediate code generation and code optimization to be applied during
translation.
Disadvantages of SDT
Inflexibility: SDT can be inflexible in situations where the translation rules are complex
and cannot be easily expressed using grammar rules.
Limited error recovery: SDT is limited in its ability to recover from errors during the
translation process. This can result in poor error messages and may make it difficult to
locate and fix errors in the input program.
Syntax Directed Definitions (SDD)
Syntax Directed Definitions (SDD) are a formal method of attaching semantic information
to the syntactic structure of a programming language. An SDD extends a context-free grammar
by adding a set of semantic rules to every production of the grammar. These rules state how
to derive values such as types, memory locations, or fragments of code from the structure
of the input.
The semantic actions are normally attached to the grammar rules and can be performed either
together with the parsing actions or separately from them. Syntax-directed techniques such
as SDT support compiler tasks including code generation, expression evaluation, and type
checking.
The parse tree containing the values of the attributes at each node for a given input string
is called an annotated or decorated parse tree.
Features of Annotated Parse Trees
Types of Attributes
1. Synthesized Attributes: These are attributes that derive their values from their
children nodes, i.e. the value of a synthesized attribute at a node is computed from the
values of the attributes at its children in the parse tree.
Example:
Write the SDD using appropriate semantic rules for each production in the given
grammar.
The annotated parse tree is generated and attribute values are computed in a bottom-up
manner. The value obtained at the root node is the final output.
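S --> E          { print(E.val) }
E --> E1 + T     { E.val = E1.val + T.val }
E --> T          { E.val = T.val }
T --> T1 * F     { T.val = T1.val * F.val }
T --> F          { T.val = F.val }
F --> digit      { F.val = digit.lexval }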
Let us assume the input string 4 * 5 + 6 for computing synthesized attributes.
[Figure: annotated parse tree for the input string 4 * 5 + 6]
For computation of attributes we start from the leftmost bottom node. The rule F –> digit is
used to reduce digit to F, and the value of digit is obtained from the lexical analyzer; it
becomes the value of F through the semantic action F.val = digit.lexval. Hence F.val = 4,
and since T is the parent node of F, we get T.val = 4 from the semantic action T.val = F.val.
Next, the second digit gives F.val = 5, and for the production T –> T1 * F the corresponding
semantic action is T.val = T1.val * F.val. Hence, T.val = 4 * 5 = 20.
Similarly, E –> T gives E1.val = 20, the digit 6 gives T.val = 6, and combining the two
yields E.val = E1.val + T.val = 20 + 6 = 26. Then the production S –> E is applied, and the
semantic action associated with it prints the result E.val. Hence, the output will be 26.
2. Inherited Attributes: These are attributes that derive their values from their parent
or sibling nodes, i.e. the value of an inherited attribute is computed from the attribute
values of the parent or sibling nodes.
Example:
The annotated parse tree is generated and attribute values are computed in a top-down
manner.
S --> T L
T --> int
T --> float
T --> double
L --> L1, id
L --> id
Let us assume the input string int a, c for computing inherited attributes.
[Figure: annotated parse tree for the input string int a, c]
The value at each L node is obtained from T.type (its sibling), which is the lexical value
int, float, or double. The L node then gives this type to the identifiers a and c. The type
is computed in a top-down manner, i.e. by a preorder traversal. Using the function
Enter_type, the type of the identifiers a and c is inserted into the symbol table at the
corresponding id.entry.
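The top-down flow of the type can be sketched as follows. This is an illustrative assumption, not the notes' own code: the tuple encoding of the parse tree, the dictionary used as symbol table, and the Python function enter_type (standing in for Enter_type) are all hypothetical.

```python
symbol_table = {}


def enter_type(entry, type_name):
    """Hypothetical stand-in for Enter_type: record the type of one identifier."""
    symbol_table[entry] = type_name


def declare_list(l_node, inherited_type):
    """Process an L node encoded as ("L", L1 or None, id).

    inherited_type is the inherited attribute handed down from the parent.
    """
    _, l1, ident = l_node
    if l1 is not None:
        declare_list(l1, inherited_type)   # L -> L1 , id : pass the type down to L1
    enter_type(ident, inherited_type)      # both L productions enter their id


def declare(s_node):
    """Process S -> T L: T.type flows sideways to the sibling L."""
    _, type_name, l_node = s_node
    declare_list(l_node, type_name)


# Parse tree for "int a, c": the outer L derives L1 , c and L1 derives a.
tree = ("S", "int", ("L", ("L", None, "a"), "c"))
declare(tree)
print(symbol_table)   # {'a': 'int', 'c': 'int'}
```

The inherited attribute appears as a function parameter, which is a common way to model it in a recursive traversal.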
1. Leaf Nodes:
2. Bottom-Up Evaluation:
o T → T * F for 5 * 2: T.val = 5 * 2 = 10
Types of SDDs
2. Inherited SDD: Uses inherited attributes, which may require additional passes or
dependencies on parent and sibling attributes.
Advantages of SDDs
Modularity: Attributes and semantic rules are associated directly with grammar
productions, making it easier to modify and extend language features.
Efficiency: Synthesized SDDs can often be evaluated in a single pass through the
parse tree, providing efficient evaluation.
2. Inherited Attributes: Computed based on the values from parent or sibling nodes in
the parse tree. These may require specific orders to ensure that all dependencies are
satisfied.
Each edge between nodes indicates a dependency, where the target node’s attribute
depends on the source node’s attribute.
For evaluation, nodes should be computed in an order that respects this dependency graph:
If the graph is acyclic (no cycles), the attributes can be evaluated in a topological
order.
If there are cyclic dependencies, the SDD may not have a well-defined evaluation
order, and it might require additional rewriting or adjustments to resolve
dependencies.
Identify an ordering of nodes where each attribute is evaluated only after all attributes
it depends on have been evaluated.
Topological order is often applied for attribute evaluation in parse trees for
synthesized-only SDDs or acyclic SDDs.
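A topological order over a dependency graph can be computed with Python's standard graphlib module (Python 3.9 or later). The sketch below is an assumption-laden illustration: the attribute instance names such as F1.val are hypothetical labels for the nodes of a parse tree for 4 * 5, and evaluation_order is a helper invented for the example.

```python
from graphlib import CycleError, TopologicalSorter


def evaluation_order(depends_on):
    """Return attribute instances in an order that respects every dependency,
    or None if the dependency graph contains a cycle."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        return None


# Each key maps to the set of attribute instances it depends on.
depends_on = {
    "F1.val": {"digit1.lexval"},
    "T1.val": {"F1.val"},
    "F2.val": {"digit2.lexval"},
    "T.val": {"T1.val", "F2.val"},
}
order = evaluation_order(depends_on)
print(order)   # one valid order; every attribute appears after its dependencies

# A cyclic definition has no valid evaluation order.
print(evaluation_order({"A.x": {"B.y"}, "B.y": {"A.x"}}))   # None
```

Several valid orders may exist for the same graph; any of them can be used for evaluation.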
Attributes are evaluated in a bottom-up pass from the leaves of the parse tree to the
root.
Each node’s synthesized attribute can be computed using its children’s attributes,
which have already been evaluated.
Example:
When using inherited attributes (in L-Attributed Definitions), a more careful approach is
required:
Evaluation often requires a left-to-right traversal along each level of the tree to
ensure inherited attributes are available when needed.
Consider a simple expression grammar where each variable in an expression must inherit the
type from its surrounding context.
E → T E'
E' → + T E' | ε
T → num
Example Walkthrough
Suppose we want to evaluate the expression 3 + 4 * 5. Here’s how each approach
would handle it:
Using SDD
1. Parse 3 + 4 * 5 and compute values using the SDD rules.
2. The values of attributes would be computed without any translation output, simply
resulting in a final value of 3 + (4 * 5) = 23.
Using SDT
1. Parse 3 + 4 * 5.
2. Execute translation actions to output postfix notation while computing values.
o After parsing 3, it outputs 3.
o After parsing 4 * 5, it outputs 4 5 *.
o Finally it outputs +, giving 3 4 5 * +, which is the postfix representation of 3 + 4 * 5.
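Both views of the walkthrough can be combined in one small recursive-descent translator. This is a sketch under assumptions: the function translate and its helpers are hypothetical, the left-recursive rules are replaced by loops so that a top-down parser can handle them, and malformed input is not handled.

```python
import re


def translate(text):
    """Return (value, postfix) for an expression over integers, +, * and parentheses."""
    tokens = re.findall(r"\d+|[+*()]", text)
    pos = 0
    output = []                       # postfix emitted by the embedded actions

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():                       # E -> T ( + T { print('+') } )*
        val = term()
        while peek() == "+":
            advance()
            val += term()             # attribute computation (SDD view)
            output.append("+")        # translation action (SDT view)
        return val

    def term():                       # T -> F ( * F { print('*') } )*
        val = factor()
        while peek() == "*":
            advance()
            val *= factor()
            output.append("*")
        return val

    def factor():                     # F -> ( E ) | num { print(num.val) }
        if peek() == "(":
            advance()
            val = expr()
            advance()                 # consume ")"
            return val
        tok = advance()
        output.append(tok)
        return int(tok)

    value = expr()
    return value, " ".join(output)


print(translate("3 + 4 * 5"))   # (23, '3 4 5 * +')
```

The returned value corresponds to the attribute result, and the string corresponds to the output of the translation actions.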
Differences
Aspect         | SDD (Syntax-Directed Definition)                          | SDT (Syntax-Directed Translation)
Purpose        | Defines attribute relationships without immediate actions | Embeds actions directly within grammar rules for immediate execution
Focus          | Attribute computation                                     | Immediate translation actions
Output         | Attribute values only                                     | Generates intermediate code or representation
Example Output | Computes E.val = 23 for 3 + 4 * 5                         | Generates postfix notation: 3 4 5 * +
In short, SDD focuses on specifying what needs to be computed, while SDT specifies
how and when to compute and translate as part of parsing.
1. Expression Evaluation
Example:
For an expression E → E1 + T, the SDT can evaluate E.val = E1.val + T.val directly
during parsing.
2. Type Checking
SDT enables type checking by attaching semantic rules to the grammar that enforce
type compatibility.
The SDT can detect errors such as incompatible type assignments or operations that
violate type rules.
Example:
For the production E → E1 + T, SDT can include a rule to check if E1 and T have
compatible types before performing addition.
If incompatible types are detected, the SDT generates an error message instead of
continuing.
3. Intermediate Code Generation
Example:
t1 = E1
t2 = T
t3 = t1 + t2
4. Code Optimization
During intermediate code generation, SDT can also be used to perform basic
optimizations like constant folding (evaluating constant expressions at compile time),
and eliminating unnecessary temporary variables.
For example, if an SDT detects a constant expression like 3 * 4, it could replace it
with 12 in the intermediate code.
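Constant folding can be sketched on a small expression tree. The representation is an assumption made for illustration: an expression is an int, a variable name, or a tuple (op, left, right), and only + and * are supported.

```python
def fold(expr):
    """Replace constant subexpressions by their value.

    expr is an int, a variable name (str), or a tuple (op, left, right)
    with op being "+" or "*".
    """
    if not isinstance(expr, tuple):
        return expr                              # constant or variable: nothing to do
    op, left, right = expr
    left, right = fold(left), fold(right)        # fold the operands first
    if isinstance(left, int) and isinstance(right, int):
        return left + right if op == "+" else left * right
    return (op, left, right)                     # at least one operand is not constant


print(fold(("*", 3, 4)))                 # 12
print(fold(("+", "x", ("*", 3, 4))))     # ('+', 'x', 12)
```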
5. Control Flow Handling
SDT can handle the translation of control flow statements like if, while, and for
loops, which require branching in the generated code.
SDT rules can generate the labels and jump statements required to implement the
control flow in intermediate code.
Example:
For an if statement:
if (E) S1 else S2
the generated code has the layout:
E.code
if E is false goto L1
S1.code
goto L2
L1: S2.code
L2:
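Label generation for this layout can be sketched as follows. The helper names new_label and gen_if_else and the ifFalse instruction spelling are assumptions chosen for the example, and the code of the condition and of both branches is taken as already generated.

```python
import itertools

_labels = itertools.count(1)


def new_label():
    """Return a fresh label name: L1, L2, ..."""
    return f"L{next(_labels)}"


def gen_if_else(cond, then_code, else_code):
    """Lay out intermediate code for: if (cond) then_code else else_code."""
    else_label, end_label = new_label(), new_label()
    code = [f"ifFalse {cond} goto {else_label}"]   # skip S1 when the condition fails
    code += then_code                              # S1.code
    code.append(f"goto {end_label}")               # jump over S2
    code.append(f"{else_label}:")
    code += else_code                              # S2.code
    code.append(f"{end_label}:")
    return code


code = gen_if_else("a < b", ["x = 1"], ["x = 2"])
print("\n".join(code))
```

Each call draws two fresh labels, so nested or consecutive if statements do not collide.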
6. Memory Management
SDT can aid in memory allocation for variables by managing the symbol table and
generating addresses for each variable.
It helps in keeping track of variable scopes and allocating appropriate memory offsets,
which is essential for code generation.
Example:
For a variable declaration int a;, SDT might assign an address or offset in the symbol
table for a, which can be used later during code generation.
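Offset assignment in a symbol table can be sketched like this. Everything here is an assumption for illustration: the SymbolTable class, the byte widths in WIDTHS, and the count parameter used for array declarations; alignment padding is ignored.

```python
WIDTHS = {"int": 4, "float": 4, "double": 8}    # assumed sizes in bytes


class SymbolTable:
    def __init__(self):
        self.entries = {}
        self.next_offset = 0                    # first free byte in the data area

    def declare(self, name, type_name, count=1):
        """Record a declaration and return the offset assigned to it."""
        if name in self.entries:
            raise ValueError(f"redeclaration of {name}")
        size = WIDTHS[type_name] * count        # count > 1 models an array
        self.entries[name] = {"type": type_name, "offset": self.next_offset, "size": size}
        self.next_offset += size
        return self.entries[name]["offset"]

    def element_offset(self, name, index):
        """Offset of element index of an array: base offset + index * element width."""
        entry = self.entries[name]
        return entry["offset"] + index * WIDTHS[entry["type"]]


table = SymbolTable()
print(table.declare("a", "int"))                # 0   (int a;)
print(table.declare("arr", "int", count=10))    # 4   (int arr[10];)
print(table.declare("d", "double"))             # 44
print(table.element_offset("arr", 3))           # 16
```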
7. Error Detection
SDT helps in semantic error detection and reporting by including checks for various
conditions, such as undeclared variables or incompatible types, during translation.
Syntax errors can often be detected at the parsing stage, but semantic errors are
handled by SDT.
Example:
For a production S → id = E, SDT can check if id has been declared in the current
scope before generating code for the assignment.
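The declaration check for S → id = E can be sketched with a stack of scopes. The names SemanticError, lookup, and gen_assignment, and the use of a list of dictionaries for nested scopes, are assumptions made for the example.

```python
class SemanticError(Exception):
    """Raised when a semantic check fails during translation."""


def lookup(name, scopes):
    """Search the scopes from innermost to outermost; return the type or None."""
    for scope in reversed(scopes):
        if name in scope:
            return scope[name]
    return None


def gen_assignment(target, expr_place, scopes):
    """Action for S -> id = E: check the declaration, then emit the assignment."""
    if lookup(target, scopes) is None:
        raise SemanticError(f"'{target}' is not declared in the current scope")
    return f"{target} = {expr_place}"


scopes = [{"x": "int"}, {"y": "float"}]      # outer scope first, innermost last
print(gen_assignment("x", "t1", scopes))     # x = t1
try:
    gen_assignment("z", "t2", scopes)
except SemanticError as err:
    print("error:", err)
```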
8. Translation of Declarations and Data Structures
SDT can handle the translation of declarations, arrays, structures, and other data
constructs by assigning memory offsets, generating type information, and handling
access patterns.
This ensures the correct memory layout and alignment, which are essential for correct
code execution.
Example:
For an array declaration like int arr[10];, SDT can allocate a contiguous block of
memory and compute the offset for each element access.
9. Function and Procedure Handling
SDT can manage function calls, parameter passing, and return values, which
involve saving the current state, handling scope changes, and managing function
arguments and return values.
SDT generates the necessary code to handle function prologues and epilogues, and
ensures arguments are passed correctly (e.g., by value or by reference).
Example:
For a function call f(a, b), SDT can generate code that pushes a and b onto the stack,
calls f, and retrieves the return value.
Example:
Summary
Expression evaluation
Type checking
Intermediate code generation
Code Optimization
Control flow handling
Memory management
Error detection
Translation of declarations and data structures
Function and procedure handling
Object-oriented constructs
SDT is at the core of the translation phase in compilers, turning high-level language
syntax into structured, optimized, and executable intermediate representations or machine
code.
SDTs are useful for implementing compilers or interpreters as they allow code to be
generated, attributes to be evaluated, and actions to be performed in a specific order
during parsing. SDTs are often used to generate intermediate code, perform type
checking, or construct syntax trees.
SDTs can be classified based on where the semantic actions are positioned relative to
grammar symbols in a production rule:
The choice of SDT type depends on whether top-down or bottom-up parsing is used.
Examples of Syntax Directed Translation Schemes
Here are some examples of SDTs for various types of translation tasks:
For arithmetic expressions, the SDT can translate expressions into postfix notation as it
parses them. Here's an example:
Grammar:
E → E + T { print('+') }
E → T
T → T * F { print('*') }
T → F
F → (E)
F → num { print(num.val) }
Explanation:
1. This SDT generates postfix code for expressions. For instance, for the expression 3 +
5 * 2, the output would be 3 5 2 * +.
2. Semantic actions are executed in a postfix order, meaning they are executed after
parsing the symbols on the right-hand side of each production.
Suppose we want to generate three-address code (TAC) for expressions with the following
grammar:
Grammar:
E → T { E.place = T.place; }
T → F { T.place = F.place; }
Explanation:
t1 = b + c
t2 = a * t1
For languages with basic type checking, an SDT can enforce type rules as part of the parsing
process.
Grammar:
E → E1 + T { if (E1.type == int && T.type == int) E.type = int; else E.type = float; }
E → T { E.type = T.type; }
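The type rule above can be sketched as a recursive function over a small expression tree. The encoding is an assumption for illustration: an addition is a tuple ("+", E1, T), and a leaf is a Python int or float literal whose type stands for T.type.

```python
def expr_type(expr):
    """Synthesize the type attribute of an expression bottom-up."""
    if isinstance(expr, tuple):                       # E -> E1 + T
        _, e1, t = expr
        if expr_type(e1) == "int" and expr_type(t) == "int":
            return "int"
        return "float"
    return "int" if isinstance(expr, int) else "float"   # leaf: E -> T


print(expr_type(("+", 1, 2)))                 # int
print(expr_type(("+", ("+", 1, 2), 3.5)))     # float
```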
In bottom-up parsing (used by LR parsers), SDTs typically use postfix actions that execute
once the parser has recognized the right-hand side of a production. This approach is suited to
synthesized attributes.
For example:
E → E + T { print('+') }
E → T
Explanation:
The semantic action { print('+') } will execute after parsing E + T completely, making
it suitable for postfix operations.
In top-down parsing (used by LL parsers), SDTs may place actions at the beginning (prefix)
or interspersed within the production. Inherited attributes can be handled more naturally in
top-down parsing.
Explanation:
Here, E.in and T.in are inherited attributes passed down the parse tree to set or check
types.
Aspect          | SDD                                     | SDT
Semantic Rules  | Attached to grammar as attribute rules. | Embedded in grammar rules as actions.
Execution Order | Determined by dependency graph.         | Procedural; follows parsing sequence.
Type Checking: Enforces type rules and consistency within expressions and
assignments.
Error Checking and Reporting: Provides semantic error checks, such as undeclared
variables or type mismatches.
Syntax Tree Construction: Builds syntax trees for further analysis or optimization.
Code Generation for Control Structures: Generates code for if-else, while, for
loops, etc.
L-attributed SDDs (Syntax Directed Definitions) are a specific type of SDD in which
attributes are evaluated following a left-to-right traversal of the parse tree. These SDDs allow
both synthesized attributes (computed based on the attributes of children nodes) and inherited
attributes (computed based on the attributes of parent or left sibling nodes). L-attributed
SDDs are especially suitable for top-down parsing and can be implemented in one left-to-
right pass if the attribute dependencies are set up correctly.
1. Attributes of the parent or left siblings (i.e., symbols that appear to the left in
the production rule).
2. No circular dependencies.
E → T E'
In this example:
1. E' inherits the partial result from the preceding computation through E'.inh, which is
based on the left sibling’s value.
2. This can be evaluated in a single left-to-right pass by ensuring that E'.inh is computed
before evaluating the rest of E'.
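The single left-to-right pass can be sketched with a recursive-descent parser in which the inherited attribute is a function parameter. The rules in the comments are assumptions consistent with the description above (the grammar E → T E', E' → + T E' | ε, T → num is taken from earlier in these notes), and the function names are hypothetical.

```python
def parse_E(tokens):
    # E -> T E'        { E'.inh = T.val; E.val = E'.syn }
    t_val, rest = parse_T(tokens)
    return parse_E_prime(rest, inh=t_val)


def parse_E_prime(tokens, inh):
    # E' -> + T E1'    { E1'.inh = E'.inh + T.val; E'.syn = E1'.syn }
    if tokens and tokens[0] == "+":
        t_val, rest = parse_T(tokens[1:])
        return parse_E_prime(rest, inh=inh + t_val)
    # E' -> ε          { E'.syn = E'.inh }
    return inh, tokens


def parse_T(tokens):
    # T -> num         { T.val = num.lexval }
    return int(tokens[0]), tokens[1:]


value, remaining = parse_E(["3", "+", "4", "+", "5"])
print(value, remaining)   # 12 []
```

Because inh is always computed from symbols to the left before E' is entered, no second pass over the tree is needed.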
Embed Semantic Rules: For each grammar production, write rules to compute
inherited and synthesized attributes.
Determine the Evaluation Order: Ensure that inherited attributes of a symbol are
computed before any dependent symbols are evaluated in each production. For top-
down parsers, this evaluation typically happens as the parser moves from left to right
within a rule.
Consider translating an expression into postfix notation. Here’s how we might structure this
in an L-attributed SDD.
Explanation
1. Attributes:
1. inh: Inherited attribute that passes context from parent to child or from left to
right within a production.
2. syn: Synthesized attribute that combines results from the children and is
returned to the parent.
2. Semantic Rules:
2. E' and T' inherit values from left siblings and pass them to the next sibling or
back to the parent.
2. Each production appends its results based on the rules specified in the
attributes.
Goal
Grammar
1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → (E)
6. F → id
Where:
E represents an expression.
T represents a term.
F represents a factor.
Attributes
1. Synthesized Attribute:
2. Inherited Attribute:
o T.inh, F.inh: These attributes store intermediate values needed for evaluation
(e.g., results from previous operations).
o tempCount: Used to generate unique temporary variable names (e.g., t1, t2,
…).
Semantic Rules
We'll add semantic rules to each production to translate the expression into three-
address code.
1. E → E1 + T
o E.place = newTemp()
2. E → T
o E.place = T.place
3. T → T1 * F
o T.place = newTemp()
4. T → F
o T.place = F.place
5. F → ( E )
o F.place = E.place
6. F → id
o F.place = lookup(id)
newTemp() is a function that generates a new temporary variable (e.g., t1, t2).
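The place rules above can be sketched as a small generator of three-address code. This is an assumption-based illustration: the tuple encoding of expressions, the helper make_generator, and the emitted instruction format are hypothetical, and lookup(id) is simplified to the identifier's own name.

```python
import itertools


def make_generator():
    """Return (gen, code): gen translates an expression and appends to code."""
    counter = itertools.count(1)
    code = []

    def new_temp():
        return f"t{next(counter)}"           # t1, t2, ...

    def gen(expr):
        """Return the place holding expr; expr is an id (str) or (op, left, right)."""
        if isinstance(expr, str):
            return expr                      # F -> id: F.place is the identifier
        op, left, right = expr
        left_place = gen(left)               # operands first, left to right
        right_place = gen(right)
        place = new_temp()                   # E.place / T.place = newTemp()
        code.append(f"{place} = {left_place} {op} {right_place}")
        return place

    return gen, code


gen, code = make_generator()
result = gen(("*", "a", ("+", "b", "c")))    # a * (b + c)
print("\n".join(code))
# t1 = b + c
# t2 = a * t1
```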
Example Translation
Derivation Steps
1. Start: E → E + T
2. Rewrite: E → T + T
3. Rewrite: E → F + T
4. Rewrite: E → id + T
5. Rewrite: T → T * F
6. Rewrite: T → F * F
7. Rewrite: F → id
2. TAC for 4 * 5: t3 = t1 * t2
Tac = t0 + t3