Module 2
Parse tree
o A parse tree is a graphical representation of a derivation. Each node is
labeled with a grammar symbol, which can be a terminal or a non-terminal.
o In parsing, the string is derived from the start symbol, so the root of
the parse tree is the start symbol.
o A parse tree reflects the precedence of operators. The deepest sub-tree
is evaluated first, so the operator in a parent node has lower precedence
than the operator in its sub-tree.
The parse tree follows these points:
o All leaf nodes have to be terminals.
MODULE 2
Automata and Compiler Design
o All interior nodes have to be non-terminals.
o Reading the leaf nodes from left to right gives the original input string.
Example:
Production rules:
1. T = T + T | T * T
2. T = a | b | c
Input: a * b + c
Steps 1 to 5: [figures showing the parse tree being built step by step]
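As a rough illustration of the points above, the parse tree for a * b + c can be written as nested data. The tuple representation and the helper names leaves and operator_depths are assumptions made for this sketch, and the tree shown is the one that gives * higher precedence than +.

```python
# A node is (non_terminal, children); a leaf is a terminal string.
# This tree applies T = T + T at the root and T = T * T in the left sub-tree.
tree = ("T", [
    ("T", [("T", ["a"]), "*", ("T", ["b"])]),
    "+",
    ("T", ["c"]),
])


def leaves(node):
    """Return the terminals at the leaves, read from left to right."""
    if isinstance(node, str):
        return [node]
    _, children = node
    result = []
    for child in children:
        result.extend(leaves(child))
    return result


def operator_depths(node, depth=0):
    """Return (operator, depth) pairs; the root is at depth 0."""
    if isinstance(node, str):
        return [(node, depth)] if node in ("+", "*") else []
    _, children = node
    found = []
    for child in children:
        found.extend(operator_depths(child, depth + 1))
    return found


print(" ".join(leaves(tree)))   # the leaves spell out the input string
print(operator_depths(tree))    # * sits deeper in the tree than +
```

The operator * appears deeper in the tree than +, so it is evaluated first.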
Ambiguity
A grammar is said to be ambiguous if there exists more than one leftmost
derivation, more than one rightmost derivation, or more than one parse
tree for some input string. If the grammar is not ambiguous, it is
called unambiguous.
Example:
1. S = aSb | SS
2. S = ε
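For the string aabb, the above grammar generates more than one parse tree:
one applies S = aSb twice and then S = ε, and another starts with S = SS
and derives ε from one of the two S symbols.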
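The ambiguity can be checked mechanically. The sketch below writes two different parse trees for aabb as nested tuples, confirms that every interior node uses one of the productions of the grammar, and confirms that both trees yield the same string. The representation and the helper names is_valid and yield_of are assumptions made for this illustration.

```python
# Right-hand sides of S = aSb | SS | ε (the empty tuple stands for ε).
PRODUCTIONS = {("a", "S", "b"), ("S", "S"), ()}


def symbol(node):
    """Grammar symbol that labels a node."""
    return node if isinstance(node, str) else node[0]


def is_valid(node):
    """True if every interior node expands S by one of the productions."""
    if isinstance(node, str):
        return node in ("a", "b")
    name, children = node
    rhs = tuple(symbol(child) for child in children)
    return name == "S" and rhs in PRODUCTIONS and all(is_valid(c) for c in children)


def yield_of(node):
    """String spelled by the leaves, read from left to right."""
    if isinstance(node, str):
        return node
    return "".join(yield_of(child) for child in node[1])


epsilon = ("S", [])                                    # S = ε
tree1 = ("S", ["a", ("S", ["a", epsilon, "b"]), "b"])  # S = aSb applied twice
tree2 = ("S", [epsilon, tree1])                        # S = SS at the root

for tree in (tree1, tree2):
    print(is_valid(tree), yield_of(tree))
```

Both trees are valid and both yield aabb, yet they are different trees, which is exactly what makes the grammar ambiguous.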
An ambiguous grammar is not suitable for compiler construction. No
general algorithm can detect and remove ambiguity automatically (the
problem is undecidable for context-free grammars), but a particular
ambiguity can often be removed by rewriting the grammar.
LL(k) Grammars:
An LL(k) grammar is a context-free grammar that can be parsed by a
predictive parser reading the input from left to right. "LL" stands for
"left-to-right scan, leftmost derivation", and "k" is the number of
lookahead symbols.
How it works
An LL(k) parser uses a predictive parsing table to parse input.
The table is constructed from a context-free grammar (CFG).
The parser reads the input from left to right, performing leftmost
derivation of the sentence.
The parser uses k tokens of lookahead when parsing a sentence.
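The steps above can be sketched as a small table-driven parser with k = 1. The grammar S = aSb | ε, the hand-written table, and the function name ll1_parse are assumptions chosen for this illustration only.

```python
# Predictive parsing table for the assumed grammar S = aSb | ε.
# Key: (non-terminal, lookahead token); value: right-hand side to expand.
TABLE = {
    ("S", "a"): ["a", "S", "b"],
    ("S", "b"): [],   # S = ε
    ("S", "$"): [],   # S = ε
}
NONTERMINALS = {"S"}


def ll1_parse(tokens, start="S"):
    """Return True if the token sequence is derivable from the start symbol."""
    tokens = list(tokens) + ["$"]   # $ marks the end of the input
    stack = ["$", start]            # the top of the stack is the list's end
    pos = 0
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, lookahead))
            if rhs is None:
                return False        # empty table cell: syntax error
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        elif top == lookahead:
            pos += 1                # terminal matched, consume the token
        else:
            return False
    return pos == len(tokens)


print(ll1_parse("aabb"))
print(ll1_parse("abb"))
```

Because each table cell holds at most one production, the parser never has to backtrack: the pair (non-terminal on top of the stack, next token) fully determines the next step.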
LL(k) grammar properties
Every LL(k) grammar is unambiguous and can be parsed deterministically.
The set of LL(k) languages is contained in the set of LL(k+1) languages.
Not all context-free languages can be recognized by an LL(k) parser.
LL(1) Parsing :
Here the first L indicates that the input is scanned from left to right,
and the second L indicates that the parser produces a leftmost
derivation. Finally, the 1 is the number of lookahead symbols, that is,
how many input symbols the parser examines when it has to make a
decision.
*ε denotes epsilon
Step 1: Check that the grammar satisfies the conditions required for LL(1) parsing. The grammar in this example does.
Step 2: Calculate first() and follow().
Production           First           Follow
E –> TE’             { id, ( }       { $, ) }
E’ –> +TE’ / ε       { +, ε }        { $, ) }
T –> FT’             { id, ( }       { +, $, ) }
T’ –> *FT’ / ε       { *, ε }        { +, $, ) }
F –> id / (E)        { id, ( }       { *, +, $, ) }
Parsing table:
       id           +             *             (            )           $
E      E –> TE’                                 E –> TE’
E’                  E’ –> +TE’                               E’ –> ε     E’ –> ε
T      T –> FT’                                 T –> FT’
T’                  T’ –> ε       T’ –> *FT’                 T’ –> ε     T’ –> ε
F      F –> id                                  F –> (E)
As the table shows, each null (ε) production is entered under the
terminals in the Follow set of its non-terminal, and every other
production is entered under the terminals in the First set of its
right-hand side.
Note: Not every grammar is suitable for an LL(1) parsing table. If some
cell contains more than one production, the grammar is not LL(1).
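The First and Follow sets and the table for the expression grammar of this example can also be computed by a short program. The following is a minimal sketch of the usual fixed-point computation; the dictionary layout and the function names are assumptions made for illustration, and E' and T' are written with an ASCII apostrophe.

```python
EPS = "ε"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["id"], ["(", "E", ")"]],
}
START = "E"


def first_of_seq(seq, first):
    """First set of a sequence of grammar symbols."""
    result = set()
    for sym in seq:
        if sym == EPS:
            continue
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)          # every symbol in seq can vanish
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:           # repeat until no set grows any more
        changed = False
        for nt, rules in GRAMMAR.items():
            for rhs in rules:
                new = first_of_seq(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add("$")
    changed = True
    while changed:
        changed = False
        for nt, rules in GRAMMAR.items():
            for rhs in rules:
                for i, sym in enumerate(rhs):
                    if sym not in GRAMMAR:
                        continue
                    tail = first_of_seq(rhs[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:      # what follows nt also follows sym
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def build_table(first, follow):
    """Map (non-terminal, terminal) to the list of productions in that cell."""
    table = {}
    for nt, rules in GRAMMAR.items():
        for rhs in rules:
            rhs_first = first_of_seq(rhs, first)
            targets = rhs_first - {EPS}
            if EPS in rhs_first:         # ε-production goes under Follow
                targets |= follow[nt]
            for terminal in targets:
                table.setdefault((nt, terminal), []).append(rhs)
    return table


first = compute_first()
follow = compute_follow(first)
table = build_table(first, follow)
is_ll1 = all(len(cell) == 1 for cell in table.values())
print(first, follow, is_ll1, sep="\n")
```

A cell that ends up with more than one production signals that the grammar is not LL(1); for this grammar every cell holds exactly one.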
Advantages of Construction of LL(1) Parsing Table
Clear Decision-Making: With an LL(1) parsing table, the parser can
decide what to do by looking at just one symbol ahead. This makes it
easy to choose the right rule without confusion or guessing.
Fast Parsing: Since there’s no need to go back and forth or guess the next
step, LL(1) parsing is quick and efficient. This is useful for applications
like compilers where speed is important.
Easy to Spot Errors: The table helps identify errors right away. If the
current symbol doesn’t match any rule in the table, the parser knows
there’s an error and can handle it immediately.
Simple to Implement: Once the table is set up, the parsing process is
straightforward. You just follow the instructions in the table, making it
easier to build and maintain.
Good for Predictive Parsing: LL(1) parsing is often called “predictive
parsing” because the table lets you predict the next steps based on the
input. This makes it reliable for parsing programming languages and
structured data.
Bottom-up Parsers
Bottom-up parsing is a type of syntax analysis method where the parser starts
from the input symbols (tokens) and attempts to reduce them to the start symbol
of the grammar (usually denoted as S). The process involves applying
production rules in reverse, starting from the leaves of the parse tree and
working upwards toward the root.
Start with tokens: The parser begins with the terminal symbols (the input
tokens), which are the leaves of the parse tree.
Shift and reduce: The parser repeatedly applies two actions:
Shift: The next token is pushed onto a stack.
Reduce: A sequence of symbols on the stack is replaced by a non-
terminal according to the production rules of the grammar. This step is
called “reduction,” where the parser replaces the right-hand side of a
production with the left-hand side non-terminal.
Repeat until root: The process of shifting and reducing continues until the
entire input is reduced to the start symbol, indicating the sentence has been
successfully parsed.
Example:
1. E → T
2. T → T * F
3. T → id
4. F → T
5. F → id
Input string: “id * id”
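One possible sequence of reductions that takes id * id back to the start symbol E is sketched below. The order of reductions is picked by hand for illustration (a real bottom-up parser chooses it using its parsing table), and the helper name reduce_at is an assumption.

```python
# The five productions of the example, keyed by their rule number.
PRODUCTIONS = {
    1: ("E", ["T"]),
    2: ("T", ["T", "*", "F"]),
    3: ("T", ["id"]),
    4: ("F", ["T"]),
    5: ("F", ["id"]),
}


def reduce_at(form, index, rule):
    """Replace the right-hand side of `rule` found at `index` by its left-hand side."""
    lhs, rhs = PRODUCTIONS[rule]
    if form[index:index + len(rhs)] != rhs:
        raise ValueError(f"rule {rule} does not match at position {index}")
    return form[:index] + [lhs] + form[index + len(rhs):]


form = ["id", "*", "id"]
history = [form]
# (position, rule number) pairs, chosen by hand for this input.
for index, rule in [(0, 3), (2, 5), (0, 2), (0, 1)]:
    form = reduce_at(form, index, rule)
    history.append(form)

for step in history:
    print(" ".join(step))
```

Read from bottom to top, the printed forms are a rightmost derivation of id * id from E.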
SR Parsing
Shift-reduce parsing is a popular bottom-up technique used in syntax analysis,
where the goal is to create a parse tree for a given input based on grammar
rules. The process works by reading a stream of tokens (the input), and then
working backwards through the grammar rules to discover how the input can be
generated.
A shift-reduce parser uses the following components:
1. Input Buffer: This stores the string or sequence of tokens that needs to be
parsed.
2. Stack: The parser uses a stack to keep track of which symbols or parts of
the parse it has already processed. As it processes the input, symbols are
pushed onto and popped off the stack.
3. Parsing Table: Similar to a predictive parser, a parsing table helps the
parser decide what action to take next.
Shift-reduce parsing works by processing the input left to right and
gradually building up a parse tree by shifting tokens onto the stack and
reducing them using grammar rules, until it reaches the start symbol of
the grammar.
Four Main Operations of Shift Reduce Parser
Shift: Move the next input symbol onto the stack when no reduction is
possible.
Reduce: Replace a sequence of symbols at the top of the stack with the
left-hand side of a grammar rule.
Accept: Successfully complete parsing when the entire input is
processed and the stack contains only the start symbol.
Error: Handle unexpected or invalid input when no shift or reduce action
is possible.
Example
Stack            Input Buffer     Parsing Action
$                (a,(a,a))$       Shift
$(               a,(a,a))$        Shift
...              ...              ...
$(L,(L,S         ))$              Reduce L → L,S
$(L,(L           ))$              Shift
$(L,(L)          )$               Reduce S → (L)
$(L,S            )$               Reduce L → L,S
$(L              )$               Shift
$(L)             $                Reduce S → (L)
$S               $                Accept
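The full trace for the input (a,(a,a)) can be reproduced with a small program. The sketch below assumes the grammar S → (L) | a and L → L,S | S, which is inferred from the reductions in the example, and it uses a naive strategy (reduce whenever a right-hand side is on top of the stack, trying longer right-hand sides first) instead of a parsing table. That strategy happens to work for this grammar and input but is not a general method.

```python
# Assumed grammar, listed so that longer right-hand sides are tried first.
GRAMMAR = [
    ("S", ["(", "L", ")"]),
    ("L", ["L", ",", "S"]),
    ("S", ["a"]),
    ("L", ["S"]),
]
START = "S"


def shift_reduce(tokens):
    """Return (accepted, trace); each trace row is (stack, input, action)."""
    stack, buffer, trace = [], list(tokens), []
    while True:
        row = ("$" + "".join(stack), "".join(buffer) + "$")
        if stack == [START] and not buffer:
            trace.append(row + ("Accept",))
            return True, trace
        for lhs, rhs in GRAMMAR:
            if stack[-len(rhs):] == rhs:          # handle on top of the stack
                trace.append(row + (f"Reduce {lhs} -> {''.join(rhs)}",))
                del stack[-len(rhs):]
                stack.append(lhs)
                break
        else:                                     # no reduction was possible
            if not buffer:
                trace.append(row + ("Error",))
                return False, trace
            trace.append(row + ("Shift",))
            stack.append(buffer.pop(0))


accepted, trace = shift_reduce("(a,(a,a))")
for stack_text, input_text, action in trace:
    print(f"{stack_text:<12}{input_text:<12}{action}")
```

The rows printed for this input include every row shown in the example table, together with the intermediate shifts and reductions between them.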
LR Parser
An LR parser is a bottom-up parser for context-free grammars that is
widely used in programming language compilers and other associated
tools. An LR parser reads its input from left to right and produces a
rightmost derivation in reverse. It is called a bottom-up parser because
it builds the parse tree up from the leaves, reducing the input step by
step to the top-level grammar production. LR parsers are the most
powerful of the deterministic parsers used in practice.
LR parser algorithm :
The LR parsing algorithm is the same for all LR parsers; only the
parsing table differs from one parser to another. It uses the following
components.
1. Input Buffer –
It contains the given string, and it ends with a $ symbol.
2. Stack –
It holds the states pushed so far. The state on top of the stack and the
current input symbol together are used to index the parsing table in
order to make the parsing decisions.
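A minimal sketch of the driver loop shared by all LR parsers is given below. The grammar S → (S) | a and its action and goto tables were worked out by hand for this illustration and are assumptions, not part of the notes; only the table contents would change for a different grammar or a different LR variant.

```python
# Assumed grammar: 1: S -> ( S )    2: S -> a
# For each production: (left-hand side, length of the right-hand side).
PRODUCTIONS = {1: ("S", 3), 2: ("S", 1)}

# Hand-built tables for the assumed grammar.
ACTION = {
    (0, "("): ("shift", 2), (0, "a"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "("): ("shift", 2), (2, "a"): ("shift", 3),
    (3, ")"): ("reduce", 2), (3, "$"): ("reduce", 2),
    (4, ")"): ("shift", 5),
    (5, ")"): ("reduce", 1), (5, "$"): ("reduce", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 4}


def lr_parse(tokens):
    """Return True if the token sequence is accepted."""
    tokens = list(tokens) + ["$"]    # the input buffer ends with $
    stack = [0]                      # stack of states, start state at the bottom
    pos = 0
    while True:
        entry = ACTION.get((stack[-1], tokens[pos]))
        if entry is None:
            return False             # empty table cell: syntax error
        kind, arg = entry
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, length = PRODUCTIONS[arg]
            del stack[len(stack) - length:]         # pop one state per symbol
            stack.append(GOTO[(stack[-1], lhs)])    # follow the goto entry
        else:
            return True              # accept


print(lr_parse("((a))"))
print(lr_parse("(a"))
```

Each decision is made from just two things, the state on top of the stack and the current input symbol, as described above.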
SLR Parser
SLR stands for simple LR; the SLR parser is also called the simple LR
parser.
It is the weakest of the three methods (SLR, CLR, LALR) but the easiest
to implement.
A grammar for which an SLR parser can be constructed is called an SLR
grammar.
Steps for constructing the SLR parsing table
1. Write the augmented grammar.
2. Find the LR(0) collection of items.
3. Find FOLLOW of the LHS of each production.
4. Define the two functions of the parsing table: action[list of terminals]
and goto[list of non-terminals].
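Steps 1 and 2 can be sketched in code as follows. The augmented grammar S' → S, S → (S) | a is an assumption chosen to keep the example small, and the item encoding (production index, dot position) and the function names are likewise illustrative.

```python
# Augmented grammar (assumed for this sketch): production 0 is S' -> S.
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("(", "S", ")")),
    ("S", ("a",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add item B -> .gamma for every non-terminal B that follows a dot."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for index, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    work.append((index, 0))
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` in every item where that is possible."""
    moved = {
        (prod, dot + 1)
        for prod, dot in items
        if dot < len(GRAMMAR[prod][1]) and GRAMMAR[prod][1][dot] == symbol
    }
    return closure(moved)


def canonical_collection():
    """Return the list of LR(0) item sets and the transitions between them."""
    states = [closure({(0, 0)})]
    transitions = {}
    symbols = sorted({sym for _, rhs in GRAMMAR for sym in rhs})
    for state in states:             # the list grows while it is traversed
        for sym in symbols:
            target = goto(state, sym)
            if not target:
                continue
            if target not in states:
                states.append(target)
            transitions[(states.index(state), sym)] = states.index(target)
    return states, transitions


states, transitions = canonical_collection()
print(len(states), "item sets")
print(transitions)
```

From here, step 3 supplies the FOLLOW sets, and step 4 fills in the table: transitions on terminals become shift entries in action, transitions on non-terminals become goto entries, and each completed item A → α. adds a reduce entry under every terminal in FOLLOW(A).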
CLR Parser
CLR stands for canonical LR. The CLR parser is more powerful than the
SLR parser because it makes use of lookahead symbols. This method uses a
large set of items called LR(1) items. The main difference between LR(0)
and LR(1) items is that an LR(1) item carries more information in a
state, which rules out useless reduction states. This extra information
is incorporated into the state by the lookahead symbol.
The general syntax becomes [A -> α.β, a]
where A -> α.β is the production (with the dot marking the parser's
position) and a is a terminal or the right end marker $
LR(1) items = LR(0) items + lookahead
LALR Parser
LALR stands for lookahead LR. The LALR parser can handle a large class
of grammars: it is more powerful than SLR but less powerful than CLR.
The CLR parsing table is quite large compared with the other parsing
tables, and LALR reduces the size of this table. LALR works like CLR;
the only difference is that it combines the states of the CLR parsing
table that contain the same LR(0) items and differ only in their
lookaheads into one single state.
The general syntax becomes [A -> α.β, a]
where A -> α.β is the production (with the dot marking the parser's
position) and a is a terminal or the right end marker $
LR(1) items = LR(0) items + lookahead
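The merging step can be illustrated on a few hand-written CLR states. The item sets below come from the common textbook grammar S → CC, C → cC | d, which is not part of these notes and is assumed here only for illustration; each item is stored as a pair (LR(0) item, lookahead), and the function name merge_by_core is hypothetical.

```python
from collections import defaultdict


def merge_by_core(states):
    """Merge LR(1) states whose items agree once lookaheads are ignored."""
    merged = defaultdict(set)
    for items in states.values():
        core = frozenset(item for item, _ in items)   # drop the lookaheads
        merged[core] |= set(items)                    # union the full items
    return list(merged.values())


# Four CLR states of the assumed grammar, written out by hand.
clr_states = {
    "I3": {("C -> c.C", "c"), ("C -> c.C", "d"),
           ("C -> .cC", "c"), ("C -> .cC", "d"),
           ("C -> .d", "c"), ("C -> .d", "d")},
    "I6": {("C -> c.C", "$"), ("C -> .cC", "$"), ("C -> .d", "$")},
    "I4": {("C -> d.", "c"), ("C -> d.", "d")},
    "I7": {("C -> d.", "$")},
}

lalr_states = merge_by_core(clr_states)
print(len(clr_states), "CLR states ->", len(lalr_states), "LALR states")
```

States I3 and I6 share one core and I4 and I7 share another, so the four CLR states collapse into two LALR states whose lookahead sets are the unions of the originals.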
YACC tool