Unit III
Bottom up parsing
Bottom up parsing is also known as shift-reduce parsing.
Example
E→T
T→T*F
T → id
F→T
F → id
Parse Tree representation of input string "id * id" is as follows:
TOP-DOWN PARSING:
Top-down parsing technique parses the input and starts constructing a parse tree from the root
node gradually moving down to the leaf nodes. The types of top-down parsing are depicted
below:
Recursive descent is a top-down parsing technique that constructs the parse tree from the top
while reading the input from left to right. It uses a procedure for every non-terminal, and each
procedure matches terminals directly against the input. The technique recursively parses the input
to build a parse tree and may or may not require backtracking: a grammar that is not left-factored
generally forces the parser to backtrack. A form of recursive-descent parsing that does not require
any backtracking is known as predictive parsing.
This parsing technique is called recursive because it uses a context-free grammar, which is
recursive in nature.
Back-tracking
Top-down parsers start from the root node (the start symbol) and match the input string against the
production rules, replacing each non-terminal by a production body that matches. To understand
this, take the following example of a CFG:
S → rXd | rZd
X → oa | ea
Z → ai
For the input string read, a top-down parser behaves as follows:
It starts with S from the production rules and matches its yield against the left-most letter of
the input, i.e. 'r'. The very first production of S (S → rXd) matches it, so the parser advances
to the next input letter, 'e'. The parser tries to expand the non-terminal 'X' and checks its
productions from the left, starting with X → oa. This does not match the next input symbol, so the
parser backtracks and tries the next production of X, X → ea.
Now the parser matches all the input letters in order, and the string is accepted.
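The backtracking behaviour described above can be sketched in a few lines of Python. This is only an illustrative sketch: the names GRAMMAR, derive and trace are hypothetical, and the convention that uppercase letters are non-terminals is an assumption made for the example.

```python
# Grammar from the text; uppercase letters are non-terminals (assumed convention).
GRAMMAR = {
    "S": ["rXd", "rZd"],
    "X": ["oa", "ea"],
    "Z": ["ai"],
}

def derive(symbols, text, trace):
    """Return True if the symbol string `symbols` can derive exactly `text`."""
    if not symbols:
        return not text                      # success only if the input is used up too
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:                      # non-terminal: try alternatives left to right
        for body in GRAMMAR[head]:
            trace.append(f"try {head} -> {body}")
            if derive(body + rest, text, trace):
                return True
            trace.append(f"backtrack from {head} -> {body}")
        return False
    # terminal: it must equal the next input letter
    return bool(text) and text[0] == head and derive(rest, text[1:], trace)

trace = []
accepted = derive("S", "read", trace)
print(accepted)
print("\n".join(trace))
```

For the input read, the trace records that X → oa is tried first, abandoned, and replaced by X → ea, which mirrors the walk-through above.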
In its predictive form, a recursive descent parser performs top-down parsing without backtracking.
It can be defined as a parser that uses a set of recursive procedures to process the input string
with no backtracking, and it is easily implemented in any language that supports recursion. For
this to work, the first symbol of the right-hand side of each production must uniquely determine
the correct alternative to choose.
The major approach of recursive-descent parsing is to relate each non-terminal with a procedure.
The objective of each procedure is to read a sequence of input characters that can be produced by
the corresponding non-terminal, and return a pointer to the root of the parse tree for the non-
terminal. The structure of the procedure is prescribed by the productions for the equivalent non-
terminal.
The recursive procedures are simple to write and reasonably efficient if written in a language
that implements procedure calls efficiently. There is a procedure for each non-terminal in the
grammar. The parser keeps a global variable lookahead, which holds the current input token, and a
procedure match(ExpectedToken), which recognizes the next token and advances the input stream
pointer so that lookahead points to the next token to be parsed. match() is effectively a call
to the lexical analyzer to get the next token.
lookahead == a
match()
lookahead == +
match ()
lookahead == b
……………………….
……………………….
In this manner, parsing can be done.
EXAMPLE:
Automata Theory & Compiler Design Mr. P.Krishnamoorthy Page 4
Department of Computer Science and Engineering
Write down the algorithm using Recursive procedures to implement the following Grammar.
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → ∗FT′ | ε
F → (E) | id
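One possible answer to the exercise is sketched below in Python, with one method per non-terminal. It assumes that E′ has an ε alternative alongside +TE′ (without it the grammar would generate no finite string), that the input is already split into tokens, and that "$" marks the end of input. The class name Parser and the helper accepts are hypothetical names chosen for the example.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" is an assumed end-of-input marker
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        """Recognize the expected token and advance the input pointer."""
        if self.lookahead != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1

    def E(self):                 # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):           # E' -> + T E' | ε
        if self.lookahead == "+":
            self.match("+")
            self.T()
            self.E_prime()
        # any other lookahead selects the ε alternative

    def T(self):                 # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):           # T' -> * F T' | ε
        if self.lookahead == "*":
            self.match("*")
            self.F()
            self.T_prime()

    def F(self):                 # F -> ( E ) | id
        if self.lookahead == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.match("id")

def accepts(tokens):
    parser = Parser(tokens)
    try:
        parser.E()
        parser.match("$")        # the whole input must be consumed
        return True
    except SyntaxError:
        return False

print(accepts(["id", "+", "id", "*", "id"]))
```

Each procedure decides which alternative to use by inspecting lookahead only, so no backtracking is needed for this grammar.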
One major drawback of recursive-descent parsing is that it can be implemented only in languages
that support recursive procedure calls; it also suffers from the problem of left recursion.
PREDICTIVE PARSER:
A predictive parser is a recursive descent parser that can predict which production to use to
expand the current non-terminal. The predictive parser does not suffer from backtracking.
To accomplish this, the predictive parser uses a look-ahead pointer, which points to the next
input symbols. To make the parser free of backtracking, the predictive parser puts some constraints
on the grammar and accepts only a class of grammars known as LL(k) grammars.
Predictive parsing uses a stack and a parsing table to parse the input and generate a parse tree.
Both the stack and the input contain an end symbol $ to denote that the stack is empty and the
input is consumed. The parser consults the parsing table to decide what to do for each combination
of input symbol and stack element.
In recursive descent parsing, the parser may have more than one production to choose from for a
single instance of input, whereas in a predictive parser each step has at most one production to
choose. There might be instances where no production matches the input string, causing the
parsing procedure to fail.
LL Parser
An LL parser is denoted as LL(k). The first L in LL(k) stands for parsing the input from left to
right, the second L stands for left-most derivation, and k is the number of lookahead symbols.
In practice k is usually 1, in which case the parser is an LL(1) parser.
LL Parsing Algorithm
We stick to deterministic LL(1) parsers for the explanation, as the size of the parsing table
grows exponentially with the value of k. Moreover, if a given grammar is not LL(1), then usually
it is not LL(k) for any given k.
Input:
    string ω
    parsing table M for grammar G
Output:
    If ω is in L(G) then left-most derivation of ω,
    error otherwise.
repeat
    let X be the top stack symbol and a the symbol pointed to by ip.
    if X ∈ Vt or $
        if X = a
            POP X and advance ip.
        else
            error()
        endif
    else /* X is non-terminal */
        if M[X,a] = X → Y1, Y2,... Yk
            POP X
            PUSH Yk, Yk-1,... Y1 /* Y1 on top */
            Output the production X → Y1, Y2,... Yk
        else
            error()
        endif
    endif
until X = $ /* empty stack */
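The algorithm above can be sketched in Python as follows. The parsing table is not given at this point in the notes, so the TABLE below is an assumption: it was worked out by hand for the expression grammar E → TE′, E′ → +TE′ | ε, T → FT′, T′ → ∗FT′ | ε, F → (E) | id used in the earlier recursive-descent example. The function name ll1_parse is hypothetical.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# Assumed LL(1) table M[X, a]; an empty list stands for an ε production.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}

def ll1_parse(tokens, start="E"):
    """Return the productions of the left-most derivation, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]                 # start symbol on top of $
    ip = 0
    output = []
    while True:
        X, a = stack[-1], tokens[ip]
        if X not in NONTERMINALS:        # X is a terminal or $
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            if X == "$":                 # X = a = $: input accepted
                return output
            stack.pop()
            ip += 1
        else:                            # X is a non-terminal
            body = TABLE.get((X, a))
            if body is None:
                raise SyntaxError(f"no entry M[{X}, {a}]")
            stack.pop()
            stack.extend(reversed(body))  # push Yk ... Y1 so that Y1 ends up on top
            output.append(f"{X} -> {' '.join(body) or 'ε'}")

for production in ll1_parse(["id", "+", "id", "*", "id"]):
    print(production)
```

Because every table entry holds at most one production, the loop never has to choose between alternatives.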
A table-driven predictive parser has an input buffer, a stack, a parsing table, and an output
stream. The input buffer contains the string to be parsed, followed by $, a symbol used as a right
end marker to indicate the end of the input string. The stack contains a sequence of grammar
symbols with $ on the bottom, indicating the bottom of the stack. Initially, the stack contains the
start symbol of the grammar on top of $. The parsing table is a two dimensional array M[A,a]
where A is a non-terminal, and a is a terminal or the symbol $. The parser is controlled by a
program that behaves as follows. The program considers X, the symbol on the top of the stack,
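and a, the current input symbol. These two symbols determine the action of the parser. There are
three possibilities:
1. If X = a = $, the parser halts and announces successful completion of parsing.
2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next
input symbol.
3. If X is a non-terminal, the parser consults entry M[X,a] of the parsing table. If the entry is
a production X → Y1 Y2 ... Yk, the parser replaces X on top of the stack by Yk ... Y1 (with Y1 on
top); if the entry is empty, the parser reports an error.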
Each entry of the parsing table holds at most one production, so no location has multiple entries.
A grammar whose parsing table has this property is called an LL(1) grammar.
Simulation of the parser for the input string id+id*id
Since a table entry contains more than one production, the grammar is not an LL(1) grammar.
4. Parse the given input string using stack and parsing table
BOTTOM-UP PARSER:
Bottom-up parsing starts from the leaf nodes of a tree and works upward until it reaches the
root node. Here, we start from a sentence and apply the production rules in reverse in order to
reach the start symbol. The image given below depicts the bottom-up parsers available.
Handle:
• Informally, a handle of a string is a substring that matches the right side of a production
rule.
– But not every substring that matches the right side of a production rule is a handle.
– Formally, if S ⇒* αAω ⇒ αβω is a rightmost derivation, then the production A → β, at the
position following α, is a handle of αβω.
• If the grammar is unambiguous, then every right-sentential form of the grammar has
exactly one handle.
Handle Pruning:
• Start from γn, find a handle An → βn in γn, and replace βn in γn by An to get γn-1.
• Then find a handle An-1 → βn-1 in γn-1, and replace βn-1 in γn-1 by An-1 to get γn-2.
• Repeat this until we reach S.
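E → E+T | T
T → T*F | F
F → (E) | id
Rightmost derivation of id+id*id, with the production applied at each step. Read from the bottom
up, each row shows the production whose handle is reduced:
Right-sentential form    Production
E+T                      E → E+T
E+T*F                    T → T*F
E+T*id                   F → id
E+F*id                   T → F
E+id*id                  F → id
T+id*id                  E → T
F+id*id                  T → F
id+id*id                 F → id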
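The reductions for id+id*id can be replayed mechanically. The sketch below does not find handles by itself; it only applies a given list of handles (production plus position) and records every right-sentential form on the way to E. The names reduce_once and STEPS, and the token positions, are assumptions made for the example.

```python
def reduce_once(form, lhs, body, position):
    """Replace the handle `body` found at `position` in `form` by `lhs`."""
    if form[position:position + len(body)] != body:
        raise ValueError("handle not found at this position")
    return form[:position] + [lhs] + form[position + len(body):]

# (left side, right side, position of the handle in the current form)
STEPS = [
    ("F", ["id"], 0),
    ("T", ["F"], 0),
    ("E", ["T"], 0),
    ("F", ["id"], 2),
    ("T", ["F"], 2),
    ("F", ["id"], 4),
    ("T", ["T", "*", "F"], 2),
    ("E", ["E", "+", "T"], 0),
]

form = ["id", "+", "id", "*", "id"]
forms = ["".join(form)]
for lhs, body, position in STEPS:
    form = reduce_once(form, lhs, body, position)
    forms.append("".join(form))

print(" <= ".join(forms))
```

Reading the printed chain from right to left gives the rightmost derivation of id+id*id from E.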
Shift-Reduce Parsing
Parse Tree:
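token = next_token()
repeat forever
    s = top of stack
    if action[s, token] = "shift si" then
        PUSH token; PUSH si; token = next_token()
    else if action[s, token] = "reduce A → β" then
        POP 2 * |β| symbols; s = top of stack; PUSH A; PUSH goto[s, A]
    else if action[s, token] = "accept" then return
    else
        error()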
There are context-free grammars for which shift-reduce parsers cannot be used.
The stack contents and the next input symbol may not be enough to decide the action:
shift/reduce conflict: The parser cannot decide whether to shift or to reduce.
reduce/reduce conflict: The parser cannot decide which of several reductions to
make.
If a shift-reduce parser cannot be used for a grammar, that grammar is called a non-LR(k)
grammar.
An ambiguous grammar can never be an LR grammar.
LR Parser
The LR parser is a non-recursive, shift-reduce, bottom-up parser. It handles a wide class of
context-free grammars, which makes it the most general of the efficient syntax analysis techniques.
LR parsers are also known as LR(k) parsers, where L stands for left-to-right scanning of the input
stream, R stands for the construction of a right-most derivation in reverse, and k denotes the
number of lookahead symbols used to make decisions.
There are three widely used algorithms available for constructing an LR parser:
SLR Parser:
SLR is simple LR. It accepts the smallest class of grammars of the three and has few states. An
SLR parser is very easy to construct and is similar to an LR(0) parser. The only difference between
them is that an LR(0) parsing table is prone to shift-reduce conflicts, because 'reduce' is entered
under every terminal in a state that contains a completed item. SLR avoids many of these conflicts
by entering 'reduce' only under the terminals in FOLLOW of the LHS of the completed production.
This is called the SLR(1) collection of items.
Steps for constructing the SLR parsing table :
1. Writing augmented grammar
2. LR(0) collection of items to be found
3. Find FOLLOW of LHS of production
4. Defining 2 functions: goto[list of non-terminals] and action[list of terminals] in the parsing table
EXAMPLE
Construct LR parsing table for the given context-free grammar.
S–>AA
A–>aA|b
Solution:
STEP1 – Find augmented grammar
The augmented grammar of the given grammar is:-
S′–>.S [0th production]
S–>.AA [1st production]
A–>.aA [2nd production]
A–>.b [3rd production]
STEP2 – Find LR(0) collection of items. Below is the figure showing the LR(0) collection of
items. We will understand everything one by one.
I0 goes to I1 when the '.' of the 0th production is shifted to the right of S (S′->S.). This state
is the accepting state. S has been seen by the parser.
I0 goes to I2 when the '.' of the 1st production is shifted to the right (S->A.A). A has been
seen by the parser.
I0 goes to I3 when the '.' of the 2nd production is shifted to the right (A->a.A). a has been
seen by the parser.
I0 goes to I4 when the '.' of the 3rd production is shifted to the right (A->b.). b has been
seen by the parser.
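Since the figure with the LR(0) collection is not reproduced here, the construction can be sketched in Python. The sketch below builds the canonical LR(0) collection for S → AA, A → aA | b by repeated closure and goto. Items are (production index, dot position) pairs; the function names and the fixed symbol order S, A, a, b (chosen so that I1 to I4 come out as in the text) are assumptions made for the example.

```python
GRAMMAR = [                      # production 0 is the augmented production
    ("S'", ("S",)),
    ("S", ("A", "A")),
    ("A", ("a", "A")),
    ("A", ("b",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
SYMBOLS = ["S", "A", "a", "b"]   # order in which transitions are explored

def closure(items):
    """Add (i, 0) for every production i of a non-terminal that follows a dot."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == body[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)

def goto(state, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(p, d + 1) for p, d in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved) if moved else None

def canonical_collection():
    states = [closure({(0, 0)})]
    transitions = {}
    for state in states:                 # the list grows while it is traversed
        for symbol in SYMBOLS:
            target = goto(state, symbol)
            if target is None:
                continue
            if target not in states:
                states.append(target)
            transitions[(states.index(state), symbol)] = states.index(target)
    return states, transitions

states, transitions = canonical_collection()
print(len(states), "states")
for (source, symbol), target in sorted(transitions.items()):
    print(f"I{source} --{symbol}--> I{target}")
```

With this exploration order the sketch yields seven states, I0 to I6, where I5 contains S → AA. and I6 contains A → aA.; the numbering of I5 and I6 is a consequence of the assumed order, not something stated in the notes.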
STEP 4 – Define two functions, goto[list of non-terminals] and action[list of terminals], in the
parsing table. Below is the SLR parsing table.
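The table itself is not reproduced in these notes, so the sketch below uses a table worked out by hand as an assumption: states I0 to I4 as described above, I5 = {S → AA.}, I6 = {A → aA.}, FOLLOW(S) = {$} and FOLLOW(A) = {a, b, $}. The driver keeps only states on the stack, which is a common simplification; the names ACTION, GOTO, PRODUCTIONS and lr_parse are hypothetical.

```python
# production number -> (left side, length of right side)
PRODUCTIONS = {1: ("S", 2), 2: ("A", 2), 3: ("A", 1)}   # S->AA, A->aA, A->b

ACTION = {
    (0, "a"): ("shift", 3), (0, "b"): ("shift", 4),
    (1, "$"): ("accept", None),
    (2, "a"): ("shift", 3), (2, "b"): ("shift", 4),
    (3, "a"): ("shift", 3), (3, "b"): ("shift", 4),
    (4, "a"): ("reduce", 3), (4, "b"): ("reduce", 3), (4, "$"): ("reduce", 3),
    (5, "$"): ("reduce", 1),
    (6, "a"): ("reduce", 2), (6, "b"): ("reduce", 2), (6, "$"): ("reduce", 2),
}
GOTO = {(0, "S"): 1, (0, "A"): 2, (2, "A"): 5, (3, "A"): 6}

def lr_parse(text):
    """Return the list of production numbers used, or None on a syntax error."""
    tokens = list(text) + ["$"]
    stack = [0]                          # stack of state numbers only
    pos = 0
    reductions = []
    while True:
        kind, arg = ACTION.get((stack[-1], tokens[pos]), ("error", None))
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, length = PRODUCTIONS[arg]
            del stack[len(stack) - length:]          # pop one state per RHS symbol
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(arg)
        elif kind == "accept":
            return reductions
        else:
            return None

print(lr_parse("abb"))
print(lr_parse("ab"))
```

The reductions are listed in the order they are performed, which is the rightmost derivation in reverse.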
S→E
E→E+T|T
T→T*F|F
F → id
Solution:
Add Augment Production and insert '•' symbol at the first position for every production in G
S` → •E
E → •E + T
E → •T
T → •T * F
T → •F
F → •id
I0 State:
Add Augment production to the I0 State and Compute the Closure
I0 = Closure (S` → •E)
Add all productions starting with E in to I0 State because "." is followed by the non-terminal. So,
the I0 State becomes
I0 = S` → •E
E → •E + T
E → •T
Add all productions starting with T and F in modified I0 State because "." is followed by the
non-terminal. So, the I0 State becomes.
I0= S` → •E
E → •E + T
E → •T
T → •T * F
T → •F
F → •id
I1= Go to (I0, E) = closure (S` → E•, E → E• + T)
I2= Go to (I0, T) = closure (E → T•, T → T• * F)
I3= Go to (I0, F) = Closure ( T → F• ) = T → F•
I4= Go to (I0, id) = closure ( F → id•) = F → id•
I5= Go to (I1, +) = Closure (E → E +•T)
Add all productions starting with T and F in I5 State because "." is followed by the non-terminal.
So, the I5 State becomes
I5 = E → E +•T
T → •T * F
T → •F
F → •id
I6= Go to (I2, *) = Closure (T → T * •F)
Add all productions starting with F in I6 State because "." is followed by the non-terminal. So,
the I6 State becomes
I6 = T → T * •F
F → •id
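The closure and goto steps used for this grammar can be checked with a short Python sketch. Items are (left side, right side, dot position) triples; the helper show, which prints an item with a • marker, is a hypothetical name. The sketch takes I6 to be Go to (I2, *), which is what the items listed for I6 imply.

```python
GRAMMAR = [
    ("S'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    """Keep adding X -> •γ for every non-terminal X that directly follows a dot."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, body, dot in list(result):
            if dot < len(body) and body[dot] in NONTERMINALS:
                for lhs2, body2 in GRAMMAR:
                    if lhs2 == body[dot] and (lhs2, body2, 0) not in result:
                        result.add((lhs2, body2, 0))
                        changed = True
    return result

def goto(items, symbol):
    """Advance the dot over `symbol` wherever possible, then take the closure."""
    return closure({(lhs, body, dot + 1) for lhs, body, dot in items
                    if dot < len(body) and body[dot] == symbol})

def show(items):
    return sorted(f"{lhs} -> {' '.join(body[:dot] + ('•',) + body[dot:])}"
                  for lhs, body, dot in items)

I0 = closure({("S'", ("E",), 0)})
I1 = goto(I0, "E")
I2 = goto(I0, "T")
I5 = goto(I1, "+")
I6 = goto(I2, "*")
for name, state in [("I0", I0), ("I2", I2), ("I5", I5), ("I6", I6)]:
    print(name, show(state))
```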
Drawing DFA:
CLR refers to canonical LR. CLR parsing uses the canonical collection of LR (1) items to
build the CLR (1) parsing table. The CLR (1) parsing table has more states than
the SLR (1) parsing table.
In CLR (1), we place the reduce entries only under the lookahead symbols.
LR (1) item
An LR (1) item is an LR (0) item together with a lookahead symbol.
The lookahead determines where the reduce entry for a final item is placed.
The augmented production always has $ as its lookahead.
S → AA
A → aA
A→b
Solution:
Add Augment Production, insert '•' symbol at the first position for every production in G and
also add the lookahead.
S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
I0 State:
Add Augment production to the I0 State and Compute the Closure
I0 = Closure (S` → •S)
Add all productions starting with S in to I0 State because "." is followed by the non-terminal. So,
the I0 State becomes
I0 = S` → •S, $
S → •AA, $
Add all productions starting with A in modified I0 State because "." is followed by the non-
terminal. So, the I0 State becomes.
I0= S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
I1= Go to (I0, S) = closure (S` → S•, $) = S` → S•, $
I2= Go to (I0, A) = closure ( S → A•A, $ )
Add all productions starting with A in I2 State because "." is followed by the non-terminal. So,
the I2 State becomes
I2= S → A•A, $
A → •aA, $
A → •b, $
I3= Go to (I0, a) = Closure ( A → a•A, a/b )
Add all productions starting with A in I3 State because "." is followed by the non-terminal. So,
the I3 State becomes
I3= A → a•A, a/b
A → •aA, a/b
A → •b, a/b
Go to (I3, a) = Closure (A → a•A, a/b) = (same as I3)
Go to (I3, b) = Closure (A → b•, a/b) = (same as I4)
I4= Go to (I0, b) = closure ( A → b•, a/b) = A → b•, a/b
I5= Go to (I2, A) = Closure (S → AA•, $) =S → AA•, $
I6= Go to (I2, a) = Closure (A → a•A, $)
Add all productions starting with A in I6 State because "." is followed by the non-terminal. So,
the I6 State becomes
I6 = A → a•A, $
A → •aA, $
A → •b, $
Go to (I6, a) = Closure (A → a•A, $) = (same as I6)
Go to (I6, b) = Closure (A → b•, $) = (same as I7)
I7= Go to (I2, b) = Closure (A → b•, $) = A → b•, $
I8= Go to (I3, A) = Closure (A → aA•, a/b) = A → aA•, a/b
I9= Go to (I6, A) = Closure (A → aA•, $) = A → aA•, $
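The ten item sets above can be reproduced with a short Python sketch of the LR(1) construction. Items are (production index, dot position, lookahead) triples. The sketch assumes the grammar has no ε-productions, so the FIRST set of a string is the FIRST set of its first symbol; the function names and the exploration order S, A, a, b are also assumptions made for the example.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("A", "A")), ("A", ("a", "A")), ("A", ("b",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
SYMBOLS = ["S", "A", "a", "b"]

def compute_first():
    """FIRST sets by fixed-point iteration (valid without ε-productions)."""
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, body in GRAMMAR:
            new = first[body[0]] if body[0] in NONTERMINALS else {body[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

FIRST = compute_first()

def closure(items):
    result = set(items)
    work = list(items)
    while work:
        prod, dot, la = work.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            rest = body[dot + 1:]
            if rest:     # lookaheads come from the symbol after the non-terminal
                lookaheads = FIRST[rest[0]] if rest[0] in NONTERMINALS else {rest[0]}
            else:        # nothing follows: inherit the item's own lookahead
                lookaheads = {la}
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == body[dot]:
                    for b in lookaheads:
                        if (i, 0, b) not in result:
                            result.add((i, 0, b))
                            work.append((i, 0, b))
    return frozenset(result)

def goto(state, symbol):
    moved = {(p, d + 1, la) for p, d, la in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved) if moved else None

def canonical_collection():
    states = [closure({(0, 0, "$")})]
    transitions = {}
    for state in states:                 # the list grows while it is traversed
        for symbol in SYMBOLS:
            target = goto(state, symbol)
            if target is None:
                continue
            if target not in states:
                states.append(target)
            transitions[(states.index(state), symbol)] = states.index(target)
    return states, transitions

states, transitions = canonical_collection()
print(len(states), "LR(1) states")
```

With the assumed exploration order, the state numbers agree with I0 to I9 in the derivation above.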
Drawing DFA:
LALR refers to lookahead LR. To construct the LALR (1) parsing table, we use the canonical
collection of LR (1) items.
In LALR (1) parsing, the LR (1) item sets that have the same productions but different lookaheads
are combined to form a single set of items.
LALR (1) parsing is the same as CLR (1) parsing; the only difference is in the parsing table.
S → AA
A → aA
A→b
Solution:
Add Augment Production, insert '•' symbol at the first position for every production in G and
also add the look ahead.
S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
I0 State:
Add Augment production to the I0 State and Compute the Closure
I0 = Closure (S` → •S)
Add all productions starting with S in to I0 State because "•" is followed by the non-terminal. So,
the I0 State becomes
I0 = S` → •S, $
S → •AA, $
Add all productions starting with A in modified I0 State because "•" is followed by the non-
terminal. So, the I0 State becomes.
I0= S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
I1= Go to (I0, S) = closure (S` → S•, $) = S` → S•, $
I2= Go to (I0, A) = closure ( S → A•A, $ )
Add all productions starting with A in I2 State because "•" is followed by the non-terminal. So,
the I2 State becomes
I2= S → A•A, $
A → •aA, $
A → •b, $
I3= Go to (I0, a) = Closure ( A → a•A, a/b )
Add all productions starting with A in I3 State because "•" is followed by the non-terminal. So,
the I3 State becomes
I3= A → a•A, a/b
A → •aA, a/b
A → •b, a/b
Go to (I3, a) = Closure (A → a•A, a/b) = (same as I3)
Go to (I3, b) = Closure (A → b•, a/b) = (same as I4)
I4= Go to (I0, b) = closure ( A → b•, a/b) = A → b•, a/b
I5= Go to (I2, A) = Closure (S → AA•, $) =S → AA•, $
I6= Go to (I2, a) = Closure (A → a•A, $)
Add all productions starting with A in I6 State because "•" is followed by the non-terminal. So,
the I6 State becomes
I6 = A → a•A, $
A → •aA, $
A → •b, $
Go to (I6, a) = Closure (A → a•A, $) = (same as I6)
Go to (I6, b) = Closure (A → b•, $) = (same as I7)
I7= Go to (I2, b) = Closure (A → b•, $) = A → b•, $
I8= Go to (I3, A) = Closure (A → aA•, a/b) = A → aA•, a/b
I9= Go to (I6, A) = Closure (A → aA•, $) = A → aA•, $
On inspection, the LR (0) items of I3 and I6 are the same; the two states differ only in their
lookaheads.
I3 = { A → a•A, a/b
A → •aA, a/b
A → •b, a/b
}
I6= { A → a•A, $
A → •aA, $
A → •b, $
}
Clearly I3 and I6 have the same LR (0) items but differ in their lookaheads, so we can
combine them into a single state called I36.
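The merging step can be sketched in Python by grouping the LR (1) states by their LR (0) core. The ten states are copied from the derivation above, with "a/b" standing for the two lookaheads a and b. The names LR1_STATES and merge_by_core are hypothetical. Applied to all states, the same rule also pairs I4 with I7 and I8 with I9; that is the output of this sketch rather than something the notes state.

```python
LR1_STATES = {
    0: [("S' -> •S", "$"), ("S -> •AA", "$"), ("A -> •aA", "a/b"), ("A -> •b", "a/b")],
    1: [("S' -> S•", "$")],
    2: [("S -> A•A", "$"), ("A -> •aA", "$"), ("A -> •b", "$")],
    3: [("A -> a•A", "a/b"), ("A -> •aA", "a/b"), ("A -> •b", "a/b")],
    4: [("A -> b•", "a/b")],
    5: [("S -> AA•", "$")],
    6: [("A -> a•A", "$"), ("A -> •aA", "$"), ("A -> •b", "$")],
    7: [("A -> b•", "$")],
    8: [("A -> aA•", "a/b")],
    9: [("A -> aA•", "$")],
}

def merge_by_core(states):
    """Merge states whose items agree once the lookaheads are ignored."""
    groups = {}                                   # core -> list of state numbers
    for number, items in states.items():
        core = frozenset(item for item, _ in items)
        groups.setdefault(core, []).append(number)
    merged = {}
    for numbers in groups.values():
        name = "I" + "".join(str(n) for n in numbers)
        lookaheads = {}                           # item -> union of lookaheads
        for n in numbers:
            for item, la in states[n]:
                lookaheads.setdefault(item, set()).update(la.split("/"))
        merged[name] = lookaheads
    return merged

lalr_states = merge_by_core(LR1_STATES)
for name, items in lalr_states.items():
    for item, lookaheads in items.items():
        print(f"{name}: {item}, {'/'.join(sorted(lookaheads))}")
```

The ten LR (1) states collapse to seven LALR (1) states, and each merged item carries the union of the lookaheads of the items it replaces.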
Drawing DFA:
LL vs. LR Comparison :
LL | LR
Starts with the root nonterminal on the stack. | Ends with the root nonterminal on the stack.
Builds the parse tree top-down. | Builds the parse tree bottom-up.
Continuously pops a nonterminal off the stack and pushes the corresponding right-hand side. | Tries to recognize a right-hand side on the stack, pops it, and pushes the corresponding nonterminal.
Reads the terminals when it pops one off the stack. | Reads the terminals while it pushes them on the stack.
Pre-order traversal of the parse tree. | Post-order traversal of the parse tree.