LR parsing techniques
SLR
Simple LR parsing
Easy to implement, not strong enough
Uses LR(0) items
Canonical LR (or LR method)
Larger parser but powerful
Uses LR(1) items
LALR
Condensed version of canonical LR, often used in practice
May introduce conflicts
Uses LR(1) items
LR parsing
LR ⊃ LL, i.e., can parse more grammars than predictive parsers
Typically, LR parsing works by building an automaton where each
state represents what has been parsed so far and what we hope to parse
in the future.
In other words, states contain productions with dots (as we shall see
shortly).
Such productions are called items.
That is, the parser uses a DFA to make its shift/reduce decisions.
Model of an LR Parser
Behavior of LR Parser
If action[sm, ai] = shift s, then push s and advance the input:
(s0 s1 s2 … sm s, ai+1 … an $)
If action[sm, ai] = reduce A → β and goto[sm-r, A] = s with r = |β|, then
pop r symbols and push s:
(s0 s1 s2 … sm-r s, ai ai+1 … an $)
If action[sm, ai] = accept, then stop.
If action[sm, ai] = error, then attempt recovery.
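The four cases above can be sketched as a small table-driven loop. The grammar (1) S → aS, (2) S → b and its hand-built SLR table are illustrative assumptions, not taken from the slides; the names lr_parse, ACTION, GOTO and PRODUCTIONS are likewise hypothetical.

```python
# Hypothetical example grammar (not from the slides):
#   (1) S -> a S      (2) S -> b
# SLR table built by hand for the augmented grammar S' -> S.
PRODUCTIONS = {1: ("S", 2), 2: ("S", 1)}  # number -> (left side, length of right side)
ACTION = {
    (0, "a"): ("shift", 2), (0, "b"): ("shift", 3),
    (1, "$"): ("accept",),
    (2, "a"): ("shift", 2), (2, "b"): ("shift", 3),
    (3, "$"): ("reduce", 2),
    (4, "$"): ("reduce", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 4}


def lr_parse(tokens):
    """Run the LR driver and return every configuration (stack, remaining input)."""
    stack = [0]                      # s0
    rest = list(tokens) + ["$"]      # ai ... an $
    trace = [(tuple(stack), tuple(rest))]
    while True:
        act = ACTION.get((stack[-1], rest[0]), ("error",))
        if act[0] == "shift":        # push s, advance the input
            stack.append(act[1])
            rest.pop(0)
        elif act[0] == "reduce":     # pop r = |beta| states, push goto[s(m-r), A]
            lhs, r = PRODUCTIONS[act[1]]
            del stack[len(stack) - r:]
            stack.append(GOTO[stack[-1], lhs])
        elif act[0] == "accept":
            return trace
        else:                        # no recovery in this sketch: just report
            raise SyntaxError(f"unexpected {rest[0]!r} in state {stack[-1]}")
        trace.append((tuple(stack), tuple(rest)))


for stack, rest in lr_parse(["a", "a", "b"]):
    print(stack, rest)
```

Each printed pair corresponds to one configuration (s0 … sm, ai … an $); the stack holds states only, as in the notation above.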
LR Parsing Algorithm
LR(k) items
Example
SLR-Parsing
SLR parsers build automata where states contain items (a.k.a. LR(0)
items) and reductions are decided based on FOLLOW set information.
Augmented Grammar:
G' is G with a new production rule S' → S, where S' is the new
starting symbol.
When parsing begins, we have not parsed any input at all and we hope
to parse an S. This is represented by the item S' → .S.
We will build an SLR table for the augmented grammar.
Sets of LR(0) items will be the states in the action and goto tables of the
SLR parser.
A collection of sets of LR(0) items (the canonical LR(0) collection) is
the basis for constructing SLR parsers.
SLR-Parsing
To construct the LR(0) automaton, we need two functions:
closure(I) to build the states
goto(I,X) to determine its transitions
Start state: closure({[S' → .S]})
All states are accepting states.
The DFA recognizes the viable prefixes of right-sentential
forms.
Closure of Item sets
The function goto and Canonical set of LR(0) items
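A minimal sketch of closure, goto and the canonical LR(0) collection, assuming the augmented expression grammar used in the exercise later in these slides. An item is represented as a pair (production index, dot position); this representation and the function names are assumptions made for illustration.

```python
# Augmented expression grammar; production 0 is E' -> E.
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add B -> .gamma for every item with the dot in front of non-terminal B."""
    items = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (q, 0) not in items:
                    items.add((q, 0))
                    work.append((q, 0))
    return frozenset(items)


def goto(items, symbol):
    """Move the dot over `symbol` wherever possible, then take the closure."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)


def canonical_collection():
    """All item sets reachable from closure({[E' -> .E]})."""
    start = closure({(0, 0)})
    states, seen = [start], {start}
    symbols = sorted({s for _, rhs in GRAMMAR for s in rhs})
    for state in states:             # the list grows while we iterate over it
        for sym in symbols:
            nxt = goto(state, sym)
            if nxt and nxt not in seen:
                seen.add(nxt)
                states.append(nxt)
    return states


def show(item):
    lhs, rhs = GRAMMAR[item[0]]
    return f"{lhs} -> {' '.join(rhs[:item[1]] + ('.',) + rhs[item[1]:])}"


for item in sorted(closure({(0, 0)})):
    print(show(item))
print(len(canonical_collection()), "states")
```

The state numbering produced by this sketch depends on discovery order, so it need not match the numbering in a hand-drawn automaton.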
First and Follow
FIRST(α) is the set of terminal symbols that occur as
first symbols in strings derived from α, where α is any string
of grammar symbols.
If α derives ε, then ε is also in FIRST(α).
FOLLOW(A) is the set of terminals that occur
immediately after (i.e. follow) the non-terminal A in the
strings derived from the starting symbol.
A terminal a is in FOLLOW(A) if S ⇒* αAaβ.
$ is in FOLLOW(A) if S ⇒* αA.
CS416 Compiler Design
Compute FIRST for Any String X
If X is a terminal symbol, FIRST(X) = {X}.
If X is a non-terminal symbol and X → ε is a production
rule, ε is in FIRST(X).
If X is a non-terminal symbol and X → Y1Y2..Yn is a
production rule:
if a terminal a is in FIRST(Yi) and ε is in all
FIRST(Yj) for j=1,...,i-1, then a is in FIRST(X).
if ε is in all FIRST(Yj) for j=1,...,n,
then ε is in FIRST(X).
If X is ε, FIRST(X) = {ε}.
If X is a string Y1Y2..Yn:
if a terminal a is in FIRST(Yi) and ε is in all
FIRST(Yj) for j=1,...,i-1, then a is in FIRST(X).
if ε is in all FIRST(Yj) for j=1,...,n,
then ε is in FIRST(X).
FIRST Example
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
FIRST(F) = {(, id}      FIRST(TE') = {(, id}
FIRST(T') = {*, ε}      FIRST(+TE') = {+}
FIRST(T) = {(, id}      FIRST(ε) = {ε}
FIRST(E') = {+, ε}      FIRST(FT') = {(, id}
FIRST(E) = {(, id}      FIRST(*FT') = {*}
                        FIRST((E)) = {(}
                        FIRST(id) = {id}
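The FIRST rules can be sketched as a fixed-point computation over the example grammar above. The dictionary layout and the names compute_first and first_of_string are assumptions made for illustration; an empty right-hand side stands for ε.

```python
EPS = "ε"

# Example grammar from the slide; [] is the ε-production.
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a string Y1 Y2 .. Yn, given the FIRST sets of the non-terminals."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:     # Yi cannot vanish, so stop here
            return result
    result.add(EPS)                  # every Yi can derive ε (or the string is empty)
    return result


def compute_first():
    """Apply the rules until no FIRST set changes."""
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


FIRST = compute_first()
for nt in GRAMMAR:
    print(f"FIRST({nt}) =", sorted(FIRST[nt]))
```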
Compute FOLLOW (for non-terminals)
If S is the start symbol, $ is in FOLLOW(S).
If A → αBβ is a production rule,
everything in FIRST(β) except ε is in FOLLOW(B).
If (A → αB is a production rule) or
(A → αBβ is a production rule and ε is in FIRST(β)),
everything in FOLLOW(A) is in FOLLOW(B).
We apply these rules until nothing more can be added to any
follow set.
FOLLOW Example
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
FOLLOW(E) = { $, ) }
FOLLOW(E’) = { $, ) }
FOLLOW(T) = { +, ), $ }
FOLLOW(T’) = { +, ), $ }
FOLLOW(F) = {+, *, ), $ }
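The FOLLOW rules can be sketched in the same fixed-point style. To keep the block short, the FIRST sets of the non-terminals are copied from the FIRST example rather than recomputed; the names compute_follow and first_of_string are assumptions made for illustration.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
# FIRST sets of the non-terminals, as listed in the FIRST example.
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}


def first_of_string(symbols):
    """FIRST(beta) for a string beta of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})    # a terminal is its own FIRST set
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_follow(start="E"):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add("$")                   # rule 1
    changed = True
    while changed:                           # repeat until nothing can be added
        changed = False
        for a, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                for i, b in enumerate(rhs):
                    if b not in GRAMMAR:
                        continue             # FOLLOW is defined for non-terminals only
                    beta_first = first_of_string(rhs[i + 1:])
                    new = beta_first - {EPS}           # rule 2
                    if EPS in beta_first:
                        new |= follow[a]               # rule 3
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


for nt, fset in compute_follow().items():
    print(f"FOLLOW({nt}) =", sorted(fset))
```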
SLR-parsing table construction
SLR-parsing
An LR parser using SLR(1) parsing tables for a grammar G
is called the SLR(1) parser for G.
If a grammar G has an SLR(1) parsing table, it is called
an SLR(1) grammar (or SLR grammar for short).
Example :
1) S → (S) S
2) S → ε
The corresponding DFA and SLR parsing table are given on the next
slide.
Parse the string ( ) .
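One way to check a hand-built table for this example is to construct it mechanically. The sketch below assumes items are pairs (production index, dot position) and takes FOLLOW(S) = { ), $ }, computed by hand for this grammar; the function names and the state numbering are illustrative and need not match the DFA on the slide.

```python
# Production 0 is the augmenting rule S' -> S; () is the empty right side (ε).
PRODS = [("S'", ("S",)), ("S", ("(", "S", ")", "S")), ("S", ())]
NONTERMS = {"S'", "S"}
FOLLOW = {"S": {")", "$"}}           # computed by hand for this grammar


def closure(items):
    items, work = set(items), list(items)
    while work:
        p, d = work.pop()
        rhs = PRODS[p][1]
        if d < len(rhs) and rhs[d] in NONTERMS:
            for q, (lhs, _) in enumerate(PRODS):
                if lhs == rhs[d] and (q, 0) not in items:
                    items.add((q, 0))
                    work.append((q, 0))
    return frozenset(items)


def goto(state, sym):
    return closure({(p, d + 1) for p, d in state
                    if d < len(PRODS[p][1]) and PRODS[p][1][d] == sym})


def build_slr():
    states, index = [closure({(0, 0)})], {}
    index[states[0]] = 0
    action, goto_tab, conflicts = {}, {}, []

    def set_action(key, val):
        if key in action and action[key] != val:
            conflicts.append((key, action[key], val))
        action[key] = val

    i = 0
    while i < len(states):
        for p, d in states[i]:
            lhs, rhs = PRODS[p]
            if d < len(rhs):                       # dot before a symbol: transition
                nxt = goto(states[i], rhs[d])
                if nxt not in index:
                    index[nxt] = len(states)
                    states.append(nxt)
                if rhs[d] in NONTERMS:
                    goto_tab[i, rhs[d]] = index[nxt]
                else:
                    set_action((i, rhs[d]), ("shift", index[nxt]))
            elif p == 0:                           # S' -> S.
                set_action((i, "$"), ("accept",))
            else:                                  # A -> alpha. : reduce on FOLLOW(A)
                for a in FOLLOW[lhs]:
                    set_action((i, a), ("reduce", p))
        i += 1
    return states, action, goto_tab, conflicts


def parse(tokens, action, goto_tab):
    """Return the sequence of production numbers used, or None on a syntax error."""
    stack, toks, pos, used = [0], list(tokens) + ["$"], 0, []
    while True:
        act = action.get((stack[-1], toks[pos]))
        if act is None:
            return None
        if act[0] == "shift":
            stack.append(act[1])
            pos += 1
        elif act[0] == "reduce":
            lhs, rhs = PRODS[act[1]]
            del stack[len(stack) - len(rhs):]      # pops nothing for an ε-production
            stack.append(goto_tab[stack[-1], lhs])
            used.append(act[1])
        else:
            return used


states, action, goto_tab, conflicts = build_slr()
print(len(states), "states; reductions for ( ):", parse(["(", ")"], action, goto_tab))
```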
S → AA
A → aA | b
The corresponding DFA is given; construct the SLR parsing table
yourself and parse the string aabb.
Exercise
Consider the augmented expression grammar :
E' → E
E → E+T | T
T → T*F | F
F → (E) | id
Parse the string id*id using SLR-parsing.
(LR(0) automaton construction is done for you)
LR(0) Automaton for expression grammar
Conflicts in LR parsing
There are two types of conflicts in LR parsing:
shift/reduce
On some particular lookahead, it is possible to shift or reduce
reduce/reduce
This occurs when a state contains more than one handle that
may be reduced on the same lookahead.
SLR Parsing and ambiguity
shift/reduce conflict
Conflicts in SLR parsing
The conflict occurred because we made a decision about
when to reduce based on what token may follow a non-
terminal at any time.
However, the fact that a token t may follow a non-terminal
N in some derivation does not necessarily imply that t will
follow N in some other derivation.
SLR parsing does not make a distinction.
If the SLR parsing table of a grammar G has a conflict, we
say that the grammar is not an SLR grammar.
Every SLR grammar is unambiguous, but not every
unambiguous grammar is an SLR grammar.
Conflicts in SLR parsing
Solution : instead of using general FOLLOW information,
try to keep track of exactly what tokens may follow a non-
terminal in each possible derivation and perform reductions
based on that knowledge.
Save this information in the states.
This gives rise to LR(1) items:
items where we also save the possible lookaheads.
LR(1) parsing uses lookahead to avoid unnecessary
conflicts in parsing table.
SLR vs LR(1)
LR(1) Items
Closure function for LR(1) Items
Goto function for LR(1) Items
Construction of The Canonical LR(1) Collection
Construction of LR(1) Parsing Tables
Example
Consider the following augmented grammar:
S' → S
S → CC (1)
C → cC (2) | d (3)
Show the parse of the string ccdcd using canonical LR(1)
parsing (Exercise).
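The canonical LR(1) collection for this grammar can be sketched as follows. An item is a triple (production index, dot position, lookahead). Because the grammar has no ε-productions, FIRST(βa) is simply FIRST of the first symbol of β, or {a} when β is empty; that shortcut, the hand-computed FIRST sets, and all function names are assumptions of this sketch.

```python
PRODS = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
NONTERMS = {"S'", "S", "C"}
FIRST = {"S": {"c", "d"}, "C": {"c", "d"}}   # computed by hand; no ε-productions


def first_of(beta, lookahead):
    """FIRST(beta a) for an ε-free grammar."""
    if not beta:
        return {lookahead}
    head = beta[0]
    return FIRST[head] if head in NONTERMS else {head}


def closure(items):
    """For [A -> alpha . B beta, a] add [B -> . gamma, b] for each b in FIRST(beta a)."""
    items, work = set(items), list(items)
    while work:
        p, d, a = work.pop()
        rhs = PRODS[p][1]
        if d < len(rhs) and rhs[d] in NONTERMS:
            for b in first_of(rhs[d + 1:], a):
                for q, (lhs, _) in enumerate(PRODS):
                    item = (q, 0, b)
                    if lhs == rhs[d] and item not in items:
                        items.add(item)
                        work.append(item)
    return frozenset(items)


def goto(state, sym):
    """Move the dot over sym; lookaheads are carried along unchanged."""
    return closure({(p, d + 1, a) for p, d, a in state
                    if d < len(PRODS[p][1]) and PRODS[p][1][d] == sym})


def canonical_lr1():
    start = closure({(0, 0, "$")})
    states, seen = [start], {start}
    for state in states:                     # the list grows while we iterate
        for sym in ("S", "C", "c", "d"):
            nxt = goto(state, sym)
            if nxt and nxt not in seen:
                seen.add(nxt)
                states.append(nxt)
    return states


def core(state):
    """The LR(0) items of a state, i.e. the items without their lookaheads."""
    return frozenset((p, d) for p, d, _ in state)


states = canonical_lr1()
print(len(states), "LR(1) states,", len({core(s) for s in states}), "distinct cores")
```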
Example(Contd.): goto graph
Example(Contd.): Canonical parsing table
Another Example
Consider the grammar given in (Slide 20)
Goto graph is given below :
LALR parsing
LALR stands for LookAhead LR.
LALR parsers are often used in practice because LALR
parsing tables are smaller than LR(1) parsing tables.
The SLR and LALR parsing tables for a grammar G have the
same number of states.
But LALR parsers recognize more grammars than SLR
parsers.
yacc creates a LALR parser for the given grammar.
A state of LALR parser will be again a set of LR(1) items.
Motivation: Try to combine efficiency of SLR parser with
power of canonical method.
Core : Set of LR(0) items corresponding to a set of LR(1)
items.
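Merging by core can be sketched on the grammar S → CC, C → cC | d used in the examples of these slides. The ten LR(1) states below are written out by hand from the standard construction for that grammar (an assumption of this sketch, as are the string encoding of items and all function names).

```python
def lr1_state(*entries):
    """Build a state from (LR(0) item, lookaheads) pairs; each lookahead is one character."""
    return frozenset((item, a) for item, lookaheads in entries for a in lookaheads)


LR1_STATES = [
    lr1_state(("S' -> . S", "$"), ("S -> . C C", "$"),
              ("C -> . c C", "cd"), ("C -> . d", "cd")),                    # I0
    lr1_state(("S' -> S .", "$")),                                          # I1
    lr1_state(("S -> C . C", "$"), ("C -> . c C", "$"), ("C -> . d", "$")),  # I2
    lr1_state(("C -> c . C", "cd"), ("C -> . c C", "cd"), ("C -> . d", "cd")),  # I3
    lr1_state(("C -> d .", "cd")),                                          # I4
    lr1_state(("S -> C C .", "$")),                                         # I5
    lr1_state(("C -> c . C", "$"), ("C -> . c C", "$"), ("C -> . d", "$")),  # I6
    lr1_state(("C -> d .", "$")),                                           # I7
    lr1_state(("C -> c C .", "cd")),                                        # I8
    lr1_state(("C -> c C .", "$")),                                         # I9
]


def core(state):
    """The set of LR(0) items obtained by dropping the lookaheads."""
    return frozenset(item for item, _ in state)


def merge_by_core(states):
    """Replace all states sharing a core by the union of their LR(1) items."""
    merged = {}
    for state in states:
        merged.setdefault(core(state), set()).update(state)
    return [frozenset(items) for items in merged.values()]


def reduce_reduce_conflicts(state):
    """Lookaheads on which two different complete items would both reduce."""
    by_lookahead = {}
    for item, a in state:
        if item.endswith(" ."):
            by_lookahead.setdefault(a, set()).add(item)
    return {a: items for a, items in by_lookahead.items() if len(items) > 1}


lalr_states = merge_by_core(LR1_STATES)
print(len(LR1_STATES), "LR(1) states ->", len(lalr_states), "LALR(1) states")
print("conflicts:", [c for s in lalr_states if (c := reduce_reduce_conflicts(s))])
```

For this grammar the pairs I3/I6, I4/I7 and I8/I9 share a core, and merging them introduces no reduce/reduce conflict.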
LALR table construction
LALR Parsing: Example
S' → S
S → CC (1)
C → cC (2) | d (3)
Corresponding LALR(1) parsing table
Exercise : Draw the LALR(1) parsing table for grammar in
(slide 19).
Conflicts in LALR(1) parsing
LALR(1) parsing combines CLR(1) states to reduce
table size
• Less powerful than CLR(1)
– Will not introduce shift-reduce conflicts, because
shifts do not use lookaheads
– May introduce reduce-reduce conflicts, but
seldom do so for grammars of programming
languages
If no conflict is introduced, the grammar is LALR(1)
grammar.
Conflicts in LALR(1) parsing: Example
Consider the augmented grammar given below :
Generates reduce-reduce conflict during shrink process (i.e.
replacing those sets having same cores with a single set
which is their union)
Error recovery in LR Parsing
A canonical LR parser (LR(1) parser) will never make even
a single reduction before announcing an error.
The SLR and LALR parsers may make several reductions
before announcing an error.
But, all LR parsers (LR(1), LALR(1) and SLR(1) parsers)
will never shift an erroneous input symbol onto the stack.
Using Ambiguous Grammars
All grammars used in the construction of LR-parsing tables
must be unambiguous.
Can we create LR-parsing tables for ambiguous grammars ?
Yes, but they will have conflicts.
We can resolve these conflicts in favor of one of them to disambiguate
the grammar.
At the end, we will have again an unambiguous grammar.
Why would we want to use an ambiguous grammar?
Some ambiguous grammars are much more natural, and a
corresponding unambiguous grammar can be very complex.
Usage of an ambiguous grammar may eliminate unnecessary reductions.
Ex. ambiguous:   E → E+E | E*E | (E) | id
    unambiguous: E → E+T | T
                 T → T*F | F
                 F → (E) | id
Sets of LR(0) Items for Ambiguous Grammar
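I0: E' → .E
    E → .E+E
    E → .E*E
    E → .(E)
    E → .id
I1: E' → E.
    E → E.+E
    E → E.*E
I2: E → (.E)
    E → .E+E
    E → .E*E
    E → .(E)
    E → .id
I3: E → id.
I4: E → E+.E
    E → .E+E
    E → .E*E
    E → .(E)
    E → .id
I5: E → E*.E
    E → .E+E
    E → .E*E
    E → .(E)
    E → .id
I6: E → (E.)
    E → E.+E
    E → E.*E
I7: E → E+E.
    E → E.+E
    E → E.*E
I8: E → E*E.
    E → E.+E
    E → E.*E
I9: E → (E).
Transitions:
goto(I0, E) = I1   goto(I0, () = I2   goto(I0, id) = I3
goto(I1, +) = I4   goto(I1, *) = I5
goto(I2, E) = I6   goto(I2, () = I2   goto(I2, id) = I3
goto(I4, E) = I7   goto(I4, () = I2   goto(I4, id) = I3
goto(I5, E) = I8   goto(I5, () = I2   goto(I5, id) = I3
goto(I6, )) = I9   goto(I6, +) = I4   goto(I6, *) = I5
goto(I7, +) = I4   goto(I7, *) = I5
goto(I8, +) = I4   goto(I8, *) = I5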
SLR-Parsing Tables for Ambiguous Grammar
FOLLOW(E) = { $,+,*,) }
State I7 has shift/reduce conflicts for symbols + and *.
Path to I7: I0 --E--> I1 --+--> I4 --E--> I7
When the current token is +:
    shift:  + is right-associative
    reduce: + is left-associative
When the current token is *:
    shift:  * has higher precedence than +
    reduce: + has higher precedence than *
SLR-Parsing Tables for Ambiguous Grammar
FOLLOW(E) = { $,+,*,) }
State I8 has shift/reduce conflicts for symbols + and *.
Path to I8: I0 --E--> I1 --*--> I5 --E--> I8
When the current token is *:
    shift:  * is right-associative
    reduce: * is left-associative
When the current token is +:
    shift:  + has higher precedence than *
    reduce: * has higher precedence than +
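The choice between shift and reduce in states I7 and I8 can be sketched as a lookup on precedence and associativity. The conventional settings assumed here (* binds tighter than +, both left-associative) are an assumption of the sketch, as are the names PRECEDENCE, LEFT_ASSOC and resolve.

```python
# Assumed conventions: higher number = binds tighter; both operators left-associative.
PRECEDENCE = {"+": 1, "*": 2}
LEFT_ASSOC = {"+", "*"}


def resolve(handle_op, lookahead):
    """Decide a shift/reduce conflict.

    handle_op: operator of the handle E op E on top of the stack
               ("+" in state I7, "*" in state I8).
    lookahead: the current input token, "+" or "*".
    """
    if PRECEDENCE[lookahead] > PRECEDENCE[handle_op]:
        return "shift"               # the incoming operator binds tighter
    if PRECEDENCE[lookahead] < PRECEDENCE[handle_op]:
        return "reduce"              # the operator on the stack binds tighter
    # Same precedence: associativity decides.
    return "reduce" if handle_op in LEFT_ASSOC else "shift"


for state, op in (("I7", "+"), ("I8", "*")):
    for tok in "+*":
        print(f"{state}, token {tok}: {resolve(op, tok)}")
```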
SLR-Parsing Tables for Ambiguous Grammar
State   Action                              Goto
        id    +     *     (     )     $     E
0       s3                s2                1
1             s4    s5                acc
2       s3                s2                6
3             r4    r4          r4    r4
4       s3                s2                7
5       s3                s2                8
6             s4    s5          s9
7             r1    s5          r1    r1
8             r2    r2          r2    r2
9             r3    r3          r3    r3
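To see the effect of the resolved conflicts, the table can be run on a few inputs. The sketch below transcribes the table and assumes the productions are numbered 1) E → E+E, 2) E → E*E, 3) E → (E), 4) E → id, which is the reading consistent with the reduce entries; the fully parenthesized output string and the names TABLE, GOTO_E, RHS_LEN and parse are illustrative choices.

```python
# Action part of the table; rN = reduce by production N, sN = shift to state N.
TABLE = {
    0: {"id": "s3", "(": "s2"},
    1: {"+": "s4", "*": "s5", "$": "acc"},
    2: {"id": "s3", "(": "s2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s3", "(": "s2"},
    5: {"id": "s3", "(": "s2"},
    6: {"+": "s4", "*": "s5", ")": "s9"},
    7: {"+": "r1", "*": "s5", ")": "r1", "$": "r1"},
    8: {"+": "r2", "*": "r2", ")": "r2", "$": "r2"},
    9: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
}
GOTO_E = {0: 1, 2: 6, 4: 7, 5: 8}            # goto column for E
RHS_LEN = {1: 3, 2: 3, 3: 3, 4: 1}           # assumed numbering, see lead-in


def parse(tokens):
    """Parse and return the expression with every operator application parenthesized."""
    states, values = [0], []                 # values mirrors the symbols on the stack
    toks, pos = list(tokens) + ["$"], 0
    while True:
        entry = TABLE[states[-1]].get(toks[pos])
        if entry is None:
            raise SyntaxError(f"unexpected {toks[pos]!r} in state {states[-1]}")
        if entry == "acc":
            return values[0]
        n = int(entry[1:])
        if entry[0] == "s":
            states.append(n)
            values.append(toks[pos])
            pos += 1
        else:
            k = RHS_LEN[n]
            parts = values[-k:]
            del states[-k:]
            del values[-k:]
            if n == 4:
                node = "id"
            elif n == 3:
                node = parts[1]              # (E): keep the inner expression
            else:
                node = "(" + "".join(parts) + ")"   # E op E
            values.append(node)
            states.append(GOTO_E[states[-1]])


print(parse(["id", "+", "id", "*", "id"]))
print(parse(["id", "+", "id", "+", "id"]))
```

The grouping in the output shows that * is applied before + and that both operators associate to the left.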