
LR Parsing Methods

The document discusses different types of LR parsing techniques including SLR, canonical LR, and LALR parsing. SLR parsing uses LR(0) items and is easy to implement but not very powerful. Canonical LR or LR(1) parsing uses a larger parser but is more powerful as it uses LR(1) items. LALR parsing is a condensed version of canonical LR that is often used in practice; it also uses LR(1) items, though merging states may introduce conflicts.

Uploaded by

Bhabatosh Sinha

LR parsing techniques

• SLR
  – Simple LR parsing
  – Easy to implement, not strong enough
  – Uses LR(0) items
• Canonical LR (or LR method)
  – Larger parser but powerful
  – Uses LR(1) items
• LALR
  – Condensed version of canonical LR, often used in practice
  – May introduce conflicts
  – Uses LR(1) items
LR parsing

• LR ⊃ LL, i.e., LR parsers can parse more grammars than predictive parsers.
• Typically, LR parsing works by building an automaton where each state represents what has been parsed so far and what we hope to parse in the future.
  – In other words, states contain productions with dots (as we shall see shortly).
  – Such productions are called items.
• That is, it uses a DFA for shift/reduce decisions.
Model of an LR Parser
Behavior of LR Parser

If action[sm, ai] = shift s, then push s and advance the input:

(s0 s1 s2 … sm s, ai+1 … an $)

If action[sm, ai] = reduce A → β and goto[sm-r, A] = s with r = |β|, then pop r symbols and push s:

(s0 s1 s2 … sm-r s, ai ai+1 … an $)

If action[sm, ai] = accept, then stop.

If action[sm, ai] = error, then attempt recovery.
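
The four cases above can be sketched as a small table-driven loop. The following is a minimal illustration, not the deck's own algorithm listing: the ACTION and GOTO tables are an assumption, derived by hand for the grammar S → (S) S | ε that the slides use later, and the names PRODUCTIONS, ACTION, GOTO and lr_parse are hypothetical.

```python
# Hypothetical SLR(1) table, derived by hand for the grammar
#   1) S -> ( S ) S      2) S -> epsilon
# Each production is stored as (left-hand side, length of right-hand side).
PRODUCTIONS = {1: ("S", 4), 2: ("S", 0)}

ACTION = {
    (0, "("): ("shift", 2), (0, ")"): ("reduce", 2), (0, "$"): ("reduce", 2),
    (1, "$"): ("accept", None),
    (2, "("): ("shift", 2), (2, ")"): ("reduce", 2), (2, "$"): ("reduce", 2),
    (3, ")"): ("shift", 4),
    (4, "("): ("shift", 2), (4, ")"): ("reduce", 2), (4, "$"): ("reduce", 2),
    (5, ")"): ("reduce", 1), (5, "$"): ("reduce", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 3, (4, "S"): 5}


def lr_parse(tokens):
    """Return the production numbers used, in order, or raise SyntaxError."""
    stack = [0]                       # the state stack s0 ... sm
    tokens = list(tokens) + ["$"]     # input followed by the end marker
    pos = 0
    reductions = []
    while True:
        state, lookahead = stack[-1], tokens[pos]
        kind, arg = ACTION.get((state, lookahead), ("error", None))
        if kind == "shift":           # push s, advance the input
            stack.append(arg)
            pos += 1
        elif kind == "reduce":        # pop r = |beta| states, push goto[s, A]
            lhs, rhs_len = PRODUCTIONS[arg]
            if rhs_len:
                del stack[-rhs_len:]
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(arg)
        elif kind == "accept":
            return reductions
        else:                         # blank table entry
            raise SyntaxError(f"unexpected {lookahead!r} in state {state}")


print(lr_parse("()"))   # → [2, 2, 1]
```

Only states are kept on the stack here, since each state already determines the grammar symbol that led to it. Error recovery is reduced to raising an exception.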


LR Parsing Algorithm
LR(k) items
Example
SLR-Parsing

• SLR parsers build automata whose states contain items (a.k.a. LR(0) items); reductions are decided based on FOLLOW set information.
• Augmented grammar: G’ is G with a new production rule S’ → S, where S’ is the new start symbol.
• When parsing begins, we have not parsed any input at all and we hope to parse an S. This is represented by the item S’ → .S.
• We will build an SLR table for the augmented grammar.
• Sets of LR(0) items will be the states of the action and goto tables of the SLR parser.
• A collection of sets of LR(0) items (the canonical LR(0) collection) is the basis for constructing SLR parsers.
SLR-Parsing

• To construct the LR(0) automaton, we need two functions:
  – closure(I) to build the states
  – goto(I,X) to determine its transitions
• Start state: closure({[S’ → .S]})
• All states are accepting states.
• The DFA recognizes the viable prefixes of right-sentential forms.
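
As a rough sketch of how closure and goto produce the states of the LR(0) automaton, the code below builds the canonical LR(0) collection for the augmented expression grammar that appears in a later exercise. The item encoding (production index, dot position) and the helper name canonical_collection are assumptions made for this illustration.

```python
# Augmented expression grammar; production 0 is E' -> E.
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add [B -> .gamma] for every item that has the dot just before B."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for index, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    work.append((index, 0))
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` wherever possible, then take the closure."""
    moved = {(prod, dot + 1) for prod, dot in items
             if dot < len(GRAMMAR[prod][1]) and GRAMMAR[prod][1][dot] == symbol}
    return closure(moved) if moved else frozenset()


def canonical_collection():
    """Return the reachable item sets (states) and the transition map."""
    states = [closure({(0, 0)})]          # start state: closure({[E' -> .E]})
    transitions = {}
    symbols = sorted({s for _, rhs in GRAMMAR for s in rhs})
    for state in states:                  # the list grows while we iterate
        for symbol in symbols:
            target = goto(state, symbol)
            if not target:
                continue
            if target not in states:
                states.append(target)
            transitions[(states.index(state), symbol)] = states.index(target)
    return states, transitions


states, transitions = canonical_collection()
print(len(states), "states")   # → 12 states
```

State numbers depend on the order in which symbols are explored, so they need not match the numbering drawn in a particular diagram.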
Closure of Item sets
The function goto and Canonical set of LR(0) items
First and Follow

• FIRST(α) is the set of terminal symbols that occur as first symbols in strings derived from α, where α is any string of grammar symbols.
  – If α derives ε, then ε is also in FIRST(α).
• FOLLOW(A) is the set of terminals that occur immediately after (i.e. follow) the non-terminal A in strings derived from the start symbol.
  – A terminal a is in FOLLOW(A) if S ⇒* αAaβ
  – $ is in FOLLOW(A) if S ⇒* αA

CS416 Compiler Design


Compute FIRST for Any String X

• If X is a terminal symbol, then FIRST(X) = {X}.
• If X is a non-terminal symbol and X → ε is a production rule, then ε is in FIRST(X).
• If X is a non-terminal symbol and X → Y1Y2..Yn is a production rule:
  – if a terminal a is in FIRST(Yi) and ε is in all FIRST(Yj) for j=1,...,i-1, then a is in FIRST(X).
  – if ε is in all FIRST(Yj) for j=1,...,n, then ε is in FIRST(X).
• If X is ε, then FIRST(X) = {ε}.
• If X is a string Y1Y2..Yn:
  – if a terminal a is in FIRST(Yi) and ε is in all FIRST(Yj) for j=1,...,i-1, then a is in FIRST(X).
  – if ε is in all FIRST(Yj) for j=1,...,n, then ε is in FIRST(X).
FIRST Example

E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id

FIRST(F) = {(, id}
FIRST(T’) = {*, ε}
FIRST(T) = {(, id}
FIRST(E’) = {+, ε}
FIRST(E) = {(, id}

FIRST(TE’) = {(, id}
FIRST(+TE’) = {+}
FIRST(ε) = {ε}
FIRST(FT’) = {(, id}
FIRST(*FT’) = {*}
FIRST((E)) = {(}
FIRST(id) = {id}
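
The FIRST rules can be applied mechanically by iterating until no set changes. The sketch below does this for the example grammar. The dictionary encoding of the grammar (an empty list stands for an ε production) and the function names are assumptions made for the illustration.

```python
EPS = "ε"
# Example grammar; an empty right-hand side [] stands for an ε production.
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of the non-terminals."""
    result = set()
    for symbol in symbols:
        # A terminal's FIRST set is the terminal itself.
        symbol_first = first[symbol] if symbol in GRAMMAR else {symbol}
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result
    result.add(EPS)   # every symbol can derive ε, or the string is empty
    return result


def compute_first():
    """Apply the rules repeatedly until no FIRST set grows any further."""
    first = {nonterminal: set() for nonterminal in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nonterminal, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[nonterminal]:
                    first[nonterminal] |= new
                    changed = True
    return first


FIRST = compute_first()
for nonterminal in GRAMMAR:
    print(f"FIRST({nonterminal}) =", sorted(FIRST[nonterminal]))
```

The same first_of_string helper also covers right-hand sides such as TE’ or *FT’ once the non-terminal sets are known.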



Compute FOLLOW (for non-terminals)

• If S is the start symbol, then $ is in FOLLOW(S).
• If A → αBβ is a production rule, then everything in FIRST(β) except ε is in FOLLOW(B).
• If (A → αB is a production rule) or (A → αBβ is a production rule and ε is in FIRST(β)), then everything in FOLLOW(A) is in FOLLOW(B).

We apply these rules until nothing more can be added to any follow set.



FOLLOW Example

E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id

FOLLOW(E) = { $, ) }
FOLLOW(E’) = { $, ) }
FOLLOW(T) = { +, ), $ }
FOLLOW(T’) = { +, ), $ }
FOLLOW(F) = {+, *, ), $ }
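
A minimal sketch of the FOLLOW rules as a fixed-point iteration over the same grammar is given below. To keep it short, the FIRST sets of the non-terminals are copied from the FIRST example rather than recomputed. The data layout and function names are assumptions.

```python
EPS = "ε"
START = "E"
# Example grammar; an empty right-hand side [] stands for an ε production.
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
# FIRST sets of the non-terminals, copied from the FIRST example.
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}


def first_of_string(symbols):
    """FIRST of a string of grammar symbols (terminals map to themselves)."""
    result = set()
    for symbol in symbols:
        symbol_first = FIRST.get(symbol, {symbol})
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result
    result.add(EPS)
    return result


def compute_follow():
    follow = {nonterminal: set() for nonterminal in GRAMMAR}
    follow[START].add("$")                       # rule 1
    changed = True
    while changed:                               # repeat until nothing is added
        changed = False
        for lhs, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                for i, symbol in enumerate(rhs):
                    if symbol not in GRAMMAR:    # FOLLOW is for non-terminals
                        continue
                    tail_first = first_of_string(rhs[i + 1:])
                    new = tail_first - {EPS}     # rule 2
                    if EPS in tail_first:        # rule 3 (tail empty or nullable)
                        new |= follow[lhs]
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow


FOLLOW = compute_follow()
for nonterminal in GRAMMAR:
    print(f"FOLLOW({nonterminal}) =", sorted(FOLLOW[nonterminal]))
```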



SLR-parsing table construction
SLR-parsing

• An LR parser using SLR(1) parsing tables for a grammar G is called the SLR(1) parser for G.
• If a grammar G has an SLR(1) parsing table, it is called an SLR(1) grammar (or SLR grammar for short).
• Example:
1) S → (S) S
2) S → ε
The corresponding DFA and SLR parsing table are given on the next slide.
• Parse the string ( ).
• S → AA
• A → aA | b
• The corresponding DFA is given; construct the SLR parsing table yourself and parse the string aabb.
Exercise

Consider the augmented expression grammar:

E’ → E
E → E+T | T
T → T*F | F
F → (E) | id

Parse the string id*id using SLR parsing.
(The LR(0) automaton construction is done for you.)
LR(0) Automaton for expression grammar
Conflicts in LR parsing

• There are two types of conflicts in LR parsing:
  – shift/reduce: on some particular lookahead, it is possible to either shift or reduce.
  – reduce/reduce: this occurs when a state contains more than one handle that may be reduced on the same lookahead.
SLR Parsing and ambiguity

• shift/reduce conflict
Conflicts in SLR parsing

• The conflict occurred because we made a decision about when to reduce based on what token may follow a non-terminal at any time.
• However, the fact that a token t may follow a non-terminal N in some derivation does not necessarily imply that t will follow N in some other derivation.
• SLR parsing does not make this distinction.
• If the SLR parsing table of a grammar G has a conflict, we say that the grammar is not an SLR grammar.
• Every SLR grammar is unambiguous, but not every unambiguous grammar is an SLR grammar.
Conflicts in SLR parsing

• Solution: instead of using general FOLLOW information, try to keep track of exactly what tokens may follow a non-terminal in each possible derivation and perform reductions based on that knowledge.
• Save this information in the states.
• This gives rise to LR(1) items:
  – items where we also save the possible lookaheads.
• LR(1) parsing uses lookahead to avoid unnecessary conflicts in the parsing table.
SLR vs LR(1)
LR(1) Items
Closure function for LR(1) Items
Goto function for LR(1) Items
Construction of The Canonical LR(1) Collection
Construction of LR(1) Parsing Tables
Example

• Consider the following augmented grammar:

S’ → S
S → CC (1)
C → cC (2) | d (3)

• Show the parse of the string ccdcd using canonical LR(1) parsing (exercise).
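
For this grammar, the canonical LR(1) collection can be built with the same closure/goto scheme as for LR(0) items, except that each item also carries a lookahead. The sketch below is one possible encoding, with items as (production index, dot position, lookahead). The hard-coded FIRST table relies on the assumption, true for this grammar, that no non-terminal derives ε.

```python
# Augmented grammar; production 0 is S' -> S.
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("C", "C")),
    ("C", ("c", "C")),
    ("C", ("d",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
# No non-terminal derives ε here, so FIRST of a string is simply
# FIRST of its first symbol. "$" is included so lookaheads can be looked up.
FIRST = {"S": {"c", "d"}, "C": {"c", "d"}, "c": {"c"}, "d": {"d"}, "$": {"$"}}


def closure(items):
    """For [A -> α.Bβ, a], add [B -> .γ, b] for every b in FIRST(βa)."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot, lookahead = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            rest = rhs[dot + 1:] + (lookahead,)      # the string βa
            for index, (lhs, _) in enumerate(GRAMMAR):
                if lhs != rhs[dot]:
                    continue
                for b in FIRST[rest[0]]:
                    item = (index, 0, b)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol`; lookaheads are carried along unchanged."""
    moved = {(prod, dot + 1, la) for prod, dot, la in items
             if dot < len(GRAMMAR[prod][1]) and GRAMMAR[prod][1][dot] == symbol}
    return closure(moved) if moved else frozenset()


def canonical_lr1_collection():
    states = [closure({(0, 0, "$")})]     # start item [S' -> .S, $]
    symbols = sorted({s for _, rhs in GRAMMAR for s in rhs})
    for state in states:                  # the list grows while we iterate
        for symbol in symbols:
            target = goto(state, symbol)
            if target and target not in states:
                states.append(target)
    return states


states = canonical_lr1_collection()
print(len(states), "LR(1) states")   # → 10 LR(1) states
```

Several of these states differ only in their lookaheads, which is what the LALR construction exploits.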
Example(Contd.): goto graph
Example(Contd.): Canonical parsing table
Another Example

• Consider the grammar given in (Slide 20).
• The goto graph is given below:
LALR parsing

• LALR stands for LookAhead LR.
• LALR parsers are often used in practice because LALR parsing tables are smaller than LR(1) parsing tables.
• The number of states in the SLR and LALR parsing tables for a grammar G is equal.
• But LALR parsers recognize more grammars than SLR parsers.
• yacc creates an LALR parser for the given grammar.
• A state of an LALR parser will again be a set of LR(1) items.
• Motivation: try to combine the efficiency of the SLR parser with the power of the canonical method.
• Core: the set of LR(0) items corresponding to a set of LR(1) items.
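
The notion of a core can be made concrete with a small sketch: drop the lookahead from every item, group the LR(1) item sets that share the resulting core, and take the union of each group. The item sets below are written out by hand for the grammar S’ → S, S → CC, C → cC | d. The state names follow the usual textbook numbering for this grammar, which is an assumption and may differ from the numbering in the deck's goto graph.

```python
from collections import defaultdict

# Six LR(1) item sets for S' -> S, S -> CC, C -> cC | d, written by hand.
# Each item is (production, dot position, lookahead).
LR1_STATES = {
    "I3": {("C -> c C", 1, "c"), ("C -> c C", 1, "d"),
           ("C -> c C", 0, "c"), ("C -> c C", 0, "d"),
           ("C -> d", 0, "c"), ("C -> d", 0, "d")},
    "I6": {("C -> c C", 1, "$"), ("C -> c C", 0, "$"), ("C -> d", 0, "$")},
    "I4": {("C -> d", 1, "c"), ("C -> d", 1, "d")},
    "I7": {("C -> d", 1, "$")},
    "I8": {("C -> c C", 2, "c"), ("C -> c C", 2, "d")},
    "I9": {("C -> c C", 2, "$")},
}


def core(items):
    """The core of an LR(1) item set: its items with lookaheads removed."""
    return frozenset((prod, dot) for prod, dot, _ in items)


def merge_by_core(states):
    """Replace item sets that share a core by the union of their items."""
    groups = defaultdict(list)
    for name, items in states.items():
        groups[core(items)].append(name)
    merged = {}
    for names in groups.values():
        merged["+".join(names)] = set().union(*(states[n] for n in names))
    return merged


merged = merge_by_core(LR1_STATES)
for name, items in merged.items():
    lookaheads = sorted({la for _, _, la in items})
    print(name, "lookaheads:", lookaheads)
```

In this example the six sets collapse into three, each carrying the lookaheads c, d and $.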
LALR table construction
LALR Parsing: Example

S’ → S
S → CC (1)
C → cC (2) | d (3)

Corresponding LALR(1) parsing table

Exercise: Draw the LALR(1) parsing table for the grammar in (slide 19).
Conflicts in LALR(1) parsing

• LALR(1) parsing combines CLR(1) states to reduce table size.
  – Less powerful than CLR(1)
    – Will not introduce shift-reduce conflicts, because shifts do not use lookaheads
    – May introduce reduce-reduce conflicts, but seldom does so for grammars of programming languages
• If no conflict is introduced, the grammar is an LALR(1) grammar.
Conflicts in LALR(1) parsing: Example

• Consider the augmented grammar given below:
• It generates a reduce-reduce conflict during the shrink process (i.e. replacing the sets that have the same core with a single set that is their union).
Error recovery in LR Parsing

• A canonical LR parser (LR(1) parser) will never make even a single reduction before announcing an error.
• The SLR and LALR parsers may make several reductions before announcing an error.
• But all LR parsers (LR(1), LALR(1) and SLR(1) parsers) will never shift an erroneous input symbol onto the stack.
Using Ambiguous Grammars

• All grammars used in the construction of LR-parsing tables must be unambiguous.
• Can we create LR-parsing tables for ambiguous grammars?
  – Yes, but they will have conflicts.
  – We can resolve each conflict in favor of one of the competing actions to disambiguate the grammar.
  – In the end, we again have an unambiguous grammar.
• Why would we want to use an ambiguous grammar?
  – Some ambiguous grammars are much more natural, and a corresponding unambiguous grammar can be very complex.
  – Using an ambiguous grammar may eliminate unnecessary reductions.
• Ex.: the ambiguous grammar

E → E+E | E*E | (E) | id

can be used in place of the unambiguous grammar

E → E+T | T
T → T*F | F
F → (E) | id
Sets of LR(0) Items for Ambiguous Grammar

I0: E’ → .E, E → .E+E, E → .E*E, E → .(E), E → .id
I1: E’ → E., E → E.+E, E → E.*E
I2: E → (.E), E → .E+E, E → .E*E, E → .(E), E → .id
I3: E → id.
I4: E → E+.E, E → .E+E, E → .E*E, E → .(E), E → .id
I5: E → E*.E, E → .E+E, E → .E*E, E → .(E), E → .id
I6: E → (E.), E → E.+E, E → E.*E
I7: E → E+E., E → E.+E, E → E.*E
I8: E → E*E., E → E.+E, E → E.*E
I9: E → (E).

Transitions:
I0: E → I1, ( → I2, id → I3
I1: + → I4, * → I5
I2: E → I6, ( → I2, id → I3
I4: E → I7, ( → I2, id → I3
I5: E → I8, ( → I2, id → I3
I6: ) → I9, + → I4, * → I5
I7: + → I4, * → I5
I8: + → I4, * → I5
SLR-Parsing Tables for Ambiguous Grammar

FOLLOW(E) = { $, +, *, ) }

State I7 has shift/reduce conflicts for symbols + and *.

Path to I7: I0 –E→ I1 –+→ I4 –E→ I7

When the current token is +:
  – shift → + is right-associative
  – reduce → + is left-associative

When the current token is *:
  – shift → * has higher precedence than +
  – reduce → + has higher precedence than *
SLR-Parsing Tables for Ambiguous Grammar

FOLLOW(E) = { $, +, *, ) }

State I8 has shift/reduce conflicts for symbols + and *.

Path to I8: I0 –E→ I1 –*→ I5 –E→ I8

When the current token is *:
  – shift → * is right-associative
  – reduce → * is left-associative

When the current token is +:
  – shift → + has higher precedence than *
  – reduce → * has higher precedence than +
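
The choices listed for states I7 and I8 can be expressed as one small decision function. The sketch below assumes the usual conventions, which the slides list only as one of the options: both operators are left-associative and * binds tighter than +. The names PRECEDENCE, ASSOCIATIVITY and resolve are hypothetical.

```python
# Assumed conventions: both operators are left-associative and
# "*" has higher precedence than "+" (larger number = binds tighter).
PRECEDENCE = {"+": 1, "*": 2}
ASSOCIATIVITY = {"+": "left", "*": "left"}


def resolve(rule_operator, lookahead):
    """Resolve a shift/reduce conflict in a state containing E -> E op E.

    rule_operator is the operator of the completed production,
    lookahead is the current input token ("+" or "*").
    """
    if PRECEDENCE[lookahead] > PRECEDENCE[rule_operator]:
        return "shift"      # the incoming operator binds tighter
    if PRECEDENCE[lookahead] < PRECEDENCE[rule_operator]:
        return "reduce"     # the completed production binds tighter
    # Same precedence: associativity decides.
    return "reduce" if ASSOCIATIVITY[rule_operator] == "left" else "shift"


# I7 contains E -> E+E. and I8 contains E -> E*E.
for state, operator in (("I7", "+"), ("I8", "*")):
    for token in ("+", "*"):
        print(state, "on", token, "->", resolve(operator, token))
```

Under these assumptions the only shift is I7 on *, and the other three conflicts become reductions. Parser generators such as yacc apply the same kind of rule when given precedence and associativity declarations.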
SLR-Parsing Tables for Ambiguous Grammar

Productions: (1) E → E+E, (2) E → E*E, (3) E → (E), (4) E → id

        Action                              Goto
State   id    +     *     (     )     $     E
0       s3                s2                1
1             s4    s5                acc
2       s3                s2                6
3             r4    r4          r4    r4
4       s3                s2                7
5       s3                s2                8
6             s4    s5          s9
7             r1    s5          r1    r1
8             r2    r2          r2    r2
9             r3    r3          r3    r3
