LR Parsing and LR Parser Generator
What is an LR(k) Parser
• An LR parser is a bottom-up parser
• It scans the input from left to right (`L' for left-to-right) and constructs:
– a right-most derivation in reverse (`R' for right-most derivation)
– building the parse tree in a bottom-up fashion
• If the parser needs to look ahead by no more than k symbols, we call it an LR(k) parser
An LR Parser
[Figure: block diagram of an LR parser. The driving routine is the same for ALL LR parsers; the parsing table differs from parser to parser.]
Handle
• Consider grammar
1. S → aABe
2. A → Abc
3. A→b
4. B→d
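• Consider the right-most derivation of abbcbcde
– S =>1 aABe =>4 aAde =>2 aAbcde =>2 aAbcbcde =>3 abbcbcde
– (the number after each => is the rule applied)
• Right-most reduction follows the above steps in the reverse order
– the substring replaced at each reduction step is the handle of that sentential form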
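The reverse order can be replayed mechanically. The following is a minimal Python sketch, assuming every grammar symbol is a single character; the names RULES, DERIVATION and reduce_in_reverse are illustrative and not part of the slides. It does not discover handles by itself: it undoes the known derivation and relies on the fact that, for this particular derivation, the handle is always the leftmost occurrence of the rule's right side.

```python
# Grammar from the slide; every symbol is a single character.
RULES = {1: ("S", "aABe"), 2: ("A", "Abc"), 3: ("A", "b"), 4: ("B", "d")}

# Rule numbers used by the right-most derivation of abbcbcde, in order.
DERIVATION = [1, 4, 2, 2, 3]


def reduce_in_reverse(sentence, derivation):
    """Undo the derivation step by step and return every sentential form."""
    forms = [sentence]
    for rule_no in reversed(derivation):
        lhs, rhs = RULES[rule_no]
        current = forms[-1]
        if rhs not in current:
            raise ValueError(f"rule {rule_no} does not apply to {current}")
        # For this derivation the handle is the leftmost occurrence of the
        # right side, so replacing one occurrence performs the reduction.
        forms.append(current.replace(rhs, lhs, 1))
    return forms


forms = reduce_in_reverse("abbcbcde", DERIVATION)
print(" <= ".join(forms))
```

Note that in aAbcbcde the substring b also matches rule 3, but reducing it there would not lead back to S; only the handle Abc does.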
LR Parsing Theory
• Deals with how the parser knows exactly what to do at any particular instant
• For this, an LR parser uses a stack onto which it pushes symbols
• If the top few symbols of the stack match the right side of some rule:
– these symbols are popped from the stack and the left side of the rule is pushed onto the stack
– this operation is called a reduction
– for example, if the stack is aAbc (where a is at bottom of stack)
• the parser will reduce using the rule A → Abc
• it will pop out Abc from the stack and push A
• the stack now becomes aA
• the sequence Abc is the handle at the time of reduction
• LR parser correctly identifies a handle on the top of the stack
– and replaces this handle in the stack with the left side of the rule
• We may also say that the parser has pruned the handle in the stack
– to the left side of the rule
• Every LR parser proceeds by carrying out a series of handle-pruning steps
LR Parsing Theory
• Another name for LR parsers is shift-reduce parsers
• There are two fundamental actions:
– shift the current input token onto the stack
• and read the next token
– reduce by some production rule
• The problem for an LR parser is deciding when to shift and when to reduce
– and, if reducing, by which rule
• It needs a recognizer for handles
– so that, by scanning the stack, it can decide the proper action
• The recognizer is a finite state machine (DFA)
– whose input alphabet includes both terminals and non-terminals
LR Parsing Theory
• The DFA corresponding to an LR parser can be table-driven
– however, its table differs slightly from that of an ordinary DFA
• The table, called the LR Parsing Table, has two parts
– ACTION and GOTO
• For current state s and the current input a, ACTION[s][a]
can be:
– Shift-t: Push a and make t the next state
– Reduce by rule A → β
– Accept, signaling the end of successful parsing
– Error
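• When the action is reduce by A → β:
– the ACTION table does not give the next state
– for that, the parser uses a separate table, GOTO
– after the handle is popped, the next state is GOTO[s'][A], where s' is the state then exposed on top of the stack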
LR Parser Summary
• All LR parsers are shift-reduce parsers
• All LR parsers have identical driving routine
– the module of the parser that carries out shift or reduce
• The driving routine uses a stack to store a string of the form:
– s0 X1 s1 X2 s2 ... Xm sm
• sm is on the top of the stack
• the symbols s0, s1, ... are states (of the DFA)
• X1, X2, ... are grammar symbols
– (terminals or non-terminals)
• Driving routine uses parsing table to decide current action
• Parsing table is a DFA state table split in two parts
– ACTION and GOTO.
• There are some utilities that may be used to automatically
generate the parsing table for a grammar
– provided the grammar is indeed an LR(k) grammar of appropriate k
Example LR Parser for ETF Grammar
Rules:
1. E -> E + T
2. E -> T
3. T -> T * F
4. T -> F
5. F -> (E)
6. F -> id
Driving Routine
push(0);                                  /* start in state 0 */
read_next_token();
for (;;)
{
    s = top();                            /* current state is taken from top of stack */
    if (ACTION[s][current_token] == "s-i")        /* shift and go to state i */
    {
        push(current_token);
        push(i);
        read_next_token();
    }
    else if (ACTION[s][current_token] == "r-i")   /* reduce by rule i: X -> A1 ... An */
    {
        pop() 2 * n times;                /* n grammar symbols and their n states */
        s = top();                        /* state exposed below the handle */
        push(X);                          /* left-hand side of the chosen rule */
        push(GOTO[s][X]);                 /* state after the reduction */
        output rule i;
    }
    else if (ACTION[s][current_token] == "acc")
    {
        report success;
        break;
    }
    else
        error();
}
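The driving routine above can be made executable. The following is a Python sketch, not the deck's own implementation. It assumes the standard textbook SLR table for the ETF grammar (states 0 to 11), since the table itself did not survive in these slides; the names ACTION_ROWS, GOTO and parse are illustrative.

```python
# Rule number -> (left side, length of right side) for the ETF grammar.
RULES = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

TERMINALS = ["id", "+", "*", "(", ")", "$"]
# Assumed SLR ACTION table, one row per state; "-" marks an error entry.
ACTION_ROWS = [
    # id   +    *    (    )    $
    "s5   -    -    s4   -    -",    # state 0
    "-    s6   -    -    -    acc",  # state 1
    "-    r2   s7   -    r2   r2",   # state 2
    "-    r4   r4   -    r4   r4",   # state 3
    "s5   -    -    s4   -    -",    # state 4
    "-    r6   r6   -    r6   r6",   # state 5
    "s5   -    -    s4   -    -",    # state 6
    "s5   -    -    s4   -    -",    # state 7
    "-    s6   -    -    s11  -",    # state 8
    "-    r1   s7   -    r1   r1",   # state 9
    "-    r3   r3   -    r3   r3",   # state 10
    "-    r5   r5   -    r5   r5",   # state 11
]
ACTION = [dict(zip(TERMINALS, row.split())) for row in ACTION_ROWS]
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
        (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
        (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}


def parse(tokens):
    """Return the rule numbers in the order they are reduced."""
    tokens = list(tokens) + ["$"]
    stack = [0]                      # s0 X1 s1 ... Xm sm, top at the end
    pos, output = 0, []
    while True:
        state, token = stack[-1], tokens[pos]
        act = ACTION[state].get(token, "-")
        if act == "acc":
            return output
        if act[0] == "s":            # shift: push token and new state
            stack += [token, int(act[1:])]
            pos += 1
        elif act[0] == "r":          # reduce: pop 2 * n entries, then GOTO
            rule = int(act[1:])
            lhs, n = RULES[rule]
            del stack[len(stack) - 2 * n:]
            stack += [lhs, GOTO[(stack[-1], lhs)]]
            output.append(rule)
        else:
            raise SyntaxError(f"unexpected {token!r} in state {state}")


print(parse("id * id + id".split()))
```

The returned rule numbers, read backwards, give the right-most derivation of the input.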
Parser Walk-through
• We will carry out LR parsing of the input:
– (a + b) * (c * (d + e))
– i.e., (id + id) * (id * (id + id))
• A configuration of an LR parser is the current stack together with the current unseen input:
– (s0 X1 s1 X2 s2 ... Xm sm, ai ai+1 ... an $)
• The initial configuration is:
– (0, (id + id) * (id * (id + id)) $)
Some Observations
• An LR parser shifts until the top of the stack is known to contain a handle
• When the top of the stack is known to contain a handle, the LR parser reduces
• So, the key aspect of an LR parser is giving it the knowledge of when a handle is on the top of the stack
• This is done using a DFA of Viable Prefixes
Viable Prefix
• A Viable Prefix is a prefix of a right-sentential form that does NOT continue past the end of the rightmost handle of that sentential form
• Consider the right-most derivation:
– E => T => T * F => T * (E) => T * (E + T)
• The handle of the last form, T * (E + T), is E + T
• Hence the possible Viable Prefixes are:
• T
• T *
• T * (
• T * (E
• T * (E +
• T * (E + T
• But NOT T * (E + T)
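The list above can be reproduced with a few lines of Python. This is only a sketch: it assumes the handle is supplied by the caller (it does not find the handle by parsing), and the helper names find_sublist and viable_prefixes are illustrative. The empty prefix, which the slide omits, is also viable and appears first in the result.

```python
def find_sublist(seq, sub):
    """Index of the first occurrence of sub inside seq, or -1 if absent."""
    for i in range(len(seq) - len(sub) + 1):
        if seq[i:i + len(sub)] == sub:
            return i
    return -1


def viable_prefixes(form, handle):
    """Prefixes of a right-sentential form that end at or before the handle's end."""
    start = find_sublist(form, handle)
    if start < 0:
        raise ValueError("handle does not occur in the sentential form")
    handle_end = start + len(handle)
    return [form[:i] for i in range(handle_end + 1)]


form = ["T", "*", "(", "E", "+", "T", ")"]
prefixes = viable_prefixes(form, ["E", "+", "T"])
for p in prefixes:
    print(" ".join(p) or "(empty)")
```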
Simple LR or SLR Parser
• is a type of LR parser with small parse tables
and a relatively simple parser generator
algorithm
• quite efficient at finding the single correct
bottom-up parse in a single left-to-right scan
over the input stream, without guesswork or
backtracking
• mechanically generated from a formal
grammar for the language
LR(0) Item or “item”
• A Grammar Rule with a dot (.) at some position
on right hand side
• For a rule with |r.h.s| = n, there are n + 1 items
• Example:
– For rule E → E + T, the items are:
a) E → . E + T
b) E → E . + T
c) E → E + . T
d) E → E + T .
• An item loosely signifies:
– how much of the rule has been seen by the parser
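Enumerating the items of a rule is a one-liner. The following Python sketch represents an item as a tuple (left side, right side, dot position); this representation and the names lr0_items and show are assumptions made for illustration.

```python
def lr0_items(lhs, rhs):
    """All LR(0) items of the rule lhs -> rhs: one per dot position."""
    rhs = tuple(rhs)
    return [(lhs, rhs, dot) for dot in range(len(rhs) + 1)]


def show(item):
    """Render an item with the dot written as '.'."""
    lhs, rhs, dot = item
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"


items = lr0_items("E", ["E", "+", "T"])
for item in items:
    print(show(item))
```

A right side of length n = 3 yields n + 1 = 4 items, matching a) to d) above.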
Augmented Grammar
• If S is the start-symbol of a grammar
• Augment the grammar with ONE extra rule
• S’ → S
• Hence the item S’ → . S means parsing has not
started
• And item S’→ S. means parsing is over (with
success)
Closure of a Set of Items
• Definition: For any set of items I, CLOSURE(I)
is formed as follows:
– Initialize CLOSURE(I) = I
– If A → α · B β is in CLOSURE(I) and B → γ is a
production, then add B → · γ to the closure and
repeat
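• Why?
– an item A → α · B β says the parser next expects a string derivable from B, so it must also be ready for every right side γ of B:
– S =>* δ α B β φ =>* δ α γ β φ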
The Set of Items I0
• From Augmented Grammar (having used
S’→ S):
– I0 = CLOSURE({S’→ . S})
• Thus, in the ETF Grammar (augmented with E' → E),
I0 = { E' → . E,
E → . E + T,
E → . T,
T → . T * F,
T → . F,
F → . ( E ),
F → . id }
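The closure rule can be written as a small worklist loop. This Python sketch encodes an item as (rule index, dot position), with rule 0 being the augmenting rule E' → E; that encoding and the names GRAMMAR, closure and show are illustrative assumptions.

```python
# Augmented ETF grammar; index 0 is the extra rule E' -> E.
GRAMMAR = [("E'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
           ("T", ("T", "*", "F")), ("T", ("F",)),
           ("F", ("(", "E", ")")), ("F", ("id",))]


def closure(items):
    """CLOSURE(I): keep adding B -> . gamma while some dot stands before B."""
    result, work = set(items), list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs):
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def show(item):
    rule, dot = item
    lhs, rhs = GRAMMAR[rule]
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"


I0 = closure({(0, 0)})
for item in sorted(I0):
    print(show(item))
```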
GOTO(I,X)
• Definition: If I is a set of items and X is a
grammar symbol, then GOTO(I,X) is the
CLOSURE of the set of items A→α X . β
where A→α . X β is in I
• Example:
– GOTO(I0, E) = { E’ → E . , E → E . + T } = I1 (say)
– GOTO(I0, ( ) = { F → ( . E ), E → . E + T, E → . T,
T → . T * F, T → . F, F → . ( E ), F → . id }
= I4 (as we will see later)
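GOTO is "move the dot over X, then close". A Python sketch under the same assumptions as before: items are (rule index, dot position) pairs, rule 0 is E' → E, and the function names are illustrative.

```python
GRAMMAR = [("E'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
           ("T", ("T", "*", "F")), ("T", ("F",)),
           ("F", ("(", "E", ")")), ("F", ("id",))]


def closure(items):
    result, work = set(items), list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs):
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def goto(items, symbol):
    """GOTO(I, X): advance the dot over X wherever possible, then take the closure."""
    moved = {(rule, dot + 1) for rule, dot in items
             if dot < len(GRAMMAR[rule][1]) and GRAMMAR[rule][1][dot] == symbol}
    return closure(moved)


def show(item):
    rule, dot = item
    lhs, rhs = GRAMMAR[rule]
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"


I0 = closure({(0, 0)})
I1 = goto(I0, "E")
I4 = goto(I0, "(")
print(sorted(show(i) for i in I1))
print(len(I4), "items in GOTO(I0, '(')")
```

GOTO on a symbol that no item expects, such as ")" in I0, yields the empty set, which corresponds to an error entry.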
Make X a State Transition from a state I to state J where J = GOTO(I, X)
I --X--> J
A Canonical Collection of LR(0) Items
• Start with Augmented Grammar and then I0
• Create DFA
• Make I0 the Initial State of the DFA
• Theorem of LR Parsing (without proof):
– ANY string traced out by travelling from I0 to any
other state traces out a viable prefix of the grammar
– i.e., the set of all viable prefixes of all the right
sentential forms of a grammar is a regular language
Canonical Collection for ETF Grammar
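The collection can be generated by repeatedly applying GOTO until no new item sets appear. The following Python sketch uses the same illustrative encoding as above (items as (rule index, dot position), rule 0 being E' → E). State numbers depend on the order of exploration, so they need not match the I0, I1, ... numbering used on the slides.

```python
GRAMMAR = [("E'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
           ("T", ("T", "*", "F")), ("T", ("F",)),
           ("F", ("(", "E", ")")), ("F", ("id",))]


def closure(items):
    result, work = set(items), list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs):
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def goto(items, symbol):
    moved = {(rule, dot + 1) for rule, dot in items
             if dot < len(GRAMMAR[rule][1]) and GRAMMAR[rule][1][dot] == symbol}
    return closure(moved)


def canonical_collection():
    """Return (states, transitions) of the LR(0) DFA, starting from I0."""
    states = [closure({(0, 0)})]
    transitions = {}                  # (state index, symbol) -> state index
    i = 0
    while i < len(states):
        state = states[i]
        # Symbols that appear right after a dot in this state.
        symbols = {GRAMMAR[r][1][d] for r, d in state if d < len(GRAMMAR[r][1])}
        for x in sorted(symbols):
            target = goto(state, x)
            if target not in states:
                states.append(target)
            transitions[(i, x)] = states.index(target)
        i += 1
    return states, transitions


states, transitions = canonical_collection()
print(len(states), "states,", len(transitions), "transitions")
```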
Construction of SLR Parsing Table
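• If GOTO(I, a) = J for a terminal a: ACTION[State-I, a] = shift-J
• If GOTO(I, A) = J for a non-terminal A: GOTO[State-I, A] = J
• If State-I contains a complete item A → α . (A ≠ S’): ACTION[State-I, a] = reduce A → α, for ALL a in FOLLOW(A)
• If State-I contains S’ → S . : ACTION[State-I, $] = acc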
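The four rules can be applied mechanically to the canonical collection. The Python sketch below builds the SLR table for the ETF grammar and records any entry that would receive two different actions. The FOLLOW sets are written out by hand for this grammar rather than computed, and all names are illustrative; state numbers follow the order of exploration.

```python
GRAMMAR = [("E'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
           ("T", ("T", "*", "F")), ("T", ("F",)),
           ("F", ("(", "E", ")")), ("F", ("id",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
# FOLLOW sets of the ETF grammar, supplied by hand (assumption of this sketch).
FOLLOW = {"E": {"+", ")", "$"}, "T": {"+", "*", ")", "$"}, "F": {"+", "*", ")", "$"}}


def closure(items):
    result, work = set(items), list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs):
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def goto(items, symbol):
    moved = {(r, d + 1) for r, d in items
             if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == symbol}
    return closure(moved)


def build_slr_table():
    states = [closure({(0, 0)})]
    action, goto_table, conflicts = {}, {}, []

    def set_action(state, terminal, value):
        # A second, different value for the same entry is a conflict.
        if action.setdefault((state, terminal), value) != value:
            conflicts.append((state, terminal))

    i = 0
    while i < len(states):
        state = states[i]
        symbols = {GRAMMAR[r][1][d] for r, d in state if d < len(GRAMMAR[r][1])}
        for x in sorted(symbols):
            target = goto(state, x)
            if target not in states:
                states.append(target)
            j = states.index(target)
            if x in NONTERMINALS:
                goto_table[(i, x)] = j
            else:
                set_action(i, x, ("shift", j))
        for r, d in state:
            lhs, rhs = GRAMMAR[r]
            if d == len(rhs):                      # complete item
                if r == 0:
                    set_action(i, "$", ("accept",))
                else:
                    for a in FOLLOW[lhs]:
                        set_action(i, a, ("reduce", r))
        i += 1
    return states, action, goto_table, conflicts


states, action, goto_table, conflicts = build_slr_table()
print(len(action), "ACTION entries,", len(goto_table), "GOTO entries,",
      len(conflicts), "conflicts")
```

Entries missing from the two dictionaries are the blank (error) entries of the table.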
Summary of Process
• Form Augmented Grammar
• Generate Canonical Collection of LR(0) items
• Construct the SLR Parsing Table from the collection
• Fill up blank entries with appropriate error messages
• The grammar is not SLR if any entry receives more than one action (a conflict)
Are ALL LR Grammars SLR?
• NO
Conflicts
• Shift-reduce and reduce-reduce
• Example Grammar:
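– S → L = R | R
– L → * R | id
– R → L
• We will have I2 = { S → L . = R, R → L . }
– Remember ‘=’ is in FOLLOW(R)
– ACTION[2, ‘=’] => Shift (from S → L . = R)
– ACTION[2, ‘=’] => Reduce(R → L) (from R → L . )
– this is a shift-reduce conflict, although the grammar itself is not ambiguous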
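The conflict can be exhibited by computing GOTO(I0, L) and comparing the terminals that can be shifted with the FOLLOW set of the completed rule. This is a Python sketch with hand-written FOLLOW sets and illustrative names; items are (rule index, dot position) pairs and rule 0 is S' → S.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("L", "=", "R")), ("S", ("R",)),
           ("L", ("*", "R")), ("L", ("id",)), ("R", ("L",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
# FOLLOW sets for this grammar, supplied by hand (assumption of this sketch).
FOLLOW = {"S": {"$"}, "L": {"=", "$"}, "R": {"=", "$"}}


def closure(items):
    result, work = set(items), list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs):
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def goto(items, symbol):
    moved = {(r, d + 1) for r, d in items
             if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == symbol}
    return closure(moved)


I0 = closure({(0, 0)})
I2 = goto(I0, "L")            # { S -> L . = R,  R -> L . }

# Terminals on which SLR would shift in I2.
shift_on = {GRAMMAR[r][1][d] for r, d in I2
            if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] not in NONTERMINALS}

# Terminals on which SLR would reduce in I2: FOLLOW of each completed rule.
reduce_on = set()
for r, d in I2:
    lhs, rhs = GRAMMAR[r]
    if d == len(rhs):
        reduce_on |= FOLLOW[lhs]

conflict = shift_on & reduce_on
print("shift on", shift_on, "reduce on", reduce_on, "conflict on", conflict)
```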
Lookaheads → LR(1) Items
• Sometimes conflicts can be avoided with ‘lookaheads’
• LR(1) Item: [A → α · β, a], where a is a terminal or ‘$’
• It is valid for a viable prefix γ if there is a right-most derivation:
– S =>* δ A φ => δ α β φ, where
• γ = δ α, and
• a is either the first symbol of φ, or φ is empty and a is ‘$’
• CLOSURE and GOTO are appropriately defined
• If [A → α · , a] (A ≠ S’) is in state Ii:
– ACTION[i, a] = reduce(A → α)
Example
• Augmented Grammar:
– S’→ S
– S → C C
– C → c C | d
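For this grammar the LR(1) closure of the initial item can be computed as follows. In this Python sketch an LR(1) item is (rule index, dot position, lookahead), and for an item [A → α · B β, a] the new lookaheads are FIRST(β a). The FIRST computation assumes a grammar without ε-productions, which holds here; all names are illustrative.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST of every non-terminal; valid only for grammars without empty rules."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


FIRST = first_sets()


def closure1(items):
    """LR(1) closure: for [A -> alpha . B beta, a] add [B -> . gamma, b], b in FIRST(beta a)."""
    result, work = set(items), list(items)
    while work:
        rule, dot, lookahead = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            beta = rhs[dot + 1:]
            if beta:   # no empty rules, so FIRST(beta a) = FIRST(beta[0])
                lookaheads = FIRST[beta[0]] if beta[0] in NONTERMINALS else {beta[0]}
            else:
                lookaheads = {lookahead}
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot]:
                    for b in lookaheads:
                        if (i, 0, b) not in result:
                            result.add((i, 0, b))
                            work.append((i, 0, b))
    return frozenset(result)


I0 = closure1({(0, 0, "$")})
for rule, dot, la in sorted(I0):
    lhs, rhs = GRAMMAR[rule]
    print(f"[{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}, {la}]")
```

The result has the items [S' → · S, $], [S → · C C, $], and [C → · c C] and [C → · d], each with lookaheads c and d.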
LALR Parsing Table
• Merge states that have the same core (the same items, ignoring lookaheads). For example, we merge I3 and I6:
– I36 = {
[C → c · C, c/d/$],
[C → · c C, c/d/$],
[C → · d, c/d/$]
}
• Merge the rows of the merged states in the Parsing Table
• Same number of states as there would be in SLR
• So, a smaller table than canonical LR(1), but with lookaheads built in
• Can also incorporate precedence and associativity
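Merging by core amounts to grouping states on their items with the lookaheads removed. A Python sketch: each state is a set of (core item, lookahead) pairs with the core written as a string; I3 and I6 are taken from the slide, while I4 and I7 are the corresponding LR(1) states for C → d in the same grammar, added here to show a second merge. The function names are illustrative.

```python
def core(state):
    """The items of a state with the lookaheads stripped."""
    return frozenset(item for item, _ in state)


def merge_by_core(states):
    """Union all states that share a core; returns a dict core -> merged state."""
    merged = {}
    for state in states:
        merged.setdefault(core(state), set()).update(state)
    return merged


I3 = {(i, a) for i in ("C -> c . C", "C -> . c C", "C -> . d") for a in ("c", "d")}
I6 = {(i, "$") for i in ("C -> c . C", "C -> . c C", "C -> . d")}
I4 = {("C -> d .", "c"), ("C -> d .", "d")}
I7 = {("C -> d .", "$")}

merged = merge_by_core([I3, I4, I6, I7])
for items in merged.values():
    by_item = {}
    for item, lookahead in sorted(items):
        by_item.setdefault(item, []).append(lookahead)
    print({item: "/".join(las) for item, las in by_item.items()})
```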
Using Ambiguous Grammars
• The ‘E’ grammar:
– E → E + E | E * E | ( E ) | id
• Fewer states
• Many conflicts likely
• Impose precedence and associativity externally
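One common way to impose precedence and associativity externally is the convention used by yacc-style generators: on a shift-reduce conflict, compare the operator of the rule being reduced with the lookahead token. The slides do not say which scheme they intend, so the Python sketch below is an assumption, and the PRECEDENCE table and resolve function are illustrative names.

```python
# Higher level binds tighter; both operators are declared left-associative.
PRECEDENCE = {"+": (1, "left"), "*": (2, "left")}


def resolve(rule_operator, lookahead):
    """Choose between reducing E -> E op E and shifting the lookahead operator."""
    rule_level, _ = PRECEDENCE[rule_operator]
    token_level, assoc = PRECEDENCE[lookahead]
    if rule_level > token_level:
        return "reduce"              # stack holds the tighter-binding operator
    if rule_level < token_level:
        return "shift"               # lookahead binds tighter
    # Same level: associativity decides.
    return "reduce" if assoc == "left" else "shift"


print(resolve("+", "*"))   # stack E + E, next token *
print(resolve("*", "+"))   # stack E * E, next token +
print(resolve("+", "+"))   # stack E + E, next token +
```

With these declarations, id + id * id groups as id + (id * id) and id + id + id groups as (id + id) + id.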