Recursive Descent Parsing
! Goal
• Determine if we can produce the string to be parsed from the
grammar's start symbol
! Approach
• Construct a parse tree: starting with start symbol at root,
recursively replace nonterminal with RHS of production
! Key question: which production to use?
• There can be many productions for each nonterminal
• We could try them all, backtracking if we are unsuccessful
• But this is slow!
! Answer: lookahead
• Keep track of next token on the input string to be processed
• Use this to guide selection of production
CMSC 330 - Spring 2011 1
Recursive Descent Example
! Grammar
• E → id = n | { L }
• L → E ; L | ε
! Input string
• {x=3;{y=4;};}
[Figure: parse tree for this input, grown top-down from E, with the lookahead marking the next unprocessed token]
Recursive Descent: Basic Strategy
! Initially, “current node” is start node
! When processing the current node, 4 possibilities
• Node is the empty string
Ø Move to next node in DFS order that has not yet been
processed
• Node is a terminal that matches lookahead
Ø Advance lookahead by one symbol and move to next node in
DFS order to be processed
• Node is a terminal that does not match lookahead
Ø Fail! String cannot be parsed
• Node is a nonterminal
Ø Pick a production based on lookahead, generate children, then
process children recursively
Recursive Descent Parsing (cont.)
! Key step
• Choosing which production should be selected
! Two approaches
• Backtracking
Ø Choose some production
Ø If fails, try different production
Ø Parse fails if all choices fail
• Predictive parsing
Ø Analyze grammar to find “First sets” for productions
Ø Compare with lookahead to decide which production to select
Ø Parse fails if lookahead does not match First
First Sets
! Example
• Suppose the lookahead is x
• For grammar S → xyz | abc
Ø Select S → xyz since 1st terminal in RHS matches x
• For grammar S → A | B, A → x | y, B → z
Ø Select S → A, since A can derive string beginning with x
! In general
• We want to choose a production that can derive a
sentential form beginning with the lookahead
• Need to know what terminals may be first in any
sentential form derived from a nonterminal / production
First Sets
! Definition
• First(γ), for any sentential form γ, is the set of initial
terminals of all strings that γ may expand to
• We’ll use this to decide what production to apply
! Examples
• Given grammar S → xyz | abc
Ø First(xyz) = { x }, First(abc) = { a }
Ø First(S) = First(xyz) ∪ First(abc) = { x, a }
• Given grammar S → A | B, A → x | y, B → z
Ø First(x) = { x }, First(y) = { y }, First(A) = { x, y }
Ø First(z) = { z }, First(B) = { z }
Ø First(S) = { x, y, z }
Calculating First(γ)
! For a terminal a
• First(a) = { a }
! For a nonterminal N
• If N → ε, then add ε to First(N)
• If N → α1 α2 ... αn, then (note the αi are all the
symbols on the right side of one single production):
Ø Add First(α1α2 ... αn) to First(N), where First(α1 α2 ... αn) is
defined as
• First(α1) if ε ∉ First(α1)
• Otherwise (First(α1) – { ε }) ∪ First(α2 ... αn)
Ø If ε ∈ First(αi) for all i, 1 ≤ i ≤ n, then add ε to First(N)
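The rules above can be turned into a small fixed-point computation. The sketch below is one possible Python rendering under assumed conventions: the grammar is a dictionary from nonterminal to a list of right-hand sides, an empty list stands for an ε production, and the names `first_of_sequence` and `first_sets` are made up for the example.

```python
EPS = "ε"


def first_of_sequence(symbols, first):
    """First(α1 ... αn), given the current First sets of the nonterminals."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})   # First(a) = { a } for a terminal a
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result                   # αi cannot vanish, so stop here
    result.add(EPS)                         # every αi (or no symbol at all) derives ε
    return result


def first_sets(grammar):
    """Apply the rules repeatedly until no First set grows any further."""
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for nonterminal, productions in grammar.items():
            for rhs in productions:
                new = first_of_sequence(rhs, first)
                if not new <= first[nonterminal]:
                    first[nonterminal] |= new
                    changed = True
    return first


# E → id = n | { L }      L → E ; L | ε
GRAMMAR = {
    "E": [["id", "=", "n"], ["{", "L", "}"]],
    "L": [["E", ";", "L"], []],
}
```

Iterating to a fixed point is needed because First(N) may depend on the First sets of other nonterminals that have not been fully computed yet.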
First( ) Examples
! Grammar 1
• E → id = n | { L }
• L → E ; L | ε
! Grammar 2
• E → id = n | { L } | ε
• L → E ; L
! Terminals (same for both grammars)
• First(id) = { id }
• First("=") = { "=" }
• First(n) = { n }
• First("{") = { "{" }
• First("}") = { "}" }
• First(";") = { ";" }
! Nonterminals, grammar 1
• First(E) = { id, "{" }
• First(L) = { id, "{", ε }
! Nonterminals, grammar 2
• First(E) = { id, "{", ε }
• First(L) = { id, "{", ";" }
Recursive Descent Parser Implementation
! For each terminal a, create a function parse_a
• If the lookahead is a, parse_a consumes it by advancing
to the next token, then returns
• Otherwise, parse_a fails with a parse error
! For each nonterminal N, create a function parse_N
• Called when we’re trying to parse a part of the input
which corresponds to (or can be derived from) N
• parse_S for the start symbol S begins the parse
Parser Implementation (cont.)
! The body of parse_N for a nonterminal N does
the following
• Let N → β1 | ... | βk be the productions of N
Ø Here βi is the entire right side of a production: a sequence of
terminals and nonterminals
• Pick the production N → βi such that the lookahead is
in First(βi)
Ø It must be that First(βi) ∩ First(βj) = ∅ for i ≠ j
Ø If there is no such production, but N → ε then return
Ø Otherwise fail with a parse error
• Suppose βi = α1 α2 ... αn. Then call parse_α1(); ... ;
parse_αn() to match the expected right-hand side,
and return
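The recipe above can be written once, generically, instead of once per nonterminal. The following Python sketch is an illustration under stated assumptions: the table `PRODUCTIONS` pairs each right-hand side with a hand-computed First set, an empty right-hand side stands for ε, and the function names are invented for the example. It uses the grammar E → id = n | { L }, L → E ; L | ε.

```python
EPS = "ε"

# nonterminal -> list of (right-hand side, First set of that right-hand side)
PRODUCTIONS = {
    "E": [(["id", "=", "n"], {"id"}), (["{", "L", "}"], {"{"})],
    "L": [(["E", ";", "L"], {"id", "{"}), ([], {EPS})],
}


def parse(symbol, tokens, pos=0):
    """Match `symbol` starting at `pos`; return the position just after it."""
    lookahead = tokens[pos] if pos < len(tokens) else None
    if symbol not in PRODUCTIONS:
        # Terminal: plays the role of parse_a.
        if lookahead != symbol:
            raise SyntaxError(f"expected {symbol!r}, found {lookahead!r}")
        return pos + 1
    for rhs, first in PRODUCTIONS[symbol]:
        if lookahead in first:
            # Chosen production: call parse_α1(); ...; parse_αn() in order.
            for sym in rhs:
                pos = parse(sym, tokens, pos)
            return pos
    if any(EPS in first for _, first in PRODUCTIONS[symbol]):
        return pos                      # no match, but N → ε: consume nothing
    raise SyntaxError(f"no production of {symbol} starts with {lookahead!r}")


def accepts(tokens, start="E"):
    try:
        return parse(start, tokens) == len(tokens)
    except SyntaxError:
        return False
```

Because the First sets of the productions of each nonterminal are disjoint, at most one branch of the loop can fire for any lookahead.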
Recursive Descent Parser
! Given grammar S → xyz | abc
• First(xyz) = { x }, First(abc) = { a }
! Parser
parse_S( ) {
  if (lookahead == "x") {
    parse_x(); parse_y(); parse_z(); // S → xyz
  }
  else if (lookahead == "a") {
    parse_a(); parse_b(); parse_c(); // S → abc
  }
  else error( );
}
Recursive Descent Parser
! Given grammar S → A | B, A → x | y, B → z
• First(A) = { x, y }, First(B) = { z }
! Parser
parse_S( ) {
  if ((lookahead == "x") ||
      (lookahead == "y"))
    parse_A( ); // S → A
  else if (lookahead == "z")
    parse_B( ); // S → B
  else error( );
}
parse_A( ) {
  if (lookahead == "x")
    parse_x(); // A → x
  else if (lookahead == "y")
    parse_y(); // A → y
  else error( );
}
parse_B( ) {
  if (lookahead == "z")
    parse_z(); // B → z
  else error( );
}
Example
E → id = n | { L }
L → E ; L | ε
First(E) = { id, "{" }
parse_E( ) {
  if (lookahead == "id") {
    parse_id();
    parse_=(); // E → id = n
    parse_n();
  }
  else if (lookahead == "{") {
    parse_{ ();
    parse_L( ); // E → { L }
    parse_} ();
  }
  else error( );
}
parse_L( ) {
  if ((lookahead == "id") ||
      (lookahead == "{")) {
    parse_E( );
    parse_; (); // L → E ; L
    parse_L( );
  }
  else ; // L → ε
}
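For readers who want to run this example, here is a hedged Python transcription of parse_E and parse_L. The `tokenize` helper, the `Parser` class, and the single `match` method (standing in for all the parse_a functions) are assumptions added for the sketch; they are not part of the slides.

```python
import re


def tokenize(text):
    """Map raw text to grammar terminals: identifiers become id, integers become n."""
    tokens = []
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|\S", text):
        if re.fullmatch(r"[A-Za-z_]\w*", lexeme):
            tokens.append("id")
        elif lexeme.isdigit():
            tokens.append("n")
        else:
            tokens.append(lexeme)       # punctuation stands for itself
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, terminal):
        """Stands in for parse_a: consume the lookahead if it is `terminal`."""
        if self.lookahead != terminal:
            raise SyntaxError(f"expected {terminal!r}, found {self.lookahead!r}")
        self.pos += 1

    def parse_E(self):
        if self.lookahead == "id":              # E → id = n
            self.match("id")
            self.match("=")
            self.match("n")
        elif self.lookahead == "{":             # E → { L }
            self.match("{")
            self.parse_L()
            self.match("}")
        else:
            raise SyntaxError(f"unexpected {self.lookahead!r}")

    def parse_L(self):
        if self.lookahead in ("id", "{"):       # L → E ; L
            self.parse_E()
            self.match(";")
            self.parse_L()
        # otherwise L → ε: consume nothing


def accepts(text):
    parser = Parser(tokenize(text))
    try:
        parser.parse_E()
    except SyntaxError:
        return False
    return parser.lookahead is None             # all input must be consumed
```

With these definitions, `accepts("{x=3;{y=4;};}")` follows the same sequence of calls as the pseudocode on the input string used earlier.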
Things to Notice
! If you draw the execution trace of the parser
• You get the parse tree
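! Examples
• Grammar S → xyz, S → abc; string "xyz"
Ø Execution trace: parse_S( ) calls parse_x(), parse_y(), parse_z()
Ø Parse tree: S with children x, y, z
• Grammar S → A | B, A → x | y, B → z; string "x"
Ø Execution trace: parse_S( ) calls parse_A( ), which calls parse_x()
Ø Parse tree: S with child A, which has child x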
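One way to see that the execution trace is the parse tree is to have each parse function return the node it recognized. The Python sketch below does this for the grammar S → A | B, A → x | y, B → z; the tuple representation of tree nodes and the helper names `peek` and `parse_terminal` are assumptions made for the example.

```python
def peek(tokens, pos):
    """The lookahead: the token at `pos`, or None at end of input."""
    return tokens[pos] if pos < len(tokens) else None


def parse_terminal(tokens, pos, terminal):
    if peek(tokens, pos) != terminal:
        raise SyntaxError(f"expected {terminal!r} at position {pos}")
    return terminal, pos + 1                    # a leaf of the parse tree


def parse_A(tokens, pos):
    lookahead = peek(tokens, pos)
    if lookahead in ("x", "y"):                 # A → x | y
        leaf, pos = parse_terminal(tokens, pos, lookahead)
        return ("A", leaf), pos
    raise SyntaxError(f"unexpected {lookahead!r}")


def parse_B(tokens, pos):
    leaf, pos = parse_terminal(tokens, pos, "z")    # B → z
    return ("B", leaf), pos


def parse_S(tokens, pos=0):
    lookahead = peek(tokens, pos)
    if lookahead in ("x", "y"):                 # S → A
        child, pos = parse_A(tokens, pos)
    elif lookahead == "z":                      # S → B
        child, pos = parse_B(tokens, pos)
    else:
        raise SyntaxError(f"unexpected {lookahead!r}")
    return ("S", child), pos
```

Each call contributes exactly one node, and the nesting of the returned tuples mirrors the nesting of the calls.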
Things to Notice (cont.)
! This is a predictive parser
• Because the lookahead determines exactly which
production to use
! This parsing strategy may fail on some grammars
• Possible infinite recursion
• Production First sets overlap
• Production First sets contain ε
! This does not mean the grammar is unusable
• It just means this parsing method is not powerful enough
• We may be able to change the grammar
Left Factoring
! Consider parsing the grammar E → ab | ac
• First(ab) = { a }
• First(ac) = { a }
• Parser cannot choose between RHS based on
lookahead!
! Parser fails whenever A → α1 | α2 and
• First(α1) ∩ First(α2) contains a terminal (the intersection is neither ∅ nor { ε })
! Solution
• Rewrite grammar using left factoring
Left Factoring Algorithm
! Given grammar
• A → xα1 | xα2 | … | xαn | β
! Rewrite grammar as
• A → xL | β
• L → α1 | α2 | … | αn
! Repeat as necessary
! Examples
• S → ab | ac ⇨ S → aL, L → b | c
• S → abcA | abB | a ⇨ S → aL, L → bcA | bB | ε
• L → bcA | bB | ε ⇨ L → bL’ | ε, L’ → cA | B
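A single step of the rewrite can be sketched in Python as below. This is an illustrative version that factors out one shared leading symbol at a time, as in the examples; the function name `left_factor`, the list-of-lists encoding of productions (with an empty list for ε), and the caller-supplied name for the new nonterminal are all assumptions.

```python
def left_factor(nonterminal, productions, new_name):
    """Factor out the first leading symbol shared by two or more productions."""
    groups = {}
    for rhs in productions:
        head = rhs[0] if rhs else None          # None marks an ε production
        groups.setdefault(head, []).append(rhs)
    for head, group in groups.items():
        if head is not None and len(group) > 1:
            # The β alternatives: productions that do not start with `head`.
            rest = [rhs for rhs in productions if rhs not in group]
            return {
                nonterminal: [[head, new_name]] + rest,     # A → xL | β
                new_name: [rhs[1:] for rhs in group],       # L → α1 | ... | αn
            }
    return {nonterminal: productions}           # nothing to factor


# S → abcA | abB | a   becomes   S → aL,  L → bcA | bB | ε
step1 = left_factor("S", [["a", "b", "c", "A"], ["a", "b", "B"], ["a"]], "L")
# L → bcA | bB | ε     becomes   L → bL' | ε,  L' → cA | B
step2 = left_factor("L", step1["L"], "L'")
```

Applying the function again to the new nonterminal corresponds to the "repeat as necessary" step.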
Left Recursion
! Consider grammar S → Sa | ε
• First(Sa) = { a }, so choosing a production by lookahead is not the problem
• Try writing parser
parse_S( ) {
  if (lookahead == "a") {
    parse_S( );
    parse_a(); // S → Sa
  }
  else { }
}
• Body of parse_S( ) recurses forever
Ø If lookahead == "a", parse_S( ) calls parse_S( ) again before consuming any input
• This infinite recursion occurs in grammars with left recursion
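The problem is easy to reproduce. The Python sketch below transcribes the parser for S → Sa | ε directly; the helper `recurses_forever` is a made-up name that relies on Python raising RecursionError once its call-depth limit is exceeded, which here serves as a stand-in for "never terminates".

```python
def parse_S(tokens, pos=0):
    """Direct transcription of S → Sa | ε; returns the position after S."""
    lookahead = tokens[pos] if pos < len(tokens) else None
    if lookahead == "a":
        # Recursive call at the SAME position: no token has been consumed,
        # so the callee sees the same lookahead and makes the same choice.
        pos = parse_S(tokens, pos)
        if pos >= len(tokens) or tokens[pos] != "a":
            raise SyntaxError("expected 'a'")
        return pos + 1
    return pos                                  # S → ε


def recurses_forever(tokens):
    try:
        parse_S(tokens)
    except RecursionError:
        return True
    return False
```

On an empty input the ε branch is taken and the function returns immediately; on any input beginning with a, the recursion never makes progress.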
Right Recursion
! Consider grammar S → aS | ε
• Again, First(aS) = { a }
• Try writing parser
parse_S( ) {
  if (lookahead == "a") {
    parse_a();
    parse_S( ); // S → aS
  }
  else { }
}
• Will parse_S( ) recurse forever?
Ø No: each call to parse_a() advances the lookahead, so the recursion eventually stops
• Top-down parsers handle grammars with right recursion
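For comparison, a Python transcription of the right-recursive parser terminates, because every recursive call happens one token further into the input. As before, representing the parser state as an explicit position is an assumption of the sketch.

```python
def parse_S(tokens, pos=0):
    """S → aS | ε; returns the position just after the longest run of a's."""
    if pos < len(tokens) and tokens[pos] == "a":
        # The terminal is consumed first, so the recursive call starts at pos + 1.
        return parse_S(tokens, pos + 1)         # S → aS
    return pos                                  # S → ε
```

The recursion depth is bounded by the number of tokens, so the parser always stops.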
Expression Grammar for Top-Down Parsing
E → T E'
E' → ε | + E
T → P T'
T' → ε | * T
P → n | ( E )
• Notice we can always decide what production to
choose with only one symbol of lookahead
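A recognizer for this grammar might look like the following Python sketch. The class name `ExprParser`, the regular-expression tokenizer, and the treatment of any run of digits as the terminal n are assumptions made for illustration. Each method checks a single lookahead token, matching the observation above.

```python
import re


class ExprParser:
    def __init__(self, text):
        # Integers are the terminal n; every other non-space character is a token.
        self.tokens = re.findall(r"\d+|\S", text)
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, terminal):
        if self.lookahead != terminal:
            raise SyntaxError(f"expected {terminal!r}, found {self.lookahead!r}")
        self.pos += 1

    def parse_E(self):                  # E → T E'
        self.parse_T()
        self.parse_E_prime()

    def parse_E_prime(self):            # E' → ε | + E
        if self.lookahead == "+":
            self.match("+")
            self.parse_E()

    def parse_T(self):                  # T → P T'
        self.parse_P()
        self.parse_T_prime()

    def parse_T_prime(self):            # T' → ε | * T
        if self.lookahead == "*":
            self.match("*")
            self.parse_T()

    def parse_P(self):                  # P → n | ( E )
        if self.lookahead is not None and self.lookahead.isdigit():
            self.pos += 1               # consume the number
        elif self.lookahead == "(":
            self.match("(")
            self.parse_E()
            self.match(")")
        else:
            raise SyntaxError(f"unexpected {self.lookahead!r}")


def accepts(text):
    parser = ExprParser(text)
    try:
        parser.parse_E()
    except SyntaxError:
        return False
    return parser.lookahead is None
```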
Tradeoffs with Other Approaches
! Recursive descent parsers are easy to write
• The formal definition is a little clunky, but the code is close
to what you might have written even if you had never
been told about grammars formally
• They're unable to handle certain kinds of grammars
! Recursive descent is good for a simple parser
• Though parser generator tools can be faster to use if you’re familiar with them
! Can implement top-down predictive parsing as a
table-driven parser
• By maintaining an explicit stack to track progress
Tradeoffs with Other Approaches
! More powerful techniques need tool support
• Can take time to learn tools
! Main alternative is bottom-up, shift-reduce parser
• Replaces RHS of production with LHS (nonterminal)
• Example grammar
Ø S → aA, A → Bc, B → b
• Example parse
Ø abc ⇒ aBc ⇒ aA ⇒ S
Ø Derivation happens in reverse
• Something to look forward to in CMSC 430
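The reduction sequence in the example parse can be replayed with a few lines of Python. This is deliberately not a real shift-reduce parser (there is no stack and no parsing table); it is a naive sketch, with the made-up name `reduce_steps`, that repeatedly replaces the right side of a production with its left side, and it happens to work for this tiny grammar.

```python
# Productions of S → aA, A → Bc, B → b, written as (RHS, LHS) pairs.
RULES = [("b", "B"), ("Bc", "A"), ("aA", "S")]


def reduce_steps(string):
    """Replace an RHS by its LHS until no rule applies; return every step."""
    steps = [string]
    changed = True
    while changed:
        changed = False
        for rhs, lhs in RULES:
            index = string.find(rhs)
            if index != -1:
                string = string[:index] + lhs + string[index + len(rhs):]
                steps.append(string)
                changed = True
                break                   # restart from the first rule
    return steps
```

Reading the resulting list from last to first gives the derivation S ⇒ aA ⇒ aBc ⇒ abc, which is the sense in which the derivation happens in reverse.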
What’s Wrong With Parse Trees?
! Parse trees contain too much information
• Example
Ø Parentheses
Ø Extra nonterminals for precedence
• This extra stuff is needed for parsing
! But when we want to reason about languages
• Extra information gets in the way (too much detail)
Abstract Syntax Trees (ASTs)
! An abstract syntax tree is a more compact,
abstract representation of a parse tree, with only
the essential parts
[Figure: a parse tree shown next to the corresponding AST]
Abstract Syntax Trees (cont.)
! Intuitively, ASTs correspond to the data structure
you’d use to represent strings in the language
• Note that grammars describe trees
• So do OCaml datatypes (which we’ll see later)
• E → a | b | c | E+E | E-E | E*E | (E)
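As a rough illustration of the correspondence, here is how an AST for this grammar could be declared in Python. The slides point to OCaml datatypes for this purpose; Python dataclasses are used here only as a stand-in, and the names `Var`, `BinOp`, and `show` are assumptions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str                   # one of "a", "b", "c"


@dataclass(frozen=True)
class BinOp:
    op: str                     # "+", "-", or "*"
    left: "Expr"
    right: "Expr"


# An expression is either a variable or a binary operation.
Expr = Union[Var, BinOp]


def show(expr):
    """Print an AST back as a fully parenthesized string."""
    if isinstance(expr, Var):
        return expr.name
    return f"({show(expr.left)} {expr.op} {show(expr.right)})"


# AST for (a+b)*c: the parentheses leave no node behind, only the tree shape.
tree = BinOp("*", BinOp("+", Var("a"), Var("b")), Var("c"))
```

There is one constructor per kind of expression, and the production E → (E) needs no constructor at all because grouping is already expressed by the nesting of nodes.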