COMP 412, Fall 2015
Parsing V: LL(1) Parsing, start of Bottom-up Parsing
[Figure: Comp 412 compiler structure: source code -> Front End -> IR -> Optimizer -> IR -> Back End -> target code]
Copyright 2015, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.
Chapter 3 in EaC2e

Predictive Parsing (Review from last lecture)

Given a grammar that has the LL(1) property
- We can write a simple routine to recognize an instance of each LHS
- Code is patterned, simple, & fast

Consider A → β1 | β2 | β3, with FIRST+(A → βi) ∩ FIRST+(A → βj) = ∅ if i ≠ j

    /* find an A */
    if (current_word ∈ FIRST+(A → β1))
        find a β1 and return true
    else if (current_word ∈ FIRST+(A → β2))
        find a β2 and return true
    else if (current_word ∈ FIRST+(A → β3))
        find a β3 and return true
    else
        report an error and return false

Grammars that have the LL(1) property are called predictive grammars because the parser can "predict" the correct expansion at each point in the parse. Parsers that capitalize on the LL(1) property are called predictive parsers. One kind of predictive parser is the recursive descent parser.

Of course, there is more detail to "find a βi": typically a recursive call to another small routine (see pp. 108-111 in EaC2e).

Recursive Descent Parsing (Review from last lecture)

Recall the expression grammar, after transformation

     0   Goal   → Expr
     1   Expr   → Term Expr'
     2   Expr'  → + Term Expr'
     3          | - Term Expr'
     4          | ε
     5   Term   → Factor Term'
     6   Term'  → * Factor Term'
     7          | / Factor Term'
     8          | ε
     9   Factor → ( Expr )
    10          | number
    11          | id

This grammar leads to a parser that has six mutually recursive routines:
1. Goal
2. Expr
3. EPrime
4. Term
5. TPrime
6. Factor

Each routine recognizes an RHS for that NT.
The term "descent" refers to the direction in which the parse tree is built.

Recursive Descent Parsing (Review from last lecture)

A couple of routines from the expression parser

    Goal( )
        token ← next_token( );
        if (Expr( ) = true & token = EOF)
            then next compilation step;
            else
                report syntax error;
                return false;

    Expr( )
        if (Term( ) = false)
            then return false;
            else return EPrime( );

    Factor( )
        if (token = number) then
            token ← next_token( );
            return true;
        else if (token = identifier) then
            token ← next_token( );
            return true;
        else if (token = lparen)
            token ← next_token( );
            if (Expr( ) = true & token = rparen) then
                token ← next_token( );
                return true;
        // fall out of if statement
        report syntax error;
        return false;

The syntax error in Factor means: looking for number, identifier, or (, found token instead, or failed to find Expr or ) after (.

EPrime, Term, & TPrime follow the same basic lines (Figure 3.10, EaC2e)

Recursive Descent (Review from last lecture)

Page 111 in EaC2e sketches a recursive descent parser for the right-recursive version of the classic expression grammar.
- One routine per NT
- Check each RHS by checking each symbol
- Includes ε-productions

Your lab 2 parsers are not much more complex than the example.

    Main( )
        /* Goal → Expr */
        word ← NextWord( );
        if (Expr( ))
            then if (word = eof)
                then report success;
                else Fail( );

    Fail( )
        report syntax error;
        attempt error recovery or exit;

    Expr( )
        /* Expr → Term Expr' */
        if ( Term( ) )
            then return EPrime( );
            else Fail( );

    EPrime( )
        /* Expr' → + Term Expr' */
        /* Expr' → - Term Expr' */
        if (word = + or word = - )
            then begin;
                word ← NextWord( );
                if ( Term( ) )
                    then return EPrime( );
                    else Fail( );
            end;
        else if (word = ) or word = eof)
            /* Expr' → ε */
            then return true;
            else Fail( );

    Term( )
        /* Term → Factor Term' */
        if ( Factor( ) )
            then return TPrime( );
            else Fail( );

    TPrime( )
        /* Term' → * Factor Term' */
        /* Term' → / Factor Term' */
        if (word = * or word = / )
            then begin;
                word ← NextWord( );
                if ( Factor( ) )
                    then return TPrime( );
                    else Fail( );
            end;
        else if (word = + or word = - or
                 word = ) or word = eof)
            /* Term' → ε */
            then return true;
            else Fail( );

    Factor( )
        /* Factor → ( Expr ) */
        if (word = ( ) then begin;
            word ← NextWord( );
            if (not Expr( ) )
                then Fail( );
            if (word ≠ ) )
                then Fail( );
            word ← NextWord( );
            return true;
        end;
        /* Factor → num */
        /* Factor → name */
        else if (word = num or
                 word = name )
            then begin;
                word ← NextWord( );
                return true;
            end;
        else Fail( );

FIGURE 3.12 Recursive-Descent Parser for Expressions
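
The six routines can be sketched in Python roughly as follows. This is a minimal recognizer, not the code from EaC2e: it assumes the scanner has already produced a list of token kinds spelled "num", "id", "+", "-", "*", "/", "(", ")", and the names `Parser`, `ParseError`, and `recognize` are hypothetical. Unlike the figure, a failure raises an exception rather than calling Fail( ).

```python
class ParseError(Exception):
    """Raised when no right-hand side matches the current word."""


class Parser:
    def __init__(self, tokens):
        # Append an explicit end-of-file marker (assumed spelling: "eof").
        self.tokens = list(tokens) + ["eof"]
        self.pos = 0

    @property
    def word(self):
        return self.tokens[self.pos]

    def next_word(self):
        self.pos += 1

    def fail(self, expected):
        raise ParseError(f"expected {expected}, found {self.word!r}")

    def goal(self):                      # Goal -> Expr
        self.expr()
        if self.word != "eof":
            self.fail("eof")
        return True

    def expr(self):                      # Expr -> Term Expr'
        self.term()
        self.eprime()

    def eprime(self):
        if self.word in ("+", "-"):      # Expr' -> + Term Expr' | - Term Expr'
            self.next_word()
            self.term()
            self.eprime()
        elif self.word in (")", "eof"):  # Expr' -> epsilon
            return
        else:
            self.fail("+, -, ) or eof")

    def term(self):                      # Term -> Factor Term'
        self.factor()
        self.tprime()

    def tprime(self):
        if self.word in ("*", "/"):      # Term' -> * Factor Term' | / Factor Term'
            self.next_word()
            self.factor()
            self.tprime()
        elif self.word in ("+", "-", ")", "eof"):  # Term' -> epsilon
            return
        else:
            self.fail("an operator, ) or eof")

    def factor(self):
        if self.word == "(":             # Factor -> ( Expr )
            self.next_word()
            self.expr()
            if self.word != ")":
                self.fail(")")
            self.next_word()
        elif self.word in ("num", "id"):  # Factor -> num | id
            self.next_word()
        else:
            self.fail("(, num or id")


def recognize(tokens):
    """Return True if the token kinds form a sentence of the expression grammar."""
    try:
        return Parser(tokens).goal()
    except ParseError:
        return False
```

For example, `recognize(["id", "-", "num", "*", "id"])` accepts the token kinds for x - 2 * y, while `recognize(["id", "+"])` is rejected because Factor finds eof.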

Top-Down Recursive Descent Parser

At this point, you have enough information to build a top-down recursive-descent parser
- Need a right-recursive grammar that meets the LL(1) condition
  - Can use left-factoring to eliminate common prefixes
  - Can transform direct left recursion into right recursion
  - Need a general algorithm to handle indirect left recursion
- Need to build FIRST, FOLLOW, and FIRST+ sets
- Emit a routine for each non-terminal
  - Nest of if-then-else statements to check alternate rhs's
  - Each returns true on success and throws an error on false
  - Simple, working (perhaps ugly) code
- Could automatically construct a recursive-descent parser
  (I don't know of a system that does this)

Can we do better?

Implementing a Recursive Descent Parser

A nest of if-then-else statements may be slow
- A good case statement would be an improvement (Python?)
  - See EaC2e, § 7.8.3
  - Encode with computation rather than repeated branches
  - Order the cases by expected frequency, to drop average cost

What about encoding the decisions in a table?
- Replace if-then-else or case statement with an address computation
  (Branches are slow and disruptive)
- Interpret the table with a skeleton parser, as we did in scanning

Building Table-Driven Top-down Parsers

Strategy
- Encode knowledge in a table
- Use a standard "skeleton" parser to interpret the table

Example
- The non-terminal Factor has 3 expansions
  - ( Expr ) or Identifier or Number
  - In the right-recursive expression grammar these are rule 9 (Factor → ( Expr )), rule 10 (Factor → number), and rule 11 (Factor → identifier)
- Table might look like:

                               Terminal Symbols
    Non-terminal Symbols   +    -    *    /    Id.  Num.  (    )    EOF
    Factor                 --   --   --   --   11   10    9    --   --

  - Cannot expand Factor into an operator: error (the "--" entries)
  - Expand Factor by rule 10 with input "number"
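
The Factor row can be encoded as a lookup rather than a chain of branches. The sketch below is illustrative only: it assumes terminals are spelled "(", "num", and "id", and `FACTOR_ROW` and `expand_factor` are hypothetical names. A missing key plays the role of an error entry.

```python
# One row of the parse table: lookahead word -> rule number.
# Rule numbers follow the slide: 9 Factor -> ( Expr ), 10 -> number, 11 -> identifier.
FACTOR_ROW = {"(": 9, "num": 10, "id": 11}


def expand_factor(word):
    """Return the rule used to expand Factor on lookahead `word`, or None for an error entry."""
    return FACTOR_ROW.get(word)
```

With this encoding, the decision for Factor is a single dictionary lookup: "num" selects rule 10, and an operator such as "+" yields None, which the caller treats as a syntax error.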

Building Top-down Parsers

Building the complete table
- Need a row for every NT & a column for every T

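LL(1) Table for the Expression Grammar

              +    -    *    /    id   num  (    )    EOF
    Goal      --   --   --   --   0    0    0    --   --
    Expr      --   --   --   --   1    1    1    --   --
    Expr'     2    3    --   --   --   --   --   4    4
    Term      --   --   --   --   5    5    5    --   --
    Term'     8    8    6    7    --   --   --   8    8
    Factor    --   --   --   --   11   10   9    --   --     (row we built earlier)

Entries are rule numbers from the right-recursive expression grammar; "--" marks an error entry.

Table differs from Figure 3.11 on page 112 in EaC2e because the order of terminals (columns) is different.
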
Building Top-down Parsers

Building the complete table
- Need a row for every NT & a column for every T
- Need an interpreter for the table (skeleton parser)

LL(1) Skeleton Parser

    word ← NextWord( )                          // Initial conditions, including
    push EOF onto Stack                         // a stack to track local goals
    push the start symbol, S, onto Stack
    TOS ← top of Stack
    loop forever
        if TOS = EOF and word = EOF then
            break & report success              // exit on success
        else if TOS is a terminal then
            if TOS matches word then
                pop Stack                       // recognized TOS
                word ← NextWord( )
            else report error looking for TOS   // error exit
        else                                    // TOS is a non-terminal
            if TABLE[TOS,word] is A → B1B2…Bk then
                pop Stack                       // get rid of A
                push Bk, Bk-1, …, B1            // in that order
            else break & report error expanding TOS
        TOS ← top of Stack
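
The skeleton parser can be sketched in Python as below. This is one possible rendering under stated assumptions: the token spellings ("num", "id", "eof"), the names `RULES`, `TABLE`, and `ll1_parse`, and the choice to return the list of applied rule numbers are all illustrative. The table entries are the ones implied by the right-recursive expression grammar, with missing keys standing for error entries.

```python
# Productions of the right-recursive expression grammar, numbered as on the slides.
RULES = {
    0: ("Goal", ("Expr",)),
    1: ("Expr", ("Term", "Expr'")),
    2: ("Expr'", ("+", "Term", "Expr'")),
    3: ("Expr'", ("-", "Term", "Expr'")),
    4: ("Expr'", ()),                      # epsilon
    5: ("Term", ("Factor", "Term'")),
    6: ("Term'", ("*", "Factor", "Term'")),
    7: ("Term'", ("/", "Factor", "Term'")),
    8: ("Term'", ()),                      # epsilon
    9: ("Factor", ("(", "Expr", ")")),
    10: ("Factor", ("num",)),
    11: ("Factor", ("id",)),
}
NONTERMINALS = {lhs for lhs, _ in RULES.values()}

# TABLE[non-terminal][word] -> rule number; a missing key is an error entry.
TABLE = {
    "Goal": {"id": 0, "num": 0, "(": 0},
    "Expr": {"id": 1, "num": 1, "(": 1},
    "Expr'": {"+": 2, "-": 3, ")": 4, "eof": 4},
    "Term": {"id": 5, "num": 5, "(": 5},
    "Term'": {"+": 8, "-": 8, "*": 6, "/": 7, ")": 8, "eof": 8},
    "Factor": {"(": 9, "num": 10, "id": 11},
}


def ll1_parse(tokens, start="Goal"):
    """Interpret TABLE over `tokens`; return the rule numbers in the order applied."""
    words = iter(list(tokens) + ["eof"])
    word = next(words)
    stack = ["eof", start]                 # the stack tracks local goals
    applied = []
    while True:
        tos = stack[-1]
        if tos == "eof" and word == "eof":
            return applied                 # exit on success
        if tos not in NONTERMINALS:        # TOS is a terminal
            if tos != word:
                raise SyntaxError(f"looking for {tos!r}, found {word!r}")
            stack.pop()                    # recognized TOS
            word = next(words)
        else:                              # TOS is a non-terminal
            rule = TABLE[tos].get(word)
            if rule is None:
                raise SyntaxError(f"cannot expand {tos} on {word!r}")
            stack.pop()                    # get rid of A
            stack.extend(reversed(RULES[rule][1]))  # push Bk ... B1
            applied.append(rule)
```

Calling `ll1_parse(["id", "-", "num", "*", "id"])` returns the rule numbers of a leftmost derivation of x - 2 * y; an input such as `["id", "+"]` raises `SyntaxError` when Term cannot be expanded on eof.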

Building Top-down Parsers

Building the complete table
- Need a row for every NT & a column for every T
- Need a table-driven interpreter for the table
- Need an algorithm to build the table

Filling in TABLE[X,y], X ∈ NT, y ∈ T
1. entry is the rule X → β, if y ∈ FIRST+(X → β)
2. entry is error if rule 1 does not define it

If any entry has more than one rule, G is not LL(1)
- Incrementally tests the LL(1) criterion on each NT
- An efficient way to determine if a grammar is LL(1)

This algorithm is the LL(1) table construction algorithm.

In Lab 2, you will build a recursive descent parser for a modified form of BNF and build LL(1) tables for the grammars that are LL(1).
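
The two filling rules can be sketched in Python as follows. This is a minimal illustration rather than the Lab 2 solution: it assumes the grammar is given as a list of (lhs, rhs) pairs, computes FIRST and FOLLOW with a simple fixed-point loop, and drops ε from FIRST+ because ε never appears as a lookahead word. The names `build_ll1_table`, `first_of_string`, `EPS`, and `EOF` are hypothetical.

```python
EPS = "ε"
EOF = "eof"

GRAMMAR = [  # (lhs, rhs) pairs; list index = rule number on the slides
    ("Goal", ["Expr"]),
    ("Expr", ["Term", "Expr'"]),
    ("Expr'", ["+", "Term", "Expr'"]),
    ("Expr'", ["-", "Term", "Expr'"]),
    ("Expr'", []),
    ("Term", ["Factor", "Term'"]),
    ("Term'", ["*", "Factor", "Term'"]),
    ("Term'", ["/", "Factor", "Term'"]),
    ("Term'", []),
    ("Factor", ["(", "Expr", ")"]),
    ("Factor", ["num"]),
    ("Factor", ["id"]),
]


def first_of_string(symbols, first):
    """FIRST of a symbol string; includes EPS if every symbol can derive ε."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)
    return result


def build_ll1_table(grammar, start="Goal"):
    nts = {lhs for lhs, _ in grammar}
    terminals = {s for _, rhs in grammar for s in rhs if s not in nts}
    first = {t: {t} for t in terminals | {EOF}}
    first.update({nt: set() for nt in nts})
    follow = {nt: set() for nt in nts}
    follow[start].add(EOF)

    changed = True
    while changed:                          # iterate FIRST and FOLLOW to a fixed point
        changed = False
        for lhs, rhs in grammar:
            new_first = first_of_string(rhs, first)
            if not new_first <= first[lhs]:
                first[lhs] |= new_first
                changed = True
            trailer = set(follow[lhs])      # what can follow the symbol being visited
            for sym in reversed(rhs):
                if sym in nts:
                    if not trailer <= follow[sym]:
                        follow[sym] |= trailer
                        changed = True
                    if EPS in first[sym]:
                        trailer = trailer | (first[sym] - {EPS})
                    else:
                        trailer = set(first[sym])
                else:
                    trailer = {sym}

    table = {nt: {} for nt in nts}
    for number, (lhs, rhs) in enumerate(grammar):
        first_plus = first_of_string(rhs, first)
        if EPS in first_plus:               # FIRST+ adds FOLLOW(lhs) when rhs can derive ε
            first_plus = (first_plus - {EPS}) | follow[lhs]
        for word in first_plus:
            if word in table[lhs]:          # two rules in one entry: not LL(1)
                raise ValueError(f"not LL(1): conflict at TABLE[{lhs}, {word}]")
            table[lhs][word] = number
    return table
```

Under these assumptions, `build_ll1_table(GRAMMAR)["Factor"]` yields the row built by hand earlier, and a left-recursive grammar such as `[("E", ["E", "+", "id"]), ("E", ["id"])]` raises `ValueError`, which is the incremental LL(1) test in action.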

Recap of Top-down Parsing

- Top-down parsers build syntax tree from root to leaves
- Left-recursion causes non-termination in top-down parsers
  - Transformation to eliminate left recursion
  - Transformation to eliminate common prefixes in right recursion
- FIRST, FIRST+, & FOLLOW sets + LL(1) condition
  - LL(1) uses left-to-right scan of the input, leftmost derivation of the sentence, and 1 word lookahead
  - LL(1) condition means grammar works for predictive parsing
- Given an LL(1) grammar, we can
  - Build a recursive descent parser
  - Build a table-driven LL(1) parser
- LL(1) parser doesn't build the parse tree
  - Keeps lower fringe of partially complete tree on the stack

Parsing Techniques

Top-down parsers (LL(1), recursive descent)
- Start at the root of the parse tree and grow toward leaves
- Pick a production & try to match the input
- Bad "pick" ⇒ may need to backtrack
- Some grammars are backtrack-free

Bottom-up parsers (LR(1), operator precedence)
- Start at the leaves and grow toward root
- As input is consumed, encode possibilities in an internal state
- Start in a state valid for legal first tokens
- We can make the process deterministic

[Figure: parse tree for x + 2 * y, with leaves <id,x>, <num,2>, <id,y>]

Bottom-up parsers can recognize a strictly larger class of grammars than can top-down parsers.

Bottom-up Parsing (definitions)

The point of parsing is to construct a derivation

A derivation consists of a series of rewrite steps

    S ⇒ γ0 ⇒ γ1 ⇒ γ2 ⇒ … ⇒ γn-1 ⇒ γn ⇒ sentence

- Each γi is a sentential form
  - If γ contains only terminal symbols, γ is a sentence in L(G)
  - If γ contains 1 or more non-terminals, γ is a sentential form
- To get γi from γi-1, expand some NT A ∈ γi-1 by using A → β
  - Replace the occurrence of A ∈ γi-1 with β to get γi
  - In a leftmost derivation, it would be the first NT A ∈ γi-1

A left-sentential form occurs in a leftmost derivation
A right-sentential form occurs in a rightmost derivation

Bottom-up parsers build a rightmost derivation in reverse

Bottom-up Parsing (definitions)

A bottom-up parser builds a derivation by working from the input sentence back toward the start symbol S

    S ⇒ γ0 ⇒ γ1 ⇒ γ2 ⇒ … ⇒ γn-1 ⇒ γn ⇒ sentence
    (bottom-up: from the sentence on the right back toward S)

To reduce γi to γi-1, match some rhs β against γi, then replace β with its corresponding lhs, A (assuming the reduction is A → β).

In terms of the parse tree, it works from leaves to root
- Nodes with no parent in a partial tree form its upper fringe
- Since each replacement of β with A shrinks the upper fringe, we call it a reduction.
- "Rightmost derivation in reverse" processes words left to right

The parse tree need not be built, it can be simulated

    |parse tree nodes| = |terminal symbols| + |reductions|

"Shrinks the upper fringe" implies that the terminals are all instantiated, at least implicitly.

Finding Reductions

Consider the grammar

    0   Goal → a A B e
    1   A    → A b c
    2        | b
    3   B    → d

And the input string abbcde

    Sentential Form     Next Reduction
                        Prod'n    Pos'n
    abbcde              2         2
    a A bcde            1         4
    a A de              3         3
    a A B e             0         4
    Goal                --        --

The trick is scanning the input and finding the next reduction.
The mechanism for doing this must be efficient.

The reductions are obvious from the derivation. Of course, building the derivation is not a practical way to find it.

"Position" specifies where the right end of β occurs in the current sentential form.

While the process of finding the next reduction appears to be almost oracular, it can be automated in an efficient way for a large class of grammars.
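
To make the (production, position) pairs concrete, the sketch below replays a sequence of reductions on abbcde. It assumes the grammar Goal → a A B e, A → A b c | b, B → d, with rules numbered 0 to 3 in that order, and positions counted from 1 at the right end of β. The function names `reduce_at` and `replay` are hypothetical, and the list of pairs is supplied by hand; finding those pairs automatically is the hard part that the handle-finding machinery addresses.

```python
RULES = {
    0: ("Goal", ["a", "A", "B", "e"]),
    1: ("A", ["A", "b", "c"]),
    2: ("A", ["b"]),
    3: ("B", ["d"]),
}


def reduce_at(form, rule, k):
    """Replace the rhs of `rule` whose rightmost symbol sits at 1-based position k."""
    lhs, rhs = RULES[rule]
    start = k - len(rhs)
    if start < 0 or form[start:k] != rhs:
        raise ValueError(f"rule {rule} does not match at position {k} in {form}")
    return form[:start] + [lhs] + form[k:]


def replay(sentence, reductions):
    """Apply each (rule, position) pair in turn; return every sentential form seen."""
    forms = [list(sentence)]
    for rule, k in reductions:
        forms.append(reduce_at(forms[-1], rule, k))
    return forms


# Hand-supplied reductions for abbcde, as (production, position) pairs.
forms = replay("abbcde", [(2, 2), (1, 4), (3, 3), (0, 4)])
for form in forms:
    print(" ".join(form))
```

Each printed line is one sentential form of the rightmost derivation, read in reverse, ending with Goal.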

Finding Reductions (Handles)

The parser finds a substring β of the tree's frontier that derives from expansion by A → β in the previous step in the rightmost derivation

Informally, we call this substring β a handle

Formally,
- A handle of a right-sentential form γ is a pair <A→β,k> where A→β ∈ P and k is the position in γ of β's rightmost symbol.
- If <A→β,k> is a handle, then replacing β at k with A produces the right-sentential form from which γ is derived in the rightmost derivation.

Because γ is a right-sentential form, the substring to the right of a handle contains only terminal symbols
⇒ the parser doesn't need to scan (much) past the handle

Handles are the most mystifying aspect of bottom-up, shift-reduce parsers. It usually takes a couple of lectures.

Using Handles: a Bottom-up Parser

As with the top-down parser, we will introduce a stack to hold the upper fringe of the partially completed parse tree.

A simple shift-reduce parser:

    push INVALID
    word ← NextWord( )
    repeat until (top of stack = Goal and word = EOF)
        if the top of the stack is a handle A → β
            then // reduce β to A
                pop |β| symbols off the stack
                push A onto the stack
        else if (word ≠ EOF)
            then // shift
                push word
                word ← NextWord( )
        else // need to shift, but out of input
            report an error

This parser is sometimes called a handle-pruning parser.

What happens on an error?
- Parser fails to find a handle
- Thus, it keeps shifting
- Eventually, it consumes all input

This parser reads all input before reporting an error, not a desirable property.
To fix this issue, the parser must recognize the failure to find a handle earlier.
To make shift-reduce parsers practical, we need good error localization in the handle-finding process.
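
The shift-reduce loop can be sketched in Python with the handle finder passed in as a function. Everything here is illustrative: `toy_oracle` is a hand-written handle finder that only works for the small grammar Goal → a A B e, A → A b c | b, B → d (assumed rule numbers 0 to 3), and it stands in for the systematic handle-finding mechanism that the lecture has not yet introduced. `shift_reduce` and the action strings it returns are hypothetical names.

```python
RULES = {
    0: ("Goal", ["a", "A", "B", "e"]),
    1: ("A", ["A", "b", "c"]),
    2: ("A", ["b"]),
    3: ("B", ["d"]),
}


def toy_oracle(stack):
    """Hand-written handle finder for this one grammar; returns a rule number or None."""
    if stack[-4:] == ["a", "A", "B", "e"]:
        return 0
    if stack[-3:] == ["A", "b", "c"]:
        return 1
    if stack[-2:] == ["a", "b"]:   # a 'b' is a handle for A -> b only right after 'a'
        return 2
    if stack[-1:] == ["d"]:
        return 3
    return None


def shift_reduce(tokens, find_handle=toy_oracle):
    """Run the simple shift-reduce loop; return the list of actions taken."""
    words = iter(list(tokens) + ["EOF"])
    stack = ["INVALID"]
    word = next(words)
    actions = []
    while not (stack[-1] == "Goal" and word == "EOF"):
        rule = find_handle(stack)
        if rule is not None:                     # reduce beta to A
            lhs, rhs = RULES[rule]
            del stack[len(stack) - len(rhs):]    # pop |beta| symbols
            stack.append(lhs)
            actions.append(f"reduce {rule}")
        elif word != "EOF":                      # shift
            stack.append(word)
            actions.append(f"shift {word}")
            word = next(words)
        else:                                    # need to shift, but out of input
            raise SyntaxError("out of input without finding a handle")
    return actions
```

On "abbcde" the loop alternates shifts and reductions until Goal is on top of the stack. On a bad input such as "abce" it shifts every remaining word before raising the error, which illustrates the poor error localization described above.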

Example

A simple left-recursive form of the classic expression grammar

    0   Goal   → Expr
    1   Expr   → Expr + Term
    2          | Expr - Term
    3          | Term
    4   Term   → Term * Factor
    5          | Term / Factor
    6          | Factor
    7   Factor → ( Expr )
    8          | number
    9          | id

Bottom-up parsers work with either left-recursive or right-recursive grammars.

The obvious left-recursive grammar is left associative. I prefer the obvious left-recursive grammar because its associativity matches the standard rules that we were all taught as children.

The examples will use the left-recursive, left-associative grammar.

Example

Grammar: the simple left-recursive form of the classic expression grammar (rules 0-9, Goal → Expr through Factor → id)

Rightmost derivation of x - 2 * y

    Prod'n   Sentential Form
    --       Goal
    0        Expr
    2        Expr - Term
    4        Expr - Term * Factor
    9        Expr - Term * <id,y>
    6        Expr - Factor * <id,y>
    8        Expr - <num,2> * <id,y>
    3        Term - <num,2> * <id,y>
    6        Factor - <num,2> * <id,y>
    9        <id,x> - <num,2> * <id,y>

Reading down the table gives the derivation.

Example

Grammar: the simple left-recursive form of the classic expression grammar (rules 0-9, Goal → Expr through Factor → id)

Handles for rightmost derivation of x - 2 * y

    Prod'n   Sentential Form                Handle <prod'n, pos'n>
    --       Goal                           --
    0        Expr                           0,1
    2        Expr - Term                    2,3
    4        Expr - Term * Factor           4,5
    9        Expr - Term * <id,y>           9,5
    6        Expr - Factor * <id,y>         6,3
    8        Expr - <num,2> * <id,y>        8,3
    3        Term - <num,2> * <id,y>        3,1
    6        Factor - <num,2> * <id,y>      6,1
    9        <id,x> - <num,2> * <id,y>      9,1

Reading up the table, from the last row to the first, gives the parse.

Handles

At this point, handles appear mysterious
- Don't Panic: handles are mysterious
- Next lecture will focus on handles
  (If it were easy, it would not have taken Knuth to invent it!)

Handles can be discovered in an easy & systematic way
- It just takes another lecture or so to get to that point

If we had a handle-generating oracle, bottom-up parsing would be easy
- We will show how to derive that oracle
- As you might guess, the answer lies in practical application of material from COMP 481

Next Class
- Handles, handles, and more handles