Principles of
Compiler Design
19ECSC203
Chapter 04 - Top Down Parsing
PoCD Team
School of Computer Science & Engineering
2020 - 21
Chapter 4 Contents
Top Down Parsing
Eliminating Left Recursion
Left Factoring
FIRST and FOLLOW sets
LL(1) Parsing
Error recovery in Top Down Parsing

“Mind parser never knows a rule. May be that’s why we end up with ambiguous decisions!”

Eliminating Left Recursion
A production has left recursion if it has the form:
A → Aα
Top down parsing methods cannot handle left recursive grammars. Hence we eliminate them.
Example:
1. A → Aα | β
can be converted to
A → β A’
A’ → α A’ | Є
2. E → E + T | T
T → T * F | F
F → (E) | id
Eliminating left recursion:
E → TE’
E’ → +TE’ | Є
T → FT’
T’ → *FT’ | Є
F → (E) | id
The above eliminates only the immediate left recursion. To handle left recursion in general we apply the following algorithm.
ALGORITHM Eliminating_Left_Recursion
INPUT: Grammar G with no cycles or Є-Productions
OUTPUT: An equivalent grammar with no left recursion
METHOD: Apply the steps below to G. Note that the resulting non-left-recursive grammar may have Є-
productions
arrange the nonterminals in some order A1, A2, …, An
for ( each i from 1 to n ) {
    for ( each j from 1 to i - 1 ) {
        replace each production of the form Ai → Ajγ by the
        productions Ai → δ1γ | δ2γ | … | δkγ, where
        Aj → δ1 | δ2 | … | δk are all current Aj-productions
    }
    eliminate the immediate left recursion among the Ai-productions
}
Applying the above algorithm to the grammar
S → Aa | b
A → Ac | Sd | Є
we have, after substituting the S-productions into A → Sd (which gives A → Ac | Aad | bd | Є) and then eliminating the immediate left recursion,
S → Aa | b
A → bdA’ | A’
A’ → cA’ | adA’ | Є
Left Factoring
Left Factoring is applied to produce a grammar suitable for predictive or top down parsing.
Consider the grammar:
stmt → if expr then stmt else stmt
     | if expr then stmt
On seeing the input “if” we cannot immediately tell which production to use, so we defer the
decision by first expanding the common prefix and deciding in a later step.
In general, A → αβ1 | αβ2 can be rewritten as
A → αA’ and
A’ → β1 | β2.
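Applying this to the stmt grammar above, with α = “if expr then stmt”, one left-factored form is:
stmt → if expr then stmt stmt’
stmt’ → else stmt | Є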
Top-Down Parsing
Top-down Parsing is
Constructing a parse tree
In Preorder
Using Depth First Search
By finding the leftmost derivation for an input string
Consider the Grammar:
E → TE’
E’ → +TE’ | Є
T → FT’
T’ → *FT’ | Є
F → (E) | id
For the string id + id * id we can construct the parse tree by applying the productions in the order given by the leftmost derivation below.
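Writing out the sentential forms, with the leftmost nonterminal expanded at each step:
E ⇒ TE’ ⇒ FT’E’ ⇒ id T’E’ ⇒ id E’ ⇒ id + TE’ ⇒ id + FT’E’ ⇒ id + id T’E’ ⇒ id + id * FT’E’ ⇒ id + id * id T’E’ ⇒ id + id * id E’ ⇒ id + id * id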
As human beings we can intuitively pick the right set of productions and construct the tree. But how would the machine do it?
Our task is to develop an algorithm which will generate the above tree by picking the right
productions from among those available.
Recursive Descent Parsing
One possible option is to go for recursive descent parsing.
Consider the grammar:
S → cAd
A → ab | a
Constructing a parse tree top down for w = cad:
Start with the root S and expand it using S → cAd. The first input symbol c matches the leftmost leaf, so advance the input and, in the next step, substitute for ‘A’.
Try the first alternative A → ab. The leaf a matches, but b does not match the remaining input d, so this attempt fails. Go back to A and check if there is any alternative.
Try A → a. Now a matches, the leaf d matches the last input symbol, and the whole input is consumed. Halt and announce the successful completion of parsing.
Implementation
Requires backtracking (not very efficient); a sketch of such a backtracking parser is given below
For efficiency, go for tabular methods such as dynamic-programming algorithms
The procedure is nondeterministic in nature
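A minimal sketch of a backtracking recursive-descent parser in C for the grammar S → cAd, A → ab | a. The function and variable names here (parse_S, parse_A, match) are illustrative choices for this sketch, not part of the notes.

#include <stdio.h>

static const char *input;                 /* the string being parsed                */
static int pos;                           /* current position in the input          */

static int match(char c) {                /* consume c if it is the next symbol     */
    if (input[pos] == c) { pos++; return 1; }
    return 0;
}

static int parse_A(void) {                /* A -> ab | a                            */
    int saved = pos;                      /* remember position so we can backtrack  */
    if (match('a') && match('b')) return 1;   /* try A -> ab                        */
    pos = saved;                          /* backtrack                              */
    return match('a');                    /* try A -> a                             */
}

static int parse_S(void) {                /* S -> cAd                               */
    return match('c') && parse_A() && match('d');
}

int main(void) {
    input = "cad"; pos = 0;
    if (parse_S() && input[pos] == '\0')
        printf("successful completion of parsing\n");
    else
        printf("syntax error\n");
    return 0;
}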
FIRST and FOLLOW
FIRST:
a. If X is a terminal, then FIRST(X) is {X}
b. If X → Є is a production, then add Є to FIRST(X)
c. If X → Y1Y2Y3…Yk, then add FIRST(Y1) (except Є) to FIRST(X). If FIRST(Y1) contains Є, then also add
FIRST(Y2) (except Є) to FIRST(X), and so on up to FIRST(Yk). If Є is in FIRST(Yi) for every i = 1, 2, …, k, then add Є to
FIRST(X)
FOLLOW:
a. Place a $ in FOLLOW(S), where S is the start symbol
b. If A → αBβ, then everything in FIRST(β) except Є is in FOLLOW(B)
c. If A → αB, or A → αBβ where FIRST(β) contains Є, then everything in FOLLOW(A) is in
FOLLOW(B)
For the grammar given below FIRST and FOLLOW will be,
E → TE’
E’ → +TE’ | Є
T → FT’
T’ → *FT’ | Є
F → (E) | id
FIRST(F) = { (, id } FOLLOW(E) = { ), $ }
FIRST(T) = { (, id } FOLLOW(E’) = { ), $ }
FIRST(E) = { (, id } FOLLOW(T) = { +, ), $ }
FIRST(E’) = { +, Є } FOLLOW(T’) = { +, ), $ }
FIRST(T’) = { *, Є } FOLLOW(F) = { *, +, ), $ }
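As a quick check of the FOLLOW rules, consider FOLLOW(T). T appears in E → TE’ and in E’ → +TE’, both times followed by β = E’, so FIRST(E’) except Є, i.e. {+}, goes into FOLLOW(T); and since FIRST(E’) contains Є, everything in FOLLOW(E) and FOLLOW(E’), i.e. {), $}, also goes into FOLLOW(T), giving FOLLOW(T) = { +, ), $ }.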
LL(1) Grammars
Predictive parsers, that is, recursive-descent parsers needing no backtracking, can be
constructed for a class of grammars called LL(1).
LL(1) stands for
L – scan input from left to right
L – producing leftmost derivations
1 – one input symbol of lookahead at each step
A grammar G is LL(1) if and only if whenever A → α | β are two distinct productions of G, the following
conditions hold:
For no terminal ‘a’ do both α and β derive strings beginning with ‘a’
At most one of α and β can derive the empty string
If β derives Є in zero or more steps, then α does not derive any string beginning with
a terminal in FOLLOW(A), similarly, if α derives Є in zero or more steps, then β does
not derive any string beginning with a terminal in FOLLOW(A)
Note: No left recursive or ambiguous grammars can be LL(1)
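As a check on the expression grammar, take the pair E’ → +TE’ | Є. No terminal begins strings derived from both alternatives (FIRST(+TE’) = { + }, and the second alternative derives only Є), only one alternative derives the empty string, and FOLLOW(E’) = { ), $ } is disjoint from FIRST(+TE’) = { + }, so all three conditions hold.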
Predictive Parsing Table
Using all above information we can design a better algorithm by constructing a Predictive
parsing table.
ALGORITHM Construct_Predictive_Parsing_Table
INPUT: Grammar G
OUTPUT: Parsing Table M
For each production A → α of the grammar, do the following:
1. For each terminal ‘a’ in FIRST(α), add A → α to M[A, a]
2. If Є is in FIRST(α), then for each terminal ‘b’ in FOLLOW(A), add A → α to M[A, b]. If Є
is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $] as well
Constructing a Predictive Parsing table for the considered grammar:
Non-terminal   Input symbol
               id          +             *             (            )           $
E              E → TE’                                 E → TE’
E’                         E’ → +TE’                                 E’ → Є      E’ → Є
T              T → FT’                                 T → FT’
T’                         T’ → Є        T’ → *FT’                   T’ → Є      T’ → Є
F              F → id                                  F → (E)
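For example, the E’ row is filled in as follows: for E’ → +TE’, FIRST(+TE’) = { + }, so the production is added to M[E’, +]; for E’ → Є, since Є is in FIRST(Є), the production is added to M[E’, b] for every b in FOLLOW(E’) = { ), $ }, i.e. to M[E’, )] and M[E’, $]. Every slot left blank is an error entry.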
Conclusions:
Each parsing table entry uniquely identifies a production or signals an error
If G is left recursive or ambiguous, then M will have at least one multiply defined entry
There are some grammars for which no amount of alteration will produce an LL(1)
grammar
Example 02: Construct a Predictive Parsing Table for the given grammar
S → iEtS | iEtSeS | a
E → b
Left factoring it,
S → iEtSS’ | a
S’ → eS | Є
E → b
SoCSE Page | 8
Principles of Compiler Design
There is no left recursion in the grammar. We would have eliminated it if present.
Write the FIRST and FOLLOW sets:
FIRST(S) = { i, a } FOLLOW(S) = { $, e }
FIRST(S’) = { e, Є } FOLLOW(S’) = { $, e }
FIRST(E) = { b } FOLLOW(E) = { t }
Build a Predictive Parsing Table:
Non-terminal   Input symbol
               a          b          e                    i                t         $
S              S → a                                      S → iEtSS’
S’                                   S’ → Є
                                     S’ → eS                                          S’ → Є
E                         E → b
Conclusion:
This grammar is ambiguous (the dangling-else ambiguity), and the ambiguity shows up as a multiply defined entry: S’ → eS is placed in M[S’, e] because e is in FIRST(eS), and S’ → Є is also placed there because e is in FOLLOW(S’). Hence the grammar is not LL(1).
Now, using all of the above information, we parse given strings to check for acceptance or
rejection. We call the algorithm “Non-recursive Predictive Parsing.”
Non- Recursive Predictive Parsing
Maintains a stack explicitly rather than implicitly via recursive calls
Mimics leftmost derivation
We define a configuration as the stack contents plus the remaining input
If w is the input matched so far, then the stack holds a sequence of grammar symbols α
such that S derives wα in zero or more steps using a leftmost derivation
Consider the same grammar:
E → TE’
E’ → +TE’ | Є
T → FT’
T’ → *FT’ | Є
F → (E) | id
and the string id + id * id. Tracing the non-recursive predictive parse, we have
Matched           Stack          Input              Action
                  E$             id + id * id$
                  TE’$           id + id * id$      Output E → TE’
                  FT’E’$         id + id * id$      Output T → FT’
                  id T’E’$       id + id * id$      Output F → id
id                T’E’$          + id * id$         Match id
id                E’$            + id * id$         Output T’ → Є
id                +TE’$          + id * id$         Output E’ → +TE’
id +              TE’$           id * id$           Match +
id +              FT’E’$         id * id$           Output T → FT’
id +              id T’E’$       id * id$           Output F → id
id + id           T’E’$          * id$              Match id
id + id           *FT’E’$        * id$              Output T’ → *FT’
id + id *         FT’E’$         id$                Match *
id + id *         id T’E’$       id$                Output F → id
id + id * id      T’E’$          $                  Match id
id + id * id      E’$            $                  Output T’ → Є
id + id * id      $              $                  Output E’ → Є
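The trace above can be mechanized directly. Below is a minimal C sketch of the non-recursive predictive parser for this grammar; the single-character encoding (e for E’, t for T’, i for id) and the function names table and parse are assumptions made for this sketch, not part of the notes.

#include <stdio.h>
#include <string.h>

/* Nonterminals: E, e = E', T, t = T', F.  Terminals: i = id, +, *, (, ), $.      */
/* table() returns the right-hand side for M[nt, term], "" for an Є-production,   */
/* and NULL for an empty (error) entry.                                           */
static const char *table(char nt, char term) {
    switch (nt) {
    case 'E': if (term == 'i' || term == '(') return "Te";   break;  /* E  -> TE'  */
    case 'e': if (term == '+') return "+Te";                         /* E' -> +TE' */
              if (term == ')' || term == '$') return "";     break;  /* E' -> Є    */
    case 'T': if (term == 'i' || term == '(') return "Ft";   break;  /* T  -> FT'  */
    case 't': if (term == '*') return "*Ft";                         /* T' -> *FT' */
              if (term == '+' || term == ')' || term == '$') return "";  /* T'->Є  */
              break;
    case 'F': if (term == 'i') return "i";                           /* F -> id    */
              if (term == '(') return "(E)";                 break;  /* F -> (E)   */
    }
    return NULL;
}

static int parse(const char *w) {              /* w must end with '$'              */
    char stack[128];
    int top = 0;
    stack[top++] = '$';                        /* bottom marker                    */
    stack[top++] = 'E';                        /* start symbol on top              */
    const char *ip = w;
    while (top > 0) {
        char X = stack[--top];                 /* pop the top of the stack         */
        if (X == *ip) { ip++; continue; }      /* match a terminal (or $)          */
        if (!strchr("EeTtF", X)) return 0;     /* terminal mismatch: error         */
        const char *rhs = table(X, *ip);
        if (rhs == NULL) return 0;             /* empty table slot: error          */
        for (int k = (int)strlen(rhs) - 1; k >= 0; k--)
            stack[top++] = rhs[k];             /* push the RHS in reverse          */
    }
    return *ip == '\0';                        /* accept if all input consumed     */
}

int main(void) {
    printf("%d\n", parse("i+i*i$"));           /* 1: id + id * id is accepted      */
    printf("%d\n", parse("i+*i$"));            /* 0: id + * id is rejected         */
    return 0;
}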
Error recovery in Top Down Parsing
Error Recovery Strategies
Panic mode: Discard input until a token in a set of designated synchronizing tokens is
found
Phrase-level recovery: Perform local correction on the input to repair the error
Error productions: Augment grammar with productions for erroneous constructs
Global correction: Choose a minimal sequence of changes to obtain a global least-
cost correction
Error Recovery in LL Parsing
Simple option: when an error is seen, print a message and halt
“Real” error recovery
o Insert the “expected” token and continue – can have a problem with termination
o Delete tokens – for an error at a nonterminal F, keep deleting tokens until a token
in FOLLOW(F) is seen.
o For example:
E() {                                              /* E -> T E'                         */
    if (lookahead == '(' || lookahead == ID) {     /* lookahead in FIRST(E) = { (, id } */
        T(); E_prime();
    } else {
        printf("E expecting ( or identifier");
        /* delete tokens until one in FOLLOW(E) = { ), $ } is seen */
        while (lookahead != ')' && lookahead != '$')
            lookahead = nextToken();
    }
}
An error is detected whenever an empty table slot is encountered.
We would like our parser to be able to recover from an error and continue parsing.
Phrase-level recovery
o We associate each empty slot with an error handling procedure.
Panic mode recovery
o Modify the stack and/or the input string to try to reach a state from which we
can continue.
Panic mode recovery
Idea:
o Decide on a set of synchronizing tokens.
o When an error is found and there's a nonterminal at the top of the stack,
discard input tokens until a synchronizing token is found.
o Synchronizing tokens are chosen so that the parser can recover quickly after
one is found
e.g. a semicolon when parsing statements.
o If there is a terminal at the top of the stack, we could try popping it to see
whether we can continue.
Assume that the input string is actually missing that terminal.
Possible synchronizing tokens for a nonterminal A (a worked example follows below)
o the tokens in FOLLOW(A)
When one is found, pop A off the stack and try to continue
o the tokens in FIRST(A)
When one is found, match it and try to continue
o tokens such as semicolons that terminate statements
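As a sketch, take the expression grammar with FOLLOW(F) = { *, +, ), $ } as the synchronizing set for F, and the erroneous input id * + id. The parse proceeds normally until the stack is FT’E’$ with + as the next input symbol; M[F, +] is empty, so an error is reported. Since + is already a synchronizing token for F, no input is skipped: F is popped, parsing continues with T’ → Є and then E’ → +TE’, and the rest of the input is parsed normally, so the parser recovers after reporting a single error.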
~*~*~*~*~*~*~*~*~*~*~*~