
Lecture 05

The document discusses top-down parsing techniques, specifically recursive descent and predictive parsing (LL(1)). It covers the construction of parse trees, issues such as ambiguity and left recursion, and methods for eliminating left recursion. Additionally, it explains the computation of First and Follow sets, and the construction of parsing tables for predictive parsing.

Uploaded by

nihafahima9

Top-Down Parsing

Recursive parsing
Predictive Parsing
LL(1) Parsing

1
Review of Parsing

• Given a language L(G), a parser consumes a sequence of tokens s and produces a parse tree
• Issues:
– How do we recognize that s ∈ L(G)?
– A parse tree of s describes how s ∈ L(G)
– Ambiguity: more than one parse tree (possible interpretation) for some string s
– Error: no parse tree for some string s
– How do we construct the parse tree?

2
3
Second-Half of Lecture 5: Outline

• Implementation of parsers
• Two approaches
– Top-down
– Bottom-up
• Today: Top-Down
– Easier to understand and program manually
• Then: Bottom-Up
– More powerful and used by most parser generators

4
Top-down Parsing

•Top-down parsing starts from the root of the parse tree and builds it by recursively expanding production rules to match the input string.
•It tries to find a leftmost derivation for a given input string.
•The goal is to derive the input string starting from the start symbol.

5
Top-down Parsing

Grammar: S → aAb, A → cd | c
Input string: acb

6
Methods of Top-Down Parsing

•Recursive Descent Parsing:
• Utilizes recursive procedures for each non-terminal in the grammar.
•Non-Recursive Predictive Parsing (LL Parsing):
• Uses a parsing table to predict which production rule to use.

7
Recursive Descent Parsing:

•Top-down parsing technique: Begins parsing from the start symbol and attempts to match the input with grammar rules.
•Recursive functions: Each non-terminal in the grammar has a corresponding recursive function.
•Leftmost derivation: Constructs the leftmost derivation of the input string.
•Backtracking: May require backtracking if a production rule fails.
•Simple but inefficient: Easy to implement, but it cannot handle left-recursive grammars directly and may be slow due to backtracking.

8
Introduction to Top-Down Parsing

• Terminals are seen in order of appearance in the token stream: t2 t5 t6 t8 t9
• The parse tree is constructed
– From the top
– From left to right

[Figure: parse tree whose nodes are numbered in the order they are built, with leaves t2, t5, t6, t8, t9]

9
Recursive Descent Parsing

• Consider the grammar
E → T + E | T
T → int | int * T | ( E )
• Token stream is: int5 * int2
• Start with top-level non-terminal E
• Try the rules for E in order

10
Recursive Descent Parsing - Example

■ Try E0 → T1 + E2
■ Then try a rule for T1: T1 → ( E3 )
■ But ( does not match input token int5
■ Try T1 → int. Token matches.
■ But + after T1 does not match input token *
■ Try T1 → int * T2
■ This will match, but + after T1 will be unmatched
■ The choices for T1 are exhausted
■ Backtrack to the choice for E0
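
The backtracking search traced above can be sketched in Python. This is a minimal illustration, not the lecture's own code: the function names (parse_E, parse_T, accepts) and the token spelling "int" are assumptions. Each function is a generator that yields every input position at which its non-terminal could end, so asking the generator for its next value plays the role of backtracking. The alternatives for T are tried in the order the grammar lists them, which differs slightly from the order of the trace.

```python
def parse_E(tokens, pos):
    """Yield every position where an E starting at pos can end."""
    # E -> T + E
    for after_t in parse_T(tokens, pos):
        if after_t < len(tokens) and tokens[after_t] == "+":
            yield from parse_E(tokens, after_t + 1)
    # E -> T
    yield from parse_T(tokens, pos)


def parse_T(tokens, pos):
    """Yield every position where a T starting at pos can end."""
    if pos < len(tokens) and tokens[pos] == "int":
        # T -> int
        yield pos + 1
        # T -> int * T
        if pos + 1 < len(tokens) and tokens[pos + 1] == "*":
            yield from parse_T(tokens, pos + 2)
    # T -> ( E )
    if pos < len(tokens) and tokens[pos] == "(":
        for after_e in parse_E(tokens, pos + 1):
            if after_e < len(tokens) and tokens[after_e] == ")":
                yield after_e + 1


def accepts(tokens):
    """True if some way of parsing E consumes the whole token list."""
    return any(end == len(tokens) for end in parse_E(tokens, 0))


print(accepts(["int", "*", "int"]))
```

For int * int, the alternative E → T + E fails for every choice of T, and the parser only succeeds after falling back to E → T, as in the trace.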

12
Left Recursion

A grammar G = (V, T, P, S) is left recursive if it has a production of the form

A → A α | β

where A is a non-terminal and α and β are sequences of terminals/non-terminals, with β not starting with A. A top-down parser that expands A with this production may loop forever.

The grammar is left recursive because the non-terminal on the left of the production occurs at the first position on the right side of the production. Left recursion can be eliminated by replacing this pair of productions with

A → β A'
A' → α A' | ε

Both versions generate the same strings: β followed by zero or more α's, written βα*.

13
Consider the grammar:

E→E+T|T
T→T*F|F
F → (E) | id
Here, E → E + T and T → T * F are left-recursive.

Problems with Left Recursion:
•Infinite Recursion: A top-down parser expanding E → E + T calls the procedure for E again without consuming any input, leading to an infinite loop.
•Parsing Difficulty: Recursive descent parsers cannot handle left recursion directly; the grammar must be rewritten first.

14
Elimination of Left Recursion

• Consider the left-recursive grammar
S → S α | β
• S generates all strings starting with a β and followed by any number of α’s
• The grammar can be rewritten using right-recursion
S → β S’
S’ → α S’ | ε

16
More Elimination of Left-Recursion

• In general
S → S α1 | … | S αn | β1 | … | βm
• All strings derived from S start with one of
β1,…,βm and continue with several instances of
α1,…,αn
• Rewrite as
S → β1 S’ | … | βm S’
S’ → α1 S’ | … | αn S’ | ε
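
The general rewrite can be sketched as a small Python function. This is an illustration under stated assumptions: a production body is represented as a list of symbols, the empty list stands for ε, and the new non-terminal is named by appending an apostrophe (the function name and this representation are not from the lecture). It handles only immediate left recursion, as on the slide.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite S -> S a1 | ... | S an | b1 | ... | bm without left recursion.

    alternatives is a list of production bodies; each body is a list of
    symbols, and [] stands for epsilon.
    """
    # The alpha parts: bodies that start with the non-terminal itself.
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # The beta parts: every other body.
    betas = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not alphas:
        return {nonterminal: alternatives}  # nothing to do
    new_nt = nonterminal + "'"  # assumed to be unused in the grammar
    return {
        nonterminal: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],
    }


print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```

Applied to E → E + T | T, the result corresponds to E → T E' and E' → + T E' | ε.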

17
Example1 −

In this grammar, the non-terminal E refers to itself as the leftmost


symbol in its own production rule, making it left-recursive.
Derivation Example
For an input string id + id * id, the derivation might look like
this:

18
The general form for left recursion is
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
which can be replaced by
A → β1A′ | β2A′ | … | βnA′
A′ → α1A′ | α2A′ | … | αmA′ | ε

Left Recursive Grammar
E → E + T | T
T → T * F | F
F → (E) | id

Final Grammar
After eliminating immediate left recursion, the grammar is:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → (E) | id

19
A grammar is right recursive if a non-terminal appears at the rightmost end of one of its own productions.

A → αA | β

•A is a non-terminal.
•α is a string of terminals and non-terminals.
•β is a string of terminals and non-terminals that does not end with A.

20
Example
S → aS | b

The derivation of aaab expands the rightmost (and only) non-terminal at each step, so the parse tree grows down its right side.

S ⇒ aS
⇒ aaS
⇒ aaaS
⇒ aaab

21
Consider the following right recursive grammar
for an input string id + id * id,
the derivation might look like this:

22
First() Set
The First() set of a non-terminal in a grammar is the set of terminal symbols that can appear at the beginning of some string derived from that non-terminal, plus ε if the non-terminal can derive the empty string. In simpler terms, it tells you which terminals could possibly appear first when expanding a non-terminal.
Rules to Compute First():
•If X is a terminal:
•First(X) = { X }
•If X is a non-terminal and there is a production rule X → Y₁Y₂...Yn:
•Add First(Y₁) (excluding ε) to First(X).
•If Y₁ can derive ε (i.e., Y₁ → ε or Y₁ is nullable), then also add First(Y₂) (excluding ε), and so on. If all of Y₁, Y₂, ... Yn can derive ε, then add ε to First(X).
•If X can derive ε directly or indirectly (X → ε):
•Add ε to First(X).
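
The rules above can be sketched as a fixed-point computation in Python. This is a minimal sketch under stated assumptions: the grammar is a dict from each non-terminal to a list of production bodies, a body is a list of symbols, [] stands for ε, and any symbol that is not a key of the dict is treated as a terminal. The names compute_first and first_of_sequence are illustrative.

```python
EPS = "ε"


def first_of_sequence(symbols, first):
    """First() of a sequence Y1 Y2 ... Yn, given First() of the non-terminals."""
    result = set()
    for sym in symbols:
        if sym not in first:           # terminal: First(sym) = {sym}
            result.add(sym)
            return result
        result |= first[sym] - {EPS}   # add First(Yi) without ε
        if EPS not in first[sym]:      # Yi is not nullable: stop here
            return result
    result.add(EPS)                    # every Yi was nullable (or n = 0)
    return result


def compute_first(grammar):
    """Repeat the rules until no First() set grows any more."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                before = len(first[nt])
                first[nt] |= first_of_sequence(alt, first)
                if len(first[nt]) != before:
                    changed = True
    return first


grammar = {"S": [["A", "B"]], "A": [["a"], []], "B": [["b"], []]}
print(compute_first(grammar))
```

For the grammar S → AB, A → a | ε, B → b | ε this yields First(S) = { a, b, ε }, First(A) = { a, ε } and First(B) = { b, ε }.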

23
For the grammar:

S → AB
A→a|ε
B→b|ε

First(A) = { a, ε }
First(B) = { b, ε }
First(S) starts from First(A) without ε (because S → AB and A appears first), which gives { a }.
Since A can derive ε, also add First(B) without ε, which gives { a, b }.
Since both A and B can derive ε, add ε.
So, First(S) = { a, b, ε }.

24
Production Rules of Grammar
S -> ACB | Cbb | Ba
A -> da | BC
B -> g | ε
C -> h | ε

FIRST sets
FIRST(S) = FIRST(ACB) U FIRST(Cbb) U FIRST(Ba)
= { d, g, h, b, a, ε}
FIRST(A) = { d } U FIRST(BC)
= { d, g, h, ε }
FIRST(B) = { g , ε }
FIRST(C) = { h , ε }

25
Follow() Set

The Follow() set of a non-terminal is the set of terminal symbols that can appear immediately to the right of that non-terminal in some derivation. In other words, it contains the terminals (and possibly the end marker $) that can appear after a non-terminal in a sentential form.

Steps to Compute Follow() Set:

1.Start Symbol Rule: Add $ (end of input marker) to Follow() of the start symbol.
2.Rule Type 1 (A → αBβ): For a production of the form A → αBβ, where B is a non-terminal and β is a sequence of symbols, everything in the First() of β (except ε) is added to Follow(B).
3.Rule Type 2 (A → αB): If B is the last symbol in the production (i.e., A → αB), then everything in Follow(A) is added to Follow(B).
4.Rule Type 3 (A → αBβ and ε in First(β)): If β can derive ε, then everything in Follow(A) is added to Follow(B).
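
The four rules can be sketched in Python as another fixed-point loop. This sketch assumes the First() sets are already known and passed in as a dict; the grammar representation (dict of non-terminal to list of symbol lists, [] for ε) and the name compute_follow are illustrative assumptions, not part of the lecture.

```python
EPS, END = "ε", "$"


def compute_follow(grammar, first, start):
    """Apply the Follow() rules until no set grows any more."""
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                         # start symbol rule
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in grammar:         # only non-terminals have Follow()
                        continue
                    before = len(follow[sym])
                    beta_nullable = True           # stays True if beta is empty
                    for nxt in alt[i + 1:]:
                        nxt_first = first.get(nxt, {nxt})   # terminal: {nxt}
                        follow[sym] |= nxt_first - {EPS}    # rule type 1
                        if EPS not in nxt_first:
                            beta_nullable = False
                            break
                    if beta_nullable:              # rule types 2 and 3
                        follow[sym] |= follow[lhs]
                    if len(follow[sym]) != before:
                        changed = True
    return follow


grammar = {"S": [["A", "B"]], "A": [["a"], []], "B": [["b"], []]}
first = {"S": {"a", "b", EPS}, "A": {"a", EPS}, "B": {"b", EPS}}
print(compute_follow(grammar, first, "S"))
```

For S → AB, A → a | ε, B → b | ε the result is Follow(S) = { $ }, Follow(A) = { b, $ } and Follow(B) = { $ }; the $ in Follow(A) comes from rule type 3, because B can derive ε.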

26
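Example Follow()
Consider a simple grammar:
1. S → A B
2. A → a A | ε
3. B → b B | ε
Follow() Sets:
Follow(S) = { $ }
Follow(A) = { b, $ }
Follow(B) = { $ }
Step-by-Step Computation of Follow() Set:
1.Follow(S): The start symbol S gets $ by definition.
Follow(S) = { $ }
2.Follow(A):
•From S → A B, the symbol B follows A. So, First(B) is added to Follow(A).
•First(B) contains { b, ε }, but we exclude ε, which contributes { b }.
•Because ε is in First(B), B can vanish and A can end the string derived from S, so Follow(S) is also added. So, Follow(A) = { b, $ }.
•From A → a A, A is the last symbol, so Follow(A) is added to itself, which adds nothing new.
3.Follow(B):
•From S → A B, since B is the last symbol, Follow(S) is added to Follow(B).
•From B → b B, Follow(B) is added to itself, which adds nothing new.
•Therefore, Follow(B) = { $ }.
4.Check for ε Productions:
•A → ε and B → ε have empty right-hand sides, so they contain no non-terminal whose Follow() set could grow. Their effect is indirect: B → ε puts ε in First(B), which is why $ enters Follow(A) in step 2.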
27
For the grammar:

S → AB
A→a|ε
B→b|ε
•Follow(S) = { $ } (since S is the start symbol).
•From S → AB, add Follow(S) to Follow(B), so Follow(B) = { $ }.
•From S → AB, Follow(A) includes First(B) (excluding ε), so Follow(A) = { b }.
•Since B → ε, A can be the last symbol derived from S, so we also add Follow(S) to Follow(A), so Follow(A) = { b, $ }.
Thus:
•Follow(S) = { $ }
•Follow(A) = { b, $ }
•Follow(B) = { $ }

28
Grammar:
1. S → A C B
2. A → a A | ε
3. B → b | ε
4. C → c
Follow() Sets:
Follow(S) = { $ }
Follow(A) = { c }
Follow(C) = { b, $ }
Follow(B) = { $ }
Step-by-Step Computation of Follow() Set:
1. Follow(S):
The start symbol S always has $ in its Follow set. Follow(S) = { $ }
2. Follow(A):
•From the production S → A C B, A is followed by C. So, we add First(C) to Follow(A).
•First(C) contains { c } (since C → c), so: Follow(A) = { c }
3. Follow(C):
•From the production S → A C B, C is followed by B. So, we add First(B) to Follow(C).
•First(B) contains { b, ε }. Since ε is in First(B), we also add Follow(S) to Follow(C) (because if B produces ε, C would be followed by whatever follows S).
•Thus: Follow(C) = { b, $ }
4. Follow(B):
•From the production S → A C B, B is the last symbol. So, Follow(S) is added to Follow(B). Therefore: Follow(B) = { $ }
29
Production Rules:
S -> ACB|Cbb|Ba
A -> da|BC
B-> g|Є
C-> h| Є

FIRST set
FIRST(S) = FIRST(ACB) U FIRST(Cbb) U FIRST(Ba) = { d, g, h, Є, b, a }
FIRST(A) = { d } U FIRST(BC) = { d, g, h, Є }
FIRST(B) = { g, Є }
FIRST(C) = { h, Є }

FOLLOW Set
FOLLOW(S) = { $ }
FOLLOW(A) = { h, g, $ }
FOLLOW(B) = { a, $, h, g }
FOLLOW(C) = { b, g, $, h }
30
Parsing Table
A parsing table is a critical component in predictive parsing (e.g., LL(1) parsers). It helps determine which production rule to apply based on the non-terminal on top of the stack and the current input symbol.

Steps to Construct a Parsing Table:

For each production rule A → α in the grammar:
1.Compute the First(α) set for the production.
2.For each terminal symbol 'a' in First(α), add the production A → α to the parsing table entry [A, 'a'].
3.If ε (epsilon) is in First(α), add A → α to [A, 'b'] for each terminal 'b' in Follow(A), including $ when $ is in Follow(A).
Error Entries:
If a cell in the parsing table remains empty, it is an error entry: the parser should report an error for that combination of non-terminal and input symbol.
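
The construction steps can be sketched in Python as follows. This is an illustrative sketch, assuming the First() and Follow() sets are supplied as dicts and that the table is stored as a dict keyed by (non-terminal, terminal); the function names are hypothetical. The expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id is used as sample data, with [] standing for ε. The function also records cells that receive more than one production, which would mean the grammar is not LL(1).

```python
EPS = "ε"


def first_of_sequence(symbols, first):
    """First() of a production body, given First() of the non-terminals."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})   # a terminal's First() is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_ll1_table(grammar, first, follow):
    """Return (table, conflicts); table maps (non-terminal, terminal) to a body."""
    table, conflicts = {}, []
    for lhs, alternatives in grammar.items():
        for alt in alternatives:
            alt_first = first_of_sequence(alt, first)
            lookaheads = alt_first - {EPS}
            if EPS in alt_first:             # body can vanish: use Follow(lhs)
                lookaheads |= follow[lhs]
            for terminal in lookaheads:
                if (lhs, terminal) in table:
                    conflicts.append((lhs, terminal))
                table[(lhs, terminal)] = alt
    return table, conflicts


GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"*", "+", ")", "$"}}

table, conflicts = build_ll1_table(GRAMMAR, FIRST, FOLLOW)
print(len(table), conflicts)
```

Cells that are absent from the dict are the error entries described above.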

31
Example Grammar
Consider the following grammar:
1.E → T E'
2.E' → + T E' | ε
3.T → F T'
4.T' → * F T' | ε
5.F → ( E ) | id

First() and Follow() Sets
•First(E) = { (, id }
•First(E') = { +, ε }
•First(T) = { (, id }
•First(T') = { *, ε }
•First(F) = { (, id }
•Follow(E) = { ), $ }
•Follow(E') = { ), $ }
•Follow(T) = { +, ), $ }
•Follow(T') = { +, ), $ }
•Follow(F) = { *, +, ), $ }

32
Example of Parsing
For input: id + id
Stack contents are written with the top on the left.
1.Start with stack [E, $] and input id + id $.
2.Look at E on top of the stack and id in the input. Use rule E → TE' from the parsing table.
3.Replace E with TE' on the stack. Now the stack is [T, E', $] and the input is id + id $.
4.Look at T on top of the stack and id in the input. Use rule T → FT' from the parsing table.
5.Replace T with FT'. Now the stack is [F, T', E', $] and the input is id + id $.
6.Look at F on top of the stack and id in the input. Use rule F → id from the parsing table.
The process continues by applying rules and matching terminals until the stack and the input both contain only $.

33
Using First() and Follow() in Parsing
First() and Follow() sets are used in parsers like LL(1) to build a
parsing table.
For example, for the grammar:
S → AB
A → a | ε
B → b | ε
The First() and Follow() sets are:
•First(S) = { a, b, ε }
•First(A) = { a, ε }
•First(B) = { b, ε }
•Follow(S) = { $ }
•Follow(A) = { b, $ }
•Follow(B) = { $ }

34
Example
The grammar is given below:
G --> SG'
G' --> +SG' | ε
S --> FS'
S' --> *FS' | ε
F --> id | (G)
Production             First        Follow
G --> SG'              { id, ( }    { $, ) }
G' --> +SG' | ε        { +, ε }     { $, ) }
S --> FS'              { id, ( }    { +, $, ) }
S' --> *FS' | ε        { *, ε }     { +, $, ) }
F --> id | (G)         { id, ( }    { *, +, $, ) }

Parsing table:
      id           +             *              (            )           $
G     G -> SG'                                  G -> SG'
G'                 G' -> +SG'                                G' -> ε     G' -> ε
S     S -> FS'                                  S -> FS'
S'                 S' -> ε       S' -> *FS'                  S' -> ε     S' -> ε
F     F -> id                                   F -> (G)
35
Benefit of FIRST ( ) and FOLLOW ( )
∙ They can be used to check whether a grammar has the LL(1) property (and, generalized to k symbols, the LL(k) property).
∙ They are used in the construction of predictive parsing tables.
∙ They provide selection information for recursive descent parsers.

36
LL(1) Languages

• In recursive descent, for each non-terminal and input token there may be a choice of production
• LL(1) means that for each non-terminal and token there is at most one production
• Can be specified via 2D tables
– One dimension for current non-terminal to expand
– One dimension for next token
– A table entry contains at most one production

37
LL(1) Parsing

An LL parser in compiler design is a predictive parser implementation that uses an explicit stack and a parsing table to determine the production that should be used for a non-terminal.

LL(1) Parsing is a top-down parsing technique.

•It stands for:
•L: Left-to-right scanning of the input.
•L: Leftmost derivation.
•1: Using one lookahead symbol.
•Key Points:
•Efficient, but only for LL(1) grammars; not every deterministic context-free grammar is LL(1).
•Predictive and non-backtracking.
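
The "predictive and non-backtracking" idea can also be seen in a recursive descent parser that picks each production from one lookahead token. The sketch below is an illustration, not the lecture's implementation: it assumes the expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id, and the class and method names are hypothetical. The ε alternatives are chosen when the lookahead is in Follow(E') = { ), $ } or Follow(T') = { +, ), $ }.

```python
class PredictiveParser:
    """Recursive descent without backtracking, driven by one lookahead token."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # append the end marker
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        if self.lookahead != terminal:
            raise SyntaxError(f"expected {terminal!r}, got {self.lookahead!r}")
        self.pos += 1

    def parse(self):
        self.E()
        self.match("$")

    def E(self):                      # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):                # E' -> + T E' | ε
        if self.lookahead == "+":
            self.match("+")
            self.T()
            self.E_prime()
        elif self.lookahead not in (")", "$"):        # Follow(E')
            raise SyntaxError(f"unexpected {self.lookahead!r}")

    def T(self):                      # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):                # T' -> * F T' | ε
        if self.lookahead == "*":
            self.match("*")
            self.F()
            self.T_prime()
        elif self.lookahead not in ("+", ")", "$"):   # Follow(T')
            raise SyntaxError(f"unexpected {self.lookahead!r}")

    def F(self):                      # F -> ( E ) | id
        if self.lookahead == "(":
            self.match("(")
            self.E()
            self.match(")")
        elif self.lookahead == "id":
            self.match("id")
        else:
            raise SyntaxError(f"unexpected {self.lookahead!r}")


def accepts(tokens):
    try:
        PredictiveParser(tokens).parse()
        return True
    except SyntaxError:
        return False


print(accepts(["id", "+", "id", "*", "id"]))
```

Every method decides what to do by inspecting only the current token, so no input is ever re-read.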

38
Steps in LL(1) Parsing
1.Compute First() and Follow() sets.
2.Construct the Parsing Table.
3.Parse the input string using the table
and the stack.

•Input: The string to be parsed, followed by the end-marker $.

•Stack: A predictive parser maintains a stack of grammar symbols with the dollar sign ($) at the bottom.

•Parsing table: M[A, a] is a two-dimensional array, where A is a non-terminal and a is a terminal or $. The entries in this table tell the top-down parser which production to apply.
39
Example Grammar:
S→AB
A→a|ε
B→b|ε
Step 1: First() and Follow() Sets
Calculation
First(S) = {a, b, ε}
First(A) = {a, ε}
First(B) = {b, ε}
Follow(S) = {$}
Follow(A) = {b, $}
Follow(B) = {$}

40
First Sets:
•First(S) = First(A B). A is the first symbol of the right-hand side, and both A and B can derive ε:
•First(A) = {a, ε} and First(B) = {b, ε}
•First(S) = {a, b, ε}
•First(A):
•First(A → a) = {a}
•First(A → ε) = {ε}
•First(A) = {a, ε}
•First(B):
•First(B → b) = {b}
•First(B → ε) = {ε}
•First(B) = {b, ε}

Follow Sets:
•Follow(S):
•Since S is the start symbol, Follow(S) = {$}.
•Follow(A):
•Since A is followed by B in the production S → A B and B can derive ε, Follow(A) = First(B) (excluding ε) ∪ Follow(S).
•Follow(A) = {b} ∪ {$} = {b, $}.
•Follow(B):
•Since B is the last symbol in the production S → A B, Follow(B) = Follow(S).
•Follow(B) = {$}.
41
Explanation of the Parsing Table:
•For S → A B:
•First(A B) contains a (from First(A)), b (from First(B), since A can derive ε) and ε.
•Hence, in the parsing table, we place S → A B in the columns for a and b, and also in the column for $ (since ε is in First(A B) and Follow(S) = {$}).
•For A → a:
•First(a) contains a, so A → a is placed in the column for a.
•For A → ε:
•Since First(ε) = {ε}, place A → ε in columns b and $ (from Follow(A)).
•For B → b:
•First(b) contains b, so B → b is placed in the column for b.
•For B → ε:
•Since First(ε) = {ε}, place B → ε in the column $ (from Follow(B)).

      a           b           $
S     S → A B     S → A B     S → A B
A     A → a       A → ε       A → ε
B                 B → b       B → ε

42
Step 3: Parsing Process Using Example Input
Input String: ab
Stack Initialization:
•Stack: [$, S]
•Input: ab$

43
Explanation:
1.Step 1: The parser begins with S on the stack and looks at the
input a. According to the parsing table, S → A B is selected.
2.Step 2: Now, A B is on the stack. Looking at A and the input a, A →
a is selected.
3.Step 3: The parser matches the terminal a with the input a,
popping it from the stack and moving to the next symbol.
4.Step 4: Now, the stack has B, and the input symbol is b. According
to the parsing table, B → b is selected.
5.Step 5: The terminal b matches the input b.
6.Step 6: Finally, $ on the stack matches the $ in the input,
successfully parsing the string.

44
Example of LL(1) Parser: Example 1
S ➔ aABb
A ➔ c | ε
B ➔ d | ε
Step 1: No left recursion in the grammar, hence no modification required.
Step 2: Calculation of First Set
First(S) = {a}
First(A) = {c, ε}
First(B) = {d, ε}

45
Example of LL(1) Parser
S ➔ aABb
A ➔ c | ε
B ➔ d | ε
Step 3: Calculation of Follow Set
Follow(S) = {$}
Follow(A) = First(Bb). First(B) = {d, ε} contains ε, so continue with the next symbol:
First(Bb) = (First(B) - {ε}) U First(b) = {d, b}
Follow(A) = {d, b}
Follow(B) = First(b) = {b}

46
Example of LL(1) Parser
S ➔ aABb: First(aABb) = {a}
A ➔ c | ε: First(c) = {c} and First(ε) = {ε} (use Follow(A))
B ➔ d | ε: First(d) = {d} and First(ε) = {ε} (use Follow(B))
Step 4: Parsing Table

      a           b         c         d         $
S     S ➔ aABb
A                 A ➔ ε     A ➔ c     A ➔ ε
B                 B ➔ ε               B ➔ d

No cell holds more than one production, so the grammar is LL(1).
Example String: acdb$
H/w string: adb$

47
Example of LL(1) Parser
Example String: acdb$ (parsed with the table from Step 4)

Stack      Input      Action
$          acdb$      Push S into Stack
$S         acdb$      S ➔ aABb
$bBAa      acdb$      Pop a
$bBA       cdb$       A ➔ c
$bBc       cdb$       Pop c
$bB        db$        B ➔ d
$bd        db$        Pop d
$b         b$         Pop b
$          $          Accept
48
Example of LL(1) Parser: Example 2
S ➔ AaAb | BbBa
A ➔ ε
B ➔ ε
Step 1: No left recursion in the grammar, hence no modification required.
Step 2: Calculation of First Set
First(A) = {ε}
First(B) = {ε}
First(S) = First(AaAb) U First(BbBa)
First(AaAb): First(A) = {ε} contains ε, so continue with the next symbol:
First(AaAb) = (First(A) - {ε}) U First(aAb) = {a}
Similarly: First(BbBa) = {b}
First(S) = {a, b}

49
Example of LL(1) Parser: Example 2
S ➔ AaAb | BbBa
A ➔ ε
B ➔ ε
Step 3: Calculation of Follow Set
Follow(S) = {$}
Follow(A): in S ➔ AaAb the first A is followed by aAb, giving First(aAb) = {a}; the second A is followed by b, giving First(b) = {b}.
Follow(A) = {a, b}
Similarly Follow(B) = {a, b}

50
Example of LL(1) Parser: Example 2
S ➔ AaAb | BbBa
A ➔ ε
B ➔ ε
Step 4: Construction of Parsing Table

      a            b            $
S     S ➔ AaAb     S ➔ BbBa
A     A ➔ ε        A ➔ ε
B     B ➔ ε        B ➔ ε

No cell holds more than one production, so the grammar is LL(1).

51
Example of LL(1) Parser: Example 3
S ➔ AB | eDa
A ➔ ab | c
B ➔ dC
C ➔ eC | ε
D ➔ fD | ε

Step 1: No left recursion in the grammar, hence no modification required.
Step 2: Calculation of First Set
First(S) = {a, c, e}
First(A) = {a, c}
First(B) = {d}
First(C) = {e, ε}
First(D) = {f, ε}

52
Example of LL(1) Parser: Example 3
S ➔ AB | eDa
A ➔ ab | c
B ➔ dC
C ➔ eC | ε
D ➔ fD | ε

Step 3: Calculation of Follow Set
Follow(S) = {$}
From S ➔ AB: Follow(A) = First(B) = {d}
From S ➔ AB: Follow(B) = Follow(S) = {$}
From B ➔ dC: Follow(C) = Follow(B) = {$}
From C ➔ eC: Follow(C) is added to Follow(C), so it stays {$}
From S ➔ eDa: Follow(D) = First(a) = {a}
From D ➔ fD: Follow(D) is added to Follow(D), so it stays {a}

53
Example of LL(1) Parser: Example 3
S ➔ AB | eDa
A ➔ ab | c
B ➔ dC
C ➔ eC | ε
D ➔ fD | ε
Construction of Parsing Table:

      a          b      c          d          e           f          $
S     S ➔ AB            S ➔ AB                S ➔ eDa
A     A ➔ ab            A ➔ c
B                                  B ➔ dC
C                                             C ➔ eC                 C ➔ ε
D     D ➔ ε                                               D ➔ fD

54
LL(1) Parsing Table Example

• Left-factored grammar
E → T X
X → + E | ε
T → ( E ) | int Y
Y → * T | ε
• The LL(1) parsing table:

      int      *      +      (        )      $
E     T X                    T X
X                     + E             ε      ε
T     int Y                  ( E )
Y              * T    ε               ε      ε

5
Using Parsing Tables

• Method similar to recursive descent, except
– For the non-terminal S on top of the stack
– We look at the next token a
– And choose the production shown at [S,a]
• We use a stack to keep track of pending non-terminals
• We reject when we encounter an error state
• We accept when we encounter end-of-input with an empty stack
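
The method above can be sketched as a short table-driven loop in Python. This is a minimal sketch under stated assumptions: TABLE is a hand transcription of the LL(1) table for the left-factored grammar E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε, stored as a dict keyed by (non-terminal, token) with [] standing for ε; the name ll1_parse is hypothetical. A $ marker is kept at the bottom of the stack, so acceptance means that $ on the stack meets $ in the input.

```python
TABLE = {
    ("E", "int"): ["T", "X"], ("E", "("): ["T", "X"],
    ("X", "+"): ["+", "E"], ("X", ")"): [], ("X", "$"): [],
    ("T", "int"): ["int", "Y"], ("T", "("): ["(", "E", ")"],
    ("Y", "*"): ["*", "T"], ("Y", "+"): [], ("Y", ")"): [], ("Y", "$"): [],
}


def ll1_parse(table, start, tokens):
    """Return True if tokens can be derived from start using the LL(1) table."""
    # Assumes every non-terminal has at least one table entry.
    nonterminals = {nt for nt, _ in table}
    stack = ["$", start]              # top of the stack is the end of the list
    tokens = list(tokens) + ["$"]
    pos = 0
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top in nonterminals:
            body = table.get((top, lookahead))
            if body is None:          # empty cell: error entry
                return False
            stack.extend(reversed(body))   # leftmost symbol ends up on top
        elif top == lookahead:        # terminal (or $) matches the input
            pos += 1
        else:
            return False
    return pos == len(tokens)


print(ll1_parse(TABLE, "E", ["int", "*", "int"]))
```

Pushing the production body in reverse keeps its leftmost symbol on top of the stack, which is what makes the parser follow a leftmost derivation.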

56
LL(1) Parsing Example
Stack          Input          Action
E $            int * int $    T X
T X $          int * int $    int Y
int Y X $      int * int $    terminal
Y X $          * int $        * T
* T X $        * int $        terminal
T X $          int $          int Y
int Y X $      int $          terminal
Y X $          $              ε
X $            $              ε
$              $              ACCEPT
57
Constructing Parsing Tables

• LL(1) languages are those defined by a parsing table for the LL(1) algorithm
• No table entry can be multiply defined
• We want to generate parsing tables from CFG

58
Constructing Parsing Tables (Cont.)

• If A → α, in which column of row A do we place α?
• In the column of t where t can start a string derived from α
– α →* t β
– We say that t ∈ First(α)
• In the column of t if α is ε, or α can derive ε, and t can follow an A
– S →* β A t δ
– We say t ∈ Follow(A)

59
