Parsing
Ref: Ch. 4, Compilers: Principles, Techniques, and Tools by Alfred Aho, Ravi Sethi, and Jeffrey Ullman
Glimpse
• Introduction to Parsing
• Role of the Syntax Analyzer (Parser)
• Components of Syntax Analysis
• Grammar
• Ambiguous Grammar
• Types of Parsing
• Top Down Parsing
• Predictive parsing
• Recursive descent parser
• Parse Table Generation
• Error Recovery
Prof Monika Shah (Nirma University)
Position of a Parser in the Compiler Model
[Figure: parse tree of the sentence “I gave him the book”; the phrase “the book” is analyzed as article + noun.]
Introduction to Parser (validate syntax)
Source Program → [character stream] → Lexical Analyzer → [token, tokenval] → Parser (Syntax Analyzer) → Syntax Tree
The parser requests each token from the lexical analyzer with "get next token".

Example with a syntax error:
Source program          Token stream
int main(){             DT ID(){
int a,b;                DT ID, ID;
total = a + b           ID = ID + ID
per = total |3.0;       ID = ID | FNUM ;
}                       }
SYNTAX ERROR: Semicolon missing at line 3

Example without a syntax error:
if ( b == 0 ) a = b ;
IF ( ID == NUM ) ID = ID ;
If there is no syntax error, the parser builds an abstract syntax tree or parse tree: an if node whose children are == (over b and 0) and = (over a and b).
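The token stream in the second example can be reproduced with a very small scanner. The following is only a sketch under stated assumptions: the function name `token_names`, the fixed 256-byte buffer, and the handling of just `if`, identifiers, integers, `==` and single-character symbols are illustrative choices, not part of the slides (so DT and FNUM are not recognized here).

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the token names for src, separated by spaces,
   in a static buffer (not reentrant), or NULL if the buffer is too small. */
const char *token_names(const char *src) {
    static char out[256];
    size_t used = 0;
    out[0] = '\0';
    while (*src) {
        char name[8];
        if (isspace((unsigned char)*src)) { src++; continue; }
        if (isalpha((unsigned char)*src)) {
            /* keyword or identifier */
            const char *start = src;
            while (isalnum((unsigned char)*src)) src++;
            size_t len = (size_t)(src - start);
            if (len == 2 && strncmp(start, "if", 2) == 0) strcpy(name, "IF");
            else strcpy(name, "ID");
        } else if (isdigit((unsigned char)*src)) {
            /* integer literal */
            while (isdigit((unsigned char)*src)) src++;
            strcpy(name, "NUM");
        } else if (src[0] == '=' && src[1] == '=') {
            strcpy(name, "==");
            src += 2;
        } else {
            /* any other character is passed through as its own token */
            name[0] = *src++;
            name[1] = '\0';
        }
        int n = snprintf(out + used, sizeof out - used, "%s%s",
                         used ? " " : "", name);
        if (n < 0 || (size_t)n >= sizeof out - used) return NULL;
        used += (size_t)n;
    }
    return out;
}
```

Calling `token_names("if ( b == 0 ) a = b ;")` yields the string `IF ( ID == NUM ) ID = ID ;`, which is what the parser then checks against the grammar.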
Parse Tree vs Abstract Syntax Tree
• The parse tree is also called “concrete syntax”
• The AST discards (abstracts) unneeded information; it is a more compact format
[Figure: parse tree (left) and AST (right) for (1 + 2 + (3 + 4)) + 5. The parse tree uses the nodes S, E, +, ( and ); the AST keeps only the + nodes and the numbers 1 to 5.]
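To make the "more compact format" concrete, here is a minimal sketch of an AST for the expression (1 + 2 + (3 + 4)) + 5, assuming that is the expression in the figure. The `Ast` type and the helper names `num`, `add`, `eval` and `count_nodes` are hypothetical, and `malloc` failures are not checked to keep the sketch short.

```c
#include <stdlib.h>

/* Hypothetical AST node: a number leaf, or a '+' node with two children. */
typedef struct Ast {
    char op;                     /* '+' for addition, 0 for a number leaf */
    int value;                   /* used only when op == 0 */
    struct Ast *left, *right;
} Ast;

Ast *num(int v) {
    Ast *n = malloc(sizeof *n);
    n->op = 0; n->value = v; n->left = n->right = NULL;
    return n;
}

Ast *add(Ast *l, Ast *r) {
    Ast *n = malloc(sizeof *n);
    n->op = '+'; n->value = 0; n->left = l; n->right = r;
    return n;
}

/* Evaluates the tree bottom-up. */
int eval(const Ast *n) {
    return n->op == 0 ? n->value : eval(n->left) + eval(n->right);
}

/* Counts leaves and operator nodes together. */
int count_nodes(const Ast *n) {
    return n == NULL ? 0 : 1 + count_nodes(n->left) + count_nodes(n->right);
}

/* AST for (1 + 2 + (3 + 4)) + 5: parentheses and S/E nodes are gone. */
Ast *example_ast(void) {
    return add(add(num(1), add(num(2), add(num(3), num(4)))), num(5));
}
```

The AST has 9 nodes (five numbers and four + nodes); grouping that the parse tree records with ( S ) nodes is expressed purely by the shape of the tree.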
Role of Syntax Analyzer
[Figure: the syntax analyzer groups words into phrases, as in the parse tree of “I gave him the book”, where “the book” is article + noun.]
Chomsky Hierarchy: Language Classification
L(regular) ⊂ L(context free) ⊂ L(context sensitive) ⊂ L(unrestricted)
• A grammar G is said to be
• Regular if it is right linear, where each production is of the form
  A → w B or A → w
  or left linear, where each production is of the form
  A → B w or A → w
• Context free if each production is of the form
  A → α
  where A ∈ N and α ∈ (N ∪ T)*
• Context sensitive if each production is of the form
  α A β → α γ β
  where A ∈ N, α, β, γ ∈ (N ∪ T)*, |γ| > 0
• Unrestricted
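The production forms above can be checked mechanically. The sketch below assumes a convention that is not in the slides: each grammar symbol is one character, uppercase letters are nonterminals, and everything else is a terminal. The function names are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if lhs → rhs has the right-linear form A → w B or A → w,
   i.e. a nonterminal may appear only as the last symbol of rhs. */
int is_right_linear(const char *lhs, const char *rhs) {
    if (strlen(lhs) != 1 || !isupper((unsigned char)lhs[0])) return 0;
    size_t n = strlen(rhs);
    for (size_t i = 0; i < n; i++) {
        if (isupper((unsigned char)rhs[i]) && i != n - 1) return 0;
    }
    return 1;
}

/* Returns 1 if lhs → rhs has the context-free form A → α:
   the left side is a single nonterminal, the right side is unrestricted. */
int is_context_free(const char *lhs, const char *rhs) {
    (void)rhs;
    return strlen(lhs) == 1 && isupper((unsigned char)lhs[0]);
}
```

For example, A → abB is right linear, A → aBb is context free but not right linear, and aAb → acb has a left side that is not a single nonterminal, so it is not of the context-free form.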
Grammars (Recap)
ST : IFST
|E
;
IFST : IF ( E ) ST
| IF ( E ) ST ELSE ST
;
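[Figure: two alternative trees for 1 + 2 * 3, one rooted at * (with + over 1 and 2 below it) and one rooted at + (with * over 2 and 3 below it), followed by a diagram over the symbols T, + and id.]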
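Problem with Top-Down Parsing
Want to decide which production to apply based on the next input symbol
S → E + S | E
E → num | ( S )
Both alternatives for S begin with E, so one symbol of lookahead cannot choose between them: (1) needs S → E, while (1) + 2 needs S → E + S.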
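Predictive parsing
[Figure: model of a table-driven predictive parser. The input buffer holds a + b $, the stack holds X Y Z $ with X on top, and the predictive parsing program (driver) consults the parsing table M and produces the output.]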
Table-Driven Predictive Parsing
(ER and TR are single nonterminals: E and T with subscript R.)

      id          +              *              (           )         $
E     E → T ER                                  E → T ER
ER                ER → + T ER                               ER → ε    ER → ε
T     T → F TR                                  T → F TR
TR                TR → ε         TR → * F TR                TR → ε    TR → ε
F     F → id                                    F → ( E )

Stack          Input        Production applied
$ E            id+id*id$    E → T ER
$ ER T         id+id*id$    T → F TR
$ ER TR F      id+id*id$    F → id
$ ER TR id     id+id*id$    (match id)
$ ER TR        +id*id$      TR → ε
$ ER           +id*id$      ER → + T ER
$ ER T +       +id*id$      (match +)
$ ER T         id*id$       T → F TR
$ ER TR F      id*id$       F → id
$ ER TR id     id*id$       (match id)
$ ER TR        *id$         TR → * F TR
$ ER TR F *    *id$         (match *)
$ ER TR F      id$          F → id
$ ER TR id     id$          (match id)
$ ER TR        $            TR → ε
$ ER           $            ER → ε
$              $            (accept)
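The driver behind such a trace is short. The following is a minimal sketch, not the slides' implementation: it assumes one-character symbols, with `i` standing for id, `R` for ER and `Q` for TR, and the function names `ll1_entry` and `ll1_parse` are hypothetical. The input string must end with `$`.

```c
#include <string.h>

/* Parsing table M[A, a]: returns the right-hand side to push,
   "" for an ε-production, or NULL for an empty (error) entry. */
const char *ll1_entry(char A, char a) {
    switch (A) {
    case 'E': return (a == 'i' || a == '(') ? "TR" : NULL;
    case 'R': if (a == '+') return "+TR";
              return (a == ')' || a == '$') ? "" : NULL;
    case 'T': return (a == 'i' || a == '(') ? "FQ" : NULL;
    case 'Q': if (a == '*') return "*FQ";
              return (a == '+' || a == ')' || a == '$') ? "" : NULL;
    case 'F': if (a == 'i') return "i";
              return a == '(' ? "(E)" : NULL;
    default:  return NULL;
    }
}

/* Returns 1 if input is accepted, 0 on the first syntax error. */
int ll1_parse(const char *input) {
    char stack[256];
    int top = 0;
    size_t pos = 0;
    stack[top++] = '$';
    stack[top++] = 'E';                       /* start symbol on top */
    while (top > 0) {
        char X = stack[--top];                /* pop the top symbol */
        char a = input[pos];                  /* lookahead */
        if (a == '\0') return 0;              /* ran past the end marker */
        if (strchr("ERTQF", X)) {             /* nonterminal: expand */
            const char *rhs = ll1_entry(X, a);
            if (rhs == NULL) return 0;
            size_t n = strlen(rhs);
            if (top + n > sizeof stack) return 0;
            while (n > 0) stack[top++] = rhs[--n];   /* push reversed */
        } else {                              /* terminal: must match */
            if (X != a) return 0;
            if (X == '$') return 1;           /* $ matched $: accept */
            pos++;
        }
    }
    return 0;
}
```

With this encoding, `ll1_parse("i+i*i$")` follows the same stack moves as the trace and accepts, while `ll1_parse("ii$")` stops at the empty entry M[TR, id].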
Table-Driven Predictive Parsing
Stack    Input      Production applied
$ E      id id $

      id          +              *              (           )         $
E     E → T ER                                  E → T ER
ER                ER → + T ER                               ER → ε    ER → ε
T     T → F TR                                  T → F TR
TR                TR → ε         TR → * F TR                TR → ε    TR → ε
F     F → id                                    F → ( E )
Parsing table for S → E S’, S’ → + S | ε, E → num | ( S ):

      num         +           (            )         $
S     S → E S’                S → E S’
S’                S’ → + S                 S’ → ε    S’ → ε
E     E → num                 E → ( S )
How to Implement This?
• Table can be converted easily into a recursive descent parser
• 3 procedures: parse_S(), parse_S’(), and parse_E()
• start parsing with start non-terminal
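int main()
{   token = input.read();            /* load the first lookahead token */
    parse_S();                       /* S is the start nonterminal */
    if (token != EOF) ParseError();  /* leftover input is a syntax error */
    return 0; }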
Recursive-Descent Parser
/* token holds the lookahead token; parse_Sprime() implements S’ */
void parse_S() {
    switch (token) {
    case num: parse_E(); parse_Sprime(); return;   /* S → E S’ */
    case '(': parse_E(); parse_Sprime(); return;   /* S → E S’ */
    default: ParseError();
    }
}
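Recursive-Descent Parser cont…
/* parse_Sprime() implements S’ (an apostrophe is not valid in an identifier) */
void parse_Sprime() {
    switch (token) {
    case '+': token = input.read(); parse_S(); return;   /* S’ → + S */
    case ')': return;                                    /* S’ → ε */
    case EOF: return;                                    /* S’ → ε */
    default: ParseError();
    }
}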
Recursive-Descent Parser cont…
void parse_E() {
    switch (token) {
    case num: token = input.read(); return;              /* E → num */
    case '(': token = input.read(); parse_S();           /* E → ( S ) */
        if (token != ')') ParseError();
        token = input.read(); return;
    default: ParseError();
    }
}
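The three procedures can be assembled into a complete recognizer. This is one possible sketch in plain C, with several assumptions that are not in the slides: a small hand-written scanner `next_token()` replaces `input.read()`, `parse_Sp()` stands for the S’ procedure, a `failed` flag replaces `ParseError()`, and `accepts()` is a hypothetical entry point.

```c
#include <ctype.h>

enum { TOK_NUM, TOK_PLUS, TOK_LPAREN, TOK_RPAREN, TOK_EOF, TOK_BAD };

static const char *src;   /* remaining input text */
static int token;         /* lookahead token */
static int failed;        /* set once a syntax error is seen */

/* Reads the next token from src into the lookahead. */
static void next_token(void) {
    while (*src == ' ') src++;
    if (*src == '\0') { token = TOK_EOF; return; }
    if (isdigit((unsigned char)*src)) {
        while (isdigit((unsigned char)*src)) src++;
        token = TOK_NUM;
        return;
    }
    switch (*src++) {
    case '+': token = TOK_PLUS;   break;
    case '(': token = TOK_LPAREN; break;
    case ')': token = TOK_RPAREN; break;
    default:  token = TOK_BAD;    break;
    }
}

static void parse_S(void);

static void parse_E(void) {                 /* E → num | ( S ) */
    if (failed) return;
    switch (token) {
    case TOK_NUM: next_token(); return;
    case TOK_LPAREN:
        next_token();
        parse_S();
        if (token != TOK_RPAREN) { failed = 1; return; }
        next_token();
        return;
    default: failed = 1;
    }
}

static void parse_Sp(void) {                /* S’ → + S | ε */
    if (failed) return;
    switch (token) {
    case TOK_PLUS: next_token(); parse_S(); return;
    case TOK_RPAREN: case TOK_EOF: return;  /* ε on ) and end of input */
    default: failed = 1;
    }
}

static void parse_S(void) {                 /* S → E S’ */
    if (failed) return;
    switch (token) {
    case TOK_NUM: case TOK_LPAREN: parse_E(); parse_Sp(); return;
    default: failed = 1;
    }
}

/* Returns 1 if text is a sentence of the grammar, else 0. */
int accepts(const char *text) {
    src = text;
    failed = 0;
    next_token();
    parse_S();
    return !failed && token == TOK_EOF;
}
```

Each `case` label corresponds to one column of the parsing table, so `accepts("(1+2+(3+4))+5")` succeeds while `accepts("1)")` fails on the final end-of-input check.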
Predictive Parser
• S : iStS | iStSeS
• After Left factoring
• S : iStSS’
• S’ : null | eS
Panic Mode Recovery
Add synchronizing actions to undefined entries based on FOLLOW
Pro: Can be automated
Cons: Error messages are needed

FOLLOW(E) = { ) $ }
FOLLOW(ER) = { ) $ }
FOLLOW(T) = { + ) $ }
FOLLOW(TR) = { + ) $ }
FOLLOW(F) = { + * ) $ }

      id          +              *              (           )         $
E     E → T ER                                  E → T ER    synch     synch
ER                ER → + T ER                               ER → ε    ER → ε
T     T → F TR    synch                         T → F TR    synch     synch
TR                TR → ε         TR → * F TR                TR → ε    TR → ε
F     F → id      synch          synch          F → ( E )   synch     synch

synch: the driver pops the current nonterminal A and skips input until a synch token, or skips input until one of FIRST(A) is found
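The FOLLOW sets that drive the synch entries can be computed with the usual fixed-point iteration over FIRST and FOLLOW. The sketch below assumes one-character symbols (`i` for id, `R` for ER, `Q` for TR) and represents each set as a bitmask over the terminals `i + * ( ) $`; all names in it are hypothetical.

```c
#include <string.h>

static const char TERMS[] = "i+*()$";
static const char NONTERMS[] = "ERTQF";
enum { EPS = 1 << 6 };                      /* extra bit marking ε in FIRST */

struct Prod { char lhs; const char *rhs; };
static const struct Prod G[] = {
    {'E', "TR"}, {'R', "+TR"}, {'R', ""}, {'T', "FQ"},
    {'Q', "*FQ"}, {'Q', ""}, {'F', "(E)"}, {'F', "i"},
};
enum { NPROD = sizeof G / sizeof G[0] };

static unsigned first[5], follow[5];

/* Index of nonterminal c, or -1 if c is a terminal. */
static int nt(char c) {
    const char *p = strchr(NONTERMS, c);
    return (c && p) ? (int)(p - NONTERMS) : -1;
}
static unsigned bit(char t) { return 1u << (strchr(TERMS, t) - TERMS); }

/* FIRST of the symbol string s, using the current first[] table. */
static unsigned first_of(const char *s) {
    unsigned out = 0;
    for (; *s; s++) {
        unsigned f = nt(*s) >= 0 ? first[nt(*s)] : bit(*s);
        out |= f & ~(unsigned)EPS;
        if (!(f & EPS)) return out;         /* this symbol cannot vanish */
    }
    return out | EPS;                       /* every symbol can derive ε */
}

void compute_sets(void) {
    int changed = 1;
    memset(first, 0, sizeof first);
    memset(follow, 0, sizeof follow);
    while (changed) {                       /* FIRST: iterate to a fixed point */
        changed = 0;
        for (int k = 0; k < NPROD; k++) {
            unsigned *f = &first[nt(G[k].lhs)], old = *f;
            *f |= first_of(G[k].rhs);
            if (*f != old) changed = 1;
        }
    }
    follow[nt('E')] = bit('$');             /* $ follows the start symbol */
    changed = 1;
    while (changed) {                       /* FOLLOW: iterate to a fixed point */
        changed = 0;
        for (int k = 0; k < NPROD; k++)
            for (const char *s = G[k].rhs; *s; s++) {
                if (nt(*s) < 0) continue;
                unsigned rest = first_of(s + 1);
                unsigned *fo = &follow[nt(*s)], old = *fo;
                *fo |= rest & ~(unsigned)EPS;
                if (rest & EPS) *fo |= follow[nt(G[k].lhs)];
                if (*fo != old) changed = 1;
            }
    }
}

/* FOLLOW(A) as a bitmask; set_of builds a mask from a terminal string. */
unsigned follow_set(char A) { compute_sets(); return follow[nt(A)]; }
unsigned set_of(const char *terms) {
    unsigned m = 0;
    for (; *terms; terms++) m |= bit(*terms);
    return m;
}
```

Under these assumptions `follow_set('F')` equals `set_of("+*)$")`, which matches FOLLOW(F) listed above; the synch entries in row F are exactly those columns.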
Panic mode Error Recovery
1. If the parser looks up entry M[A,a] and finds that it is blank, the input symbol a is skipped.
2. If the entry is synch, the nonterminal A on top of the stack is popped; alternatively, input tokens are skipped until a synch token or a token in FIRST(A) appears.
3. If a terminal on top of the stack does not match the input symbol, the terminal is popped from the stack.

s = synch
      id       +         *         (        )        $
E     T ER                         T ER     synch    synch
ER             + T ER                       ε        ε
T     F TR     synch               F TR     synch    synch
TR             ε         * F TR             ε        ε
F     id       synch     synch     ( E )    synch    synch
Example 2 : Panic mode recovery
1. If M[A,a] = blank (skip), skip the input symbol
2. If M[A,a] = synch, pop the nonterminal
3. If top(stack) is a terminal and input != top(stack), pop the terminal

      id       +         *         (        )        $
E     T ER     skip      skip      T ER     synch    synch
ER    skip     + T ER    skip      skip     ε        ε
T     F TR     synch     skip      F TR     synch    synch
TR    skip     ε         * F TR    skip     ε        ε
F     id       synch     synch     ( E )    synch    synch

Stack                  Input         Remark
$ E                    ( id ( + $
$ ER T                 ( id ( + $
$ ER TR F              ( id ( + $
$ ER TR ) E (          ( id ( + $    Match (
$ ER TR ) E            id ( + $
$ ER TR ) ER T         id ( + $
$ ER TR ) ER TR F      id ( + $
$ ER TR ) ER TR id     id ( + $      Match id
$ ER TR ) ER TR        ( + $         M[TR,(] = blank → Recovery: skip (. Error: unexpected (
$ ER TR ) ER TR        + $
$ ER TR ) ER           + $
$ ER TR ) ER T +       + $           Match +
$ ER TR ) ER T         $             M[T,$] = synch → Recovery: pop T. Error: missing id
$ ER TR ) ER           $
$ ER TR )              $             top(stack) = ‘)’ != $ → Recovery: pop ). Error: missing )
$ ER TR                $
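The three recovery rules fit into the table-driven driver with only a few extra branches. The following is a sketch under assumptions of its own: one-character symbols (`i` for id, `R` for ER, `Q` for TR), a return value that counts recovery actions instead of printing messages, and one case the slides do not cover (extra input while only `$` is left on the stack, which is skipped here). The names `panic_entry` and `panic_parse` are hypothetical.

```c
#include <string.h>

static const char SYNCH[] = "synch";   /* marker, compared by address */

/* M[A, a]: right-hand side, "" for ε, SYNCH, or NULL for a blank (skip) entry. */
const char *panic_entry(char A, char a) {
    int follow_E = (a == ')' || a == '$');       /* FOLLOW(E) = FOLLOW(ER) */
    int follow_T = (a == '+' || follow_E);       /* FOLLOW(T) = FOLLOW(TR) */
    switch (A) {
    case 'E': if (a == 'i' || a == '(') return "TR";
              return follow_E ? SYNCH : NULL;
    case 'R': if (a == '+') return "+TR";
              return follow_E ? "" : NULL;
    case 'T': if (a == 'i' || a == '(') return "FQ";
              return follow_T ? SYNCH : NULL;
    case 'Q': if (a == '*') return "*FQ";
              return follow_T ? "" : NULL;
    case 'F': if (a == 'i') return "i";
              if (a == '(') return "(E)";
              return (a == '*' || follow_T) ? SYNCH : NULL;
    default:  return NULL;
    }
}

/* Parses input (ending in '$') and returns the number of recovery
   actions taken, or -1 if the stack would overflow. */
int panic_parse(const char *input) {
    char stack[256];
    int top = 0, errors = 0;
    size_t pos = 0;
    stack[top++] = '$';
    stack[top++] = 'E';
    for (;;) {
        char X = stack[top - 1];
        char a = input[pos] ? input[pos] : '$';  /* end of string acts as $ */
        if (strchr("ERTQF", X) == NULL) {        /* terminal on top */
            if (X == a) {
                if (X == '$') return errors;     /* done */
                top--; pos++;                    /* match */
            } else if (X == '$') {
                errors++; pos++;                 /* assumption: skip extra input */
            } else {
                errors++; top--;                 /* rule 3: pop the terminal */
            }
            continue;
        }
        const char *rhs = panic_entry(X, a);
        if (rhs == NULL) { errors++; pos++; }           /* rule 1: skip input */
        else if (rhs == SYNCH) { errors++; top--; }     /* rule 2: pop nonterminal */
        else {
            size_t n = strlen(rhs);
            top--;
            if (top + n > sizeof stack) return -1;
            while (n > 0) stack[top++] = rhs[--n];      /* push reversed */
        }
    }
}
```

On the input `(i(+$` this sketch takes three recovery actions, one for each error reported in the trace (unexpected (, missing id, missing )).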
Phrase-Level Recovery
Change input stream by inserting missing tokens
For example: id id is changed into id * id
Pro: Can be automated
Cons: Recovery not always intuitive

      id          +              *              (           )         $
E     E → T ER                                  E → T ER    synch     synch
ER                ER → + T ER                               ER → ε    ER → ε
T     T → F TR    synch                         T → F TR    synch     synch
TR    insert *    TR → ε         TR → * F TR                TR → ε    TR → ε
F     F → id      synch          synch          F → ( E )   synch     synch

insert *: the driver inserts the missing * and retries the production; parsing can then continue with TR → * F TR
Example : Phrase level error recovery

      id          +         *         (        )        $
E     T ER                            T ER     synch    synch
ER                + T ER                       ε        ε
T     F TR        synch               F TR     synch    synch
TR    insert *    ε         * F TR             ε        ε
F     id          synch     synch     ( E )    synch    synch

Stack                  Input        Remark
$ E                    ( id id $
$ ER T                 ( id id $
$ ER TR F              ( id id $
$ ER TR ) E (          ( id id $    Match (
$ ER TR ) E            id id $
$ ER TR ) ER T         id id $
$ ER TR ) ER TR F      id id $
$ ER TR ) ER TR id     id id $      Match id
$ ER TR ) ER TR        id $         M[TR,id] = insert *. Recovery: insert *. Error: missing *
$ ER TR ) ER TR        * id $
$ ER TR ) ER TR F *    * id $       Match *
$ ER TR ) ER TR F      id $
$ ER TR ) ER TR id     id $         Match id
…
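The insert * action can be sketched as a driver that treats an inserted `*` as the current token until it is matched. Assumptions of this sketch, none of which come from the slides: one-character symbols (`i` for id, `R` for ER, `Q` for TR), the synch entries are left out so every other error simply rejects, and the return value counts insertions. The names `pl_entry` and `phrase_level_parse` are hypothetical.

```c
#include <string.h>

/* Plain LL(1) table: right-hand side, "" for ε, or NULL for an error entry. */
const char *pl_entry(char A, char a) {
    switch (A) {
    case 'E': return (a == 'i' || a == '(') ? "TR" : NULL;
    case 'R': if (a == '+') return "+TR";
              return (a == ')' || a == '$') ? "" : NULL;
    case 'T': return (a == 'i' || a == '(') ? "FQ" : NULL;
    case 'Q': if (a == '*') return "*FQ";
              return (a == '+' || a == ')' || a == '$') ? "" : NULL;
    case 'F': if (a == 'i') return "i";
              return a == '(' ? "(E)" : NULL;
    default:  return NULL;
    }
}

/* Returns the number of '*' tokens inserted, or -1 on any other error.
   The input must end with '$'. */
int phrase_level_parse(const char *input) {
    char stack[256];
    int top = 0, inserted = 0;
    int pending = 0;            /* 1 while an inserted '*' is the current token */
    size_t pos = 0;
    stack[top++] = '$';
    stack[top++] = 'E';
    for (;;) {
        char X = stack[--top];
        char a = pending ? '*' : input[pos];
        if (a == '\0') return -1;                  /* missing end marker */
        if (strchr("ERTQF", X) == NULL) {          /* terminal on top */
            if (X != a) return -1;
            if (X == '$') return inserted;         /* accept */
            if (pending) pending = 0;              /* consume the inserted '*' */
            else pos++;                            /* consume real input */
            continue;
        }
        if (X == 'Q' && a == 'i') {                /* M[TR, id] = insert * */
            inserted++;
            pending = 1;
            a = '*';                               /* retry with '*' as lookahead */
        }
        const char *rhs = pl_entry(X, a);
        if (rhs == NULL) return -1;
        size_t n = strlen(rhs);
        if (top + n > sizeof stack) return -1;
        while (n > 0) stack[top++] = rhs[--n];     /* push reversed */
    }
}
```

With this sketch `phrase_level_parse("ii$")` reports one insertion, as if the input had been `i*i$`. The input `(ii$` still fails after the insertion because the missing ) needs one of the other recovery actions.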
Error Productions
E → T ER
ER → + T ER | ε
T → F TR
TR → * F TR | ε
F → ( E ) | id
Add “error production”: TR → F TR, to ignore missing *, e.g.: id id
Pro: Powerful recovery method
Cons: Cannot be automated

      id           +              *              (           )         $
E     E → T ER                                   E → T ER    synch     synch
ER                 ER → + T ER                               ER → ε    ER → ε
T     T → F TR     synch                         T → F TR    synch     synch
TR    TR → F TR    TR → ε         TR → * F TR                TR → ε    TR → ε
F     F → id       synch          synch          F → ( E )   synch     synch
Example : error recovery using Error productions

      id       +         *         (        )        $
E     T ER                         T ER     synch    synch
ER             + T ER                       ε        ε
T     F TR     synch               F TR     synch    synch
TR    F TR     ε         * F TR             ε        ε
F     id       synch     synch     ( E )    synch    synch

Stack                  Input        Remark
$ E                    ( id id $
$ ER T                 ( id id $
$ ER TR F              ( id id $
$ ER TR ) E (          ( id id $    Match (
$ ER TR ) E            id id $
$ ER TR ) ER T         id id $
$ ER TR ) ER TR F      id id $
$ ER TR ) ER TR id     id id $      Match id
$ ER TR ) ER TR        id $         Error: missing * (apply TR → F TR)
$ ER TR ) ER TR F      id $
$ ER TR ) ER TR id     id $         Match id
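An error production can also be written directly into a recursive-descent parser: the procedure for TR simply gets one more case. The sketch below is an illustration under its own assumptions: one-character tokens with `i` for id, input terminated by `$`, a counter in place of an error message, and hypothetical names throughout.

```c
static const char *p;        /* next input character */
static int missing_star;     /* times the error production TR → F TR fired */
static int bad;              /* set on any other syntax error */

static void parse_E(void);

static void parse_F(void) {                  /* F → ( E ) | id */
    if (bad) return;
    if (*p == 'i') {
        p++;
    } else if (*p == '(') {
        p++;
        parse_E();
        if (bad) return;
        if (*p == ')') p++; else bad = 1;
    } else {
        bad = 1;
    }
}

static void parse_TR(void) {                 /* TR → * F TR | ε, plus TR → F TR */
    if (bad) return;
    switch (*p) {
    case '*': p++; parse_F(); parse_TR(); return;
    case 'i': missing_star++;                /* error production: missing * */
              parse_F(); parse_TR(); return;
    case '+': case ')': case '$': return;    /* ε on FOLLOW(TR) */
    default: bad = 1;
    }
}

static void parse_T(void) { parse_F(); parse_TR(); }   /* T → F TR */

static void parse_ER(void) {                 /* ER → + T ER | ε */
    if (bad) return;
    switch (*p) {
    case '+': p++; parse_T(); parse_ER(); return;
    case ')': case '$': return;              /* ε on FOLLOW(ER) */
    default: bad = 1;
    }
}

static void parse_E(void) { parse_T(); parse_ER(); }   /* E → T ER */

/* Returns how many missing '*' were recovered, or -1 on another error. */
int parse_with_error_production(const char *input) {
    p = input;
    missing_star = 0;
    bad = 0;
    parse_E();
    if (bad || *p != '$') return -1;
    return missing_star;
}
```

Here `parse_with_error_production("ii$")` returns 1: the second id is parsed through TR → F TR and the parse continues as if the * had been present. As on the slide, the extra entry is placed only in the id column.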
Self Evaluation
• Trace top-down parsing with each of the error recovery methods (panic mode, phrase-level recovery, error productions) for the following strings
1. ) id ( id + * id $
2. ( id + ) id $