Compiler Construction Syntax Analysis

Syntax Analysis
Reading: Chapter 4

Deals with techniques for specifying and implementing parsers.

Parser
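[Figure: the source program enters the lexical analyzer, which hands a token to the parser on each "get next token" request; both the lexical analyzer and the parser report errors and interact with the symbol table.]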

The syntax analyzer is also called the parser. Its job is to analyze the source program based on the definition of its syntax. It works in lock-step with the lexical analyzer and is responsible for creating a parse tree of the source code.
By Bishnu Gautam 2

Parser
A parser implements a context-free grammar.
The parser checks whether a given source program satisfies the rules implied by a context-free grammar.
If it does, the parser creates the parse tree of that program.
Otherwise, the parser gives error messages.

A context-free grammar
– gives a precise syntactic specification of a programming language.
– the design of the grammar is an initial phase of the design of a compiler.
– a grammar can be directly converted into a parser by some tools.


Parser
We categorize the parsers into two groups:
Top-Down Parser
the parse tree is created top to bottom, starting from the root.
Bottom-Up Parser
the parse tree is created bottom to top, starting from the leaves.

Both top-down and bottom-up parsers scan the input from left to
right (one symbol at a time).
Efficient top-down and bottom-up parsers can be implemented only
for sub-classes of context-free grammars.
LL for top-down parsing
LR for bottom-up parsing


Context-Free Grammars (Recap)


Programming languages usually have recursive structures that can be
defined by a context-free grammar (CFG).
CFGs are made of definitions of the form: if S1 and S2 are statements and E is an expression, then “if E then S1 else S2” is a statement.
A context-free grammar is a 4-tuple G = (N, T, P, S) where
• T is a finite set of tokens (terminal symbols)
• N is a finite set of nonterminals
• P is a finite set of productions of the form A → α, where A ∈ N and α ∈ (N∪T)*
• S ∈ N is a designated start symbol
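As an illustration of the 4-tuple, the following Python (3.9+) sketch encodes a grammar as N, T, P and S. The class name `CFG` and the tuple encoding of productions are assumptions made for this example; the check in `__post_init__` enforces the context-free restriction that every left-hand side is a single non-terminal.

```python
from dataclasses import dataclass

# A production A -> alpha: one nonterminal on the left, a tuple of symbols
# on the right (the empty tuple stands for epsilon).
Production = tuple[str, tuple[str, ...]]


@dataclass(frozen=True)
class CFG:
    nonterminals: frozenset[str]          # N
    terminals: frozenset[str]             # T
    productions: tuple[Production, ...]   # P
    start: str                            # S

    def __post_init__(self) -> None:
        if self.nonterminals & self.terminals:
            raise ValueError("N and T must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be in N")
        symbols = self.nonterminals | self.terminals
        for lhs, rhs in self.productions:
            # Context-free: the left side is exactly one nonterminal.
            if lhs not in self.nonterminals:
                raise ValueError(f"left side {lhs!r} is not a nonterminal")
            for sym in rhs:
                if sym not in symbols:
                    raise ValueError(f"unknown symbol {sym!r} in a {lhs}-production")


# G = ({E}, {+, *, (, ), -, id}, P, E) with E -> E+E | E*E | (E) | -E | id
expr_grammar = CFG(
    nonterminals=frozenset({"E"}),
    terminals=frozenset({"+", "*", "(", ")", "-", "id"}),
    productions=(
        ("E", ("E", "+", "E")),
        ("E", ("E", "*", "E")),
        ("E", ("(", "E", ")")),
        ("E", ("-", "E")),
        ("E", ("id",)),
    ),
    start="E",
)
```

A production such as id → E would be rejected, since its left side is a terminal.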


CFG: Notational Conventions


• Terminals are denoted by lower-case letters, symbols (single atoms), and bold strings (tokens):
a, b, c, … ∈ T
specific terminals: 0, 1, id, +
• Non-terminals are denoted by upper-case letters or lower-case italicized names:
A, B, C, … ∈ N
specific nonterminals: expr, term, stmt
• Production rules are of the form A → α, which is read as “A can produce α”.
• Strings comprising both terminals and non-terminals are denoted by Greek letters (α, β, etc.).


CFG: Derivations
E ⇒ E+E means that E+E is derived from E:
» we can replace E by E+E
» to be able to do this, we must have the production rule E → E+E in our grammar.
E ⇒ E+E ⇒ id+E ⇒ id+id
A sequence of replacements of non-terminal symbols is called a derivation of id+id
from E.
In general a derivation step is
αAβ ⇒ αγβ if there is a production rule A→γ in our grammar, where α
and β are arbitrary strings of terminal and non-terminal symbols
α1 ⇒ α2 ⇒ ... ⇒ αn (αn derives from α1 or α1 derives αn )
⇒ : derives in one step
⇒* : derives in zero or more steps
⇒+ : derives in one or more steps


CFG: Derivations
L(G) is the language of G (the language generated by G) which is a
set of sentences.
A sentence of L(G) is a string of terminal symbols of G.
If S is the start symbol of G, then ω is a sentence of L(G) iff S ⇒* ω, where ω is a string of terminals of G.

If G is a context-free grammar, L(G) is a context-free language.


Two grammars are equivalent if they produce the same language.
If S ⇒* α:
– if α contains non-terminals, it is called a sentential form of G;
– if α does not contain non-terminals, it is called a sentence of G.


CFG: Derivations
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
OR
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)
At each derivation step, we can choose any of the non-terminals in the sentential form for the replacement.
If we always choose the left-most non-terminal in each derivation step, the derivation is called a left-most derivation:
E ⇒lm -E ⇒lm -(E) ⇒lm -(E+E) ⇒lm -(id+E) ⇒lm -(id+id)
If we always choose the right-most non-terminal in each derivation step, the derivation is called a right-most derivation:
E ⇒rm -E ⇒rm -(E) ⇒rm -(E+E) ⇒rm -(E+id) ⇒rm -(id+id)
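The left-most and right-most derivations of -(id+id) apply the same productions and differ only in which E is replaced at each step. The Python sketch below replays both; the helpers `step` and `derive` are hypothetical, and the set of non-terminals is hard-coded for this grammar.

```python
NONTERMINALS = {"E"}   # assumption: the only nonterminal in this example


def step(form, rhs, leftmost=True):
    """One derivation step: replace the leftmost (or rightmost) nonterminal by rhs."""
    positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
    if not positions:
        raise ValueError("no nonterminal left to replace")
    i = positions[0] if leftmost else positions[-1]
    return form[:i] + list(rhs) + form[i + 1:]


def derive(start, rhs_sequence, leftmost=True):
    """Apply the right-hand sides in order; return each sentential form as a string."""
    forms = [[start]]
    for rhs in rhs_sequence:
        forms.append(step(forms[-1], rhs, leftmost))
    return ["".join(form) for form in forms]


# Right-hand sides used, in order: E -> -E, E -> (E), E -> E+E, E -> id, E -> id
rules = [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]]
lm = derive("E", rules, leftmost=True)
rm = derive("E", rules, leftmost=False)
print(" => ".join(lm))   # E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
print(" => ".join(rm))   # E => -E => -(E) => -(E+E) => -(E+id) => -(id+id)
```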

CFG: Derivations Example


Grammar G = ({E}, {+,*,(,),-,id}, P, E) with
productions P = E→E+E
E→E*E
E→(E)
E→-E
E → id
Example derivations:
E ⇒ - E ⇒ - id
E ⇒rm E + E ⇒rm E + id ⇒rm id + id
E ⇒* E
E ⇒* id + id
E ⇒+ id * id + id

Parse Trees
A parse tree is a graphical representation of a CFG derivation.
• Inner nodes of a parse tree are non-terminal symbols.
• The leaves of a parse tree are terminal symbols.

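E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
[Figure: the parse tree after each derivation step. In the final tree, the root E has children - and E; that E has children (, E and ); the innermost E has children E, + and E, each of which has the single leaf id.]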
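A parse tree can be represented as nodes labelled with grammar symbols; reading its leaves from left to right gives back the derived sentence. A minimal Python sketch (the `Node` class and helper names are illustrative assumptions) builds the final tree for -(id+id):

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str
    children: list["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaves(node):
    """Leaves of the tree from left to right (the sentence the tree derives)."""
    if node.is_leaf():
        return [node.symbol]
    out = []
    for child in node.children:
        out.extend(leaves(child))
    return out


def inner_symbols(node):
    """Symbols found on inner (non-leaf) nodes."""
    if node.is_leaf():
        return set()
    result = {node.symbol}
    for child in node.children:
        result |= inner_symbols(child)
    return result


# Final tree for E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
tree = Node("E", [
    Node("-"),
    Node("E", [
        Node("("),
        Node("E", [Node("E", [Node("id")]), Node("+"), Node("E", [Node("id")])]),
        Node(")"),
    ]),
])
print("".join(leaves(tree)))   # -(id+id)
```

As the slide states, every inner node carries the non-terminal E and every leaf a terminal.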

Ambiguity
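A grammar that produces more than one parse tree for some sentence is called an ambiguous grammar. For example, id+id*id has two left-most derivations and two parse trees:
E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
[Figure: parse tree with root E → E + E, where the right E → E * E, i.e. id+(id*id)]
E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
[Figure: parse tree with root E → E * E, where the left E → E + E, i.e. (id+id)*id]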
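The ambiguity can also be checked mechanically by counting parse trees. The Python sketch below is specialised to the grammar E → E+E | E*E | (E) | -E | id (the function name is hypothetical); it memoises, for each span of tokens, how many parse trees of E derive that span.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Number of distinct parse trees deriving `tokens` from E in
    E -> E+E | E*E | (E) | -E | id."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees of E whose leaves are toks[i:j].
        n = j - i
        total = 0
        if n == 1 and toks[i] == "id":
            total += 1                                   # E -> id
        if n >= 2 and toks[i] == "-":
            total += count(i + 1, j)                     # E -> -E
        if n >= 3 and toks[i] == "(" and toks[j - 1] == ")":
            total += count(i + 1, j - 1)                 # E -> (E)
        for k in range(i + 1, j - 1):                    # E -> E op E, op at position k
            if toks[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(toks)) if toks else 0


print(count_parse_trees(["id", "+", "id", "*", "id"]))   # 2
```

A count greater than 1 for some sentence is exactly what makes the grammar ambiguous; for id+id+id+id the count is 5, one tree per way of grouping the additions.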

Parsing
Given a stream of input tokens, parsing involves the process of “reducing” them to a non-terminal. The input string is said to represent the non-terminal it was reduced to.
Parsing can be either top-down or bottom-up.
• Top-down parsing generates the string starting from the start symbol of the grammar and repeatedly applying production rules.
• Bottom-up parsing repeatedly rewrites the input string until it is reduced to the start symbol of the grammar.


Top-Down Parsing
The parse tree is created top to bottom.
Top-down parser
– Recursive-Descent Parsing
• Backtracking is needed (If a choice of a production rule does not work, we
backtrack to try other alternatives.)
• It is a general parsing technique, but not widely used.
• Not efficient
– Predictive Parsing
• No backtracking
• Efficient
• Uses LL (Left-to-right scan, Leftmost derivation) methods
• Needs a special form of grammar (LL(1) grammars).
• Recursive predictive parsing is a special form of recursive-descent parsing without backtracking.
• A non-recursive (table-driven) predictive parser is also known as an LL(1) parser.


Recursive-Descent Parsing
Backtracking is needed.
It tries to find the left-most derivation.
Example: S → aBc, B → bc | b, input: abc
Method: for input w = abc, initially create a tree with the single node S and expand it using S → aBc. The left-most leaf a matches the first symbol of w, so advance the input pointer to b and consider the next leaf B. Expand B using its first alternative, bc. Both b and c match, and we advance to the leaf c of S; but no input symbol is left to match it, so report failure and go back to B to try its other alternative, b, which produces a match.
[Figure: the tree S → a B c with B → b c (fails, backtrack), then S → a B c with B → b (succeeds).]
A left-recursive grammar can cause a recursive-descent parser to go into an infinite loop.
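The backtracking search for S → aBc, B → bc | b can be sketched in Python with generators: each call yields every input position at which a symbol's derivation can end, and asking a generator for its next value is the backtracking step. The grammar encoding and function names are assumptions for this example; like any recursive-descent parser, it would recurse until `RecursionError` on a left-recursive grammar.

```python
# S -> aBc, B -> bc | b; alternatives are tried in the order listed.
GRAMMAR = {
    "S": [["a", "B", "c"]],
    "B": [["b", "c"], ["b"]],
}


def parse(symbol, tokens, pos, trace):
    """Yield each position where a derivation of `symbol` from tokens[pos:] can end."""
    if symbol not in GRAMMAR:                   # terminal: must match the input
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    for alt in GRAMMAR[symbol]:                 # try alternatives in order
        trace.append(f"try {symbol} -> {''.join(alt)} at position {pos}")
        yield from parse_seq(alt, tokens, pos, trace)


def parse_seq(symbols, tokens, pos, trace):
    """Yield each position where a derivation of the sequence `symbols` can end."""
    if not symbols:
        yield pos
        return
    for mid in parse(symbols[0], tokens, pos, trace):
        yield from parse_seq(symbols[1:], tokens, mid, trace)


def recognize(tokens, start="S"):
    """Accept if some derivation of `start` consumes the whole input."""
    trace = []
    ok = any(end == len(tokens) for end in parse(start, tokens, 0, trace))
    return ok, trace


ok, trace = recognize(list("abc"))
print(ok)                 # True
for line in trace:
    print(line)
```

The trace shows B → bc being tried first and abandoned when the final c of S finds no input, then B → b succeeding.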

Left Recursion
A grammar is left recursive if it has a non-terminal A such that there is a derivation A ⇒+ Aα for some string α.

Top-down parsing techniques cannot handle left-recursive grammars.


So, we have to convert our left-recursive grammar into an equivalent
grammar which is not left-recursive.
The left-recursion may appear in a single step of the derivation
(immediate left-recursion), or may appear in more than one step of the
derivation.


Immediate Left-Recursion
A → Aα | β    where β does not start with A
⇓ eliminate immediate left recursion
A → β A’
A’ → α A’ | ε    (an equivalent grammar)

In general,
A → A α1 | ... | A αm | β1 | ... | βn    where β1 ... βn do not start with A
⇓ eliminate immediate left recursion
A → β1 A’ | ... | βn A’
A’ → α1 A’ | ... | αm A’ | ε    (an equivalent grammar)
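The transformation for immediate left recursion is mechanical. A minimal Python sketch, assuming right-hand sides are lists of symbols, [] stands for ε, and the new non-terminal is named by appending a prime (all choices made for this example):

```python
def eliminate_immediate(A, alternatives):
    """Remove immediate left recursion from the productions of A.

    alternatives: list of right-hand sides (lists of symbols); [] stands for ε.
    Returns {nonterminal: [right-hand sides]}.
    """
    A_prime = A + "'"
    alphas = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == A]   # A -> A alpha_i
    betas = [rhs for rhs in alternatives if not rhs or rhs[0] != A]     # A -> beta_j
    if not alphas:
        return {A: alternatives}                 # not immediately left recursive
    return {
        A: [beta + [A_prime] for beta in betas],                        # A -> beta_j A'
        A_prime: [alpha + [A_prime] for alpha in alphas] + [[]],        # A' -> alpha_i A' | ε
    }


def show(rules):
    for lhs, alts in rules.items():
        print(lhs, "->", " | ".join(" ".join(rhs) if rhs else "ε" for rhs in alts))


show(eliminate_immediate("E", [["E", "+", "T"], ["T"]]))
# E -> T E'
# E' -> + T E' | ε
```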

Immediate Left-Recursion - Example


E → E+T | T
T → T*F | F
F → id | (E)

⇓ eliminate immediate left recursion


E → T E’
E’ → +T E’ | ε
T → F T’
T’ → *F T’ | ε
F → id | (E)


Non-Immediate Left-Recursion
Eliminating only the immediate left recursion may not give a grammar that is free of left recursion.
S → Aa | b
A → Sc | d
This grammar is not immediately left-recursive, but it is still left-recursive:
S ⇒ Aa ⇒ Sca or
A ⇒ Sc ⇒ Aac causes left recursion.

So, we have to eliminate all left recursion from our grammar.


Eliminate Left-Recursion - Algorithm


Input: Grammar G with no cycles or ε-productions
Output: An equivalent grammar with no left-recursion (but may have ε-
productions)
Arrange the non-terminals in some order: A1 ... An
for i from 1 to n do {
    for j from 1 to i-1 do {
        replace each production Ai → Aj γ
        by Ai → α1 γ | ... | αk γ
        where Aj → α1 | ... | αk are all the current Aj-productions
    }
    eliminate the immediate left recursion among the Ai-productions
}
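The algorithm translates directly into Python. In this sketch the grammar encoding ([] for ε) and the naming of new non-terminals with a trailing prime are assumptions; like the algorithm, it expects a grammar with no cycles or ε-productions. Run on S → Aa | b, A → Ac | Sd | f with the order S, A, it reproduces the hand-derived result.

```python
def eliminate_left_recursion(grammar, order):
    """Eliminate all left recursion (grammar must have no cycles or ε-productions).

    grammar: {nonterminal: [rhs, ...]}, each rhs a list of symbols ([] means ε)
    order:   the chosen ordering A1 ... An of the nonterminals
    """
    g = {A: [list(rhs) for rhs in alts] for A, alts in grammar.items()}
    for i, Ai in enumerate(order):
        for Aj in order[:i]:
            # Replace each Ai -> Aj gamma by Ai -> alpha1 gamma | ... | alphak gamma.
            new_alts = []
            for rhs in g[Ai]:
                if rhs and rhs[0] == Aj:
                    new_alts.extend(alpha + rhs[1:] for alpha in g[Aj])
                else:
                    new_alts.append(rhs)
            g[Ai] = new_alts
        # Eliminate immediate left recursion among the Ai-productions.
        alphas = [rhs[1:] for rhs in g[Ai] if rhs and rhs[0] == Ai]
        if alphas:
            betas = [rhs for rhs in g[Ai] if not rhs or rhs[0] != Ai]
            Ai_prime = Ai + "'"
            g[Ai] = [beta + [Ai_prime] for beta in betas]
            g[Ai_prime] = [alpha + [Ai_prime] for alpha in alphas] + [[]]
    return g


def show(g):
    for lhs, alternatives in g.items():
        print(lhs, "->", " | ".join("".join(rhs) or "ε" for rhs in alternatives))


show(eliminate_left_recursion(
    {"S": [["A", "a"], ["b"]], "A": [["A", "c"], ["S", "d"], ["f"]]},
    ["S", "A"],
))
# S -> Aa | b
# A -> bdA' | fA'
# A' -> cA' | adA' | ε
```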

Eliminate Left-Recursion - Example


S → Aa | b
A → Ac | Sd | f
- Let the order of the non-terminals be: S, A
for S:
- do not enter the inner loop.
- there is no immediate left recursion in S.
for A:
- Replace A → Sd with A → Aad | bd
So, we will have A → Ac | Aad | bd | f
- Eliminate the immediate left-recursion in A
A → bdA’ | fA’
A’ → cA’ | adA’ | ε

So, the resulting equivalent grammar which is not left-recursive is:


S → Aa | b
A → bdA’ | fA’
A’ → cA’ | adA’ | ε

Left-Factoring
When a nonterminal has two or more productions whose right-hand sides start with the same grammar symbols (a common prefix α), the grammar is not LL(1) and cannot be used for predictive parsing. Left factoring removes the common prefix:
Replace productions
A → α β1 | α β2 | … | α βn | γ
with
A → α A’ | γ
A’ → β1 | β2 | … | βn
where γ stands for the alternatives that do not begin with α.
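One left-factoring step can be sketched in Python as follows. The function name and encoding ([] for ε, the new non-terminal named with a prime) are assumptions; it factors out the longest common prefix of the first group of alternatives that share a first symbol, so a grammar with several such groups needs repeated application with fresh names.

```python
def common_prefix(seqs):
    """Longest common prefix of a list of symbol sequences."""
    prefix = []
    for column in zip(*seqs):
        if all(sym == column[0] for sym in column):
            prefix.append(column[0])
        else:
            break
    return prefix


def left_factor_once(A, alternatives):
    """Factor A -> alpha beta1 | ... | alpha betan | gamma into
    A -> alpha A' | gamma and A' -> beta1 | ... | betan ([] stands for ε)."""
    groups = {}
    for rhs in alternatives:
        if rhs:
            groups.setdefault(rhs[0], []).append(rhs)
    for first, group in groups.items():
        if len(group) > 1:
            alpha = common_prefix(group)
            A_prime = A + "'"
            gammas = [rhs for rhs in alternatives if not rhs or rhs[0] != first]
            return {
                A: [alpha + [A_prime]] + gammas,
                A_prime: [rhs[len(alpha):] for rhs in group],   # a bare alpha becomes ε
            }
    return {A: alternatives}            # already left factored


print(left_factor_once("A", [["a", "b", "c"], ["a", "b", "d"], ["e"]]))
# {'A': [['a', 'b', "A'"], ['e']], "A'": [['c'], ['d']]}
```

On the hypothetical grammar A → abc | abd | e, this yields A → abA’ | e and A’ → c | d.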


Predictive Parsing
A predictive parser tries to predict which production to apply so that backtracking and infinite looping are avoided.
When rewriting a non-terminal in a derivation step, a predictive parser can uniquely choose a production rule just by looking at the current symbol in the input string.
Example
stmt → if ...... |
while ...... |
begin ...... |
for .....
When we are rewriting the non-terminal stmt and the current token is if, we have to choose the first production rule.


Predictive Parsing

Two variants:
– Recursive (recursive-descent parsing)
– Non-recursive (table-driven parsing)


Recursive Predictive Parsing


Each non-terminal corresponds to a procedure.
Ex: A → aBb | bAB
proc A {
case of the current token {
‘a’: - match the current token with a, and move to the next token;
- call ‘B’;
- match the current token with b, and move to the next token;
‘b’: - match the current token with b, and move to the next token;
- call ‘A’;
- call ‘B’;
}
}
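The procedure above maps onto one Python method per non-terminal. Because the example does not define B, this sketch assumes a hypothetical rule B → c; the class, method, and exception names are also illustrative.

```python
class ParseError(Exception):
    pass


class Parser:
    """Recursive predictive parser for A -> aBb | bAB, with the hypothetical B -> c."""

    def __init__(self, text):
        self.tokens = list(text) + ["$"]   # one-character tokens, $ marks the end
        self.pos = 0

    def current(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        """Match the current token with `terminal` and move to the next token."""
        if self.current() != terminal:
            raise ParseError(f"expected {terminal!r}, found {self.current()!r}")
        self.pos += 1

    def A(self):
        if self.current() == "a":        # A -> aBb
            self.match("a")
            self.B()
            self.match("b")
        elif self.current() == "b":      # A -> bAB
            self.match("b")
            self.A()
            self.B()
        else:
            raise ParseError(f"no A-production starts with {self.current()!r}")

    def B(self):                         # hypothetical B -> c
        self.match("c")


def accepts(text):
    parser = Parser(text)
    try:
        parser.A()
        parser.match("$")                # the whole input must be consumed
    except ParseError:
        return False
    return True


print(accepts("acb"), accepts("bacbc"), accepts("ab"))   # True True False
```

The current token alone selects the alternative, so no backtracking is ever needed.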


Recursive Predictive Parsing


When should ε-productions be applied?

A → aA | bB | ε

If all other productions fail, we should apply an ε-production. For example, if the current token is not a or b, we may apply the ε-production.
Most correct choice: we should apply an ε-production for a non-terminal A when the current token is in the follow set of A (the terminals that can follow A in sentential forms).
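As an illustration of choosing the ε-production by FOLLOW, here is a Python sketch of a recursive predictive parser for the non-left-recursive expression grammar E → TE’, E’ → +TE’ | ε, T → FT’, T’ → *FT’ | ε, F → (E) | id. The FOLLOW sets are hard-coded from the FOLLOW example in these notes; the function and exception names are assumptions.

```python
class ParseError(Exception):
    pass


FOLLOW_E1 = {"$", ")"}        # FOLLOW(E')
FOLLOW_T1 = {"+", ")", "$"}   # FOLLOW(T')


def parse_expression(tokens):
    """Recursive predictive parser; returns the productions applied, in
    left-most derivation order."""
    toks = list(tokens) + ["$"]
    pos = 0
    out = []

    def current():
        return toks[pos]

    def match(terminal):
        nonlocal pos
        if toks[pos] != terminal:
            raise ParseError(f"expected {terminal!r}, found {toks[pos]!r}")
        pos += 1

    def E():
        out.append("E -> T E'")
        T()
        E1()

    def E1():
        if current() == "+":
            out.append("E' -> + T E'")
            match("+")
            T()
            E1()
        elif current() in FOLLOW_E1:      # token can follow E': use E' -> ε
            out.append("E' -> ε")
        else:
            raise ParseError(f"unexpected {current()!r}")

    def T():
        out.append("T -> F T'")
        F()
        T1()

    def T1():
        if current() == "*":
            out.append("T' -> * F T'")
            match("*")
            F()
            T1()
        elif current() in FOLLOW_T1:      # token can follow T': use T' -> ε
            out.append("T' -> ε")
        else:
            raise ParseError(f"unexpected {current()!r}")

    def F():
        if current() == "(":
            out.append("F -> (E)")
            match("(")
            E()
            match(")")
        elif current() == "id":
            out.append("F -> id")
            match("id")
        else:
            raise ParseError(f"unexpected {current()!r}")

    E()
    match("$")                            # the whole input must be consumed
    return out


print(parse_expression(["id", "+", "id"]))
```

Using FOLLOW instead of "apply ε when nothing else fits" lets an input such as id id be rejected as soon as T’ sees the second id.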


Non-Recursive Predictive Parsing


A non-recursive predictive parser is table-driven.
Given an LL(1) grammar G = (N, T, P, S), construct a table M[A,a] for A ∈ N and a ∈ T ∪ {$}, and use a driver program with a stack.

[Figure: model of a table-driven predictive parser: input buffer (a + b $), stack (X, Y, Z, $), predictive parsing program (driver), parsing table M, and output.]


Non-Recursive Predictive Parsing


input buffer
– our string to be parsed. We will assume that its end is marked with a special
symbol $.
output
– a production rule representing a step of the derivation sequence (left-most
derivation) of the string in the input buffer.
stack
– contains the grammar symbols
– at the bottom of the stack, there is a special end marker symbol $.
– initially the stack contains only the symbol $ and the start symbol S (with S on top).
– when the stack is emptied (i.e., only $ is left in the stack), parsing is completed.
parsing table
– a two-dimensional array M[A,a]
– each row is a non-terminal symbol
– each column is a terminal symbol or the special symbol $
– each entry holds a production rule.

Non-Recursive Predictive Parsing


Algorithm
Input: a string w and a parsing table M for grammar G
Output: if w is in L(G), a left-most derivation of w; otherwise, an error
1. Set ip to the first symbol of the input w$.
2. Set the stack to $S, where S is the start symbol of the grammar (S on top).
3. repeat
     Let X be the top stack symbol and a be the symbol pointed to by ip.
     If X is a terminal or $ then
         if X = a then pop X from the stack and advance ip
         else error()
     else /* X is a non-terminal */
         if M[X, a] = X → Y1Y2…Yk then
             pop X from the stack
             push Yk, Yk-1, …, Y1 onto the stack (with Y1 on top)
             output the production X → Y1Y2…Yk
         else error()
4. until X = $ /* stack is empty */
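The driver loop can be sketched in Python as follows, using the parsing table for S → aBa, B → bB | ε from the example in these notes. The dictionary encoding of M, the `ParseError` class, and the function name are assumptions; an absent key plays the role of an error entry.

```python
class ParseError(Exception):
    pass


# M[X, a] for S -> aBa, B -> bB | ε; a right-hand side is a list ([] is ε).
TABLE = {
    ("S", "a"): ["a", "B", "a"],
    ("B", "a"): [],
    ("B", "b"): ["b", "B"],
}
NONTERMINALS = {"S", "B"}


def ll1_parse(tokens, table=TABLE, start="S"):
    """Table-driven predictive parser; returns the productions it outputs."""
    tokens = list(tokens) + ["$"]          # end of input marked with $
    ip = 0
    stack = ["$", start]                   # the top of the stack is the list's end
    output = []
    while True:
        X, a = stack[-1], tokens[ip]
        if X not in NONTERMINALS:          # X is a terminal or $
            if X != a:
                raise ParseError(f"expected {X!r}, found {a!r}")
            stack.pop()
            if X == "$":                   # X = a = $: stack empty, accept
                return output
            ip += 1
        else:                              # X is a nonterminal
            rhs = table.get((X, a))
            if rhs is None:
                raise ParseError(f"M[{X}, {a}] is empty")
            stack.pop()
            stack.extend(reversed(rhs))    # push Yk ... Y1 so that Y1 is on top
            output.append(f"{X} -> {''.join(rhs) or 'ε'}")


print(ll1_parse("abba"))   # ['S -> aBa', 'B -> bB', 'B -> bB', 'B -> ε']
```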

Non-Recursive Predictive Parsing


Example
Grammar: S → aBa
         B → bB | ε

LL(1) parsing table:
          a          b          $
S         S → aBa
B         B → ε      B → bB

stack     input     output
$S        abba$     S → aBa
$aBa      abba$
$aB       bba$      B → bB
$aBb      bba$
$aB       ba$       B → bB
$aBb      ba$
$aB       a$        B → ε
$a        a$
$         $         accept, successful completion


Non-Recursive Predictive Parsing


Example
Outputs: S → aBa B → bB B → bB B→ε
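Derivation (left-most): S ⇒ aBa ⇒ abBa ⇒ abbBa ⇒ abba
Parse tree: [Figure: root S with children a, B and a; that B has children b and B; the next B has children b and B; the innermost B has the single child ε.]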

Constructing LL(1) Parsing Tables

• Eliminate left recursion from the grammar.
• Left factor the grammar.
a grammar ⟶ [eliminate left recursion] ⟶ [left factor] ⟶ a grammar suitable for predictive parsing (ideally an LL(1) grammar)
• Compute the FIRST and FOLLOW functions.


Constructing LL(1) Parsing Tables


FIRST and FOLLOW
FIRST(α) is the set of terminal symbols that occur as first symbols in strings derived from α, where α is any string of grammar symbols. If α ⇒* ε, then ε is also in FIRST(α).

FOLLOW(A) is the set of terminals that occur immediately after (follow) the non-terminal A in strings derived from the start symbol:
– a terminal a is in FOLLOW(A) if S ⇒* αAaβ
– $ is in FOLLOW(A) if S ⇒* αA


Constructing LL(1) Parsing Tables


Compute FIRST
1. If X is a terminal symbol, then FIRST(X) = {X}.
2. If X is a non-terminal symbol and X → ε is a production rule, then FIRST(X) = FIRST(X) ∪ {ε}.
3. If X is a non-terminal symbol and X → Y1Y2...Yn is a production rule, then
a. every terminal a in FIRST(Y1) is added to FIRST(X);
b. if a terminal a is in FIRST(Yi) and ε is in all FIRST(Yj) for j = 1, ..., i-1, then FIRST(X) = FIRST(X) ∪ {a};
c. if ε is in all FIRST(Yj) for j = 1, ..., n, then FIRST(X) = FIRST(X) ∪ {ε}.
Apply rules 1-3 repeatedly until no FIRST set changes.
For a string X of grammar symbols:
• If X is ε, then FIRST(X) = {ε}.
• If X is Y1Y2...Yn, then
a. if a terminal a is in FIRST(Yi) and ε is in all FIRST(Yj) for j = 1, ..., i-1, then FIRST(X) = FIRST(X) ∪ {a};
b. if ε is in all FIRST(Yj) for j = 1, ..., n, then FIRST(X) = FIRST(X) ∪ {ε}.
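Because the FIRST rules refer to each other, they are applied until nothing changes; the following Python sketch does this by fixed-point iteration. The grammar encoding ([] for ε, primes in names such as E’) and the function names are assumptions; the grammar is the expression grammar from the FIRST example in these notes.

```python
EPS = "ε"

# E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST(Y1...Yn) from the FIRST sets of the nonterminals ([] means ε)."""
    result = set()
    for Y in symbols:
        f = first.get(Y, {Y})           # rule 1: FIRST(a) = {a} for a terminal a
        result |= f - {EPS}             # rules 3a/3b: terminals of FIRST(Yi)
        if EPS not in f:                # Yi cannot vanish, so stop here
            return result
    return result | {EPS}               # rules 2/3c: the whole string can derive ε


def compute_first(grammar):
    """Apply the FIRST rules to every production until no FIRST set changes."""
    first = {A: set() for A in grammar}
    changed = True
    while changed:
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first


first = compute_first(GRAMMAR)
for A in GRAMMAR:
    print(f"FIRST({A}) = {sorted(first[A])}")
print(sorted(first_of_string(["+", "T", "E'"], first)))   # ['+']
```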


Constructing LL(1) Parsing Tables


Compute FIRST: Example
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id

FIRST(F) = {(, id}
FIRST(T’) = {*, ε}
FIRST(T) = {(, id}
FIRST(E’) = {+, ε}
FIRST(E) = {(, id}

FIRST(TE’) = {(, id}
FIRST(+TE’) = {+}
FIRST(FT’) = {(, id}
FIRST(*FT’) = {*}
FIRST(ε) = {ε}
FIRST((E)) = {(}
FIRST(id) = {id}


Constructing LL(1) Parsing Tables


Compute FOLLOW
Apply the following rules until nothing can be added to any
FOLLOW set:

1. If S is the start symbol, then $ is in FOLLOW(S).
2. If A → αBβ is a production rule, then everything in FIRST(β) except ε is placed in FOLLOW(B).
3. If A → αB is a production rule, or A → αBβ is a production rule and ε is in FIRST(β), then everything in FOLLOW(A) is in FOLLOW(B).
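A Python sketch of the three FOLLOW rules, again by fixed-point iteration. It is self-contained, so it repeats a small FIRST computation; the encoding and function names are assumptions, and the grammar is the expression grammar from the FOLLOW example in these notes.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
START = "E"


def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of each nonterminal."""
    result = set()
    for Y in symbols:
        f = first.get(Y, {Y})              # a terminal's FIRST set is itself
        result |= f - {EPS}
        if EPS not in f:
            return result
    return result | {EPS}                  # all symbols (or none) derive ε


def compute_first(grammar):
    first = {A: set() for A in grammar}
    changed = True
    while changed:
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    """Apply the three FOLLOW rules until no FOLLOW set changes."""
    follow = {A: set() for A in grammar}
    follow[start].add("$")                          # rule 1
    changed = True
    while changed:
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                for i, B in enumerate(rhs):
                    if B not in grammar:            # FOLLOW is defined for nonterminals
                        continue
                    beta_first = first_of_string(rhs[i + 1:], first)
                    new = beta_first - {EPS}        # rule 2
                    if EPS in beta_first:           # rule 3: β is empty or β ⇒* ε
                        new |= follow[A]
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow


follow = compute_follow(GRAMMAR, START, compute_first(GRAMMAR))
for A in GRAMMAR:
    print(f"FOLLOW({A}) = {sorted(follow[A])}")
```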


Constructing LL(1) Parsing Tables


Compute FOLLOW Example
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id

FOLLOW(E) = { $, ) }
FOLLOW(E’) = { $, ) }
FOLLOW(T) = { +, ), $ }
FOLLOW(T’) = { +, ), $ }
FOLLOW(F) = {+, *, ), $ }


Constructing LL(1) Parsing Tables


Algorithm
Input: LL(1) grammar G
Output: parsing table M
for each production rule A → α of grammar G:
    for each terminal a in FIRST(α):
        add A → α to M[A,a]
    if ε is in FIRST(α) then
        for each terminal a in FOLLOW(A):
            add A → α to M[A,a]
    if ε is in FIRST(α) and $ is in FOLLOW(A) then
        add A → α to M[A,$]

make all other undefined entries of the parsing table M be error
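The table-construction algorithm can be sketched in Python. To stay short, this sketch hard-codes the FIRST and FOLLOW sets computed in the examples of these notes rather than recomputing them; the names `build_table`, `FIRST`, and `FOLLOW` are assumptions. Because $ is stored as a member of FOLLOW(A), the separate $ case of the algorithm is covered by the FOLLOW step.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
# FIRST of each nonterminal and the FOLLOW sets, copied from the examples.
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {"$", ")"}, "E'": {"$", ")"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}


def first_of_string(symbols):
    """FIRST(α) for a string α of grammar symbols ([] is ε)."""
    result = set()
    for Y in symbols:
        f = FIRST.get(Y, {Y})
        result |= f - {EPS}
        if EPS not in f:
            return result
    return result | {EPS}


def build_table(grammar):
    """Return (M, conflicts): M maps (A, a) to a right-hand side, and a missing
    key is an error entry; conflicts lists multiply-defined entries."""
    table, conflicts = {}, []
    for A, alternatives in grammar.items():
        for rhs in alternatives:
            f = first_of_string(rhs)
            targets = f - {EPS}                 # terminals in FIRST(α)
            if EPS in f:
                targets |= FOLLOW[A]            # FOLLOW(A), including $ if present
            for a in targets:
                if (A, a) in table and table[(A, a)] != rhs:
                    conflicts.append((A, a))    # keep the first entry, report the clash
                else:
                    table[(A, a)] = rhs
    return table, conflicts


M, conflicts = build_table(GRAMMAR)
print("conflicts:", conflicts)                  # conflicts: []
for (A, a), rhs in sorted(M.items()):
    print(f"M[{A}, {a}] = {A} -> {' '.join(rhs) or 'ε'}")
```

An empty conflict list is exactly the condition for the grammar to be LL(1).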


Constructing LL(1) Parsing Tables


Example
E → TE’: FIRST(TE’) = {(, id}, so E → TE’ goes into M[E,(] and M[E,id]
E’ → +TE’: FIRST(+TE’) = {+}, so E’ → +TE’ goes into M[E’,+]
E’ → ε: FIRST(ε) = {ε}; since ε is in FIRST(ε) and FOLLOW(E’) = {$, )}, E’ → ε goes into M[E’,$] and M[E’,)]
T → FT’: FIRST(FT’) = {(, id}, so T → FT’ goes into M[T,(] and M[T,id]
T’ → *FT’: FIRST(*FT’) = {*}, so T’ → *FT’ goes into M[T’,*]
T’ → ε: FIRST(ε) = {ε}; since ε is in FIRST(ε) and FOLLOW(T’) = {$, ), +}, T’ → ε goes into M[T’,$], M[T’,)] and M[T’,+]
F → (E): FIRST((E)) = {(}, so F → (E) goes into M[F,(]
F → id: FIRST(id) = {id}, so F → id goes into M[F,id]


LL(1) Grammars
A grammar whose parsing table has no multiply-defined entries is said to be an LL(1) grammar.
What happens when a parsing table contains multiply-defined entries?
– The problem is ambiguity: for such an entry M[A,a], the parser cannot uniquely choose which production to apply.

A grammar that is left-recursive, not left-factored, or ambiguous cannot be an LL(1) grammar (i.e., the parsing table of such a grammar has multiply-defined entries).
There are no general rules by which multiply-defined entries can be made single-valued without affecting the language recognized by the grammar; therefore, the grammar given as input for constructing the parsing table should be LL(1).


Properties of LL(1) Grammars


LL(1):
– the first L: the input is scanned from left to right
– the second L: the parser produces a left-most derivation
– 1: one input symbol is used as a lookahead symbol to determine the parser action

A grammar G is LL(1) if and only if the following conditions hold for every pair of distinct production rules A → α and A → β:
1. α and β cannot both derive strings starting with the same terminal.
2. At most one of α and β can derive ε.
3. If β ⇒* ε, then α cannot derive any string starting with a terminal in FOLLOW(A).
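The three conditions can be checked pairwise once the FIRST sets of the alternatives and FOLLOW(A) are known (they can be computed with the FIRST and FOLLOW rules in these notes). The Python sketch below takes those sets as input; the function name and message format are assumptions.

```python
from itertools import combinations

EPS = "ε"


def ll1_violations(first_sets, follow_A):
    """Check the LL(1) conditions for the alternatives of one nonterminal A.

    first_sets: FIRST(α) for each alternative α (containing EPS if α ⇒* ε)
    follow_A:   FOLLOW(A)
    Returns a list of violated conditions; an empty list means all hold.
    """
    problems = []
    for (i, fa), (j, fb) in combinations(enumerate(first_sets), 2):
        shared = (fa & fb) - {EPS}
        if shared:                                          # condition 1
            problems.append(f"alternatives {i} and {j} both start with {sorted(shared)}")
        if EPS in fa and EPS in fb:                         # condition 2
            problems.append(f"alternatives {i} and {j} both derive ε")
        for x, fx, y, fy in ((i, fa, j, fb), (j, fb, i, fa)):
            if EPS in fy and (fx - {EPS}) & follow_A:       # condition 3
                problems.append(f"alternative {y} derives ε but alternative {x} "
                                f"starts with a terminal in FOLLOW(A)")
    return problems


print(ll1_violations([{"+"}, {EPS}], {"$", ")"}))   # []
print(ll1_violations([{"a"}, {EPS}], {"a"}))
```

For example, E’ → +TE’ | ε with FOLLOW(E’) = {$, )} passes, while a hypothetical A → aA | ε with FOLLOW(A) = {a} violates condition 3.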


Error Recovery in Predictive Parsing


An error may occur in predictive (LL(1)) parsing
– if the terminal symbol on the top of the stack does not match the current input symbol;
– if the top of the stack is a non-terminal A, the current input symbol is a, and the parsing table entry M[A,a] is empty.
What should the parser do in an error case?
– The parser should give an error message (as meaningful as possible).
– It should recover from the error and continue parsing the rest of the input.


Exercise

Q. No. 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.11, 4.12, 4.14, 4.16 and 4.17
