Compiler design
Department of Computer Science
FACULTY OF COMPUTING
Debre Markos University
Year IV Semester I
by…
Birku L
B.Sc. CS, M.Sc. SE
The Academic year of 2016 E.C
Chapter 3 Outline
Role of parsing and parse trees
Context-free grammars
Derivation
Ambiguity
Top-down parsing
Recursive descent parsing
Non-recursive (predictive) parsing
Bottom-up parsing
Shift-reduce (S-R) parsing
Operator-precedence parsing
LR parsing, LALR parsing
SLR and CLR parsing
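Syntax Analysis
This phase takes the list of tokens produced by the lexical analysis and arranges them in a tree structure (called the syntax tree) that reflects the structure of the program. This phase is also known as parsing.
Figure: list of tokens → Syntax Analysis → syntax tree (plus error messages)
Each interior node of the syntax tree represents an operation, and the children of the node represent the arguments of the operation.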
Key concepts of Syntax Analysis
Syntax: the set of rules, principles, and processes that govern the structure of sentences in a given language, specifically word order and hierarchical structure.
Parsing: the process of analyzing a sequence of input tokens (words or symbols) to determine its grammatical structure. This structure is often represented as a parse tree or syntax tree.
Grammar: a formal grammar defines the syntactic rules of a language. Common types of grammars used in parsing include context-free grammars (CFGs) and regular grammars.
Parse tree: a tree representation that depicts the syntactic structure of a string according to some formal grammar. The tree's nodes represent syntactic categories, and the edges represent the relationships between these categories.
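Example: Syntax tree for an assignment statement
In this tree, an interior node labeled * has ⟨id,3⟩ as its left child and a constant as its right child. A node labeled + has ⟨id,2⟩ as its left child and the multiplication of ⟨id,3⟩ and the constant as its right child. The root, labeled =, assigns the result of the addition to ⟨id,1⟩.
So operators are the interior nodes, and the operands are the children of the operators.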
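Role of Syntax Analysis (Parsing)
1. To obtain the tokens produced by the lexical analyzer.
2. To group them into grammatical phrases using the grammar rules.
3. To pass a parse tree to the next phase (semantic analysis).
4. To report a syntax error if those tokens do not form a valid sentence of the language.
Figure: source program → Lexical Analyzer → token → Parser → to semantic analysis; the parser requests each token with getNextToken, and both components use the symbol table.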
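Context-free grammar (CFG)
Context-Free Grammars (CFGs) are a type of formal grammar used to define the syntax of programming languages and other formal languages.
CFGs are good for the specification of programming-language syntax.
They can define the context-free languages, a strict superset of the regular languages, i.e. CFGs are more powerful than regular expressions.
Normally used to classify …
A grammar gives a simpler and more concise description of syntax than an informal description does.
Efficient parsers can be built from suitable classes of CFGs.
CFGs are best explained by example...
Grammars are used to impose structure on the token stream.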
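Context-free grammar (or CFG)
Many programming-language constructs have an inherently recursive structure that can be defined by context-free grammars.
For example, we might have a conditional statement defined by a rule such as:
if S1 and S2 are statements and E is an expression, then "if E then S1 else S2" is a statement.
This form of conditional statement cannot be specified using the notation of regular expressions.
On the other hand, if we use the syntactic variable stmt to denote the class of statements and expr the class of expressions, we can express this rule using the grammar production
stmt → if expr then stmt else stmt
For these kinds of constructs we use context-free grammars.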
Context-Free Grammar (CFG)
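Different textbooks use different symbols and terms to describe CFGs.
Formally, a CFG is a 4-tuple G = (V, T, P, S), where
V = nonterminals or variables, a finite set
T = terminals or tokens, a finite set
P = productions, a finite set
S = start symbol, S ∈ V
Productions have the form A → α, where
A: a nonterminal (A ∈ V)
α: a string of terminals and/or nonterminals (α ∈ (V ∪ T)*)
• Productions are rules describing how to rewrite nonterminals (beginning with S) into terminals.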
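Context-free grammar (CFG)
A CFG consists of four components: terminals, nonterminals, a start symbol, and productions.
Terminals are the basic symbols from which strings are formed.
The word "token" is a synonym for "terminal" when we are talking about grammars for programming languages. In the production stmt → if expr then stmt else stmt, each of the keywords if, then and else is a terminal.
Nonterminals are syntactic variables that denote sets of strings.
stmt and expr are nonterminals. The nonterminals define sets of strings that help define the language generated by the grammar.
In a grammar, one nonterminal is distinguished as the start symbol.
The productions of a grammar specify the manner in which the terminals and nonterminals can be combined to form strings.
Each production consists of a nonterminal, followed by an arrow (sometimes the symbol ::= is used in place of the arrow), followed by a string of nonterminals and terminals.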
Arithmetic Expressions
Suppose we want to describe all legal arithmetic expressions using addition, subtraction, multiplication, division, and modulus.
Here is one possible CFG:
Expr → Expr Op Expr
Expr → ( Expr )
Expr → id
Op → +
Op → -
Op → *
Op → /
Op → %
The nonterminal symbols are V = {Expr, Op}
The terminal symbols are T = {(, ), id, +, -, *, /, %}
How many productions are there? P = 3 for Expr and 5 for Op, 8 productions in total.
S = the start symbol, Expr
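To make the four-tuple definition concrete, here is a minimal sketch that writes the arithmetic-expression grammar as plain Python data and counts its productions. V, T, P and S mirror the slide; representing a production as a (head, body) pair and the helper names check_cfg and count_by_head are illustrative assumptions, not part of the course material.

```python
# The arithmetic-expression CFG G = (V, T, P, S) as plain Python data.
V = {"Expr", "Op"}                                  # nonterminals
T = {"(", ")", "id", "+", "-", "*", "/", "%"}      # terminals
P = [                                               # productions: (head, body)
    ("Expr", ["Expr", "Op", "Expr"]),
    ("Expr", ["(", "Expr", ")"]),
    ("Expr", ["id"]),
    ("Op", ["+"]), ("Op", ["-"]), ("Op", ["*"]), ("Op", ["/"]), ("Op", ["%"]),
]
S = "Expr"                                          # start symbol

def check_cfg(V, T, P, S):
    """Check the basic well-formedness conditions of a CFG."""
    assert V.isdisjoint(T), "a symbol cannot be both terminal and nonterminal"
    assert S in V, "the start symbol must be a nonterminal"
    for head, body in P:
        assert head in V, f"head {head} must be a nonterminal"
        assert all(sym in V | T for sym in body), f"unknown symbol in {body}"
    return True

def count_by_head(P):
    """Count how many productions each nonterminal has."""
    counts = {}
    for head, _ in P:
        counts[head] = counts.get(head, 0) + 1
    return counts

check_cfg(V, T, P, S)
print(count_by_head(P), len(P))   # {'Expr': 3, 'Op': 5} 8
```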
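Example of CFG
Suppose we want to describe only addition and multiplication.
Here is one possible CFG:
Expr → Expr + Expr
Expr → Expr * Expr
Expr → id
V = {Expr}, T = {id, +, *}, P = 3, S = Expr
Another example:
S → aS | Sa | a
V = {S}, T = {a}, P = 3, S = S
A third example (ε denotes the empty string; it is not a terminal):
S → aSbS | bSaS | ε
V = {S}, T = {a, b}, P = 3, S = S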
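Notational Conventions
To avoid always having to state that "these are the terminals," "these are the nonterminals," and so on, we shall employ the following notational conventions with regard to grammars.
1. These symbols are terminals:
a) Lowercase letters early in the alphabet, such as a, b, c.
b) Operator symbols such as +, -, *, / etc.
c) Punctuation symbols such as parentheses and comma.
d) The digits 0, 1, ..., 9.
e) Boldface strings such as id or if.
2. These symbols are nonterminals:
a) Uppercase letters early in the alphabet, such as A, B, C.
b) The letter S, which, when it appears, is usually the start symbol.
c) Lowercase italic names such as expr or stmt.
3. Lowercase letters late in the alphabet, chiefly u, v, ..., z, represent strings of terminals.
4. Lowercase Greek letters, α, β, γ for example, represent strings of grammar symbols. Thus, a generic production could be written as A → α, indicating that there is a single nonterminal A on the left of the arrow (the left side of the production) and a string of grammar symbols α to the right of the arrow (the right side of the production).
5. If A → α1, A → α2, ..., A → αk are all productions with A on the left (we call them A-productions), we may write A → α1 | α2 | ... | αk. We call α1, α2, ..., αk the alternatives for A.
6. Unless otherwise stated, the left side of the first production is the start symbol.
Using these shorthands, we could write the grammar of arithmetic expressions concisely as
Expr → Expr A Expr | ( Expr ) | id
A → + | - | * | / | %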
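Derivation
Productions are treated as rewriting rules to generate a string; this process is called derivation.
To show that a string w is in the language of the grammar, L(G):
• Start with the start symbol.
• Repeatedly replace one of the nonterminals by the right side of one of its productions.
• Stop when the sentential form contains only terminals.
At each step, we choose a nonterminal to replace.
• This choice can lead to different derivations (for example, leftmost or rightmost).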
Left-most derivation
A leftmost derivation of a sentential form is one in which rules transforming the leftmost nonterminal are always applied.
Grammar: Expr → Expr + Expr | Expr * Expr | id; V = {Expr}, T = {id, +, *}, P = 3, S = Expr
String: id + id * id (for example, 2 + 3 * 4)
Leftmost derivation 1:
Expr ⇒ Expr + Expr ⇒ id + Expr ⇒ id + Expr * Expr ⇒ id + id * Expr ⇒ id + id * id
Leftmost derivation 2:
Expr ⇒ Expr * Expr ⇒ Expr + Expr * Expr ⇒ id + Expr * Expr ⇒ id + id * Expr ⇒ id + id * id
Right-most derivation
A rightmost derivation of a sentential form is one in which rules transforming the rightmost nonterminal are always applied.
Grammar: Expr → Expr + Expr | Expr * Expr | id; V = {Expr}, T = {id, +, *}, P = 3, S = Expr
String: id + id * id (for example, 2 + 3 * 4)
Rightmost derivation 1:
Expr ⇒ Expr + Expr ⇒ Expr + Expr * Expr ⇒ Expr + Expr * id ⇒ Expr + id * id ⇒ id + id * id
Rightmost derivation 2:
Expr ⇒ Expr * Expr ⇒ Expr * id ⇒ Expr + Expr * id ⇒ Expr + id * id ⇒ id + id * id
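Both kinds of derivation follow one mechanical rule: pick the leftmost (or rightmost) nonterminal and replace it with a production body. The sketch below replays derivation 1 of each kind for the grammar Expr → Expr + Expr | Expr * Expr | id. The function name derive and the list-of-bodies input are illustrative assumptions.

```python
# Apply production bodies to the leftmost (or rightmost) nonterminal and
# record every sentential form. Grammar: Expr -> Expr + Expr | Expr * Expr | id
NONTERMINALS = {"Expr"}

def derive(start, bodies, leftmost=True):
    """Return the sentential forms produced by replacing, at each step, the
    leftmost (or rightmost) nonterminal with the next body in `bodies`."""
    form = [start]
    forms = [" ".join(form)]
    for body in bodies:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions:
            raise ValueError("no nonterminal left to replace")
        i = positions[0] if leftmost else positions[-1]
        form = form[:i] + body + form[i + 1:]
        forms.append(" ".join(form))
    return forms

PLUS, TIMES, ID = ["Expr", "+", "Expr"], ["Expr", "*", "Expr"], ["id"]

lm = derive("Expr", [PLUS, ID, TIMES, ID, ID], leftmost=True)
rm = derive("Expr", [PLUS, TIMES, ID, ID, ID], leftmost=False)
for f in lm:
    print(f)
```

With the same grammar, only the order in which nonterminals are rewritten differs; both runs end in the sentence id + id * id.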
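Parse Tree and Derivation
Parsing is the process of checking that a string is in the language of the CFG for your programming language. It is usually coupled with creating an abstract syntax tree.
The two leftmost derivations of id + id * id correspond to two different parse trees:
Tree 1 (from Expr ⇒ Expr + Expr ⇒ ...): the root Expr has children Expr, +, Expr; the left Expr derives id, and the right Expr derives Expr * Expr, each of which derives id. With id = 2, 3, 4 this groups as 2 + (3 * 4) = 14.
Tree 2 (from Expr ⇒ Expr * Expr ⇒ ...): the root Expr has children Expr, *, Expr; the left Expr derives Expr + Expr, each of which derives id, and the right Expr derives id. This groups as (2 + 3) * 4 = 20.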
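The difference between the two trees becomes visible once they are evaluated. In this sketch each tree is a nested tuple (left, operator, right), an encoding chosen only for illustration, with the three ids bound to 2, 3 and 4.

```python
# The two parse trees of "id + id * id" as nested tuples, ids bound to 2, 3, 4.
def evaluate(tree):
    """Evaluate a tree that is either an int or a tuple (left, op, right).
    Only + and * occur in these trees; any op other than + is treated as *."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b

tree_plus_at_root = (2, "+", (3, "*", 4))   # Expr => Expr + Expr first
tree_times_at_root = ((2, "+", 3), "*", 4)  # Expr => Expr * Expr first

print(evaluate(tree_plus_at_root), evaluate(tree_times_at_root))  # 14 20
```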
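Ambiguity
A grammar that has more than one parse tree for some sentence is ambiguous.
Equivalently, a grammar is said to be ambiguous if there is at least one string with two or more leftmost derivations, or two or more rightmost derivations.
Note that ambiguity is a property of grammars, not of languages.
There is no general algorithm for converting an ambiguous grammar into an unambiguous one.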
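Resolving ambiguity
Ambiguity may be resolved by applying disambiguating rules (operator precedence and associativity) and by rewriting the grammar.
For predictive parsing, the grammar must also be left factored: no two alternatives for a single nonterminal may have a common prefix.
Left factoring transforms a grammar to have this property: for each nonterminal A, find the longest prefix α common to two or more of its alternatives and factor it out, so that A → αβ1 | αβ2 becomes A → αA', A' → β1 | β2.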
Backus Normal Form(BNF)
BNF is a notation technique for context-free
grammars, often used to describe the syntax of
languages used in computing.
A BNF specification is a set of derivation rules,
written as
<symbol> ::= expression
BNF for valid arithmetic expression
<expr> ::= <expr> <op> <expr>
<expr> ::= ( <expr> )
<expr> ::= <expr>
<expr> ::= id
<op> ::= + | - | * | /
Left factoring
Alternatives that share a common prefix must be left factored. For example,
<expr> ::= <term> + <expr>
        | <term> - <expr>
        | <term>
<term> ::= <factor> * <term>
        | <factor> / <term>
        | <factor>
becomes
<expr> ::= <term> <expr'>
<expr'> ::= + <expr>
         | - <expr>
         | ε
<term> ::= <factor> <term'>
<term'> ::= * <term>
         | / <term>
         | ε
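Left factoring
For example, the dangling-else grammar
<stmt> ::= if <expr> then <stmt>
        | if <expr> then <stmt> else <stmt>
        | <otherstmt>
is left factored into
<stmt> ::= if <expr> then <stmt> <stmt'>
        | <otherstmt>
<stmt'> ::= ε | else <stmt>
This generates the same language as the ambiguous grammar (and is itself still ambiguous), but a parser that chooses <stmt'> ::= else <stmt> whenever it sees else applies the rule: match each else with the closest unmatched then.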
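Resolving associativity
Ambiguity also arises among operators of equal precedence. To resolve it, rewrite the grammar so that the parse tree grows to the left (left recursion), which means left associativity: operators of equal precedence are grouped from left to right.
Ambiguous: E → E + E | id
Unambiguous: E → E + T | T, T → id
String: id + id + id
Parse tree: E ⇒ E + T; the left E ⇒ E + T; its left E ⇒ T ⇒ id; each T ⇒ id.
The tree grows down its left side, so the string is grouped as (id + id) + id.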
Resolving ambiguity
Here we are concerned about the priority (precedence) of operators.
With the ambiguous grammar Expr → Expr + Expr | Expr * Expr | id, the string id + id * id has two parse trees. With id = 2, 3, 4, one tree gives the intended result 2 + (3 * 4) = 14, while the other applies + first and gives (2 + 3) * 4 = 20.
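Resolving ambiguity
To enforce the priority of operators, rewrite the ambiguous grammar with one nonterminal per precedence level:
Ambiguous: E → E + E | E * E | id
Unambiguous: E → E + T | T, T → T * F | F, F → id
String: id + id * id
Parse tree: E ⇒ E + T; the left E ⇒ T ⇒ F ⇒ id; the right T ⇒ T * F, where T ⇒ F ⇒ id and F ⇒ id.
So * is grouped below +, giving 14 = 2 + (3 * 4).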
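The layered grammar can be checked by turning it into a small evaluator. This is a sketch, not the method used in these slides: recursive descent cannot use left-recursive productions directly, so each rule such as E → E + T | T is implemented as a loop (E → T { + T }), which keeps left associativity. Numbers stand in for id, and all function names are hypothetical.

```python
import re

# Evaluator for E -> E + T | T, T -> T * F | F, F -> id (numbers stand in for id).
def tokenize(text):
    # numbers, + and *, or any other single non-space character (rejected later)
    return re.findall(r"\d+|[+*]|\S", text)

def parse_E(tokens, pos):
    value, pos = parse_T(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":      # E -> E + T
        rhs, pos = parse_T(tokens, pos + 1)
        value = value + rhs
    return value, pos

def parse_T(tokens, pos):
    value, pos = parse_F(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "*":      # T -> T * F
        rhs, pos = parse_F(tokens, pos + 1)
        value = value * rhs
    return value, pos

def parse_F(tokens, pos):
    if pos < len(tokens) and tokens[pos].isdigit():      # F -> id
        return int(tokens[pos]), pos + 1
    raise SyntaxError(f"expected a number at token {pos}")

def evaluate(text):
    tokens = tokenize(text)
    value, pos = parse_E(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return value

print(evaluate("2 + 3 * 4"))   # 14
```

Because T sits below E in the grammar, every * is combined before the surrounding +, which is exactly the grouping of the unambiguous parse tree.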
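Resolving ambiguity
The same technique gives priority to Boolean operators (not above and, and above or):
Ambiguous: E → E or E | E and E | not E | true | false
Unambiguous: E → E or T | T, T → T and F | F, F → not F | true | false
String: true or false and not true
Parse tree: E ⇒ E or T; the left E ⇒ T ⇒ F ⇒ true; T ⇒ T and F, where T ⇒ F ⇒ false and F ⇒ not F ⇒ not true.
Value: true or (false and (not true)) = true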
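Parser
A parse tree may be viewed as a graphical representation of a derivation that filters out the choice regarding replacement order.
Mainly there are two types of parsers:
Top-down parsers:
• Start at the root of the tree and fill in down to the leaves
• Pick a production and try to match the input
• Some grammars are backtrack-free (predictive)
Bottom-up parsers:
• Start at the leaves and fill in up to the root
• Start in a state valid for legal first tokens
• Use a stack to store both states and sentential forms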
Top-Down Parser
A top-down parser tries to create a parse tree from the root towards the leaves, scanning the input from left to right.
It can also be viewed as finding a leftmost derivation for an input string.
Grammar:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
Figure: Top-down parsing of id + id * id; each step expands the leftmost nonterminal (⇒lm) until the leaves of the tree spell the input.
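Bottom-up parser
A bottom-up parser constructs a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).
It starts from the input string and works up to the start symbol.
Grammar:
E → E + T | T
T → T * F | F
F → id
Figure: Bottom-up parsing of id * id, shown as the sequence of reductions
id * id, F * id, T * id, T * F, T, E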
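Top-down parser
Based on the information the parser currently has about the input, a decision is made to go with one particular production.
If this choice leads to a dead end, the parser would have to backtrack to that decision point, moving backwards through the input, and start again making a different choice, and so on, until it either found the production that was the appropriate one or ran out of choices.
• For example, consider this simple grammar:
S → bab | bA
A → d | cA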
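Top-down parser
Let's follow the parse of the input bcd. In the trace below, the left column is the expansion thus far, the middle column is the remaining input, and the right column is the action attempted at each step:
Expansion | Remaining input | Action
S | bcd | Try S → bab
bab | bcd | match b
ab | cd | dead-end, backtrack
S | bcd | Try S → bA
bA | bcd | match b
A | cd | Try A → d
d | cd | dead-end, backtrack
A | cd | Try A → cA
cA | cd | match c
A | d | Try A → d
d | d | match d
Success!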
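The trace can be reproduced by a small backtracking parser for S → bab | bA, A → d | cA. In this sketch (the names GRAMMAR, parse and accepts are illustrative) alternatives are tried in order, and a Python generator gives backtracking for free: when one alternative dead-ends, the loop simply moves on to the next one from the same input position.

```python
# Backtracking top-down parser for S -> bab | bA, A -> d | cA.
GRAMMAR = {
    "S": [["b", "a", "b"], ["b", "A"]],
    "A": [["d"], ["c", "A"]],
}

def parse(symbols, text, pos, trace):
    """Yield every input position reachable after matching `symbols` at `pos`."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:                        # nonterminal: try each alternative
        for body in GRAMMAR[first]:
            trace.append(f"try {first} -> {''.join(body)}")
            yield from parse(body + rest, text, pos, trace)
    elif pos < len(text) and text[pos] == first:
        trace.append(f"match {first}")          # terminal matches the input
        yield from parse(rest, text, pos + 1, trace)
    else:
        trace.append("dead-end, backtrack")     # go back to the last choice

def accepts(text):
    """Succeed if some sequence of choices consumes the whole input."""
    trace = []
    ok = any(end == len(text) for end in parse(["S"], text, 0, trace))
    return ok, trace

ok, trace = accepts("bcd")
print(ok)          # True
for step in trace:
    print(step)
```

This approach only terminates for grammars without left recursion; a rule such as A → Ab would make it loop forever.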
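Recursive Descent parsing
A recursive-descent parser consists of a set of procedures, one for each nonterminal in the grammar.
As we parse a string, we call the procedures that correspond to the nonterminals in the productions we are applying. If these productions are recursive, we end up calling the procedures recursively.
Execution begins with the procedure for the start symbol.
A procedure for a nonterminal A in a top-down parser:
void A() {
    choose an A-production, A → X1 X2 ... Xk;
    for (i = 1 to k) {
        if (Xi is a nonterminal)
            call procedure Xi();
        else if (Xi equals the current input symbol a)
            advance the input to the next symbol;
        else /* an error has occurred */ ;
    }
}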
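Below is a sketch of the procedure-per-nonterminal idea for the expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id. Here "choose an A-production" is done by looking at the next token only, so no backtracking is needed. The class and method names are assumptions of this sketch, and each procedure records the production it applies.

```python
# Predictive recursive-descent parser (one method per nonterminal) for
#   E -> T E'   E' -> + T E' | ε   T -> F T'   T' -> * F T' | ε   F -> ( E ) | id
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens + ["$"]   # $ marks the end of the input
        self.pos = 0
        self.used = []                 # productions applied, in leftmost order

    def peek(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        if self.peek() != terminal:
            raise SyntaxError(f"expected {terminal!r}, found {self.peek()!r}")
        self.pos += 1

    def E(self):
        self.used.append("E -> T E'")
        self.T()
        self.E_prime()

    def E_prime(self):
        if self.peek() == "+":
            self.used.append("E' -> + T E'")
            self.match("+")
            self.T()
            self.E_prime()
        else:                          # any other token: use ε; a wrong token
            self.used.append("E' -> ε")    # is caught by a later match

    def T(self):
        self.used.append("T -> F T'")
        self.F()
        self.T_prime()

    def T_prime(self):
        if self.peek() == "*":
            self.used.append("T' -> * F T'")
            self.match("*")
            self.F()
            self.T_prime()
        else:
            self.used.append("T' -> ε")

    def F(self):
        if self.peek() == "(":
            self.used.append("F -> ( E )")
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.used.append("F -> id")
            self.match("id")

def parse(tokens):
    p = Parser(tokens)
    p.E()                  # start with the procedure for the start symbol
    p.match("$")           # the whole input must be consumed
    return p.used

print(parse(["id", "+", "id", "*", "id"]))
```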
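Top-down parser
A predictive parser is characterized by its ability to choose the production to apply solely on the basis of the next input symbol and the current nonterminal being processed.
To enable this, the grammar must take a particular form.
We call such a grammar LL(1).
• The first L means we scan the input from left to right; the second L means we create a leftmost derivation; and
• the 1 means one input symbol of lookahead.
Informally, an LL(1) grammar has no left-recursive productions and has been left-factored. LL(1) parsing is a special form of recursive-descent parsing without backtracking.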
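Non Recursive Descent parsing
It is possible to build a nonrecursive predictive parser by maintaining a stack explicitly, rather than implicitly via recursive calls.
The key problem during predictive parsing is that of determining the production to be applied for a nonterminal.
A table-driven predictive parser has an input buffer, a stack, a parsing table, and an output stream.
The input buffer contains the string to be parsed, followed by a $ used as a right endmarker to indicate the end of the input string.
The stack contains a sequence of grammar symbols with $ on the bottom, indicating the bottom of the stack.
Initially, the stack contains the start symbol of the grammar on top of $.
The parsing table is a two-dimensional array M[A, a], where A is a nonterminal and a is a terminal or the symbol $.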
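Non Recursive Descent parsing
The parser is controlled by a program that behaves as follows. The program considers X, the symbol on top of the stack, and a, the current input symbol. These two symbols determine the action of the parser.
There are four cases:
1. If X = a = $, the parser halts and announces successful completion of parsing.
2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
3. If X is a nonterminal, the parser looks at the parsing table entry M[X, a]. If M[X, a] holds a production X → Y1Y2...Yk, it pops X from the stack and pushes Yk, Yk-1, ..., Y1 onto the stack (so Y1 is on top). The parser outputs the production X → Y1Y2...Yk to represent a step of the derivation.
4. None of the above: error.
• All empty entries in the parsing table are errors.
• If X is a terminal symbol different from a, this is also an error.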
Non Recursive Descent parsing Algorithm
• Input buffer
– our string to be parsed. We will assume that its end is
marked with a special symbol $.
• Output
– A production rule representing a step of the derivation
sequence (left-most derivation) of the string in the input
buffer.
• Stack
– Contains the grammar symbols
– At the bottom of the stack, there is a special end marker
symbol $.
– Initially the stack contains only the symbol $ and the
starting symbol S.
– When the stack is emptied (i.e. only $ left in the stack), the parsing is completed.
• Parsing table
– A two-dimensional array M[A,a]
– Each row is a non-terminal symbol
– Each column is a terminal symbol or the special
symbol $
– Each entry holds a production rule.
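The algorithm can be sketched directly in code. This version hard-codes the LL(1) table for the expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id (the same table constructed in the example later in this chapter); the dict encoding, with an empty list standing for ε, and the name ll1_parse are assumptions of the sketch.

```python
# Table-driven (non-recursive) predictive parser. TABLE holds M[A, a];
# an empty body [] stands for ε, and missing entries are errors.
TABLE = {
    ("E", "id"): ["T", "E'"],      ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],      ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],           ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens, start="E"):
    """Return the productions output by the parser (a leftmost derivation)."""
    buffer = tokens + ["$"]        # input followed by the endmarker $
    stack = ["$", start]           # top of the stack is the end of the list
    output, i = [], 0
    while True:
        X, a = stack[-1], buffer[i]
        if X == "$" and a == "$":              # case 1: accept
            return output
        if X not in NONTERMINALS:              # X is a terminal (or $)
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            stack.pop()                        # case 2: match and advance
            i += 1
            continue
        body = TABLE.get((X, a))
        if body is None:                       # empty table entry: error
            raise SyntaxError(f"no entry M[{X}, {a}]")
        stack.pop()                            # case 3: replace X by its body
        stack.extend(reversed(body))           # push Yk ... Y1, so Y1 is on top
        output.append(f"{X} -> {' '.join(body) or 'ε'}")

for production in ll1_parse(["id", "+", "id", "*", "id"]):
    print(production)
```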
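LL(1) Grammars
Grammars for which we can create predictive parsers are called LL(1) grammars.
The first L means scanning the input from left to right.
The second L means producing a leftmost derivation.
The 1 stands for one input symbol of lookahead at each step.
• A grammar G is LL(1) if and only if, whenever A → α | β are two distinct productions of G, the following conditions hold:
– For no terminal a do both α and β derive strings beginning with a.
– At most one of α and β can derive the empty string.
– If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A). Likewise, if α ⇒* ε, then β does not derive any string beginning with a terminal in FOLLOW(A).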
LL(1) Grammars
Example
left factoring
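First and Follow
FIRST(α) is the set of terminals that begin strings derived from α.
• If α ⇒* ε, then ε is also in FIRST(α).
• In predictive parsing, when we have A → α | β, if FIRST(α) and FIRST(β) are disjoint sets, then we can choose between the two productions by looking at the next input symbol.
FOLLOW(A), for any nonterminal A, is the set of terminals a that can appear immediately to the right of A in some sentential form.
That is, if we have S ⇒* αAaβ for some α and β, then a is in FOLLOW(A).
• If A can be the rightmost symbol in some sentential form, then $ is in FOLLOW(A).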
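Computing First
• To compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or ε can be added to any FIRST set:
1. If X is a terminal, then FIRST(X) = {X}.
2. If X is a nonterminal and X → Y1Y2...Yk is a production for some k ≥ 1, then place a in FIRST(X) if, for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1); that is, Y1...Yi-1 ⇒* ε. If ε is in FIRST(Yj) for all j = 1, 2, ..., k, then add ε to FIRST(X).
3. If X → ε is a production, then add ε to FIRST(X).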
!
• of is
• of is
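Computing Follow
• To compute FOLLOW(A) for all nonterminals A, apply the following rules until nothing can be added to any follow set:
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).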
!
• of is
of is
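The FIRST and FOLLOW rules can be applied mechanically by repeating them until no set changes (a fixed-point iteration). The sketch below does this for the expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id. The grammar encoding (a dict from nonterminal to a list of bodies, with [] for an ε-body) and the function names are assumptions of this sketch.

```python
# FIRST and FOLLOW by fixed-point iteration. "ε" marks the empty string.
EPS = "ε"

def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols Y1 Y2 ... Yk."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})          # terminal: FIRST(a) = {a}
        result |= sym_first - {EPS}
        if EPS not in sym_first:                   # Yi cannot vanish: stop here
            return result
    result.add(EPS)                                # every Yi can derive ε
    return result

def compute_first(grammar):
    first = {A: set() for A in grammar}
    changed = True
    while changed:
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:                    # rules 2 and 3
                new = first_of_string(body, first)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first

def compute_follow(grammar, start, first):
    follow = {A: set() for A in grammar}
    follow[start].add("$")                         # rule 1
    changed = True
    while changed:
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                for i, B in enumerate(body):
                    if B not in grammar:           # only nonterminals get FOLLOW
                        continue
                    rest = first_of_string(body[i + 1:], first)
                    new = rest - {EPS}             # rule 2
                    if EPS in rest:                # rule 3
                        new |= follow[A]
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, "E", FIRST)
print(FIRST)
print(FOLLOW)
```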
Examples
To find the FIRST sets of the given grammars, remember the rules above.
1. S → abc | def | ghi: FIRST(S) = {a, d, g}
2. S → ABC | def | ghi, A → a | b | c, B → b, D → d: FIRST(S) = FIRST(A) ∪ {d, g} = {a, b, c, d, g}
3. S → ABC, A → a | b | ε, B → c | d | ε, C → e | f | ε:
FIRST(S) starts with FIRST(A) = {a, b, ε}, but we do not add ε yet, because symbols remain after A. Since A can derive ε, we go on to FIRST(B) = {c, d, ε}; since B can also derive ε, we go on to FIRST(C) = {e, f, ε}. Because A, B and C can all derive ε, ε is added at the end, so FIRST(S) = {a, b, c, d, e, f, ε}.
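Examples
To find the FOLLOW sets of a grammar G, remember the rules above.
1. The FOLLOW set of the start symbol always contains $.
2. S → ACD, C → a | b: FOLLOW(A) = FIRST(C) = {a, b} and FOLLOW(D) = FOLLOW(S) = {$}
3. S → aSbS | bSaS | ε: FOLLOW(S) = {$, a, b}
4. S → ABC, A → DEF: FOLLOW(A) = FIRST(B) without ε; if B can derive ε, go on to the next symbol and add FIRST(C) as well; if C can also derive ε, add FOLLOW(S).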
First and Follow function
Grammar 1:
Production | FIRST | FOLLOW
S → ABCDE | {a, b, c} | {$}
A → a / ε | {a, ε} | {b, c}
B → b / ε | {b, ε} | {c}
C → c | {c} | {d, e, $}
D → d / ε | {d, ε} | {e, $}
E → e / ε | {e, ε} | {$}
Grammar 2:
Production | FIRST | FOLLOW
S → Bb / Cd | {a, b, c, d} | {$}
B → aB / ε | {a, ε} | {b}
C → cC / ε | {c, ε} | {d}
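Construction of predictive parsing table
• For each production A → α of the grammar, do the following:
1. For each terminal a in FIRST(α), add A → α to M[A, a].
2. If ε is in FIRST(α), then for each terminal b in FOLLOW(A), add A → α to M[A, b]. If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $] as well.
• If, after performing the above, there is no production at all in M[A, a], then set M[A, a] to error (normally shown as an empty entry).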
Example
• For the grammar
E → E + T | T
T → T * F | F
F → (E) | id
1. Eliminate left recursion
2. Left factor the grammar (if needed)
3. Calculate FIRST and FOLLOW
4. Construct the parsing table
5. Check whether the string is acceptable or not
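Step 1 uses the standard rewrite for immediate left recursion: A → Aα | β becomes A → βA', A' → αA' | ε. The sketch below applies it to every nonterminal; the function name and the dict-of-bodies encoding ([] meaning ε) are assumptions, and it handles only immediate (not indirect) left recursion.

```python
# Remove immediate left recursion:
#   A -> A α1 | ... | A αm | β1 | ... | βn
# becomes
#   A  -> β1 A' | ... | βn A'
#   A' -> α1 A' | ... | αm A' | ε        ([] denotes ε)
# Indirect left recursion (A => B ... => A ...) is not handled here, and the
# new name A' is assumed not to clash with an existing nonterminal.
def eliminate_immediate_left_recursion(grammar):
    result = {}
    for A, bodies in grammar.items():
        recursive = [b[1:] for b in bodies if b and b[0] == A]   # the α parts
        others = [b for b in bodies if not b or b[0] != A]       # the β parts
        if not recursive:
            result[A] = bodies
            continue
        A_new = A + "'"
        result[A] = [beta + [A_new] for beta in others]
        result[A_new] = [alpha + [A_new] for alpha in recursive] + [[]]
    return result

GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}
NEW = eliminate_immediate_left_recursion(GRAMMAR)
for head, bodies in NEW.items():
    print(head, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```

The result is the grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id, which needs no further left factoring.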
Construction of predictive parsing table
Production | FIRST | FOLLOW
E → TE' | {id, (} | {$, )}
E' → +TE' / ε | {+, ε} | {$, )}
T → FT' | {id, (} | {+, $, )}
T' → *FT' / ε | {*, ε} | {+, $, )}
F → id / (E) | {id, (} | {*, +, $, )}
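Construction of predictive parsing table
M | id | + | * | ( | ) | $
E | E → TE' | | | E → TE' | |
E' | | E' → +TE' | | | E' → ε | E' → ε
T | T → FT' | | | T → FT' | |
T' | | T' → ε | T' → *FT' | | T' → ε | T' → ε
F | F → id | | | F → (E) | |
Empty entries are errors.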
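The two table-construction rules can be checked against this table. The sketch below types in the FIRST and FOLLOW sets listed above, builds M from them, and collects cells that receive more than one production (a grammar with such a conflict is not LL(1)). The names first_of, build_table and conflicts are illustrative.

```python
# Build the predictive parsing table from FIRST/FOLLOW (typed in by hand).
EPS = "ε"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],          # [] denotes ε
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"id", "("}, "E'": {"+", EPS}, "T": {"id", "("},
         "T'": {"*", EPS}, "F": {"id", "("}}
FOLLOW = {"E": {"$", ")"}, "E'": {"$", ")"}, "T": {"+", "$", ")"},
          "T'": {"+", "$", ")"}, "F": {"*", "+", "$", ")"}}

def first_of(body):
    """FIRST of a production body (a string of grammar symbols)."""
    out = set()
    for sym in body:
        f = FIRST.get(sym, {sym})           # terminal: FIRST(a) = {a}
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}

def build_table():
    table = {}
    for A, bodies in GRAMMAR.items():
        for body in bodies:
            f = first_of(body)
            targets = f - {EPS}                 # rule 1
            if EPS in f:
                targets |= FOLLOW[A]            # rule 2 (FOLLOW may contain $)
            for a in targets:
                table.setdefault((A, a), []).append(body)
    return table                                # absent keys are error entries

M = build_table()
conflicts = {cell: bodies for cell, bodies in M.items() if len(bodies) > 1}
print(len(M), "entries;", "LL(1)" if not conflicts else f"conflicts: {conflicts}")
```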
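Construction of predictive parsing table
Moves of the predictive parser on input id + id * id:
Stack | Input | Action
$E | id + id * id $ | E → TE'
$E'T | id + id * id $ | T → FT'
$E'T'F | id + id * id $ | F → id
$E'T'id | id + id * id $ | match id
$E'T' | + id * id $ | T' → ε
$E' | + id * id $ | E' → +TE'
$E'T+ | + id * id $ | match +
$E'T | id * id $ | T → FT'
$E'T'F | id * id $ | F → id
$E'T'id | id * id $ | match id
$E'T' | * id $ | T' → *FT'
$E'T'F* | * id $ | match *
$E'T'F | id $ | F → id
$E'T'id | id $ | match id
$E'T' | $ | T' → ε
$E' | $ | E' → ε
$ | $ | accept
The productions output, read from top to bottom, form the leftmost derivation of id + id * id, from which the parse tree is built top-down.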
Characteristics of LL(1) Grammar
Left-to-Right Scanning: The parser reads the input from left to right.
Leftmost Derivation: The parser constructs the leftmost derivation of the sentence.
One Lookahead Token: The parser uses one token of lookahead to make parsing decisions.
No Left Recursion: The grammar must not have left recursion, as it can lead to infinite recursion in the parser.
Non-Ambiguous: Each production must be unambiguously chosen based on the current nonterminal and lookahead token.
Bottom-Up Parsing
A bottom-up parser constructs the parse tree from the bottom to the top, i.e. from the leaves to the root.
It is the process of reducing the input string to the start symbol of the grammar.
• It constructs the parse tree in reverse, which means it traces a rightmost derivation (RMD) in reverse while reducing the input string.
• The most popular bottom-up parsers are LR parsers.
• The main objective of a bottom-up parser is to construct a parse tree starting from the input string and proceeding upward to the start symbol of the grammar.
• Steps:
Parsing starts with the input string.
Scan the input from left to right.
Detect the handle.
Apply a production rule to reduce the handle.
The procedure continues until the start symbol is reached.
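Shift-reduce parser
• The idea is to shift some symbols of the input onto the stack until a reduction can be applied.
• At each reduction step, a specific substring matching the body of a production is replaced by the nonterminal at the head of that production.
• The key decisions during parsing are about when to reduce and about what production to apply.
• A reduction is the reverse of a step in a derivation.
• The goal of a bottom-up parser is to construct a derivation in reverse: that means a rightmost derivation S ⇒rm ... ⇒rm w, traced backwards from w to S.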
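Types of bottom-up parsers: shift-reduce parsing, operator-precedence parsing, and LR parsing (SLR, CLR, LALR).
Shift-reduce parser
• A handle is a substring that matches the body of a production and whose reduction represents one step along the reverse of a rightmost derivation.
• A stack is used to hold grammar symbols; the handle always appears on top of the stack just before it is reduced.
The parser's possible actions are:
1. Shift: shift the next input symbol onto the top of the stack.
2. Reduce: replace the handle on top of the stack by the head of the matching production.
3. Accept: announce successful completion of parsing.
4. Error: discover a syntax error and call an error-recovery routine.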
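Shift-reduce parser
Stack | Input | Action
$ | id + id * id $ | shift
$id | + id * id $ | reduce by E → id
$E | + id * id $ | shift
$E + | id * id $ | shift
$E + id | * id $ | reduce by E → id
$E + E | * id $ | shift
$E + E * | id $ | shift
$E + E * id | $ | reduce by E → id
$E + E * E | $ | reduce by E → E * E
$E + E | $ | reduce by E → E + E
$E | $ | accept
Fig. Configurations of a shift-reduce parser on id + id * id (grammar E → E + E | E * E | id)
At the configuration $E + E with * as the next input, the parser shifts instead of reducing by E → E + E, because * has higher precedence than +.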
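A shift-reduce parser for the ambiguous grammar E → E + E | E * E | id needs a policy for shift/reduce conflicts. This sketch assumes the usual one (* binds tighter than +, both left associative) and reproduces the configurations in the figure; the names PREC and shift_reduce are illustrative, and a real LR parser would take these decisions from its parsing table instead.

```python
# Shift-reduce parser for E -> E + E | E * E | id. Conflicts are settled by
# operator precedence (* above +) and left associativity: an assumed policy.
PREC = {"+": 1, "*": 2}

def shift_reduce(tokens):
    """Return the list of actions taken on `tokens` (terminals only)."""
    stack, buffer, actions = ["$"], tokens + ["$"], []
    i = 0
    while True:
        a = buffer[i]                           # lookahead symbol
        if stack[-1] == "id":                   # handle "id" on top of the stack
            stack[-1] = "E"
            actions.append("reduce by E -> id")
        elif (len(stack) >= 4 and stack[-1] == "E" and stack[-2] in PREC
              and stack[-3] == "E"
              and (a == "$" or (a in PREC and PREC[a] <= PREC[stack[-2]]))):
            op = stack[-2]                      # handle "E op E": reduce unless
            del stack[-3:]                      # the lookahead binds tighter
            stack.append("E")
            actions.append(f"reduce by E -> E {op} E")
        elif stack == ["$", "E"] and a == "$":
            actions.append("accept")
            return actions
        elif a != "$":
            stack.append(a)
            actions.append("shift")
            i += 1
        else:
            raise SyntaxError(f"cannot reduce stack {stack}")

for action in shift_reduce(["id", "+", "id", "*", "id"]):
    print(action)
```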
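Operator Precedence parsing
Operator grammars have the property that no production right side is ε or has two adjacent nonterminals.
E → EAE | id
A → * | +
This grammar is not an operator grammar, because the right side EAE has two adjacent nonterminals (E A and A E).
The two grammars below are equivalent (substitute the alternatives of A for A):
E → EAE | id, A → + | *   and   E → E + E | E * E | id
More operators are handled the same way:
E → E * E | E + E | E - E | E ^ E | id
The first grammar is not an operator grammar, but we can change it into an operator grammar.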
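Both operator-grammar conditions are easy to test mechanically. In this sketch (the function name and the dict-of-bodies encoding are assumptions) a grammar is rejected if any right side is ε or contains two adjacent nonterminals.

```python
# Check the two operator-grammar conditions: no ε right side and
# no two adjacent nonterminals in any right side.
def is_operator_grammar(grammar):
    nonterminals = set(grammar)
    for bodies in grammar.values():
        for body in bodies:
            if not body:                                   # ε right side
                return False
            for x, y in zip(body, body[1:]):
                if x in nonterminals and y in nonterminals:
                    return False                           # e.g. E A in E A E
    return True

G1 = {"E": [["E", "A", "E"], ["id"]], "A": [["+"], ["*"]]}
G2 = {"E": [["E", "+", "E"], ["E", "*", "E"], ["id"]]}
print(is_operator_grammar(G1), is_operator_grammar(G2))   # False True
```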
Operator Precedence parsing
There are three possible precedence relations between terminals a and b:
a ·> b: terminal a has higher precedence than terminal b
a <· b: terminal a has lower precedence than terminal b
a ≐ b: terminals a and b have the same precedence
Example grammar:
E → E or E
E → E and E
E → not E
E → id
String: id or id and id
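Construct of Operator Precedence Relation table
For the grammar E → E + E | E * E | id (rows: terminal on top of the stack; columns: input terminal):
 | id | + | * | $
id | -- | ·> | ·> | ·>
+ | <· | ·> | <· | ·>
* | <· | ·> | ·> | ·>
$ | <· | <· | <· | --
When we construct the relation table, id has the highest precedence and $ the lowest.
If + is followed by +, the left + gets higher precedence (+ ·> +) because + is left associative; similarly for * (* ·> *).
Example: id + id * id
Inserting the relations between the terminals of $ id + id * id $ gives
$ <· id ·> + <· id ·> * <· id ·> $
To parse, scan from the left until the first ·> is found, then scan back to the nearest <·; the symbols between them form the handle, which is reduced to E. Repeat until the stack is $ and the input is $; the string is then accepted.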
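Operator Precedence parsing Algorithm
The relation compares the topmost terminal on the stack with the current input symbol.
Stack | Relation | Input | Operation
$ | <· | id + id * id $ | push/shift id
$id | ·> | + id * id $ | pop/reduce E → id
$E | <· | + id * id $ | push/shift +
$E+ | <· | id * id $ | push/shift id
$E+id | ·> | * id $ | pop/reduce E → id
$E+E | <· | * id $ | push/shift *
$E+E* | <· | id $ | push/shift id
$E+E*id | ·> | $ | pop/reduce E → id
$E+E*E | ·> | $ | pop/reduce E → E * E
$E+E | ·> | $ | pop/reduce E → E + E
$E | -- | $ | accepted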
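The same moves can be produced by a short loop driven by the relation table for E → E + E | E * E | id. In this sketch (names are illustrative) only terminals are kept on the stack, because the relations involve terminals only; on ·> it pops until the terminal below is related by <· to the last one popped. Like the basic algorithm, it does not check that every operator has operands on both sides, so it can accept some malformed strings.

```python
# Operator-precedence parser driven by the relation table
# (rows: terminal on top of the stack, columns: input terminal).
REL = {
    "id": {"+": ">", "*": ">", "$": ">"},
    "+":  {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*":  {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$":  {"id": "<", "+": "<", "*": "<"},
}

def op_precedence_parse(tokens):
    stack, buffer, steps = ["$"], tokens + ["$"], []
    i = 0
    while True:
        top, a = stack[-1], buffer[i]
        if top == "$" and a == "$":
            steps.append("accept")
            return steps
        rel = REL[top].get(a)
        if rel in ("<", "="):                  # top <· a or top ≐ a: shift
            stack.append(a)
            steps.append(f"shift {a}")
            i += 1
        elif rel == ">":                       # top ·> a: reduce the handle
            handle = []
            while True:                        # pop until top <· last popped
                handle.append(stack.pop())
                if REL[stack[-1]].get(handle[-1]) == "<":
                    break
            op = handle[-1]
            steps.append("reduce E -> id" if op == "id" else f"reduce E -> E {op} E")
        else:
            raise SyntaxError(f"no relation between {top!r} and {a!r}")

for step in op_precedence_parse(["id", "+", "id", "*", "id"]):
    print(step)
```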
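Construct of Operator Precedence Function table
The relation table needs one entry for every pair of terminals, so for n terminals it has n × n entries. To reduce its size, we use an operator precedence function table instead.
To build it, we define two functions f(a) and g(b) that map terminals to integers so that
f(a) < g(b) whenever a <· b,
f(a) = g(b) whenever a ≐ b, and
f(a) > g(b) whenever a ·> b.
f is used for the terminal a in row i (top of the stack) and g for the terminal b in column j (the input).
Relation table (rows i, columns j):
 | id | + | * | $
id | -- | ·> | ·> | ·>
+ | <· | ·> | <· | ·>
* | <· | ·> | ·> | ·>
$ | <· | <· | <· | --
Construct a graph with a node f_a and a node g_a for every terminal a: for a ·> b add an edge f_a → g_b, and for a <· b add an edge g_b → f_a. The value of each function is the length of the longest path starting from its node:
f(id) → g(*) → f(+) → g(+) → f($), so f(id) = 4
g(id) → f(*) → g(*) → f(+) → g(+) → f($), so g(id) = 5
 | id | + | * | $
f | 4 | 2 | 4 | 0
g | 5 | 1 | 3 | 0
There is no cycle in the graph, so these precedence functions exist.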
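The function values can be computed rather than read off by hand. This sketch builds the graph (an edge f_a → g_b for a ·> b and g_b → f_a for a <· b) from the relation table of E → E + E | E * E | id and takes longest path lengths. It assumes there is no ≐ relation, which holds for this table, so no nodes need to be merged; the names are illustrative.

```python
# Precedence functions f and g from the relation table via longest paths.
REL = {
    "id": {"+": ">", "*": ">", "$": ">"},
    "+":  {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*":  {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$":  {"id": "<", "+": "<", "*": "<"},
}
TERMINALS = ["id", "+", "*", "$"]

edges = {(side, t): [] for side in ("f", "g") for t in TERMINALS}
for a, row in REL.items():
    for b, rel in row.items():
        if rel == ">":                      # a ·> b: edge f_a -> g_b
            edges[("f", a)].append(("g", b))
        elif rel == "<":                    # a <· b: edge g_b -> f_a
            edges[("g", b)].append(("f", a))

def longest_path(node, visiting=()):
    """Length of the longest path from `node`; a cycle means no functions exist."""
    if node in visiting:
        raise ValueError("cycle: no precedence functions exist")
    return max((1 + longest_path(n, visiting + (node,)) for n in edges[node]),
               default=0)

f = {t: longest_path(("f", t)) for t in TERMINALS}
g = {t: longest_path(("g", t)) for t in TERMINALS}
print("f:", f)   # f: {'id': 4, '+': 2, '*': 4, '$': 0}
print("g:", g)   # g: {'id': 5, '+': 1, '*': 3, '$': 0}
```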
Construct of Operator Precedence Relation table
Exercise: construct the operator precedence relation table for each grammar.
1. E → E - E | E / E | id, string id - id / id
2. E → E or E | E and E | id, string id or id and id
Construct of Operator Precedence Function table
Exercise: from each relation table above, construct the operator precedence function table.
Reading assignment
What is LR parsing?
Questions