Role of the Syntax Analyser – Syntax error handling.
Review of Context Free Grammars - Derivation and Parse Trees,
Eliminating Ambiguity.
Basic parsing approaches - Eliminating left recursion, left factoring.
Top-Down Parsing - Recursive Descent parsing, Predictive Parsing, LL(1)
Grammars.
SYNTAX ANALYSIS
The second phase of a compiler is syntax analysis, performed by the syntax analyzer or parser.
The parser receives a stream of tokens from the lexical analyzer and verifies that the string can
be generated by the grammar for the source language by constructing a parse tree.
The term parsing comes from the Latin pars orationis, which means part of speech.
[Figure: Interaction between lexical analyzer and parser. The scanner (lexical analyzer) passes tokens to the parser (syntax analyzer).]
CONTEXT FREE GRAMMAR (CFG)
A context free grammar is a grammar whose productions are of the form
A → α
where A is a non terminal and α is a string of terminals and non terminals (α can be
empty also).
A formal grammar is "context free" if its production rules can be applied regardless
of the context of a nonterminal.
No matter which symbols surround it, the single nonterminal on the left hand side
can always be replaced by the right hand side.
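A CFG consists of four components (N, T, P, S):
Terminals
the basic symbols from which strings are formed; in a compiler these are the tokens
Non terminals
syntactic variables that denote sets of strings and help define the language generated by the
grammar
Productions
rules that specify how a non terminal can be replaced by a string of terminals and non terminals
Start Symbol
the non terminal from which every derivation begins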
Grammar for simple arithmetic expression
DERIVATION
• A derivation is a sequence of production rule applications that produces the input
string.
• Beginning with the start symbol, each step replaces a non terminal by the body of one of
its productions.
• Types:
• Left Most Derivation - the left most non terminal is replaced in each step
• Right Most Derivation - the right most non terminal is replaced in each step
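The two derivation orders can be illustrated with a small sketch. The grammar E → E + E | E * E | id and the helper `derive` are assumptions chosen for illustration; they are not taken from the slides.

```python
# Hypothetical grammar for illustration: E -> E + E | E * E | id
NONTERMINALS = {"E"}

def derive(start, steps, leftmost=True):
    """Apply each production body in `steps` to the leftmost (or rightmost)
    nonterminal and return every sentential form produced."""
    form = [start]
    forms = [" ".join(form)]
    for body in steps:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        i = positions[0] if leftmost else positions[-1]
        # Replace the chosen nonterminal by the production body.
        form = form[:i] + body.split() + form[i + 1:]
        forms.append(" ".join(form))
    return forms

leftmost = derive("E", ["E + E", "id", "E * E", "id", "id"], leftmost=True)
rightmost = derive("E", ["E + E", "E * E", "id", "id", "id"], leftmost=False)
print(" => ".join(leftmost))
print(" => ".join(rightmost))
```

Both calls end in the same sentence, id + id * id, but pass through different sentential forms on the way.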
Consider the grammar
PARSE TREE
A parse tree (also called a derivation tree) is a hierarchical structure that shows how the
start symbol of the grammar derives the input string.
Simply put, it is a graphical representation of a derivation.
Yield of the parse tree
The leaves of the parse tree are labeled by non-terminals or terminals. Read
from left to right, they constitute a sentential form, called the yield or frontier of
the tree.
Parsing is the process of determining whether a string of tokens can be
generated by a grammar.
There are 2 approaches:
Top Down Parsing - In top down parsing, the parse tree is constructed from the top (root) to the
bottom (leaves).
Bottom Up Parsing - In bottom up parsing, the parse tree is constructed from the bottom
(leaves) to the top (root).
Top down parsing can be viewed as an attempt to find a
leftmost derivation for an input string (that is, expanding the
leftmost non-terminal at every step).
TDP approaches:
Recursive Descent Parser
Predictive Parser
RECURSIVE DESCENT PARSING
IMPLEMENTATION
// advance() moves nextsymbol to the next input symbol
Procedure S()
{ if nextsymbol = ‘c’
  { advance();
    A();
    if nextsymbol = ‘d’
    { advance();
      return success;
    }
  }
  error;
}
Procedure A()
{ if nextsymbol = ‘a’
  { advance();
    if nextsymbol = ‘b’
      advance();
    return;
  }
  error;
}
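A runnable sketch of the same idea with explicit backtracking is given here. It assumes the grammar S → cAd, A → ab | a, which the procedures above appear to implement; the function names `parse`, `match`, `parse_S` and `parse_A` are illustrative only.

```python
def parse(tokens):
    """Return True if tokens form a sentence of S -> c A d, A -> a b | a
    (an assumed grammar, used only for illustration)."""

    def match(pos, symbol):
        # Return the position after `symbol`, or None if it is not next.
        if pos < len(tokens) and tokens[pos] == symbol:
            return pos + 1
        return None

    def parse_A(pos):
        # Yield every position at which A can end, trying A -> a b first.
        after_a = match(pos, "a")
        if after_a is None:
            return
        after_b = match(after_a, "b")
        if after_b is not None:
            yield after_b          # A -> a b
        yield after_a              # A -> a (tried when the caller backtracks)

    def parse_S(pos):
        after_c = match(pos, "c")
        if after_c is None:
            return False
        for after_A in parse_A(after_c):
            after_d = match(after_A, "d")
            if after_d == len(tokens):
                return True        # whole input consumed
        return False

    return parse_S(0)

print(parse("cad"))
print(parse("cabd"))
print(parse("cab"))
```

For the input cad, the alternative A → ab fails on the symbol d, so the parser falls back to A → a and then matches d. The generator in `parse_A` is one simple way to express "try the next alternative"; it is a design choice of this sketch, not something prescribed by the method.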
Recursive descent is the most general form of top-down parsing.
It may involve backtracking, that is, making repeated scans of the input to
obtain the correct expansion of the leftmost non-terminal.
Unless the grammar is ambiguous or left-recursive, it finds a suitable
parse tree.
Drawbacks of RDP
A left-recursive grammar can cause a recursive-descent parser to go into an infinite loop. That is, when
we try to expand A, we may find ourselves again trying to expand A without having consumed any
input.
Backtracking recursive-descent parsers are not very common, as programming language constructs can
usually be parsed without backtracking.
Not suitable for ambiguous grammars.
PREDICTIVE PARSER
A predictive parser has the capability to predict which alternative production is to
be used to expand a non-terminal, based on the next input symbol.
Predictive parsing is a special form of recursive-descent parsing, in which
the current input token unambiguously determines the production to be applied
at each step.
The goal of predictive parsing is to construct a top-down parser that never
backtracks.
It is possible to build a non-recursive predictive parser by maintaining a stack explicitly, rather
than implicitly via recursive calls.
Model of non-recursive predictive parser
Input buffer:
contains the string to be parsed, followed by $ (used to indicate the end of the input
string)
Stack:
contains a sequence of grammar symbols, initialized with $ to indicate the bottom of the stack, with the start symbol on top of it.
Parsing table:
a two-dimensional array M[A,a], where A is a nonterminal and a is a terminal or the symbol $
The parser is controlled by a program that consults the top of the stack and the current input symbol.
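When the parser expands a nonterminal X using a production X → Y1Y2...Yk, it pops X and pushes the body in reverse order, so that Y1 ends up on top of the stack.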
EXAMPLE:
Input : id + id * id
Grammar :
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
Moves made by predictive parser for the input id+id*id
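The moves can be reproduced with a minimal table-driven sketch. The parsing table `TABLE` below was written out by hand for the grammar above, and the names `TABLE`, `NONTERMINALS` and `predictive_parse` are assumptions of this sketch rather than part of the original material.

```python
EPS = "ε"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# Parsing table M[A, a], written out by hand for the expression grammar.
TABLE = {
    ("E", "id"): ["T", "E'"],
    ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],
    ("E'", ")"): [EPS],
    ("E'", "$"): [EPS],
    ("T", "id"): ["F", "T'"],
    ("T", "("): ["F", "T'"],
    ("T'", "+"): [EPS],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [EPS],
    ("T'", "$"): [EPS],
    ("F", "id"): ["id"],
    ("F", "("): ["(", "E", ")"],
}

def predictive_parse(tokens, start="E"):
    """Return the list of productions used, or raise SyntaxError."""
    buffer = list(tokens) + ["$"]      # input followed by the end marker
    stack = ["$", start]               # $ marks the bottom of the stack
    ip = 0
    output = []
    while stack[-1] != "$":
        top, lookahead = stack[-1], buffer[ip]
        if top not in NONTERMINALS:
            # Terminal on top of the stack: it must match the input.
            if top != lookahead:
                raise SyntaxError(f"expected {top!r}, found {lookahead!r}")
            stack.pop()
            ip += 1
        else:
            body = TABLE.get((top, lookahead))
            if body is None:
                raise SyntaxError(f"no entry M[{top}, {lookahead}]")
            stack.pop()
            output.append(f"{top} -> {' '.join(body)}")
            if body != [EPS]:
                stack.extend(reversed(body))   # push the body in reverse
    if buffer[ip] != "$":
        raise SyntaxError("input left over")
    return output

moves = predictive_parse(["id", "+", "id", "*", "id"])
for production in moves:
    print(production)
```

The printed productions, read top to bottom, form a leftmost derivation of id + id * id.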
Construction of the predictive parsing table uses 2 functions:
FIRST()
FOLLOW()
These functions allow us to fill in the entries of the
predictive parsing table.
RULES TO COMPUTE FIRST SET
1) If X is a terminal, then FIRST(X) is {X}
2) If X --> ε is a production, then add ε to FIRST(X)
3) If X is a non terminal and X --> Y1Y2Y3...Yn, then put 'a' in FIRST(X) if for some i,
a is in FIRST(Yi) and ε is in all of FIRST(Y1),...,FIRST(Yi-1).
If ε is in FIRST(Yj) for all j = 1,...,n, then add ε to FIRST(X).
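The rules can be applied repeatedly until no FIRST set changes. The sketch below does this for the expression grammar from the predictive parsing example; the dictionary layout of `GRAMMAR` and the function name `compute_first` are assumptions made for illustration.

```python
EPS = "ε"

# Expression grammar; each body is a list of symbols.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}

def compute_first(grammar):
    first = {nt: set() for nt in grammar}

    def first_of(symbol):
        # Rule 1: FIRST of a terminal is the terminal itself.
        return first[symbol] if symbol in grammar else {symbol}

    changed = True
    while changed:                       # repeat until nothing new is added
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                before = len(first[head])
                if body == [EPS]:
                    first[head].add(EPS)             # Rule 2
                else:
                    for symbol in body:              # Rule 3
                        first[head] |= first_of(symbol) - {EPS}
                        if EPS not in first_of(symbol):
                            break
                    else:
                        # Every symbol of the body can derive ε.
                        first[head].add(EPS)
                if len(first[head]) != before:
                    changed = True
    return first

first = compute_first(GRAMMAR)
for nonterminal, symbols in first.items():
    print(nonterminal, sorted(symbols))
```

For this grammar the result is FIRST(E) = FIRST(T) = FIRST(F) = { (, id }, FIRST(E’) = { +, ε } and FIRST(T’) = { *, ε }.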
FOLLOW
FOLLOW is defined only for the non terminals of the grammar G.
If A is a nonterminal, then FOLLOW(A) is the set of
terminals 'a' that can appear immediately to the right of A in some
sentential form derived from the start symbol.
If A can be the rightmost symbol of some sentential form, then $ is in FOLLOW(A).
RULES TO COMPUTE FOLLOW SET
1. If S is the start symbol, then add $ to FOLLOW(S).
2. If there is a production rule A --> αBβ, then everything in FIRST(β) except for ε is placed in FOLLOW(B).
3. If there is a production A --> αB, or a production A --> αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
Calculate FIRST and FOLLOW for the given
grammar
S → aBDh
B → cC
C → bC | ε
D → EF
E → g | ε
F → f | ε
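One way to check an answer to this exercise is to run the FIRST and FOLLOW rules to a fixed point. The sketch below is one possible implementation; the helper names `first_of_string`, `compute_first` and `compute_follow` and the dictionary encoding of the grammar are assumptions of the sketch.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "S": [["a", "B", "D", "h"]],
    "B": [["c", "C"]],
    "C": [["b", "C"], [EPS]],
    "D": [["E", "F"]],
    "E": [["g"], [EPS]],
    "F": [["f"], [EPS]],
}

def first_of_string(symbols, first):
    """FIRST of a sequence of symbols; contains ε if the whole sequence can vanish."""
    result = set()
    for symbol in symbols:
        if symbol == EPS:
            continue
        symbol_first = first[symbol] if symbol in first else {symbol}
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result
    result.add(EPS)
    return result

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first

def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                          # Rule 1
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, symbol in enumerate(body):
                    if symbol not in grammar:
                        continue                    # FOLLOW is only for nonterminals
                    beta_first = first_of_string(body[i + 1:], first)
                    new = beta_first - {EPS}        # Rule 2
                    if EPS in beta_first:           # Rule 3 (β empty or nullable)
                        new |= follow[head]
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow

first = compute_first(GRAMMAR)
follow = compute_follow(GRAMMAR, first, "S")
for nt in GRAMMAR:
    print(nt, sorted(first[nt]), sorted(follow[nt]))
```

Working the rules by hand gives the same sets: FIRST(S) = {a}, FIRST(B) = {c}, FIRST(C) = {b, ε}, FIRST(D) = {g, f, ε}, FIRST(E) = {g, ε}, FIRST(F) = {f, ε}; FOLLOW(S) = {$}, FOLLOW(B) = FOLLOW(C) = {g, f, h}, FOLLOW(D) = FOLLOW(F) = {h}, FOLLOW(E) = {f, h}.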
A context-free grammar G whose parsing table has no multiply defined entries is said to be LL(1).
LL(1) grammars are the class of grammars for which predictive parsers can be constructed.
In the name LL(1),
the first L stands for scanning the input from left to right,
the second L stands for producing a leftmost derivation,
and the 1 stands for using one input symbol of lookahead at each step to make parsing
action decisions.
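The "no multiple entries" condition can be checked mechanically. The sketch below fills M[A,a] using the standard construction (place A → α under every terminal in FIRST(α), and under every symbol in FOLLOW(A) when FIRST(α) contains ε) and records every cell that would receive a second production. The function `build_table`, the two small grammars and their hand-computed FIRST and FOLLOW sets are all assumptions made for illustration.

```python
EPS = "ε"

def build_table(productions, first_of_body, follow):
    """productions: list of (head, body) pairs, body being a tuple of symbols.
    first_of_body: FIRST set of each production body (assumed precomputed).
    follow: FOLLOW set of each nonterminal (assumed precomputed).
    Returns (table, conflicts); the grammar is LL(1) when conflicts is empty."""
    table, conflicts = {}, []
    for head, body in productions:
        lookaheads = first_of_body[body] - {EPS}
        if EPS in first_of_body[body]:
            lookaheads |= follow[head]
        for terminal in lookaheads:
            key = (head, terminal)
            if key in table:
                conflicts.append(key)      # second production for the same cell
            else:
                table[key] = body
    return table, conflicts

# S -> a S b | ε, with FIRST and FOLLOW worked out by hand.
good = [("S", ("a", "S", "b")), ("S", (EPS,))]
good_first = {("a", "S", "b"): {"a"}, (EPS,): {EPS}}
good_follow = {"S": {"b", "$"}}

# A -> a b | a c: both alternatives start with a.
bad = [("A", ("a", "b")), ("A", ("a", "c"))]
bad_first = {("a", "b"): {"a"}, ("a", "c"): {"a"}}
bad_follow = {"A": {"$"}}

good_table, good_conflicts = build_table(good, good_first, good_follow)
bad_table, bad_conflicts = build_table(bad, bad_first, bad_follow)
print(good_conflicts)
print(bad_conflicts)
```

The first grammar produces no conflicts. The second reports a conflict in M[A, a], because one symbol of lookahead cannot choose between A → ab and A → ac.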
The goal of predictive parsing is to construct a top-down parser that
never backtracks. To do so, we must transform a grammar in two ways:
Eliminate left recursion
Perform left factoring
These transformations eliminate the most common causes of backtracking.
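A grammar is left recursive if it has a non-terminal A with a production of the form A → Aα.
If we use such a production in a top-down derivation, A is expanded to Aα, then to Aαα, and so on
without consuming any input, so we fall into an infinite derivation chain. This is called left recursion.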
Eliminating Left Recursion
The left-recursive pair of productions A → Aα | β can be replaced by productions that are not
left recursive:
A → βA’
A’ → αA’ | ε
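The rewriting of A → Aα | β into A → βA’ and A’ → αA’ | ε can be sketched as a small function. The name `eliminate_immediate_left_recursion` and the list-of-symbols encoding are assumptions of this sketch, and it handles only immediate left recursion.

```python
EPS = "ε"

def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite A -> A α1 | ... | β1 | ... as A -> β A', A' -> α A' | ε.
    Assumes the name head + "'" is not already used in the grammar."""
    # The α parts: bodies that start with the head itself.
    recursive = [body[1:] for body in bodies if body and body[0] == head]
    # The β parts: all other bodies.
    others = [body for body in bodies if not body or body[0] != head]
    if not recursive:
        return {head: bodies}              # nothing to do
    new_head = head + "'"
    return {
        head: [[s for s in beta if s != EPS] + [new_head] for beta in others],
        new_head: [alpha + [new_head] for alpha in recursive] + [[EPS]],
    }

result = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
print(result)
```

Applied to E → E + T | T, it gives E → TE’ and E’ → +TE’ | ε, which are the first two productions of the grammar used in the predictive parsing example.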
AMBIGUITY
An ambiguous sentence has two or more possible meanings within a single sentence or sequence
of words. This can confuse the reader and make the meaning of the sentence unclear.
AMBIGUOUS GRAMMAR
An ambiguous grammar is one that produces more
than one leftmost or more than one rightmost
derivation for the same sentence.
For most parsers, it is desirable that the grammar be
made unambiguous, for if it is not, we cannot
uniquely determine which parse tree to select for a
sentence.
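Ambiguity can be observed by counting parse trees. The sketch below does so for the grammar E → E + E | E * E | id, a common textbook example of an ambiguous grammar that is assumed here for illustration; the function `count_parse_trees` is likewise hypothetical.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees for tokens under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways E derives tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator position k as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees(["id", "+", "id"]))
print(count_parse_trees(["id", "+", "id", "*", "id"]))
```

The sentence id + id has one parse tree, while id + id * id has two: one with + at the root and one with * at the root. Each tree corresponds to a different leftmost derivation, which is exactly the situation the definition describes.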