
Lec3 SyntaxAnalysis

The document discusses the role of parsers in syntax analysis, detailing their function in verifying token strings and reporting syntax errors. It outlines different types of parsers, including universal, top-down, and bottom-up parsers, and explains context-free grammar, including terminals, non-terminals, and production rules. Additionally, it covers derivation types, parse trees, ambiguity in parsing, and the relationship between grammars and regular expressions.

Uploaded by

Abhilasha Goyal
Copyright
© All Rights Reserved

CSN-352

Syntax Analysis
Introduction
Role of Parser

- To verify that the string of tokens produced by the lexical analyser can be generated by the grammar of the
source language.
- To report any syntax errors.
- Three general types of parser: universal, top-down, and bottom-up.
Types of Parsing Methods

Universal:
- Universal parsing methods such as the Cocke-Younger-Kasami algorithm and Earley's
algorithm can parse any context-free grammar.
- These general methods are, however, too inefficient to use in production
compilers.

Top-Down Parser: Builds the parse tree from top to bottom, that is, from the root to the leaves.
Bottom-Up Parser: Builds the parse tree from the leaves to the root.
- In either case, the input to the parser is scanned from left to right, one symbol at a time.
Representative Grammars

- Constructs that begin with keywords like while or int are easy to parse, because the keyword guides the choice of the
grammar production that must be applied to match the input.

- Mathematical expressions present more of a challenge in parsing.

- Left Recursive: The leftmost symbol of the body is the same as the nonterminal at the
head of the production. In a top-down (recursive-descent) parser, a recursive call for expr will make the parser loop forever. Left-recursive productions are suitable for
bottom-up parsing.

expr -> expr + term

- Right Recursive: The rightmost symbol of the body is the same as the nonterminal at the head
of the production, for example expr -> term + expr. A top-down parser does not loop on such a production, because the symbols before the recursive call consume input first.

- The lookahead symbol changes only when a terminal in the body is matched, so with a left-recursive production no change to the
input takes place between recursive calls of expr.
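The difference between the two forms can be sketched in Python. This is a minimal illustration, not a real parser: the token list, the function names, the production term -> id, and the right-recursive rewrite expr -> term + expr | term are all assumptions made for the example.

```python
def parse_term(tokens, pos):
    """term -> id (assumed). Return the position after the match, or None."""
    if pos < len(tokens) and tokens[pos] == "id":
        return pos + 1
    return None


def parse_expr_left(tokens, pos):
    """expr -> expr + term, coded naively as a recursive procedure.

    The first action is a call to itself at the same position, so no
    input is consumed between recursive calls.
    """
    after_expr = parse_expr_left(tokens, pos)  # never returns normally
    if after_expr is not None and after_expr < len(tokens) and tokens[after_expr] == "+":
        return parse_term(tokens, after_expr + 1)
    return None


def parse_expr_right(tokens, pos):
    """expr -> term + expr | term (assumed right-recursive rewrite)."""
    after_term = parse_term(tokens, pos)
    if after_term is None:
        return None
    if after_term < len(tokens) and tokens[after_term] == "+":
        return parse_expr_right(tokens, after_term + 1)
    return after_term


tokens = ["id", "+", "id", "+", "id"]

try:
    parse_expr_left(tokens, 0)
    left_recursion_looped = False
except RecursionError:
    # Python aborts the runaway recursion instead of looping forever.
    left_recursion_looped = True

consumed = parse_expr_right(tokens, 0)
print(left_recursion_looped, consumed == len(tokens))
```

Note that the right-recursive rewrite groups + to the right, which is one reason left-recursive productions are usually transformed rather than simply mirrored.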
Context Free Grammar

Terminal - The term "token name" is a synonym for "terminal", e.g., if, else, ")", etc.
Nonterminal - Nonterminals are syntactic variables that denote sets of strings. stmt and
expr are nonterminals. Nonterminals impose a hierarchical structure on the language that is
key to syntax analysis and translation.
stmt -> if ( expr ) stmt else stmt
Start Symbol – One nonterminal is distinguished as the start symbol; the set of strings it generates
is the language of the grammar.

Production Rules – Specify how terminals and nonterminals can be combined to form strings of the language. Each production has a head (a nonterminal), an arrow, and a body of zero or more terminals and nonterminals.
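The four components can be written down as plain data. The sketch below is one hypothetical encoding, not a standard one; the productions stmt -> other and expr -> id are assumptions added only so that the example grammar is complete.

```python
# A context-free grammar as plain Python data (hypothetical encoding).
# Each production is a (head, body) pair; the body is a tuple of symbols.
productions = [
    ("stmt", ("if", "(", "expr", ")", "stmt", "else", "stmt")),
    ("stmt", ("other",)),   # assumed extra production so stmt can terminate
    ("expr", ("id",)),      # assumed production for expr
]

# By convention, the head of the first production is the start symbol.
start_symbol = productions[0][0]

# Nonterminals are exactly the symbols that appear as a production head.
nonterminals = {head for head, _ in productions}

# Every other symbol that appears in a body is a terminal (token name).
terminals = {sym for _, body in productions for sym in body} - nonterminals

print(start_symbol)
print(sorted(nonterminals))
print(sorted(terminals))
```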
Context Free Grammar

Grammar of simple arithmetic expressions (figure not reproduced). Find the terminals and
nonterminals of the grammar.
Context Free Grammar
Terminal Conventions:
(a) Lowercase letters early in the alphabet, such as a, b, c.
(b) Operator symbols such as +, *, and so on.
(c) Punctuation symbols such as parentheses, comma, and so on.
(d) The digits 0, 1, ..., 9.
(e) Boldface strings such as id or if, each of which represents a single
terminal symbol.

Nonterminal Conventions:

(a) Uppercase letters early in the alphabet, such as A, B, C.
(b) The letter S, which, when it appears, is usually the start symbol.
(c) Lowercase, italic names such as expr or stmt.
(d) Uppercase letters may be used to represent nonterminals for programming constructs.
For example, nonterminals for expressions, terms, and factors are often
represented by E, T, and F, respectively.
(e) Unless stated otherwise, the head of the first production is the start symbol.
Context Free Grammar
A * over the arrow => in a derivation means "derives in zero or more steps".

A + over the arrow => in a derivation means "derives in one or more steps".


Sentential Form of Grammar
- The language generated by a grammar is its set of sentences.

- Sentential forms can contain both terminals and nonterminals.

- A sentence of G is a sentential form with no nonterminals.

- All the intermediate forms obtained while deriving a sentence from the start symbol are
sentential forms of the grammar.

Grammar G (figure not reproduced)

Derivation of -(id+id): E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)

-(id+id) is a sentence of the grammar (a sentential form without any nonterminals).

E, -E, -(E), -(E+E), and -(id+E) are all sentential forms of the grammar (they are derived from
the start symbol but still contain nonterminals).
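The distinction between a sentential form and a sentence can be checked mechanically. The sketch below assumes that grammar G is E -> E+E | E*E | -E | (E) | id, since the figure is not reproduced here; the function names are hypothetical.

```python
# Assumed grammar G: E -> E+E | E*E | -E | (E) | id
BODIES = {"E": [["E", "+", "E"], ["E", "*", "E"], ["-", "E"], ["(", "E", ")"], ["id"]]}


def one_step(form):
    """All forms reachable from `form` by rewriting a single nonterminal once."""
    results = []
    for i, sym in enumerate(form):
        for body in BODIES.get(sym, []):
            results.append(form[:i] + body + form[i + 1:])
    return results


def is_sentence(form):
    """A sentence is a sentential form with no nonterminals."""
    return all(sym not in BODIES for sym in form)


derivation = [
    ["E"],
    ["-", "E"],
    ["-", "(", "E", ")"],
    ["-", "(", "E", "+", "E", ")"],
    ["-", "(", "id", "+", "E", ")"],
    ["-", "(", "id", "+", "id", ")"],
]

# Each form must follow from the previous one in exactly one derivation step.
valid = all(nxt in one_step(cur) for cur, nxt in zip(derivation, derivation[1:]))
sentences = ["".join(form) for form in derivation if is_sentence(form)]
print(valid, sentences)
```

Only the last form passes is_sentence; every earlier form is a sentential form that still contains E.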
Types of Derivation
1. Leftmost derivation – the leftmost nonterminal in each sentential form is chosen to be replaced.

2. Rightmost derivation – the rightmost nonterminal in each sentential form is chosen to be replaced.

- Rightmost derivations are sometimes called canonical derivations.
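A small sketch can make the two orders concrete. It applies the same sequence of production bodies twice, once always rewriting the leftmost nonterminal and once the rightmost. The single nonterminal E and the chosen bodies are assumptions for illustration, and the helper does not check that a body belongs to any particular grammar.

```python
NONTERMINALS = {"E"}  # assumed grammar: E -> E+E | -E | (E) | id


def derive(start, bodies, leftmost=True):
    """Apply `bodies` in order, always rewriting the leftmost
    (or rightmost) nonterminal of the current sentential form."""
    form = [start]
    steps = ["".join(form)]
    for body in bodies:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        i = positions[0] if leftmost else positions[-1]
        form = form[:i] + list(body) + form[i + 1:]
        steps.append("".join(form))
    return steps


bodies = [("-", "E"), ("(", "E", ")"), ("E", "+", "E"), ("id",), ("id",)]
leftmost_steps = derive("E", bodies, leftmost=True)
rightmost_steps = derive("E", bodies, leftmost=False)
print(" => ".join(leftmost_steps))
print(" => ".join(rightmost_steps))
```

The two derivations differ only in the intermediate form after the first id is introduced, and both end in the same sentence.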


Derivation and Parse Tree
- A parse tree is a graphical representation of a derivation that filters out the
order in which productions are applied to replace nonterminals.

- The leaves of a parse tree are labeled by nonterminals or terminals and, read
from left to right, constitute a sentential form, called the yield or frontier of the
tree.

- Yield at level 1: -E
- Yield at level 2: -(E)
- ...
- Final yield: -(id+id)
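Reading the leaves from left to right is a simple tree traversal. The sketch below uses a hypothetical (label, children) tuple encoding for tree nodes and builds the trees by hand.

```python
# A parse tree node as a (label, children) pair; a leaf has no children.
# (Hypothetical encoding chosen for this sketch.)
def leaf(label):
    return (label, [])


def tree_yield(node):
    """Collect the leaf labels from left to right (the yield, or frontier)."""
    label, children = node
    if not children:
        return [label]
    labels = []
    for child in children:
        labels.extend(tree_yield(child))
    return labels


# Tree after the steps E => -E => -(E): the inner E is still a leaf.
partial = ("E", [leaf("-"), ("E", [leaf("("), leaf("E"), leaf(")")])])

# Complete tree for -(id+id).
full = ("E", [leaf("-"),
              ("E", [leaf("("),
                     ("E", [("E", [leaf("id")]), leaf("+"), ("E", [leaf("id")])]),
                     leaf(")")])])

partial_yield = "".join(tree_yield(partial))
full_yield = "".join(tree_yield(full))
print(partial_yield)
print(full_yield)
```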
Drawing Parse Tree from Derivation
- Given derivation: α1 => α2 => ... => αn

Basis: The tree for α1 = A is a single node labeled A.

Induction: Suppose we have already constructed a parse tree whose yield is αi-1, and αi is
derived from αi-1 by replacing a nonterminal X with a body of m symbols. Find the leaf labeled X
at the corresponding position from the left in the current tree and give it m children, labeled
from the left with the symbols of the body. In the special case m = 0, the leaf gets a single child labeled ϵ. The process is an induction on i.
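The construction can be sketched as a loop over the derivation steps. In this hypothetical encoding each step is a (j, body) pair, meaning that the j-th symbol (1-based) of the previous sentential form is rewritten into body; the Node class and function names are assumptions for the example.

```python
EPSILON = "ϵ"


class Node:
    def __init__(self, label):
        self.label = label
        self.children = []


def frontier(node):
    """Leaves from left to right, skipping leaves labeled ϵ."""
    if not node.children:
        return [] if node.label == EPSILON else [node]
    result = []
    for child in node.children:
        result.extend(frontier(child))
    return result


def build_tree(start, steps):
    """Build a parse tree from a derivation given as (j, body) pairs."""
    root = Node(start)                        # basis: a single node
    yields = [start]
    for j, body in steps:                     # induction on the derivation
        target = frontier(root)[j - 1]        # j-th non-ϵ leaf from the left
        labels = body if body else [EPSILON]  # m = 0: one child labeled ϵ
        target.children = [Node(label) for label in labels]
        yields.append("".join(n.label for n in frontier(root)))
    return root, yields


steps = [
    (1, ["-", "E"]),        # E => -E
    (2, ["(", "E", ")"]),   # -E => -(E)
    (3, ["E", "+", "E"]),   # -(E) => -(E+E)
    (3, ["id"]),            # -(E+E) => -(id+E)
    (5, ["id"]),            # -(id+E) => -(id+id)
]
root, yields = build_tree("E", steps)
print(" => ".join(yields))
```

After each step the yield of the tree equals the corresponding sentential form, which is the invariant the induction relies on.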
Ambiguity in Parsing
- An ambiguous grammar is one that produces more than one parse tree for some sentence;
equivalently, more than one leftmost derivation or more than one rightmost derivation for the same sentence.

- It is sometimes convenient to use carefully chosen ambiguous grammars, together with
disambiguating rules that throw away undesirable parse trees, leaving only one
tree for each sentence.
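Ambiguity for a particular sentence can be demonstrated by counting parse trees. The sketch below assumes the grammar E -> E + E | E * E | id, which is not given in the text above, and counts the trees for a token list by trying every operator as the top-level split.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for `tokens` under the assumed grammar
    E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct ways E derives tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                # tokens[k] is the operator at the root of this subtree.
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


ambiguous_count = count_parse_trees(["id", "+", "id", "*", "id"])
print(ambiguous_count)
```

A count above 1 means the sentence has more than one parse tree, so the grammar is ambiguous. A disambiguating rule such as "* binds tighter than +" would keep only one of the trees.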
Grammar and RE
- A grammar is a more powerful notation than a regular expression (RE).

- Every construct that can be described by an RE can be described by a grammar, but not
vice versa.

- We can construct an NFA for an RE algorithmically, and we can then construct a grammar from the NFA:

1. For each state i of the NFA, create a nonterminal Ai.
2. If state i has a transition to state j on input a, add the production Ai -> aAj. If state i goes to
state j on input ϵ, add the production Ai -> Aj.
3. If i is an accepting state, add Ai -> ϵ.
4. If i is the start state, make Ai the start symbol of the grammar.

NFA for (a|b)*abb
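The four construction rules translate directly into code. Because the NFA figure is not reproduced here, the transition list below is an assumed four-state NFA for (a|b)*abb (state 0 loops on a and b, and state 3 is accepting); the function name and data layout are hypothetical.

```python
def nfa_to_grammar(transitions, start, accepting):
    """Turn an NFA into right-linear productions.

    transitions: list of (i, symbol, j) triples; symbol None is an ϵ-move.
    Returns (start_symbol, productions), productions being (head, body) strings.
    """
    productions = []
    for i, symbol, j in transitions:
        # Rules 1 and 2: state i becomes Ai; a move on a gives Ai -> a Aj,
        # and an ϵ-move gives Ai -> Aj.
        body = f"A{j}" if symbol is None else f"{symbol} A{j}"
        productions.append((f"A{i}", body))
    for i in sorted(accepting):
        productions.append((f"A{i}", "ϵ"))   # Rule 3
    return f"A{start}", productions          # Rule 4


# Assumed NFA for (a|b)*abb.
transitions = [(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2), (2, "b", 3)]
start_symbol, productions = nfa_to_grammar(transitions, start=0, accepting={3})
for head, body in productions:
    print(f"{head} -> {body}")
```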


Grammar and RE
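- An RE cannot describe a language whose recognition requires remembering an unbounded count.

- For example, to recognize a^n b^n, the recognizer must keep count of the number of a's before it sees the b's.

- Grammar for a^n b^n: S -> aSb | ϵ

- However, a context-free grammar can keep two counts matched but not three: a^n b^n c^n cannot be generated by any context-free grammar.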
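A recognizer that follows the grammar S -> aSb | ϵ can be written as a short recursive function. The sketch also tries the regular expression a*b* as one illustrative RE that does not enforce equal counts; the function name is hypothetical.

```python
import re


def matches_anbn(s):
    """Recognize a^n b^n (n >= 0) by following S -> a S b | ϵ."""
    if s == "":
        return True                      # S -> ϵ
    if len(s) >= 2 and s[0] == "a" and s[-1] == "b":
        return matches_anbn(s[1:-1])     # S -> a S b
    return False


grammar_accepts_aab = matches_anbn("aab")
regex_accepts_aab = re.fullmatch(r"a*b*", "aab") is not None
print(matches_anbn("aabb"), grammar_accepts_aab, regex_accepts_aab)
```

The recursion depth plays the role of the counter: each a that is stripped from the front must be paired with a b stripped from the back.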
Brackets in Grammar
- Square Bracket around grammar symbol: Optional Construct

Production A -> X [Y ] Z has the same effect as the two productions A -> X Y Z and A -> X Z.

- Curly Bracket around grammar symbol: zero or more instances

A -> X {Y Z} has the same effect as the infinite sequence of productions

A -> X, A -> X Y Z, A -> X Y Z Y Z, and so on.
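Both bracket notations can be expanded into plain productions mechanically. In this sketch the helper names are hypothetical, and the repetition helper lists only the first few productions of the infinite sequence, up to an assumed bound max_repeats.

```python
def expand_optional(head, before, optional, after):
    """A -> before [optional] after becomes two plain productions."""
    return [(head, before + optional + after), (head, before + after)]


def expand_repetition(head, before, repeated, max_repeats):
    """A -> before {repeated}: list the productions with 0..max_repeats
    copies of `repeated` (a finite prefix of the infinite sequence)."""
    return [(head, before + repeated * k) for k in range(max_repeats + 1)]


optional_rules = expand_optional("A", ["X"], ["Y"], ["Z"])
repetition_rules = expand_repetition("A", ["X"], ["Y", "Z"], max_repeats=2)
for head, body in optional_rules + repetition_rules:
    print(head, "->", " ".join(body))
```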
