Lecture 04


Introduction to Parsing

Ambiguity and Removing Ambiguity

Outline

• Regular languages revisited

• Parser overview

• Context-free grammars (CFG’s)

• Derivations

• Ambiguity

Compiler Design 1 (2011) 2

Languages and Automata

• Formal languages are very important in CS

– Especially in programming languages

• Regular languages
– The weakest formal languages widely used
– Many applications

• We will also study context-free languages


Limitations of Regular Languages

Intuition: A finite automaton that runs long enough must repeat states
• A finite automaton cannot remember the number of times it has visited a particular state
• Because a finite automaton has finite memory
– Only enough to store which state it is in
– Cannot count, except up to a finite limit
• Many languages are not regular
• E.g., the language of balanced parentheses is not regular: $\{\, (^i\, )^i \mid i \ge 0 \,\}$
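To make the intuition concrete, here is a hypothetical finite automaton for (^i )^i that can count only up to a fixed bound k. The names `make_bounded_dfa` and `accepts` are illustrative. The automaton has 2k + 2 states, so it handles the language correctly up to i = k and must reject every longer member. This is a sketch of the limitation, not a construction from the lecture.

```python
def make_bounded_dfa(k):
    """Transition function of a DFA with states ("open", n) for n <= k,
    ("close", n) for n < k, and "dead": 2k + 2 states in total."""
    def step(state, ch):
        if state == "dead":
            return "dead"
        phase, n = state
        if phase == "open" and ch == "(":
            # out of states: the automaton cannot count past k
            return ("open", n + 1) if n < k else "dead"
        if ch == ")" and n > 0:
            return ("close", n - 1)      # one fewer ')' still owed
        return "dead"
    return step

def accepts(k, s):
    step = make_bounded_dfa(k)
    state = ("open", 0)
    for ch in s:
        state = step(state, ch)
    # accept when no ')' is still owed
    return state != "dead" and state[1] == 0
```

For example, `accepts(3, "((()))")` holds, but `accepts(3, "(((())))")` fails even though the string is in the language. Increasing k only moves the limit; it never removes it.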


The Functionality of the Parser

• Input: sequence of tokens from lexer

• Output: parse tree of the program


Example

• If-then-else statement
if (x == y) then z = 1; else z = 2;
• Parser input
IF (ID == ID) THEN ID = INT; ELSE ID = INT;
• Possible parser output

IF-THEN-ELSE node with three children:
– == with children ID, ID
– = with children ID, INT
– = with children ID, INT
Comparison with Lexical Analysis

Phase     Input                     Output
Lexer     Sequence of characters    Sequence of tokens
Parser    Sequence of tokens        Parse tree


The Role of the Parser

• Not all sequences of tokens are programs . . .

• . . . Parser must distinguish between valid and
invalid sequences of tokens

• We need
– A language for describing valid sequences of tokens
– A method for distinguishing valid from invalid
sequences of tokens


Context-Free Grammars

• Many programming language constructs have a recursive structure

• A STMT is of the form
if COND then STMT else STMT, or
while COND do STMT, or
…
• Context-free grammars are a natural notation for this recursive structure


CFGs (Cont.)

• A CFG consists of
– A set of terminals T
– A set of non-terminals N
– A start symbol S (a non-terminal)
– A set of productions

Assuming $X \in N$, the productions are of the form
$X \to \varepsilon$, or
$X \to Y_1 Y_2 \ldots Y_n$ where $Y_i \in N \cup T$
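One way to hold the four components in code is sketched below in Python. The `CFG` class and its `check` method are illustrative names. The requirement that T and N be disjoint is the usual convention rather than something stated on the slide. An empty right-hand side encodes X → ε.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    terminals: frozenset          # T
    nonterminals: frozenset       # N
    start: str                    # S, must be in N
    productions: tuple            # pairs (X, (Y1, ..., Yn)); () encodes X -> ε

    def check(self):
        """Raise ValueError unless the four components fit the definition."""
        if self.terminals & self.nonterminals:
            raise ValueError("T and N must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be a non-terminal")
        symbols = self.terminals | self.nonterminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not in N")
            for y in rhs:
                if y not in symbols:
                    raise ValueError(f"{y!r} is not in N ∪ T")
        return self

# Balanced parentheses: S -> ( S ) | ε
PARENS = CFG(
    terminals=frozenset({"(", ")"}),
    nonterminals=frozenset({"S"}),
    start="S",
    productions=(("S", ("(", "S", ")")), ("S", ())),
).check()
```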


Notational Conventions

• In these lecture notes

– Non-terminals are written upper-case
– Terminals are written lower-case
– The start symbol is the left-hand side of the
first production


Examples of CFGs

A fragment of our example language (simplified):

STMT → if COND then STMT else STMT

⏐ while COND do STMT
⏐ id = int


Examples of CFGs (cont.)

Grammar for simple arithmetic expressions:

E → E * E
⏐ E + E
⏐ (E)
⏐ id


The Language of a CFG

Read productions as replacement rules:

X → Y1 ... Yn
Means X can be replaced by Y1 ... Yn
X → ε
Means X can be erased (replaced with the empty string)


Key Idea

(1) Begin with a string consisting of the start symbol “S”
(2) Replace any non-terminal X in the string by the right-hand side of some production
X → Y1 … Yn
(3) Repeat (2) until there are no non-terminals in the string
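The three steps translate almost line for line into a loop. The Python sketch below makes steps (1) to (3) with random choices. `derive`, `GRAMMAR`, and the `max_steps` cut-off are illustrative assumptions. The cut-off is needed because random choices on a recursive grammar may never reach a string of terminals.

```python
import random

# The arithmetic grammar E -> E * E | E + E | ( E ) | id, as a dict from
# each non-terminal to its list of right-hand sides
GRAMMAR = {"E": [["E", "*", "E"], ["E", "+", "E"], ["(", "E", ")"], ["id"]]}

def derive(grammar, start, rng, max_steps=50):
    """Follow the Key Idea with random choices; return the final list of
    terminals, or None if non-terminals remain after max_steps replacements."""
    form = [start]                                       # (1) start symbol
    for _ in range(max_steps):
        positions = [i for i, x in enumerate(form) if x in grammar]
        if not positions:
            break                                        # (3) nothing left to replace
        i = rng.choice(positions)                        # (2) any non-terminal X ...
        form[i:i + 1] = rng.choice(grammar[form[i]])     # ... by a right-hand side of X
    return form if all(x not in grammar for x in form) else None
```

Calling `derive(GRAMMAR, "E", random.Random(1))` produces one random member of the language, or `None` if the run was cut off.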


The Language of a CFG (Cont.)

More formally, we write

$X_1 \ldots X_i \ldots X_n \to X_1 \ldots X_{i-1}\, Y_1 \ldots Y_m\, X_{i+1} \ldots X_n$

if there is a production

$X_i \to Y_1 \ldots Y_m$
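The one-step relation can be computed directly: try every position i and every production for X_i. The Python sketch below makes that concrete, representing a sentential form as a tuple. The name `one_step` and the dict encoding of the grammar are illustrative.

```python
def one_step(form, grammar):
    """All sentential forms reachable from `form` in exactly one step."""
    successors = []
    for i, x in enumerate(form):
        for rhs in grammar.get(x, []):      # terminals have no productions
            # X1 ... X(i-1) Y1 ... Ym X(i+1) ... Xn
            successors.append(form[:i] + tuple(rhs) + form[i + 1:])
    return successors

GRAMMAR = {"E": [("E", "+", "E"), ("id",)]}
```

For example, `one_step(("E", "+", "E"), GRAMMAR)` has four entries, including `("id", "+", "E")` and `("E", "+", "id")`. A form made only of terminals has no successors.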


The Language of a CFG (Cont.)

Write

$X_1 \ldots X_n \to^{*} Y_1 \ldots Y_m$

if

$X_1 \ldots X_n \to \cdots \to \cdots \to Y_1 \ldots Y_m$

in 0 or more steps


The Language of a CFG

Let G be a context-free grammar with start symbol S. Then the language of G is:

$L(G) = \{\, a_1 \ldots a_n \mid S \to^{*} a_1 \ldots a_n \text{ and every } a_i \text{ is a terminal} \,\}$
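For small bounds, L(G) can be listed by searching the →* relation breadth-first from S. The Python sketch below does this with a hypothetical helper, `language_upto`. It expands only the left-most non-terminal, which is enough because every terminal string in L(G) has a left-most derivation. Forms longer than the bound are discarded, which assumes the grammar has no ε-productions, so sentential forms never shrink.

```python
from collections import deque

def language_upto(grammar, start, max_len):
    """Strings in L(G) with at most max_len terminals (no ε-productions assumed)."""
    found = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        i = next((k for k, x in enumerate(form) if x in grammar), None)
        if i is None:                       # only terminals: a member of L(G)
            found.add(" ".join(form))
            continue
        for rhs in grammar[form[i]]:        # expand the left-most non-terminal
            new = form[:i] + tuple(rhs) + form[i + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

GRAMMAR = {"E": [("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")"), ("id",)]}
```

With the arithmetic grammar, `language_upto(GRAMMAR, "E", 3)` should give the four strings `id`, `id + id`, `id * id` and `( id )`.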


Terminals

• Terminals are so called because there are no rules for replacing them

• Once generated, terminals are permanent

• Terminals ought to be tokens of the language


Examples

L(G) is the language of the CFG G

Strings of balanced parentheses $\{\, (^i\, )^i \mid i \ge 0 \,\}$

Two grammars:
S → ( S )          S → ( S )
S → ε                 ⏐ ε
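A grammar this small can be turned directly into a recursive recognizer, with one function per non-terminal. The Python sketch below is a hypothetical recognizer for S → ( S ) | ε. It chooses between the two productions by looking at the next character: "(" selects S → ( S ), and anything else selects S → ε. That rule works for this grammar but is not a general parsing method.

```python
def parse_S(s, i):
    """Match S starting at index i; return the index just past it."""
    if i < len(s) and s[i] == "(":          # S -> ( S )
        j = parse_S(s, i + 1)               # inner S
        if j < len(s) and s[j] == ")":
            return j + 1
        raise SyntaxError(f"expected ')' at position {j}")
    return i                                # S -> ε consumes nothing

def in_language(s):
    try:
        return parse_S(s, 0) == len(s)
    except SyntaxError:
        return False
```

`in_language("((()))")` and `in_language("")` hold. `in_language("(()")` and `in_language("()()")` do not, because "()()" is not of the form (^i )^i.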


Example

A fragment of our example language (simplified):

STMT → if COND then STMT

⏐ if COND then STMT else STMT
⏐ while COND do STMT
⏐ id = int
COND → (id == id)
⏐ (id != id)


Example (Cont.)

Some elements of our language

id = int
if (id == id) then id = int else id = int
while (id != id) do id = int
while (id == id) do while (id != id) do id = int
if (id != id) then if (id == id) then id = int else id = int
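Membership of strings like these can be checked with a small recursive-descent recognizer for the STMT/COND grammar. The Python sketch below is one such recognizer. The tokenizer regex, the `StmtRecognizer` class, and `is_stmt` are illustrative. When an `else` follows, the recognizer attaches it to the innermost `if`, which is one way of resolving the choice between the two `if` productions.

```python
import re

def tokenize(text):
    # '==' and '!=' are tried before '=' so they stay whole; \S catches stray characters
    return re.findall(r"==|!=|=|\(|\)|\w+|\S", text)

class StmtRecognizer:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def stmt(self):
        if self.peek() == "if":                 # if COND then STMT [else STMT]
            self.expect("if")
            self.cond()
            self.expect("then")
            self.stmt()
            if self.peek() == "else":           # else goes to the innermost if
                self.expect("else")
                self.stmt()
        elif self.peek() == "while":            # while COND do STMT
            self.expect("while")
            self.cond()
            self.expect("do")
            self.stmt()
        else:                                   # id = int
            self.expect("id")
            self.expect("=")
            self.expect("int")

    def cond(self):                             # (id == id) | (id != id)
        self.expect("(")
        self.expect("id")
        if self.peek() not in ("==", "!="):
            raise SyntaxError(f"expected '==' or '!=', got {self.peek()!r}")
        self.pos += 1
        self.expect("id")
        self.expect(")")

def is_stmt(text):
    r = StmtRecognizer(tokenize(text))
    try:
        r.stmt()
    except SyntaxError:
        return False
    return r.pos == len(r.tokens)               # every token must be used
```

All five strings listed above are accepted. Inputs such as `if (id == id) then` or `id = id` are rejected.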


Arithmetic Example

Simple arithmetic expressions:

E → E + E | E * E | ( E ) | id
Some elements of the language:

id            id + id
(id)          id * id
(id) * id     id * (id)
Notes

The idea of a CFG is a big step.

But:

• Membership in a language is just “yes” or “no”; we also need the parse tree of the input

• Must handle errors gracefully

• Need an implementation of CFG’s (e.g., yacc)


More Notes

• Form of the grammar is important

– Many grammars generate the same language
– Parsing tools are sensitive to the grammar

Note: Tools for regular languages (e.g., lex/ML-Lex) are also sensitive to the form of the regular expression, but this is rarely a problem in practice


Derivations and Parse Trees

A derivation is a sequence of productions
S → … → … → …
A derivation can be drawn as a tree
– Start symbol is the tree’s root
– For a production X → Y1 … Yn, add children Y1 … Yn to node X


Derivation Example

• Grammar

E → E + E | E * E | ( E ) | id
• String

id * id + id


Derivation Example (Cont.)

E
→ E + E
→ E * E + E
→ id * E + E
→ id * id + E
→ id * id + id

Parse tree: the root E → E + E; its left child E → E * E; each remaining E → id
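The tree can be built by replaying the derivation. Each step adds children Y1 … Yn under the left-most non-terminal leaf that has not yet been expanded. In the Python sketch below, `Node`, `replay_leftmost`, `leftmost_open`, and `leaves` are illustrative names. The sketch assumes no ε-productions, because an ε-expanded node would have no children and would be picked again. Reading the leaves left to right gives back the input.

```python
class Node:
    def __init__(self, symbol):
        self.symbol = symbol
        self.children = []                      # stays empty for leaves

def leftmost_open(node, nonterminals):
    """Left-most leaf that is a still-unexpanded non-terminal, or None."""
    if not node.children:
        return node if node.symbol in nonterminals else None
    for child in node.children:
        found = leftmost_open(child, nonterminals)
        if found is not None:
            return found
    return None

def replay_leftmost(start, steps, nonterminals):
    """Apply each right-hand side in `steps` to the left-most open non-terminal."""
    root = Node(start)
    for rhs in steps:
        node = leftmost_open(root, nonterminals)
        if node is None:
            raise ValueError("no non-terminal left to expand")
        node.children = [Node(y) for y in rhs]  # add children Y1 ... Yn to node X
    return root

def leaves(node):
    """Leaves of the tree, read left to right."""
    if not node.children:
        return [node.symbol]
    return [leaf for child in node.children for leaf in leaves(child)]

# E -> E+E -> E*E+E -> id*E+E -> id*id+E -> id*id+id
STEPS = [["E", "+", "E"], ["E", "*", "E"], ["id"], ["id"], ["id"]]
tree = replay_leftmost("E", STEPS, {"E"})
print(" ".join(leaves(tree)))   # id * id + id
```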
Notes on Derivations

• A parse tree has
– Terminals at the leaves
– Non-terminals at the interior nodes

• A left-to-right (in-order) traversal of the leaves is the original input

• The parse tree shows the association of operations; the input string does not


Leftmost and Rightmost Derivations

• The example is a left-most derivation
– At each step, replace the left-most non-terminal
• There is an equivalent notion of a right-most derivation

E
→ E + E
→ E + id
→ E * E + id
→ E * id + id
→ id * id + id
Derivations and Parse Trees

• Note that right-most and left-most derivations have the same parse tree

• The difference is just in the order in which branches are added
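Starting from one fixed parse tree, both derivations can be read off by expanding its interior nodes in different orders. The Python sketch below shows this. The nested-tuple tree encoding and the function `derivation` are illustrative. Expanding left-most first reproduces the left-most derivation of id * id + id, and expanding right-most first reproduces the right-most one. The tree is the same in both cases.

```python
# Parse tree of id * id + id: ("E", [children]); terminals are plain strings
TREE = ("E", [("E", [("E", ["id"]), "*", ("E", ["id"])]), "+", ("E", ["id"])])

def derivation(tree, rightmost=False):
    """Sentential forms obtained by expanding the tree's interior nodes,
    left-most first (default) or right-most first."""
    form = [tree]            # items are unexpanded subtrees or terminal strings
    forms = []
    while True:
        forms.append(" ".join(x if isinstance(x, str) else x[0] for x in form))
        positions = [i for i, x in enumerate(form) if not isinstance(x, str)]
        if not positions:
            return forms
        i = positions[-1] if rightmost else positions[0]
        form[i:i + 1] = form[i][1]   # replace the non-terminal by its children

for line in derivation(TREE, rightmost=True):
    print(line)
```

Both lists of forms start at `E` and end at `id * id + id`. They differ only in the order in which branches of the same tree are filled in.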


Summary of Derivations

• We are not just interested in whether s ∈ L(G)
– We need a parse tree for s

• A derivation defines a parse tree
– But one parse tree may have many derivations

• Left-most and right-most derivations are important in parser implementation


Ambiguity

• What is an ambiguous grammar?

• A CFG is ambiguous if there exists more than one parse (derivation) tree for some input string.
• Equivalently, some string has more than one left-most derivation (or more than one right-most derivation). Every string in the language has both a left-most and a right-most derivation, so having both is not what makes a grammar ambiguous.
• This creates uncertainty about how to parse such strings, leading to multiple interpretations.

• Grammar
E → E + E | E * E | ( E ) | int

• String
int * int + int
Ambiguity (Cont.)

This string has two parse trees

Tree 1: E → E + E, with the left E → E * E, i.e. (int * int) + int
Tree 2: E → E * E, with the right E → E + E, i.e. int * (int + int)
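Listing every parse tree of a short string makes the ambiguity visible. The Python sketch below uses a hypothetical helper, `parse_trees`, that tries every split point for E → E + E and E → E * E. Each binary node is printed inside square brackets so that it is not confused with the grammar's own parentheses from E → ( E ).

```python
from functools import lru_cache

def parse_trees(tokens):
    """All parse trees of `tokens` under E -> E + E | E * E | ( E ) | int,
    each written as a bracketed string."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(i, j):                       # trees deriving tokens[i:j] from E
        results = []
        if j - i == 1 and tokens[i] == "int":                           # E -> int
            results.append("int")
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":    # E -> ( E )
            results.extend(f"({t})" for t in trees(i + 1, j - 1))
        for k in range(i + 1, j - 1):                                   # E -> E op E
            if tokens[k] in ("+", "*"):
                for left in trees(i, k):
                    for right in trees(k + 1, j):
                        results.append(f"[{left} {tokens[k]} {right}]")
        return tuple(results)

    return list(trees(0, len(tokens)))

print(parse_trees("int * int + int".split()))
```

The string int * int + int yields exactly the two trees above. With four operands, the count grows to five.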


Ambiguity (Cont.)

• A grammar is ambiguous if it has more than one parse tree for some string
– Equivalently, there is more than one right-most or left-most derivation for some string
• Ambiguity is bad
– Leaves meaning of some programs ill-defined
• Ambiguity is common in programming languages
– Arithmetic expressions
– IF-THEN-ELSE


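Another ambiguous grammar: S -> aSbS | bSaS | ε
For example, the string abab has two parse trees: S -> aSbS with the first S -> ε and the second S -> aSbS (both inner S -> ε), or S -> aSbS with the first S -> bSaS (both inner S -> ε) and the second S -> ε.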
Grammar:
E -> E + E
E -> E * E
E -> id
Input string: id + id * id

The leftmost derivation can be done in two ways:

Derivation 1:
1. E -> E + E
2. id + E
3. id + E * E
4. id + id * E
5. id + id * id

Derivation 2:
1. E -> E * E
2. E + E * E
3. id + E * E
4. id + id * E
5. id + id * id

For the given input string, we got two leftmost derivations, and hence two parse trees. We need to eliminate the ambiguity in the grammar.
Dealing with Ambiguity

There are several ways to handle ambiguity:

Modifying Grammar Rules:

Change the production rules to ensure a unique parse tree for each valid string. For example, the grammar
E → T + E | T
T → int * T | int | ( E )
enforces precedence of * over +.

Operator Precedence and Associativity:

Define the precedence and associativity of operators explicitly.
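The grammar E → T + E | T, T → int * T | int | ( E ) is easy to turn into a recursive-descent parser, with one method per non-terminal. The Python sketch below is such a parser. The class `Parser`, the function `parse`, and the bracketed output format are illustrative. The output shows that * always groups before +.

```python
import re

def tokenize(text):
    return re.findall(r"int|[()+*]", text)

class Parser:
    def __init__(self, tokens):
        self.toks, self.pos = tokens, 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def eat(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def E(self):                          # E -> T + E | T
        left = self.T()
        if self.peek() == "+":
            self.eat("+")
            return f"[{left} + {self.E()}]"
        return left

    def T(self):                          # T -> int * T | int | ( E )
        if self.peek() == "(":
            self.eat("(")
            inner = self.E()
            self.eat(")")
            return f"({inner})"
        self.eat("int")
        if self.peek() == "*":
            self.eat("*")
            return f"[int * {self.T()}]"
        return "int"

def parse(text):
    p = Parser(tokenize(text))
    tree = p.E()
    if p.peek() is not None:
        raise SyntaxError(f"unexpected {p.peek()!r}")
    return tree
```

Both `parse("int * int + int")` and `parse("int + int * int")` put the * inside the +. Because the rules are right-recursive, this grammar groups int + int + int as `[int + [int + int]]`. That right-grouping is harmless for + and *, but it would matter for an operator such as -.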


Modifying Grammar

E → E + E | E * E | (E) | id
This grammar is ambiguous because the expression id + id * id has two parse trees, (id + id) * id and id + (id * id). Similarly, id + id + id can be grouped either left-associatively or right-associatively.

E → E + T | T
T → T * F | F
F → (E) | id

In this grammar:
• + has lower precedence than *.
• + is left-associative.
• * is left-associative.
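A common way to parse E → E + T | T with recursive descent is to read each left-recursive rule as a loop: E is T followed by zero or more "+ T". The loop then folds the results to the left. The Python sketch below is a parser built this way. The function `parse` and the bracketed output are illustrative. The output confirms the three properties listed above.

```python
import re

def parse(text):
    """Recognize E -> E + T | T, T -> T * F | F, F -> ( E ) | id and return
    the grouping as a bracketed string; left recursion is run as a loop."""
    toks = re.findall(r"id|[()+*]", text)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def eat(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {peek()!r}")
        pos += 1

    def E():                              # E -> E + T | T,  i.e. T { + T }
        tree = T()
        while peek() == "+":
            eat("+")
            tree = f"[{tree} + {T()}]"    # fold left: + is left-associative
        return tree

    def T():                              # T -> T * F | F,  i.e. F { * F }
        tree = F()
        while peek() == "*":
            eat("*")
            tree = f"[{tree} * {F()}]"    # fold left: * is left-associative
        return tree

    def F():                              # F -> ( E ) | id
        if peek() == "(":
            eat("(")
            inner = E()
            eat(")")
            return f"({inner})"
        eat("id")
        return "id"

    tree = E()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree
```

For example, `parse("id + id * id")` gives `[id + [id * id]]`, and `parse("id + id + id")` gives `[[id + id] + id]`.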
Ambiguity: The Dangling Else
• Consider the following grammar

S → if C then S
| if C then S else S
| OTHER

• This grammar is also ambiguous


The Dangling Else: Example

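• The expression
if C1 then if C2 then S3 else S4
has two parse trees

Tree 1: if C1 then (if C2 then S3) else S4, i.e. the else belongs to the outer if
Tree 2: if C1 then (if C2 then S3 else S4), i.e. the else belongs to the inner if

• Typically we want the second form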
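A hand-written parser usually picks the second form without changing the grammar. After parsing a then-branch, it takes an `else` if one follows, so the else always goes to the innermost open `if`. The Python sketch below is such a parser, with a hypothetical `parse_stmt` that returns nested tuples. It assumes every condition is a single token and treats any other single token as OTHER.

```python
def parse_stmt(toks, pos=0):
    """Parse S -> if C then S | if C then S else S | OTHER from toks[pos:].
    Returns (tree, next_pos). An 'else' is attached to the innermost open 'if'."""
    if pos >= len(toks):
        raise SyntaxError("unexpected end of input")
    if toks[pos] in ("then", "else"):
        raise SyntaxError(f"unexpected {toks[pos]!r}")
    if toks[pos] != "if":
        return toks[pos], pos + 1                     # OTHER: any other token
    if pos + 2 >= len(toks) or toks[pos + 2] != "then":
        raise SyntaxError("expected 'if C then'")
    cond = toks[pos + 1]
    then_branch, pos = parse_stmt(toks, pos + 3)
    if pos < len(toks) and toks[pos] == "else":       # closest unmatched then takes it
        else_branch, pos = parse_stmt(toks, pos + 1)
        return ("if", cond, then_branch, else_branch), pos
    return ("if", cond, then_branch), pos

tree, _ = parse_stmt("if C1 then if C2 then S3 else S4".split())
print(tree)  # ('if', 'C1', ('if', 'C2', 'S3', 'S4'))
```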

The Dangling Else: A Fix

• else matches the closest unmatched then

• We can describe this in the grammar

S → MIF        /* all then are matched */
  | UIF        /* some then are unmatched */
MIF → if C then MIF else MIF
  | OTHER
UIF → if C then S
  | if C then MIF else UIF

• Describes the same set of strings
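To check that the MIF/UIF grammar really removes the ambiguity, one can count parse trees by brute force under both grammars. The Python sketch below uses a hypothetical `count_trees` to do this for if C then if C then OTHER else OTHER. It writes every condition as the token C and every other statement as OTHER. It assumes no ε-productions and no cycles of unit productions, which holds for both grammars here. The count should be 2 for the original grammar and 1 for the rewritten one.

```python
from functools import lru_cache

AMBIGUOUS = {
    "S": [["if", "C", "then", "S"], ["if", "C", "then", "S", "else", "S"], ["OTHER"]],
}
MATCHED = {
    "S": [["MIF"], ["UIF"]],
    "MIF": [["if", "C", "then", "MIF", "else", "MIF"], ["OTHER"]],
    "UIF": [["if", "C", "then", "S"], ["if", "C", "then", "MIF", "else", "UIF"]],
}

def count_trees(grammar, start, tokens):
    """Number of parse trees of `tokens` from `start` (symbols not in the
    grammar dict are terminals; every symbol must cover at least one token)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def sym(x, i, j):                       # trees for symbol x over tokens[i:j]
        if x not in grammar:
            return 1 if j == i + 1 and tokens[i] == x else 0
        return sum(seq(tuple(rhs), i, j) for rhs in grammar[x])

    @lru_cache(maxsize=None)
    def seq(rhs, i, j):                     # ways rhs covers tokens[i:j]
        if not rhs:
            return 1 if i == j else 0
        first, rest = rhs[0], rhs[1:]
        total = 0
        for k in range(i + 1, j - len(rest) + 1):
            left = sym(first, i, k)
            if left:
                total += left * seq(rest, k, j)
        return total

    return sym(start, 0, len(tokens))

TOKENS = "if C then if C then OTHER else OTHER".split()
print(count_trees(AMBIGUOUS, "S", TOKENS), count_trees(MATCHED, "S", TOKENS))  # 2 1
```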


The Dangling Else: Example Revisited

• The expression if C1 then if C2 then S3 else S4

• if C1 then (if C2 then S3 else S4): a valid parse tree (for a UIF)
• if C1 then (if C2 then S3) else S4: not valid, because the then expression is not a MIF


Ambiguity

• There are no general techniques for handling ambiguity

• It is impossible to automatically convert an ambiguous grammar to an unambiguous one

• Used with care, ambiguity can simplify the grammar
– Sometimes allows more natural definitions
– We need disambiguation mechanisms


Precedence and Associativity Declarations

• Instead of rewriting the grammar
– Use the more natural (ambiguous) grammar
– Along with disambiguating declarations

• Most tools allow precedence and associativity declarations to disambiguate grammars

• Examples …


Associativity Declarations

• Consider the grammar E → E + E | int

• Ambiguous: two parse trees of int + int + int

Tree 1: E → E + E, with the left E → E + E, i.e. (int + int) + int
Tree 2: E → E + E, with the right E → E + E, i.e. int + (int + int)

• Left associativity declaration: %left +


Precedence Declarations

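• Consider the grammar E → E + E | E * E | int
– And the string int + int * int

Tree 1: E → E * E, with the left E → E + E, i.e. (int + int) * int
Tree 2: E → E + E, with the right E → E * E, i.e. int + (int * int)

• Precedence declarations:
%left +
%left *
(in yacc, declarations listed later have higher precedence, so * binds tighter than +)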
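The effect of the declarations can be reproduced with precedence climbing, which parses the ambiguous grammar directly from a table of operator levels. This is a Python sketch, not how yacc works internally: yacc uses the declarations to resolve shift/reduce conflicts in its LR tables. The `DECLARATIONS` table format and the `parse` function are hypothetical.

```python
import re

# Mirrors "%left +" then "%left *": later lines get higher levels (bind tighter)
DECLARATIONS = [("left", "+"), ("left", "*")]
PREC = {op: (level, assoc) for level, (assoc, op) in enumerate(DECLARATIONS, start=1)}

def parse(text, prec=PREC):
    """Precedence climbing over E -> E + E | E * E | int; returns a bracketed string."""
    toks = re.findall(r"int|[+*]", text)
    pos = 0

    def primary():
        nonlocal pos
        if pos >= len(toks) or toks[pos] != "int":
            raise SyntaxError(f"expected 'int' at token {pos}")
        pos += 1
        return "int"

    def expr(min_level):
        nonlocal pos
        left = primary()
        while pos < len(toks) and toks[pos] in prec and prec[toks[pos]][0] >= min_level:
            op = toks[pos]
            level, assoc = prec[op]
            pos += 1
            # %left: the right operand may contain only tighter operators
            right = expr(level + 1 if assoc == "left" else level)
            left = f"[{left} {op} {right}]"
        return left

    tree = expr(1)
    if pos != len(toks):
        raise SyntaxError(f"unexpected {toks[pos]!r}")
    return tree
```

With the table above, `parse("int + int * int")` picks tree 2, `[int + [int * int]]`, and `parse("int + int + int")` groups to the left. Declaring + as `"right"` instead flips the second result.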
Grammar
1. X -> X - X
2. X -> var/const
Here var can be any variable, and const can be any constant value. The string a - b - c has two leftmost derivations (var stands for a, b and c in turn):

Derivation 1: X -> X - X -> X - X - X -> var - X - X -> var - var - X -> var - var - var
Derivation 2: X -> X - X -> var - X -> var - X - X -> var - var - X -> var - var - var

For example, if we take the values a = 2, b = 3 and c = 4:
a - b - c = 2 - 3 - 4 = -5
In the first derivation tree, according to the order of substitution, the expression is evaluated as:
(a - b) - c = (2 - 3) - 4 = -1 - 4 = -5
In the second derivation tree: a - (b - c) = 2 - (3 - 4) = 2 - (-1) = 3
The two parse trees do not give the same value, so they have different meanings. The expression contains the same operator twice, and by the usual mathematical rules it must be evaluated according to the associativity of the operator. Since - is left-associative, the first derivation tree, (a - b) - c, is the correct parse tree.
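The two groupings correspond to folding the operands from the left or from the right. The short Python check below reproduces both values for a = 2, b = 3, c = 4. The helper names `left_assoc` and `right_assoc` are illustrative.

```python
from functools import reduce

def left_assoc(values):
    """((v0 - v1) - v2) - ...: the grouping of the first derivation tree."""
    return reduce(lambda acc, v: acc - v, values)

def right_assoc(values):
    """v0 - (v1 - (v2 - ...)): the grouping of the second derivation tree."""
    return reduce(lambda acc, v: v - acc, reversed(values))

print(left_assoc([2, 3, 4]), right_assoc([2, 3, 4]))  # -5 3
```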
Grammar:
E -> E + E
E -> E * E
E -> id
Input string: id + id * id

The leftmost derivation can be done in two ways:

Derivation 1:
1. E -> E + E
2. id + E
3. id + E * E
4. id + id * E
5. id + id * id
If id = 2, this tree gives id + (id * id) = 2 + (2 * 2) = 2 + 4 = 6

Derivation 2:
1. E -> E * E
2. E + E * E
3. id + E * E
4. id + id * E
5. id + id * id
If id = 2, this tree gives (id + id) * id = (2 + 2) * 2 = 4 * 2 = 8
