G52CMP COMPILERS
Syntax Analysis
Lecture 5
1
Learning objectives
• To describe the process of checking the syntax of a
program
• To describe the representation of the syntax of a
programming language using a context-free grammar
• To understand the acceptance of a sentence for a
grammar using derivation
• To understand the problem of ambiguity in grammars
and how to resolve it
• To describe the top-down and bottom-up parsing
processes in accepting sentences for a given grammar
2
Syntax Analysis

[Compiler pipeline diagram: Source Program → Lexical Analyser →
Syntax Analyser → Semantic Analyser → Intermediate Code Generator →
Code Optimiser → Code Generator → Target Program, with the Symbol
Table Manager and Error Handler interacting with every phase]
3
Grouping of Tokens
4
Where is the Syntax Analysis?
if (b == 0) a = b;

Lexical Analysis or Scanner:
if ( b == 0 ) a = b ;

Syntax Analysis or Parsing:

[Abstract syntax tree, or parse tree without token names:
if at the root, with children == (over b and 0) and = (over a and b);
an inorder traversal recovers the source]
5
The Role of the Parser
[Diagram: the source program enters the Lexical Analyzer, which hands
tokens to the Parser on request (“get next token”); the Parser builds
a syntax tree for the Semantic Analyzer, which produces an
intermediate representation; all phases consult the symbol table]
6
Parsing Analogy (for English language)
• Syntax analysis for natural languages
• Recognize whether a sentence is grammatically correct
• Identify the function of each word
“I gave him the book”

[Parse tree: sentence → subject (I), verb (gave), indirect object (him),
object = noun phrase → article (the), noun (book)]
7
Parsing Overview
• Goal – determine if the input token stream satisfies
the syntax of the program
• What do we need to do this?
– An expressive way to describe the syntax
– A mechanism that determines if the input token
stream satisfies the syntax description
• For lexical analysis
– Regular expressions describe tokens
– Finite automata provide the mechanism for recognising
tokens in the input stream
8
Use Regular Expressions
• REs can expressively describe tokens
– Easy to implement via DFAs
• So just use them to describe the syntax of a
programming language ???
– NO! – They are not powerful enough to express nested
constructs (blocks, expressions, statements), such as
detecting balanced braces:
{ { { { {
{{} {} {{} { }}}
...
} } } } }
9
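The limitation above can be made concrete: recognising balanced braces needs an unbounded counter, which a DFA (finitely many states) cannot provide. A minimal Python sketch (illustrative, not from the slides):

```python
def balanced(s: str) -> bool:
    """Check balanced braces with a counter, something no DFA can do:
    a DFA has finitely many states, but nesting depth is unbounded."""
    depth = 0
    for ch in s:
        if ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth < 0:          # a '}' with no matching '{'
                return False
    return depth == 0              # every '{' must be closed
```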
Context-Free Grammars
// production rules or grammar:
S → a S a
S → T
T → b T b
T → ε

• Consist of 4 components/properties:
– Terminal symbols = tokens or ε
– Nonterminal symbols = syntactic variables
– Start symbol S = a special nonterminal
– Productions of the form LHS → RHS
• LHS = single nonterminal
• RHS = string of terminals and nonterminals
• Specify how nonterminals may be expanded
• Language generated by a grammar is the set of strings of
terminals derived from the start symbol by repeatedly
applying the productions
– L(G) = language generated by grammar G
10
Context-Free Grammars continued
A set of terminals: basic symbols from which
sentences are formed
A set of nonterminals: syntactic categories
denoting sets of sentences
A set of productions: rules specifying how the
terminals and nonterminals can be combined to
form sentences
The start symbol: a distinguished nonterminal
denoting the language
11
Notational Conventions
To avoid always having to state that "these are the
terminals”, "these are the nonterminals”, and so on, we
shall employ the following notational conventions with
regard to grammars throughout the remainder of this
subject
Terminals: id, +, -, *, /, (, )
Nonterminals: expr, op
Productions:
expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → + | - | * | /
The start symbol: expr
12
Notational Conventions continued
1. These symbols are terminals:
i) Lower-case letters early in the alphabet such as a, b, c.
ii) Operator symbols such as +, -, etc.
iii) Punctuation symbols such as parentheses, comma, etc.
iv) The digits 0, 1, . . . , 9.
v) Boldface strings such as id or if.
2. These symbols are nonterminals:
i) Upper-case letters early in the alphabet such as A, B, C.
ii) The letter S, when it appears, is usually the start symbol.
iii) Lower-case italic names such as expr or stmt.
13
Notational Conventions continued
3. Upper-case letters late in the alphabet, such as X, Y, Z,
represent grammar symbols, that is, either nonterminals
(normally) or terminals.
4. Lower-case letters late in the alphabet, chiefly u, v, . . . , z,
represent strings of terminals.
5. Lower-case Greek letters, α, β, γ, …, represent strings of
grammar symbols. Thus, a generic production could be written
as A → α, indicating that there is a single nonterminal A on
the left of the arrow (the left side of the production) and a
string of grammar symbols α to the right of the arrow (the
right side of the production).
14
Notational Conventions continued
6. If A → α1, A → α2, . . . , A → αk are all productions with A on
the left (we call them A-productions),
we may write A → α1 | α2 | … | αk. We call α1, α2, . . . , αk the
alternatives for A.
7. The left side of the first production is the start symbol.
E → E A E | ( E ) | - E | id
A → + | - | * | /
15
Derivations
A derivation step is an application of a production as a
rewriting rule:
E ⇒ - E
A sequence of derivation steps
E ⇒ - E ⇒ - ( E ) ⇒ - ( id )
is called a derivation of “- ( id )” from E
The symbol ⇒* denotes: derives in zero or more steps
The symbol ⇒+ denotes: derives in one or more steps
E ⇒* - ( id )        E ⇒+ - ( id )
1. α ⇒* α for any string α // reflexive relation
2. If α ⇒* β and β ⇒ γ, then α ⇒* γ // transitive relation
16
Example

• Grammar for the balanced-parentheses language:
– S → ( S ) S
– S → ε // Why is the final S required ?
• 1 nonterminal: S
• 2 terminals: “(”, “)”
• Start symbol: S
• 2 productions
• If the grammar accepts a string, there is a derivation of that
string using the productions. For example:
– S ⇒ (S)S ⇒ (S) ⇒ ((S)S) ⇒ (()S) ⇒ (())
– S ⇒ (S)S ⇒ ((S)S)S ⇒ ((S)S)(S)S ⇒* (())()
17
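The grammar above can be recognised directly by a small top-down procedure, with one function for the nonterminal S. A sketch (function names are illustrative, not from the slides):

```python
def parse_S(s: str, i: int = 0) -> int:
    """Recognise  S -> ( S ) S | epsilon  starting at index i.
    Returns the index just past the S that was matched."""
    if i < len(s) and s[i] == '(':
        j = parse_S(s, i + 1)            # inner S
        if j >= len(s) or s[j] != ')':
            raise SyntaxError("expected ')'")
        return parse_S(s, j + 1)         # trailing S
    return i                             # S -> epsilon

def accepts(s: str) -> bool:
    """A string is accepted if a single S covers all of it."""
    try:
        return parse_S(s) == len(s)
    except SyntaxError:
        return False
```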
More on CFGs
• Shorthand notation – vertical bar for multiple
productions:
S → a S a | T
T → b T b | ε
• CFGs are powerful enough to express the syntax of
most programming languages
• Derivation = successive application of productions
starting from S
• Acceptance? = determine whether there is a derivation for
an input token stream
18
RE is a Subset of CFG
Can inductively build a grammar for each RE:

ε        S → ε
a        S → a
R1 R2    S → S1 S2
R1 | R2  S → S1 | S2
R1*      S → S1 S | ε

Note:
G1 = grammar for R1, with start symbol S1
G2 = grammar for R2, with start symbol S2
19
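The inductive construction in this table can be written down directly. The sketch below assumes a small hypothetical tuple-based RE AST; the production shapes mirror the table above:

```python
import itertools

_fresh = itertools.count()  # supplies fresh start-symbol names

def re_to_cfg(re):
    """Build (start_symbol, productions) for an RE given as a tuple AST:
    ('eps',), ('sym', a), ('cat', r1, r2), ('alt', r1, r2), ('star', r).
    A production is (lhs, rhs_list); an empty rhs_list means epsilon."""
    S = f"S{next(_fresh)}"
    kind = re[0]
    if kind == 'eps':
        return S, [(S, [])]                          # S -> epsilon
    if kind == 'sym':
        return S, [(S, [re[1]])]                     # S -> a
    if kind == 'cat':                                # R1 R2
        S1, P1 = re_to_cfg(re[1])
        S2, P2 = re_to_cfg(re[2])
        return S, P1 + P2 + [(S, [S1, S2])]          # S -> S1 S2
    if kind == 'alt':                                # R1 | R2
        S1, P1 = re_to_cfg(re[1])
        S2, P2 = re_to_cfg(re[2])
        return S, P1 + P2 + [(S, [S1]), (S, [S2])]   # S -> S1 | S2
    if kind == 'star':                               # R1*
        S1, P1 = re_to_cfg(re[1])
        return S, P1 + [(S, [S1, S]), (S, [])]       # S -> S1 S | eps
    raise ValueError(f"unknown RE node: {kind!r}")
```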
Context-Free Languages
A context-free language L(G) is the language defined by
a context-free grammar G
A string α of terminals is in L(G) if and only if S ⇒+ α;
α is called a sentence of G
If S ⇒* α, where α may contain nonterminals, then we
call α a sentential form of G
E ⇒ - E ⇒ - ( E ) ⇒ - ( id )
G1 is equivalent to G2 if L(G1) = L(G2)
20
A Parser
[Diagram: a token stream s (from the lexer) and a context-free
grammar G are the inputs to the Parser, which answers “Yes” if
s is in L(G), “No” otherwise, and reports error messages]
Syntax analyzers (parsers) = CFG acceptors which also
output the corresponding derivation when the token stream
is accepted
Various kinds of parsing: LL(k), LR(k), SLR, LALR
21
Left- & Right-most Derivations
Each derivation step needs to choose
– a nonterminal to rewrite
– a production to apply
Consider the following grammar:
E → E A E | ( E ) | - E | id
A → + | - | * | /
A leftmost derivation always chooses the
leftmost nonterminal to rewrite:
E ⇒lm - E ⇒lm - ( E ) ⇒lm - ( E A E ) ⇒lm - ( E + E )
⇒lm - ( id + E ) ⇒lm - ( id + id )
A rightmost derivation always chooses the
rightmost nonterminal to rewrite:
E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E A E ) ⇒rm - ( E + E )
⇒rm - ( E + id ) ⇒rm - ( id + id ) 22
Parse Trees
* A parse tree is a graphical representation of a derivation that
filters out the order of choosing nonterminals for rewriting
* Many derivations may correspond to the same parse tree, but
every parse tree has associated with it a unique leftmost and a
unique rightmost derivation

// leftmost derivation          // rightmost derivation
E ⇒lm - E                       E ⇒rm - E
  ⇒lm - ( E )                     ⇒rm - ( E )
  ⇒lm - ( E + E )                 ⇒rm - ( E + E )
  ⇒lm - ( id + E )                ⇒rm - ( E + id )
  ⇒lm - ( id + id )               ⇒rm - ( id + id )

[Parse tree: E ⇒ - E; that E ⇒ ( E ); the inner E ⇒ E + E;
both remaining Es ⇒ id]
23
Parse Tree vs Abstract Syntax Tree
S → E + S | E
E → number | ( S )

Derive: (1 + 2 + (3 + 4)) + 5

[Parse tree: S ⇒ E + S; the E covers ( 1 + 2 + ( 3 + 4 ) ) via
E ⇒ ( S ) and further expansions, and the S covers 5]

• Parse tree is a tree representation of the derivation
• Leaves of the tree are terminals
• Internal nodes are non-terminals
• NO information about the order of the derivation steps

[AST: + at the root, over (+ over 1, (+ over 2, (+ over 3 and 4)))
and 5]

• The AST (abstract syntax tree) discards unneeded information:
a more compact format; an inorder traversal recovers the expression
24
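The contrast between the two trees can be made concrete: an AST for this expression keeps only numbers and '+' nodes, and evaluating it is a simple recursion. A sketch (class names are illustrative, not from the slides):

```python
from dataclasses import dataclass

@dataclass
class Num:                  # a leaf: a number
    value: int

@dataclass
class Add:                  # an internal node: left + right
    left: object
    right: object

def evaluate(node):
    """Evaluate an AST; the parentheses and E/S bookkeeping of the
    parse tree are gone, only the structure that matters remains."""
    if isinstance(node, Num):
        return node.value
    return evaluate(node.left) + evaluate(node.right)

# AST for (1 + 2 + (3 + 4)) + 5
ast = Add(Add(Num(1), Add(Num(2), Add(Num(3), Num(4)))), Num(5))
```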
Derivation Order
• Can choose to apply productions in any order:
select a nonterminal and
substitute the RHS of a production
• Two standard orders: leftmost and rightmost
• Consider the following grammar:
S → E + S | E
E → number | ( S )
• Leftmost derivation
– In the string, find the leftmost nonterminal and apply a
production to it
– E + S ⇒ 1 + S
• Rightmost derivation
– Same, but find the rightmost nonterminal
– E + S ⇒ E + E + S
25
Leftmost & Rightmost Derivation
Grammar:  S → E + S | E    E → number | ( S )
Derive: (1 + 2 + (3 + 4)) + 5

Leftmost derivation:            Rightmost derivation:
S ⇒ E + S                       S ⇒ E + S
  ⇒ ( S ) + S                     ⇒ E + E
  ⇒ ( E + S ) + S                 ⇒ E + 5
  ⇒ ( 1 + S ) + S                 ⇒ ( S ) + 5
  ⇒ ( 1 + E + S ) + S             ⇒ ( E + S ) + 5
  ⇒ ( 1 + 2 + S ) + S             ⇒ ( E + E + S ) + 5
  ⇒ ( 1 + 2 + E ) + S             ⇒ ( E + E + E ) + 5
  ⇒ ( 1 + 2 + ( S ) ) + S         ⇒ ( E + E + ( S ) ) + 5
  ⇒ ( 1 + 2 + ( E + S ) ) + S     ⇒ ( E + E + ( E + S ) ) + 5
  ⇒ ( 1 + 2 + ( 3 + S ) ) + S     ⇒ ( E + E + ( E + E ) ) + 5
  ⇒ ( 1 + 2 + ( 3 + E ) ) + S     ⇒ ( E + E + ( E + 4 ) ) + 5
  ⇒ ( 1 + 2 + ( 3 + 4 ) ) + S     ⇒ ( E + E + ( 3 + 4 ) ) + 5
  ⇒ ( 1 + 2 + ( 3 + 4 ) ) + E     ⇒ ( E + 2 + ( 3 + 4 ) ) + 5
  ⇒ ( 1 + 2 + ( 3 + 4 ) ) + 5     ⇒ ( 1 + 2 + ( 3 + 4 ) ) + 5
Result: Same parse tree, same productions chosen, but in different order
26
Ambiguous Grammar
* A grammar is ambiguous if it produces more than
one parse tree for some sentence

E → E + E
  | E * E
  | id

Derive: id + id * id

E ⇒ E + E                E ⇒ E * E
  ⇒ id + E                 ⇒ E + E * E
  ⇒ id + E * E             ⇒ id + E * E
  ⇒ id + id * E            ⇒ id + id * E
  ⇒ id + id * id           ⇒ id + id * id

[Two parse trees: one with + at the root and * in its right subtree,
one with * at the root and + in its left subtree]
27
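Ambiguity is not just cosmetic: the two parse trees suggest different evaluations of the same sentence. A small illustration (the tuple encoding is an assumption, not from the slides):

```python
def eval_tree(t):
    """Evaluate a tree given as an int leaf or (op, left, right)."""
    if isinstance(t, int):
        return t
    op, left, right = t
    if op == '+':
        return eval_tree(left) + eval_tree(right)
    return eval_tree(left) * eval_tree(right)

# The two parse trees for 1 + 2 * 3 under E -> E + E | E * E | id:
plus_at_root = ('+', 1, ('*', 2, 3))   # evaluates as 1 + (2 * 3)
star_at_root = ('*', ('+', 1, 2), 3)   # evaluates as (1 + 2) * 3
```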
Resolving Ambiguity
* Use disambiguating rules to throw away undesirable
parse trees
* Rewrite grammars by incorporating disambiguating
rules into grammars
28
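For expression grammars, the usual rewriting encodes precedence and associativity in extra nonterminals. The classic textbook rewriting (not from these slides) of E → E + E | E * E | id is:

```
E → E + T | T        // '+' has lowest precedence, left-associative
T → T * F | F        // '*' binds tighter than '+'
F → ( E ) | id
```

Under this grammar, id + id * id has exactly one parse tree, with + at the root.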
Example
The dangling-else grammar:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other
Which parse tree for if E1 then if E2 then S1 else S2 ?

[Parse tree 1: stmt ⇒ if E then stmt else stmt, with the then-branch
stmt ⇒ if E then stmt (the else matches the outer then)]

[Parse tree 2: stmt ⇒ if E then stmt, with the inner
stmt ⇒ if E then stmt else stmt (the else matches the inner then)]
29
Incorporating Disambiguating Rules
into Grammars
* Rule: match each “else” with the closest previous unmatched
“then”
* Remove undesired state transitions in the pushdown automaton
Which parse tree for if E1 then if E2 then S1 else S2 ?

[Rejected: stmt ⇒ if E then stmt else stmt, with the then-branch
stmt ⇒ if E then stmt (the else would match the outer then)]

[Chosen: stmt ⇒ if E then stmt, with the inner
stmt ⇒ if E then stmt else stmt (the else matches the closest then)]
30
Grammar Rewriting
Consider the ambiguous grammar below:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other
Incorporating disambiguating rules into a grammar:
stmt → m_stmt
     | unm_stmt
m_stmt → if expr then m_stmt else m_stmt
       | other
unm_stmt → if expr then stmt
         | if expr then m_stmt else unm_stmt
Try to derive by yourself:
if E then if E then S else S
31
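With the rewritten grammar, the sentence has a derivation only where the else attaches to the inner then:

```
stmt ⇒ unm_stmt
     ⇒ if expr then stmt
     ⇒ if expr then m_stmt
     ⇒ if expr then if expr then m_stmt else m_stmt
     ⇒* if E then if E then S else S
```

Attaching the else to the outer then would need stmt ⇒ if expr then stmt else stmt, which no production in the rewritten grammar allows.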
Different kinds of Parsing Algorithms
• Two big groups of algorithms can be distinguished:
– bottom-up strategies
– top-down strategies
• Analogy: parsing of the “Micro-English” grammar below:

Sentence ::= Subject Verb Object .
Subject  ::= I | A Noun | The Noun
Object   ::= me | a Noun | the Noun
Noun     ::= cat | bat | rat
Verb     ::= like | is | see | sees

The cat sees a rat.      The rat sees me.    I like a cat.
The cat sees the rat.    The rat like me.    I see the rat.
                         I sees a rat.
                         I like the rat.
// Shhh… I don’t like rat. 32
Bottom up parsing
The parse tree “grows” from the bottom (leaves) up to the top (root).

[Tree for “The cat sees a rat .”: the words are grouped first into
Noun, Verb and Noun, then into Subject and Object, and finally into
Sentence at the root]
33
Top-down parsing
The parse tree is constructed starting at the top (root) down to the
bottom (leaves).

[Tree for “The cat sees a rat .”: Sentence is expanded first into
Subject Verb Object, then Subject and Object into their Nouns, until
the words at the leaves are matched]
34
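Top-down parsing of the Micro-English grammar can be sketched as a recursive-descent parser, with one procedure per nonterminal (an illustrative sketch, not from the slides):

```python
class Parser:
    """Recursive-descent recognizer for the Micro-English grammar."""

    def __init__(self, words):
        self.words = words
        self.pos = 0

    def next(self):
        return self.words[self.pos] if self.pos < len(self.words) else None

    def eat(self, *expected):
        if self.next() in expected:
            self.pos += 1
            return
        raise SyntaxError(f"expected one of {expected}, got {self.next()}")

    def sentence(self):              # Sentence ::= Subject Verb Object .
        self.subject()
        self.eat("like", "is", "see", "sees")   # Verb
        self.object_()
        self.eat(".")
        if self.pos != len(self.words):
            raise SyntaxError("trailing words")

    def subject(self):               # Subject ::= I | A Noun | The Noun
        if self.next() == "I":
            self.eat("I")
        else:
            self.eat("A", "The")
            self.eat("cat", "bat", "rat")       # Noun

    def object_(self):               # Object ::= me | a Noun | the Noun
        if self.next() == "me":
            self.eat("me")
        else:
            self.eat("a", "the")
            self.eat("cat", "bat", "rat")       # Noun

def accepts(sentence: str) -> bool:
    try:
        Parser(sentence.replace(".", " .").split()).sentence()
        return True
    except SyntaxError:
        return False
```

Note that, like the grammar itself, this parser happily accepts “The rat like me.”: it checks Micro-English syntax, not real English agreement.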
Quick Review
• Syntactic analysis
– Lexical analysis
• Group letters into words (or group characters into
tokens)
• Use regular expressions and deterministic FSMs
– Grammar transformations before parsing, coming soon
• Left-recursion removal
• Left-factoring
• Substitution
– Parsing = structural analysis of program
• Group words into sentences, paragraphs, and
documents (or tokens into expressions, commands, and
programs)
• Top-down Parsing
• Bottom-up Parsing
35
END
36