CS335: Syntax Analysis

Swarnendu Biswas

Semester 2019-2020-II
CSE, IIT Kanpur

Content influenced by many excellent references, see References slide for acknowledgements.
An Overview of Compilation
[Figure: phases of a compiler. source program → lexical analyzer → syntax analyzer → semantic analyzer → intermediate code generator → code optimizer → code generator → target program. The symbol table and the error handler interact with all phases.]

CS 335 Swarnendu Biswas


Parser Interface

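[Figure: source program → Lexical Analyzer → (token) → Syntax Analyzer → (parse tree) → Rest of Front End → IR. The Syntax Analyzer asks the Lexical Analyzer for input with "get next token". All three components access the symbol table.]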


Need for Checking Syntax
• Given an input program, the scanner generates a stream of tokens
classified according to their syntactic category
• The parser determines if the input program, represented by the token
stream, is a valid sentence in the programming language
• The parser attempts to build a derivation for the input program, using
a grammar for the programming language
• If the input stream is a valid program, the parser builds a valid model for later
phases
• If the input stream is invalid, the parser reports the problem and diagnostic
information to the user


Syntax Analysis
• Given a programming language grammar 𝐺 and a stream of tokens 𝑠,
parsing tries to find a derivation in 𝐺 that produces 𝑠
• In addition, a syntax analyser
• Forwards the information as IR to the next compilation phases
• Handles errors if the input string is not in 𝐿(𝐺)


Context-Free Grammars
• A context-free grammar (CFG) 𝐺 is a quadruple (𝑇, 𝑁𝑇, 𝑆, 𝑃)

𝑇    Set of terminal symbols (also called words) in the language 𝐿(𝐺)
𝑁𝑇   Set of nonterminal symbols that appear in the productions of 𝐺
𝑆    Goal or start symbol of the grammar 𝐺
𝑃    Set of productions (or rules) in 𝐺


Context-Free Grammars
• Terminal symbols correspond to syntactic categories returned by the
scanner
• A terminal symbol is a word that can occur in a sentence
• Nonterminals are syntactic variables introduced to provide
abstraction and structure in the productions
• 𝑆 represents the set of sentences in 𝐿(𝐺)
• Each rule in 𝑃 is of the form 𝑁𝑇 → (𝑇 ∪ 𝑁𝑇)∗, that is, a single nonterminal
on the left and a (possibly empty) string of terminals and nonterminals on the right
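
The quadruple (𝑇, 𝑁𝑇, 𝑆, 𝑃) maps naturally onto a small data structure. The following is a minimal sketch, not part of the original slides: the class name `Grammar`, its field names, and the method `is_well_formed` are hypothetical, and the example instance encodes the expression grammar used later in the slides with ASCII `-` standing in for −.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """A CFG as the quadruple (T, NT, S, P). All names here are illustrative."""
    terminals: frozenset      # T
    nonterminals: frozenset   # NT
    start: str                # S
    productions: tuple        # P, as (lhs, rhs) pairs; rhs is a tuple of symbols

    def is_well_formed(self) -> bool:
        """Check that every rule has the form NT -> (T ∪ NT)*."""
        symbols = self.terminals | self.nonterminals
        return (
            self.start in self.nonterminals
            and not (self.terminals & self.nonterminals)
            and all(
                lhs in self.nonterminals and all(s in symbols for s in rhs)
                for lhs, rhs in self.productions
            )
        )


# Expr -> ( Expr ) | Expr Op name | name ;  Op -> + | - | × | ÷
EXPR_GRAMMAR = Grammar(
    terminals=frozenset({"(", ")", "+", "-", "×", "÷", "name"}),
    nonterminals=frozenset({"Expr", "Op"}),
    start="Expr",
    productions=(
        ("Expr", ("(", "Expr", ")")),
        ("Expr", ("Expr", "Op", "name")),
        ("Expr", ("name",)),
        ("Op", ("+",)),
        ("Op", ("-",)),
        ("Op", ("×",)),
        ("Op", ("÷",)),
    ),
)
```

`EXPR_GRAMMAR.is_well_formed()` holds for this instance; a rule whose right-hand side mentions a symbol outside 𝑇 ∪ 𝑁𝑇 makes the check fail.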



Definitions
• A derivation is a sequence of rewriting steps that begins with the
grammar 𝐺’s start symbol and ends with a sentence in the language
𝑆 ⇒⁺ 𝑤 where 𝑤 ∈ 𝐿(𝐺)
• At each point during the derivation process, the string is a sequence of
terminal and nonterminal symbols
𝛼𝐴𝛽 ⇒ 𝛼𝛾𝛽 if 𝐴 → 𝛾 is a production
• Such a string is called a sentential form if it occurs in some step of a
valid derivation
• A sentential form can be derived from the start symbol in zero or
more steps


Example of a CFG
CFG

𝐸𝑥𝑝𝑟 → (𝐸𝑥𝑝𝑟)
     | 𝐸𝑥𝑝𝑟 𝑂𝑝 name
     | name
𝑂𝑝 → + | − | × | ÷

Derivation of (𝒂 + 𝒃) × 𝒄

𝐸𝑥𝑝𝑟 ⇒ 𝐸𝑥𝑝𝑟 𝑂𝑝 name
     ⇒ 𝐸𝑥𝑝𝑟 × name
     ⇒ (𝐸𝑥𝑝𝑟) × name
     ⇒ (𝐸𝑥𝑝𝑟 𝑂𝑝 name) × name
     ⇒ (𝐸𝑥𝑝𝑟 + name) × name
     ⇒ (name + name) × name


Sentential Form and Parse Tree

Sentential forms

𝐸𝑥𝑝𝑟 ⇒ 𝐸𝑥𝑝𝑟 𝑂𝑝 name
     ⇒ 𝐸𝑥𝑝𝑟 × name
     ⇒ (𝐸𝑥𝑝𝑟) × name
     ⇒ (𝐸𝑥𝑝𝑟 𝑂𝑝 name) × name
     ⇒ (𝐸𝑥𝑝𝑟 + name) × name
     ⇒ (name + name) × name

Parse Tree

𝐸𝑥𝑝𝑟
  𝐸𝑥𝑝𝑟
    (
    𝐸𝑥𝑝𝑟
      𝐸𝑥𝑝𝑟
        name
      𝑂𝑝
        +
      name
    )
  𝑂𝑝
    ×
  name

Parse Tree
• A parse tree is a graphical representation of a derivation
• The root is labeled by the start symbol 𝑆
• Each internal node is a nonterminal, and represents the application of a
production
• Leaves are labeled by terminals and constitute a sentential form, read from
left to right, called the yield or frontier of the tree
• A parse tree filters out the order in which productions are applied to
replace nonterminals
• It just represents the rules applied
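
To make the yield (frontier) concrete, here is a minimal sketch that stores a parse tree as nested tuples and reads its leaves from left to right. The representation (a `(label, children)` pair for an internal node, a plain string for a leaf) and the names `TREE` and `tree_yield` are assumptions made for illustration; the tree is the one for (name + name) × name.

```python
# An internal node is (label, children); a leaf is a plain string (a terminal).
TREE = ("Expr", [
    ("Expr", [
        "(",
        ("Expr", [("Expr", ["name"]), ("Op", ["+"]), "name"]),
        ")",
    ]),
    ("Op", ["×"]),
    "name",
])


def tree_yield(node):
    """Return the leaves of the tree, read from left to right."""
    if isinstance(node, str):
        return [node]
    _label, children = node
    leaves = []
    for child in children:
        leaves.extend(tree_yield(child))
    return leaves


FRONTIER = " ".join(tree_yield(TREE))
```

The tree records which productions were applied but not the order of application, so a leftmost and a rightmost derivation of the same sentence produce the same nested structure.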



Derivations
• At each step during derivation, we have two choices to make
1. Which nonterminal to rewrite?
2. Which production rule to pick?

• Rightmost (or canonical) derivation rewrites the rightmost
nonterminal at each step, denoted by 𝛼 ⇒ᵣₘ 𝛽
• Similarly, leftmost derivation rewrites the leftmost nonterminal at each step,
denoted by 𝛼 ⇒ₗₘ 𝛽
• Every step of a leftmost derivation can be written as 𝑤𝐴𝛾 ⇒ₗₘ 𝑤𝛿𝛾, where 𝑤
consists of terminals only and 𝐴 → 𝛿 is the production applied
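
The two derivation orders can be replayed mechanically. The sketch below is illustrative only: `step`, `derive`, and the rule constants are hypothetical names, and sentential forms are kept as lists of symbols. It applies the same six productions of the expression grammar in leftmost and in rightmost order.

```python
NONTERMINALS = {"Expr", "Op"}


def step(form, lhs, rhs, leftmost):
    """Rewrite the leftmost (or rightmost) nonterminal of `form` using lhs -> rhs."""
    indices = range(len(form)) if leftmost else range(len(form) - 1, -1, -1)
    for i in indices:
        if form[i] in NONTERMINALS:
            if form[i] != lhs:
                raise ValueError(f"next nonterminal is {form[i]}, not {lhs}")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no nonterminal left to rewrite")


def derive(rules, leftmost):
    """Return every sentential form of the derivation, starting from Expr."""
    form = ["Expr"]
    forms = [form]
    for lhs, rhs in rules:
        form = step(form, lhs, rhs, leftmost)
        forms.append(form)
    return [" ".join(f) for f in forms]


E_OP = ("Expr", ["Expr", "Op", "name"])
E_PAREN = ("Expr", ["(", "Expr", ")"])
E_NAME = ("Expr", ["name"])
PLUS = ("Op", ["+"])
TIMES = ("Op", ["×"])

LEFTMOST = derive([E_OP, E_PAREN, E_OP, E_NAME, PLUS, TIMES], leftmost=True)
RIGHTMOST = derive([E_OP, TIMES, E_PAREN, E_OP, PLUS, E_NAME], leftmost=False)
```

Both lists end in the sentence `( name + name ) × name` and use the same productions; only the order in which the nonterminals are rewritten differs.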



Leftmost Derivation

Leftmost derivation of (𝒂 + 𝒃) × 𝒄

𝐸𝑥𝑝𝑟 ⇒ 𝐸𝑥𝑝𝑟 𝑂𝑝 name
     ⇒ (𝐸𝑥𝑝𝑟) 𝑂𝑝 name
     ⇒ (𝐸𝑥𝑝𝑟 𝑂𝑝 name) 𝑂𝑝 name
     ⇒ (name 𝑂𝑝 name) 𝑂𝑝 name
     ⇒ (name + name) 𝑂𝑝 name
     ⇒ (name + name) × name

Parse Tree

𝐸𝑥𝑝𝑟
  𝐸𝑥𝑝𝑟
    (
    𝐸𝑥𝑝𝑟
      𝐸𝑥𝑝𝑟
        name
      𝑂𝑝
        +
      name
    )
  𝑂𝑝
    ×
  name

Ambiguous Grammars
• A grammar 𝐺 is ambiguous if some sentence in 𝐿(𝐺) has more than
one rightmost (or leftmost) derivation

• An ambiguous grammar can produce multiple derivations and parse
trees for the same sentence


Example of Ambiguous Grammar
𝑆𝑡𝑚𝑡 → if 𝐸𝑥𝑝𝑟 then 𝑆𝑡𝑚𝑡
     | if 𝐸𝑥𝑝𝑟 then 𝑆𝑡𝑚𝑡 else 𝑆𝑡𝑚𝑡
     | 𝐴𝑠𝑠𝑖𝑔𝑛


Ambiguous Dangling-Else Grammar
if 𝐸𝑥𝑝𝑟1 then if 𝐸𝑥𝑝𝑟2 then 𝐴𝑠𝑠𝑖𝑔𝑛1 else 𝐴𝑠𝑠𝑖𝑔𝑛2

Parse tree 1 (else attached to the inner if)

𝑆𝑡𝑚𝑡
  if 𝐸𝑥𝑝𝑟1 then
  𝑆𝑡𝑚𝑡
    if 𝐸𝑥𝑝𝑟2 then
    𝑆𝑡𝑚𝑡
      𝐴𝑠𝑠𝑖𝑔𝑛1
    else
    𝑆𝑡𝑚𝑡
      𝐴𝑠𝑠𝑖𝑔𝑛2

Parse tree 2 (else attached to the outer if)

𝑆𝑡𝑚𝑡
  if 𝐸𝑥𝑝𝑟1 then
  𝑆𝑡𝑚𝑡
    if 𝐸𝑥𝑝𝑟2 then
    𝑆𝑡𝑚𝑡
      𝐴𝑠𝑠𝑖𝑔𝑛1
  else
  𝑆𝑡𝑚𝑡
    𝐴𝑠𝑠𝑖𝑔𝑛2
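
The ambiguity can also be checked by counting parse trees. The sketch below assumes a simplified token stream in which every 𝐸𝑥𝑝𝑟 is the single token `expr` and every 𝐴𝑠𝑠𝑖𝑔𝑛 is the single token `assign`; the function name `count_parses` is hypothetical. It counts the distinct ways a token range can be derived from 𝑆𝑡𝑚𝑡 under the ambiguous if-then-else grammar.

```python
from functools import lru_cache


def count_parses(tokens):
    """Count parse trees for Stmt -> if Expr then Stmt
                                   | if Expr then Stmt else Stmt
                                   | Assign"""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def stmt(i, j):
        # Number of ways tokens[i:j] can be derived from Stmt.
        if i >= j:
            return 0
        total = 0
        if j - i == 1 and tokens[i] == "assign":
            total += 1
        if tokens[i:i + 3] == ("if", "expr", "then"):
            # Stmt -> if Expr then Stmt
            total += stmt(i + 3, j)
            # Stmt -> if Expr then Stmt else Stmt, trying every split at an else
            for k in range(i + 4, j - 1):
                if tokens[k] == "else":
                    total += stmt(i + 3, k) * stmt(k + 1, j)
        return total

    return stmt(0, len(tokens))


DANGLING = "if expr then if expr then assign else assign".split()
```

For `DANGLING` the count is 2, one tree per attachment of the else, while `if expr then assign else assign` has exactly one tree.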



Dealing with Ambiguous Grammars
• Ambiguous grammars are problematic for compilers
• Compilers use parse trees to interpret the meaning of the expressions during
later stages
• Multiple parse trees can give rise to multiple interpretations

• Fixing ambiguous grammars
• Transform the grammar to remove the ambiguity
• Include rules to disambiguate during derivations
• For example, associativity and precedence


Fixing the Ambiguous Dangling-Else Grammar
• In most programming languages, an else is matched with the closest
unmatched then

𝑆𝑡𝑚𝑡 → if 𝐸𝑥𝑝𝑟 then 𝑆𝑡𝑚𝑡
     | if 𝐸𝑥𝑝𝑟 then 𝑇ℎ𝑒𝑛𝑆𝑡𝑚𝑡 else 𝑆𝑡𝑚𝑡
     | 𝐴𝑠𝑠𝑖𝑔𝑛
𝑇ℎ𝑒𝑛𝑆𝑡𝑚𝑡 → if 𝐸𝑥𝑝𝑟 then 𝑇ℎ𝑒𝑛𝑆𝑡𝑚𝑡 else 𝑇ℎ𝑒𝑛𝑆𝑡𝑚𝑡
         | 𝐴𝑠𝑠𝑖𝑔𝑛


Fixed Dangling-Else Grammar
if 𝐸𝑥𝑝𝑟1 then if 𝐸𝑥𝑝𝑟2 then 𝐴𝑠𝑠𝑖𝑔𝑛1 else 𝐴𝑠𝑠𝑖𝑔𝑛2

𝑆𝑡𝑚𝑡 ⇒ if 𝐸𝑥𝑝𝑟 then 𝑆𝑡𝑚𝑡
     ⇒ if 𝐸𝑥𝑝𝑟 then if 𝐸𝑥𝑝𝑟 then 𝑇ℎ𝑒𝑛𝑆𝑡𝑚𝑡 else 𝑆𝑡𝑚𝑡
     ⇒ if 𝐸𝑥𝑝𝑟 then if 𝐸𝑥𝑝𝑟 then 𝑇ℎ𝑒𝑛𝑆𝑡𝑚𝑡 else 𝐴𝑠𝑠𝑖𝑔𝑛
     ⇒ if 𝐸𝑥𝑝𝑟 then if 𝐸𝑥𝑝𝑟 then 𝐴𝑠𝑠𝑖𝑔𝑛 else 𝐴𝑠𝑠𝑖𝑔𝑛
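
In a hand-written parser, the closest-then rule is often implemented greedily instead of by transcribing the rewritten grammar. The sketch below shows that alternative under the same simplified tokens as before (`expr` for 𝐸𝑥𝑝𝑟, `assign` for 𝐴𝑠𝑠𝑖𝑔𝑛); `parse_stmt` and the tuple shape of the result are hypothetical. After parsing the then-branch, it claims an else immediately if one follows, so the innermost open if always gets it.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement starting at tokens[pos]; return (tree, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "assign":
        return "assign", pos + 1
    if tokens[pos:pos + 3] != ["if", "expr", "then"]:
        raise SyntaxError(f"unexpected token at position {pos}")
    then_branch, pos = parse_stmt(tokens, pos + 3)
    if pos < len(tokens) and tokens[pos] == "else":
        # Greedy choice: the else is claimed by the innermost open if.
        else_branch, pos = parse_stmt(tokens, pos + 1)
        return ("if", then_branch, else_branch), pos
    return ("if", then_branch), pos


TREE, END = parse_stmt("if expr then if expr then assign else assign".split())
```

Here `TREE` is `("if", ("if", "assign", "assign"))`: the outer if has no else branch, and the inner if owns both assignments, which is the pairing the fixed grammar derives.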



Interpreting the Meaning
CFG

𝐸𝑥𝑝𝑟 → (𝐸𝑥𝑝𝑟)
     | 𝐸𝑥𝑝𝑟 𝑂𝑝 name
     | name
𝑂𝑝 → + | − | × | ÷

Rightmost derivation of 𝒂 + 𝒃 × 𝒄

𝐸𝑥𝑝𝑟 ⇒ 𝐸𝑥𝑝𝑟 𝑂𝑝 name
     ⇒ 𝐸𝑥𝑝𝑟 × name
     ⇒ 𝐸𝑥𝑝𝑟 𝑂𝑝 name × name
     ⇒ 𝐸𝑥𝑝𝑟 + name × name
     ⇒ name + name × name


Corresponding Parse Tree
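Derivation of 𝒂 + 𝒃 × 𝒄

𝐸𝑥𝑝𝑟 ⇒ 𝐸𝑥𝑝𝑟 𝑂𝑝 name
     ⇒ 𝐸𝑥𝑝𝑟 × name
     ⇒ 𝐸𝑥𝑝𝑟 𝑂𝑝 name × name
     ⇒ 𝐸𝑥𝑝𝑟 + name × name
     ⇒ name + name × name

Parse tree

𝐸𝑥𝑝𝑟
  𝐸𝑥𝑝𝑟
    𝐸𝑥𝑝𝑟
      name
    𝑂𝑝
      +
    name
  𝑂𝑝
    ×
  name

How do we evaluate the expression? This tree groups the input as (𝒂 + 𝒃) × 𝒄,
which conflicts with the usual precedence of × over +.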


Associativity
𝑠𝑡𝑟𝑖𝑛𝑔 → 𝑠𝑡𝑟𝑖𝑛𝑔 + 𝑠𝑡𝑟𝑖𝑛𝑔 | 𝑠𝑡𝑟𝑖𝑛𝑔 − 𝑠𝑡𝑟𝑖𝑛𝑔 | 0 | 1 | 2 | … | 9

Two parse trees for 9 − 5 + 2

Tree 1, grouping (9 − 5) + 2

𝑠𝑡𝑟𝑖𝑛𝑔
  𝑠𝑡𝑟𝑖𝑛𝑔
    𝑠𝑡𝑟𝑖𝑛𝑔
      9
    −
    𝑠𝑡𝑟𝑖𝑛𝑔
      5
  +
  𝑠𝑡𝑟𝑖𝑛𝑔
    2

Tree 2, grouping 9 − (5 + 2)

𝑠𝑡𝑟𝑖𝑛𝑔
  𝑠𝑡𝑟𝑖𝑛𝑔
    9
  −
  𝑠𝑡𝑟𝑖𝑛𝑔
    𝑠𝑡𝑟𝑖𝑛𝑔
      5
    +
    𝑠𝑡𝑟𝑖𝑛𝑔
      2
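
The two groupings are not just different drawings: they evaluate to different values. The sketch below encodes each grouping as a nested `(left, op, right)` tuple and evaluates it bottom-up; the tuple encoding and the names `evaluate`, `LEFT_TREE`, and `RIGHT_TREE` are assumptions made for illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub}


def evaluate(tree):
    """Evaluate a nested (left, op, right) tuple; integers are leaves."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    return OPS[op](evaluate(left), evaluate(right))


LEFT_TREE = ((9, "-", 5), "+", 2)    # grouping (9 - 5) + 2
RIGHT_TREE = (9, "-", (5, "+", 2))   # grouping 9 - (5 + 2)
```

`evaluate(LEFT_TREE)` gives 6 and `evaluate(RIGHT_TREE)` gives 2, so the grammar has to commit to one grouping for the compiler to assign a single meaning to 9 − 5 + 2.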



Associativity
• If an operand has an operator on both sides, associativity decides which
operator takes the operand: a left-associative operator groups it with the
operator on its left, a right-associative one with the operator on its right
• +, -, *, / are left associative
• ^, = are right associative

• Grammar to generate strings with right associative operators

𝑟𝑖𝑔ℎ𝑡 → 𝑙𝑒𝑡𝑡𝑒𝑟 = 𝑟𝑖𝑔ℎ𝑡 | 𝑙𝑒𝑡𝑡𝑒𝑟
𝑙𝑒𝑡𝑡𝑒𝑟 → 𝑎 | 𝑏 | … | 𝑧


Parse Tree for Right Associative Grammars
Parse tree for a = b = c

𝑟𝑖𝑔ℎ𝑡
  𝑙𝑒𝑡𝑡𝑒𝑟
    a
  =
  𝑟𝑖𝑔ℎ𝑡
    𝑙𝑒𝑡𝑡𝑒𝑟
      b
    =
    𝑟𝑖𝑔ℎ𝑡
      𝑙𝑒𝑡𝑡𝑒𝑟
        c
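
Because the recursion in 𝑟𝑖𝑔ℎ𝑡 → 𝑙𝑒𝑡𝑡𝑒𝑟 = 𝑟𝑖𝑔ℎ𝑡 is on the right, a recursive function that mirrors the rule produces the right-nested tree directly. This is a minimal sketch; `parse_right` and the `(letter, "=", rest)` tuple shape are hypothetical, and the input is assumed to be already split into tokens.

```python
def parse_right(tokens, pos=0):
    """right -> letter = right | letter. Returns (tree, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("expected a letter, found end of input")
    letter = tokens[pos]
    if not (len(letter) == 1 and "a" <= letter <= "z"):
        raise SyntaxError(f"expected a letter at position {pos}")
    if pos + 1 < len(tokens) and tokens[pos + 1] == "=":
        # Recurse on everything to the right of '=' before building this node.
        rest, end = parse_right(tokens, pos + 2)
        return (letter, "=", rest), end
    return letter, pos + 1


TREE, END = parse_right(["a", "=", "b", "=", "c"])
```

`TREE` is `("a", "=", ("b", "=", "c"))`, so the assignment to b is nested inside the assignment to a, matching the grouping a = (b = c).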



Encode Precedence into the Grammar
𝑆𝑡𝑎𝑟𝑡 → 𝐸𝑥𝑝𝑟
𝐸𝑥𝑝𝑟 → 𝐸𝑥𝑝𝑟 + 𝑇𝑒𝑟𝑚 | 𝐸𝑥𝑝𝑟 − 𝑇𝑒𝑟𝑚 | 𝑇𝑒𝑟𝑚
𝑇𝑒𝑟𝑚 → 𝑇𝑒𝑟𝑚 × 𝐹𝑎𝑐𝑡𝑜𝑟 | 𝑇𝑒𝑟𝑚 ÷ 𝐹𝑎𝑐𝑡𝑜𝑟 | 𝐹𝑎𝑐𝑡𝑜𝑟
𝐹𝑎𝑐𝑡𝑜𝑟 → (𝐸𝑥𝑝𝑟) | num | name

Priority increases from top to bottom: operators introduced lower in the grammar bind tighter


Corresponding Parse Tree
Derivation of 𝒂 − 𝒃 + 𝒄

𝑆𝑡𝑎𝑟𝑡 ⇒ 𝐸𝑥𝑝𝑟
      ⇒ 𝐸𝑥𝑝𝑟 + 𝑇𝑒𝑟𝑚
      ⇒ 𝐸𝑥𝑝𝑟 + 𝐹𝑎𝑐𝑡𝑜𝑟
      ⇒ 𝐸𝑥𝑝𝑟 + name
      ⇒ 𝐸𝑥𝑝𝑟 − 𝑇𝑒𝑟𝑚 + name
      ⇒ 𝐸𝑥𝑝𝑟 − 𝐹𝑎𝑐𝑡𝑜𝑟 + name
      ⇒ 𝐸𝑥𝑝𝑟 − name + name
      ⇒ 𝑇𝑒𝑟𝑚 − name + name
      ⇒ 𝐹𝑎𝑐𝑡𝑜𝑟 − name + name
      ⇒ name − name + name

Parse tree

𝐸𝑥𝑝𝑟
  𝐸𝑥𝑝𝑟
    𝐸𝑥𝑝𝑟
      𝑇𝑒𝑟𝑚
        𝐹𝑎𝑐𝑡𝑜𝑟
          name
    −
    𝑇𝑒𝑟𝑚
      𝐹𝑎𝑐𝑡𝑜𝑟
        name
  +
  𝑇𝑒𝑟𝑚
    𝐹𝑎𝑐𝑡𝑜𝑟
      name
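
The layered 𝐸𝑥𝑝𝑟 / 𝑇𝑒𝑟𝑚 / 𝐹𝑎𝑐𝑡𝑜𝑟 grammar translates into one parsing function per nonterminal. The following is a sketch under stated assumptions rather than the course's parser: the left-recursive rules are implemented as loops (a common hand-written substitute that keeps left associativity), ASCII `*` and `/` stand in for × and ÷, and `tokenize`, `Parser`, and `parse` are hypothetical names.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()])")


def tokenize(text):
    """Split the input into numbers, names, operators and parentheses."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"illegal character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # Expr -> Expr + Term | Expr - Term | Term, written as a loop
        tree = self.term()
        while self.peek() in ("+", "-"):
            tree = (tree, self.take(), self.term())
        return tree

    def term(self):
        # Term -> Term * Factor | Term / Factor | Factor, written as a loop
        tree = self.factor()
        while self.peek() in ("*", "/"):
            tree = (tree, self.take(), self.factor())
        return tree

    def factor(self):
        # Factor -> ( Expr ) | num | name
        token = self.take()
        if token == "(":
            tree = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return tree
        if token is None or token in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {token!r}")
        return token


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

With this sketch, `parse("a - b + c")` nests to the left as `(("a", "-", "b"), "+", "c")`, and `parse("a + b * c")` keeps the multiplication together as `("a", "+", ("b", "*", "c"))`, because `*` is only reachable through `term`, one level below `expr`.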



Types of Parsers

Top-down
• Starts with the root and grows the tree toward the leaves

Bottom-up
• Starts with the leaves and grows the tree toward the root

Universal
• More general algorithms, but inefficient to use in production compilers



Error Handling
• The scanner cannot deal with all errors
• Common sources of programming errors
• Lexical errors
• For example, illegal characters, missing quotes around strings
• Syntactic errors
• For example, misspelled keywords, misplaced semicolons, or extra or missing braces
• Semantic errors
• For example, type mismatches between operators and operands, undeclared variables
• Logical errors


Handling Errors

Panic-mode recovery
• The parser discards input symbols one at a time until a synchronizing token is
found
• Synchronizing tokens are usually delimiters (for example, ; or })
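
A minimal sketch of panic-mode recovery, assuming a toy language whose only statement has the hypothetical shape `name = name ;`. The names `SYNC_TOKENS` and `parse_statements` are illustrative. On a mismatch the parser records the error position, skips tokens up to and including the next synchronizing token, and resumes with the following statement.

```python
SYNC_TOKENS = {";", "}"}


def parse_statements(tokens):
    """Return (number of accepted statements, positions where errors began)."""
    accepted, errors, pos = 0, [], 0
    while pos < len(tokens):
        if tokens[pos:pos + 4] == ["name", "=", "name", ";"]:
            accepted += 1
            pos += 4
            continue
        errors.append(pos)
        # Panic mode: discard tokens one at a time up to a synchronizing token.
        while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
            pos += 1
        pos += 1  # also consume the synchronizing token itself
    return accepted, errors


RESULT = parse_statements("name = name ; name name = ; name = name ;".split())
```

`RESULT` is `(2, [4])`: the malformed middle statement is reported once, and the statement after the `;` is still parsed, so one error does not hide the rest of the input.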

Phrase-level recovery
• Perform local correction on the remaining input
• Can go into an infinite loop because of wrong correction, or the error may have
occurred before it is detected
Handling Errors

Error productions
• Augment the grammar with productions that generate erroneous constructs
• Works only for common mistakes, complicates the grammar

Global correction
• Given an incorrect input string 𝑥 and grammar 𝐺, find a parse tree for a related
string 𝑦 such that the number of modifications (insertions, deletions, and
changes) of tokens required to transform 𝑥 into 𝑦 is as small as possible
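
The cost that global correction minimizes is an edit distance over tokens. The sketch below covers only that cost: it computes the number of insertions, deletions, and changes between an input 𝑥 and one given candidate 𝑦, using the standard dynamic-programming recurrence. It does not search 𝐿(𝐺) for the best 𝑦, which is the expensive part of the technique, and the name `edit_distance` is hypothetical.

```python
def edit_distance(x, y):
    """Minimum number of token insertions, deletions and changes turning x into y."""
    prev = list(range(len(y) + 1))  # distances from the empty prefix of x
    for i, x_token in enumerate(x, start=1):
        curr = [i]
        for j, y_token in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,                         # delete x_token
                curr[j - 1] + 1,                     # insert y_token
                prev[j - 1] + (x_token != y_token),  # keep or change
            ))
        prev = curr
    return prev[-1]


X = "name = name name ;".split()
Y = "name = name + name ;".split()
```

For these token lists the distance is 1, a single inserted `+`, so 𝑦 would be a cheap repair of 𝑥 if it is in the language.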
Context-Free vs Regular Grammar
• CFGs are more powerful than regular expressions (REs)
• Every regular language is context-free, but not vice versa
• We can create a CFG for every NFA that simulates some RE

• Language that can be described by a CFG but not by an RE

𝐿 = {𝑎ⁿ𝑏ⁿ | 𝑛 ≥ 1}
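
One CFG for this language is 𝑆 → 𝑎 𝑆 𝑏 | 𝑎 𝑏, and a recursive recognizer can follow it rule for rule. This is a small sketch with a hypothetical function name; the recursion depth plays the role of the counter that a regular expression lacks.

```python
def in_language(s):
    """Recognize { a^n b^n | n >= 1 } using S -> a S b | a b."""
    if s == "ab":                      # S -> a b
        return True
    return (                           # S -> a S b
        len(s) >= 4
        and s[0] == "a"
        and s[-1] == "b"
        and in_language(s[1:-1])
    )
```

`in_language("aabb")` holds, while `"aab"`, `"abab"`, and the empty string are rejected.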



Limitations of Syntax Analysis
• Cannot determine whether
• A variable has been declared before use
• A variable has been initialized
• Variables are of types on which operations are allowed
• Number of formal and actual arguments of a function match

• These limitations are handled during semantic analysis



References
• A. Aho et al. Compilers: Principles, Techniques, and Tools, 2nd edition, Chapters 2 and 4.
• K. Cooper and L. Torczon. Engineering a Compiler, 2nd edition, Chapter 3.
