CSE309N
Chapter 2
A Simple One-Pass Compiler
The Entire Compilation Process
Grammars for Syntax Definition
Syntax-Directed Translation
Parsing - Top Down & Predictive
Pulling Together the Pieces
The Lexical Analysis Process
Symbol Table Considerations
A Brief Look at Code Generation
Concluding Remarks/Looking Ahead
Overview
A programming language can be defined by describing:
1. The syntax of the language
   1. What its programs look like
   2. We use a CFG or BNF (Backus-Naur Form)
2. The semantics of the language
   1. What its programs mean
   2. Difficult to describe
   3. Use informal descriptions and suggestive examples
Grammars for Syntax Definition
A Context-free Grammar (CFG) Is Utilized to
Describe the Syntactic Structure of a Language
A CFG Is Characterized By:
1. A Set of Tokens or Terminal Symbols
2. A Set of Non-terminals
3. A Set of Production Rules
   Each Rule Has the Form
   NT → {T, NT}*
4. A Non-terminal Designated As the Start Symbol
Grammars for Syntax Definition
Example CFG
list → list + digit
list → list - digit
list → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
(the "|" means OR)
(So we could have written
list → list + digit | list - digit | digit )
Information
A string of tokens is a sequence of zero or more tokens.
The string containing zero tokens, written as ε, is called the
empty string.
A grammar derives strings by beginning with the start symbol
and repeatedly replacing a non-terminal by the right side of a
production for that non-terminal.
The token strings that can be derived from the start symbol form
the language defined by the grammar.
Grammars are Used to Derive Strings:
Using the CFG defined on the earlier slide, we can
derive the string: 9 - 5 + 2 as follows:
list ⇒ list + digit             P1 : list → list + digit
     ⇒ list - digit + digit     P2 : list → list - digit
     ⇒ digit - digit + digit    P3 : list → digit
     ⇒ 9 - digit + digit        P4 : digit → 9
     ⇒ 9 - 5 + digit            P4 : digit → 5
     ⇒ 9 - 5 + 2                P4 : digit → 2
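The derivation above can be replayed mechanically. The following is a minimal Python sketch, not part of the original slides: the names `apply_leftmost`, `derive` and `digit` are hypothetical helpers, and each step replaces the leftmost occurrence of the production's left side, which is how the derivation above proceeds.

```python
# Productions of the example grammar as (left side, right side) pairs.
P1 = ("list", ["list", "+", "digit"])
P2 = ("list", ["list", "-", "digit"])
P3 = ("list", ["digit"])


def digit(d):
    """Build one alternative of P4, e.g. digit -> 9 (hypothetical helper)."""
    return ("digit", [d])


def apply_leftmost(form, production):
    """Replace the leftmost occurrence of the production's left side."""
    head, body = production
    i = form.index(head)  # raises ValueError if the non-terminal is absent
    return form[:i] + body + form[i + 1:]


def derive(start, productions):
    """Return every sentential form produced by applying the productions in order."""
    form = [start]
    steps = [form]
    for production in productions:
        form = apply_leftmost(form, production)
        steps.append(form)
    return steps


steps = derive("list", [P1, P2, P3, digit("9"), digit("5"), digit("2")])
for form in steps:
    print(" ".join(form))
```

The last line printed is the token string `9 - 5 + 2`; every earlier line still contains at least one non-terminal.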
Grammars are Used to Derive Strings:
This derivation could also be represented via a Parse Tree
(parents on left, children on right)
list
├── list
│   ├── list
│   │   └── digit
│   │       └── 9
│   ├── -
│   └── digit
│       └── 5
├── +
└── digit
    └── 2
Defining a Parse Tree
A parse tree pictorially shows how the start symbol of a
grammar derives a string in the language.
More Formally, a Parse Tree for a CFG Has the Following
Properties:
Root Is Labeled With the Start Symbol
Each Leaf Node Is a Token or ε
Each Interior Node Is a Non-Terminal
If A → x1 x2 … xn Is a Production, Then A May Be an Interior Node Whose
Children, From Left to Right, Are x1, x2, …, xn; Each May Be a Non-Terminal or a Token
Other Important Concepts
Ambiguity
Two derivations (Parse Trees) for the same token string.
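Tree 1, grouping the string as (9 - 5) + 2:
string
├── string
│   ├── string
│   │   └── 9
│   ├── -
│   └── string
│       └── 5
├── +
└── string
    └── 2
Tree 2, grouping the string as 9 - (5 + 2):
string
├── string
│   └── 9
├── -
└── string
    ├── string
    │   └── 5
    ├── +
    └── string
        └── 2
Grammar:
string → string + string | string - string | 0 | 1 | … | 9
Why is this a problem?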
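One way to see the problem is to evaluate both parse trees. The sketch below is illustrative only: it assumes trees are written as nested Python tuples `(left, operator, right)` with integer leaves, and the names `tree_a`, `tree_b`, `evaluate` and `leaves` are hypothetical.

```python
# Two parse trees for the same token string, as nested (left, op, right) tuples.
tree_a = ((9, "-", 5), "+", 2)  # root is '+': groups as (9 - 5) + 2
tree_b = (9, "-", (5, "+", 2))  # root is '-': groups as 9 - (5 + 2)


def evaluate(tree):
    """Evaluate a tree bottom-up."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    l, r = evaluate(left), evaluate(right)
    return l + r if op == "+" else l - r


def leaves(tree):
    """Read the token string off the leaves, left to right."""
    if isinstance(tree, int):
        return [str(tree)]
    left, op, right = tree
    return leaves(left) + [op] + leaves(right)


print(" ".join(leaves(tree_a)), "=", evaluate(tree_a))
print(" ".join(leaves(tree_b)), "=", evaluate(tree_b))
```

Both trees yield the same leaves, `9 - 5 + 2`, but the first evaluates to 6 and the second to 2. A translation that follows the tree structure therefore has no single meaning for the string.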
Other Important Concepts
Associativity of Operators
Left vs. Right
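Left associative (9 - 5 + 2):
list
├── list
│   ├── list
│   │   └── digit
│   │       └── 9
│   ├── -
│   └── digit
│       └── 5
├── +
└── digit
    └── 2
list → list + digit | list - digit | digit
digit → 0 | 1 | 2 | … | 9
Right associative (a = b = c):
right
├── letter
│   └── a
├── =
└── right
    ├── letter
    │   └── b
    ├── =
    └── right
        └── letter
            └── c
right → letter = right | letter
letter → a | b | c | … | z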
Embedding Associativity
The language of arithmetic expressions with + and -
(ambiguous) grammar that does not enforce
associativity
string → string + string | string - string | 0 | 1 | … | 9
unambiguous grammar enforcing left
associativity (parse tree will grow to the left)
string → string + digit | string - digit | digit
digit → 0 | 1 | 2 | … | 9
unambiguous grammar enforcing right
associativity (parse tree will grow to the right)
string → digit + string | digit - string | digit
digit → 0 | 1 | 2 | … | 9
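The effect of the two unambiguous grammars on tree shape can be sketched in a few lines of Python. This is an illustration under simplifying assumptions: the input is an already well-formed list of tokens, no error checking is done, trees are nested tuples, and the function names `parse_left` and `parse_right` are hypothetical.

```python
def parse_left(tokens):
    """string -> string + digit | string - digit | digit (tree grows to the left)."""
    tree = tokens[0]
    # Each further "op digit" pair wraps the tree built so far as the LEFT child.
    for i in range(1, len(tokens), 2):
        tree = (tree, tokens[i], tokens[i + 1])
    return tree


def parse_right(tokens):
    """string -> digit + string | digit - string | digit (tree grows to the right)."""
    if len(tokens) == 1:
        return tokens[0]
    # The rest of the input becomes the RIGHT child.
    return (tokens[0], tokens[1], parse_right(tokens[2:]))


tokens = ["9", "-", "5", "+", "2"]
print(parse_left(tokens))
print(parse_right(tokens))
```

For `9 - 5 + 2`, the left-recursive grammar nests `9 - 5` under the `+`, while the right-recursive grammar nests `5 + 2` under the `-`.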
Other Important Concepts
Operator Precedence
What does 9+5*2 mean?
Typically the precedence order, from highest to lowest, is:
( )
* /
+ -
This can be incorporated into a grammar via rules:
expr → expr + term | expr - term | term
term → term * factor | term / factor | factor
factor → digit | ( expr )
digit → 0 | 1 | 2 | 3 | … | 9
Precedence Achieved by:
A separate non-terminal (expr, term) for each precedence level
Rules for each level are left recursive, so the operators associate to the left
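The layering of expr, term and factor can be turned into a small evaluator. The sketch below is a hedged illustration, not the method given in the slides: it assumes single-character tokens, implements each left-recursive rule as a loop (a common rewrite for hand-written parsers), and treats `/` as Python true division. The name `evaluate` and its inner helpers are hypothetical.

```python
def evaluate(text):
    """Evaluate a digit expression using the expr / term / factor grammar."""
    tokens = [c for c in text if not c.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        # factor -> digit | ( expr )
        tok = advance()
        if tok == "(":
            value = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok in "0123456789":
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        # term -> term * factor | term / factor | factor (loop = left associative)
        value = factor()
        while peek() in ("*", "/"):
            if advance() == "*":
                value = value * factor()
            else:
                value = value / factor()
        return value

    def expr():
        # expr -> expr + term | expr - term | term (loop = left associative)
        value = term()
        while peek() in ("+", "-"):
            if advance() == "+":
                value = value + term()
            else:
                value = value - term()
        return value

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(evaluate("9+5*2"))
print(evaluate("(9+5)*2"))
```

Because `expr` can only reach `*` through `term`, the product `5*2` is completed before the addition is applied, so `9+5*2` evaluates to 19, whereas `(9+5)*2` evaluates to 28.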
Syntax for Statements
stmt → id := expr
     | if expr then stmt
     | if expr then stmt else stmt
     | while expr do stmt
     | begin opt_stmts end
Ambiguous Grammar?
Syntax-Directed Translation
Associate Attributes With Grammar Rules and Translate as Parsing
occurs
The translation will follow the parse tree structure (and as a result the
structure and form of the parse tree will affect the translation).
First example: Inductive Translation.
Infix to Postfix Notation Translation for Expressions
Translation defined inductively as: Postfix(E) where E is an
Expression.
Rules
1. If E is a variable or constant then Postfix(E) = E
2. If E is E1 op E2 then Postfix(E)
= Postfix(E1 op E2) = Postfix(E1) Postfix(E2) op
3. If E is (E1) then Postfix(E) = Postfix(E1)
Examples
Postfix( ( 9 – 5 ) + 2 )
= Postfix( ( 9 – 5 ) ) Postfix( 2 ) +
= Postfix( 9 – 5 ) Postfix( 2 ) +
= Postfix( 9 ) Postfix( 5 ) - Postfix( 2 ) +
= 9 5 - 2 +
Postfix(9 – ( 5 + 2 ) )
= Postfix( 9 ) Postfix( ( 5 + 2 ) ) -
= Postfix( 9 ) Postfix( 5 + 2 ) –
= Postfix( 9 ) Postfix( 5 ) Postfix( 2 ) + –
= 9 5 2 + -
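The three inductive rules for Postfix(E) map directly onto a recursive function. This is a minimal sketch under stated assumptions: an expression is represented as a string (variable or constant), a tuple `(E1, op, E2)`, or a tuple `("(", E1, ")")` for a parenthesized expression. That encoding and the name `postfix` are illustrative choices, not part of the slides.

```python
def postfix(e):
    """Translate an expression tree to postfix notation, rule by rule."""
    # Rule 1: a variable or constant translates to itself.
    if isinstance(e, str):
        return e
    # Rule 3: Postfix((E1)) = Postfix(E1); the parentheses disappear.
    if e[0] == "(":
        return postfix(e[1])
    # Rule 2: Postfix(E1 op E2) = Postfix(E1) Postfix(E2) op
    e1, op, e2 = e
    return f"{postfix(e1)} {postfix(e2)} {op}"


first = (("(", ("9", "-", "5"), ")"), "+", "2")   # ( 9 - 5 ) + 2
second = ("9", "-", ("(", ("5", "+", "2"), ")"))  # 9 - ( 5 + 2 )
print(postfix(first))
print(postfix(second))
```

The two calls reproduce the worked examples: `9 5 - 2 +` and `9 5 2 + -`.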
Syntax-Directed Definition
Each Production Has a Set of Semantic Rules
Each Grammar Symbol Has a Set of Attributes
For the Following Example, String Attribute “t” is
Associated With Each Grammar Symbol
expr → expr - term | expr + term | term
term → 0 | 1 | 2 | 3 | … | 9
Recall: What is a derivation for 9 + 5 - 2?
list ⇒ list - digit ⇒ list + digit - digit ⇒ digit + digit - digit
     ⇒ 9 + digit - digit ⇒ 9 + 5 - digit ⇒ 9 + 5 - 2
Syntax-Directed Definition (2)
Each Production Rule of the CFG Has a Semantic
Rule
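Production              Semantic Rule
expr → expr1 + term     expr.t := expr1.t || term.t || '+'
expr → expr1 - term     expr.t := expr1.t || term.t || '-'
expr → term             expr.t := term.t
term → 0                term.t := '0'
term → 1                term.t := '1'
…                       …
term → 9                term.t := '9'
(expr1 is the occurrence of expr on the right side of the production; || is string concatenation)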
Note: Semantic Rules for expr define t as a
“synthesized attribute” i.e., the various copies of t
obtain their values from “children t’s”
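Computing a synthesized attribute amounts to a bottom-up pass over the parse tree. The sketch below is an illustration under assumptions: a parse-tree node is a `(label, children)` pair, operator and digit leaves are plain strings, and the names `t` and `term` are hypothetical.

```python
def t(node):
    """Return the synthesized attribute t of a parse-tree node."""
    label, children = node
    if label == "term":
        # term -> 0 | 1 | ... | 9 : term.t is the digit itself
        return children[0]
    if len(children) == 1:
        # expr -> term : expr.t := term.t
        return t(children[0])
    # expr -> expr op term : expr.t := expr.t || term.t || op
    left, op, right = children
    return t(left) + t(right) + op


def term(d):
    """Build a term node for digit d (hypothetical helper)."""
    return ("term", [d])


# Parse tree for 9 - 5 + 2
tree = ("expr", [
    ("expr", [
        ("expr", [term("9")]),
        "-",
        term("5"),
    ]),
    "+",
    term("2"),
])
print(t(tree))
```

Each node's value depends only on the values of its children, which is what makes `t` a synthesized attribute. For this tree the root value is `95-2+`.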
Semantic Rules are Embedded in Parse Tree
expr.t = 95-2+
├── expr.t = 95-
│   ├── expr.t = 9
│   │   └── term.t = 9
│   │       └── 9
│   ├── -
│   └── term.t = 5
│       └── 5
├── +
└── term.t = 2
    └── 2
The attribute values are computed by a depth-first traversal of the tree:
it starts at the root and recursively visits the children of
each node in left-to-right order.
The semantic rules at a given node are evaluated once all
descendants of that node have been visited.
A parse tree showing all the attribute values at each node
is called an annotated parse tree.
Translation Schemes
Embed semantic actions into the right sides of
the productions.
A translation scheme is like a syntax-directed definition except the
order of evaluation of the semantic rules is explicitly shown.
expr → expr + term {print('+')}
expr → expr - term {print('-')}
expr → term
term → 0 {print('0')}
term → 1 {print('1')}
…
term → 9 {print('9')}
Parse tree for 9 - 5 + 2 with the actions attached as extra leaves:
expr
├── expr
│   ├── expr
│   │   └── term
│   │       ├── 9
│   │       └── {print('9')}
│   ├── -
│   ├── term
│   │   ├── 5
│   │   └── {print('5')}
│   └── {print('-')}
├── +
├── term
│   ├── 2
│   └── {print('2')}
└── {print('+')}
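A translation scheme for infix-to-postfix can be sketched as a parser that runs each action at the point where it appears in the production. The code below is a hedged illustration rather than the slides' own implementation: it assumes single-character tokens, replaces the left-recursive expr rule with a loop, and appends to a list `out` in place of calling print so that the result can be inspected. The name `translate` is hypothetical.

```python
def translate(text):
    """Translate a digit expression with + and - into postfix."""
    tokens = [c for c in text if not c.isspace()]
    out = []  # stands in for print(): each action appends one symbol
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] not in "0123456789":
            raise SyntaxError("digit expected")
        out.append(tokens[pos])  # term -> d {print('d')}
        pos += 1

    term()  # expr -> term
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        pos += 1
        term()
        out.append(op)  # expr -> expr op term {print(op)}: runs after term
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return "".join(out)


print(translate("9 - 5 + 2"))
```

The operator is emitted only after the term to its right has been processed, matching the position of the action at the end of the production. The output is produced incrementally during parsing, with no tree or attribute strings being built; for `9 - 5 + 2` it is `95-2+`.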