Terminology
Statement
» declaration, or assignment containing an expression
Grammar
» a set of rules that specify the form of legal statements
Syntax vs. Semantics
» Example: assuming J, K: integer and X, Y: float
» I := J + K vs. I := X + Y
» same syntax, but different semantics: integer addition vs. floating-point addition
Compilation
» matching statements written by the programmer to structures defined by the grammar and generating the appropriate object code
System Software
Assembler
Loader and Linker
Macro Processor
Compiler
Operating System
Other System Software
» RDBS
» Text Editors
» Interactive Debugging System
Basic Compiler
Lexical analysis -- scanner
» scanning the source statement, recognizing and classifying the various tokens
Syntactic analysis -- parser
» recognizing the statement as some language construct
Code generation -- code generator
» generating the appropriate object code for each recognized construct
Scanner
Tokens of the first lines of Program 5.1:
» PROGRAM  STATS
» VAR  SUM  ,  SUMSQ  ,  I  …
» SUM  :=  0  ;  SUMSQ  :=  …
» READ  (  VALUE  )  ;
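Breaking these lines into tokens can be sketched with Python's `re` module. This is a minimal sketch, not the book's scanner: it assumes upper-case source text, blanks as separators, and only the token shapes visible above (words, integers, `:=`, and single-character punctuation). The function name `tokenize` is hypothetical.

```python
import re

# Assumed token classes for this sketch; order matters, so ':=' is tried before ':'.
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)                 # blanks and newlines only separate tokens
  | (?P<assign>:=)              # two-character assignment operator
  | (?P<word>[A-Z][A-Z0-9]*)    # keywords and identifiers
  | (?P<int>[0-9]+)             # integer literals
  | (?P<punct>[;,:+\-*()])      # single-character tokens
""", re.VERBOSE)

def tokenize(source):
    """Split source text into a list of token strings."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        if m.lastgroup != "ws":   # drop separators, keep real tokens
            tokens.append(m.group())
        pos = m.end()
    return tokens

print(tokenize("PROGRAM STATS VAR SUM, SUMSQ, I"))
# → ['PROGRAM', 'STATS', 'VAR', 'SUM', ',', 'SUMSQ', ',', 'I']
print(tokenize("SUM := 0; SUMSQ := 0;"))
# → ['SUM', ':=', '0', ';', 'SUMSQ', ':=', '0', ';']
print(tokenize("READ(VALUE);"))
# → ['READ', '(', 'VALUE', ')', ';']
```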
Parser
Grammar: a set of rules
» Backus-Naur Form (BNF)
» Ex: Figure 5.2
Terminology
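» Definition symbol ::= (read as "is defined as")
» Nonterminal symbols: names enclosed in < >, e.g., <read>
» Alternative symbol | separates alternative definitions
» Terminal symbols: tokens that appear literally, e.g., READ, (, id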
Simplified Pascal Grammar
Parser
READ(VALUE)
» <read> ::= READ ( <id-list> )
» <id-list> ::= id | <id-list> , id
SUM := 0
» <assign> ::= id := <exp>
SUM := SUM + VALUE
» <exp> ::= <term> | <exp> + <term> | <exp> - <term>
MEAN := SUM DIV 100
» <term> ::= <factor> | <term> * <factor> | <term> DIV <factor>
» <factor> ::= id | int | ( <exp> )
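As a worked check of these rules, SUM := SUM + VALUE can be derived from <assign> by always expanding the leftmost nonterminal; the three id tokens stand for SUM, SUM, and VALUE in that order.

```latex
\begin{aligned}
\langle \mathit{assign} \rangle
  &\Rightarrow \mathrm{id} \mathrel{:=} \langle \mathit{exp} \rangle \\
  &\Rightarrow \mathrm{id} \mathrel{:=} \langle \mathit{exp} \rangle + \langle \mathit{term} \rangle \\
  &\Rightarrow \mathrm{id} \mathrel{:=} \langle \mathit{term} \rangle + \langle \mathit{term} \rangle \\
  &\Rightarrow \mathrm{id} \mathrel{:=} \langle \mathit{factor} \rangle + \langle \mathit{term} \rangle \\
  &\Rightarrow \mathrm{id} \mathrel{:=} \mathrm{id} + \langle \mathit{term} \rangle \\
  &\Rightarrow \mathrm{id} \mathrel{:=} \mathrm{id} + \langle \mathit{factor} \rangle \\
  &\Rightarrow \mathrm{id} \mathrel{:=} \mathrm{id} + \mathrm{id}
\end{aligned}
```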
Syntax Tree
Syntax Tree for Program 5.1
Lexical Analysis
Function
» scanning the program to be compiled and recognizing the tokens that make up the source statements
Tokens
» Tokens can be keywords, operators, identifiers, integers, floating-point numbers, character strings, etc.
» Each token is usually represented by some fixed-length code, such as an integer, rather than as a variable-length character string (see Figure 5.5)
» Token type, Token specifier (value) (see Figure 5.6)
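A minimal sketch of turning token strings into (token type, token specifier) pairs. Only the type codes that appear in the Figure 5.6 example are included (PROGRAM 1, VAR 2, BEGIN 3, INTEGER 6, `:` 13, `,` 14, `:=` 15, identifier 22, integer 23). The function name `encode` is hypothetical, and the identifier specifier is kept as the name itself rather than as a pointer into a symbol table.

```python
# Codes taken from the Figure 5.6 example; other tokens are omitted in this sketch.
TOKEN_CODES = {
    "PROGRAM": 1, "VAR": 2, "BEGIN": 3, "INTEGER": 6,
    ":": 13, ",": 14, ":=": 15,
}
ID_CODE = 22   # identifier: specifier is the name (a symbol-table pointer in a real compiler)
INT_CODE = 23  # integer: specifier is its value

def encode(token):
    """Return (token type, token specifier) for one token string."""
    if token in TOKEN_CODES:             # keywords and operators carry no specifier
        return (TOKEN_CODES[token], None)
    if token.isdigit():
        return (INT_CODE, int(token))
    if token[0].isalpha() and token.isalnum():
        return (ID_CODE, token)
    raise ValueError(f"no code for token {token!r} in this sketch")

print([encode(t) for t in ["PROGRAM", "STATS"]])
# → [(1, None), (22, 'STATS')]
print([encode(t) for t in ["SUM", ":=", "0"]])
# → [(22, 'SUM'), (15, None), (23, 0)]
```

Encoding every token as a small integer pair lets the parser compare token types with a single integer test instead of string comparisons.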
Scanner Output
Token specifier
» identifier name, integer value
Token coding scheme
» Figure 5.5
Example - Figure 5.6
Statement                    Token type   Token specifier
PROGRAM STATS                1
                             22           ^STATS
VAR                          2
SUM,SUMSQ,I, …, : INTEGER    22           ^SUM
                             14
                             22           ^SUMSQ
                             14
                             …
                             14
                             22           ^VARIANCE
                             13
                             6
BEGIN                        3
SUM:=0;                      22           ^SUM
                             15
                             23           #0
» ^name: pointer to the symbol-table entry for name; #n: integer value n
Token Recognizer
By grammar
» <ident> ::= <letter> | <ident> <letter> | <ident> <digit>
» <letter> ::= A | B | C | D | … | Z
» <digit> ::= 0 | 1 | 2 | 3 | … | 9
By scanner: modeled as a finite automaton
» Figure 5.8(a)
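The <ident> rules accept a letter followed by any mix of letters and digits. Assuming Figure 5.8(a) is the corresponding two-state automaton (state 1 is the start, state 2 the only final state), it can be simulated directly in code. The function name `is_identifier` is hypothetical.

```python
def is_identifier(text):
    """Simulate the two-state automaton for <ident>:
    state 1 --A-Z--> state 2; state 2 loops on A-Z and 0-9."""
    state = 1
    for ch in text:
        if state == 1 and "A" <= ch <= "Z":
            state = 2
        elif state == 2 and ("A" <= ch <= "Z" or "0" <= ch <= "9"):
            state = 2
        else:
            return False          # no transition defined: reject
    return state == 2             # accept only if we end in the final state

print(is_identifier("SUMSQ"), is_identifier("X1"), is_identifier("1X"))
# → True True False
```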
Recognizing Identifier
Identifiers allowing underscore (_)
» Figure 5.8(b)
State   A-Z   0-9   _
  1      2     -    -
  2      2     2    3
  3      2     2    -
» state 1: start state; state 2: final state
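The transition table can drive the recognizer directly: look up the next state from the current state and the character class, reject on an empty entry, and accept only if the input ends in state 2. A minimal sketch; `char_class`, `TRANSITIONS`, and `is_identifier` are hypothetical names.

```python
def char_class(ch):
    """Map a character to a column of the table (None if it has no column)."""
    if "A" <= ch <= "Z":
        return "A-Z"
    if "0" <= ch <= "9":
        return "0-9"
    if ch == "_":
        return "_"
    return None

# TRANSITIONS[state][column] -> next state; missing entries mean "reject".
TRANSITIONS = {
    1: {"A-Z": 2},
    2: {"A-Z": 2, "0-9": 2, "_": 3},
    3: {"A-Z": 2, "0-9": 2},
}
ACCEPTING = {2}

def is_identifier(text):
    state = 1
    for ch in text:
        state = TRANSITIONS[state].get(char_class(ch))
        if state is None:
            return False
    return state in ACCEPTING
```

Because state 1 has no `_` entry, state 3 has no `_` entry, and state 3 is not final, the table rules out identifiers that begin or end with an underscore or contain two underscores in a row.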
Recognizing Integer
Allowing leading zeroes
» Figure 5.8(c)
» 1 --0-9--> 2; state 2 loops on 0-9
Disallowing leading zeroes
» Figure 5.8(d)
» 1 --1-9--> 2; state 2 loops on 0-9; 3 --space--> 4
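A sketch contrasting the two integer automata. `step_c` encodes Figure 5.8(c); `step_d` encodes only the arcs 1 --1-9--> 2 and the 0-9 loop on state 2, so it does not model states 3 and 4 and it rejects a lone `0`. All names are hypothetical.

```python
def step_c(state, ch):
    """Figure 5.8(c): 1 --0-9--> 2, and state 2 loops on 0-9."""
    if state in (1, 2) and "0" <= ch <= "9":
        return 2
    return None

def step_d(state, ch):
    """Arcs 1 --1-9--> 2 and the 0-9 loop on state 2 only."""
    if state == 1 and "1" <= ch <= "9":
        return 2
    if state == 2 and "0" <= ch <= "9":
        return 2
    return None

def accepts(text, step, accepting=frozenset({2})):
    """Run an automaton given by its step function from state 1."""
    state = 1
    for ch in text:
        state = step(state, ch)
        if state is None:          # no transition: reject
            return False
    return state in accepting

print(accepts("007", step_c), accepts("007", step_d), accepts("100", step_d))
# → True False True
```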
Scanner -- Implementation
Figure 5.10 (a)
» Algorithmic code for identifier recognition
Tabular representation of the finite automaton in Figure 5.9
State  A-Z  0-9  ;,+-*()  :  =  .
  1     2    4      5      6  -  -
  2     2    2      -      -  -  3
  3     -    -      -      -  -  -
  4     -    4      -      -  -  -
  5     -    -      -      -  -  -
  6     -    -      -      -  7  -
  7     -    -      -      -  -  -
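The table can drive a complete scanner loop: starting in state 1, keep following transitions until the next character has no entry, then emit the characters read so far as one token if the automaton stopped in a final state. This sketch assumes states 2 to 7 are all final and that blanks separate tokens; it does not check that a state-3 token is really `END.`, which a real scanner would need to do. `char_class`, `TABLE`, and `scan` are hypothetical names.

```python
def char_class(ch):
    """Map a character to a column of the table."""
    if "A" <= ch <= "Z":
        return "A-Z"
    if "0" <= ch <= "9":
        return "0-9"
    if ch in ";,+-*()":
        return "op"
    if ch in ":=.":
        return ch
    return None

# TABLE[state][column] -> next state; a missing entry ends the current token.
TABLE = {
    1: {"A-Z": 2, "0-9": 4, "op": 5, ":": 6},
    2: {"A-Z": 2, "0-9": 2, ".": 3},
    3: {},
    4: {"0-9": 4},
    5: {},
    6: {"=": 7},
    7: {},
}
ACCEPTING = {2, 3, 4, 5, 6, 7}

def scan(source):
    """Return (token text, final state) pairs, reading the longest token each time."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():          # blanks only separate tokens
            pos += 1
            continue
        state, end = 1, pos
        while end < len(source):
            nxt = TABLE[state].get(char_class(source[end]))
            if nxt is None:
                break
            state, end = nxt, end + 1
        if state not in ACCEPTING:         # still in state 1: illegal character
            raise ValueError(f"illegal character {source[pos]!r} at {pos}")
        tokens.append((source[pos:end], state))
        pos = end
    return tokens

print(scan("SUM:=0;"))
# → [('SUM', 2), (':=', 7), ('0', 4), (';', 5)]
```

The final state doubles as a token classification: 2 identifier or keyword, 4 integer, 5 single-character operator, 6 colon, 7 assignment.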
Syntactic Analysis
Recognize source statements as language constructs
or build the parse tree for the statements
» bottom-up: operator-precedence parsing
» top-down: recursive-descent parsing
Operator-Precedence Parsing
Operator
» any terminal symbol (or any token)
Precedence
» * ⋗ + (* has higher precedence than +)
» + ⋖ * (+ has lower precedence than *)
Operator-precedence
» precedence relations between operators
Precedence Matrix for the Grammar in Fig. 5.2
Operator-Precedence Parse Example
BEGIN READ ( VALUE ) ;
(i) … id1 := id2 DIV
(ii) … id1 := <N1> DIV int -
(iii) … id1 := <N1> DIV <N2> -
(iv) … id1 := <N3> - id3 *
(v) … id1 := <N3> - <N4> * id4 ;
(vi) … id1 := <N3> - <N4> * <N5> ;
(vii) … id1 := <N3> - <N6> ;
(viii) … id1 := <N7> ;
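The reduction sequence above can be reproduced with a small stack-based operator-precedence parser. This is a sketch under stated assumptions, not the full matrix for Fig. 5.2: the relations are derived from `*` and `DIV` binding tighter than `+` and `-`, all four being left-associative, with `:=` and `;` acting as left and right delimiters. The input is the token sequence `id2 DIV int - id3 * id4 ;`, each reduction creates a new `<Nk>`, and there is no error checking. All names are hypothetical.

```python
PREC = {"+": 1, "-": 1, "*": 2, "DIV": 2}

def is_nonterminal(sym):
    return sym.startswith("<")

def is_operand(sym):
    return not is_nonterminal(sym) and sym not in PREC and sym not in (":=", ";")

def relation(a, b):
    """Assumed precedence relation between stack terminal a and input terminal b."""
    if a == ":=":
        return "=" if b == ";" else "<"   # ':=' ... ';' brackets the expression
    if is_operand(a):
        return ">"                        # an operand is reduced before what follows
    if is_operand(b):
        return "<"                        # shift an operand that follows an operator
    if b == ";":
        return ">"
    return ">" if PREC[a] >= PREC[b] else "<"   # equal precedence: left-associative

def top_terminal(stack):
    return next(s for s in reversed(stack) if not is_nonterminal(s))

def parse(tokens):
    """Return the reductions, in order, for the expression after ':='."""
    stack, reductions, i = [":="], [], 0
    while True:
        rel = relation(top_terminal(stack), tokens[i])
        if rel == "=":                    # only ':= <N> ;' is left
            return reductions
        if rel == "<":                    # shift
            stack.append(tokens[i])
            i += 1
            continue
        handle = []                       # reduce: pop back to the '<' boundary
        while True:
            sym = stack.pop()
            handle.insert(0, sym)
            if not is_nonterminal(sym) and relation(top_terminal(stack), sym) == "<":
                break
        if is_nonterminal(stack[-1]):     # left operand belongs to the handle
            handle.insert(0, stack.pop())
        name = f"<N{len(reductions) + 1}>"
        reductions.append((name, " ".join(handle)))
        stack.append(name)

for name, rhs in parse(["id2", "DIV", "int", "-", "id3", "*", "id4", ";"]):
    print(name, "<-", rhs)
# <N1> <- id2
# <N2> <- int
# <N3> <- <N1> DIV <N2>
# <N4> <- id3
# <N5> <- id4
# <N6> <- <N4> * <N5>
# <N7> <- <N3> - <N6>
```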
Operator-Precedence Parsing
Bottom-up parsing
Generating precedence matrix
» Aho et al. (1988)
Shift-reduce Parsing with Stack
Figure 5.14
Recursive-Descent Parsing
Each nonterminal symbol in the grammar is associated with a procedure
<read> ::= READ (<id-list>)
<stmt> ::= <assign> | <read> | <write> | <for>
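Left recursion must be eliminated: the procedure for <dec-list> would call itself before consuming any token
» <dec-list> ::= <dec> | <dec-list> ; <dec>
Modification: replace the recursion with repetition, where { } encloses items that may occur zero or more times
» <dec-list> ::= <dec> { ; <dec> }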
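A sketch of how the modified rule becomes a recursive-descent procedure: the `{ ; <dec> }` repetition turns into a `while` loop, whereas a literal translation of `<dec-list> ::= <dec-list> ; <dec>` would call itself before reading any token and never terminate. The rule used for `<dec>` (an identifier list, a colon, and INTEGER) and the class `DecListParser` are assumptions for illustration.

```python
class DecListParser:
    """Recursive-descent procedures for
         <dec-list> ::= <dec> { ; <dec> }
         <dec>      ::= <id-list> : INTEGER     (assumed for this sketch)
         <id-list>  ::= id { , id }
    Each { } repetition is a while loop rather than a left-recursive call."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def identifier(self):
        tok = self.peek()
        if tok is None or not tok.isidentifier() or tok == "INTEGER":
            raise SyntaxError(f"expected identifier, found {tok!r}")
        self.pos += 1
        return tok

    def id_list(self):
        names = [self.identifier()]
        while self.peek() == ",":       # { , id }
            self.pos += 1
            names.append(self.identifier())
        return names

    def dec(self):
        names = self.id_list()
        self.expect(":")
        self.expect("INTEGER")
        return names

    def dec_list(self):
        decs = [self.dec()]
        while self.peek() == ";":       # { ; <dec> }
            self.pos += 1
            decs.append(self.dec())
        return decs

tokens = ["SUM", ",", "SUMSQ", ":", "INTEGER", ";", "I", ":", "INTEGER"]
print(DecListParser(tokens).dec_list())
# → [['SUM', 'SUMSQ'], ['I']]
```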
Recursive-Descent Parse of READ
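A sketch in the one-procedure-per-nonterminal style for `<read> ::= READ ( <id-list> )`. The left-recursive `<id-list> ::= id | <id-list> , id` is rewritten as `id { , id }`, the same transformation applied to `<dec-list>`. Tokens are (type, specifier) pairs with type `id` for identifiers, and each procedure returns True if it recognized its construct; `ReadParser` and `match` are hypothetical names.

```python
class ReadParser:
    """One procedure per nonterminal:
         <read>    ::= READ ( <id-list> )
         <id-list> ::= id { , id }    (left recursion replaced by repetition)"""

    def __init__(self, tokens):
        self.tokens = tokens   # list of (token type, token specifier) pairs
        self.pos = 0

    def match(self, token_type):
        """Consume the next token if it has the given type; report success."""
        if self.pos < len(self.tokens) and self.tokens[self.pos][0] == token_type:
            self.pos += 1
            return True
        return False

    def id_list(self):
        if not self.match("id"):
            return False
        while self.match(","):          # { , id }
            if not self.match("id"):
                return False
        return True

    def read(self):
        return (self.match("READ") and self.match("(")
                and self.id_list() and self.match(")"))

tokens = [("READ", None), ("(", None), ("id", "VALUE"), (")", None)]
print(ReadParser(tokens).read())
# → True
```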
Simplified Pascal Grammar for Recursive-Descent Parser