Rigvendra Kumar Vardhan 
M.TECH ECE 
Pondicherry University 
1
• Interpreters (direct execution)
• Assemblers
• Preprocessors
• Text formatters (non-WYSIWYG)
• Analysis tools
CS 540 Spring 2013 GMU 2
• Interpreter
  A program that reads a source program and produces the results of executing that program
• Compiler
  A program that translates a program from one language (the source) to another (the target)
3
• Correctness
• Speed (runtime and compile time)
  Degrees of optimization
  Multiple passes
• Space
• Feedback to user
• Debugging
4
• Interpreter
  Execution engine
  Program execution interleaved with analysis
    running = true;
    while (running) {
        analyze next statement;
        execute that statement;
    }
  May involve repeated analysis of some statements (loops, functions)
5
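The analyze-then-execute loop on this slide can be sketched in Python. This is only an illustration under stated assumptions: the three-statement toy language (`set`, `add`, `print`) and the function name `run` are invented for the example and do not come from the slides.

```python
def run(source: str) -> list[str]:
    """Interpret a toy program one statement at a time (hypothetical language)."""
    env: dict[str, int] = {}      # variable store
    output: list[str] = []        # values the program "prints"
    statements = source.strip().splitlines()
    pc = 0                        # index of the next statement
    running = True
    while running:
        if pc >= len(statements):
            running = False
            continue
        # Analyze the next statement: split it into an opcode and operands.
        op, *args = statements[pc].split()
        # Execute that statement immediately.
        if op == "set":           # e.g. "set x 5"
            env[args[0]] = int(args[1])
        elif op == "add":         # e.g. "add x 2"
            env[args[0]] += int(args[1])
        elif op == "print":       # e.g. "print x"
            output.append(str(env[args[0]]))
        else:
            raise SyntaxError(f"unknown statement: {statements[pc]!r}")
        pc += 1
    return output
```

Because the split happens inside the loop, any statement reached more than once (if the toy language had jumps) would be re-analyzed each time, which is the repeated-analysis cost the slide mentions.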
• Read and analyze entire program
• Translate to semantically equivalent program in another language
  Presumably easier to execute or more efficient
  Should “improve” the program in some fashion
• Offline process
  Tradeoff: compile-time overhead (preprocessing step) vs. execution performance
6
• Compilers
  FORTRAN, C, C++, Java, COBOL, etc.
  Strong need for optimization, etc.
• Interpreters
  Perl, Python, awk, sed, sh, csh, PostScript printer, Java VM
  Effective if interpreter overhead is low relative to the execution cost of language statements
7
Source code
  → Compiler → Assembly code
  → Assembler → Object code (machine code)
  → Linker → Fully-resolved object code (machine code)
  → Loader → Executable image
8
• Series of program representations
• Intermediate representations optimized for program manipulations of various kinds (checking, optimization)
• Become more machine-specific, less language-specific as translation proceeds
9
• First approximation
  Front end: analysis
    • Read source program and understand its structure and meaning
  Back end: synthesis
    • Generate equivalent target language program
Source → Front End → Back End → Target
10
• Must recognize legal programs (& complain about illegal ones)
• Must generate correct code
• Must manage storage of all variables
• Must agree with OS & linker on target format
Source → Front End → Back End → Target
11
• Need some sort of Intermediate Representation (IR)
• Front end maps source into IR
• Back end maps IR to target machine code
Source → Front End → Back End → Target
12
Source language
  → Scanner (lexical analysis) → tokens
  → Parser (syntax analysis) → Syntactic structure
  → Semantic Analysis (IC generator) → Intermediate Language
  → Code Optimizer → Intermediate Language
  → Code Generator → Target language
Symbol Table
13
• Lexical Analysis
• Syntax Analysis
• Semantic Analysis
• Runtime environments
• Code Generation
• Code Optimization
14
Source code (character stream)
  → Lexical analysis → Token stream
  → Parsing → Abstract syntax tree
  → Intermediate Code Generation → Intermediate code
  → Optimization → Intermediate code
  → Code generation → Assembly code
Front end (machine-independent); Back end (machine-dependent)
15
source → Scanner → tokens → Parser → IR
• Split into two parts
  Scanner: Responsible for converting character stream to token stream
    • Also strips out white space, comments
  Parser: Reads token stream; generates IR
• Both of these can be generated automatically
  Source language specified by a formal grammar
  Tools read the grammar and generate scanner & parser (either table-driven or hard-coded)
16
• Token stream: Each significant lexical chunk of the program is represented by a token
  Operators & Punctuation: {}[]!+-=*;: …
  Keywords: if while return goto
  Identifiers: id & actual name
  Constants: kind & value; int, floating-point, character, string, …
17
• Input text
    // this statement does very little
    if (x >= y) y = 42;
• Token Stream
    IF LPAREN ID(x) GEQ ID(y)
    RPAREN ID(y) BECOMES INT(42) SCOLON
  Note: tokens are atomic items, not character strings
18
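A scanner that produces the token stream above can be sketched with Python's `re` module. This is a hand-written illustration, not the output of a scanner generator; the token names follow the slide, while `TOKEN_SPEC`, `MASTER`, and `tokenize` are names chosen for this example, and only the handful of token kinds needed for the sample input are covered.

```python
import re

# Token kinds, tried in order. Keywords come before identifiers,
# and ">=" before "=" so the longer operator wins.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),
    ("WS",      r"\s+"),
    ("IF",      r"if\b"),
    ("INT",     r"\d+"),
    ("ID",      r"[A-Za-z_]\w*"),
    ("GEQ",     r">="),
    ("BECOMES", r"="),
    ("LPAREN",  r"\("),
    ("RPAREN",  r"\)"),
    ("SCOLON",  r";"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Convert a character stream into a list of (kind, value) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind = m.lastgroup
        if kind not in ("COMMENT", "WS"):      # strip comments and white space
            if kind == "ID":
                tokens.append(("ID", m.group()))        # id & actual name
            elif kind == "INT":
                tokens.append(("INT", int(m.group())))  # kind & value
            else:
                tokens.append((kind, None))             # atomic token
        pos = m.end()
    return tokens
```

Each token is a small tuple rather than a substring of the input, which mirrors the slide's point that tokens are atomic items, not character strings.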
• Many different forms (Engineering tradeoffs)
• Common output from a parser is an abstract syntax tree
  Essential meaning of the program without the syntactic noise
19
• Token Stream Input
    IF LPAREN ID(x) GEQ ID(y) RPAREN
    ID(y) BECOMES INT(42) SCOLON
• Abstract Syntax Tree
    ifStmt
      >=
        ID(x)
        ID(y)
      assign
        ID(y)
        INT(42)
20
• During or (more common) after parsing
  Type checking
  Check for language requirements like “declare before use”, type compatibility
  Preliminary resource allocation
  Collect other information needed by back end analysis and code generation
21
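Two of the checks named on this slide, "declare before use" and type compatibility, can be sketched over a tiny statement list. The tuple encoding of statements and expressions, the type names `int` and `bool`, and the function `check` are assumptions made for this illustration only.

```python
def check(statements):
    """Return a list of semantic error messages for a list of statements."""
    symbols = {}   # symbol table: name -> declared type
    errors = []

    def type_of(expr):
        kind, value = expr
        if kind in ("int", "bool"):        # literal: its kind is its type
            return kind
        if kind == "id":
            if value not in symbols:
                errors.append(f"'{value}' used before declaration")
                return None
            return symbols[value]
        raise ValueError(f"unknown expression kind: {kind}")

    for stmt in statements:
        if stmt[0] == "decl":              # ("decl", name, type)
            _, name, typ = stmt
            symbols[name] = typ
        elif stmt[0] == "assign":          # ("assign", name, expr)
            _, name, expr = stmt
            rhs = type_of(expr)
            if name not in symbols:
                errors.append(f"'{name}' used before declaration")
            elif rhs is not None and symbols[name] != rhs:
                errors.append(f"cannot assign {rhs} to {symbols[name]} '{name}'")
        else:
            raise ValueError(f"unknown statement kind: {stmt[0]}")
    return errors
```

The `symbols` dictionary plays the role of a symbol table: it is filled by declarations and consulted by every later use.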
• Responsibilities
  Translate IR into target machine code
  Should produce fast, compact code
  Should use machine resources effectively
    • Registers
    • Instructions
    • Memory hierarchy
22
• Typically split into two major parts with sub-phases
  “Optimization” – code improvements
    • May well translate parser IR into another IR
  Code generation
    • Instruction selection & scheduling
    • Register allocation
23
• Input
    if (x >= y)
        y = 42;
• Output
    mov eax,[ebp+16]
    cmp eax,[ebp-8]
    jl L17
    mov [ebp-8],42
    L17:
24
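The output on this slide can be reproduced by a template-style code generator. The sketch below is a simplification under stated assumptions: the function `gen_if_geq`, the `offsets` mapping from variable names to frame offsets, and the fixed use of `eax` are invented for this example; a real back end would perform instruction selection and register allocation rather than fill in one template.

```python
def gen_if_geq(lhs, rhs, target, value, offsets, label):
    """Emit x86-style text for: if (lhs >= rhs) target = value;"""
    def slot(name):
        # Stack slot of a variable relative to ebp, e.g. [ebp+16] or [ebp-8].
        return f"[ebp{offsets[name]:+d}]"
    return [
        f"mov eax,{slot(lhs)}",
        f"cmp eax,{slot(rhs)}",
        f"jl {label}",                 # skip the assignment when lhs < rhs
        f"mov {slot(target)},{value}",
        f"{label}:",
    ]
```

Note that the branch tests the opposite of the source condition (`jl` for `>=`), so the assignment is simply skipped when the condition is false.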
Unoptimized Code
    lda $30,-32($30)
    stq $26,0($30)
    stq $15,8($30)
    bis $30,$30,$15
    bis $16,$16,$1
    stl $1,16($15)
    lds $f1,16($15)
    sts $f1,24($15)
    ldl $5,24($15)
    bis $5,$5,$2
    s4addq $2,0,$3
    ldl $4,16($15)
    mull $4,$3,$2
    ldl $3,16($15)
    addq $3,1,$4
    mull $2,$4,$2
    ldl $3,16($15)
    addq $3,1,$4
    mull $2,$4,$2
    stl $2,20($15)
    ldl $0,20($15)
    br $31,$33
    $33:
    bis $15,$15,$30
    ldq $26,0($30)
    ldq $15,8($30)
    addq $30,32,$30
    ret $31,($26),1
Optimized Code
    s4addq $16,0,$0
    mull $16,$0,$0
    addq $16,1,$16
    mull $0,$16,$0
    mull $0,$16,$0
    ret $31,($26),1
25
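One difference between the two listings is that the longer one performs the same `addq ...,1,...` computation twice, while the shorter one performs it once and reuses the result. That kind of improvement, common-subexpression elimination, can be sketched on a hypothetical three-address code. The tuple format `(dest, op, arg1, arg2)`, the temporaries `t1`..`t6`, and the input `n` are assumptions for this example; the sketch also assumes every destination is assigned exactly once and that inputs are never reassigned.

```python
def eliminate_common_subexpressions(code):
    """Drop instructions that recompute a value already held in a temp."""
    seen = {}      # (op, arg1, arg2) -> temp already holding that value
    alias = {}     # removed temp -> temp that replaces it
    result = []
    for dest, op, a, b in code:
        # Rewrite operands that refer to a removed temp.
        a = alias.get(a, a)
        b = alias.get(b, b)
        key = (op, a, b)
        if key in seen:
            alias[dest] = seen[key]    # reuse the earlier result
        else:
            seen[key] = dest
            result.append((dest, op, a, b))
    return result
```

The pass keeps one table of expressions already computed and one table of renamed temporaries, so later instructions keep referring to a value that still exists.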
Source code (character stream)
    if (b == 0) a = b;
  → Lexical analysis
Token stream
    if ( b == 0 ) a = b ;
  → Parsing
Abstract syntax tree (AST)
    if
      ==
        b
        0
      ;
        =
          a
          b
  → Semantic Analysis
Decorated AST
    if
      boolean ==
        int b
        int 0
      int ;
        =
          int a lvalue
          int b
26
Decorated AST
    if
      boolean ==
        int b
        int 0
      int ;
        =
          int a lvalue
          int b
  → Intermediate Code Generation
    CJUMP ==
      MEM(fp + 8)
      CONST 0
      MOVE(MEM(fp + 4), MEM(fp + 8))
      NOP
  → Optimization
    CJUMP ==
      CX
      CONST 0
      MOVE(DX, CX)
      NOP
  → Code generation
    CMP CX, 0
    CMOVZ DX,CX
27
• Compiler techniques are everywhere
  Parsing (little languages, interpreters)
  Database engines
  AI: domain-specific languages
  Text processing
    • TeX/LaTeX -> dvi -> PostScript -> PDF
  Hardware: VHDL; model-checking tools
  Mathematics (Mathematica, Matlab)
28
• Fascinating blend of theory and engineering
  Direct applications of theory to practice
    • Parsing, scanning, static analysis
  Some very difficult problems (NP-hard or worse)
    • Resource allocation, “optimization”, etc.
    • Need to come up with good-enough solutions
29
• Ideas from many parts of CSE
  AI: Greedy algorithms, heuristic search
  Algorithms: graph algorithms, dynamic programming, approximation algorithms
  Theory: Grammars, DFAs and PDAs, pattern matching, fixed-point algorithms
  Systems: Allocation & naming, synchronization, locality
  Architecture: pipelines & hierarchy management, instruction set use
30
• program ::= statement | program statement
• statement ::= assignStmt | ifStmt
• assignStmt ::= id = expr ;
• ifStmt ::= if ( expr ) stmt
• expr ::= id | int | expr + expr
• id ::= a | b | c | i | j | k | n | x | y | z
• int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
31
• There are several syntax notations for productions in common use; all mean the same thing
    ifStmt ::= if ( expr ) stmt
    ifStmt → if ( expr ) stmt
    <ifStmt> ::= if ( <expr> ) <stmt>
32
program ::= statement | program statement
statement ::= assignStmt | ifStmt
assignStmt ::= id = expr ;
ifStmt ::= if ( expr ) stmt
expr ::= id | int | expr + expr
id ::= a | b | c | i | j | k | n | x | y | z
int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

a = 1 ;
if ( a + 1 )
    b = 2 ;

program
  program
    stmt
      assign
        ID(a)
        expr
          int (1)
  stmt
    ifStmt
      expr
        expr
          ID(a)
        expr
          int (1)
      stmt
        assign
          ID(b)
          expr
            int (2)
33
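A parser for this grammar can be sketched by hand as a recursive-descent routine. This is one possible approach under stated assumptions, not how the slides say it is done: the function names are invented, `stmt` is treated as shorthand for `statement`, the left-recursive `program` rule is handled with a loop, and the ambiguous `expr + expr` rule is read as left-associative. The result is a compact tree of tuples rather than the full parse tree drawn on the slide.

```python
IDS = set("abcijknxyz")       # id ::= a | b | c | i | j | k | n | x | y | z
INTS = set("0123456789")      # int ::= 0 | 1 | ... | 9

def parse_program(tokens):
    """Parse a list of token strings; return a list of statement trees."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {peek()!r}")
        pos += 1

    def operand():
        nonlocal pos
        tok = peek()
        if tok in IDS:
            pos += 1
            return ("id", tok)
        if tok in INTS:
            pos += 1
            return ("int", int(tok))
        raise SyntaxError(f"expected id or int, found {tok!r}")

    def expr():
        # expr ::= id | int | expr + expr, read left-associatively.
        left = operand()
        while peek() == "+":
            expect("+")
            left = ("+", left, operand())
        return left

    def statement():
        if peek() == "if":                # ifStmt ::= if ( expr ) stmt
            expect("if")
            expect("(")
            cond = expr()
            expect(")")
            return ("ifStmt", cond, statement())
        target = operand()                # assignStmt ::= id = expr ;
        if target[0] != "id":
            raise SyntaxError("assignment target must be an id")
        expect("=")
        value = expr()
        expect(";")
        return ("assign", target, value)

    stmts = [statement()]                 # program ::= statement
    while peek() is not None:             #           | program statement
        stmts.append(statement())
    return stmts
```

For the sample input, `parse_program("a = 1 ; if ( a + 1 ) b = 2 ;".split())` yields one `assign` tree followed by one `ifStmt` tree whose body is the second assignment.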
34

C program compiler presentation


Editor's Notes

  • #12 Au02: from Cooper’s slides
  • #13 Au02: from Cooper’s slides
  • #17 Lex, yacc examples
  • #20 Au02: plan on drawing something here
  • #22 Au02: plan on drawing something here
  • #31 Au02: taken from one of Cooper’s slides
  • #32 This is similar to what is used in the Algol Report