Week III & IV
Materials of Week III & IV
• Lexical Analysis
• Token
• FA for lexical analysis
• FA implementation
Lexical Analysis
• Identify or recognize words (tokens) in
the source program.
• Send a sequence of tokens obtained to the
next stage
• Construct the tables used by the compiler:
– a symbol table (identifier table) and
– a table of numerical constants (constants table)
Token
• A token is a string of input
characters that is treated as an atomic
unit and passed on to the next stage
of the compiler
Kind of Token
1. Keyword: while, if, else, for, …
2. Identifier : words defined by the
programmer to identify variables,
classes, constants, functions etc. Ex :
min, max, GetNull2, zero33, etc
3. Operator : +, -, <>, etc
Kind of Token (cont.)
4. Numeric constant :
– Integer : 12, 123, …
– Real : 12.32, 233.2, …
– Exponent : 0.9E-23, 10E2, …
5. Character constant
6. Special Character : {, (, }, ), ;
7. Comment : recognized in the lexical stage but
not passed on to the next stage.
8. White Space
9. Newline
Example
• Example of an input string in C++,
with the class of each token written
below it:
• while ( x33 <= 2.5e+33 - total ) calc ( x33 ) ;
• 1 6 2 3 4 3 2 6 2 6 2 6 6
Output of Lexical Analysis
• The output of this stage is a stream
of tokens.
• Each token consists of two parts :
– A class that indicates the type of token
– A value that indicates the class member.
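The class/value pair can be sketched as a small C++ structure. The names TokenClass, Token and makeToken are hypothetical, and the class numbers are assumed to follow the "Kind of Token" list; the lexeme is stored directly instead of a table pointer to keep the sketch short.

```cpp
#include <string>

// Token classes, numbered as in the "Kind of Token" list (an assumption
// made for this sketch; a real compiler may number them differently).
enum TokenClass {
    KEYWORD = 1,
    IDENTIFIER = 2,
    OPERATOR = 3,
    NUMERIC_CONSTANT = 4,
    CHARACTER_CONSTANT = 5,
    SPECIAL_CHARACTER = 6
};

// A token is a (class, value) pair. For identifiers and constants the value
// would normally point into the symbol table or the constants table; here
// the lexeme itself is stored instead.
struct Token {
    TokenClass cls;
    std::string value;
};

// Hypothetical helper: builds a token from its class and lexeme.
Token makeToken(TokenClass cls, const std::string& lexeme) {
    return Token{cls, lexeme};
}
```

For instance, makeToken(KEYWORD, "while") gives a token of class 1 whose value is "while".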
Example of the Output
Token Class Token Value
• 1 [code for while]
• 6 [code for (]
• 2 [pointer to contents of symbol table for x33]
• 3 [code for <=]
• 4 [pointer to contents of constant table for 2.5e+33]
• 3 [code for -]
• 2 [pointer to contents of symbol table for total]
• 6 [code for )]
• 2 [pointer to contents of symbol table for calc]
• 6 [code for (]
• 2 [pointer to contents of symbol table for x33]
• 6 [code for )]
• 6 [code for ;]
A Note
• If the source language is not case
sensitive, the scanner must be able
to recognize and accommodate it.
• Suppose the following words
represent the same keywords: then,
tHeN, Then, THEN.
• One technique for recognizing such
words is to convert every word to
upper (or lower) case first.
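A minimal sketch of this case normalization in C++, assuming ASCII input. The function names toLowerCase and isThenKeyword are hypothetical and only illustrate the idea for the keyword "then".

```cpp
#include <cctype>
#include <string>

// Returns a lower-case copy of the word, so that "then", "tHeN", "Then"
// and "THEN" all map to the same spelling before keyword lookup.
std::string toLowerCase(const std::string& word) {
    std::string result = word;
    for (char& c : result) {
        // The cast to unsigned char avoids undefined behavior in tolower
        // when char is signed and holds a negative value.
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return result;
}

// Hypothetical check for one keyword of a case-insensitive language.
bool isThenKeyword(const std::string& word) {
    return toLowerCase(word) == "then";
}
```

The scanner would normalize each word once and then compare it only against lower-case keyword spellings.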
FA for keywords
• Suppose an FA recognizes 5 keyword tokens : if,
inline, int, for, float.
[Figure: transition diagram of the keyword FA. Edge labels recovered from the diagram: i f; n l i n e; f o r; l o a t]
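One possible FA for these five keywords shares common prefixes (i, in, f) between words. The sketch below encodes such an FA with a switch statement; the state numbers are arbitrary labels chosen for this example, and the function names are hypothetical.

```cpp
#include <string>

// Transition function of a keyword DFA for if, inline, int, for, float.
// Returns -1 (dead state) when no edge exists for the symbol.
int nextState(int state, char c) {
    switch (state) {
        case 0:  return c == 'i' ? 1 : c == 'f' ? 2 : -1;
        case 1:  return c == 'f' ? 3 : c == 'n' ? 4 : -1;   // "if" or "in..."
        case 2:  return c == 'o' ? 10 : c == 'l' ? 12 : -1; // "fo..." or "fl..."
        case 4:  return c == 't' ? 5 : c == 'l' ? 6 : -1;   // "int" or "inl..."
        case 6:  return c == 'i' ? 7 : -1;
        case 7:  return c == 'n' ? 8 : -1;
        case 8:  return c == 'e' ? 9 : -1;                  // "inline"
        case 10: return c == 'r' ? 11 : -1;                 // "for"
        case 12: return c == 'o' ? 13 : -1;
        case 13: return c == 'a' ? 14 : -1;
        case 14: return c == 't' ? 15 : -1;                 // "float"
        default: return -1; // final states have no outgoing edges
    }
}

// Final states: 3 (if), 5 (int), 9 (inline), 11 (for), 15 (float).
bool isFinal(int state) {
    return state == 3 || state == 5 || state == 9 || state == 11 || state == 15;
}

// Runs the DFA over the whole word and accepts only in a final state.
bool isKeyword(const std::string& word) {
    int state = 0;
    for (char c : word) {
        state = nextState(state, c);
        if (state == -1) return false;
    }
    return isFinal(state);
}
```

A prefix such as "in" ends in a non-final state and is rejected, while "int" and "inline" continue from that same state to different final states.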
FA for Identifier
• An identifier is a string that starts with a letter
and is followed by zero or more letters or numbers
(digits). Example : calc, max, min1, h2o, … .
• RE for identifier : letter(letter+number)*
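The RE letter(letter+number)* corresponds to a two-state FA, sketched below in C++. The function name isIdentifier is hypothetical, and the sketch follows the definition on this slide, so it does not accept the underscore that real C/C++ identifiers allow.

```cpp
#include <cctype>
#include <string>

// DFA for letter(letter+number)*:
//   q0 --letter--> q1,  q1 --letter or digit--> q1,  q1 is the only final state.
bool isIdentifier(const std::string& word) {
    int state = 0;
    for (char ch : word) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (state == 0 && std::isalpha(c)) state = 1;      // first symbol must be a letter
        else if (state == 1 && std::isalnum(c)) state = 1; // then letters or digits
        else return false;                                 // no transition: reject
    }
    return state == 1;
}
```

With this definition h2o and min1 are accepted, while a word that starts with a digit is rejected in q0.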
Numeric Constant
• Integer
• Real
• Exponential
Integer Numeric Constant
• RE : integer = number(number)*
Real or decimal constant
• Decimal = integer . integer
• Decimal = number(number)* . number(number)*
Exponential constant
• Exp = (decimal + integer) e (sign + λ) integer
• Or, written in full for Σ = {a, s, ., e} where a = number
and s = sign, we get the following regular expressions :
• Exp = (aa*.aa* + aa*)e(s+λ)aa*
• Exp = aa*(.aa*+λ)e(s+λ)aa*
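The factored expression aa*(.aa*+λ)e(s+λ)aa* can be turned into a seven-state FA. The C++ sketch below is one way to do it; the state numbering and the function name isExponentConstant are assumptions of this example, and both e and E are accepted as the exponent symbol because the earlier examples use both.

```cpp
#include <cctype>
#include <string>

// DFA for aa*(.aa*+λ)e(s+λ)aa*, where a = digit and s = sign.
// q1: integer part, q2: after '.', q3: fraction part,
// q4: after 'e', q5: after the sign, q6: exponent digits (final).
bool isExponentConstant(const std::string& text) {
    int state = 0;
    for (char ch : text) {
        bool digit = std::isdigit(static_cast<unsigned char>(ch)) != 0;
        bool expo = (ch == 'e' || ch == 'E');
        bool sign = (ch == '+' || ch == '-');
        switch (state) {
            case 0: if (digit) state = 1; else return false; break;
            case 1: if (digit) state = 1;
                    else if (ch == '.') state = 2;
                    else if (expo) state = 4;
                    else return false;
                    break;
            case 2: if (digit) state = 3; else return false; break;
            case 3: if (digit) state = 3;
                    else if (expo) state = 4;
                    else return false;
                    break;
            case 4: if (sign) state = 5;
                    else if (digit) state = 6;
                    else return false;
                    break;
            case 5: if (digit) state = 6; else return false; break;
            case 6: if (digit) state = 6; else return false; break;
        }
    }
    return state == 6; // accepted only if at least one exponent digit was read
}
```

A plain decimal such as 12.32 stops in q3 and is rejected by this FA, since the expression requires the exponent part.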
Problem Questions
1. For each C/C++ input string,
list the tokens and their classes :
a) for (I=start; I<=fin+3.5e6;I=I*3)
ac=ac+/*incr*/1;
b) { ax=33; bx= /*if*/ 31.4 } // ax+3;
c) if /* if */ a) } +whiles
2. Construct an FA that recognizes the
words RENT, RENEW, RED, RAID,
RAG and SENT.
Problem Questions (cont.)
3. Modify the FA for real constants so
that it also accepts numeric constants
starting with a dot, for example .25
Problem Questions (cont.)
4. Design an FA that recognizes
C/C++ comments delimited by /* and
*/. Use the symbol A to represent any
character other than * or /, so that
the alphabet is {/, *, A}.
FA Implementation
• There are two ways to implement a DFA:
– Store the whole transition function in a two-
dimensional array, where each row corresponds
to a state and each column to an input symbol.
– Use switch statements, with one case per state.
The code for each state tests the input symbol
and selects the new state based on that input.
Implementation of FA with Array in C++
: initial part
#include <iostream>
#include <string>
using namespace std;
int main(){
    bool accept[3]={true,false,false}; // accept[q] is true if q is a final state (only q0)
    int dfa[3][2]={{1,2},{2,0},{2,2}}; // DFA definition: dfa[state][input]
    string inp; // input string
    int state;
    int input; // index of the input symbol (0 for 'a', 1 for 'b')
    bool akhir; // true once the reading of the input string has ended
    int pos; // position of the character being read
    char lagi = 'n'; // answer to "another input?"
Implementation of FA with Array in C++
: iteration part
    do{
        cout<<"Enter the input string : ";
        if(!(cin>>inp)) break; // stop when no more input is available
        pos = 0; // the initial position of the input string reading
        akhir = false; // position not at the end of the string
        state = 0; // starting from the initial state = q0
        while(akhir == false){
            if(inp[pos]=='a') input = 0; // input a is represented by the number 0
            else if(inp[pos]=='b') input = 1; // input b is represented by the number 1
            else {
                if(inp[pos]!='\0') state = 2; // symbol outside {a,b}: go to the dead state q2
                akhir = true; // end of the string (or invalid symbol)
                break; // the reading process stops
            }
            state = dfa[state][input]; // transition function
            pos++; // read the next input
        }
Implementation of FA with Array in C++ :
output part
        // if the last state is final, then the string is accepted
        if (accept[state])
            cout<<"The string is ACCEPTED\n";
        else
            cout<<"The string is REJECTED\n";
        cout<<"Enter another input? (y/n) : "; cin>>lagi;
    } while(lagi!='n');
    return 0;
}
Implementation of FA with Switch Case
#include <iostream>
#include <string>
using namespace std;
int main(){
    int state;
    string inp; // input string
    int pos;
    bool akhir;
    char lagi = 'n';
    do{
        cout<<"Enter the input string : ";
        if(!(cin>>inp)) break; // stop when no more input is available
        pos = 0; // the initial position of the input string reading
        akhir = false;
        state = 0; // starting from the initial state q0
        while(akhir == false) { // while the reading isn't over
            if(inp[pos]=='\0') akhir = true; // end of the string
            else {
                switch(state) {
                case 0 : // in state q0
                    if(inp[pos]=='a') state = 1; // if 'a' is read, go to state q1
                    else if(inp[pos]=='b') state = 2; // if 'b' is read, go to state q2
                    else state = 2; // any other symbol: go to the dead state q2
                    break;
                case 1 : // in state q1
                    if(inp[pos]=='a') state = 2; // if 'a' is read, go to state q2
                    else if(inp[pos]=='b') state = 0; // if 'b' is read, go to state q0
                    else state = 2; // any other symbol: go to the dead state q2
                    break;
                case 2 : // in state q2 (dead state): every symbol stays in q2
                    state = 2;
                    break;
                }
                pos++;
            }
        }
        // if the last state is the final state (q0), the string is accepted
        if (state==0)
            cout<<"The string is ACCEPTED\n";
        else
            cout<<"The string is REJECTED\n";
        cout<<"Enter another input? (y/n) : "; cin>>lagi;
    } while(lagi!='n');
    return 0;
}
Execution of the program
Programming Assignment
• Implement an FA that recognizes an
integer constant using C++
• Screenshot :
– the code and
– the execution of the program code
• Upload to siakad as a PDF file
• Notice : avoid plagiarism