
MODULE 2

Automata and Compiler Design

TOP DOWN PARSING:


Top-down parsing can be viewed as the problem of constructing a parse tree for
the given input string, starting from the root and creating the nodes of the parse
tree in preorder (depth-first left to right).
Equivalently, top-down parsing can be viewed as finding a leftmost derivation
for an input string.
Top-down parsers are classified into two variants: those that use
backtracking and those that do not.
Non-Backtracking Parsing: There are two variants of this parser as given
below.
1. Table-Driven Predictive Parsing:
i. LL(1) Parsing
2. Recursive Descent Parsing
Backtracking Parsing:
1. Brute-Force method
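As an illustration of non-backtracking recursive descent, here is a minimal Python sketch for the toy grammar S -> a S b | c. The grammar and function names are chosen for illustration and are not from the syllabus; each non-terminal gets one procedure, and the production is chosen by peeking at the next token, with no backtracking.

```python
# Recursive-descent parser sketch for the toy grammar:
#   S -> a S b | c
# One procedure per non-terminal; the branch is picked by one
# symbol of lookahead, so no backtracking is ever needed.

def parse(tokens):
    pos = 0

    def match(expected):
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == expected:
            pos += 1
        else:
            raise SyntaxError(f"expected {expected!r} at position {pos}")

    def S():
        # Peek at the next token to choose the production.
        if pos < len(tokens) and tokens[pos] == "a":
            match("a"); S(); match("b")
        else:
            match("c")

    S()
    if pos != len(tokens):
        raise SyntaxError("extra input after parse")
    return True

print(parse(list("aacbb")))  # True: a a c b b is in the language
```

Because the two productions start with different terminals (a versus c), a single symbol of lookahead always suffices to pick the right branch.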

Context Free Grammar (CFG):


A context-free grammar (CFG) is a formal system used to describe a class of
languages known as context-free languages (CFLs). The purpose of a context-free
grammar is:
 To generate all strings in a language using a set of rules (production rules).
 It extends the capabilities of regular expressions and finite automata.
A grammar is said to be context-free if every production is of the
form:
A -> (V∪T)*, where A ∊ V
V (Variables/Non-terminals): These are symbols that can be replaced using
production rules. They help in defining the structure of the grammar. Typically,
non-terminals are represented by uppercase letters (e.g., S, A, B).
T (Terminals): These are symbols that appear in the final strings of the language
and cannot be replaced further. They are usually represented by lowercase
letters (e.g., a, b, c) or specific symbols.
The left-hand side of a production can only be a single variable; it cannot be
a terminal. The right-hand side can be any combination of variables and
terminals.
Derivation
Derivation is a sequence of production rule applications used to obtain the
input string from the start symbol. During parsing we have to take two
decisions. These are as follows:
o We have to decide which non-terminal is to be replaced.
o We have to decide the production rule by which the non-terminal will be
replaced.
There are two standard orders for choosing the non-terminal to replace:
Left-most Derivation
In the leftmost derivation, at each step the leftmost non-terminal in the
sentential form is replaced using a production rule, so the input string is
effectively derived from left to right.
Example:
1. S = S + S
2. S = S - S
3. S = a | b |c
Input: a - b + c
The left-most derivation is:
S=S+S
S=S-S+S
S=a-S+S
S=a-b+S
S=a-b+c
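The leftmost derivation above can be replayed mechanically: at every step only the leftmost S in the sentential form is rewritten. A small Python sketch, using str.replace with count 1, which always replaces the leftmost occurrence:

```python
# Replaying the leftmost derivation of a - b + c for the grammar
#   S = S + S | S - S | a | b | c
sentential = "S"
for rhs in ["S+S", "S-S", "a", "b", "c"]:      # production bodies, in order of use
    sentential = sentential.replace("S", rhs, 1)  # rewrite the LEFTMOST S only
    print(sentential)
# Prints: S+S, S-S+S, a-S+S, a-b+S, a-b+c (one per line, without spaces)
```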
Right-most Derivation
In the rightmost derivation, at each step the rightmost non-terminal in the
sentential form is replaced using a production rule, so the input string is
effectively derived from right to left.
Example:
1. S = S + S
2. S = S - S
3. S = a | b |c
Input: a - b + c
The right-most derivation is:
1. S = S - S
2. S = S - S + S
3. S = S - S + c
4. S = S - b + c
5. S = a - b + c

Parse tree
o A parse tree is the graphical representation of a derivation. Its nodes are
labeled with symbols, which can be terminals or non-terminals.
o In parsing, the string is derived from the start symbol, so the root of the
parse tree is the start symbol.
o The parse tree reflects the precedence of operators. The deepest sub-tree is
evaluated first, so the operator in a parent node has lower precedence
than the operators in its sub-trees.
The parse tree follows these points:
o All leaf nodes have to be terminals.
o All interior nodes have to be non-terminals.
o In-order traversal gives original input string.
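The last point — that reading the parse tree's leaves from left to right recovers the input — can be checked with a small sketch. The Node class below is illustrative, not from the text:

```python
# A minimal parse-tree node; reading the leaves left to right
# (the tree's yield, or frontier) reconstructs the input string.

class Node:
    def __init__(self, symbol, children=None):
        self.symbol = symbol
        self.children = children or []   # empty list => leaf (terminal)

def frontier(node):
    if not node.children:
        return node.symbol
    return "".join(frontier(c) for c in node.children)

# Parse tree for a * b + c, with + at the root (lower precedence):
tree = Node("T", [
    Node("T", [Node("T", [Node("a")]), Node("*"), Node("T", [Node("b")])]),
    Node("+"),
    Node("T", [Node("c")]),
])
print(frontier(tree))  # a*b+c
```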

Example:

Production rules:
1. T= T + T | T * T
2. T = a|b|c

Input: a * b + c

Steps 1–5 (figures omitted) grow the tree one production at a time. The
completed parse tree, with a * b grouped deeper than +, is:

        T
      / | \
     T  +  T
   / | \   |
  T  *  T  c
  |     |
  a     b

Ambiguity
A grammar is said to be ambiguous if there exists more than one leftmost
derivation, more than one rightmost derivation, or more than one parse
tree for some input string. If the grammar is not ambiguous then it is
called unambiguous.
Example:

1. S = aSb | SS
2. S = ε

For the string aabb, the above grammar generates two parse trees,
corresponding to two distinct leftmost derivations:
1. S ⇒ aSb ⇒ aaSbb ⇒ aabb
2. S ⇒ SS ⇒ aSbS ⇒ aaSbbS ⇒ aabbS ⇒ aabb
An ambiguous grammar is not suitable for compiler construction. No
method can automatically detect and remove ambiguity in general, but
ambiguity can often be removed by rewriting the whole grammar without
ambiguity.

LL(K) Grammars :
LL(k) grammar is a type of grammar that uses a predictive parsing table
to parse input from left to right. LL(k) stands for "left-to-right, leftmost
derivation" and "k" represents the number of lookahead symbols.
How it works
 An LL(k) parser uses a predictive parsing table to parse input.
 The table is constructed from a context-free grammar (CFG).
 The parser reads the input from left to right, performing leftmost
derivation of the sentence.
 The parser uses k tokens of lookahead when parsing a sentence.
LL(k) grammar properties
 Every LL(k) grammar is deterministic.
 The set of LL(k) languages is contained in the set of LL(k+1) languages.
 Not all context-free languages can be recognized by an LL(k) parser.

LL(1) Parsing :
Here the first L indicates that the input is scanned from left to right,
the second L indicates that the parser produces a leftmost derivation,
and the 1 is the number of lookahead symbols, i.e., how many input
symbols are examined when making a parsing decision.

Conditions for an LL(1) Grammar


To construct a working LL(1) parsing table, a grammar must satisfy these
conditions:
 No Left Recursion: Avoid recursive definitions like A -> A + b.
 Unambiguous Grammar: Ensure each string can be derived in only one
way.
 Left Factoring: Make the grammar deterministic, so the parser can
proceed without guessing.
Algorithm to Construct LL(1) Parsing Table
Step 1: First check all the essential conditions mentioned above and go to
step 2.
Step 2: Calculate First() and Follow() for all non-terminals.
1. First(X): the set of terminal symbols that can appear at the beginning
of a string derived from X.
2. Follow(A): the set of terminal symbols that can appear immediately
after the non-terminal A in some derivation.
Step 3: For each production A –> α. (A tends to alpha)
1. Find First(α) and for each terminal in First(α), make entry A –> α in the
table.
2. If First(α) contains ε (epsilon) as terminal, then find the Follow(A) and
for each terminal in Follow(A), make entry A –> ε in the table.
3. If the First(α) contains ε and Follow(A) contains $ as terminal, then make
entry A –> ε in the table for the $.
The parsing table is built from these two functions: its rows contain the
non-terminals and its columns contain the terminal symbols. All null (ε)
productions of the grammar go under the Follow elements, and the
remaining productions lie under the elements of the First set.
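As a sketch of Step 2, the First sets can be computed by iterating to a fixed point. The grammar encoding below is illustrative and hard-codes the grammar of Example 1 that follows, with "" standing for ε; Follow sets can be computed with a similar fixed-point loop.

```python
# Fixed-point computation of First sets; "" stands for epsilon.
GRAMMAR = [
    ("E",  ["T", "E'"]),
    ("E'", ["+", "T", "E'"]), ("E'", []),          # E' -> +TE' | epsilon
    ("T",  ["F", "T'"]),
    ("T'", ["*", "F", "T'"]), ("T'", []),          # T' -> *FT' | epsilon
    ("F",  ["id"]), ("F", ["(", "E", ")"]),
]
NONTERMS = {lhs for lhs, _ in GRAMMAR}

def first_sets(grammar):
    first = {nt: set() for nt in NONTERMS}
    changed = True
    while changed:                       # iterate until no set grows
        changed = False
        for lhs, rhs in grammar:
            before = len(first[lhs])
            nullable = True
            for sym in rhs:
                if sym not in NONTERMS:  # terminal: add it and stop
                    first[lhs].add(sym)
                    nullable = False
                    break
                first[lhs] |= first[sym] - {""}
                if "" not in first[sym]:
                    nullable = False
                    break
            if nullable:                 # every RHS symbol can vanish
                first[lhs].add("")
            if len(first[lhs]) != before:
                changed = True
    return first

print(sorted(first_sets(GRAMMAR)["E"]))   # ['(', 'id']
```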
Now, let’s understand with an example.
Example 1: Consider the Grammar:
E --> TE'
E' --> +TE' | ε
T --> FT'
T' --> *FT' | ε
F --> id | (E)

*ε denotes epsilon
Step 1: The grammar satisfies all properties in step 1.
Step 2: Calculate first() and follow().

Find their First and Follow sets:

Production          First            Follow
E  -> TE'           { id, ( }        { $, ) }
E' -> +TE' | ε      { +, ε }         { $, ) }
T  -> FT'           { id, ( }        { +, $, ) }
T' -> *FT' | ε      { *, ε }         { +, $, ) }
F  -> id | (E)      { id, ( }        { *, +, $, ) }

Step 3: Make a parser table.


Now, the LL(1) Parsing Table is:

       id          +            *            (           )          $
E      E -> TE'                              E -> TE'
E'                 E' -> +TE'                            E' -> ε    E' -> ε
T      T -> FT'                              T -> FT'
T'                 T' -> ε      T' -> *FT'               T' -> ε    T' -> ε
F      F -> id                               F -> (E)

As you can see, all the null productions are placed under the Follow set
of their symbol, and all the remaining productions lie under the First
set of their symbol.
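A sketch of how the table drives parsing: the parser keeps a stack, expands non-terminals using the table entry for (top of stack, current token), and matches terminals. The dictionary below hard-codes the entries derived in the example; a missing cell signals a syntax error.

```python
# Table-driven LL(1) parsing sketch for the example grammar.
# Epsilon productions are written as the empty list.
TABLE = {
    ("E",  "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T",  "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F",  "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens):
    tokens = tokens + ["$"]
    stack = ["$", "E"]                  # start symbol on top
    i = 0
    while stack:
        top = stack.pop()
        if top in NONTERMS:
            rhs = TABLE.get((top, tokens[i]))
            if rhs is None:
                return False            # empty cell => syntax error
            stack.extend(reversed(rhs)) # push RHS, leftmost symbol on top
        elif top == tokens[i]:
            i += 1                      # terminal (or $) matched
        else:
            return False
    return i == len(tokens)

print(ll1_parse(["id", "+", "id", "*", "id"]))  # True
```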
Note: Not every grammar is suitable for an LL(1) parsing table. A cell
may end up containing more than one production, which is a conflict.
Advantages of Construction of LL(1) Parsing Table
 Clear Decision-Making: With an LL(1) parsing table, the parser can
decide what to do by looking at just one symbol ahead. This makes it
easy to choose the right rule without confusion or guessing.
 Fast Parsing: Since there’s no need to go back and forth or guess the next
step, LL(1) parsing is quick and efficient. This is useful for applications
like compilers where speed is important.
 Easy to Spot Errors: The table helps identify errors right away. If the
current symbol doesn’t match any rule in the table, the parser knows
there’s an error and can handle it immediately.
 Simple to Implement: Once the table is set up, the parsing process is
straightforward. You just follow the instructions in the table, making it
easier to build and maintain.
 Good for Predictive Parsing: LL(1) parsing is often called “predictive
parsing” because the table lets you predict the next steps based on the
input. This makes it reliable for parsing programming languages and
structured data.


Bottom-up Parsers
Bottom-up parsing is a type of syntax analysis method where the parser starts
from the input symbols (tokens) and attempts to reduce them to the start symbol
of the grammar (usually denoted as S). The process involves applying
production rules in reverse, starting from the leaves of the parse tree and
working upwards toward the root.
Start with tokens: The parser begins with the terminal symbols (the input
tokens), which are the leaves of the parse tree.
Shift and reduce: The parser repeatedly applies two actions:
 Shift: The next token is pushed onto a stack.
 Reduce: A sequence of symbols on the stack is replaced by a non-
terminal according to the production rules of the grammar. This step is
called “reduction,” where the parser replaces the right-hand side of a
production with the left-hand side non-terminal.
Repeat until root: The process of shifting and reducing continues until the
entire input is reduced to the start symbol, indicating the sentence has been
successfully parsed.
Example:
1. E → T
2. T → T * F
3. T → F
4. F → id
input string: “id * id”

Reduction Process in Bottom-Up Parsing


In bottom-up parsing, the process of reduction is when a specific part of the
input (called a substring) is replaced by a non-terminal symbol according to the
production rules of the grammar.
Production Rule: A production rule defines how a non-terminal symbol can be
replaced by other symbols (either terminals or non-terminals). For example, you
might have a production like this:
Expression → Term + Term
This rule says that an Expression can be made by combining two Terms with
a + in between.
Matching the Substring: During parsing, we look at the input string and try to
match parts of it to the right-hand side of a production rule. For instance, if the
input is “3 + 5”, we might find that this substring matches the Term + Term part
of the production.
Replacement Step (Reduction): Once a match is found, we “reduce” that
matched part. This means we replace it with the non-terminal on the left side of
the production rule. So, from the example above:
 We recognize the substring “3 + 5” as matching Term + Term.
 We then replace this substring with the non-terminal Expression.
Continue Reducing: The parser keeps reducing parts of the input string in this
way, until all parts are reduced to the start symbol (like S in many grammars).
This indicates that the entire input has been successfully parsed according to the
grammar.
Example:
Consider the following simple grammar:
1. S→A+B
2. A→3
3. B→5
Now, let’s parse the string “3 + 5”:
 First, we start with the input string: “3 + 5”.
 We look for parts of the string that match a production rule.
 We see that “3” matches A, so we replace it with A (so now we have
A+5).
 Next, “5” matches B, so we replace it with B (now we have A+B).
 Finally, A+B matches the production S→A+B, so we replace A+B with
S.
Now, we’ve reduced the entire input to the start symbol S, meaning the input
has been successfully parsed.
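The reduction sequence above can be replayed as plain string rewriting. This is a sketch only; real parsers track positions and use a stack rather than textual replacement.

```python
# Reducing "3+5" step by step with the grammar
#   S -> A + B,  A -> 3,  B -> 5
sentence = "3+5"
for lhs, rhs in [("A", "3"), ("B", "5"), ("S", "A+B")]:
    sentence = sentence.replace(rhs, lhs, 1)  # replace matched RHS by its LHS
    print(sentence)
# Prints: A+5, A+B, S (one per line)
```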
Classification of Bottom-up Parsers

A bottom-up parser is often referred to as a shift-reduce parser. A
shift-reduce parser has just four canonical actions:
 shift — the next input symbol is shifted onto the top of the stack.
 reduce — the right end of the handle is on top of the stack; locate the
left end of the handle within the stack, pop the handle off the stack,
and push the appropriate non-terminal (the LHS).
 accept — terminate parsing and signal success.
 error — call an error recovery routine.
Types of LR parsing methods :
1. SLR
2. CLR
3. LALR

Shift-Reduce (SR) Parsing
Shift-reduce parsing is a popular bottom-up technique used in syntax analysis,
where the goal is to create a parse tree for a given input based on grammar
rules. The process works by reading a stream of tokens (the input), and then
working backwards through the grammar rules to discover how the input can be
generated.
1. Input Buffer: This stores the string or sequence of tokens that needs to be
parsed.
2. Stack: The parser uses a stack to keep track of which symbols or parts of
the parse it has already processed. As it processes the input, symbols are
pushed onto and popped off the stack.
3. Parsing Table: Similar to a predictive parser, a parsing table helps the
parser decide what action to take next.
4. Shift-reduce parsing works by processing the input left to right and
gradually building up a parse tree by shifting tokens onto the stack and
reducing them using grammar rules, until it reaches the start symbol of
the grammar.
Four Main Operations of Shift Reduce Parser
Shift: Move the next input symbol onto the stack when no reduction is
possible.
Reduce: Replace a sequence of symbols at the top of the stack with the
left-hand side of a grammar rule.
Accept: Successfully complete parsing when the entire input is
processed and the stack contains only the start symbol.
Error: Handle unexpected or invalid input when no shift or reduce action
is possible.

Working of Shift Reduce Parser


Shift-reduce parsers use a Deterministic Finite Automaton (DFA) to
recognize handles — substrings that match the right-hand side of a
production and can be reduced. The DFA tracks what symbols are on the
stack and decides when to shift or reduce by following a set of rules.
Instead of directly analyzing the structure, the DFA helps the parser
determine when reductions should occur based on the stack’s contents.

The shift-reduce parser is a bottom-up parsing technique that breaks down
a string into two parts: the undigested part and the semi-digested part.
Here’s how it works:
1. Undigested Part: This part contains the remaining tokens that still need
to be processed. It is the input that hasn’t been handled yet.
2. Semi-Digested Part: This part is on a stack. It’s where tokens or parts
of the string that have been processed are stored.

Example

Consider the grammar


S –> ( L ) | a
L –> L , S | S
Perform Shift Reduce parsing for input string “( a, ( a, a ) ) “.

Stack            Input Buffer          Action
$                ( a, ( a, a ) ) $     Shift
$ (              a, ( a, a ) ) $       Shift
$ ( a            , ( a, a ) ) $        Reduce S → a
$ ( S            , ( a, a ) ) $        Reduce L → S
$ ( L            , ( a, a ) ) $        Shift
$ ( L,           ( a, a ) ) $          Shift
$ ( L, (         a, a ) ) $            Shift
$ ( L, ( a       , a ) ) $             Reduce S → a
$ ( L, ( S       , a ) ) $             Reduce L → S
$ ( L, ( L       , a ) ) $             Shift
$ ( L, ( L,      a ) ) $               Shift
$ ( L, ( L, a    ) ) $                 Reduce S → a
$ ( L, ( L, S    ) ) $                 Reduce L → L, S
$ ( L, ( L       ) ) $                 Shift
$ ( L, ( L )     ) $                   Reduce S → ( L )
$ ( L, S         ) $                   Reduce L → L, S
$ ( L            ) $                   Shift
$ ( L )          $                     Reduce S → ( L )
$ S              $                     Accept
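The trace above can be reproduced by a small greedy sketch for this grammar. Real shift-reduce parsers use a parsing table (a DFA) to choose between shifting and reducing; here the choice is hard-wired for this one example: handles are tried in a fixed order, and the accept check runs before the unit reduction L -> S so the final S is not reduced again.

```python
# Greedy shift-reduce sketch for:
#   S -> ( L ) | a        L -> L , S | S
RULES = [                       # (handle, lhs) pairs, tried in this order
    (["(", "L", ")"], "S"),
    (["L", ",", "S"], "L"),
    (["a"], "S"),
    (["S"], "L"),               # unit reduction tried last
]

def shift_reduce(tokens):
    stack, buf = [], list(tokens)
    while True:
        if stack == ["S"] and not buf:
            return True                      # accept
        for handle, lhs in RULES:
            if stack[-len(handle):] == handle:
                del stack[-len(handle):]     # pop the handle...
                stack.append(lhs)            # ...and push its LHS (reduce)
                break
        else:
            if not buf:
                return False                 # error: cannot shift or reduce
            stack.append(buf.pop(0))         # shift

print(shift_reduce(list("(a,(a,a))")))  # True
```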

LR Parser
The LR parser is a bottom-up parser for context-free grammars that is
widely used by compilers for programming languages and other associated
tools. An LR parser reads its input from left to right and produces a
rightmost derivation in reverse. It is called a bottom-up parser because
it builds the parse tree up from the leaves toward the start symbol by
reducing grammar productions. LR parsers are the most powerful of all
deterministic parsers used in practice.

LR parser algorithm :
The LR parsing algorithm is the same for all LR parsers; only the parsing
table differs from parser to parser. It uses the following components.
1. Input Buffer –
It contains the given string, and it ends with a $ symbol.

2. Stack –
It holds grammar symbols and state symbols. The combination of the state
on top of the stack and the current input symbol is used to index the
parsing table in order to take the parsing decisions.

SLR Parser
 SLR stands for Simple LR parser
 it is the weakest of the three methods but the easiest to implement
 a grammar for which an SLR parser can be constructed is called an SLR
grammar
Steps for constructing the SLR parsing table
1. Writing the augmented grammar
2. Finding the LR(0) collection of items
3. Finding FOLLOW of the LHS of each production
4. Defining 2 functions: action [indexed by terminals] and goto [indexed
by non-terminals] in the parsing table

EXAMPLE – Construct the SLR parsing table for the given context-free
grammar
S –> AA
A –> aA | b
STEP 1: Find the augmented grammar
The augmented grammar of the given grammar is:
S’ –> .S [0th production]
S –> .AA [1st production]
A–>.aA [2nd production]
A–>.b [3rd production]
STEP 2: Find the LR(0) collection of items
The terminals of this grammar are {a, b} and the non-terminals are {S, A}.
The canonical LR(0) collection (figure omitted) is:
I0: S’ –> .S, S –> .AA, A –> .aA, A –> .b
I1 = goto(I0, S): S’ –> S.
I2 = goto(I0, A): S –> A.A, A –> .aA, A –> .b
I3 = goto(I0, a): A –> a.A, A –> .aA, A –> .b
I4 = goto(I0, b): A –> b.
I5 = goto(I2, A): S –> AA.
I6 = goto(I3, A): A –> aA.
Here goto(I2, a) = goto(I3, a) = I3 and goto(I2, b) = goto(I3, b) = I4.
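The item sets for this grammar can be computed mechanically with closure and goto functions. The sketch below encodes the augmented grammar S' -> S, S -> AA, A -> aA | b, representing an item as a tuple (lhs, rhs, dot position):

```python
# Closure and goto over LR(0) items for the grammar
#   S' -> S,  S -> AA,  A -> aA | b
# An item (lhs, rhs, dot) means lhs -> rhs with the dot before rhs[dot].
PRODS = {"S'": [["S"]], "S": [["A", "A"]], "A": [["a", "A"], ["b"]]}

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(items):
            if dot < len(rhs) and rhs[dot] in PRODS:   # dot before a non-terminal
                for prod in PRODS[rhs[dot]]:
                    item = (rhs[dot], tuple(prod), 0)  # add its fresh items
                    if item not in items:
                        items.add(item)
                        changed = True
    return items

def goto(items, symbol):
    moved = {(lhs, rhs, dot + 1)               # advance the dot over symbol
             for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)

I0 = closure({("S'", ("S",), 0)})
print(len(I0))          # 4 items: S'->.S, S->.AA, A->.aA, A->.b
print(goto(I0, "b"))    # {('A', ('b',), 1)}  i.e. A -> b.
```

Repeatedly applying goto to every new item set over every grammar symbol yields the full canonical collection I0–I6.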

CLR Parser
The CLR parser stands for canonical LR parser. It is a more powerful LR
parser that makes use of lookahead symbols. This method uses a larger set
of items called LR(1) items. The main difference between LR(0) and LR(1)
items is that an LR(1) item carries more information in a state, which
rules out useless reduction states. This extra information is
incorporated into the state by the lookahead symbol.
The general form of an item is [A –> α.β, a]
where A –> α.β is a production with a dot marking the parsing position
and a is a terminal or the right end marker $.
LR(1) items = LR(0) items + lookahead
LALR Parser
The LALR parser is the lookahead LR parser. It can handle a large class
of grammars while keeping the parsing table compact: the CLR parsing
table is quite large compared to the other parsing tables, and LALR
reduces its size. LALR works similarly to CLR; the only difference is
that it combines the similar states of the CLR parsing table into one
single state.
The general form of an item is [A –> α.β, a]
where A –> α.β is a production and a is a terminal or the right end
marker $.
LR(1) items = LR(0) items + lookahead

Steps for constructing the LALR parsing table :

1. Writing the augmented grammar
2. Finding the LR(1) collection of items
3. Merging the item sets that differ only in their lookaheads, then
defining 2 functions: action [indexed by terminals] and goto [indexed by
non-terminals] in the LALR parsing table

YACC tool

YACC is an LALR parser generator developed at the beginning of the 1970s
by Stephen C. Johnson for the Unix operating system. It automatically
generates LALR(1) parsers from formal grammar specifications. YACC plays
an important role in compiler and interpreter development, since it
provides a means to specify the grammar of a language and to produce
parsers that either interpret or compile code written in that language.
Key Concepts and Features of YACC
 Grammar Specification: The input to YACC is a context-free
grammar (usually in the Backus-Naur Form, BNF) that describes the
syntax rules of the language it parses.
 Parser Generation: YACC translates the grammar into a C function that
performs efficient parsing of input text according to the predefined
rules.
 LALR(1) Parsing: This is a bottom-up parsing method that makes use
of a single token lookahead in determining the next action of parsing.
 Semantic Actions: These are actions associated with grammar
productions; they enable the execution of code, usually in C, used in
the construction of abstract syntax trees, the generation of
intermediate representations, or error handling.
 Attribute Grammars: These grammars consist of non-terminal
grammar symbols with attributes, which through semantic actions are
used in the construction of parse trees or the output of code.
 Integration with Lex: YACC is often used along with Lex, a tool that
generates lexical analyzers (scanners), which break the input into
tokens that are then processed by the YACC parser.
