Top-Down Parsing Techniques

This document discusses top-down parsing techniques, including recursive descent parsing and LL(1) parsing. It begins by explaining top-down parsing and how it constructs parse trees using a preorder traversal. It then covers recursive descent parsing, including how to handle repetition, choice, and error recovery using EBNF notation. LL(1) parsing is introduced as an alternative that uses an explicit stack instead of recursion. The key aspects of LL(1) parsing include the LL(1) parsing table, which expresses the possible rule choices for each non-terminal based on the next input token, and the LL(1) parsing algorithm.

Uploaded by

gdayanand4u
Copyright
© Attribution Non-Commercial (BY-NC)

Chapter 4 Top-Down Parsing

OUTLINE

Top-Down Parsing
A top-down parser parses an input string of tokens by tracing out the steps in a leftmost derivation. The implied traversal of the parse tree is a preorder traversal and thus proceeds from the root to the leaves. Example: the string number + number corresponds to the parse tree

              exp
            /  |  \
         exp   op   exp
          |    |     |
      number   +   number

This parse tree corresponds to the leftmost derivation:

    (1) exp => exp op exp
    (2)     => number op exp
    (3)     => number + exp
    (4)     => number + number

Two forms of top-down parsers
Predictive parsers: attempt to predict the next construction in the input string using one or more look-ahead tokens.
Backtracking parsers: try different possibilities for a parse of the input, backing up an arbitrary amount in the input if one possibility fails. They are more powerful but much slower, which makes them unsuitable for practical compilers.

Two kinds of top-down parsing algorithms
Recursive-descent parsing: quite versatile and suitable for a handwritten parser.
LL(1) parsing: the first L refers to the fact that it processes the input from left to right; the second L refers to the fact that it traces out a leftmost derivation for the input string; the number 1 means that it uses only one symbol of input to predict the direction of the parse.

Look-ahead sets
First and Follow sets: required by both recursive-descent parsing and LL(1) parsing.

A TINY parser
It is constructed using the recursive-descent parsing algorithm.

Error recovery methods
The error recovery methods used in top-down parsing will be described.

4.1 TOP-DOWN PARSING BY RECURSIVE-DESCENT


4.1.1 The Basic Method of Recursive-Descent

The idea of recursive-descent parsing:
We view the grammar rule for a non-terminal A as a definition for a procedure to recognize an A. The right-hand side of the grammar rule for A specifies the structure of the code for this procedure.

The first example: the expression grammar

    expr   → expr addop term | term
    addop  → + | -
    term   → term mulop factor | factor
    mulop  → *
    factor → ( expr ) | number

A recursive-descent procedure that recognizes a factor is as follows (in pseudo-code):

    procedure factor;
    begin
      case token of
        ( :      match( ( );
                 expr;
                 match( ) );
        number : match(number);
        else error;
      end case;
    end factor;

Here, token holds the current next token in the input (one symbol of lookahead). The match procedure compares the current next token with its parameter, advances the input if they are equal, and declares an error if they are not:

    procedure match(expectedToken);
    begin
      if token = expectedToken then
        getToken;
      else
        error;
      end if;
    end match;

Notes: Writing recursive-descent procedures for the remaining rules in the expression grammar is not as easy as it is for factor. It requires the use of EBNF.

4.1.2 Repetition and Choice: Using EBNF

The second example: the grammar rule for an if-statement

    if-stmt → if ( exp ) statement
            | if ( exp ) statement else statement

This can be translated into the procedure:

    procedure ifstmt;
    begin
      match(if);
      match( ( );
      exp;
      match( ) );
      statement;
      if token = else then
        match(else);
        statement;
      end if;
    end ifstmt;

In this example, we cannot immediately distinguish the two choices, since both begin with the token if. The decision whether to recognize the optional else-part is postponed until the token else is seen in the input. The EBNF of the if-statement is:

    if-stmt → if ( exp ) statement [ else statement ]

The square brackets of the EBNF are translated into a test in the code for ifstmt:

    if token = else then
      match(else);
      statement;
    end if;

Notes:
EBNF notation is designed to mirror closely the actual code of a recursive-descent parser, so a grammar should always be translated into EBNF if recursive descent is to be used.
It is natural to write a parser that matches each else token as soon as it is encountered in the input.

Consider the rule for expr in the BNF grammar for simple arithmetic expressions:

    expr → expr addop term | term

If we were to turn this directly into a recursive exp procedure, the procedure would call itself first without consuming any input, which leads to an immediate infinite recursive loop. The solution is to use the EBNF rule:

    expr → term { addop term }

The curly brackets expressing repetition can be translated into the code for a loop:

    procedure exp;
    begin
      term;
      while token = + or token = - do
        match(token);
        term;
      end while;
    end exp;

Similarly, the EBNF rule for term,

    term → factor { mulop factor }

becomes the code

    procedure term;
    begin
      factor;
      while token = * do
        match(token);
        factor;
      end while;
    end term;

A question: can the left associativity implied by the curly brackets (and explicit in the original BNF) still be maintained within this code? Consider a recursive-descent calculator for the simple integer arithmetic of our grammar:

    function exp : integer;
    var temp : integer;
    begin
      temp := term;
      while token = + or token = - do
        case token of
          + : match(+);
              temp := temp + term;
          - : match(-);
              temp := temp - term;
        end case;
      end while;
      return temp;
    end exp;

We can ensure that the operations are left associative by performing the operations as we cycle through the loop.

A working simple calculator in C code:

    /* Simple integer arithmetic calculator according to the EBNF:

         <exp>    -> <term> { <addop> <term> }
         <addop>  -> + | -
         <term>   -> <factor> { <mulop> <factor> }
         <mulop>  -> *
         <factor> -> ( <exp> ) | Number

       Inputs a line of text from stdin.
       Outputs "error" or the result.
    */

    #include <stdio.h>
    #include <stdlib.h>   /* exit */
    #include <ctype.h>    /* isdigit */

    int token; /* global token variable (int, so that EOF from getchar fits) */

    /* function prototypes for recursive calls
       (exp shares its name with the C math library function, so some
       compilers may warn; the name follows the grammar) */
    int exp(void);
    int term(void);
    int factor(void);

    void error(void)
    { fprintf(stderr, "error\n");
      exit(1);
    }

    void match(int expectedToken)
    { if (token == expectedToken) token = getchar();
      else error();
    }

    int main(void)
    { int result;
      token = getchar(); /* load token with first character for lookahead */
      result = exp();
      if (token == '\n') /* check for end of line */
        printf("Result = %d\n", result);
      else error(); /* extraneous chars on line */
      return 0;
    }

    int exp(void)
    { int temp = term();
      while ((token == '+') || (token == '-'))
        switch (token) {
        case '+': match('+'); temp += term(); break;
        case '-': match('-'); temp -= term(); break;
        }
      return temp;
    }

    int term(void)
    { int temp = factor();
      while (token == '*') {
        match('*');
        temp *= factor();
      }
      return temp;
    }

    int factor(void)
    { int temp;
      if (token == '(') {
        match('(');
        temp = exp();
        match(')');
      }
      else if (isdigit(token)) {
        ungetc(token, stdin);   /* put the digit back so scanf sees the whole number */
        scanf("%d", &temp);
        token = getchar();
      }
      else error();
      return temp;
    }
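The question of left associativity can be checked with a small experiment. The following C sketch is not part of the calculator above: it reads from a string instead of stdin, reduces term to a plain number, and uses hypothetical names (eval_left, eval_right, loop_exp, right_exp). It contrasts the loop form of the EBNF rule with a right-recursive variant.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helper names; input comes from a string instead of stdin. */
static const char *cur;   /* points at the current lookahead character */

/* Reads an unsigned decimal number (stands in for term in this sketch). */
static int number(void) {
    int value = 0;
    assert(isdigit((unsigned char)*cur));   /* sketch: no error recovery */
    while (isdigit((unsigned char)*cur))
        value = value * 10 + (*cur++ - '0');
    return value;
}

/* exp -> number { addop number }
   The loop applies each operator as soon as its right operand is read,
   so the operators group to the left. */
static int loop_exp(void) {
    int temp = number();
    while (*cur == '+' || *cur == '-') {
        char op = *cur++;
        int rhs = number();
        temp = (op == '+') ? temp + rhs : temp - rhs;
    }
    return temp;
}

/* exp -> number [ addop exp ]
   Right recursion evaluates the rest of the input first,
   so the operators group to the right. */
static int right_exp(void) {
    int lhs = number();
    if (*cur == '+' || *cur == '-') {
        char op = *cur++;
        int rhs = right_exp();
        return (op == '+') ? lhs + rhs : lhs - rhs;
    }
    return lhs;
}

int eval_left(const char *s)  { cur = s; return loop_exp(); }
int eval_right(const char *s) { cur = s; return right_exp(); }
```

Under these assumptions, eval_left("10-3-2") yields 5, that is (10-3)-2, while eval_right("10-3-2") yields 9, that is 10-(3-2). Only the loop version matches the left-recursive BNF rule.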

Notes: The method of turning grammar rules in EBNF into code is quite powerful. However, there are a few pitfalls, and care must be taken in scheduling the actions within the code. In the previous pseudo-code for exp:
(1) the match of the operation must come before the repeated calls to term;
(2) the global token variable must be set before the parse begins;
(3) getToken must be called just after a successful test of a token.

Construction of the syntax tree
The expression 3+4+5 has the syntax tree:

          +
         / \
        +   5
       / \
      3   4

The pseudo-code for the exp procedure to construct the syntax tree:

    function exp : syntaxTree;
    var temp, newtemp : syntaxTree;
    begin
      temp := term;
      while token = + or token = - do
        case token of
          + : match(+);
              newtemp := makeOpNode(+);
              leftChild(newtemp) := temp;
              rightChild(newtemp) := term;
              temp := newtemp;
          - : match(-);
              newtemp := makeOpNode(-);
              leftChild(newtemp) := temp;
              rightChild(newtemp) := term;
              temp := newtemp;
        end case;
      end while;
      return temp;
    end exp;

The simpler version:

    function exp : syntaxTree;
    var temp, newtemp : syntaxTree;
    begin
      temp := term;
      while token = + or token = - do
        newtemp := makeOpNode(token);
        match(token);
        leftChild(newtemp) := temp;
        rightChild(newtemp) := term;
        temp := newtemp;
      end while;
      return temp;
    end exp;

The pseudo-code for the if-statement procedure to construct the syntax tree:

    function ifstatement : syntaxTree;
    var temp : syntaxTree;
    begin
      match(if);
      match( ( );
      temp := makeStmtNode(if);
      testChild(temp) := exp;
      match( ) );
      thenChild(temp) := statement;
      if token = else then
        match(else);
        elseChild(temp) := statement;
      else
        elseChild(temp) := nil;
      end if;
      return temp;
    end ifstatement;
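The tree-building loop for exp can be sketched in C as follows. This is an illustration under stated assumptions rather than the chapter's own code: the Node layout and the names make_node, parse_exp, eval and free_tree are hypothetical, term is reduced to a single number, and the input is read from a string.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical node layout: op is '+' or '-' for operator nodes
   and 0 for number leaves. */
typedef struct Node {
    char op;
    int value;
    struct Node *left, *right;
} Node;

static const char *cur;   /* lookahead position in the input string */

static Node *make_node(char op, int value, Node *left, Node *right) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);    /* sketch: abort on allocation failure */
    n->op = op;
    n->value = value;
    n->left = left;
    n->right = right;
    return n;
}

/* term is reduced to a single number to keep the sketch short. */
static Node *term(void) {
    int value = 0;
    assert(isdigit((unsigned char)*cur));
    while (isdigit((unsigned char)*cur))
        value = value * 10 + (*cur++ - '0');
    return make_node(0, value, NULL, NULL);
}

/* exp -> term { addop term } */
static Node *exp_tree(void) {
    Node *temp = term();
    while (*cur == '+' || *cur == '-') {
        char op = *cur++;                      /* match(token) */
        Node *right = term();
        temp = make_node(op, 0, temp, right);  /* old tree becomes the left child */
    }
    return temp;
}

Node *parse_exp(const char *s) { cur = s; return exp_tree(); }

/* Evaluates the tree, which makes the grouping observable. */
int eval(const Node *n) {
    if (n->op == 0) return n->value;
    return n->op == '+' ? eval(n->left) + eval(n->right)
                        : eval(n->left) - eval(n->right);
}

void free_tree(Node *n) {
    if (n == NULL) return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}
```

For the input "3+4+5", the root returned by parse_exp is a + node whose right child is the leaf 5 and whose left child is the + node for 3 and 4, which is the shape of the tree drawn above.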

4.1.3 Further Decision Problems

The recursive-descent method is quite powerful and adequate to construct a complete parser. But we need more formal methods to deal with complex situations:
(1) It may be difficult to convert a grammar in BNF into EBNF form.
(2) It is difficult to decide when to use the choice A → α and when to use the choice A → β if both α and β begin with non-terminals. This requires First sets.
(3) In writing the code for an ε-production A → ε, it may be necessary to know which tokens can legally follow the non-terminal A. This requires Follow sets.
(4) Computing the First and Follow sets is required in order to detect errors as early as possible. For example, for the input )3-2), the parser will descend from exp to term to factor before an error is reported.

4.2 LL(1) PARSING

4.2.1 The Basic Method of LL(1) Parsing

Main idea: LL(1) parsing uses an explicit stack rather than recursive calls to perform a parse.

An example: a simple grammar for the strings of balanced parentheses:

    S → ( S ) S | ε

The following table shows the actions of a top-down parser given this grammar and the string ( ):

    Step   Parsing stack   Input    Action
    1      $ S             ( ) $    S → ( S ) S
    2      $ S ) S (       ( ) $    match
    3      $ S ) S         ) $      S → ε
    4      $ S )           ) $      match
    5      $ S             $        S → ε
    6      $               $        accept

A top-down parser begins by pushing the start symbol onto the stack. It accepts an input string if, after a series of actions, the stack and the input become empty. A general schematic for a successful top-down parse:

    Parsing stack      Input            Action
    $ StartSymbol      InputString $    one of the two actions
    ...                ...              one of the two actions
    $                  $                accept

The two actions:
(1) Generate: replace a non-terminal A at the top of the stack by a string α (in reverse) using a grammar rule A → α.
(2) Match: match a token on top of the stack with the next input token.

The list of generating actions in the above table:

    S => ( S ) S     [S → ( S ) S]
      => ( ) S       [S → ε]
      => ( )         [S → ε]

This corresponds precisely to the steps in a leftmost derivation of the string ( ), which is the characteristic of top-down parsing.

Constructing a parse tree: add node construction actions as each non-terminal or terminal is pushed onto the stack.
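The generate and match actions for the balanced-parentheses grammar can be sketched with an explicit stack in C. This is a minimal sketch under assumptions of its own: the name parse_balanced and the bound STACK_MAX are hypothetical, the choice of rule for S is hard-coded instead of being read from a table, and the end of the C string plays the role of the input marker $.

```c
#include <assert.h>
#include <stdbool.h>

#define STACK_MAX 256   /* assumed bound, enough for short inputs */

/* Returns true if s is derivable from S -> ( S ) S | epsilon. */
bool parse_balanced(const char *s) {
    char stack[STACK_MAX];
    int top = 0;

    stack[top++] = '$';          /* bottom-of-stack marker */
    stack[top++] = 'S';          /* push the start symbol */

    for (;;) {
        assert(top > 0);         /* '$' is never popped, so the stack is never empty */
        char x = stack[top - 1]; /* top of the parsing stack */
        char a = *s;             /* next input token; '\0' stands for $ */

        if (x == '$')
            return a == '\0';    /* accept only if the input is used up too */

        if (x == 'S') {
            top--;               /* pop S */
            if (a == '(') {
                /* generate S -> ( S ) S: push the right-hand side in reverse */
                if (top + 4 > STACK_MAX) return false;
                stack[top++] = 'S';
                stack[top++] = ')';
                stack[top++] = 'S';
                stack[top++] = '(';
            } else if (a == ')' || a == '\0') {
                /* generate S -> epsilon: nothing is pushed */
            } else {
                return false;    /* no rule applies for this lookahead */
            }
        } else if (x == a) {
            top--;               /* match: pop the terminal ... */
            s++;                 /* ... and advance the input */
        } else {
            return false;        /* terminal on the stack does not match */
        }
    }
}
```

Calling parse_balanced("()") goes through the same six steps as the table above and accepts; an input such as "(" is rejected when ) is on top of the stack and the input is exhausted.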

4.2.2 The LL(1) Parsing Table and Algorithm

Purpose of the LL(1) parsing table: to express the possible rule choices for a non-terminal A, when A is at the top of the parsing stack, based on the current input token (the look-ahead).

The LL(1) parsing table for the simple grammar S → ( S ) S | ε:

    M[N,T]   (                )         $
    S        S → ( S ) S      S → ε     S → ε

The general LL(1) parsing table definition:
The table is a two-dimensional array indexed by non-terminals and terminals, containing the production choices to use at the appropriate parsing step. It is called M[N,T], where N is the set of non-terminals of the grammar and T is the set of terminals or tokens (including $). Any entries remaining empty represent potential errors.

The table-constructing rules (supposing that the table is originally empty):
(1) If A → α is a production choice, and there is a derivation α =>* a β, where a is a token, then add A → α to the table entry M[A,a].
(2) If A → α is a production choice, and there are derivations α =>* ε and S $ =>* β A a γ, where S is the start symbol and a is a token (or $), then add A → α to the table entry M[A,a].

The construction of the above table:
(1) For the production S → ( S ) S, α = ( S ) S, which begins with the token a = (. This choice is therefore added to the entry M[S, (] (and only there).
(2) For the production S → ε, α = ε, so α =>* ε holds trivially. S is followed by $ in S $ itself and by ) in the derivation S $ => ( S ) S $, so a = ) or a = $. The choice S → ε is therefore added to both M[S, )] and M[S, $].

Definition of LL(1) grammar:
A grammar is an LL(1) grammar if the associated LL(1) parsing table has at most one production in each table entry. An LL(1) grammar cannot be ambiguous.

A parsing algorithm using the LL(1) parsing table:

    (* assumes $ marks the bottom of the stack and the end of the input *)
    push the start symbol onto the top of the parsing stack;
    while the top of the parsing stack ≠ $ do
      if the top of the parsing stack is terminal a
         and the next input token = a
      then (* match *)
        pop the parsing stack;
        advance the input;
      else if the top of the parsing stack is non-terminal A
         and the next input token is terminal a
         and parsing table entry M[A,a] contains production A → X1 X2 ... Xn
      then (* generate *)
        pop the parsing stack;
        for i := n downto 1 do
          push Xi onto the parsing stack;
      else error;
    if the top of the parsing stack = $
       and the next input token = $
    then accept
    else error.

The loop must continue while a non-terminal is still on the stack even if the next input token is already $, because ε-productions may remain to be applied (as in step 5 of the balanced-parentheses example).

The LL(1) parsing table for a simplified grammar of if-statements:

    statement → if-stmt | other
    if-stmt   → if ( exp ) statement else-part
    else-part → else statement | ε
    exp       → 0 | 1

    M[N,T]     if                     other              else              0        1        $

    statement  statement → if-stmt    statement → other

    if-stmt    if-stmt → if ( exp )
                 statement else-part

    else-part                                            else-part → else                    else-part → ε
                                                           statement
                                                         else-part → ε

    exp                                                                    exp → 0  exp → 1

Notice: the entry M[else-part, else] contains two productions, which reflects the dangling else ambiguity.

Disambiguating rule: always prefer the rule that generates the current look-ahead token over any other, and thus prefer the production else-part → else statement over else-part → ε. With this modification, the above table becomes unambiguous, and the grammar can be parsed as if it were an LL(1) grammar.

The parsing actions for the string

    if (0) if (1) other else other

(for conciseness: statement = S, if-stmt = I, else-part = L, exp = E, if = i, else = e, other = o):

    Step   Parsing stack   Input           Action
    1      $S              i(0)i(1)oeo$    S → I
    2      $I              i(0)i(1)oeo$    I → i(E)SL
    3      $LS)E(i         i(0)i(1)oeo$    match
    4      $LS)E(          (0)i(1)oeo$     match
    5      $LS)E           0)i(1)oeo$      E → 0
    6      $LS)0           0)i(1)oeo$      match
    7      $LS)            )i(1)oeo$       match
    8      $LS             i(1)oeo$        S → I
    9      $LI             i(1)oeo$        I → i(E)SL
    10     $LLS)E(i        i(1)oeo$        match
    11     $LLS)E(         (1)oeo$         match
    12     $LLS)E          1)oeo$          E → 1
    13     $LLS)1          1)oeo$          match
    14     $LLS)           )oeo$           match
    15     $LLS            oeo$            S → o
    16     $LLo            oeo$            match
    17     $LL             eo$             L → eS
    18     $LSe            eo$             match
    19     $LS             o$              S → o
    20     $Lo             o$              match
    21     $L              $               L → ε
    22     $               $               accept
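The table-driven algorithm and the disambiguated if-statement table can be put together in a short C sketch. It relies on the single-letter abbreviations used in the trace (S, I, L, E for non-terminals; i, e, o, 0, 1 and the parentheses for tokens). The names table_entry, ll1_parse and STACK_MAX are hypothetical, the table is encoded as a function rather than an array, and the end of the C string stands for $.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define STACK_MAX 256   /* assumed bound on the parsing stack */

static bool is_nonterminal(char x) {
    return x == 'S' || x == 'I' || x == 'L' || x == 'E';
}

/* M[A, a]: returns the right-hand side to generate, "" for an
   epsilon-production, or NULL for an empty (error) entry.
   The lookahead '\0' plays the role of $. */
static const char *table_entry(char A, char a) {
    switch (A) {
    case 'S':
        if (a == 'i') return "I";        /* statement -> if-stmt */
        if (a == 'o') return "o";        /* statement -> other */
        break;
    case 'I':
        if (a == 'i') return "i(E)SL";   /* if-stmt -> if ( exp ) statement else-part */
        break;
    case 'L':
        if (a == 'e') return "eS";       /* disambiguating rule: prefer else */
        if (a == '\0') return "";        /* else-part -> epsilon */
        break;
    case 'E':
        if (a == '0') return "0";
        if (a == '1') return "1";
        break;
    }
    return NULL;
}

/* Returns true if the abbreviated token string is accepted. */
bool ll1_parse(const char *input) {
    char stack[STACK_MAX];
    int top = 0;

    stack[top++] = '$';
    stack[top++] = 'S';                  /* push the start symbol */

    for (;;) {
        assert(top > 0);                 /* '$' always stays at the bottom */
        char x = stack[top - 1];
        char a = *input;

        if (x == '$')
            return a == '\0';            /* accept or error */

        if (is_nonterminal(x)) {         /* generate */
            const char *rhs = table_entry(x, a);
            if (rhs == NULL) return false;
            size_t n = strlen(rhs);
            top--;                       /* pop the non-terminal */
            if ((size_t)top + n > STACK_MAX) return false;
            for (size_t i = n; i > 0; i--)
                stack[top++] = rhs[i - 1];   /* push Xn ... X1 */
        } else if (x == a) {             /* match */
            top--;
            input++;
        } else {
            return false;
        }
    }
}
```

With these assumptions, ll1_parse("i(0)i(1)oeo") performs the 22 steps of the trace above and accepts, while an incomplete statement such as "i(0)" is rejected because M[S, $] is empty.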
