Parsing in Perl
Alberto Simões
YAPC::EU::2006
Alberto Simões Parsing in Perl
What we will talk about
Parsing...
what it is...
the tools to make it!
But not how to do it!
show some examples...
and compare their efficiency.
The Definitions
Parsing
In computer science, parsing is the process of analyzing an input
sequence (read from a file or a keyboard, for example) in order to
determine its grammatical structure with respect to a given formal
grammar. It is formally named syntax analysis. A parser is a
computer program that carries out this task. The name is
analogous with the usage in grammar and linguistics.
Parsing transforms input text into a data structure, usually a tree,
which is suitable for later processing and which captures the
implied hierarchy of the input. Generally, parsers operate in two
stages, first identifying the meaningful tokens in the input, and
then building a parse tree from those tokens.
Wikipedia (August 2006)
The Process
Lexical analysis is the processing of an input sequence of
characters (such as the source code of a computer program)
to produce, as output, a sequence of symbols called “lexical
tokens”, or just “tokens”. For example, lexers for many
programming languages convert the character sequence 123
abc into two tokens: 123 and abc (whitespace is not a token
in most languages). The purpose of producing these tokens is
usually to forward them as input to another program, such as
a parser.
Syntax analysis is a process in compilers that recognizes the
structure of programming languages. It is also known as
parsing.
Wikipedia (August 2006)
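The lexical-analysis step quoted above can be sketched in a few lines of Perl. This is only an illustration: the sub name `tokenize` and the token type names (NUMBER, WORD, SYMBOL) are assumptions made for the example, not part of the talk.

```perl
use strict;
use warnings;

# Split an input string into [type, text] tokens; whitespace is skipped.
sub tokenize {
    my ($input) = @_;
    my @tokens;
    for ($input) {
        while (1) {
            /\G\s+/gc;                                  # whitespace is not a token
            last if (pos($_) || 0) >= length($_);       # end of input
            if    (/\G(\d+)/gc)    { push @tokens, [ NUMBER => $1 ] }
            elsif (/\G([a-z]+)/gc) { push @tokens, [ WORD   => $1 ] }
            elsif (/\G(.)/gcs)     { push @tokens, [ SYMBOL => $1 ] }
        }
    }
    return @tokens;
}

# The character sequence "123 abc" yields two tokens.
print join(' ', map { "$_->[0]($_->[1])" } tokenize('123 abc')), "\n";
```

The `\G` anchor with the `/gc` flags lets each match continue where the previous one stopped, so the input string is never modified.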
Approaches
Top-down parsing - A parser can start with the start symbol
and try to transform it to the input. Intuitively, the parser
starts from the largest elements and breaks them down into
incrementally smaller parts. LL parsers are examples of
top-down parsers.
Bottom-up parsing - A parser can start with the input and
attempt to rewrite it to the start symbol. Intuitively, the
parser attempts to locate the most basic elements, then the
elements containing these, and so on. LR parsers are examples
of bottom-up parsers. Another term used for this type of
parser is Shift-Reduce parsing
Wikipedia (August 2006)
...boring...
Forget Wikipedia!
What is Parsing?
to recognize portions of text:
detect tokens;
integers, reals, strings, variables, reserved words, etc.
analyze a specific token sequence:
detect syntax;
define the order tokens make sense;
interpret the sequence and perform an action:
perform semantic actions;
execute the code defined; generate code;
So, Regular Expressions?
yes!
RegExps are good for tokens;
RegExps are good for regular expressions :-)
no!
most real grammars can’t be parsed with RegExps;
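A small sketch of where the line falls, assuming two hypothetical helpers, `is_number` and `is_balanced`: a single regular expression recognizes a token, but checking that parentheses nest correctly needs a counter (or a stack), which a classical regular expression does not have. Perl's regexp extensions blur this boundary somewhat, so take this as an illustration of the principle only.

```perl
use strict;
use warnings;

# A regexp easily recognizes one token ...
sub is_number { return $_[0] =~ /\A\d+\z/ ? 1 : 0 }

# ... but nesting needs memory: count how deep we are in parentheses.
sub is_balanced {
    my ($text) = @_;
    my $depth = 0;
    for my $char (split //, $text) {
        $depth++ if $char eq '(';
        $depth-- if $char eq ')';
        return 0 if $depth < 0;     # a ')' with no matching '('
    }
    return $depth == 0 ? 1 : 0;
}

printf "%-12s %s\n", $_, is_balanced($_) ? 'balanced' : 'unbalanced'
    for '(1+(2-3))', '(1+2))', '((1+2)';
```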
Then?
Typically:
flex for lexical analysis
(re2c for thread safety and reentrancy);
bison for syntactic analysis
(lemon for thread safety and reentrancy);
but that is for C;
Perl 5 has lexical analysis (RegExps);
Perl 5 doesn’t have Grammar Support;
but we have CPAN!;
Parse::RecDescent;
Parse::Yapp;
Parse::YALALR;
Perl 6 will have Grammar Support (Hurray!)
PGE — Parrot Grammar Engine;
What I’ve tested
flex + bison;
re2c + lemon;
Parse::RecDescent;
Parse::Yapp;
flex + Parse::Yapp;
Parrot Grammar Engine;
flex+bison and re2c+lemon appear only at the end, as an efficiency
baseline.
My Test Case (1/2)
a simple calculator;
sums, subtractions, variables, prints;
BNF:
Program    ← Statement Program
           | Statement
Statement  ← Variable '=' Expression ';'
           | 'print' Expression ';'
Expression ← Expression '−' Expression
           | Expression '+' Expression
           | Variable
           | Number
Number     ← /\d+/
Variable   ← /[a-z]+/
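To make the grammar concrete, here is a minimal hand-written interpreter for it in plain Perl. It is a sketch under assumptions, not one of the implementations measured in this talk: the sub name `run_program` is made up, statements are simply split on `;` (so a missing final `;` is tolerated), and undefined variables count as 0.

```perl
use strict;
use warnings;

# Run a calculator program and return the list of printed values.
sub run_program {
    my ($source) = @_;
    my (%var, @output);
    my $operand = qr/\d+|[a-z]+/;

    # Value of a Number, or of a Variable (0 when undefined).
    my $value = sub {
        my ($token) = @_;
        return $token =~ /\A\d+\z/ ? $token : ($var{$token} || 0);
    };

    # Expression: first operand, then (op operand)* folded left to right.
    my $expression = sub {
        my ($text) = @_;
        $text =~ /\G\s*($operand)\s*/gc or die "operand expected in '$text'\n";
        my $first  = $1;
        my $result = $value->($first);
        while ($text =~ /\G([-+])\s*($operand)\s*/gc) {
            my ($op, $rhs) = ($1, $2);
            $result = $op eq '+' ? $result + $value->($rhs)
                                 : $result - $value->($rhs);
        }
        (pos($text) || 0) == length($text) or die "unexpected input in '$text'\n";
        return $result;
    };

    for my $statement (grep { /\S/ } split /;/, $source) {
        if ($statement =~ /\A\s*print\b\s*(.*)\z/s) {
            my $body = $1;
            push @output, $expression->($body);
        }
        elsif ($statement =~ /\A\s*([a-z]+)\s*=\s*(.*)\z/s) {
            my ($name, $body) = ($1, $2);
            $var{$name} = $expression->($body);
        }
        else {
            die "syntax error in '$statement'\n";
        }
    }
    return @output;
}

print "> $_\n" for run_program("a = 10;\na = 150 - a + 350;\nprint a;\n");
```

Captured values are copied into lexical variables before any further match runs, because `$1` and `$2` are overwritten by the next successful match.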
My Test Case (2/2)
automatic test generation;
randomly add, subtract and define variables;
randomly print variables;
example:
a = 10;
a = 150 - a + 350;
print a;
different test sizes:
10 lines;
100 lines;
1 000 lines;
10 000 lines;
100 000 lines;
1 000 000 lines;
2 000 000 lines;
4 000 000 lines;
6 000 000 lines;
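A generator along these lines could look like the sketch below. The actual generator used for the talk is not shown, so everything here is an assumption: the sub name `generate_test`, the five variable names, the number range and the share of print statements.

```perl
use strict;
use warnings;

# Generate $lines random calculator statements.
sub generate_test {
    my ($lines) = @_;
    my @names = ('a' .. 'e');                        # arbitrary variable names
    my $pick  = sub { $names[ rand @names ] };
    my $term  = sub { rand() < 0.5 ? $pick->() : int rand 1000 };
    my @program;
    for (1 .. $lines) {
        if (rand() < 0.2) {                          # roughly one print in five lines
            push @program, 'print ' . $pick->() . ';';
        }
        else {
            my $expr = $term->();                    # one to three operands
            $expr .= ' ' . (rand() < 0.5 ? '+' : '-') . ' ' . $term->()
                for 1 .. int rand 3;
            push @program, $pick->() . " = $expr;";
        }
    }
    return @program;
}

print "$_\n" for generate_test(10);
```

Calling `generate_test(1_000_000)` and writing the result to a file would give an input of the larger sizes listed above.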
Now, the results
Parse::RecDescent ID
Author: Damian Conway
Latest Release: 1.94 (April 9, 2003)
Available from: CPAN
Parse::RecDescent rationale
⇑ full Perl implementation;
⇑ mixed lexical and syntactic analyzer in same code;
⇓ slow;
⇓ only supports LL(1) grammars;
Parse::RecDescent
use strict;
use warnings;
use Parse::RecDescent;
our %VAR;   # calculator variables, shared with the grammar actions
my $grammar = q{
Program: Statement(s) /\Z/ { 1 }
Statement: Var '=' Expression ';' { $main::VAR{$item[1]} = $item[3]; }
         | /print/ Expression ';' { print "> $item[2]\n"; }
Expression: Number '+' Expression { $item[1]+$item[3] }
          | Number '-' Expression { $item[1]-$item[3] }
          | Var '+' Expression { ($main::VAR{$item[1]} || 0) + $item[3] }
          | Var '-' Expression { ($main::VAR{$item[1]} || 0) - $item[3] }
          | Var { $main::VAR{$item[1]} || 0; }
          | Number { $item[1]; }
Number: /\d+/
Var: /[a-z]+/
};
my $parser = Parse::RecDescent->new($grammar);
undef $/;   # slurp mode: read all of STDIN at once
my $text = <STDIN>;
$parser->Program($text) or die "** Parse Error **\n";
Problems
Unfortunately, the program does not respect the left associativity of
the operators. I couldn’t manage to solve that (I didn’t try hard).
3 − 2 + 1 is evaluated as Number(3) − Expression(2 + 1), which
gives 0 instead of the correct answer, 2.
I did have a cheat version, but it made the test program a lot
slower than it is now.
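The effect is easy to reproduce outside any parser generator. The sketch below evaluates the same token list twice, once the way a right-recursive rule does and once as a left-to-right fold. The sub names `eval_right` and `eval_left` are made up for the example; this is not the cheat version mentioned above. Parse::RecDescent's documentation also describes a `<leftop: ...>` directive for this kind of rule, which is not used here.

```perl
use strict;
use warnings;

# Right-recursive rule "Number op Expression": the right-hand side is
# evaluated first, so 3 - 2 + 1 becomes 3 - (2 + 1).
sub eval_right {
    my ($first, @rest) = @_;
    return $first unless @rest;
    my $op  = shift @rest;
    my $rhs = eval_right(@rest);
    return $op eq '+' ? $first + $rhs : $first - $rhs;
}

# Iterative rule "Number (op Number)*": operands are folded left to right,
# so 3 - 2 + 1 becomes (3 - 2) + 1.
sub eval_left {
    my ($result, @rest) = @_;
    while (my ($op, $operand) = splice @rest, 0, 2) {
        $result = $op eq '+' ? $result + $operand : $result - $operand;
    }
    return $result;
}

my @tokens = (3, '-', 2, '+', 1);
print 'right-recursive: ', eval_right(@tokens), "\n";   # prints 0
print 'left fold:       ', eval_left(@tokens),  "\n";   # prints 2
```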
Parse::RecDescent timings
test size (lines)   time spent
10                  0.104 s
100                 0.203 s
1 000               1.520 s
10 000              87.310 s
Parse::RecDescent Memory Usage
[Plot: memory usage (bytes) over time (ms) for `perl recdes.pl`; total 1,778,617,585,999 bytes x ms; y-axis 0M to 6M, x-axis 0 to 240 000 ms; series: x809F49D:Perl_safesysmal, x809F54B:Perl_safesysrea, heap-admin]
test file with 10 000 lines
Parse::Yapp ID
Author: Francois Desarmenien
Latest Release: 1.05 (Nov 4, 2001)
Available from: CPAN
Parse::Yapp rationale
⇑ full Perl implementation;
⇑ supports bison-like LR grammars;
⇓ you need to specify your own lexical analyzer;
⇓ slow for big input files...
if you do not prepare a good lexical analyzer;
Parse::Yapp
%left '+' '-'
%%
Program : Statement
        | Program Statement
        ;
Statement : Var '=' Expression ';' { $main::VAR{$_[1]} = $_[3] }
          | Print Expression ';' { print "> $_[2]\n" }
          ;
Expression : Expression '-' Expression { $_[1] - $_[3] }
           | Expression '+' Expression { $_[1] + $_[3] }
           | Var { $main::VAR{$_[1]} || 0 }
           | Number { $_[1] }
           ;
%%
our %VAR;
my $p = Calc->new();
undef $/;   # slurp mode: read all of STDIN at once
my $File = <STDIN>;
$p->YYParse( yylex => \&yylex,
             yyerror => \&yyerror);
Parse::Yapp
sub yyerror {
    if ($_[0]->YYCurtok) {
        printf STDERR ('Error: a "%s" (%s) was found where %s was expected'."\n",
            $_[0]->YYCurtok, $_[0]->YYCurval, $_[0]->YYExpect)
    } else {
        print STDERR "Expecting one of ",join(", ",$_[0]->YYExpect),"\n";
    }
}
sub yylex {
    for ($File) {
        s!^\s+!!;                       # skip leading whitespace
        return ("","") if $_ eq "";     # EOF
        # Tokens
        s!^(\d+)!! and return ("Number", $1);
        s!^print!! and return ("Print", "print");
        s!^([a-z]+)!! and return ("Var", $1);
        # Operators ('-' goes first in the class so it is not read as a range)
        s!^([-;+=])!! and return ($1,$1);
        print STDERR "Unexpected symbols: '$File'\n";
    }
}
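An alternative way to write such a lexer is to leave the input string untouched and scan it with `\G` and `/gc`. The sketch below is an assumption-laden illustration, not the code that was benchmarked: `make_lexer` is a hypothetical helper, and whether this style is faster on the test files above was not measured. It returns (token, value) pairs and an empty token at end of input, which is the convention Parse::Yapp uses for its yylex callback.

```perl
use strict;
use warnings;

# Build a lexer closure over $input. Each call returns the next
# (token, value) pair, or ('', undef) at end of input.
sub make_lexer {
    my ($input) = @_;
    return sub {
        for ($input) {
            /\G\s+/gc;                                           # skip whitespace
            return ('', undef)    if (pos($_) || 0) >= length($_);
            return (Number => $1) if /\G(\d+)/gc;
            return (Print  => $1) if /\G(print)\b/gc;            # keyword before Var
            return (Var    => $1) if /\G([a-z]+)/gc;
            return ($1, $1)       if /\G([-;+=])/gc;             # operators
            die 'Unexpected symbol at offset ' . (pos($_) || 0) . "\n";
        }
    };
}

my $lex = make_lexer("a = 150 - a + 350;\nprint a;");
while (1) {
    my ($token, $value) = $lex->();
    last if $token eq '';
    print "$token\t$value\n";
}
```

With Parse::Yapp the closure would be passed as `yylex => make_lexer($File)`; the parser object it receives as an argument is simply ignored here.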
Parse::Yapp timings
test size    Parse::RecDescent    Parse::Yapp
10           0.104 s              0.016 s
100          0.203 s              0.034 s
1 000        1.520 s              0.272 s
10 000       87.310 s             4.972 s
100 000      -                    2 253.657 s
Parse::Yapp Memory Usage
[Plot: memory usage (bytes) over time (ms) for `perl Calc.pl`; total 74,532,562,124 bytes x ms; y-axis 0k to 1,200k, x-axis 0 to 60 000 ms; series: x809F49D:Perl_safesysmal, heap-admin, x809F54B:Perl_safesysrea]
test file with 10 000 lines
Parse::Yapp + flex ID
Idea by: Alberto Simões
Latest Release: n/a
Available from: The Perl Review v0i3, 2002
Parse::Yapp + flex rationale
⇑ fast and robust for big input files;
⇑ supports bison-like LR grammars;
⇓ gluing Perl and C takes some work;
⇓ you need a C compiler;
⇓ you need to know a little C and flex;
Parse::Yapp + flex: the lexical analyzer
%{
#include <string.h>
/* yylex returns the token name as a string; this assumes flex is run
   with the perl_yy prefix, as the perl_yy* names below suggest */
#define YY_DECL char* yylex(void)
char buffer[15];
%}
%%
"print" { return strcpy(buffer, "Print"); }
[0-9]+ { return strcpy(buffer, "Number"); }
[a-z]+ { return strcpy(buffer, "Var"); }
\n { }
" " { }
. { return strcpy(buffer, yytext); }
%%
int perl_yywrap(void) { return 1; }
char *perl_yylextext(void) { return perl_yytext; }
Parse::Yapp + flex: the syntactic analyzer
%left '+' '-'
%%
Program : Statement
        | Program Statement
        ;
Statement : Var '=' Expression ';' { $main::VAR{$_[1]} = $_[3] }
          | Print Expression ';' { print "> $_[2]\n"; }
          ;
Expression : Expression '-' Expression { $_[1] - $_[3] }
           | Expression '+' Expression { $_[1] + $_[3] }
           | Var { $main::VAR{$_[1]} || 0 }
           | Number { $_[1] }
           ;
%%
our %VAR;
Parse::Yapp + flex: just that?
NO!
you need XS glue code;
you need some Perl glue code;
you need a decent makefile;
Can you give details?
Check my article “Cooking Perl with Flex” in TPR v0i3, 2002;
http://alfarrabio.di.uminho.pt/~albie/publications/perlflex.pdf
Parse::Yapp + flex timings
test size    RecDescent    Yapp           Yapp + flex
10           0.104 s       0.016 s        0.034 s
100          0.203 s       0.034 s        0.049 s
1 000        1.520 s       0.272 s        0.174 s
10 000       87.310 s      4.972 s        1.168 s
100 000      -             2 253.657 s    12.145 s
1 000 000    -             -              122.377 s
2 000 000    -             -              264.219 s
4 000 000    -             -              530.527 s
6 000 000    -             -              800.705 s
Parse::Yapp + flex Memory Usage
[Plot: memory usage (bytes) over time (ms) for `perl parse.pl`; total 20,106,601,308 bytes x ms; y-axis 0k to 600k, x-axis 0 to 22 000 ms; series: x809F49D:Perl_safesysmal, heap-admin, x4032CAF:perl_yyalloc, x809F54B:Perl_safesysrea]
test file with 10 000 lines
Parrot Grammar Engine ID
Author: mostly, Patrick Michaud
Latest Release: not yet released
Available from: Parrot releases or Parrot SVN tree
PGE rationale
⇑ built into Perl 6;
⇑ includes constructs to ease the LL(1) constraint;
⇕ not yet fast... but we are working on it;
⇓ mainly a top-down parser (although bottom-up should also be supported);
⇓ at the moment you need to write semantic actions in PIR;
PGE implementation
grammar Benchmark;
token program { <?statement>+ }
rule statement {
| print <expression> ; {{ $I0 = match['expression'];
print $I0; print "\n" }}
| <var> = <expression> ; {{ $P0 = match['expression'];
$S0 = match['var']; set_global $S0, $P0 }}
}
rule expression { <value> [ <add> | <sub> ]* {{ $I0 = match['value']
# 25 lines removed...
.return($I0) }}
}
rule add { \+ <value> }
rule sub { \- <value> }
rule value { <number> {{ $I0 = match['number']; .return ($I0) }}
| <var> {{ $S0 = match['var'];
$P0 = get_global $S0; $I0 = $P0; .return($I0) }}
}
token number { \d+ }
token var { <[a..z]>+ }
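To make the language accepted by this grammar concrete, here is a hand-written top-down evaluator in plain Perl that mirrors the same rule structure (program, statement, expression, value). It is only a sketch and is not how PGE works internally; the name run_program, the error messages, and the choice that unset variables evaluate to 0 are assumptions made for this example.

```perl
use strict;
use warnings;

# Evaluate a program in the benchmark language and return the list of
# values it would print. Hand-written sketch, not generated by PGE.
sub run_program {
    my ($src) = @_;
    my (%vars, @out);
    pos($src) = 0;

    # value: <number> | <var>
    my $value = sub {
        return $1 if $src =~ /\G\s*(\d+)/gc;
        # Assumption: a variable that was never assigned evaluates to 0.
        return $vars{$1} // 0 if $src =~ /\G\s*([a-z]+)/gc;
        die "value expected at offset " . pos($src) . "\n";
    };

    # expression: <value> [ <add> | <sub> ]*
    my $expression = sub {
        my $acc = $value->();
        while ($src =~ /\G\s*([+-])/gc) {
            my $op  = $1;                  # save before the next match
            my $rhs = $value->();
            $acc = $op eq '+' ? $acc + $rhs : $acc - $rhs;
        }
        return $acc;
    };

    # program: a sequence of statements (this sketch also accepts none)
    until ($src =~ /\G\s*\z/gc) {
        if ($src =~ /\G\s*print\b/gc) {                # print <expression> ;
            push @out, $expression->();
        }
        elsif ($src =~ /\G\s*([a-z]+)\s*=/gc) {        # <var> = <expression> ;
            my $name = $1;
            $vars{$name} = $expression->();
        }
        else {
            die "statement expected at offset " . pos($src) . "\n";
        }
        $src =~ /\G\s*;/gc
            or die "';' expected at offset " . pos($src) . "\n";
    }
    return @out;
}

print join("\n", run_program('a = 10; b = a + 5 - 3; print b; print 1 + 2;')), "\n";
```

Each closure corresponds to one rule of the grammar, and the \G anchor with the /gc flags keeps the match position in the input string, which plays the role of the parser's cursor.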
PGE timings
test size    RecDescent           YAPP    YAPP + flex          PGE
       10       0.104 s        0.016 s        0.034 s      0.124 s
      100       0.203 s        0.034 s        0.049 s      0.253 s
    1 000       1.520 s        0.272 s        0.174 s      1.463 s
   10 000      87.310 s        4.972 s        1.168 s     16.189 s
  100 000             —    2 253.657 s       12.145 s    665.746 s
1 000 000             —              —      122.377 s            —
2 000 000             —              —      264.219 s            —
4 000 000             —              —      530.527 s            —
6 000 000             —              —      800.705 s            —
PGE Memory Usage
[Figure: heap usage over time for "../../../../parrot -j main.pir", test file with 10 000 lines. Total: 92,090,753,626 bytes x ms. Axes: bytes (0M to 8M) against time (0 to 12 000 ms). Allocation sites shown: x417A7DF:mem_sys_allocat, x417A73D:mem_sys_allocat, x417A82F:mem__internal_a, heap-admin, x417A880:mem__sys_reallo.]
Remember I had C implementations?
Let’s look into their memory usage.
Timings for C implementations
test size  Parse::RecDescent  Parse::YAPP  YAPP + flex         PGE  re2c + lemon  flex + bison
       10            0.104 s      0.016 s      0.034 s     0.124 s       0.001 s       0.001 s
      100            0.203 s      0.034 s      0.049 s     0.253 s       0.001 s       0.001 s
    1 000            1.520 s      0.272 s      0.174 s     1.463 s       0.002 s       0.002 s
   10 000           87.310 s      4.972 s      1.168 s    16.189 s       0.009 s       0.009 s
  100 000                  —  2 253.657 s     12.145 s   665.746 s       0.089 s       0.103 s
1 000 000                  —            —    122.377 s           —       0.850 s       0.862 s
2 000 000                  —            —    264.219 s           —       1.896 s       1.891 s
4 000 000                  —            —    530.527 s           —       4.327 s       3.604 s
6 000 000                  —            —    800.705 s           —       5.681 s       5.665 s
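One way to read these numbers is as throughput. The short script below takes the 10 000-line row, the largest test that every tool completed, and derives lines per second and the slowdown relative to flex + bison. The timings are copied from the table above; the helper names lines_per_second and slowdown are made up for this example.

```perl
use strict;
use warnings;

# Timings in seconds for the 10 000-line test, copied from the table.
my $lines   = 10_000;
my %seconds = (
    'Parse::RecDescent'  => 87.310,
    'Parse::Yapp'        => 4.972,
    'Parse::Yapp + flex' => 1.168,
    'PGE'                => 16.189,
    're2c + lemon'       => 0.009,
    'flex + bison'       => 0.009,
);

# Throughput in input lines per second.
sub lines_per_second {
    my ($n, $secs) = @_;
    return $n / $secs;
}

# How many times slower a run is than a baseline run.
sub slowdown {
    my ($secs, $baseline) = @_;
    return $secs / $baseline;
}

my $baseline = $seconds{'flex + bison'};
for my $tool (sort { $seconds{$a} <=> $seconds{$b} } keys %seconds) {
    printf "%-20s %12.0f lines/s %10.1fx\n",
        $tool,
        lines_per_second($lines, $seconds{$tool}),
        slowdown($seconds{$tool}, $baseline);
}
```

At this size the C timings are close to the resolution of the measurement (0.009 s), so the ratios are only indicative; the larger rows give a steadier picture for the tools that completed them.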
flex+bison Memory Usage
[Figure: heap usage over time for "parser", test file with 10 000 lines. Total: 16,427,193 bytes x ms. Axes: bytes (0k to 60k) against time (0 to 350 ms). Allocation sites shown: x80492D9:yyalloc, x40625FE:g_malloc0, x401914F:posix_memalign.]
re2c+lemon Memory Usage
[Figure: heap usage over time for "parser", test file with 10 000 lines. Total: 1,418,530 bytes x ms. Axes: bytes (0k to 6k) against time (0 to 300 ms). Allocation sites shown: x40625FE:g_malloc0, x401914F:posix_memalign, x8048BD2:ParseAlloc, heap-admin.]
Comparing them all
Performance Comparison
[Figure: log-log plot of time (seconds, 0.001 to 10000) against test size (lines, 10 to 1e+07), one curve per tool: re2c+lemon, bison+flex, Parse::Yapp + flex, PGE, Parse::Yapp, Parse::RecDescent.]
Thanks!!
Luciano Rocha for the flex + bison and re2c + lemon
implementations;
Rúben Fonseca for the PGE idea;
Patrick Michaud and Kevin Tew for the PGE implementation;
and, of course, Larry Wall, Gloria Wall, Leopold Toetsch, Chip
Salzenberg, Allison Randal, Damian Conway, Anna
Kournikova, Francois Desarmenien, Jerry Gay, Will Coleda,
Simon Cozens, Vern Paxson, Jef Poskanzer, Kevin Gong, brian
d foy, Santa Claus, Audrey Tang, José João Almeida,
Batman, Jonathan Scott Duff, Nuno Carvalho, Marty Pauley,
León Brocard, Josette Garcia, James Tisdall, José Castro,
Michael Schwern, Pamela Anderson, Andy Lester, Abigail,
Nicholas Clark, Magda Joana Silva, Matt Diephouse, Ilya
Martynov, Wikipedia, Randal Schwartz, Dan Sugalski, Jon
Orwant, Tom Christiansen, Johan Vromans, ........................