pigeon - a PEG parser generator for Go

The pigeon command generates parsers based on a parsing expression grammar (PEG). Its grammar and syntax is inspired by the PEG.js project, while the implementation is loosely based on the parsing expression grammar for C# 3.0 article. It parses Unicode text encoded in UTF-8.

Features for this fork

Performance improvements (5-10x faster than the original version):
- Removed parser.state, because it's very slow.
- Memoized can be worked together with Optimize option.
- More memory efficient - reduces memory allocations
- Generates cleaner and less lines of code
parseSeqExpr only collects values that are explicitly returned, improving performance:
- Example 1: e <- items:(__ Integer __)+ EOF { return items } returns an empty array [] since no values are explicitly marked for collection
- Example 2: e <- items:(__ i:Integer __ { return i })+ EOF { return items } returns [1, 2, 3] because integers are explicitly collected
- This is more efficient than the original version which would return [[nil, 1, nil], [nil, 2, nil], [nil, 3, nil]] with unnecessary nil values
String capture:
- expr <- val:<anotherExpr> { fmt.Println(val.(string)) } // you got the string value in val
- expr <- val:<(A '=' B)> { fmt.Println(val.(string)) } // you got the string value in val, e.g. "A=B"
Use code to control matching behavior (andCodeExpr and notCodeExpr):
- expr <- &{ return c.data.AllowNumber } [0-9]+ // Only matches digits if c.data.AllowNumber is true
- expr <- val:<[0-9]+> &{ return val.(string) == "123" } { return val.(string) } // Only succeeds if the matched string equals "123"
- Tips: don't enable memoized if you control matching behavior by code.
Logical and / or match:
- expr <- &&testExpr testExpr // if testExpr return ok but matched nothing (e.g. testExpr <- 'A'*), &&testEpr returns false.
Multiple peg files supported:
1. pigeon -o script1.peg.go script1.peg to generate a normal parser.
2. Run pigeon -grammar-only -grammar-name=g2 -run-func-prefix="_s2_" -o script2.peg.go script2.peg to generate grammar only code in same package.
3. Use it by newParser("filename", "expr").parse(g2)
Simplified actionExpr:
- The original version required two parameters to return (val, error), but errors are rarely used. So this fork simplifies the return values.
- Examples:
  - expr <- [0-9]+ { fmt.Println(expr) } is ok in this fork, returns nothing.
  - expr <- "true" { return 1 } if you want return something.
- Add an error by manual:
  - expr <- "if" { p.addErr(errors.New("keyword is not allowed")) }, equals to expr <- "if" { return nil, errors.New("keyword is not allowed") } of original pigeon.
Provide a struct(ParserCustomData) to embed, to replace the globalStore
- Must define a struct ParserCustomData in your module.
- Access data by c.data, for example: expr <- { fmt.Println(c.data.MyOption) }
- globalState is removed.
Remove ParseFile ParseReader, rename Parse and all options to lowercase issue, branch feat/rename-exported-api
- ParseReader converts io.Reader to bytes, then invoke parse, it don't make sense.
- Function Parse and all options(MaxExpressions,Entrypoint,Statistics,Debug,Memoize,AllowInvalidUTF8,Recover,GlobalStore,InitState) expose to module user. I think expose them is not a good idea.
Skip "actionExpr" while looking ahead issue, branch feat/skip-code-expr-while-looking-ahead
- See detail in the issue.
- *{} / &{} / !{} won't skip.
ActionExpr refactored issue, branch refactor/actionExpr
- Unlimited ActionExpr(CodeExpr): grammar like expr <- firstPart:[0-9]+ { fmt.Println(firstPart) } secondPart:[a-z]+ { fmt.Println(firstPart, secondPart) } is allowed for this fork.
- You can access parser in ActionExpr: expr <- { fmt.Println(p) }
- stateCodeExpr(#{}) was removed.
position of generated code is removed
- It produced a lot of different for version control.
- You can keep it by set SetRulePos to true and rebuild.
Added -optimize-ref-expr-by-index option
- An option to tweak RefExpr the most usually used expr in parser.
- About ~10% faster with this option.
Removed -support-left-recursion option
- It's not used much, so I removed it to make maintenance easier
Removed -optimize-grammar option
- There are bugs present and the effects are not significant.
Removed -optimize-basic-latin option
- Because there is no evidence to suggest that this is an optimization
charClassMatcher / anyMatcher / litMatcher not return byte anymore, because of performance.
- Use string capture or c.text instead.

Installation

go install github.com/fy0/pigeon@latest

This will install or update the package, and the pigeon command will be installed in your $GOBIN directory. Neither this package nor the parsers generated by this command require any third-party dependency, unless such a dependency is used in the code blocks of the grammar.

Basic usage

pigeon [options] [PEG_GRAMMAR_FILE]

By default, the input grammar is read from stdin and the generated code is printed to stdout. You may save it in a file using the -o flag.

Github user @mna created the original package in April 2015, and @breml is the original package's maintainer as of May 2017.

Example

Given the following grammar:

{
//nolint:unreachable
package main

type ParserCustomData struct {
}

var ops = map[string]func(int, int) int {
    "+": func(l, r int) int {
        return l + r
    },
    "-": func(l, r int) int {
        return l - r
    },
    "*": func(l, r int) int {
        return l * r
    },
    "/": func(l, r int) int {
        return l / r
    },
}

func toAnySlice(v any) []any {
    if v == nil {
        return nil
    }
    return v.([]any)
}

func eval(first, rest any) int {
    l := first.(int)
    restSl := toAnySlice(rest)
    for _, v := range restSl {
        restExpr := toAnySlice(v)
        r := restExpr[1].(int)
        op := restExpr[0].(string)
        l = ops[op](l, r)
    }
    return l
}
}

Input <- expr:Expr EOF {
    return expr
}

Expr <- _ first:Term rest:( _ op:AddOp _ r:Term { return []any{op, r} })* _ {
    return eval(first, rest)
}

Term <- first:Factor rest:( _ op:MulOp _ r:Factor { return []any{op, r} })* {
    return eval(first, rest)
}

Factor <- '(' expr:Expr ')' {
    return expr
} / integer:Integer {
    return integer
}

AddOp <- ( '+' / '-' ) {
    return string(c.text)
}

MulOp <- ( '*' / '/' ) {
    return string(c.text)
}

Integer <- '-'? [0-9]+ {
    v, err := strconv.Atoi(string(c.text))
    if err != nil {
        p.addErr(err)
    }
    return v
}

_ "whitespace" <- [ \n\t\r]*

EOF <- !.

The generated parser can parse simple arithmetic operations, e.g.:

18 + 3 - 27 * (-18 / -3)

=> -141

More examples can be found in the examples/ subdirectory.

See the package documentation for detailed usage.

Contributing

See the CONTRIBUTING.md file.

License

The BSD 3-Clause license. See the LICENSE file.

TODO

~~performance: Create another version of parseOneOrMoreExpr/parseZeroOrMoreExpr which not collect results. Choose expr decide by is labeled, A bit faster.~~
~~performance: Remove pushV and popV, a bit faster.~~
~~performance: In parseCharClassMatcher, variable start can be removed in most case. Lot of of small memory pieces allocated.~~
~~performance: Remove Wrap function if they are not needed.~~
performance: Too many any, can we remove parseExpr?
string capture inside predicate expr not work: &( alist:<("a")*> &{ fmt.Println(alist) } )
auto remove return nil if unreachable

Name		Name	Last commit message	Last commit date
Latest commit History 324 Commits
.github		.github
ast		ast
bootstrap		bootstrap
builder		builder
examples		examples
grammar		grammar
misc		misc
test		test
testutils		testutils
.editorconfig		.editorconfig
.gitattributes		.gitattributes
.gitignore		.gitignore
.goreleaser.yml		.goreleaser.yml
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
THIRD-PARTY-NOTICES		THIRD-PARTY-NOTICES
benchparse_test.go		benchparse_test.go
cmpast_test.go		cmpast_test.go
doc.go		doc.go
go.mod		go.mod
go.sum		go.sum
main.go		main.go
main_test.go		main_test.go
parse_test.go		parse_test.go
pigeon.go		pigeon.go
reserved_words.go		reserved_words.go
targeted_test.go		targeted_test.go
unicode_classes.go		unicode_classes.go

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

pigeon - a PEG parser generator for Go

Features for this fork

Installation

Basic usage

Example

Contributing

License

TODO

About

Uh oh!

Releases 1

Packages

Uh oh!

Languages

License

fy0/pigeon

Folders and files

Latest commit

History

Repository files navigation

pigeon - a PEG parser generator for Go

Features for this fork

Installation

Basic usage

Example

Contributing

License

TODO

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Languages

Packages