Ouroboros is a modern EBNF parser generator for Python with a strongly typed design and three usage modes: a small parsing framework for handwritten rules, a runtime builder that assembles parsers from composable `Config` objects, and a code generator that emits a standalone Python parser module.
- Multiple modes: the same grammar configuration can be executed at runtime (builder) or compiled into a Python source file (codegen).
- Strongly-typed AST shape: tags, named fields, and sequence structure are made explicit (especially in generated code), helping IDEs and type checkers.
- Regex-based terminals: leaf matching is powered by `re.Pattern`, which is well suited to expressing token-like primitives.
Framework mode: use this when you want fully handwritten rules and node structures but still want a consistent buffer abstraction, backtracking interface, and error reporting.
- Key types: `TextSource`, `ParsingContext`, `ABCRuleType`.
- Contract: `rule.parse_impl(ctx, idx) -> tuple[Node, int]`; raising `PatternMismatchError` signals failure and lets callers backtrack.
- External API: `ctx.parse_rule(rule, idx) -> tuple[Node, int]`.
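To make the contract concrete, here is a minimal, self-contained sketch of the same shape. It does not import Ouroboros: `MiniContext`, `Terminal`, `Choice`, `Leaf`, and the local `PatternMismatchError` are hypothetical stand-ins, and the real classes may differ. What it illustrates is the control flow: a rule returns `(node, next_index)` on success and raises on failure, so a caller can catch the exception and retry from the same index.

```python
import re
from dataclasses import dataclass


class PatternMismatchError(Exception):
    """Hypothetical stand-in for the library's failure exception."""


@dataclass
class Leaf:
    string: str


class MiniContext:
    """Hypothetical stand-in for ParsingContext: holds the text and dispatches to rules."""

    def __init__(self, text: str) -> None:
        self.text = text

    def parse_rule(self, rule, idx: int):
        return rule.parse_impl(self, idx)


class Terminal:
    def __init__(self, pattern: str) -> None:
        self.regex = re.compile(pattern)

    def parse_impl(self, ctx: MiniContext, idx: int) -> tuple[Leaf, int]:
        # Pattern.match(text, pos) anchors the match at idx without slicing the text.
        m = self.regex.match(ctx.text, idx)
        if m is None:
            raise PatternMismatchError(f"expected /{self.regex.pattern}/ at {idx}")
        return Leaf(m.group()), m.end()


class Choice:
    def __init__(self, *alternatives) -> None:
        self.alternatives = alternatives

    def parse_impl(self, ctx: MiniContext, idx: int):
        for alt in self.alternatives:
            try:
                return ctx.parse_rule(alt, idx)
            except PatternMismatchError:
                continue  # backtrack: the next alternative restarts at the same idx
        raise PatternMismatchError(f"no alternative matched at {idx}")


ctx = MiniContext("abc")
node, end = ctx.parse_rule(Choice(Terminal("x"), Terminal("ab")), 0)
print(node, end)  # → Leaf(string='ab') 2
```

Because failure is an exception rather than a sentinel value, a sequence-style rule can simply let it propagate, and only choice points need a `try`/`except`.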
Builder mode: use this when you want to assemble a grammar quickly from combinators (union, sequence, repeat, option, pick, named sequence, etc.) and parse inputs immediately at runtime.
The core API is `RuleRegistry`:
- `add_rule(name, handle)` registers a rule; the handle can be a `Config` or a string naming another rule (an indirect rule).
- `fetch_rule(name)` returns a rule instance that may refer to itself recursively; internally, a placeholder plus a cache makes the recursion possible.
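The placeholder-plus-cache idea can be sketched without the library. In this hypothetical `MiniRegistry`, handles are either strings (indirect references) or small `(kind, argument)` tuples standing in for `Config` objects; none of these names come from Ouroboros. When a rule is first fetched, a `Placeholder` is cached before its handle is built, so a self-reference met during building resolves to the placeholder, which is then pointed at the finished rule.

```python
import re


class Mismatch(Exception):
    pass


class Terminal:
    def __init__(self, pattern: str) -> None:
        self.regex = re.compile(pattern)

    def parse_impl(self, text: str, idx: int):
        m = self.regex.match(text, idx)
        if m is None:
            raise Mismatch(idx)
        return m.group(), m.end()


class Seq:
    def __init__(self, items) -> None:
        self.items = items

    def parse_impl(self, text: str, idx: int):
        nodes = []
        for item in self.items:
            node, idx = item.parse_impl(text, idx)
            nodes.append(node)
        return nodes, idx


class Alt:
    def __init__(self, alternatives) -> None:
        self.alternatives = alternatives

    def parse_impl(self, text: str, idx: int):
        for alt in self.alternatives:
            try:
                return alt.parse_impl(text, idx)
            except Mismatch:
                continue  # backtrack and try the next alternative
        raise Mismatch(idx)


class Placeholder:
    """Stands in for a rule that is still being built; delegates once resolved."""

    def __init__(self) -> None:
        self.target = None

    def parse_impl(self, text: str, idx: int):
        return self.target.parse_impl(text, idx)


class MiniRegistry:
    def __init__(self) -> None:
        self.handles = {}
        self.cache = {}

    def add_rule(self, name: str, handle) -> None:
        self.handles[name] = handle

    def fetch_rule(self, name: str):
        if name in self.cache:
            return self.cache[name]  # may still be a Placeholder mid-construction
        placeholder = Placeholder()
        self.cache[name] = placeholder
        rule = self._build(self.handles[name])
        placeholder.target = rule  # self-references captured earlier now reach the rule
        self.cache[name] = rule
        return rule

    def _build(self, handle):
        if isinstance(handle, str):
            return self.fetch_rule(handle)  # indirect rule: resolve by name
        kind, arg = handle
        if kind == "terminal":
            return Terminal(arg)
        if kind == "seq":
            return Seq([self._build(h) for h in arg])
        if kind == "alt":
            return Alt([self._build(h) for h in arg])
        raise ValueError(f"unknown handle kind: {kind!r}")


registry = MiniRegistry()
# Nested = "(" Nested ")" | "x"
parens = ("seq", [("terminal", r"\("), "Nested", ("terminal", r"\)")])
registry.add_rule("Nested", ("alt", [parens, ("terminal", "x")]))
rule = registry.fetch_rule("Nested")
print(rule.parse_impl("((x))", 0))  # → (['(', ['(', 'x', ')'], ')'], 5)
```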
Minimal example (structure only; see `ouroboros.implements` for the full config set):

```python
from ouroboros.core import TextSource, ParsingContext, PatternMismatchError
from ouroboros.builder import RuleRegistry
from ouroboros.implements import TerminalConfig, SequenceConfig

registry = RuleRegistry()
registry.add_rule("Number", TerminalConfig(r"[0-9]+"))
# String entries such as "Number" are indirect references to registered rules.
registry.add_rule("Pair", SequenceConfig(["Number", TerminalConfig(r":"), "Number"]))
rule = registry.fetch_rule("Pair")

source = TextSource(name="demo", text="12:34")
ctx = ParsingContext(source)
try:
    node, idx = ctx.parse_rule(rule, 0)
    # node.contents mirrors the SequenceConfig structure: Number, ":", Number
    a, _, b = node.contents
    assert a.string == "12"
    assert b.string == "34"
except PatternMismatchError as e:
    print(e)
```

Codegen mode: use this when you want:
- A distributable, auditable, pure-Python parser file.
- Deployment without the builder/config machinery.
- A clearer, more explicit type structure for your AST in code.
The core API is `ParserGenerator`:
- `add_rule(name, handle)` adds a rule.
- `gen_parser_file(file)` writes out the Python source.
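As a rough illustration of what emitting a parser module involves, the sketch below generates Python source for terminal-only rules, with one typed node class and one parse function per rule, and then loads it with `exec`. It is a toy under stated assumptions: `gen_parser_source` and the `parse_<Name>` / `<Name>Node` naming scheme are hypothetical and are not the output format of `gen_parser_file`.

```python
def gen_parser_source(rules: dict[str, str]) -> str:
    """Hypothetical emitter: rule name -> regex pattern, terminal rules only."""
    lines = ["import re", "from dataclasses import dataclass", ""]
    for name, pattern in rules.items():
        if not name.isidentifier():
            raise ValueError(f"rule name {name!r} is not a valid identifier")
        lines += [
            "@dataclass",
            f"class {name}Node:",  # one explicit node type per rule
            "    string: str",
            "",
            f"_{name}_RE = re.compile({pattern!r})",  # repr() keeps escapes intact
            "",
            f"def parse_{name}(text: str, idx: int) -> tuple[{name}Node, int]:",
            f"    m = _{name}_RE.match(text, idx)",
            "    if m is None:",
            f"        raise ValueError('expected /' + _{name}_RE.pattern + '/ at ' + str(idx))",
            f"    return {name}Node(m.group()), m.end()",
            "",
        ]
    return "\n".join(lines)


source = gen_parser_source({"Number": r"[0-9]+"})
namespace: dict = {}
exec(source, namespace)  # a real generator would write `source` to a .py file instead
print(namespace["parse_Number"]("123abc", 0))  # → (NumberNode(string='123'), 3)
```

Emitting plain source rather than pickling rule objects is what makes the result auditable: the generated file can be read, diffed, and type-checked like any other module.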
The CLI essentially packages this path: it parses an EBNF grammar file into `Config` objects, then emits a Python module via `ParserGenerator`.
Ouroboros comes with a CLI tool, `ouroboros`, which supports three main commands: `gen`, `run`, and `parse`.
`gen`: compile an EBNF grammar into a standalone Python parser module.
```bash
ouroboros gen --grammar grammar.obnf --parser parser.py
# Short form:
# ouroboros gen -g grammar.obnf -p parser.py
```

What it does:
- Reads the input EBNF grammar file.
- Parses it into a set of rule configurations (`Config`).
- Generates a Python source file containing strongly typed nodes and parsing rules.
`run`: execute a previously generated parser module on an input text file.
```bash
ouroboros run --parser parser.py --entry StartRule --text input.txt
# Short form:
# ouroboros run -p parser.py -e StartRule -t input.txt
```

What it does:
- Loads the generated Python module (`parser.py`).
- Retrieves the rule object for `StartRule`.
- Parses the input text file and prints the resulting AST.
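Loading a module from a file path is standard `importlib` territory. The sketch below shows one way the first two steps could be done; it is not the CLI's actual code, and it assumes (hypothetically) that the generated module exposes each rule as a module-level attribute named after the rule. `load_module_from_path` and `fetch_entry_rule` are illustrative names.

```python
import importlib.util
from types import ModuleType


def load_module_from_path(path: str, module_name: str = "generated_parser") -> ModuleType:
    """Import a Python source file by path using the standard importlib recipe."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load a module from {path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the generated file's top-level code
    return module


def fetch_entry_rule(path: str, entry: str):
    """Hypothetical: assumes each rule is a module-level attribute named after it."""
    module = load_module_from_path(path)
    rule = getattr(module, entry, None)
    if rule is None:
        raise LookupError(f"no rule named {entry!r} in {path!r}")
    return rule
```

Usage would look like `fetch_entry_rule("parser.py", "StartRule")`; loading by path rather than by import name means the generated file does not need to be on `sys.path`.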
`parse`: build and run a parser at runtime without an intermediate file.
```bash
ouroboros parse --grammar grammar.obnf --entry StartRule --text input.txt
# Short form:
# ouroboros parse -g grammar.obnf -e StartRule -t input.txt
```

What it does:
- Reads the input EBNF grammar file.
- Parses it into a set of rule configurations.
- Builds a runtime parser from the configs.
- Parses the input text using the specified entry rule.
Ouroboros grammars ultimately compile down to a small set of combinators:
- `Terminal(/regex/)`: a regex terminal.
- `Union`: tagged alternatives (convenient for typing).
- `Sequence` / `FilteredSequence` / `NamedSequence`: positional vs. named-field sequences.
- `Option`: an optional subrule.
- `Repeat` / `SeparatedRepeat`: repetition and separator-based repetition.
- `Pick`: wrap a subrule with a prefix/suffix but return only the chosen subrule (useful for discarding parentheses and other delimiters).
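The semantics of `Option`, `Repeat`, `SeparatedRepeat`, and `Pick` described above can be approximated with plain function combinators. This is a hypothetical sketch, not Ouroboros code: a "parser" here is any function `(text, idx) -> (node, next_idx)` that raises `Mismatch` on failure, and it assumes `SeparatedRepeat` accepts zero items and does not consume a trailing separator.

```python
import re
from typing import Any, Callable

Parser = Callable[[str, int], tuple[Any, int]]


class Mismatch(Exception):
    pass


def terminal(pattern: str) -> Parser:
    regex = re.compile(pattern)

    def parse(text: str, idx: int):
        m = regex.match(text, idx)
        if m is None:
            raise Mismatch(idx)
        return m.group(), m.end()

    return parse


def option(p: Parser) -> Parser:
    def parse(text: str, idx: int):
        try:
            return p(text, idx)
        except Mismatch:
            return None, idx  # absent: succeed without consuming input

    return parse


def repeat(p: Parser) -> Parser:
    def parse(text: str, idx: int):
        items = []
        while True:
            try:
                node, nxt = p(text, idx)
            except Mismatch:
                return items, idx
            if nxt == idx:  # stop on empty matches to avoid looping forever
                return items, idx
            items.append(node)
            idx = nxt

    return parse


def separated_repeat(p: Parser, sep: Parser) -> Parser:
    def parse(text: str, idx: int):
        try:
            first, idx = p(text, idx)
        except Mismatch:
            return [], idx  # zero items
        items = [first]
        while True:
            try:
                _, after_sep = sep(text, idx)
                node, after_item = p(text, after_sep)
            except Mismatch:
                return items, idx  # backtrack to just before the failed separator
            if after_item == idx:
                return items, idx
            items.append(node)
            idx = after_item

    return parse


def pick(prefix: Parser, inner: Parser, suffix: Parser) -> Parser:
    def parse(text: str, idx: int):
        _, idx = prefix(text, idx)
        node, idx = inner(text, idx)
        _, idx = suffix(text, idx)
        return node, idx  # only the inner node is kept; delimiters are dropped

    return parse


number = terminal(r"[0-9]+")
number_list = pick(terminal(r"\["), separated_repeat(number, terminal(",")), terminal(r"\]"))
print(number_list("[1,22,333]", 0))  # → (['1', '22', '333'], 10)
```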
On failure, `ParsingContext` tracks the farthest failure position and the set of regex patterns expected at that point; you can turn that into a line/column exception to report helpful errors to users.
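The farthest-failure technique can be sketched in isolation. The hypothetical `FailureTracker` below is not the `ParsingContext` API: it records each failed terminal match, keeps only the failures at the largest offset (merging the expected patterns when several alternatives fail at the same spot), and converts that offset into a 1-based line and column.

```python
import re
from dataclasses import dataclass, field


class Mismatch(Exception):
    pass


@dataclass
class FailureTracker:
    """Hypothetical sketch of farthest-failure bookkeeping."""

    text: str
    farthest: int = -1
    expected: set[str] = field(default_factory=set)

    def match(self, pattern: str, idx: int) -> tuple[str, int]:
        m = re.compile(pattern).match(self.text, idx)
        if m is None:
            self.record(idx, pattern)
            raise Mismatch(idx)
        return m.group(), m.end()

    def record(self, idx: int, pattern: str) -> None:
        if idx > self.farthest:
            # A failure further into the input supersedes earlier ones.
            self.farthest = idx
            self.expected = {pattern}
        elif idx == self.farthest:
            # Several alternatives failed at the same spot: collect them all.
            self.expected.add(pattern)

    def line_col(self, idx: int) -> tuple[int, int]:
        # 1-based line and column for a 0-based character offset.
        line = self.text.count("\n", 0, idx) + 1
        col = idx - (self.text.rfind("\n", 0, idx) + 1) + 1
        return line, col

    def error_message(self) -> str:
        line, col = self.line_col(self.farthest)
        patterns = ", ".join(f"/{p}/" for p in sorted(self.expected))
        return f"line {line}, column {col}: expected one of {patterns}"


tracker = FailureTracker("x = 1\ny = ?")
for pattern in (r"[0-9]+", r"[a-z]+"):
    try:
        tracker.match(pattern, 10)  # offset 10 is the "?" on line 2
    except Mismatch:
        pass
print(tracker.error_message())  # → line 2, column 5: expected one of /[0-9]+/, /[a-z]+/
```

Reporting the farthest failure works well with backtracking because earlier failures are usually just alternatives that were rightly abandoned; the deepest point the parser reached is the most likely location of the user's actual mistake.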
- Good fit: config-driven mini-languages, data format parsers, DSL/interpreter front-ends, and projects that benefit from a typed AST.
- Limits: this is a backtracking parsing model (not packrat), so rule results are not memoized and grammars that force heavy backtracking, such as highly ambiguous ones, can be slow. Prefer left-factoring and reducing ambiguity/backtracking.
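To see why left-factoring helps in a backtracking parser, the hypothetical sketch below parses the same small expression language twice with plain functions (not Ouroboros rules) and counts how often the shared leading `Term` is parsed. The unfactored grammar restarts every alternative from the same index, so it re-parses `Term` once per alternative; the factored grammar parses it once and makes only the tail optional.

```python
import re


class Mismatch(Exception):
    pass


class CallCounter:
    def __init__(self) -> None:
        self.term_calls = 0


def expect(pattern: str, text: str, idx: int) -> tuple[str, int]:
    m = re.compile(pattern).match(text, idx)
    if m is None:
        raise Mismatch(idx)
    return m.group(), m.end()


def term(text: str, idx: int, counter: CallCounter) -> tuple[str, int]:
    counter.term_calls += 1  # count how often the shared prefix is parsed
    return expect(r"[0-9]+", text, idx)


def expr_unfactored(text: str, idx: int, counter: CallCounter):
    # Expr = Term "+" Term | Term "-" Term | Term
    for op in (r"\+", r"-", None):
        try:
            left, i = term(text, idx, counter)
            if op is None:
                return left, i
            sym, i = expect(op, text, i)
            right, i = term(text, i, counter)
            return (left, sym, right), i
        except Mismatch:
            continue  # backtrack to idx: the next alternative re-parses Term
    raise Mismatch(idx)


def expr_factored(text: str, idx: int, counter: CallCounter):
    # Expr = Term (("+" | "-") Term)?
    left, i = term(text, idx, counter)
    try:
        sym, j = expect(r"[+-]", text, i)
        right, j = term(text, j, counter)
        return (left, sym, right), j
    except Mismatch:
        return left, i  # optional tail absent: keep just the leading Term


for parse in (expr_unfactored, expr_factored):
    counter = CallCounter()
    print(parse.__name__, parse("7", 0, counter), counter.term_calls)
# expr_unfactored ('7', 1) 3
# expr_factored ('7', 1) 1
```

Both versions accept the same inputs and build the same results; only the amount of repeated work differs, and the gap grows when the shared prefix is itself expensive or nested.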