Lil Scan helps you hand-write parsers in Zig with excellent error messages

Hand-writing parsers isn't actually too bad: you write a function for every syntactical element, and by using regular code you get a lot of control over error messages and over how the parsed structure is processed. Lil Scan (the younger sibling of Big Parse) is a library which helps you in this process. It provides a scanner which handles the tokenization step of your parser. One of its main benefits is the diagnostic system, which makes it easy to provide excellent error messages.

Features / non-features

Handles scanning/lexing only:
- Designed to be used together with a hand-written (recursive descent) parser.
- No support for traditional context-free parsing algorithms. LL(k), LR, LALR are all terms which Lil Scan knows nothing about.
Built-in support for common patterns:
- Whitespace, integers, strings.
- UTF-8/Unicode by default, and anything which works on ASCII is clearly labeled as such.
- JSON primitives (planned).
Flexible diagnostic system:
- The scanner returns spans for every token which can be stored for later stages.
Excellent presentation of error messages:
- The source is included with arrows pointing to the exact place.
- Deliberate design decisions.
- Automatically detects if stderr is not a TTY and then prints single-line errors.
- Respects NO_COLOR and CLICOLOR_FORCE.
- Tagging during scanning enables syntax highlighted error messages (planned).
Follows best practices:
- Zero allocations.
- Zero dependencies.
- 100% test coverage.
- 100% documentation coverage.
- 0BSD licensed.

Note: There's no promise of active development. The planned features are merely what we think would fit within this project. Open an issue if you need one of the planned features and/or want to help contribute.

Usage

const lil = @import("lil-scan");

// Initialize a scanner.
var s = lil.Scanner.init(" 123 ");

// Skip any whitespace.
try s.skip(lil.whitespaceAscii(s.rest()));

// Parse an integer.
var num: i64 = undefined;
_ = try s.must(
    lil.integerAscii(s.rest(), i64, &num),
    &.{.text = "Expected integer"},
);

// Check if there's a `[`:
if (try s.maybe(lil.slice(s.rest(), "["))) |_| {
    // Start parsing an array.
}

Guide

1: Parse functions

A parse function is a standalone function which parses the prefix of some text. For instance, the integerAscii parser when applied to the text 123 + 456 will return that it successfully parsed 123 (and leaves + 456 behind). Lil Scan intentionally decouples the parse functions from the scanning mechanism: The parse functions are completely stateless functions and you're encouraged to write your own for your specific use case.

More specifically, a parse function is a function which accepts a []const u8 and returns a lil.ParseResult. A parse result is one of:

success: We were able to parse n characters.
failure: We successfully started parsing the text, but then something unexpected appeared in the text.
nothing: There's no match at the beginning.
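
To make the three outcomes concrete, here is a minimal sketch of a hand-written parse function for a double-quoted string. The `ParseResult` union and the `quoted` function are hypothetical stand-ins defined locally so the example is self-contained; the real `lil.ParseResult` may be shaped differently (for instance, its failure case may carry a message pointer and a span rather than a plain string).

```zig
const std = @import("std");

// Hypothetical stand-in for lil.ParseResult; the real type may differ.
const ParseResult = union(enum) {
    success: usize, // number of bytes matched at the start of the text
    failure: []const u8, // parsing started, then something unexpected appeared
    nothing, // no match at the beginning
};

// Parses a double-quoted string without escape sequences.
fn quoted(text: []const u8) ParseResult {
    // No opening quote: this is simply not a string.
    if (text.len == 0 or text[0] != '"') return .nothing;

    var i: usize = 1;
    while (i < text.len) : (i += 1) {
        // Closing quote found: the match includes both quotes.
        if (text[i] == '"') return .{ .success = i + 1 };
    }

    // We committed to a string but ran out of input.
    return .{ .failure = "Unterminated string" };
}
```

The function holds no state and only looks at the slice it is given, which is what allows it to be written and tested independently of any scanner.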

Lil Scan currently ships with the following parse functions:

Name	Description	Example	Failures
`eof`	Returns success on an empty slice.	`lil.eof(text)`	None.
`slice`	Parses an exact match of a slice.	`lil.slice(text, "if")`	None.
`whenAscii`	Parses ASCII text as long as the function returns true.	`lil.whenAscii(text, fn(ch: u8) -> bool)`	None.
`whitespaceAscii`, `digitAscii`, `alphabeticAscii`, `alphanumericAscii`, `upperAscii`, `lowerAscii`, `hexAscii`	Parses ASCII text of a certain category. This uses the functions in `std.ascii`.	`lil.whitespaceAscii(text)`	None.
`whenUtf8`	Parses UTF-8 text as long as the function returns true.	`lil.whenUtf8(text, fn(ch: u21) -> bool)`	None.
`integerAscii`	Parses a simple integer (matching the regex `[+-]?\d+`)	`lil.integerAscii(text, i64, &dest)`	When the integer can't fit in the given type.

2: Scanning

The scanner (lil.Scanner) keeps track of the current location in the text. rest() returns the remaining text to be parsed, and there's a set of advance functions which then advance the scanner. The general idea is that you pass rest() into a parse function and then pass the parse result into an advance function. All the advance functions propagate parse failures upwards; they differ only in how they deal with success vs. nothing.

must: This will cause the scanner to fail if the parse result is nothing.
maybe: This returns null if the parse result is nothing.
skip: This doesn't care whether the parse result is success or nothing. Typically used to skip whitespace and similar.

A scanner might fail at some point. Any of the advance functions might return error.ParseError and if so it will also set the failure field on the scanner. This contains information about the span and the message which caused the error.
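
The difference between the advance functions can be sketched with a toy scanner. Everything below (`MiniScanner`, `ParseResult`, `Span`, `digits`) is a hypothetical model written for illustration, not Lil Scan's implementation; in particular the real scanner takes a `*const lil.Message` and stores a richer failure value than a plain string.

```zig
const std = @import("std");

// Hypothetical stand-ins; the real lil types may differ.
const ParseResult = union(enum) { success: usize, failure: []const u8, nothing };
const Span = struct { start: usize, end: usize };

const MiniScanner = struct {
    text: []const u8,
    pos: usize = 0,
    failure: ?[]const u8 = null,

    fn rest(self: *const MiniScanner) []const u8 {
        return self.text[self.pos..];
    }

    // maybe: advance on success, return null on nothing, propagate failures.
    fn maybe(self: *MiniScanner, res: ParseResult) error{ParseError}!?Span {
        switch (res) {
            .success => |n| {
                const span = Span{ .start = self.pos, .end = self.pos + n };
                self.pos = span.end;
                return span;
            },
            .failure => |msg| {
                self.failure = msg;
                return error.ParseError;
            },
            .nothing => return null,
        }
    }

    // must: like maybe, but nothing becomes an error with the caller's message.
    fn must(self: *MiniScanner, res: ParseResult, msg: []const u8) error{ParseError}!Span {
        if (try self.maybe(res)) |span| return span;
        self.failure = msg;
        return error.ParseError;
    }

    // skip: success and nothing are both fine; the span is discarded.
    fn skip(self: *MiniScanner, res: ParseResult) error{ParseError}!void {
        _ = try self.maybe(res);
    }
};

// A tiny parse function to drive the scanner.
fn digits(text: []const u8) ParseResult {
    var n: usize = 0;
    while (n < text.len and std.ascii.isDigit(text[n])) n += 1;
    if (n == 0) return .nothing;
    return .{ .success = n };
}
```

In this model all three functions share one code path for failures, which mirrors the rule that parse failures always propagate upwards regardless of which advance function is used.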

3: Spans

must and maybe both return a span which represents a part of the parsed text. These are lightweight structs, storing indexes, which can be kept around. Another use case is to use Scanner.sliceFromSpan to get the slice of the span and work directly on this during parsing:

const hex_span = try s.must(lil.hexAscii(s.rest()), &.{.text = "Hex expected."});

// `[]const u8` containing the bytes:
const hex = s.sliceFromSpan(hex_span);

4: Messages

A message is something which can be shown to a user. The most common use case is an error message shown when parsing fails, but Lil Scan's diagnostic system is capable of supporting other types of messages as well.

Messages are designed to be static: You define them once as *const lil.Message and pass them around as pointers. There's no way of dynamically creating an error message where you interpolate user values in. In the future Lil Scan might provide ways of attaching additional metadata to messages.

pub const Message = struct {
    severity: Severity = .err,
    text: []const u8,
    code: ?[]const u8 = null,
    url: ?[]const u8 = null,
};

pub const Severity = enum {
    err,
    warn,
    info,
    hint,
};
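
Since messages are static, one way to organise them is as a namespace of constants that you take pointers to. The sketch below copies the two definitions above so that it compiles on its own; the `messages` namespace, the `describe` helper, and the code `E001` are made up for illustration and are not part of Lil Scan.

```zig
const std = @import("std");

// Copied from the definitions above so the sketch is self-contained.
const Severity = enum { err, warn, info, hint };
const Message = struct {
    severity: Severity = .err,
    text: []const u8,
    code: ?[]const u8 = null,
    url: ?[]const u8 = null,
};

// Hypothetical message catalogue: defined once, referenced by pointer.
const messages = struct {
    pub const expected_integer = Message{ .text = "Expected integer", .code = "E001" };
    pub const trailing_comma = Message{ .severity = .warn, .text = "Trailing comma" };
};

// Hypothetical consumer that only needs a pointer to a static message.
fn describe(msg: *const Message) []const u8 {
    return msg.text;
}
```

Only `text` is required; the other fields fall back to their defaults, so a plain error message needs nothing more than `.{ .text = "..." }`.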

5: Presentation

A presenter is what presents a message to the user. It's typically initialized from autoDetect which will present errors to stderr and automatically detects whether it should use colors and/or show the messages in "expanded mode".

// Create a presenter:
var buf: [4096]u8 = undefined;
var pres = lil.Presenter.autoDetect(&buf);

// Present a single message:
var s = lil.Scanner.init(source);

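// `parse` is your own top-level parse function; it takes a pointer
// so that the failure it records is visible on `s` afterwards.
parse(&s) catch {
    const failure = s.failure.?;
    try pres.present(lil.SingleMessagePresentation{
        .msg = failure.msg,
        .span = failure.span,
        .filename = filename,
        .source = source,
    });
};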

At the moment the only presentation implemented is SingleMessagePresentation which shows a single message.
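
To give a feel for what a presenter has to compute before it can point at the source, here is a rough sketch that turns a byte offset (such as the start of a span) into a line number, a column, and the text of that line. The `Location` struct and `locate` function are hypothetical and are not how Lil Scan is implemented; columns are counted in bytes, so this ignores Unicode display width.

```zig
const std = @import("std");

// Hypothetical result type for this sketch.
const Location = struct {
    line: usize, // 1-based line number
    column: usize, // 1-based column, counted in bytes
    line_text: []const u8, // the full source line, without the newline
};

// Assumes offset <= source.len.
fn locate(source: []const u8, offset: usize) Location {
    var line: usize = 1;
    var line_start: usize = 0;

    // Count the newlines before the offset and remember where the line begins.
    var i: usize = 0;
    while (i < offset) : (i += 1) {
        if (source[i] == '\n') {
            line += 1;
            line_start = i + 1;
        }
    }

    // The line ends at the next newline, or at the end of the source.
    const line_end = std.mem.indexOfScalarPos(u8, source, offset, '\n') orelse source.len;

    return .{
        .line = line,
        .column = offset - line_start + 1,
        .line_text = source[line_start..line_end],
    };
}
```

With the line text and column in hand, printing the source line followed by a marker under the offending column is a matter of emitting `column - 1` spaces and then the arrow.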
