Thanks to visit codestin.com
Credit goes to Github.com

Skip to content

Conversation

@morr0ne
Copy link
Member

@morr0ne morr0ne commented Jun 15, 2025

This PR is a major rewrite of the parser that comes with many improvements and changes. At its core, the main change is the underlying technology.

Instead of using a Pest grammar which is hard to maintain and read, the new parser separates the steps into a proper lexer using Logos and a simple but very efficient parser using LALRPOP. This gets turned into an AST which also allows for more efficient and streamlined implementation of the deserializer.

The parser is largely backwards compatible except where the old parser was wrong. The new parser should be more correct, and some tests have been updated to reflect that. The value enum has also been updated to be more ergonomic and easier to use.

There are a few breaking changes that improve the syntax and were agreed upon beforehand:

  • Stricter parsing of unquoted keys, most notably internal quotes are no longer allowed
  • Less compact syntax, stuff like truefalse won't work anymore
  • String interpolation now uses a syntax similar to the one used in Rust: "$var""${var}"
  • Unicode escapes also use a new syntax that allows for more Unicode planes to be used: "\u{FFFD}"

Enhanced Integer Parsing Support:

  • Added octal integer support: New 0o prefix syntax (e.g., 0o755 → 493, 0o123 → 83)
  • Added binary integer support: New 0b prefix syntax (e.g., 0b1010 → 10, 0b1111 → 15)
  • All integer formats support:
    • Negative values: -0xABC, -0o755, -0b1100
    • Underscore separators for readability: 1_000_000, 0xFA_FA_FA, 0o12_34_56, 0b1010_1100
Format Prefix Example Result
Decimal none 42, 1_000_000 42, 1000000
Hexadecimal 0x 0xFF, 0xFA_FA 255, 64250
Octal 0o 0o755, 0o12_34 493, 668
Binary 0b 0b1010, 0b10_11 10, 11

There might be other changes that are more of a result of the older parser not properly respecting the spec at times.

The changes should address all the concerns highlighted in #42.

There are other changes planned and more extensive error handling which will be implemented in other PRs. The new code should allow much easier extensibility.

Lua is fully working, the CLI has been updated too. WASM support is now fully working and fixed.

Overall, the new code is more readable, maintainable, correct, more efficient and possibly even faster, although that wasn't the goal of this rewrite.

Resolves #42

@JakeStanger
Copy link
Collaborator

Does this superseed #41 btw? If so we can close that

@morr0ne
Copy link
Member Author

morr0ne commented Jun 22, 2025

Does this superseed #41 btw? If so we can close that

Not necessarly, this is a different approach to the rewrite. I still think there could be value in a streaming parser, especially for big configs. I am gonna rebase that one and possibly work on it in the future once this is merged

@morr0ne morr0ne marked this pull request as ready for review June 22, 2025 23:13
@morr0ne morr0ne marked this pull request as draft June 23, 2025 08:38
Copy link
Collaborator

@JakeStanger JakeStanger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall impl looks great, very clean code. A few areas that do need attention, but they are all minor stuff really

@morr0ne morr0ne requested a review from JakeStanger June 24, 2025 14:46
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to update the spec to include the additional features before merge. Obvs gonna rework that at some point, but for now changes can just go on a v0.11 branch.

(Same is true of the interpolation/unicode syntax changes)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a dedicated spec repository or include the spec in here? Because the current one lives in the website afaik.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reckon its own repo. While I want to keep this the reference implementation I think it makes sense to have the spec elsewhere, mainly so it can be managed separately (ie issues/PRs are directly against spec only)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

@JakeStanger JakeStanger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code all LGTM as it stands. Will await the final changes and then fully re-review.

} in {
foo = "$greeting, $subject"
foo = "${greeting}, ${subject}"
bar = "\$escaped"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Escaping should work the same way but with the new syntax:

Suggested change
bar = "\$escaped"
bar = "\${escaped}"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, how do we want to handle "partially formed" interpolations, eg:

{ value = "${oopsNoClosingBrace" } 

(or no opening brace, etc)

Is that an error, or treated as a string literal?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

Issues with the full specification and pest code

2 participants