use `PUSH_LITERAL` to simplify grammar #285

yshavit · 2025-03-30T07:57:03Z

Use pest's new (as of 2.8.0) PUSH_LITERAL. This reduces the number of rules we need in the grammar.

One complication is that the string rule now needs to end with DROP. So we could either keep the string rule non-empty and add a DROP after each `string?`, or we could let it match the empty string. The latter is simpler, but it means we then need to add a check for an explicit `*` in the table matcher.
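To make the stack mechanics concrete, here is a minimal sketch in plain Rust, not the actual pest grammar. It assumes the usual pest semantics: PUSH_LITERAL pushes a literal onto the stack without consuming input, PEEK refers to the top of the stack, and DROP pops without matching. The `State` type and the `parse_until` helper are hypothetical names used only for illustration.

```rust
/// Hypothetical stand-in for a parser state: the input, a cursor, and a stack of literals.
struct State<'a> {
    input: &'a str,
    pos: usize,
    stack: Vec<&'static str>,
}

impl<'a> State<'a> {
    fn new(input: &'a str) -> Self {
        State { input, pos: 0, stack: Vec::new() }
    }

    /// Mimics PUSH_LITERAL: pushes the delimiter without consuming any input.
    fn push_literal(&mut self, literal: &'static str) {
        self.stack.push(literal);
    }

    /// Mimics a lookahead on PEEK: is the remaining input at the pushed delimiter?
    fn at_peek(&self) -> bool {
        match self.stack.last() {
            Some(top) => self.input[self.pos..].starts_with(*top),
            None => false,
        }
    }

    /// Mimics DROP: pops the stack without matching any input.
    fn drop_top(&mut self) {
        self.stack.pop();
    }

    /// A string rule that may be empty: consume until the pushed delimiter
    /// or end of input, then DROP so the stack is balanced again.
    fn string(&mut self) -> &'a str {
        let input = self.input;
        let start = self.pos;
        while self.pos < input.len() && !self.at_peek() {
            // Advance by one whole character to stay on a UTF-8 boundary.
            let ch = input[self.pos..].chars().next().unwrap();
            self.pos += ch.len_utf8();
        }
        self.drop_top();
        input[start..self.pos].trim()
    }
}

/// Returns the matched string and the unconsumed remainder of the input.
fn parse_until(input: &str, delimiter: &'static str) -> (String, String) {
    let mut state = State::new(input);
    state.push_literal(delimiter);
    let matched = state.string().to_string();
    let rest = input[state.pos..].to_string();
    (matched, rest)
}
```

Because the string rule always runs and always ends with the drop, an empty match is valid here, which is why a caller that needs to tell "nothing" apart from an explicit `*` has to check for that separately.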

coderabbitai · 2025-03-30T08:11:02Z

📝 Walkthrough


The changes update dependency versions in the Cargo configuration from version 2.7 to 2.8 for both pest and pest_derive. Modifications in the matcher module adjust how the Matcher::Any variant is handled, now accepting an associated value, and alter related test expectations. The grammar file has been refined by replacing indirect string handling with explicit PUSH_LITERAL operations and by removing several old rules, while adding new helper rules for unit testing. In the query module, string parsing rules have been streamlined and the StringVariant enum has been updated to remove some variants and include a new AngleBracket variant. The selectors module introduces a new AnyVariant enum to differentiate between implicit and explicit matches. The strings module now tracks explicit wildcard usage, and error reporting in the API shows improved string conversion details. Additionally, test cases have been updated with revised error messages and position indications.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c4ae8a4 and 68fb32c.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (9)

Cargo.toml (1 hunks)
src/matcher.rs (10 hunks)
src/query/grammar.pest (2 hunks)
src/query/query.rs (3 hunks)
src/query/selectors.rs (31 hunks)
src/query/strings.rs (11 hunks)
src/select/api.rs (2 hunks)
tests/md_cases/bad_queries.toml (2 hunks)
tests/md_cases/select_html.toml (1 hunks)

🧰 Additional context used

🧬 Code Definitions (1)

src/query/selectors.rs (5)

src/matcher.rs (2)

from (70-87)

from (364-368)

src/select/sel_code_block.rs (1)

from (13-18)

src/select/sel_link_like.rs (3)

from (12-14)

from (30-32)

from (48-53)

src/select/sel_section.rs (1)

from (12-14)

src/select/sel_table.rs (1)

from (31-36)

🔇 Additional comments (78)

Cargo.toml (1)

16-17: Dependency version update for pest and pest_derive looks appropriate.

The update to pest 2.8 aligns with your PR objectives to use the new PUSH_LITERAL feature. This is a targeted change that should help simplify your grammar as planned.

tests/md_cases/select_html.toml (1)

62-62: Error message improvement provides better guidance to users.

The updated error message now provides more specific options for what's expected: end of input, "*", unquoted string, regex, quoted string, or "^". This makes it clearer what valid inputs are possible at this location.

src/select/api.rs (2)

15-16: Good addition of necessary imports for error handling.

These imports from the pest crate support the new error formatting functionality.

29-47: Well-implemented error formatting method.

The new to_string method provides improved error reporting by converting both Pest and Other error variants to readable string representations. For the Other variant, it nicely leverages the pest error formatting capabilities when possible.

tests/md_cases/bad_queries.toml (2)

171-176: Error message and position have been updated appropriately.

The error position indicator has been adjusted from position 4 to 3, and the message now provides more specific guidance about expected inputs.

226-228: More specific error guidance for table column matcher.

The error message now explicitly states that table column matchers cannot be empty and instructs users to use an explicit "*". This is much clearer than the previous message.

src/matcher.rs (15)

85-85: Properly handling the updated Matcher::Any variant with parameter

The change accommodates the new AnyVariant parameter in Matcher::Any, properly ignoring it with the underscore pattern while still calling the same implementation. This ensures backward compatibility with existing code while supporting the new wildcard distinction.

130-135: Good adaptation of test case for StringVariant::AngleBracket

Test has been updated to use the new StringVariant::AngleBracket enum variant, maintaining test coverage for the new string handling mechanism.

148-148: Fixed whitespace handling in bareword_anchor_end test

The updated test correctly reflects the improved whitespace handling in the parser, removing unnecessary trailing space.

156-157: Properly aligned expectations for starting anchor test

The test expectations have been correctly adjusted to accommodate the updated grammar handling of anchors.

162-163: Properly aligned expectations for ending anchor test

The test expectations have been correctly adjusted to accommodate the updated grammar handling of anchors.

168-168: Updated test for both anchors to reflect new parsing behavior

The test has been properly updated to match the new grammar's handling of combined anchors.

173-173: Correctly handling whitespace between anchors

The updated test properly reflects the new grammar's improved handling of whitespace between anchors.

216-216: Properly handling multiple whitespace in anchored strings

The test update correctly reflects the improved whitespace handling in the updated grammar.

228-233: Updated test for StringVariant::AngleBracket in bareword_end_delimiters

The test has been properly updated to use the new AngleBracket variant, ensuring correct testing of the delimiter handling.

237-238: Properly updated AngleBracket handling in dollar sign test

The test correctly uses the new AngleBracket enum variant while maintaining the same test logic.

274-282: Replaced error expectation with empty expectation

Good switch from expecting errors to expecting empty results for invalid quoted strings, which aligns with the new grammar behavior.

300-300: Updated error handling for unclosed regex

The test now correctly uses expect_empty instead of expecting an error, reflecting the new grammar's behavior.

312-313: Added tests for the wildcard with space and AngleBracket variant

Good addition of test cases to cover the new wildcard behavior with spaces and with the new AngleBracket variant.

322-328: Improved error handling in parse_and_check_with

Good improvement in error reporting when parsing fails, now printing the error message before panicking.

339-341: Added helpful expect_empty utility function

Good addition of a utility function to check for empty parse results, which simplifies test code and improves readability.

src/query/selectors.rs (33)

20-27: Good enhancement of Matcher with AnyVariant enum

The enhancement to Matcher::Any with the AnyVariant enum nicely distinguishes between implicit and explicit wildcard matches. This is an elegant solution to the problem described in the PR, allowing different handling for explicit * versus implied empty matchers.

47-47: Updated PartialEq implementation for Matcher::Any

The PartialEq implementation has been properly updated to handle the new AnyVariant parameter in Matcher::Any, correctly ignoring the specific variant when comparing equality.
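As a rough illustration of the two comments above, the following sketch shows a `Matcher::Any` that carries an `AnyVariant` while equality ignores which variant it is. The `Text(String)` variant is a simplified assumption; the real `Matcher` in `src/query/selectors.rs` may have different or additional variants.

```rust
/// Whether a wildcard was written as `*` or merely implied by an empty matcher.
#[derive(Debug, Clone, Copy)]
pub enum AnyVariant {
    Implicit,
    Explicit,
}

/// Simplified matcher; `Text` is an assumed placeholder for the other variants.
#[derive(Debug, Clone)]
pub enum Matcher {
    Text(String),
    Any(AnyVariant),
}

impl PartialEq for Matcher {
    fn eq(&self, other: &Self) -> bool {
        match (self, other) {
            (Matcher::Text(a), Matcher::Text(b)) => a == b,
            // Both wildcards match anything, so the variant is ignored here.
            (Matcher::Any(_), Matcher::Any(_)) => true,
            _ => false,
        }
    }
}
```

With this shape, code that only cares about matching behavior can treat both wildcards the same, while code that cares about what the user typed can still inspect the variant.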

57-69: Well-implemented logic for handling explicit vs implicit wildcards

The implementation nicely differentiates between explicit and implicit wildcards based on the new explicit_wildcard field, maintaining backward compatibility while adding the new functionality.

228-239: Good enhancement for table column matcher validation

This change improves usability by requiring an explicit wildcard for table column matchers rather than accepting an implicit empty matcher. The error message clearly explains what the user needs to do, enhancing the user experience.
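A check of this kind might look roughly like the sketch below. The function name `validate_column_matcher`, the simplified `Matcher` type, and the error wording are all assumptions for illustration; the actual message lives in the repository's tests.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum AnyVariant {
    Implicit,
    Explicit,
}

/// Simplified matcher; `Text` is an assumed placeholder for the other variants.
#[derive(Debug, PartialEq)]
enum Matcher {
    Text(String),
    Any(AnyVariant),
}

/// Rejects an implied (empty) column matcher and passes everything else through.
fn validate_column_matcher(matcher: Matcher) -> Result<Matcher, String> {
    match matcher {
        // Hypothetical wording, not the project's exact error message.
        Matcher::Any(AnyVariant::Implicit) => Err(
            "table column matcher cannot be empty; use an explicit \"*\"".to_string(),
        ),
        other => Ok(other),
    }
}
```

Only the column matcher is validated this way in this sketch; a row matcher could still be left empty and treated as an implicit wildcard.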

331-332: Updated test expectation for wildcard handling

The test has been properly updated to use AnyVariant::Implicit in the section prefix_chaining test.

336-337: Updated test expectation for suffix_chaining

The test has been properly updated to use AnyVariant::Implicit in the section suffix_chaining test.

344-348: Updated chaining test expectations

The tests for chaining have been properly updated to use AnyVariant::Implicit in both the section and link matchers.

358-362: Updated empty_intermediate_chains test

The test has been properly updated to use AnyVariant::Implicit in both the section and link matchers.

373-373: Updated section_no_matcher test

The test has been properly updated to use AnyVariant::Implicit in the section matcher.

393-393: Updated unordered_list_item_no_matcher test

The test has been properly updated to use AnyVariant::Implicit in the matcher.

431-431: Updated ordered_list_item_no_matcher test

The test has been properly updated to use AnyVariant::Implicit in the matcher.

467-467: Updated unordered_unchecked_task test

The test has been properly updated to use AnyVariant::Implicit in the matcher.

479-479: Updated unordered_checked_task test

The test has been properly updated to use AnyVariant::Implicit in the matcher.

491-491: Updated unordered_either_task test

The test has been properly updated to use AnyVariant::Implicit in the matcher.

543-544: Updated link_no_matchers test

The test has been properly updated to use AnyVariant::Implicit in both the display and URL matchers.

555-555: Updated link_with_display test

The test has been properly updated to use AnyVariant::Implicit in the URL matcher.

565-565: Updated link_with_url test

The test has been properly updated to use AnyVariant::Implicit in the display matcher.

587-588: Updated image_no_matchers test

The test has been properly updated to use AnyVariant::Implicit in both the display and URL matchers.

599-599: Updated image_with_alt test

The test has been properly updated to use AnyVariant::Implicit in the URL matcher.

610-610: Updated block_quote_no_matcher test

The test has been properly updated to use AnyVariant::Implicit in the block quote matcher.

638-639: Updated code_block_no_matchers test

The test has been properly updated to use AnyVariant::Implicit in both the language and contents matchers.

650-650: Updated code_block_with_only_language test

The test has been properly updated to use AnyVariant::Implicit in the contents matcher.

661-661: Updated code_block_with_only_language_and_trailing_space test

The test has been properly updated to use AnyVariant::Implicit in the contents matcher.

671-671: Updated code_block_with_only_content test

The test has been properly updated to use AnyVariant::Implicit in the language matcher.

695-695: Updated html_no_matcher test

The test has been properly updated to use AnyVariant::Implicit in the HTML matcher.

719-719: Updated expected error message in html_with_unquoted_text test

The error message expectation has been updated to include the new * pattern option in the list of expected tokens.

737-737: Updated paragraph_no_matcher test

The test has been properly updated to use AnyVariant::Implicit in the paragraph matcher.

766-766: Improved error message position indicator

The error message position indicator (caret) is now correctly positioned to clearly identify the error location.

768-768: Improved table column error message

The error message has been improved to clearly explain that an explicit wildcard (*) is required for table column matchers.

791-792: Updated table_asterisk_column test

The test has been properly updated to use AnyVariant::Explicit for the column matcher and AnyVariant::Implicit for the row matcher.

803-803: Updated table_with_column test

The test has been properly updated to use AnyVariant::Implicit in the row matcher.

813-813: Updated table_with_row test

The test has been properly updated to use AnyVariant::Explicit in the column matcher.

873-887: Enhanced error message handling in expect_parse_error

The error handling code has been enhanced to properly format error messages from both pest errors and custom errors, improving test diagnostics.

src/query/strings.rs (12)

12-12: Good addition of explicit_wildcard flag to ParsedString

This addition enables tracking whether a wildcard was explicitly specified with * or implicitly through an empty string, which is key to the enhanced wildcard handling.

41-41: Enhanced Debug implementation for ParsedString

The Debug implementation now clearly distinguishes between explicit and implicit wildcards, improving debugging and error messages.

81-81: Default explicit_wildcard to false when initializing ParsedString

Properly initializes the new field to false by default, maintaining backward compatibility.

92-94: Good handling of asterisk rule

The code properly sets the explicit_wildcard flag when an asterisk is encountered during parsing, enabling the distinction between explicit and implicit wildcards.

205-211: Updated unquoted_string_to_angle_bracket test

The test has been renamed and updated to use StringVariant::AngleBracket, with appropriate adjustments to the input and expected output.

236-238: Added asterisk test

Good addition of a test for the asterisk wildcard, verifying both parsing and the is_equivalent_to_asterisk function.

240-248: Improved empty string test

The test for empty strings has been enhanced to also verify the is_equivalent_to_asterisk function returns true for empty strings.
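The relationship between the `explicit_wildcard` flag and `is_equivalent_to_asterisk` can be sketched as follows. This is a simplified assumption: the real `ParsedString` likely has more fields, and `parse_simple` is a hypothetical helper standing in for the pest-driven parsing.

```rust
/// Simplified stand-in for the parsed string; only the fields needed here.
#[derive(Debug, PartialEq)]
struct ParsedString {
    text: String,
    explicit_wildcard: bool,
}

impl ParsedString {
    /// Both an explicit `*` and an empty string behave like a wildcard.
    fn is_equivalent_to_asterisk(&self) -> bool {
        self.explicit_wildcard || self.text.is_empty()
    }
}

/// Hypothetical parser: sets the flag only when the input is a literal `*`.
fn parse_simple(input: &str) -> ParsedString {
    let trimmed = input.trim();
    if trimmed == "*" {
        ParsedString { text: String::new(), explicit_wildcard: true }
    } else {
        ParsedString { text: trimmed.to_string(), explicit_wildcard: false }
    }
}
```

The flag defaults to false, so an empty input still counts as a wildcard for matching purposes while remaining distinguishable from a typed `*`.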

277-277: Fixed expected output in anchors_unquoted_to_pipe_with_space test

The test now correctly expects the remainder without leading space, reflecting the improved handling of whitespace in the updated grammar.

287-287: Fixed expected output in anchors_unquoted_to_pipe_no_space test

The test now correctly expects the remainder without leading space, reflecting the improved handling of whitespace in the updated grammar.

331-332: Added explicit_wildcard to parsed_text helper function

The helper function has been updated to include the new explicit_wildcard field, initialized to false for backward compatibility.

341-342: Added explicit_wildcard to parsed_regex helper function

The helper function has been updated to include the new explicit_wildcard field, initialized to false for backward compatibility.

345-352: Added parsed_wildcard helper function

Good addition of a helper function to create a ParsedString with explicit_wildcard set to true, simplifying test code.

src/query/query.rs (4)

49-54: Good consolidation of string rules

The string rules have been nicely consolidated, simplifying the grammar by using the new helper rules and properly handling regex.

59-59: Added asterisk rule renaming

Good addition of the asterisk rule renaming to provide clearer error messages.

80-81: Simplified StringVariant enum

The StringVariant enum has been simplified by removing unused variants and adding the new AngleBracket variant, which aligns with the grammar simplification goal.

98-99: Updated as_rule method for StringVariant

The as_rule method has been properly updated to map the new AngleBracket variant to the appropriate test helper rule.

src/query/grammar.pest (8)

21-21: Good use of PUSH_LITERAL for delimiters

The change from previous string handling rules to using PUSH_LITERAL directly simplifies the grammar and follows the PR objectives to leverage this new Pest 2.8.0 feature.

24-24: Consistent application of PUSH_LITERAL pattern across selectors

The consistent approach of using PUSH_LITERAL to push delimiters onto the stack across all selector types shows good attention to detail and maintains a uniform pattern throughout the grammar.

Also applies to: 33-33, 37-37, 40-40, 43-43, 46-46

41-41: Explicit space handling in code blocks

Explicitly pushing a space character before the language reference ensures proper parsing of code block language specifications.

49-49: Table-specific delimiter handling

The table selector now correctly uses PUSH_LITERAL for both colon and pipe delimiters, maintaining consistency with other selectors while handling the unique structure of table matching.

52-54: Helper rules for unit testing

These new helper rules appear to be specifically designed for unit testing purposes. The naming convention clearly indicates they're not intended for general use within the grammar.

Would the unit tests that use these rules benefit from additional documentation or comments to explain their purpose?

55-64: Streamlined string rule with DROP operation

The consolidation of string parsing with the DROP operation aligns with the PR objectives. This implementation allows for empty strings, which was mentioned as one approach in the PR description.

As noted in the PR objectives, allowing empty strings in the string rule may require an additional check for explicit * in the table matcher. Has this check been implemented in the corresponding code that processes these parse results?

65-65: Explicit asterisk handling

Adding a dedicated rule for asterisk matching supports the special handling needed for wildcard operations, which is important for the checks mentioned in the PR objectives.

66-66: Improved unquoted string definition

The unquoted string rule now properly checks for both the pushed delimiter (via PEEK) and the end anchor, ensuring correct parsing boundaries.


yshavit added 2 commits March 30, 2025 03:54

use PUSH_LITERAL to simplify grammar

1109d29

use pipe in more places

68fb32c

yshavit added the ai-review label Mar 30, 2025

yshavit merged commit 31b0298 into main Mar 30, 2025
8 checks passed

yshavit deleted the pest_with_push_literal branch March 30, 2025 08:12

BrewTestBot mentioned this pull request Mar 31, 2025

mdq 0.5.0 Homebrew/homebrew-core#217365

Merged

coderabbitai bot mentioned this pull request May 11, 2025

add support for front matter #369

Merged
