perf(parse/tailwind): use compact trie for lexing base names instead of linear search #7977
Do you have any benchmarks to share? I don't think we have any benches in CodSpeed.

See the previous PR in this stack: #7976. Let me know if I need to do anything for those benches to show up in CodSpeed.
They won't show up until the benchmark PR is against main or next, so I think we shouldn't use stacked PRs for the benchmarks.
Walkthrough
Adds a trie-based dashed basename store in a new
Actionable comments posted: 0
🧹 Nitpick comments (3)
crates/biome_tailwind_parser/src/lexer/base_name_store.rs (3)
28-47: Trie construction looks sound.

The construction logic is straightforward: insert all basenames, then normalise children by sorting and deduplicating. The comment on line 37 acknowledges that binary search could be enabled by the sorting, which is worth considering if profiling shows `find_child` as a bottleneck.

If future profiling reveals that child lookup is a hotspot, consider replacing the linear search in `find_child()` with a binary search, since children are already sorted:

```rust
fn find_child(&self, node: usize, byte: u8) -> Option<usize> {
    self.nodes[node]
        .children
        .binary_search_by_key(&byte, |(b, _)| *b)
        .ok()
        .map(|idx| self.nodes[node].children[idx].1)
}
```
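As a small illustration of why the sort-and-deduplicate step matters for that suggestion, here is a sketch that works on the edge list of a single node. The `Edges` alias and the `normalise` helper are hypothetical names for this example and are not taken from the PR; only the `(byte, child index)` pair layout is inferred from the snippet above.

```rust
/// Edge list of a single trie node: (byte label, index of the child node).
/// Hypothetical alias for this sketch.
type Edges = Vec<(u8, usize)>;

/// Sort edges by byte label and drop repeated labels.
/// A stable sort keeps the first-inserted edge for a repeated label.
fn normalise(edges: &mut Edges) {
    edges.sort_by_key(|edge| edge.0);
    edges.dedup_by_key(|edge| edge.0);
}

/// Binary search is only valid once `normalise` has run on `edges`.
fn find_child(edges: &[(u8, usize)], byte: u8) -> Option<usize> {
    edges
        .binary_search_by_key(&byte, |edge| edge.0)
        .ok()
        .map(|idx| edges[idx].1)
}
```

With only a handful of children per node, a linear scan may well be just as fast, so this is something to measure rather than assume.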
110-153: Matching logic is sound but relies on caller guarantees.

The longest-prefix matching with boundary validation is well-implemented. However, the fallback logic (lines 144-152) will return 0 if the input is empty or starts with a delimiter/dash. Given that `consume_base()` only calls this when the first byte is alphanumeric, this should be safe, but the invariant is implicit.

Consider adding a debug assertion at the start of `base_end()` to document and verify this precondition:

```diff
 pub(crate) fn base_end(&self) -> usize {
+    debug_assert!(!self.text.is_empty() && self.text[0].is_ascii_alphanumeric(),
+        "base_end() expects non-empty input starting with an alphanumeric byte");
     let mut node_idx = 0usize;
```

This would catch misuse in debug builds without runtime cost in release builds.
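For readers who have not opened the diff, the behaviour described in this comment can be sketched roughly as follows. This is not the PR's code: `Node`, `build`, and the free-standing `base_end` are hypothetical stand-ins, and the delimiter set in `is_delimiter` is an assumption made for illustration. Only the rules themselves come from the review: longest terminal match followed by a valid boundary, `!` stops the scan without being a boundary, `-` is a boundary, and the fallback scans to the first dash or delimiter.

```rust
#[derive(Default)]
struct Node {
    children: Vec<(u8, usize)>, // (byte label, child index)
    terminal: bool,             // a complete base name ends at this node
}

/// Build a flat trie from the given names (hypothetical helper).
fn build(names: &[&str]) -> Vec<Node> {
    let mut nodes = vec![Node::default()];
    for name in names {
        let mut cur = 0;
        for &byte in name.as_bytes() {
            let found = nodes[cur].children.iter().find(|e| e.0 == byte).map(|e| e.1);
            cur = match found {
                Some(next) => next,
                None => {
                    nodes.push(Node::default());
                    let next = nodes.len() - 1;
                    nodes[cur].children.push((byte, next));
                    next
                }
            };
        }
        nodes[cur].terminal = true;
    }
    nodes
}

/// ASSUMED delimiter set; the real lexer may use a different one.
fn is_delimiter(b: u8) -> bool {
    b.is_ascii_whitespace() || matches!(b, b':' | b'/' | b'[' | b'!')
}

/// Bytes allowed directly after a base name: `-`, or any delimiter except `!`.
fn is_boundary_byte(b: u8) -> bool {
    b == b'-' || (b != b'!' && is_delimiter(b))
}

/// Length of the longest base name at the start of `text`, with a fallback.
fn base_end(nodes: &[Node], text: &[u8]) -> usize {
    let mut cur = 0;
    let mut best = None;
    for (i, &byte) in text.iter().enumerate() {
        if is_delimiter(byte) {
            break;
        }
        let Some(next) = nodes[cur].children.iter().find(|e| e.0 == byte).map(|e| e.1) else {
            break;
        };
        cur = next;
        let end = i + 1;
        // Accept a terminal node only if end of input or a boundary byte follows.
        if nodes[cur].terminal && text.get(end).map_or(true, |&b| is_boundary_byte(b)) {
            best = Some(end);
        }
    }
    // Fallback: no known base name matched, so cut at the first dash or delimiter.
    best.unwrap_or_else(|| {
        text.iter()
            .position(|&b| b == b'-' || is_delimiter(b))
            .unwrap_or(text.len())
    })
}
```

Under these assumptions, empty input or input starting with `-` yields 0, which is the case the suggested `debug_assert!` is meant to rule out at the call site.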
174-205: Tests cover the main scenarios effectively.

The tests verify longest-prefix matching, delimiter respect, and trie traversal boundaries. Coverage is solid for the expected use cases.

For additional confidence, consider adding edge-case tests:

```rust
#[test]
fn handles_single_char_basename() {
    let store = BaseNameStore::new(&["p", "m"]);
    assert_eq!(store.match_base_end(b"p-4"), 1);
    assert_eq!(store.match_base_end(b"m-auto"), 1);
}

#[test]
fn handles_no_match_basename() {
    let store = BaseNameStore::new(&["border"]);
    // Should fall back to "foo" when "foo" isn't in trie
    assert_eq!(store.match_base_end(b"foo-bar"), "foo".len());
}
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
crates/biome_tailwind_parser/src/lexer/base_name_store.rs (1 hunks)
crates/biome_tailwind_parser/src/lexer/mod.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (10)
📚 Learning: 2025-10-15T09:24:31.042Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-10-15T09:24:31.042Z
Learning: Applies to crates/biome_parser/crates/**/src/lexer/mod.rs : Create a lexer module at crates/<parser_crate>/src/lexer/mod.rs
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-10-15T09:24:31.042Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-10-15T09:24:31.042Z
Learning: Applies to crates/biome_parser/crates/biome_*_{syntax,factory}/** : Create per-language crates biome_<lang>_syntax and biome_<lang>_factory under crates/
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-10-15T09:22:15.851Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-10-15T09:22:15.851Z
Learning: Applies to crates/biome_formatter/src/**/*.rs : After generation, remove usages of `format_verbatim_node` and implement real formatting with biome_formatter utilities
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-10-15T09:24:31.042Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-10-15T09:24:31.042Z
Learning: Lexer must implement the biome_parser::Lexer trait
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-10-15T09:23:33.055Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_js_type_info/CONTRIBUTING.md:0-0
Timestamp: 2025-10-15T09:23:33.055Z
Learning: Applies to crates/biome_js_type_info/src/{type_info,local_inference,resolver,flattening}.rs : Avoid recursive type structures and cross-module Arcs; represent links between types using TypeReference and TypeData::Reference.
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-10-15T09:25:05.698Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_service/CONTRIBUTING.md:0-0
Timestamp: 2025-10-15T09:25:05.698Z
Learning: Applies to crates/biome_service/../biome_lsp/src/server.tests.rs : Keep end-to-end LSP tests in ../biome_lsp/src/server.tests.rs
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-10-15T09:22:46.002Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-10-15T09:22:46.002Z
Learning: Applies to crates/biome_js_formatter/**/Cargo.toml : Declare the dependency `biome_js_formatter = { version = "0.0.1", path = "../biome_js_formatter" }` for internal installation
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-10-15T09:22:46.002Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-10-15T09:22:46.002Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : Import and use the `FormatNode` trait for AST nodes
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-10-25T07:22:18.540Z
Learnt from: ematipico
Repo: biomejs/biome PR: 7852
File: crates/biome_css_parser/src/syntax/property/mod.rs:161-168
Timestamp: 2025-10-25T07:22:18.540Z
Learning: In the Biome CSS parser, lexer token emission should not be gated behind parser options like `is_tailwind_directives_enabled()`. The lexer must emit correct tokens regardless of parser options to enable accurate diagnostics and error messages when the syntax is used incorrectly.
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-10-15T09:24:31.042Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-10-15T09:24:31.042Z
Learning: If lookahead is needed, wrap the lexer with BufferedLexer and implement TokenSourceWithBufferedLexer and LexerWithCheckpoint
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Test (depot-ubuntu-24.04-arm-16)
- GitHub Check: Test (depot-windows-2022-16)
- GitHub Check: Documentation
- GitHub Check: Check Dependencies
- GitHub Check: Lint project (depot-windows-2022)
- GitHub Check: Test Node.js API
- GitHub Check: Bench (biome_tailwind_parser)
- GitHub Check: autofix
🔇 Additional comments (6)
crates/biome_tailwind_parser/src/lexer/mod.rs (2)
1-4: Clean integration of the new module.

The module declaration and import are straightforward and follow standard Rust conventions.
133-142: No issues found: code is well-tested and safe.

The test suite in `crates/biome_tailwind_parser/src/lexer/base_name_store.rs` confirms that `base_end()` always returns at least 1 byte when consuming an alphanumeric character. All edge cases, including single-character bases and boundary conditions, are covered by existing tests with positive byte lengths. The lexer tests in `tests.rs` further validate the parsing for real-world inputs. No infinite loop risk exists.

crates/biome_tailwind_parser/src/lexer/base_name_store.rs (4)
1-24: Well-structured trie foundation.

The use of `LazyLock` for global initialization and the compact trie representation are appropriate choices for this use case. The node structure with sorted children sets up efficient traversal.
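For context on the `LazyLock` pattern, here is a minimal sketch of one-time global initialisation. A sorted `Vec` stands in for the trie to keep the example short; `BASE_NAMES`, `STORE`, `BUILD_COUNT`, and `is_known_base` are hypothetical names, and the counter exists only to make the "built once" behaviour observable. `std::sync::LazyLock` is stable since Rust 1.80.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

/// Counts how often the initialiser runs (illustration only).
static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical name list; the real store is built from Tailwind's base names.
const BASE_NAMES: &[&str] = &["p", "border", "bg"];

/// Built on first access, then shared by every later caller.
static STORE: LazyLock<Vec<&'static str>> = LazyLock::new(|| {
    BUILD_COUNT.fetch_add(1, Ordering::Relaxed);
    let mut names = BASE_NAMES.to_vec();
    names.sort_unstable();
    names
});

/// Look up a name in the lazily built, sorted store.
fn is_known_base(name: &str) -> bool {
    STORE.binary_search_by(|probe| (**probe).cmp(name)).is_ok()
}
```

The construction cost is paid on the first lookup rather than at program start, and later lookups reuse the same value.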
49-80: Trie insertion and traversal logic are correct.

Standard trie operations implemented cleanly. The linear search in `find_child()` is reasonable given the expected small fan-out, and the comment documents this decision.
82-93: Clean API design.

The two-tier API (`matcher()` for flexibility and `match_base_end()` for convenience) is well-designed for different use cases.
156-172: Delimiter and boundary logic correctly captures Tailwind syntax.

The distinction between `is_delimiter()` and `is_boundary_byte()` is subtle but correct:

- `!` stops scanning (delimiter) but isn't a valid post-basename boundary
- `-` isn't a delimiter (can appear in dashed basenames) but is a valid boundary

This properly handles Tailwind's `!important` modifier and dashed utility classes.
CodSpeed Performance Report
Merging #7977 will create unknown performance changes
@ematipico no, those are the |
Probably this PR has been rebased before the CI in |
ematipico
left a comment
Left some nit comments. Let's merge it!
Summary
This optimizes how the Tailwind parser searches for base names while lexing. Instead of a linear search over all base names, it walks a compact trie, which prunes the candidate base names byte by byte.
The trie data structure was mostly generated with AI.
Test Plan
CI remains green; biome_tailwind_parser should see a 2x perf improvement.
Docs