
Conversation


@jfmcdowell jfmcdowell commented Dec 20, 2025

Note

AI Assistance Disclosure: This PR was developed with assistance from Claude Code.

Summary

This PR completes Phase 1 of the Markdown support implementation as outlined in #3718. It delivers a functional Markdown parser that handles CommonMark-compliant syntax.

Implemented Features

Leaf Blocks:

  • ATX headers (# through ######) with proper trailing hash parsing
  • Fenced code blocks (``` and ~~~) with language info strings
  • Indented code blocks (4+ space indentation)
  • Paragraphs with inline content
  • Thematic breaks (---, ***, ___)

Container Blocks:

  • Block quotes (>)
  • Bullet lists (-, +, and * markers)

Inline Elements:

  • Code spans (`code`)
  • Emphasis/bold (**bold**, __bold__)
  • Italic (*italic*, _italic_)
  • Links ([text](url))
  • Images (![alt](src))
  • Escape sequences (\*, \[, etc.) per CommonMark spec

Parser Infrastructure:

  • Constants for magic numbers (INDENT_CODE_BLOCK_SPACES, MAX_HEADER_HASHES)
  • Lookahead-based trailing hash detection for ATX headers (see the sketch after this list)
  • TRIPLE_TILDE token for tilde-fenced code blocks
  • Proper error recovery with position tracking
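
A minimal standalone sketch of the trailing-hash rule, assuming a string-level helper (the name strip_trailing_hashes and the string-based approach are illustrative only; the parser performs this with token lookahead): a closing run of # only counts as a trailing sequence when it is preceded by a space, otherwise it stays part of the header content.

```rust
/// Simplified model of CommonMark trailing-hash handling for ATX headers:
/// returns the header text with a valid closing hash sequence removed.
fn strip_trailing_hashes(content: &str) -> &str {
    let trimmed = content.trim_end();
    // Length of the final run of '#' characters.
    let hash_count = trimmed.chars().rev().take_while(|&c| c == '#').count();
    if hash_count == 0 {
        return trimmed; // no trailing hashes at all
    }
    // '#' is ASCII, so this is a valid byte offset.
    let before = &trimmed[..trimmed.len() - hash_count];
    // The closing sequence only counts when preceded by a space
    // (or when the header is nothing but hashes).
    if before.is_empty() || before.ends_with(' ') {
        before.trim_end()
    } else {
        trimmed // the hashes are content, e.g. "# C#"
    }
}

fn main() {
    assert_eq!(strip_trailing_hashes("Heading ###"), "Heading");
    assert_eq!(strip_trailing_hashes("C#"), "C#");
    assert_eq!(strip_trailing_hashes("Mixed # content ##"), "Mixed # content");
}
```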

Deferred to Future Work

  • Ordered lists (1., 2., etc.)
  • Setext headers (underlined with === or ---)
  • HTML blocks
  • Link reference definitions
  • Multi-line list items with proper nesting

Test Plan

  • 29 lexer tests passing
  • 8 spec tests passing:
    • paragraph.md
    • header.md (including trailing hashes)
    • fenced_code_block.md (including tilde fences)
    • indent_code_block.md
    • block_quote.md
    • bullet_list.md
    • inline_elements.md
    • thematic_break_block.md
  • All snapshots updated
  • `just f` and `just l` pass

Note

`just ready` fails due to a pre-existing rustdoc error in `biome_html_parser` (commit a3a713d) where `set_double_text_expression` links to the private `TextExpressionKind::Double`. This is unrelated to this PR and will be addressed separately.

Next Steps

Phase 2 (Formatter) and Phase 3 (Service Integration) will follow in separate PRs.

Commits

Enhanced the Markdown lexer to recognize additional tokens needed for
block parsing:
- Hash (#) for ATX headers
- Backtick (`) and triple backtick for code fences
- Tilde (~) for fenced code blocks
- Greater than (>) for block quotes
- Star (*), underscore (_), minus (-) for emphasis and lists
- Brackets and parentheses for links
- Bang (!) for images

Implemented paragraph parsing with blank line detection:
- Added parse_paragraph, parse_inline_item_list, parse_textual
- Added trivia position tracking to detect paragraph boundaries
- Modified grammar to make hard_line optional in MdParagraph

Added comprehensive lexer tests for all new token types.
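
A minimal standalone model of the paragraph-boundary rule described above (the helper name has_blank_line and the raw-string approach are illustrative; the parser's has_blank_line_since inspects lexed trivia instead): a paragraph ends where two or more newlines occur with only whitespace between them.

```rust
/// Simplified model of paragraph-boundary detection: the whitespace between
/// two tokens contains a blank line if it has two or more newlines separated
/// only by spaces, tabs, or carriage returns.
fn has_blank_line(trivia: &str) -> bool {
    let mut newlines = 0;
    for c in trivia.chars() {
        match c {
            '\n' => {
                newlines += 1;
                if newlines >= 2 {
                    return true;
                }
            }
            ' ' | '\t' | '\r' => {} // whitespace between newlines is ignored
            _ => newlines = 0,      // anything else resets the count
        }
    }
    false
}

fn main() {
    assert!(has_blank_line("\n   \n")); // blank line: new paragraph
    assert!(!has_blank_line("\n"));     // single newline: same paragraph
}
```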

Implemented ATX header parsing for Markdown with support for:
- Header levels 1-6 (# through ######)
- Content parsing within headers
- Empty trailing hash list (trailing hashes included in content for now)

Added has_newline_since method to detect single newlines for header
boundaries, as headers end at a single newline unlike paragraphs which
end at blank lines (2+ newlines).
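
A standalone sketch of the level check, again as a simplified string model rather than the token-based implementation (atx_header_level is a made-up name; the 1-6 limit corresponds to the MAX_HEADER_HASHES constant, and the required space after the hashes follows CommonMark):

```rust
/// Simplified ATX header detection: returns the header level (1-6) when the
/// line opens a header, or None otherwise.
fn atx_header_level(line: &str) -> Option<usize> {
    const MAX_HEADER_HASHES: usize = 6;
    let hashes = line.chars().take_while(|&c| c == '#').count();
    let rest = &line[hashes..]; // '#' is ASCII, so this index is valid
    let followed_by_space_or_eol = rest.is_empty() || rest.starts_with(' ');
    if (1..=MAX_HEADER_HASHES).contains(&hashes) && followed_by_space_or_eol {
        Some(hashes)
    } else {
        None
    }
}

fn main() {
    assert_eq!(atx_header_level("### Title"), Some(3));
    assert_eq!(atx_header_level("####### Too many hashes"), None);
    assert_eq!(atx_header_level("#NoSpace"), None);
}
```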

Add support for parsing fenced code blocks (``` delimiters).

Changes:
- Add parse_fenced_code_block function with fence detection
- Support optional language info string after opening fence
- Handle empty language lines using preceding line break detection
- Simplify MdCodeNameList grammar to use non-separated list
- Use MdInlineItemList for code block content
- Add fenced_code_block.md test file with snapshot
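
A simplified standalone model of the fence handling (open_fence is an illustrative name; the real parser works on TRIPLE_BACKTICK/TRIPLE_TILDE tokens and treats the rest of the line as the info string). Per CommonMark the closing fence must use the same character as the opening one.

```rust
/// Simplified model of opening-fence detection: a run of three or more
/// backticks or tildes opens a fenced code block, and any remaining text on
/// the line is the language info string. Tilde examples are used below only
/// to keep this snippet's own code fence unambiguous; backticks behave the
/// same way.
fn open_fence(line: &str) -> Option<(char, Option<&str>)> {
    let fence_char = line.chars().next()?;
    if fence_char != '`' && fence_char != '~' {
        return None;
    }
    let run = line.chars().take_while(|&c| c == fence_char).count();
    if run < 3 {
        return None; // a fence needs at least three fence characters
    }
    let info = line[run..].trim();
    Some((fence_char, (!info.is_empty()).then_some(info)))
}

fn main() {
    assert_eq!(open_fence("~~~toml"), Some(('~', Some("toml"))));
    assert_eq!(open_fence("~~"), None);
    assert_eq!(open_fence("plain text"), None);
}
```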

Implement parse_indent_code_block for CommonMark-style indented code
blocks that require 4+ spaces of indentation.

Key changes:
- Fix at_indent_code_block() to use >= 4 instead of > 4 (CommonMark spec)
- Implement parse_indent_code_block() with proper grammar structure
- Use bump_remap(MD_TEXTUAL_LITERAL) to handle all token types as textual
  content within code blocks (fixes BOGUS node issue)
- Simplify MdIndentCodeBlock grammar since indentation is tracked in trivia
- Fix parse_textual() to use bump_remap for syntax factory compatibility

The implementation correctly:
- Detects 4+ spaces of leading whitespace in trivia
- Parses all content as MdTextual nodes within MdInlineItemList
- Ends code blocks at blank lines or non-indented lines
- Preserves all source text in the CST for lossless parsing
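
A simplified standalone model of the indentation rule (is_indented_code_line is an illustrative name; the actual check reads leading-whitespace trivia, and tabs are ignored in this sketch). The >= comparison is exactly the off-by-one fixed above: four spaces is already enough.

```rust
/// CommonMark requires at least four spaces of indentation for an indented
/// code block, hence the >= comparison rather than >.
const INDENT_CODE_BLOCK_SPACES: usize = 4;

/// Simplified check: a non-blank line indented by four or more spaces
/// belongs to an indented code block (tabs are ignored in this model).
fn is_indented_code_line(line: &str) -> bool {
    let indent = line.chars().take_while(|&c| c == ' ').count();
    indent >= INDENT_CODE_BLOCK_SPACES && !line.trim().is_empty()
}

fn main() {
    assert!(is_indented_code_line("    let x = 1;")); // exactly 4 spaces
    assert!(!is_indented_code_line("   not code"));   // only 3 spaces
    assert!(!is_indented_code_line("        "));      // blank line ends the block
}
```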

Add parsing support for block quotes and bullet lists:

- Block quote parsing: detects `>` at line start, parses content as
  paragraph with quote marker included as textual content
- Bullet list parsing: detects `-` or `*` at line start, parses
  consecutive bullets into MdBulletListItem containing MdBulletList
- Add `at_start_of_input()` helper to detect document start for block
  detection
- Update grammar to remove required space token from MdBullet (whitespace
  is handled as trivia)
- Add test cases for block quotes and bullet lists
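
A standalone sketch of the line-start classification shared by both constructs (names and the character-level approach are illustrative; the parser checks the current token together with has_preceding_line_break() or at_start_of_input()). Indentation and the bullet-versus-thematic-break ambiguity (e.g. ---) are ignored here.

```rust
/// Simplified classification of what a line opens. Leading indentation and
/// the bullet vs. thematic-break ambiguity (e.g. "---") are ignored.
#[derive(Debug, PartialEq)]
enum LineStart {
    Quote,  // > ...
    Bullet, // -, + or * followed by content
    Other,  // paragraph or some other block
}

fn classify_line_start(line: &str) -> LineStart {
    match line.chars().next() {
        Some('>') => LineStart::Quote,
        Some('-') | Some('+') | Some('*') => LineStart::Bullet,
        _ => LineStart::Other,
    }
}

fn main() {
    assert_eq!(classify_line_start("> quoted text"), LineStart::Quote);
    assert_eq!(classify_line_start("- item"), LineStart::Bullet);
    assert_eq!(classify_line_start("plain paragraph"), LineStart::Other);
}
```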

Add parsing for inline Markdown elements:
- Inline code spans (`code`)
- Emphasis/bold (**text** and __text__)
- Italic (*text* and _text_)
- Links ([text](url))
- Images (![alt](url))

Also fix bullet list parsing to stop at line breaks, ensuring
each list item contains only single-line content.

Updated the MdInlineImage grammar to match standard Markdown
image syntax (![alt](url)) instead of the linked image format.
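
A rough standalone sketch of the dispatch idea behind the inline parsing (names are illustrative; the real parse_any_inline branches on lexed token kinds and builds CST nodes): the leading character of an inline run selects which element to try, and anything unrecognised falls back to plain text.

```rust
/// Simplified dispatch for inline parsing: the character at the current
/// position selects which inline element to attempt.
#[derive(Debug, PartialEq)]
enum InlineKind {
    CodeSpan, // `code`
    Emphasis, // **bold**, __bold__, *italic*, _italic_
    Link,     // [text](url)
    Image,    // ![alt](src)
    Textual,  // plain text fallback
}

fn classify_inline(input: &str, pos: usize) -> InlineKind {
    let mut chars = input[pos..].chars();
    match (chars.next(), chars.next()) {
        (Some('`'), _) => InlineKind::CodeSpan,
        (Some('*'), _) | (Some('_'), _) => InlineKind::Emphasis,
        (Some('!'), Some('[')) => InlineKind::Image,
        (Some('['), _) => InlineKind::Link,
        _ => InlineKind::Textual,
    }
}

fn main() {
    assert_eq!(classify_inline("![alt](a.png)", 0), InlineKind::Image);
    assert_eq!(classify_inline("[text](url)", 0), InlineKind::Link);
    assert_eq!(classify_inline("plain", 0), InlineKind::Textual);
}
```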

- Fix `parse_bullet` to return `ParsedSyntax` instead of `()`
- Add error recovery for list parsing to prevent infinite loops
- Make closing tokens optional in grammar for inline elements
  (matches Markdown's permissive nature for unclosed elements)
- Fix grammar comment to say "Markdown's" instead of "Rust's"
- Fix clippy doc-link-with-quotes warnings

These changes address the critical issues identified in code review
that would have blocked a PR submission.
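
The infinite-loop fix comes down to one invariant: every iteration of a list-parsing loop must consume at least one token, wrapping anything unparseable instead of retrying it (the real code adds recovery so such tokens end up in bogus nodes). A minimal standalone model of that invariant, with made-up token and label names:

```rust
/// Minimal model of loop-safe list parsing: each iteration must advance the
/// position, so unrecognised input is consumed as "bogus" instead of being
/// retried forever.
fn parse_items(tokens: &[&str]) -> Vec<String> {
    let mut items = Vec::new();
    let mut pos = 0;
    while pos < tokens.len() {
        let before = pos;
        if tokens[pos] == "-" {
            // A well-formed bullet: consume the marker and its content.
            items.push(format!("bullet({})", tokens.get(pos + 1).copied().unwrap_or("")));
            pos += 2;
        } else {
            // Error recovery: wrap the unexpected token in a bogus item and
            // move on, so the loop always makes progress.
            items.push(format!("bogus({})", tokens[pos]));
            pos += 1;
        }
        assert!(pos > before, "list parsing must always advance");
    }
    items
}

fn main() {
    let parsed = parse_items(&["-", "one", "???", "-", "two"]);
    assert_eq!(parsed, ["bullet(one)", "bogus(???)", "bullet(two)"]);
}
```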

- Add backslash escape handling in lexer for CommonMark punctuation
- Add image support to nested inline parsing
- Document trailing hash limitation in ATX headers
- Add 3 lexer tests for escape sequences
- Update changeset to reflect escape sequence support
- Fix import order in quote.rs
- Update quick_test example in spec_test.rs
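
A standalone sketch of the escape rule itself (unescape is an illustrative name; the lexer does not rewrite text, it only classifies the escaped character as textual content so the CST stays lossless). Per CommonMark, a backslash escapes the next character only when that character is ASCII punctuation:

```rust
/// Simplified model of CommonMark backslash escapes: a backslash followed by
/// ASCII punctuation yields that punctuation character literally; any other
/// backslash is kept as-is.
fn unescape(input: &str) -> String {
    let mut out = String::new();
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\\' {
            match chars.peek() {
                Some(&next) if next.is_ascii_punctuation() => {
                    out.push(next); // escaped punctuation, e.g. \* or \[
                    chars.next();
                }
                _ => out.push('\\'), // backslash before anything else is literal
            }
        } else {
            out.push(c);
        }
    }
    out
}

fn main() {
    assert_eq!(unescape(r"\*not emphasis\*"), "*not emphasis*");
    assert_eq!(unescape(r"a\b"), r"a\b"); // \b is not punctuation, kept literally
}
```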

Improvements to the Markdown parser:

- Add constants for magic numbers (INDENT_CODE_BLOCK_SPACES, MAX_HEADER_HASHES)
- Implement trailing hash parsing for ATX headers using lookahead detection
- Add TRIPLE_TILDE token and support for ~~~ fenced code blocks per CommonMark
- Update tests and snapshots

changeset-bot bot commented Dec 20, 2025

⚠️ No Changeset found

Latest commit: 7d3d2b9

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@github-actions bot added the A-Parser (Area: parser) and A-Tooling (Area: internal tools) labels on Dec 20, 2025
@jfmcdowell changed the title from "feat(parser/markdown): complete Phase 1 parser implementation" to "feat(parser/markdown): parser implementation" on Dec 20, 2025

coderabbitai bot commented Dec 20, 2025

Walkthrough

The PR converts the crate into a Markdown parser: lexer rewritten for Markdown tokens (headers, fences, emphasis, block quotes, lists, escapes, textual grouping); new private lexer helpers for backticks, tildes, hashes, escapes and thematic-break vs emphasis logic; parser gains trivia inspection APIs and new modules (fenced_code_block, header, inline, list, quote) implementing fenced/indented code blocks, ATX headers, block quotes, bullet lists, paragraphs and inline elements; grammar/kind updates and a large suite of lexer and fixture tests added.

Suggested reviewers

  • dyc3
  • ematipico

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: a Markdown parser implementation, which aligns with the extensive parser infrastructure and syntax modules added across the codebase. |
| Description check | ✅ Passed | The description is well-related to the changeset, detailing implemented features, infrastructure, test results, and deferred work, all of which correspond to the actual code changes. |

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1102636 and 7d3d2b9.

⛔ Files ignored due to path filters (7)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/header.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (4)
  • crates/biome_markdown_parser/src/lexer/mod.rs (6 hunks)
  • crates/biome_markdown_parser/src/lexer/tests.rs (2 hunks)
  • crates/biome_markdown_parser/src/syntax.rs (3 hunks)
  • crates/biome_markdown_syntax/src/lib.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
crates/**/*.rs

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Files:

  • crates/biome_markdown_syntax/src/lib.rs
  • crates/biome_markdown_parser/src/syntax.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
  • crates/biome_markdown_parser/src/lexer/tests.rs
🧠 Learnings (32)
📓 Common learnings
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement a token source struct that wraps the lexer and implements `TokenSourceWithBufferedLexer` and `LexerWithCheckpoint` for lookahead and re-lexing capabilities
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/lexer/mod.rs : Implement a `Lexer` trait from `biome_parser` crate for the lexer struct that consumes characters from source code and emits tokens
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/*.ungram : Add a legend comment to `.ungram` files explaining the grammar syntax including comments, non-terminals, tokens, sequences, alternations, repetitions, and optional elements
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `p.eat(token)` for optional tokens, `p.expect(token)` for required tokens, `parse_rule(p).ok(p)` for optional nodes, and `parse_rule(p).or_add_diagnostic(p, error)` for required nodes
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : A parser struct must implement the `Parser` trait and save the token source, parser context, and optional parser options
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to crates/**/*.rs : Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement error recovery in list parsing using `or_recover()` to wrap unparseable tokens in a `BOGUS_*` node and consume tokens until a recovery token is found
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `ConditionalParsedSyntax` for syntax that is only valid in specific contexts (e.g., strict mode, file types, language versions) and call `or_invalid_to_bogus()` to convert to a bogus node if not supported

Applied to files:

  • crates/biome_markdown_syntax/src/lib.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Prefix line with `#` in documentation code examples sparingly; prefer concise complete snippets

Applied to files:

  • crates/biome_markdown_syntax/src/lib.rs
  • crates/biome_markdown_parser/src/syntax.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Specify `fix_kind: FixKind::Safe` in `declare_lint_rule!` for safe code actions

Applied to files:

  • crates/biome_markdown_syntax/src/lib.rs
📚 Learning: 2025-12-12T10:11:05.564Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to crates/**/*.rs : Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Applied to files:

  • crates/biome_markdown_syntax/src/lib.rs
  • crates/biome_markdown_parser/src/syntax.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Lines prefixed with `#` in rule documentation code examples will be hidden from output

Applied to files:

  • crates/biome_markdown_syntax/src/lib.rs
  • crates/biome_markdown_parser/src/syntax.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Use `let else` trick when `run` function returns `Vec` to reduce code branching

Applied to files:

  • crates/biome_markdown_syntax/src/lib.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Avoid string allocations by comparing against `&str` or using `TokenText`

Applied to files:

  • crates/biome_markdown_syntax/src/lib.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:05:42.356Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_js_type_info/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:42.356Z
Learning: Applies to crates/biome_js_type_info/**/*.rs : Distinguish between `TypeData::Unknown` and `TypeData::UnknownKeyword` to measure inference effectiveness versus explicit user-provided unknown types

Applied to files:

  • crates/biome_markdown_syntax/src/lib.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement a token source struct that wraps the lexer and implements `TokenSourceWithBufferedLexer` and `LexerWithCheckpoint` for lookahead and re-lexing capabilities

Applied to files:

  • crates/biome_markdown_parser/src/syntax.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation code blocks must have a language defined for syntax highlighting

Applied to files:

  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation code blocks should be ordered as language, expect_diagnostic, options/full_options/use_options, ignore, file

Applied to files:

  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Parse rules must take a mutable reference to the parser as their only parameter and return a `ParsedSyntax`

Applied to files:

  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `ParseSeparatedList` and `ParseNodeList` for parsing lists with error recovery to avoid infinite loops

Applied to files:

  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Parse rule functions must be prefixed with `parse_` and use the name defined in the grammar file, e.g., `parse_for_statement` or `parse_expression`

Applied to files:

  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `p.eat(token)` for optional tokens, `p.expect(token)` for required tokens, `parse_rule(p).ok(p)` for optional nodes, and `parse_rule(p).or_add_diagnostic(p, error)` for required nodes

Applied to files:

  • crates/biome_markdown_parser/src/syntax.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : A parser struct must implement the `Parser` trait and save the token source, parser context, and optional parser options

Applied to files:

  • crates/biome_markdown_parser/src/syntax.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/lexer/mod.rs : Implement a `Lexer` trait from `biome_parser` crate for the lexer struct that consumes characters from source code and emits tokens

Applied to files:

  • crates/biome_markdown_parser/src/syntax.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Parse rules must return `ParsedSyntax::Absent` if the rule can't predict by the next token(s) if they form the expected node, and must not progress the parser in this case

Applied to files:

  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:05:27.810Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:27.810Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : When formatting AST nodes, use mandatory tokens from the AST instead of hardcoding token strings (e.g., use `node.l_paren_token().format()` instead of `token("(")`)

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement error recovery in list parsing using `or_recover()` to wrap unparseable tokens in a `BOGUS_*` node and consume tokens until a recovery token is found

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/*.ungram : Add a legend comment to `.ungram` files explaining the grammar syntax including comments, non-terminals, tokens, sequences, alternations, repetitions, and optional elements

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation must include `## Options` section if the rule has options

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:05:27.810Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:27.810Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : For tokens that are not mandatory, use helper functions instead of hardcoding

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : The first paragraph of rule documentation must be a single line describing what the rule does

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Applies to crates/biome_formatter/**/biome_*_formatter/src/comments.rs : Define `<Language>CommentStyle` as a public type alias for `Comments<<Language>Language>` in a `comments.rs` file

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-09T12:47:46.298Z
Learnt from: ematipico
Repo: biomejs/biome PR: 8031
File: crates/biome_html_parser/src/syntax/svelte.rs:140-147
Timestamp: 2025-11-09T12:47:46.298Z
Learning: In the Biome HTML parser, `expect` and `expect_with_context` consume the current token and then lex the next token. The context parameter in `expect_with_context` controls how the next token (after the consumed one) is lexed, not the current token being consumed. For example, in Svelte parsing, after `bump_with_context(T!["{:"], HtmlLexContext::Svelte)`, the next token is already lexed in the Svelte context, so `expect(T![else])` is sufficient unless the token after `else` also needs to be lexed in a specific context.

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Applies to crates/biome_formatter/**/biome_*_formatter/tests/language.rs : Implement `TestFormatLanguage` trait in `tests/language.rs` for the formatter's test language

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Applies to crates/biome_formatter/**/biome_*_formatter/tests/spec_tests.rs : Use the `tests_macros::gen_tests!` macro in `spec_tests.rs` to generate test functions for each specification file matching the pattern `tests/specs/<language>/**/*.<ext>`

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:06:12.048Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_service/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:12.048Z
Learning: Applies to crates/biome_service/src/workspace/watcher.tests.rs : Implement watcher tests for workspace methods in watcher.tests.rs and end-to-end tests in LSP tests

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Create test infrastructure with `tests/specs` folder structure and `spec_test.rs`, `spec_tests.rs`, and `language.rs` files in test directories

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Applies to crates/biome_formatter/**/biome_*_formatter/Cargo.toml : Include development dependencies in `Cargo.toml` for formatter tests: `biome_formatter_test`, `biome_<language>_factory`, `biome_<language>_parser`, `biome_parser`, `biome_service`, `countme`, `iai`, `quickcheck`, `quickcheck_macros`, and `tests_macros`

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
🧬 Code graph analysis (1)
crates/biome_markdown_parser/src/lexer/mod.rs (2)
crates/biome_unicode_table/src/lib.rs (1)
  • lookup_byte (106-109)
crates/biome_yaml_parser/src/lexer/mod.rs (1)
  • start_position (872-874)
🔇 Additional comments (2)
crates/biome_markdown_syntax/src/lib.rs (2)

60-64: LGTM—cleaner refactoring.

The direct match simplifies the conversion logic whilst maintaining identical behaviour. The mapping aligns perfectly with is_trivia() at lines 47-49.


40-41: Confirmed: is_root() is unused in Phase 1 and safe to leave as todo!().

No calls to is_root() exist anywhere in the Markdown codebase, so the panic won't be triggered during this phase of development.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
crates/biome_markdown_parser/src/syntax/header.rs (1)

151-155: Redundant wrapper function.

has_newline_since just delegates to p.has_newline_since. Consider inlining calls directly.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ab88099 and 5b90681.

⛔ Files ignored due to path filters (13)
  • crates/biome_markdown_factory/src/generated/node_factory.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_factory/src/generated/syntax_factory.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/header.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_syntax/src/generated/kind.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_syntax/src/generated/macros.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_syntax/src/generated/nodes.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_syntax/src/generated/nodes_mut.rs is excluded by !**/generated/**, !**/generated/** and included by **
📒 Files selected for processing (21)
  • .changeset/curly-donkeys-show.md (1 hunks)
  • crates/biome_markdown_parser/src/lexer/mod.rs (4 hunks)
  • crates/biome_markdown_parser/src/lexer/tests.rs (1 hunks)
  • crates/biome_markdown_parser/src/parser.rs (1 hunks)
  • crates/biome_markdown_parser/src/syntax.rs (3 hunks)
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs (1 hunks)
  • crates/biome_markdown_parser/src/syntax/header.rs (1 hunks)
  • crates/biome_markdown_parser/src/syntax/inline.rs (1 hunks)
  • crates/biome_markdown_parser/src/syntax/list.rs (1 hunks)
  • crates/biome_markdown_parser/src/syntax/quote.rs (1 hunks)
  • crates/biome_markdown_parser/src/token_source.rs (1 hunks)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.md (1 hunks)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md (1 hunks)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md (1 hunks)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/header.md (1 hunks)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.md (1 hunks)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md (1 hunks)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md (1 hunks)
  • crates/biome_markdown_parser/tests/spec_test.rs (1 hunks)
  • xtask/codegen/markdown.ungram (4 hunks)
  • xtask/codegen/src/markdown_kinds_src.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
crates/**/*.rs

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Files:

  • crates/biome_markdown_parser/src/parser.rs
  • crates/biome_markdown_parser/src/syntax/list.rs
  • crates/biome_markdown_parser/src/syntax/header.rs
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/src/token_source.rs
  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
  • crates/biome_markdown_parser/tests/spec_test.rs
  • crates/biome_markdown_parser/src/syntax.rs
.changeset/*.md

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Write changesets that are concise (1-3 sentences), user-focused, use past tense for actions taken and present tense for Biome behavior, include code examples for rules, and end sentences with periods

Files:

  • .changeset/curly-donkeys-show.md
🧠 Learnings (37)
📓 Common learnings
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/lexer/mod.rs : Implement a `Lexer` trait from `biome_parser` crate for the lexer struct that consumes characters from source code and emits tokens
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : The first paragraph of rule documentation must be a single line describing what the rule does

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Prefix line with `#` in documentation code examples sparingly; prefer concise complete snippets

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/tests/spec_test.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-12-04T13:29:49.287Z
Learnt from: dyc3
Repo: biomejs/biome PR: 8291
File: crates/biome_html_formatter/tests/specs/prettier/vue/html-vue/elastic-header.html:10-10
Timestamp: 2025-12-04T13:29:49.287Z
Learning: Files under `crates/biome_html_formatter/tests/specs/prettier` are test fixtures synced from Prettier and should not receive detailed code quality reviews (e.g., HTTP vs HTTPS, formatting suggestions, etc.). These files are test data meant to validate formatter behavior and should be preserved as-is.

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/tests/spec_test.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/*.ungram : Add a legend comment to `.ungram` files explaining the grammar syntax including comments, non-terminals, tokens, sequences, alternations, repetitions, and optional elements

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md
  • xtask/codegen/markdown.ungram
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement a token source struct that wraps the lexer and implements `TokenSourceWithBufferedLexer` and `LexerWithCheckpoint` for lookahead and re-lexing capabilities

Applied to files:

  • crates/biome_markdown_parser/src/parser.rs
  • crates/biome_markdown_parser/src/syntax/list.rs
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/src/token_source.rs
  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : A parser struct must implement the `Parser` trait and save the token source, parser context, and optional parser options

Applied to files:

  • crates/biome_markdown_parser/src/parser.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/tests/specs/**/*.jsonc : Use `.jsonc` files to contain arrays of code snippet strings for snapshot tests

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/tests/spec_test.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/language_kind.rs : Add a new language prefix to the `LANGUAGE_PREFIXES` constant in `language_kind.rs` file

Applied to files:

  • xtask/codegen/src/markdown_kinds_src.rs
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/language_kind.rs : Add a new variant to `LanguageKind` enum in `language_kind.rs` file and implement all methods for the new language variant

Applied to files:

  • xtask/codegen/src/markdown_kinds_src.rs
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/*_kinds_src.rs : Create a file called `src/{language}_kinds_src.rs` that returns a static `KindSrc` for new language parsers

Applied to files:

  • xtask/codegen/src/markdown_kinds_src.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `ParseSeparatedList` and `ParseNodeList` for parsing lists with error recovery to avoid infinite loops

Applied to files:

  • crates/biome_markdown_parser/src/syntax/list.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-12-12T10:11:05.564Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to crates/**/*.rs : Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Applied to files:

  • crates/biome_markdown_parser/src/syntax/list.rs
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement error recovery in list parsing using `or_recover()` to wrap unparseable tokens in a `BOGUS_*` node and consume tokens until a recovery token is found

Applied to files:

  • crates/biome_markdown_parser/src/syntax/list.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/lint/nursery/**/*.rs : Add `issue_number` field to `declare_lint_rule!` macro for work-in-progress rules

Applied to files:

  • crates/biome_markdown_parser/src/syntax/list.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Applies to crates/biome_formatter/**/biome_*_formatter/Cargo.toml : Include development dependencies in `Cargo.toml` for formatter tests: `biome_formatter_test`, `biome_<language>_factory`, `biome_<language>_parser`, `biome_parser`, `biome_service`, `countme`, `iai`, `quickcheck`, `quickcheck_macros`, and `tests_macros`

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md
  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Applies to crates/biome_formatter/**/biome_*_formatter/tests/language.rs : Implement `TestFormatLanguage` trait in `tests/language.rs` for the formatter's test language

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/tests/spec_test.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Use `ignore` code block property to exclude documentation code examples from automatic validation

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/tests/spec_test.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Lines prefixed with `#` in rule documentation code examples will be hidden from output

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/tests/spec_test.rs
📚 Learning: 2025-12-12T10:11:05.564Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to .changeset/*.md : Write changesets that are concise (1-3 sentences), user-focused, use past tense for actions taken and present tense for Biome behavior, include code examples for rules, and end sentences with periods

Applied to files:

  • .changeset/curly-donkeys-show.md
📚 Learning: 2025-12-12T10:11:05.564Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Create a changeset for user-facing bug fixes and features using `just new-changeset`, selecting appropriate packages and change type (`major`, `minor`, or `patch`)

Applied to files:

  • .changeset/curly-donkeys-show.md
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation code blocks must have a language defined for syntax highlighting

Applied to files:

  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/tests/spec_test.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation code blocks should be ordered as language, expect_diagnostic, options/full_options/use_options, ignore, file

Applied to files:

  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Parse rules must take a mutable reference to the parser as their only parameter and return a `ParsedSyntax`

Applied to files:

  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `ConditionalParsedSyntax` for syntax that is only valid in specific contexts (e.g., strict mode, file types, language versions) and call `or_invalid_to_bogus()` to convert to a bogus node if not supported

Applied to files:

  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Parse rule functions must be prefixed with `parse_` and use the name defined in the grammar file, e.g., `parse_for_statement` or `parse_expression`

Applied to files:

  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Parse rules must return `ParsedSyntax::Absent` if the rule can't predict by the next token(s) if they form the expected node, and must not progress the parser in this case

Applied to files:

  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `p.eat(token)` for optional tokens, `p.expect(token)` for required tokens, `parse_rule(p).ok(p)` for optional nodes, and `parse_rule(p).or_add_diagnostic(p, error)` for required nodes

Applied to files:

  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Use `full_options` code block property for complete biome.json configuration snippets in documentation

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Invalid code examples in rule documentation must be marked with `expect_diagnostic` code block property

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/tests/spec_test.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/lexer/mod.rs : Implement a `Lexer` trait from `biome_parser` crate for the lexer struct that consumes characters from source code and emits tokens

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Applies to crates/biome_formatter/**/biome_*_formatter/tests/spec_tests.rs : Use the `tests_macros::gen_tests!` macro in `spec_tests.rs` to generate test functions for each specification file matching the pattern `tests/specs/<language>/**/*.<ext>`

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/tests/spec_test.rs
📚 Learning: 2025-11-24T18:06:12.048Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_service/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:12.048Z
Learning: Applies to crates/biome_service/src/workspace/watcher.tests.rs : Implement watcher tests for workspace methods in watcher.tests.rs and end-to-end tests in LSP tests

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/tests/spec_test.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Create test infrastructure with `tests/specs` folder structure and `spec_test.rs`, `spec_tests.rs`, and `language.rs` files in test directories

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:05:27.810Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:27.810Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : When formatting AST nodes, use mandatory tokens from the AST instead of hardcoding token strings (e.g., use `node.l_paren_token().format()` instead of `token("(")`)

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-09T12:47:46.298Z
Learnt from: ematipico
Repo: biomejs/biome PR: 8031
File: crates/biome_html_parser/src/syntax/svelte.rs:140-147
Timestamp: 2025-11-09T12:47:46.298Z
Learning: In the Biome HTML parser, `expect` and `expect_with_context` consume the current token and then lex the next token. The context parameter in `expect_with_context` controls how the next token (after the consumed one) is lexed, not the current token being consumed. For example, in Svelte parsing, after `bump_with_context(T!["{:"], HtmlLexContext::Svelte)`, the next token is already lexed in the Svelte context, so `expect(T![else])` is sufficient unless the token after `else` also needs to be lexed in a specific context.

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Valid code examples in rule documentation should not trigger any diagnostics

Applied to files:

  • crates/biome_markdown_parser/tests/spec_test.rs
🧬 Code graph analysis (7)
crates/biome_markdown_parser/src/parser.rs (2)
crates/biome_markdown_parser/src/token_source.rs (2)
  • has_blank_line_since (93-122)
  • has_newline_since (126-133)
crates/biome_markdown_parser/src/syntax/header.rs (1)
  • has_newline_since (153-155)
crates/biome_markdown_parser/src/syntax/list.rs (2)
crates/biome_markdown_parser/src/syntax.rs (1)
  • parse_any_inline (160-162)
crates/biome_markdown_parser/src/syntax/inline.rs (1)
  • parse_any_inline (244-258)
crates/biome_markdown_parser/src/syntax/inline.rs (2)
crates/biome_markdown_parser/src/lexer/mod.rs (1)
  • source (74-76)
crates/biome_markdown_parser/src/syntax.rs (2)
  • parse_any_inline (160-162)
  • parse_textual (171-181)
crates/biome_markdown_parser/src/syntax/quote.rs (1)
crates/biome_markdown_parser/src/syntax.rs (1)
  • parse_any_inline (160-162)
crates/biome_markdown_parser/src/token_source.rs (2)
crates/biome_markdown_parser/src/parser.rs (2)
  • has_blank_line_since (40-42)
  • has_newline_since (45-47)
crates/biome_markdown_parser/src/syntax/header.rs (1)
  • has_newline_since (153-155)
crates/biome_markdown_parser/src/lexer/mod.rs (2)
crates/biome_unicode_table/src/lib.rs (1)
  • lookup_byte (106-109)
crates/biome_yaml_parser/src/lexer/mod.rs (1)
  • start_position (872-874)
crates/biome_markdown_parser/src/syntax.rs (5)
crates/biome_markdown_parser/src/syntax/fenced_code_block.rs (2)
  • at_fenced_code_block (9-11)
  • parse_fenced_code_block (21-55)
crates/biome_markdown_parser/src/syntax/header.rs (2)
  • at_header (15-17)
  • parse_header (27-53)
crates/biome_markdown_parser/src/syntax/list.rs (2)
  • at_bullet_list_item (17-30)
  • parse_bullet_list_item (42-72)
crates/biome_markdown_parser/src/syntax/quote.rs (2)
  • at_quote (14-16)
  • parse_quote (24-58)
crates/biome_markdown_parser/src/syntax/inline.rs (1)
  • parse_any_inline (244-258)
🪛 LanguageTool
crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md

[style] ~4-~4: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...This is italic and this is bold. This is also italic and also bold. He...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~5-~5: You’ve already used the word ‘also’ once in your sentence, so using it again may be redundant.
Context: ... bold. This is also italic and also bold. Here is a [link](https://example.co...

(REDUNDANT_FILLER)

🔇 Additional comments (53)
xtask/codegen/src/markdown_kinds_src.rs (1)

19-20: LGTM!

The TRIPLE_TILDE token follows the established pattern alongside TRIPLE_BACKTICK, neatly supporting CommonMark tilde-fenced code blocks.

crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.md (1)

1-4: LGTM!

Solid test fixture covering multi-line block quotes and blank-line separation between distinct quote blocks.

crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md (1)

1-7: LGTM!

Comprehensive coverage of inline elements: code spans, both emphasis syntaxes (*/_), links, and images. The static analysis grumbles about repetitive phrasing, but that's rather the point of a test fixture.

crates/biome_markdown_parser/tests/md_test_suite/ok/header.md (1)

1-17: LGTM!

Excellent coverage of ATX headers (H1–H6) and trailing-hash edge cases. The "Mixed # content ## with ### trailing ####" line is a nice stress test for the lookahead-based detection.

Consider adding edge-case tests for invalid headers in a separate fixture (e.g., ####### Seven hashes which shouldn't parse as a header, or #NoSpace without the required space). These would exercise error recovery paths.

crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md (1)

1-6: LGTM!

Covers both bullet markers (- and *) with clean separation. Given multi-line list items are deferred to future work, this is appropriate Phase 1 coverage.

crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.md (1)

1-8: LGTM!

Test fixture appropriately exercises indented code block parsing with proper 4-space indentation and mixed content.

crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md (1)

1-3: LGTM!

Clean test fixture for paragraph parsing with blank line separation.

crates/biome_markdown_parser/tests/spec_test.rs (1)

138-141: LGTM!

Test input updated with realistic indented code block content, aligning with the new test fixtures.

crates/biome_markdown_parser/src/lexer/tests.rs (1)

165-422: LGTM!

Comprehensive test coverage for the new Markdown lexer capabilities. The tests are well-structured, consistently use the assert_lex! macro, and appropriately exercise ATX headers, fenced code blocks, block quotes, emphasis, links, images, and escape sequences.

crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md (1)

1-15: LGTM!

Test fixture properly exercises fenced code block parsing with both backtick and tilde syntax, including language specifiers.

.changeset/curly-donkeys-show.md (1)

7-16: LGTM!

Clear, detailed description of the implemented features with appropriate caveats about scope limitations.

crates/biome_markdown_parser/src/token_source.rs (3)

86-89: LGTM!

Simple trivia length getter appropriately exposes the trivia list size for position tracking.


91-122: LGTM!

The blank line detection logic is correct: it properly counts consecutive newlines (ignoring whitespace between them), resets on trailing trivia or non-newline/whitespace trivia, and returns true when 2+ consecutive newlines are found.
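
For readers unfamiliar with the trivia-based approach, here is a minimal, self-contained sketch of the counting logic described above; the `TriviaKind`/`TriviaPiece` types are hypothetical stand-ins for the real token-source internals, not the actual fields in token_source.rs.

```rust
// Hypothetical stand-ins for the real trivia representation.
#[derive(Clone, Copy, PartialEq)]
enum TriviaKind { Newline, Whitespace, Other }

struct TriviaPiece { kind: TriviaKind, trailing: bool }

fn has_blank_line_since(trivia: &[TriviaPiece], start: usize) -> bool {
    let mut newlines = 0;
    for piece in &trivia[start..] {
        if piece.trailing {
            // Trailing trivia belongs to the previous token's line; reset.
            newlines = 0;
            continue;
        }
        match piece.kind {
            TriviaKind::Newline => {
                newlines += 1;
                if newlines >= 2 {
                    return true; // blank line: two newlines with only whitespace between
                }
            }
            TriviaKind::Whitespace => {} // spaces between newlines don't break a blank line
            TriviaKind::Other => newlines = 0,
        }
    }
    false
}
```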


124-133: LGTM!

Simple and correct detection of any non-trailing newline since the given position.

crates/biome_markdown_parser/src/parser.rs (1)

34-53: LGTM!

All four methods are clean delegations to the token source with clear documentation. They appropriately expose trivia tracking capabilities for block-level construct detection and paragraph boundary handling.

crates/biome_markdown_parser/src/syntax/quote.rs (2)

1-11: Module structure looks good.

Clean imports and module documentation. The module is correctly scoped as pub(crate).


13-16: Detection logic is correct.

Properly checks for > at line start using has_preceding_line_break() or at_start_of_input().

crates/biome_markdown_parser/src/syntax/fenced_code_block.rs (4)

8-11: Detection function is correct.

Properly identifies both backtick and tilde fence openings.


21-55: Well-structured fenced code block parsing.

Good handling of fence type matching per CommonMark spec. The optional closing fence is correctly handled.


62-82: Language name parsing is correct.

Sensibly checks for preceding line break to detect empty language strings and parses until newline.


86-105: Content parsing handles fence matching correctly.

Appropriately stops at matching fence type or EOF. Each token is wrapped as MD_TEXTUAL which aligns with the grammar.

crates/biome_markdown_parser/src/syntax/header.rs (4)

10-11: Good use of named constant.

MAX_HEADER_HASHES = 6 makes the CommonMark constraint clear.


57-70: Hash list parsing is straightforward.

Correctly wraps each # in MD_HASH and the sequence in MD_HASH_LIST.


126-149: Trailing hash detection uses checkpoint correctly.

The lookahead pattern with checkpoint/rewind is the right approach for this detection.
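
To make the pattern concrete, a rough illustration follows; this is not the actual helper in header.rs, and the token-kind names (HASH, EOF) plus the end-of-line check are written from memory of the crate's conventions.

```rust
// Speculatively consume the trailing `#` run, check whether the line ends there,
// then rewind so the caller can re-parse the same tokens for real.
fn at_trailing_hashes(p: &mut MarkdownParser) -> bool {
    let checkpoint = p.checkpoint();
    while p.at(HASH) {
        p.bump(HASH);
    }
    // Trailing hashes only count if nothing but the end of the line follows them.
    let ends_line = p.at(EOF) || p.has_preceding_line_break();
    // Undo the speculative bumps regardless of the outcome.
    p.rewind(checkpoint);
    ends_line
}
```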


27-42: No issue here — the parser state is properly cleaned up.

parse_header is called only from within try_parse, which saves a checkpoint before the call. When rewind() is triggered on parse failure, it truncates the entire events list back to the checkpoint position, discarding the MD_HASH_LIST completion and all subsequent work. The function correctly returns Absent and relies on the caller's checkpoint/rewind mechanism, which is the standard pattern in biome_parser.

Likely an incorrect or invalid review comment.

crates/biome_markdown_parser/src/syntax.rs (6)

1-5: Good module organisation.

Clean separation of concerns with dedicated submodules for each block/inline type.


21-22: Named constant improves readability.

INDENT_CODE_BLOCK_SPACES = 4 documents the CommonMark requirement clearly.


27-28: EOF bump is required by the grammar.

Good addition with clear comment explaining the rationale.


41-77: Block dispatch logic is well-structured.

Good fallback chain with try_parse for constructs that may fail validation (thematic breaks, headers). The fallback to paragraph parsing is sensible.


90-109: Indented code block parsing is correct.

Properly checks for 4+ spaces and consumes content while indentation holds.


133-155: Inline item list parsing handles blank lines correctly.

Unlike in quote.rs, the trivia position here is captured before the loop (line 137), so blank-line detection works as intended.

crates/biome_markdown_parser/src/syntax/list.rs (4)

17-30: Bullet detection logic is sound.

Correctly requires line-start position for bullet markers.


52-68: Error recovery prevents infinite loops.

Good defensive programming with position tracking.


101-122: Bullet content parsing stops at newline correctly.

The has_preceding_line_break() check before parsing ensures single-line items.


124-137: Ordered list stubs are appropriately marked.

Using #[expect(dead_code)] is cleaner than #[allow(dead_code)] as it will warn when the code is eventually used.

crates/biome_markdown_parser/src/lexer/mod.rs (5)

13-14: Import refactoring is cleaner.

Importing Dispatch module and lookup_byte separately improves clarity.


182-196: Dispatch table extended correctly.

Good coverage of Markdown-specific tokens: hash, backtick, tilde, brackets, escape sequences.


199-250: Escape handling follows CommonMark spec.

All escapable ASCII punctuation characters are correctly enumerated.


336-399: Thematic break vs emphasis disambiguation is well-handled.

The save/restore position pattern for speculative parsing is correct. Good handling of the edge case where IDT can match letters or underscore.
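
As a standalone sketch of the decision being made (not the lexer's actual code, which works on its own position/restore machinery), the line scan boils down to:

```rust
// Starting at `start`, decide whether the rest of the line is a thematic break made
// of `marker` characters (at least three, with optional spaces/tabs in between).
// Returning false corresponds to the "restore" path: the caller keeps its saved
// position and lexes the byte as an emphasis or list marker instead.
fn is_thematic_break_line(source: &[u8], start: usize, marker: u8) -> bool {
    let mut count = 0;
    for &byte in &source[start..] {
        match byte {
            b if b == marker => count += 1,
            b' ' | b'\t' => {}        // interior whitespace is allowed
            b'\n' | b'\r' => break,   // end of line ends the scan
            _ => return false,        // any other byte disqualifies the line
        }
    }
    count >= 3
}
```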


401-436: Hash, backtick, and tilde consumers are straightforward.

Clean implementation differentiating single vs triple tokens where applicable.

crates/biome_markdown_parser/src/syntax/inline.rs (8)

1-10: Clean module setup with clear documentation.

Good overview of supported inline elements.


15-40: Inline code parsing handles unclosed spans gracefully.

Correctly parses content until closing backtick, newline, or EOF. The optional closing backtick handling is appropriate for error tolerance.


45-78: Emphasis parsing is correct.

Good use of opener variable to ensure matching close marker.
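
A stripped-down sketch of that opener-matching idea is below; node and token names are approximate, and the real parse_inline_emphasis in inline.rs is more involved.

```rust
fn parse_inline_emphasis(p: &mut MarkdownParser) -> ParsedSyntax {
    if !p.at(DOUBLE_STAR) && !p.at(DOUBLE_UNDERSCORE) {
        return Absent;
    }
    let m = p.start();
    // Remember which marker opened the span so `**bold__` cannot close it.
    let opener = p.cur();
    p.bump(opener);
    while !p.at(opener) && !p.at(EOF) && !p.has_preceding_line_break() {
        // The real code calls a restricted nested-inline helper here to avoid
        // recursing back into emphasis parsing.
        if parse_any_inline(p).is_absent() {
            p.bump_any(); // always make progress
        }
    }
    // The closing marker is required: `expect` emits a diagnostic when it is missing.
    p.expect(opener);
    Present(m.complete(p, MD_INLINE_EMPHASIS))
}
```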


80-116: Italic parsing mirrors emphasis structure.

Structurally similar to parse_inline_emphasis. If this pattern repeats further, consider extracting a helper, but for two cases it's fine.


118-165: Link parsing handles partial syntax gracefully.

Good error tolerance - incomplete [text] or missing (url) portions are handled without panicking.


167-223: Image parsing correctly requires ![ prefix.

The two-token lookahead (BANG then L_BRACK) is the right approach.
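
In other words, something along these lines (kind names taken from this PR; the helper itself is illustrative):

```rust
// An inline image only starts when `!` is immediately followed by `[`;
// a lone `!` stays ordinary text.
fn at_inline_image(p: &mut MarkdownParser) -> bool {
    p.at(BANG) && p.nth_at(1, L_BRACK)
}
```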


225-241: Nested inline avoids emphasis recursion - good design.

The comment at lines 228-230 explains the intentional limitation clearly.


243-258: Dispatcher routes correctly with textual fallback.

Complete coverage of inline constructs with sensible default to parse_textual.

xtask/codegen/markdown.ungram (5)

109-113: Grammar allows mismatched fence characters.

The grammar permits a block opened with backticks to close with tildes (or vice versa). This is fine since the grammar defines structure rather than semantics — the parser is responsible for enforcing that the closing fence matches the opening fence per CommonMark.

If this is intentional for maximum error tolerance, brilliant. Just worth confirming the parser validates matching fence types.
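
For reference, the parser-side validation can look roughly like this sketch (illustrative, not the code in fenced_code_block.rs):

```rust
fn parse_code_block_content(p: &mut MarkdownParser, open_fence: MarkdownSyntaxKind) {
    // `open_fence` is the kind that opened the block (TRIPLE_BACKTICK or TRIPLE_TILDE).
    while !p.at(EOF) && !p.at(open_fence) {
        // A fence of the *other* kind is ordinary content inside this block,
        // so it is consumed like any other token.
        p.bump_any();
    }
    // The caller then consumes `open_fence` as the closing fence, or reports it
    // as missing when EOF is reached first.
}
```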


163-167: Optional closing tokens for error tolerance — sensible approach.

Making r_fence optional here (and similarly for emphasis, code spans, links, and images) allows the parser to produce a valid AST even when the user hasn't finished typing. Good call for editor integration.


188-195: MdInlineLink: Consider requiring '[' to pair with ']' in the required set.

Currently '[' is required whilst ']', '(', and ')' are all optional. This means a valid link node could be just [ followed by content — which is maximally tolerant but potentially quite lenient.

If you find false positives during testing (stray [ characters being parsed as links), you might tighten this. For now, this is a design choice rather than a defect.


200-207: MdInlineImage structure looks good.

The '!' and '[' are required (correctly identifying intent to create an image), with everything else optional for error tolerance. This mirrors the link structure and is consistent.


7-19: Legend comment present — nicely done.

As per the contribution guidelines, the legend explaining grammar syntax is included. Appreciate the clarity.

"@biomejs/biome": patch
---

Add Markdown parser support for inline elements, block quotes, lists, headers, and code blocks.

Contributor

⚠️ Potential issue | 🟡 Minor

Use past tense for the summary.

Per coding guidelines, changesets should use past tense for actions taken.

🔎 Proposed fix
-Add Markdown parser support for inline elements, block quotes, lists, headers, and code blocks.
+Added Markdown parser support for inline elements, block quotes, lists, headers, and code blocks.

As per coding guidelines, changesets should use past tense for actions taken.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Add Markdown parser support for inline elements, block quotes, lists, headers, and code blocks.
Added Markdown parser support for inline elements, block quotes, lists, headers, and code blocks.
🤖 Prompt for AI Agents
In .changeset/curly-donkeys-show.md around line 5 the changeset summary is
written in present tense ("Add Markdown parser support..."); update that summary
to past tense ("Added Markdown parser support for inline elements, block quotes,
lists, headers, and code blocks.") and save the file so the changeset follows
the repository's past-tense guideline.

Author

Added support for + bullet list marker per CommonMark spec.

Move start_trivia_pos capture before the while loop so that blank line
detection correctly checks from the start of the quote content rather
than only since the last inline parse.

// *italic*
// ^^^^^^^^
// Note: r_fence is optional to handle unclosed italic gracefully

Contributor

This is unusual. We usually choose to emit bogus nodes and recover in these cases.

Author

Fixed: Removed ? from grammar and now use p.expect() for closing delimiters, which emits diagnostics when missing.

Comment on lines 42 to 72
pub(crate) fn parse_bullet_list_item(p: &mut MarkdownParser) -> ParsedSyntax {
    if !at_bullet_list_item(p) {
        return Absent;
    }

    let item_m = p.start();
    let list_m = p.start();

    // Parse bullet items until we're no longer at a valid bullet marker.
    // Track position for error recovery to prevent infinite loops.
    loop {
        if !at_bullet_list_item(p) {
            break;
        }

        let prev_position = p.source().position();

        if parse_bullet(p).is_absent() {
            break;
        }

        // Error recovery: if we didn't advance, skip the problematic token so the
        // loop cannot spin forever on the same position.
        if p.source().position() == prev_position {
            p.bump_any();
        }
    }

    list_m.complete(p, MD_BULLET_LIST);
    Present(item_m.complete(p, MD_BULLET_LIST_ITEM))
}

Contributor

When you parse lists, it should be a ParseNodeList or ParseSeparatedList impl so you can properly implement recovery.

Author

Fixed: Implemented ParseNodeList for BulletList with MD_BOGUS_BULLET recovery via ParseRecoveryTokenSet.
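
Roughly speaking, the implementation takes the following shape. This is a sketch only: the trait methods and the ParseRecoveryTokenSet/or_recover calls are paraphrased from biome_parser and may not match the current signatures exactly, and expected_bullet_item is a hypothetical diagnostic builder.

```rust
struct BulletList;

impl ParseNodeList for BulletList {
    type Kind = MarkdownSyntaxKind;
    type Parser<'source> = MarkdownParser<'source>;

    const LIST_KIND: Self::Kind = MD_BULLET_LIST;

    fn parse_element(&mut self, p: &mut Self::Parser<'_>) -> ParsedSyntax {
        parse_bullet_list_item(p)
    }

    fn is_at_list_end(&self, p: &mut Self::Parser<'_>) -> bool {
        !at_bullet_list_item(p)
    }

    fn recover(
        &mut self,
        p: &mut Self::Parser<'_>,
        parsed_element: ParsedSyntax,
    ) -> RecoveryResult {
        // Wrap anything unparseable in MD_BOGUS_BULLET and skip ahead to the next
        // plausible bullet marker or line break.
        parsed_element.or_recover(
            p,
            &ParseRecoveryTokenSet::new(MD_BOGUS_BULLET, token_set![MINUS, STAR, PLUS])
                .enable_recovery_on_line_break(),
            expected_bullet_item, // hypothetical diagnostic builder
        )
    }
}
```

The trait's provided list-parsing loop then drives the iteration, so the hand-rolled position check above is no longer needed.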

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
crates/biome_markdown_parser/src/syntax/quote.rs (2)

36-40: Consider explaining the rationale for remapping the quote marker.

The code remaps > to MD_TEXTUAL_LITERAL, making it part of the paragraph content rather than a separate quote marker node. Whilst this works, the design choice is non-obvious. Expanding the comment to explain why this approach was chosen (e.g., to preserve the marker for formatting or due to AST design constraints) would help future maintainers.


1-5: Document Phase 1 scope limitations.

The module documentation mentions "nested block elements", but the current implementation only parses inline content within a single paragraph and doesn't support nested quotes or multiple blocks. Adding a note about Phase 1 limitations (deferred: nesting, multi-line quote consolidation, multiple blocks) would set correct expectations.

Also applies to: 18-23

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b90681 and ae590a3.

⛔ Files ignored due to path filters (1)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.md.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (1)
  • crates/biome_markdown_parser/src/syntax/quote.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
crates/**/*.rs

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Files:

  • crates/biome_markdown_parser/src/syntax/quote.rs
🧠 Learnings (6)
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Parse rules must take a mutable reference to the parser as their only parameter and return a `ParsedSyntax`

Applied to files:

  • crates/biome_markdown_parser/src/syntax/quote.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Prefix line with `#` in documentation code examples sparingly; prefer concise complete snippets

Applied to files:

  • crates/biome_markdown_parser/src/syntax/quote.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation code blocks must have a language defined for syntax highlighting

Applied to files:

  • crates/biome_markdown_parser/src/syntax/quote.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation code blocks should be ordered as language, expect_diagnostic, options/full_options/use_options, ignore, file

Applied to files:

  • crates/biome_markdown_parser/src/syntax/quote.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement a token source struct that wraps the lexer and implements `TokenSourceWithBufferedLexer` and `LexerWithCheckpoint` for lookahead and re-lexing capabilities

Applied to files:

  • crates/biome_markdown_parser/src/syntax/quote.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : A parser struct must implement the `Parser` trait and save the token source, parser context, and optional parser options

Applied to files:

  • crates/biome_markdown_parser/src/syntax/quote.rs
🧬 Code graph analysis (1)
crates/biome_markdown_parser/src/syntax/quote.rs (1)
crates/biome_markdown_parser/src/syntax.rs (1)
  • parse_any_inline (160-162)
🔇 Additional comments (1)
crates/biome_markdown_parser/src/syntax/quote.rs (1)

24-57: Multi-line quote handling requires clarification.

The implementation treats consecutive lines with > markers as separate block quotes rather than as a single multi-line quote. This is confirmed by the test snapshot for block_quote.md, which shows the second line's > being parsed as inline textual content within the first quote block, rather than as a quote continuation marker.

In the test input:

> This is a quote
> It continues here

The expected AST includes the second > as MD_TEXTUAL_LITERAL within the inline item list of the first quote, not as a block-level marker. This differs from CommonMark handling, where each > at line start represents a block quote marker.

Is this an intentional Phase 1 limitation for basic quote support, or should continuation lines with > be recognised as proper quote structure?

Markdown support is not yet publicly available, so no changeset is needed.
Address PR review comments from dyc3:
- Implement ParseNodeList trait for MdBulletList with proper recovery
- Add MD_BOGUS_BULLET node type for error recovery
- Make closing delimiters required in grammar (remove ?)
- Use p.expect() instead of if p.at() for closing tokens
- Emits diagnostics for unclosed inline elements
@jfmcdowell jfmcdowell requested a review from dyc3 December 20, 2025 18:54
Per CommonMark spec section 5.2, bullet list markers can be -, *, or +.
This adds the missing + marker support to the lexer and parser.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
crates/biome_markdown_parser/src/lexer/mod.rs (1)

200-251: Consider using rustdoc comments for documentation.

The escape handling logic correctly implements CommonMark's ASCII punctuation escape rules. However, the function uses standard // comments rather than /// rustdoc comments. Per coding guidelines, inline rustdoc documentation should be updated when adding new features.

🔎 Suggested documentation style
-    /// Consume a backslash escape sequence.
-    /// Per CommonMark spec, a backslash before ASCII punctuation makes it literal.
-    /// Escapable: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
+    /// Consume a backslash escape sequence.
+    ///
+    /// Per CommonMark spec, a backslash before ASCII punctuation makes it literal.
+    /// Escapable: `!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~`
+    ///
+    /// If the backslash is not followed by an escapable character, only the backslash
+    /// is consumed and returned as a textual literal.
     fn consume_escape(&mut self) -> MarkdownSyntaxKind {

Based on learnings, inline rustdoc documentation should be maintained for features in Rust crates.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e179019 and e75b98f.

⛔ Files ignored due to path filters (3)
  • crates/biome_markdown_factory/src/generated/syntax_factory.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_syntax/src/generated/kind.rs is excluded by !**/generated/**, !**/generated/** and included by **
📒 Files selected for processing (6)
  • crates/biome_markdown_parser/src/lexer/mod.rs (4 hunks)
  • crates/biome_markdown_parser/src/lexer/tests.rs (2 hunks)
  • crates/biome_markdown_parser/src/syntax/list.rs (1 hunks)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md (1 hunks)
  • xtask/codegen/markdown.ungram (5 hunks)
  • xtask/codegen/src/markdown_kinds_src.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/biome_markdown_parser/src/syntax/list.rs
  • crates/biome_markdown_parser/src/lexer/tests.rs
🧰 Additional context used
📓 Path-based instructions (1)
crates/**/*.rs

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
🧠 Learnings (13)
📚 Learning: 2025-12-04T13:29:49.287Z
Learnt from: dyc3
Repo: biomejs/biome PR: 8291
File: crates/biome_html_formatter/tests/specs/prettier/vue/html-vue/elastic-header.html:10-10
Timestamp: 2025-12-04T13:29:49.287Z
Learning: Files under `crates/biome_html_formatter/tests/specs/prettier` are test fixtures synced from Prettier and should not receive detailed code quality reviews (e.g., HTTP vs HTTPS, formatting suggestions, etc.). These files are test data meant to validate formatter behavior and should be preserved as-is.

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/language_kind.rs : Add a new variant to `LanguageKind` enum in `language_kind.rs` file and implement all methods for the new language variant

Applied to files:

  • xtask/codegen/src/markdown_kinds_src.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/language_kind.rs : Add a new language prefix to the `LANGUAGE_PREFIXES` constant in `language_kind.rs` file

Applied to files:

  • xtask/codegen/src/markdown_kinds_src.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/lexer/mod.rs : Implement a `Lexer` trait from `biome_parser` crate for the lexer struct that consumes characters from source code and emits tokens

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement a token source struct that wraps the lexer and implements `TokenSourceWithBufferedLexer` and `LexerWithCheckpoint` for lookahead and re-lexing capabilities

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `p.eat(token)` for optional tokens, `p.expect(token)` for required tokens, `parse_rule(p).ok(p)` for optional nodes, and `parse_rule(p).or_add_diagnostic(p, error)` for required nodes

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
  • xtask/codegen/markdown.ungram
📚 Learning: 2025-11-24T18:05:27.810Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:27.810Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : When formatting AST nodes, use mandatory tokens from the AST instead of hardcoding token strings (e.g., use `node.l_paren_token().format()` instead of `token("(")`)

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-12-12T10:11:05.564Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to crates/**/*.rs : Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
  • xtask/codegen/markdown.ungram
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement error recovery in list parsing using `or_recover()` to wrap unparseable tokens in a `BOGUS_*` node and consume tokens until a recovery token is found

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-09T12:47:46.298Z
Learnt from: ematipico
Repo: biomejs/biome PR: 8031
File: crates/biome_html_parser/src/syntax/svelte.rs:140-147
Timestamp: 2025-11-09T12:47:46.298Z
Learning: In the Biome HTML parser, `expect` and `expect_with_context` consume the current token and then lex the next token. The context parameter in `expect_with_context` controls how the next token (after the consumed one) is lexed, not the current token being consumed. For example, in Svelte parsing, after `bump_with_context(T!["{:"], HtmlLexContext::Svelte)`, the next token is already lexed in the Svelte context, so `expect(T![else])` is sufficient unless the token after `else` also needs to be lexed in a specific context.

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
  • xtask/codegen/markdown.ungram
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/*.ungram : Add a legend comment to `.ungram` files explaining the grammar syntax including comments, non-terminals, tokens, sequences, alternations, repetitions, and optional elements

Applied to files:

  • xtask/codegen/markdown.ungram
📚 Learning: 2025-12-12T10:11:05.564Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to .changeset/*.md : Write changesets that are concise (1-3 sentences), user-focused, use past tense for actions taken and present tense for Biome behavior, include code examples for rules, and end sentences with periods

Applied to files:

  • xtask/codegen/markdown.ungram
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Preserve valid tree structure during error recovery by marking invalid parts as 'bogus' nodes rather than invalidating parent nodes

Applied to files:

  • xtask/codegen/markdown.ungram
🧬 Code graph analysis (1)
crates/biome_markdown_parser/src/lexer/mod.rs (2)
crates/biome_unicode_table/src/lib.rs (1)
  • lookup_byte (106-109)
crates/biome_yaml_parser/src/lexer/mod.rs (1)
  • start_position (872-874)
🔇 Additional comments (13)
xtask/codegen/src/markdown_kinds_src.rs (1)

16-16: LGTM! Token and node additions align with CommonMark support.

The PLUS token enables + bullet lists, TRIPLE_TILDE supports tilde-fenced code blocks (~~~), and MD_BOGUS_BULLET provides error recovery for malformed bullets. All additions are consistent with the PR objectives.

Also applies to: 21-21, 43-43

xtask/codegen/markdown.ungram (6)

105-113: Approve tilde fence support; verify content type is addressed.

The addition of tilde-fenced code blocks (~~~) aligns with CommonMark. However, the content type issue flagged separately must be resolved.


115-115: LGTM! Simplified grammar.

Removing the label simplifies the grammar without losing functionality.


134-134: LGTM! Plus bullet support added.

The + bullet marker aligns with CommonMark and is consistent with the PLUS token addition.


144-144: Reasonable to make hard line optional.

Not all paragraphs end with explicit hard line breaks, so making this optional is sensible.


193-202: LGTM! Simplified inline image structure.

Removing intermediate nodes (MdInlineImageAlt, MdInlineImageSource, MdInlineImageLink) and using MdInlineItemList directly simplifies the grammar whilst maintaining the necessary structure for image alt text and source.


97-99: [Rewritten review comment]
[Classification tag]

crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md (1)

1-9: Test fixture looks good.

The bullet list test data appropriately covers the three common Markdown bullet markers (-, *, +) with proper formatting and blank line separation. This should provide good coverage for bullet list parsing validation.

crates/biome_markdown_parser/src/lexer/mod.rs (5)

1-14: Import structure looks solid.

The updated imports correctly bring in the dispatch mechanism and lookup_byte function needed for the expanded token routing. Documentation update appropriately reflects the Markdown-specific implementation.


180-198: Token dispatch routing is well-structured.

The dispatch logic properly routes each token type to its appropriate handler, with simple single-character tokens using the consume_byte helper and complex constructs (thematic breaks, escapes) delegated to dedicated parsing functions.


402-437: Token consumption logic is sound.

The three token consumers (hash, backtick, tilde) correctly handle their respective characters. The backtick and tilde handlers appropriately distinguish between single and triple variants using lookahead, which aligns with CommonMark's fenced code block syntax.


503-507: Helper function is fit for purpose.

The consume_byte helper cleanly handles single-byte token consumption and is actively used throughout the dispatch logic.


382-390: No action required—the implementation correctly reflects the CommonMark specification.

The code tokenises ** and __ as DOUBLE_STAR and DOUBLE_UNDERSCORE respectively for strong emphasis, whilst -- is correctly tokenised as MINUS. This is not an inconsistency; it reflects the markdown spec: dashes are not emphasis markers, only asterisks and underscores are. The comment at line 387 is accurate.

@ematipico
Member

Thank you @jfmcdowell

Please review the implementation of the parser. While I understand you used AI assistance, human intervention is always required.

Here's an example: check the emitted CST in the snapshots. For each letter inside a paragraph, we create a node. This implementation will create an excessively large CST for large documents. That's not acceptable for us or our users. Please review the parsing logic so that the emitted CST groups text into a single node.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
crates/biome_markdown_parser/src/lexer/tests.rs (1)

166-184: Consider consolidating hash tests or testing the full header use case.

The hash_token and multiple_hashes tests verify that hashes are emitted individually, which is correct for ATX header parsing. However, it would be valuable to add a test covering a complete header scenario (e.g., "# Title") to ensure the combination works as expected in practice.

crates/biome_markdown_parser/src/lexer/mod.rs (1)

409-437: Code fences support 3+ characters per CommonMark spec—consider extending beyond the current hard-coded triple.

CommonMark only requires code fences to be at least three consecutive backticks or tildes, and a four-backtick fence wrapping content that itself contains a three-backtick fence is perfectly valid. The current implementation handles exactly one or three characters; supporting longer runs (4, 5, etc.) would allow proper code block nesting without resorting to tildes.

For Phase 1 this is acceptable, but worth addressing in a follow-up.
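
A follow-up could count the whole run instead of special-casing three, roughly as in the standalone sketch below; mapping longer runs onto the existing TRIPLE_* kinds (with the token text preserving the actual length) is an assumption, not a prescription.

```rust
// Count the run of fence characters (`` ` `` or `~`) starting at `pos`. A run of three
// or more opens (or closes) a fenced code block; keeping the exact length around lets
// the parser accept a closing fence that is at least as long as the opener.
fn fence_run_len(source: &[u8], pos: usize, fence: u8) -> usize {
    source[pos..].iter().take_while(|&&b| b == fence).count()
}
```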

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e75b98f and a7c2a67.

⛔ Files ignored due to path filters (7)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/header.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (2)
  • crates/biome_markdown_parser/src/lexer/mod.rs (5 hunks)
  • crates/biome_markdown_parser/src/lexer/tests.rs (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
crates/**/*.rs

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
🧠 Learnings (18)
📓 Common learnings
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement a token source struct that wraps the lexer and implements `TokenSourceWithBufferedLexer` and `LexerWithCheckpoint` for lookahead and re-lexing capabilities
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/*.ungram : Add a legend comment to `.ungram` files explaining the grammar syntax including comments, non-terminals, tokens, sequences, alternations, repetitions, and optional elements
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to crates/**/*.rs : Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/lexer/mod.rs : Implement a `Lexer` trait from `biome_parser` crate for the lexer struct that consumes characters from source code and emits tokens
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement a token source struct that wraps the lexer and implements `TokenSourceWithBufferedLexer` and `LexerWithCheckpoint` for lookahead and re-lexing capabilities

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Applies to crates/biome_formatter/**/biome_*_formatter/tests/language.rs : Implement `TestFormatLanguage` trait in `tests/language.rs` for the formatter's test language

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/lexer/mod.rs : Implement a `Lexer` trait from `biome_parser` crate for the lexer struct that consumes characters from source code and emits tokens

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:12.048Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_service/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:12.048Z
Learning: Applies to crates/biome_service/src/workspace/watcher.tests.rs : Implement watcher tests for workspace methods in watcher.tests.rs and end-to-end tests in LSP tests

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-12-12T10:11:05.564Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to crates/**/*.rs : Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Applies to crates/biome_formatter/**/biome_*_formatter/tests/spec_tests.rs : Use the `tests_macros::gen_tests!` macro in `spec_tests.rs` to generate test functions for each specification file matching the pattern `tests/specs/<language>/**/*.<ext>`

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:05:27.810Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:27.810Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : When formatting AST nodes, use mandatory tokens from the AST instead of hardcoding token strings (e.g., use `node.l_paren_token().format()` instead of `token("(")`)

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-12-04T13:29:49.287Z
Learnt from: dyc3
Repo: biomejs/biome PR: 8291
File: crates/biome_html_formatter/tests/specs/prettier/vue/html-vue/elastic-header.html:10-10
Timestamp: 2025-12-04T13:29:49.287Z
Learning: Files under `crates/biome_html_formatter/tests/specs/prettier` are test fixtures synced from Prettier and should not receive detailed code quality reviews (e.g., HTTP vs HTTPS, formatting suggestions, etc.). These files are test data meant to validate formatter behavior and should be preserved as-is.

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Create test infrastructure with `tests/specs` folder structure and `spec_test.rs`, `spec_tests.rs`, and `language.rs` files in test directories

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Avoid string allocations by comparing against `&str` or using `TokenText`

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `p.eat(token)` for optional tokens, `p.expect(token)` for required tokens, `parse_rule(p).ok(p)` for optional nodes, and `parse_rule(p).or_add_diagnostic(p, error)` for required nodes

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-09T12:47:46.298Z
Learnt from: ematipico
Repo: biomejs/biome PR: 8031
File: crates/biome_html_parser/src/syntax/svelte.rs:140-147
Timestamp: 2025-11-09T12:47:46.298Z
Learning: In the Biome HTML parser, `expect` and `expect_with_context` consume the current token and then lex the next token. The context parameter in `expect_with_context` controls how the next token (after the consumed one) is lexed, not the current token being consumed. For example, in Svelte parsing, after `bump_with_context(T!["{:"], HtmlLexContext::Svelte)`, the next token is already lexed in the Svelte context, so `expect(T![else])` is sufficient unless the token after `else` also needs to be lexed in a specific context.

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/*.ungram : Add a legend comment to `.ungram` files explaining the grammar syntax including comments, non-terminals, tokens, sequences, alternations, repetitions, and optional elements

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement error recovery in list parsing using `or_recover()` to wrap unparseable tokens in a `BOGUS_*` node and consume tokens until a recovery token is found

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Prefix line with `#` in documentation code examples sparingly; prefer concise complete snippets

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:05:27.810Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:27.810Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : For tokens that are not mandatory, use helper functions instead of hardcoding

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : A parser struct must implement the `Parser` trait and save the token source, parser context, and optional parser options

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
🧬 Code graph analysis (1)
crates/biome_markdown_parser/src/lexer/mod.rs (1)
crates/biome_unicode_table/src/lib.rs (1)
  • lookup_byte (106-109)
🔇 Additional comments (7)
crates/biome_markdown_parser/src/lexer/tests.rs (3)

364-392: LGTM!

The escape sequence tests correctly verify CommonMark escape handling: punctuation escapes are grouped with their backslash, while non-punctuation escapes leave the backslash as a standalone token.


232-240: Good coverage for textual grouping.

These tests directly verify that consecutive text characters are batched into single MD_TEXTUAL_LITERAL tokens rather than per-character nodes — precisely what was requested in the PR feedback regarding CST size.

Also applies to: 262-273, 299-308


143-164: LGTM!

Thematic break tests cover the key CommonMark variations including spaced markers, confirming that internal whitespace is included in the literal token.

crates/biome_markdown_parser/src/lexer/mod.rs (4)

200-251: LGTM!

The escape handling correctly implements CommonMark's backslash escape rules with the full set of escapable ASCII punctuation characters.


493-540: Core fix for CST bloat — well implemented.

This grouping logic directly addresses the reviewer concern about per-letter nodes. By batching consecutive non-special characters into single MD_TEXTUAL_LITERAL tokens, the number of CST nodes scales with the number of text runs rather than with every individual character of the document.

The dispatch-based boundary detection is thorough and correctly handles multi-byte UTF-8 via advance_char_unchecked().
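
For reference, the grouping behaviour boils down to something like the following standalone sketch; the stop-set mirrors the special characters listed in this PR, while the real lexer additionally advances over multi-byte UTF-8 characters as whole units via advance_char_unchecked().

```rust
// Returns how many bytes of ordinary text start at `pos`, so the caller can emit a
// single MD_TEXTUAL_LITERAL covering the whole run instead of one token per character.
fn textual_run_len(source: &[u8], pos: usize) -> usize {
    let mut len = 0;
    while let Some(&byte) = source.get(pos + len) {
        // Stop at anything that may begin another token: emphasis/list markers,
        // headers, fences, quotes, link/image punctuation, escapes, whitespace.
        if matches!(
            byte,
            b'*' | b'-' | b'_' | b'+' | b'#' | b'`' | b'~' | b'>' | b'!'
                | b'[' | b']' | b'(' | b')' | b'\\'
                | b' ' | b'\t' | b'\n' | b'\r'
        ) {
            break;
        }
        len += 1;
    }
    len
}
```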


13-14: LGTM!

The dispatch-based token routing is clean and efficient. Using lookup_byte for character classification keeps the lexer fast while supporting the full Markdown token set.

Also applies to: 180-198


345-352: LGTM!

Good defensive check — IDT dispatch covers both letters and underscore, so explicitly filtering for b'_' prevents letters from being mishandled as emphasis markers.

Comment on lines 381 to 390
// Check for double emphasis markers (**, __, --)
if self.peek_byte() == Some(start_char) {
self.advance(2);
return match start_char {
b'*' => DOUBLE_STAR,
b'_' => DOUBLE_UNDERSCORE,
b'-' => MINUS, // No DOUBLE_MINUS in grammar, use MINUS
_ => unreachable!(),
};
}

Contributor

⚠️ Potential issue | 🟡 Minor

Potential inconsistency: -- returns MINUS with length 2.

When encountering --, this advances by 2 bytes but returns MINUS, which is typically a single-character token. This creates an inconsistency where the same token kind can have different lengths (1 or 2), which may confuse downstream parsing or source reconstruction.

Consider either:

  1. Adding a DOUBLE_MINUS token to the grammar, or
  2. Returning two separate MINUS tokens by only advancing 1 byte here.
🔎 Option 2: Return single MINUS for consistency
-        // Check for double emphasis markers (**, __, --)
-        if self.peek_byte() == Some(start_char) {
-            self.advance(2);
-            return match start_char {
-                b'*' => DOUBLE_STAR,
-                b'_' => DOUBLE_UNDERSCORE,
-                b'-' => MINUS, // No DOUBLE_MINUS in grammar, use MINUS
-                _ => unreachable!(),
-            };
-        }
+        // Check for double emphasis markers (**, __)
+        // Note: -- is not doubled; let it fall through to single MINUS
+        if self.peek_byte() == Some(start_char) && start_char != b'-' {
+            self.advance(2);
+            return match start_char {
+                b'*' => DOUBLE_STAR,
+                b'_' => DOUBLE_UNDERSCORE,
+                _ => unreachable!(),
+            };
+        }
🤖 Prompt for AI Agents
In crates/biome_markdown_parser/src/lexer/mod.rs around lines 381–390, the lexer
treats `--` by advancing 2 bytes but returns a single-character MINUS token,
causing token-length inconsistency; change the logic so that when start_char ==
b'-' you only advance(1) and return MINUS (while keeping advance(2) for `*` and
`_`), ensuring `--` will be tokenized as two separate MINUS tokens; update or
add lexer tests to cover `-` vs `--` cases and adjust any downstream assumptions
about MINUS token length if necessary.

@jfmcdowell
Author


Fixed the issue where each character in a paragraph created a separate MdTextual node. The lexer's
consume_textual() now groups consecutive non-special characters into a single MD_TEXTUAL_LITERAL token.

@ematipico ematipico left a comment

That's better, but I think we need to change things.

Compared to other languages, Markdown is pure markup. This means that trivia don't exist, except in some rare cases.

The markdown lexer needs to treat whitespaces and newlines as part of the language:

  • whitespace is part of the text
  • newlines become relevant during parsing

Modified the lexer's consume_textual() to consume multiple consecutive
non-special characters in a single token, instead of creating one token
per character. This dramatically reduces CST node count for large documents.

Before: "This is a paragraph." → 27 MdTextual nodes (one per character)
After: "This is a paragraph." → 5 MdTextual nodes (one per word)

Special characters that stop grouping: * - _ + # ` ~ > ! [ ] ( ) \ and whitespace
@jfmcdowell jfmcdowell force-pushed the feat/markdown-support-next branch from a7c2a67 to 1102636 Compare December 21, 2025 12:04

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (2)
crates/biome_markdown_parser/src/lexer/mod.rs (2)

211-243: Consider refactoring the escapable character match for maintainability.

The 33-line match against escapable ASCII punctuation is difficult to read and maintain. Consider extracting this into a helper method or using a lookup approach.

🔎 Example refactor using a helper function

Add a helper method:

/// Returns true if the byte is an ASCII punctuation character that can be escaped
/// per the CommonMark spec: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
#[inline]
fn is_escapable_ascii_punctuation(byte: u8) -> bool {
    matches!(
        byte,
        b'!' | b'"' | b'#' | b'$' | b'%' | b'&' | b'\'' 
        | b'(' | b')' | b'*' | b'+' | b',' | b'-' | b'.' | b'/'
        | b':' | b';' | b'<' | b'=' | b'>' | b'?' | b'@'
        | b'[' | b'\\' | b']' | b'^' | b'_' | b'`'
        | b'{' | b'|' | b'}' | b'~'
    )
}

Then simplify consume_escape:

     // Check if next character is escapable ASCII punctuation
-    if let Some(next) = self.current_byte()
-        && matches!(
-            next,
-            b'!' | b'"'
-                | b'#'
-                | b'$'
-                | b'%'
-                | b'&'
-                | b'\''
-                | b'('
-                | b')'
-                | b'*'
-                | b'+'
-                | b','
-                | b'-'
-                | b'.'
-                | b'/'
-                | b':'
-                | b';'
-                | b'<'
-                | b'='
-                | b'>'
-                | b'?'
-                | b'@'
-                | b'['
-                | b'\\'
-                | b']'
-                | b'^'
-                | b'_'
-                | b'`'
-                | b'{'
-                | b'|'
-                | b'}'
-                | b'~'
-        )
-    {
+    if let Some(next) = self.current_byte() && Self::is_escapable_ascii_punctuation(next) {
         // Consume the escaped character too
         self.advance(1);
     }

345-352: IDT dispatch handling could be more maintainable.

The IDT dispatch matches both letters (A-Z, a-z) and underscore, but only underscore is a Markdown emphasis marker. This special-case handling is duplicated in consume_textual (lines 526-531), making the pattern less obvious and potentially error-prone during future maintenance.

Consider adding a comment explaining why IDT requires special handling, or extracting the underscore check into a helper method for clarity.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a7c2a67 and 1102636.

⛔ Files ignored due to path filters (7)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/header.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (2)
  • crates/biome_markdown_parser/src/lexer/mod.rs (5 hunks)
  • crates/biome_markdown_parser/src/lexer/tests.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/biome_markdown_parser/src/lexer/tests.rs
🧰 Additional context used
📓 Path-based instructions (1)
crates/**/*.rs

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
🧠 Learnings (11)
📓 Common learnings
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to crates/**/*.rs : Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/*.ungram : Add a legend comment to `.ungram` files explaining the grammar syntax including comments, non-terminals, tokens, sequences, alternations, repetitions, and optional elements
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Prefix line with `#` in documentation code examples sparingly; prefer concise complete snippets
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/lexer/mod.rs : Implement a `Lexer` trait from `biome_parser` crate for the lexer struct that consumes characters from source code and emits tokens

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement a token source struct that wraps the lexer and implements `TokenSourceWithBufferedLexer` and `LexerWithCheckpoint` for lookahead and re-lexing capabilities

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `p.eat(token)` for optional tokens, `p.expect(token)` for required tokens, `parse_rule(p).ok(p)` for optional nodes, and `parse_rule(p).or_add_diagnostic(p, error)` for required nodes

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-12-12T10:11:05.564Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to crates/**/*.rs : Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:05:27.810Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:27.810Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : When formatting AST nodes, use mandatory tokens from the AST instead of hardcoding token strings (e.g., use `node.l_paren_token().format()` instead of `token("(")`)

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Prefix line with `#` in documentation code examples sparingly; prefer concise complete snippets

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/*.ungram : Add a legend comment to `.ungram` files explaining the grammar syntax including comments, non-terminals, tokens, sequences, alternations, repetitions, and optional elements

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement error recovery in list parsing using `or_recover()` to wrap unparseable tokens in a `BOGUS_*` node and consume tokens until a recovery token is found

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Avoid string allocations by comparing against `&str` or using `TokenText`

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-09T12:47:46.298Z
Learnt from: ematipico
Repo: biomejs/biome PR: 8031
File: crates/biome_html_parser/src/syntax/svelte.rs:140-147
Timestamp: 2025-11-09T12:47:46.298Z
Learning: In the Biome HTML parser, `expect` and `expect_with_context` consume the current token and then lex the next token. The context parameter in `expect_with_context` controls how the next token (after the consumed one) is lexed, not the current token being consumed. For example, in Svelte parsing, after `bump_with_context(T!["{:"], HtmlLexContext::Svelte)`, the next token is already lexed in the Svelte context, so `expect(T![else])` is sufficient unless the token after `else` also needs to be lexed in a specific context.

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
🧬 Code graph analysis (1)
crates/biome_markdown_parser/src/lexer/mod.rs (1)
crates/biome_unicode_table/src/lib.rs (1)
  • lookup_byte (106-109)
🔇 Additional comments (7)
crates/biome_markdown_parser/src/lexer/mod.rs (7)

1-14: Documentation and imports updated appropriately.

The module documentation now accurately describes the Markdown lexer, and the import changes support the dispatch-based token handling.


180-198: Dispatch table correctly routes Markdown tokens.

The expanded dispatcher appropriately handles all Markdown syntax characters, routing them to dedicated consumption methods.


381-390: The -- token-length inconsistency has been resolved.

The past review concern about -- advancing 2 bytes whilst returning a single-character MINUS token has been addressed. Line 383 now explicitly excludes - from double-marker treatment, so each hyphen in -- is lexed separately as an individual MINUS token with consistent length.
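
Purely as an illustration of that rule (a standalone sketch, not the PR's code; the token-length framing is an assumption for the example):

```rust
/// How many marker bytes to consume as a single token: `*` and `_` may pair
/// into a double marker, while `-` never does, so `--` is lexed as two
/// separate single-byte MINUS tokens.
fn marker_len(current: u8, next: Option<u8>) -> usize {
    match current {
        b'*' | b'_' if next == Some(current) => 2,
        _ => 1,
    }
}

#[cfg(test)]
mod marker_len_tests {
    use super::marker_len;

    #[test]
    fn hyphens_never_pair() {
        assert_eq!(marker_len(b'-', Some(b'-')), 1); // each `-` is its own token
        assert_eq!(marker_len(b'*', Some(b'*')), 2); // `**` is one token
    }
}
```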


493-540: Textual batching successfully addresses CST bloat concern.

This implementation groups consecutive textual characters into a single MD_TEXTUAL_LITERAL token, directly addressing the issue raised in the PR objectives where each letter was previously emitted as a separate node. The dispatch-based boundary detection correctly identifies when to stop batching, maintaining accurate parsing whilst improving efficiency.


258-262: Inconsistency with AI-generated summary.

The AI summary states "Removed the dead_code attribute from the eat_byte method", but the #[expect(dead_code)] attribute is still present at line 258. If this method is now used, the attribute should be removed; otherwise, the summary is incorrect.


409-422: Convert comment to proper rustdoc format.

The comment at line 409 should use /// for rustdoc instead of // to be included in generated documentation.

As per coding guidelines, update inline rustdoc documentation when adding new features.

⛔ Skipped due to learnings
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Prefix line with `#` in documentation code examples sparingly; prefer concise complete snippets
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to crates/**/*.rs : Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Lines prefixed with `#` in rule documentation code examples will be hidden from output
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation code blocks should be ordered as language, expect_diagnostic, options/full_options/use_options, ignore, file
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : The first paragraph of rule documentation must be a single line describing what the rule does
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/*.ungram : Add a legend comment to `.ungram` files explaining the grammar syntax including comments, non-terminals, tokens, sequences, alternations, repetitions, and optional elements
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Use `options` code block property for rule-specific configuration snippets in documentation
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation code blocks must have a language defined for syntax highlighting
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Invalid code examples in rule documentation must be marked with `expect_diagnostic` code block property
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation must include `## Options` section if the rule has options

493-494: Convert comment to proper rustdoc format.

The comments at lines 493-494 should use /// for rustdoc instead of // to be included in generated documentation.

As per coding guidelines, update inline rustdoc documentation when adding new features.

⛔ Skipped due to learnings
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to crates/**/*.rs : Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Prefix line with `#` in documentation code examples sparingly; prefer concise complete snippets
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Invalid code examples in rule documentation must be marked with `expect_diagnostic` code block property
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Use `options` code block property for rule-specific configuration snippets in documentation
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Use `use_options` code block property for code examples that follow an options configuration in documentation
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Lines prefixed with `#` in rule documentation code examples will be hidden from output
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Applies to crates/biome_formatter/**/biome_*_formatter/src/comments.rs : Define `<Language>CommentStyle` as a public type alias for `Comments<<Language>Language>` in a `comments.rs` file
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation must include `## Examples` section with `### Invalid` and `### Valid` subsections
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation code blocks should be ordered as language, expect_diagnostic, options/full_options/use_options, ignore, file
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation must include `## Options` section if the rule has options

Comment on lines +200 to +202
/// Consume a backslash escape sequence.
/// Per CommonMark spec, a backslash before ASCII punctuation makes it literal.
/// Escapable: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

🛠️ Refactor suggestion | 🟠 Major

Add rustdoc documentation.

This method lacks documentation. Per coding guidelines, rustdoc should be added when introducing new features. Consider documenting the CommonMark escape behaviour and which characters are escapable.

As per coding guidelines, update inline rustdoc documentation when adding new features.

🤖 Prompt for AI Agents
In crates/biome_markdown_parser/src/lexer/mod.rs around lines 200 to 202, the
method that consumes a backslash escape sequence is missing rustdoc; add a
triple-slash doc comment above the method that briefly states its purpose
(consume a backslash escape per CommonMark), describes the behavior (a backslash
before ASCII punctuation makes that character literal), lists or references the
set of escapable characters (e.g. !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~), and include
a short note pointing to the CommonMark spec and a tiny usage example or
edge-case note (e.g., that non-ASCII characters are not escaped). Ensure the
wording is concise and follows project rustdoc style.
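
As a rough sketch of the behaviour being documented (standalone for illustration; the real method lives on the lexer and uses its own helpers, and the function name here is invented):

```rust
/// Number of bytes a backslash escape consumes at `pos`, per the CommonMark
/// rule: `\` followed by ASCII punctuation escapes that character (2 bytes);
/// otherwise the backslash is literal text (1 byte).
fn backslash_escape_len(source: &[u8], pos: usize) -> usize {
    debug_assert_eq!(source.get(pos), Some(&b'\\'));
    match source.get(pos + 1) {
        Some(b) if b.is_ascii_punctuation() => 2, // e.g. `\*`, `\[`, `\\`
        _ => 1,                                   // lone `\` stays literal
    }
}
```

With `\*` this returns 2, so the `*` is treated as literal text rather than an emphasis marker.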

Comment on lines +337 to +339
/// Consumes thematic break literal or returns emphasis marker tokens.
/// Called when we see *, -, or _.
fn consume_thematic_break_or_emphasis(&mut self, dispatched: Dispatch) -> MarkdownSyntaxKind {

🛠️ Refactor suggestion | 🟠 Major

Add rustdoc documentation.

This method lacks documentation explaining when it returns a thematic break versus emphasis tokens, and the special handling for the IDT dispatch case.

As per coding guidelines, update inline rustdoc documentation when adding new features.
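
One possible wording, kept close to what the comment asks for (the decision rule described here follows CommonMark rather than the PR's exact control flow, so adjust as needed):

```rust
/// Lexes `*`, `-`, or `_` at the current position.
///
/// Returns a thematic-break literal when the marker begins a line of three
/// or more identical markers (`***`, `---`, `___`); otherwise returns the
/// corresponding emphasis/list marker token. The `IDT` dispatch case needs
/// special handling because `_` shares its dispatch class with ASCII
/// letters and must be distinguished from ordinary textual content.
fn consume_thematic_break_or_emphasis(&mut self, dispatched: Dispatch) -> MarkdownSyntaxKind {
    // body unchanged
}
```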

Comment on lines +402 to +407
/// Consume a single hash character for ATX headers
fn consume_hash(&mut self) -> MarkdownSyntaxKind {
    self.assert_at_char_boundary();
    self.advance(1);
    HASH
}

🛠️ Refactor suggestion | 🟠 Major

Add rustdoc documentation.

Brief documentation explaining that this consumes a single hash character for ATX headers would improve code clarity.

As per coding guidelines, update inline rustdoc documentation when adding new features.

🤖 Prompt for AI Agents
In crates/biome_markdown_parser/src/lexer/mod.rs around lines 402 to 407, the
private method consume_hash lacks rustdoc; add a concise /// doc comment above
the function stating that it consumes a single '#' character used for ATX
headers, ensures the current position is at a UTF-8 character boundary, advances
the lexer by one byte, and returns the HASH MarkdownSyntaxKind token; keep the
doc short and aligned with project guidelines and update it if behavior changes
in future.

Comment on lines +424 to 437
/// Consume tilde(s) - either single for other uses or triple for fenced code blocks
fn consume_tilde(&mut self) -> MarkdownSyntaxKind {
    self.assert_at_char_boundary();

    // Check for triple tilde
    if self.peek_byte() == Some(b'~') && self.byte_at(2) == Some(b'~') {
        self.advance(3);
        return TRIPLE_TILDE;
    }

    // Single tilde
    self.advance(1);
    TILDE
}

🛠️ Refactor suggestion | 🟠 Major

Convert comment to proper rustdoc format.

The comment at line 424 should use /// for rustdoc instead of // to be included in generated documentation.

As per coding guidelines, update inline rustdoc documentation when adding new features.

🤖 Prompt for AI Agents
In crates/biome_markdown_parser/src/lexer/mod.rs around lines 424 to 437, the
leading comment for consume_tilde should be a rustdoc comment; replace the
current inline comment with a triple-slash doc comment (///) immediately above
the fn declaration and ensure the text is formatted as a single-line or short
paragraph rustdoc description so it appears in generated docs.

Include whitespace in text tokens when not at line start to reduce CST
node count. Previously each word was a separate token; now entire phrases
become single tokens.

Key changes:
- Lexer handles whitespace context-sensitively (line start vs mid-line)
- consume_textual() includes spaces/tabs but stops at newlines
- Parser detects paragraph boundaries via has_blank_line_since()

Results: 10 MdTextual nodes -> 2 for a two-paragraph document.
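
A standalone sketch of that batching rule (illustrative only; the real consume_textual lives on the lexer and uses its dispatch table rather than a hard-coded byte list, and the helper name here is invented):

```rust
/// Length in bytes of a textual run starting at `pos`: mid-line spaces and
/// tabs are folded in, but the run stops at a newline or at any byte the
/// lexer routes to a dedicated Markdown handler.
fn textual_run_len(source: &[u8], pos: usize) -> usize {
    let stops_run = |b: u8| {
        matches!(
            b,
            b'\n' | b'\r'                                  // line boundaries
                | b'#' | b'`' | b'~' | b'>'                // headers, fences, quotes
                | b'*' | b'_' | b'-'                       // emphasis, lists, breaks
                | b'[' | b']' | b'(' | b')' | b'!' | b'\\' // links, images, escapes
        )
    };
    source[pos..].iter().take_while(|&&b| !stops_run(b)).count()
}

// textual_run_len(b"Hello world\n", 0) == 11:
// "Hello world" becomes a single MD_TEXTUAL_LITERAL instead of three tokens.
```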
@jfmcdowell
Author

jfmcdowell commented Dec 21, 2025

@ematipico thanks for the feedback. 7d3d2b9 addresses it. Since this PR is getting long, would you prefer I stop here and submit a second PR for the remaining CommonMark parser features once this one lands, or keep going in this PR? Whatever is easier for the maintainers.

@jfmcdowell jfmcdowell requested a review from ematipico December 21, 2025 16:40

Labels

  • A-Parser (Area: parser)
  • A-Tooling (Area: internal tools)
