Conversation

@alissonlauffer
Contributor

@alissonlauffer alissonlauffer commented Oct 31, 2025

Summary

I've fixed BOM parsing by moving the consume_potential_bom call from consume_token_inside_tag to consume_token, matching how the lexers for other supported languages handle it.

Test Plan

All existing HTML tests pass, and a newly added test passes too.

Fixes #7919.

@changeset-bot

changeset-bot bot commented Oct 31, 2025

🦋 Changeset detected

Latest commit: 5301029

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 13 packages
Name Type
@biomejs/biome Patch
@biomejs/cli-win32-x64 Patch
@biomejs/cli-win32-arm64 Patch
@biomejs/cli-darwin-x64 Patch
@biomejs/cli-darwin-arm64 Patch
@biomejs/cli-linux-x64 Patch
@biomejs/cli-linux-arm64 Patch
@biomejs/cli-linux-x64-musl Patch
@biomejs/cli-linux-arm64-musl Patch
@biomejs/wasm-web Patch
@biomejs/wasm-bundler Patch
@biomejs/wasm-nodejs Patch
@biomejs/backend-jsonrpc Patch

@github-actions github-actions bot added A-Parser Area: parser L-HTML Language: HTML and super languages labels Oct 31, 2025
@coderabbitai
Contributor

coderabbitai bot commented Oct 31, 2025

Walkthrough

The HTML lexer changes BOM handling: inside tags the fallback branch no longer checks for a BOM and now directly calls consume_unexpected_character(). In the regular consume_token path, BOM detection at position 0 is retained — a BOM token is returned if present, otherwise it continues with consume_html_text(). A test file with a BOM-prefixed doctype (tests/html_specs/ok/bom.html) and a patch changeset were added.
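The control flow described in the walkthrough can be sketched roughly as follows. This is a toy lexer written for illustration, not Biome's actual code: only the names consume_token, consume_token_inside_tag and consume_potential_bom come from this PR, while the `Kind` enum, the `Lexer` struct, the `next_token` and `lex` helpers, and the simplified token rules are all assumptions. The point it shows is that the BOM check lives in the regular path and only fires at position 0, while the inside-tag fallback treats anything unknown as an unexpected character.

```rust
/// U+FEFF, the byte order mark (three bytes in UTF-8).
const BOM: char = '\u{FEFF}';

/// Hypothetical token kinds for this sketch (not Biome's syntax kinds).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Bom,
    Text,
    TagOpen,
    TagClose,
    Name,
    Whitespace,
    Unexpected,
}

/// Hypothetical lexer state: the source, a byte offset, and the active context.
struct Lexer<'a> {
    source: &'a str,
    position: usize,
    inside_tag: bool,
}

impl<'a> Lexer<'a> {
    fn new(source: &'a str) -> Self {
        Self { source, position: 0, inside_tag: false }
    }

    /// The character at the current offset, if any input remains.
    fn current(&self) -> Option<char> {
        self.source[self.position..].chars().next()
    }

    /// Advances past every consecutive character for which `keep` is true.
    fn advance_while(&mut self, keep: impl Fn(char) -> bool) {
        while let Some(c) = self.current() {
            if !keep(c) {
                break;
            }
            self.position += c.len_utf8();
        }
    }

    /// Consumes a BOM only when it is the very first thing in the file.
    fn consume_potential_bom(&mut self) -> Option<Kind> {
        if self.position == 0 && self.source.starts_with(BOM) {
            self.position += BOM.len_utf8();
            Some(Kind::Bom)
        } else {
            None
        }
    }

    /// Regular path: check for a BOM first, otherwise lex a tag start or text.
    fn consume_token(&mut self, first: char) -> Kind {
        if let Some(bom) = self.consume_potential_bom() {
            return bom;
        }
        if first == '<' {
            self.position += 1;
            self.inside_tag = true;
            Kind::TagOpen
        } else {
            self.advance_while(|c| c != '<');
            Kind::Text
        }
    }

    /// Inside-tag path: no BOM check; the fallback reports an unexpected character.
    fn consume_token_inside_tag(&mut self, first: char) -> Kind {
        match first {
            '>' => {
                self.position += 1;
                self.inside_tag = false;
                Kind::TagClose
            }
            c if c.is_whitespace() => {
                self.advance_while(char::is_whitespace);
                Kind::Whitespace
            }
            c if c.is_ascii_alphanumeric() || c == '!' => {
                self.advance_while(|c| c.is_ascii_alphanumeric() || c == '!');
                Kind::Name
            }
            c => {
                self.position += c.len_utf8();
                Kind::Unexpected
            }
        }
    }

    /// Dispatches to the path for the active context; `None` at end of input.
    fn next_token(&mut self) -> Option<Kind> {
        let first = self.current()?;
        Some(if self.inside_tag {
            self.consume_token_inside_tag(first)
        } else {
            self.consume_token(first)
        })
    }
}

/// Lexes the whole input into a list of token kinds.
fn lex(source: &str) -> Vec<Kind> {
    let mut lexer = Lexer::new(source);
    let mut kinds = Vec::new();
    while let Some(kind) = lexer.next_token() {
        kinds.push(kind);
    }
    kinds
}
```

In this sketch, a file starts in the regular context, so a leading BOM is always seen by consume_token. If the check sat only in the inside-tag path, a BOM-prefixed doctype would never reach it, which is consistent with the regression this PR addresses.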

Suggested reviewers

  • dyc3

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Title Check ✅ Passed The pull request title "fix(html): correctly handle BOM in HTML-ish languages" directly and accurately describes the main objective of the changeset. It clearly identifies both the scope (HTML) and the specific issue being fixed (BOM handling), using a conventional commit format that's concise and immediately understandable to someone reviewing the project history. The title is not vague or misleading, and it correctly reflects the primary changes made to the HTML lexer.
Linked Issues Check ✅ Passed The PR directly addresses the regression reported in issue #7919, where HTML files with doctype declarations fail to parse in Biome 2.3.2. The changes move BOM handling to the correct processing location in the lexer, align the implementation with other supported languages, and include a new test file specifically covering BOM-prefixed HTML. The test plan confirms all HTML tests pass, including the newly created test for this scenario, meeting the objective to restore proper parsing of HTML files with doctype declarations.
Out of Scope Changes Check ✅ Passed All changes in the PR are directly related to fixing the BOM handling regression in HTML parsing. The modifications to the HTML lexer address the core issue, the new test file validates the fix, and the changeset file documents the change for release notes as required by the repository. No unrelated or extraneous changes have been introduced outside the scope of fixing issue #7919.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check ✅ Passed The PR description clearly and directly addresses the changeset. It explains that BOM parsing was fixed by moving the consume_potential_bom function from consume_token_inside_tag to consume_token, which aligns with the actual code changes made to the HTML lexer. The description also includes a test plan confirming that existing tests pass and a new test was created, matching the test file addition in the changeset. A specific issue reference (#7919) is provided, giving clear context for the fix.

@ematipico
Member

Please commit a changeset, as mentioned in the template and by the bot.

Member

@ematipico ematipico left a comment

Thank you!

@ematipico ematipico merged commit a35c496 into biomejs:main Nov 1, 2025
14 checks passed
@github-actions github-actions bot mentioned this pull request Oct 31, 2025
@alissonlauffer alissonlauffer deleted the fix/fix-html-with-bom branch November 2, 2025 03:11

Labels

A-Parser Area: parser L-HTML Language: HTML and super languages

Development

Successfully merging this pull request may close these issues.

📝 HTML file with <!doctype html> breaks parser in Biome 2.3.2 (works in 2.2.6)

2 participants