fix(biome_js_analyze): fix useValidLang rejecting BCP 47 language tags with script subtags #8118
🦋 Changeset detected
Latest commit: 208d530
The changes in this PR will be included in the next version bump. This PR includes changesets to release 13 packages.
Walkthrough
Parses BCP 47 `lang` attribute values to accept optional script subtags, supporting `language[-script][-region]` forms and rejecting values with more than three subtags.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/biome_js_analyze/src/lint/a11y/use_valid_lang.rs (1)
69-103: Language-script tags like `zh-Hant` and `sr-Latn` are incorrectly rejected despite being valid BCP 47
The code lacks a dedicated match arm for two-part tags where the second component is a script. BCP 47 (RFC 5646) permits a primary language subtag followed by a script subtag with no region, yet the current implementation falls through to the `(Some(language), Some(country), None)` arm and fails the country validation on the script code.
To fix this, you'll need to distinguish scripts from country codes, likely via a heuristic (scripts are typically 4 letters, countries 2) or by importing/creating an `is_valid_script()` function from `biome_aria_metadata`. Add a dedicated arm before the country check and validate via `is_valid_language("{language}-{script}")` as the three-part branch does.
Include tests for `<html lang="zh-Hant" />` and `<html lang="sr-Latn" />` to prevent regression.
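The length heuristic mentioned above could be sketched roughly as follows. This is only an illustration: `SecondSubtag` and `classify_second_subtag` are hypothetical names that do not appear in the PR, and the function checks shape only, without consulting any list of known scripts or countries.

```rust
/// Rough shape of the second subtag in a two-part tag such as `zh-Hant` or `en-US`.
/// Hypothetical type, used only for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum SecondSubtag {
    /// Four ASCII letters, e.g. `Hant`.
    Script,
    /// Two ASCII letters, e.g. `US`.
    Country,
    /// Any other shape.
    Unknown,
}

/// Classifies a subtag by length alone (4 letters = script, 2 letters = country).
fn classify_second_subtag(subtag: &str) -> SecondSubtag {
    // Anything containing a non-letter cannot be a script or an alpha-2 country.
    if !subtag.bytes().all(|b| b.is_ascii_alphabetic()) {
        return SecondSubtag::Unknown;
    }
    match subtag.len() {
        4 => SecondSubtag::Script,
        2 => SecondSubtag::Country,
        _ => SecondSubtag::Unknown,
    }
}
```

A shape check like this only decides which validation to run (or which error to report); the subtag would still need to be looked up in the script or country list afterwards.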
🧹 Nitpick comments (1)
crates/biome_js_analyze/tests/specs/a11y/useValidLang/valid.jsx (1)
8-9: Consider adding tests for language–script tags without region
The new `zh-Hans-CN`/`zh-Hant-TW` cases look spot-on for language–script–region. To fully exercise the BCP 47 shape from the issue (`language[-script][-region]…`), it might be worth adding a couple of `language-script`-only examples as valid too (e.g. `<html lang="zh-Hant" />`, `<html lang="sr-Cyrl" />`), if you intend to support those.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- `crates/biome_js_analyze/tests/specs/a11y/useValidLang/invalid.jsx.snap` is excluded by `!**/*.snap` and included by `**`
- `crates/biome_js_analyze/tests/specs/a11y/useValidLang/valid.jsx.snap` is excluded by `!**/*.snap` and included by `**`
📒 Files selected for processing (2)
- `crates/biome_js_analyze/src/lint/a11y/use_valid_lang.rs` (2 hunks)
- `crates/biome_js_analyze/tests/specs/a11y/useValidLang/valid.jsx` (1 hunks)
82dd282 to
c515454
Actionable comments posted: 0
🧹 Nitpick comments (3)
crates/biome_js_analyze/src/lint/a11y/use_valid_lang.rs (3)
70-87: Consider edge case: language-country-extra patterns.
When the input is `language-country-extra` (e.g., `en-US-GB`), line 71 validates `en-US` as a language code, which should fail. This results in a Language error, though the actual issue is an extraneous segment after a valid language-country pair.
Whilst this only affects invalid inputs, distinguishing this case would improve error messages. You could check whether the second component is a valid country before validating `language-script`.
89-103: Logic correctly distinguishes language-country from language-script.
The nested checks appropriately handle both patterns:
- `language-country` (when `script_or_country` is a valid country)
- `language-script` (when the combined form is a valid language)
The flow is correct but fairly intricate. A brief comment explaining the two cases might aid future maintainers.
70-112: Consider adding negative test cases for script subtag patterns.
Whilst the valid cases are well covered, consider adding tests for invalid script subtag patterns to ensure they're still properly rejected:
- Invalid script: `zh-Xxxx`
- Invalid region with script: `zh-Hans-ZZ`
- Extra segments: `en-US-GB-Extra`
This would confirm the error paths work correctly for the new 3-component logic.
🔇 Additional comments (2)
crates/biome_js_analyze/tests/specs/a11y/useValidLang/valid.jsx (1)
8-9: Good test coverage for the script subtag cases.
These additions properly validate the fix for both 2-component (`zh-Hant`) and 3-component (`zh-Hans-CN`) patterns with script subtags.
crates/biome_js_analyze/src/lint/a11y/use_valid_lang.rs (1)
69-69: Match expanded to handle script subtags.
The 3-tuple match correctly supports parsing language-script-region patterns.
CodSpeed Performance Report
Merging #8118 will not alter performance
c515454 to
4061978
Actionable comments posted: 1
🧹 Nitpick comments (1)
crates/biome_js_analyze/src/lint/a11y/use_valid_lang.rs (1)
149-149: Add helpful context to the Value error diagnostic.
When `InvalidKind::Value` is returned (due to extra components), the diagnostic provides no specific guidance. Users would benefit from understanding that BCP 47 tags should follow a specific structure.
Consider adding a footer note:
- InvalidKind::Value => diagnostic,
+ InvalidKind::Value => diagnostic.note(
+     markup! {
+         "Language tags should follow the structure: language[-script][-region]."
+     }
+ ),
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/biome_js_analyze/tests/specs/a11y/useValidLang/valid.jsx
🔇 Additional comments (2)
crates/biome_js_analyze/src/lint/a11y/use_valid_lang.rs (2)
81-86: Extra component detection works correctly.
The check properly detects when more than three subtags are present and returns an appropriate error.
70-87: Review comment can be dismissed; implementation is correct.
The verification confirms the three-component logic is sound. The BCP 47 standard mandates that script subtags precede region subtags, so three-component tags must follow the `language-script-region` pattern. The code correctly validates `zh-Hans-CN` and rejects `en-GB-something`, as evidenced by the test suite. The `is_valid_language()` function properly accepts compound language-script codes like `"zh-Hans"`.
4061978 to
91f99f6
Actionable comments posted: 0
♻️ Duplicate comments (1)
crates/biome_js_analyze/src/lint/a11y/use_valid_lang.rs (1)
89-105: Misleading error type when script validation fails.
This is a known issue already flagged in the previous review. When the second component is neither a valid country nor part of a valid language-script combination (lines 95-103), the code returns `InvalidKind::Country`. For a tag like `zh-InvalidScript`, users will see "Some of valid countries:" in the diagnostic, which doesn't help them understand that their script subtag is invalid.
As suggested in the previous review, consider introducing a separate `InvalidKind::Script` variant or a more generic error message that acknowledges both script and country possibilities.
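A separate variant could look roughly like the sketch below. The variant names follow the kinds mentioned in this thread, but the exact enum in `use_valid_lang.rs` is not shown here, so treat the definition as an assumption. `footer_heading` is a hypothetical helper, and only the "Some of valid countries:" wording comes from the thread; the other headings are assumed by analogy.

```rust
/// Which part of a `lang` value failed validation.
/// Assumed shape; the real enum in the rule may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InvalidKind {
    Language,
    Script,
    Country,
    Value,
}

/// Hypothetical helper: the heading shown above the list of suggestions.
/// `Value` has no list to suggest, so it returns `None`.
fn footer_heading(kind: InvalidKind) -> Option<&'static str> {
    match kind {
        InvalidKind::Language => Some("Some of valid languages:"),
        InvalidKind::Script => Some("Some of valid scripts:"),
        InvalidKind::Country => Some("Some of valid countries:"),
        InvalidKind::Value => None,
    }
}
```

With a dedicated variant, a tag like `zh-InvalidScript` can be routed to a script-specific footer instead of a list of countries.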
🧹 Nitpick comments (1)
crates/biome_js_analyze/tests/specs/a11y/useValidLang/valid.jsx (1)
8-9: Test cases cover the primary fix.
The added tests for `zh-Hant` and `zh-Hans-CN` demonstrate that the fix resolves the reported issue. However, consider adding test coverage for:
- Other script subtags mentioned in the issue (e.g., `sr-Cyrl-RS`)
- Invalid script cases in a separate invalid test file (e.g., `zh-InvalidScript`) to verify error handling
🔇 Additional comments (1)
crates/biome_js_analyze/src/lint/a11y/use_valid_lang.rs (1)
70-87: Metadata coverage is incomplete for language-script combinations.
The verification confirms your concern: whilst `zh-Hans` and `zh-Hant` are present in `biome_aria_metadata/src/lib.rs`, Serbian script variants (`sr-Cyrl`, `sr-Latn`) are missing. Only the base language code `"sr"` is listed. Valid BCP 47 tags using these script subtags will be incorrectly rejected. The code logic itself is sound, but the data in the hardcoded `ISO_LANGUAGES` array needs expansion to cover additional language-script combinations beyond Chinese.
dyc3
left a comment
There also needs to be a changeset.
91f99f6 to
685f39d
Actionable comments posted: 1
🧹 Nitpick comments (3)
crates/biome_aria_metadata/build.rs (1)
46-50: Script enum generation looks good; minor scope nit
The new `IsoScripts` wiring and token emission look solid and consistent with countries/languages.
Tiny nit: `ISO_SCRIPTS` does not need to be `pub` inside `build.rs`, since it's only used locally, and the other ISO constants here are private. Dropping `pub` would make the file a bit more self-consistent.
Also applies to: 273-283
crates/biome_js_analyze/tests/specs/a11y/useValidLang/valid.jsx (1)
8-9: Nice targeted coverage for script subtags
These two cases neatly cover `language-script` and `language-script-country` for Chinese. If you fancy one more, adding `sr-Cyrl-RS` would round out the examples mentioned in the issue, but it's not strictly necessary.
crates/biome_js_analyze/src/lint/a11y/use_valid_lang.rs (1)
70-135: 3-part parsing works, but script vs country classification could be sharper
The new 2- and 3-part handling plus `InvalidKind::Script` is a solid upgrade and fixes the original regression for tags like `zh-Hans-CN` and `sr-Cyrl-RS`.
One minor behavioural quirk: in the `(Some(language), Some(script), Some(country))` arm the second subtag is always treated as a script. For inputs such as `en-GB-typo`, `GB` is really a region, but we now surface a Script error (and list scripts) rather than pointing at the extra/variant part. If you want diagnostics to line up more closely with BCP 47 structure, you could first disambiguate the second token by length or by `is_valid_script`/`is_valid_country` before deciding whether to report `Script`, `Country`, or `Value` here, similar to what you already do in the two-part case.
Otherwise, the integration with `is_valid_script` and the Script footer listing looks clean.
Also applies to: 150-182
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- `crates/biome_js_analyze/tests/specs/a11y/useValidLang/invalid.jsx.snap` is excluded by `!**/*.snap` and included by `**`
- `crates/biome_js_analyze/tests/specs/a11y/useValidLang/valid.jsx.snap` is excluded by `!**/*.snap` and included by `**`
📒 Files selected for processing (6)
- `.changeset/angry-carpets-switch.md` (1 hunks)
- `crates/biome_aria_metadata/build.rs` (2 hunks)
- `crates/biome_aria_metadata/src/lib.rs` (3 hunks)
- `crates/biome_js_analyze/src/lint/a11y/use_valid_lang.rs` (4 hunks)
- `crates/biome_js_analyze/tests/specs/a11y/useValidLang/invalid.jsx` (1 hunks)
- `crates/biome_js_analyze/tests/specs/a11y/useValidLang/valid.jsx` (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- .changeset/angry-carpets-switch.md
🧰 Additional context used
🧬 Code graph analysis (3)
crates/biome_js_analyze/tests/specs/a11y/useValidLang/invalid.jsx (1)
crates/biome_js_analyze/tests/specs/a11y/useValidLang/valid.jsx (8)
a (2-2), a (3-3), a (4-4), a (5-5), a (6-6), a (7-7), a (8-8), a (9-9)
crates/biome_js_analyze/tests/specs/a11y/useValidLang/valid.jsx (1)
crates/biome_js_analyze/tests/specs/a11y/useValidLang/invalid.jsx (6)
a (1-1), a (2-2), a (3-3), a (4-4), a (5-5), a (6-6)
crates/biome_js_analyze/src/lint/a11y/use_valid_lang.rs (1)
crates/biome_aria_metadata/src/lib.rs (2)
is_valid_script (53-55), scripts (68-70)
🔇 Additional comments (1)
crates/biome_js_analyze/tests/specs/a11y/useValidLang/invalid.jsx (1)
4-6: Good coverage of new failure modes
These cases nicely exercise invalid script, invalid country after a valid script, and too many subtags, so they should keep the new parsing logic honest.
685f39d to
d53fb6f
Actionable comments posted: 0
🧹 Nitpick comments (1)
crates/biome_js_analyze/src/lint/a11y/use_valid_lang.rs (1)
95-125: Two-part disambiguation logic is sound.
The length-based heuristic appropriately distinguishes scripts (4 chars) from countries (2-3 chars) when neither validation passes. This aligns with ISO 15924 and ISO 3166-1 standards.
One minor consideration: BCP 47 permits 3-digit UN M.49 numeric region codes (e.g., `en-001` for world). If `is_valid_country` doesn't handle these, they'd be rejected as `InvalidKind::Value`. However, this would be a pre-existing limitation, not introduced here.
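For reference, a shape check that also admits the numeric region codes mentioned above might look like this. It is a minimal sketch under the RFC 5646 grammar (region is two letters or three digits); `is_region_shaped` is a hypothetical name, not a function in `biome_aria_metadata`, and it does not verify that the code is actually registered.

```rust
/// Returns true when `subtag` has the shape of a BCP 47 region subtag:
/// two ASCII letters (ISO 3166-1 alpha-2) or three ASCII digits (UN M.49).
/// Shape only; no registry lookup is performed.
fn is_region_shaped(subtag: &str) -> bool {
    match subtag.len() {
        2 => subtag.bytes().all(|b| b.is_ascii_alphabetic()),
        3 => subtag.bytes().all(|b| b.is_ascii_digit()),
        _ => false,
    }
}
```

Whether the rule should accept tags such as `en-001` is a separate decision; the sketch only shows how the shape could be recognised.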
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/biome_js_analyze/tests/specs/a11y/useValidLang/valid.jsx
- .changeset/angry-carpets-switch.md
🧰 Additional context used
🧬 Code graph analysis (2)
crates/biome_js_analyze/src/lint/a11y/use_valid_lang.rs (1)
crates/biome_aria_metadata/src/lib.rs (4)
is_valid_country (43-45), is_valid_language (48-50), is_valid_script (53-55), scripts (68-70)
crates/biome_js_analyze/tests/specs/a11y/useValidLang/invalid.jsx (1)
crates/biome_js_analyze/tests/specs/a11y/useValidLang/valid.jsx (8)
a (2-2), a (3-3), a (4-4), a (5-5), a (6-6), a (7-7), a (8-8), a (9-9)
🔇 Additional comments (6)
crates/biome_aria_metadata/build.rs (1)
46-50: LGTM!
The `ISO_SCRIPTS` constant and enum generation follow the established pattern for countries and languages. The 28 script codes are standard ISO 15924 identifiers.
Also applies to: 275-275, 282-282
crates/biome_js_analyze/tests/specs/a11y/useValidLang/invalid.jsx (1)
4-6: Excellent test coverage for script validation.
The three new test cases appropriately cover:
- Invalid script code (Xxxx)
- Valid script with invalid country (Hans + ZZ)
- Excessive components (4+ parts)
crates/biome_aria_metadata/src/lib.rs (2)
36-40: LGTM! Script validation API follows established patterns.
The `ISO_SCRIPTS` constant, `is_valid_script()`, and `scripts()` accessor mirror the existing country/language APIs. Implementation is consistent and correct.
Also applies to: 52-55, 67-70
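As a rough picture of what such an API might look like, here is a minimal sketch. The names `ISO_SCRIPTS`, `is_valid_script`, and `scripts` come from the review, but the signatures and the five-entry list below are assumptions for illustration; the thread says the real list holds 28 codes and is wired through `build.rs`.

```rust
/// Hypothetical subset of ISO 15924 script codes, for illustration only.
const ISO_SCRIPTS: &[&str] = &["Arab", "Cyrl", "Hans", "Hant", "Latn"];

/// Returns true when `script` is one of the known script codes.
/// The comparison is case-sensitive in this sketch.
fn is_valid_script(script: &str) -> bool {
    ISO_SCRIPTS.iter().any(|known| *known == script)
}

/// Exposes the known script codes, e.g. for listing suggestions in a diagnostic.
fn scripts() -> &'static [&'static str] {
    ISO_SCRIPTS
}
```

Keeping the lookup and the accessor side by side is what lets the lint rule both validate a subtag and print suggestions from the same source of truth.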
23-34: The review comment is based on incorrect information.
Verification shows no changes to `ISO_LANGUAGES` between main and the current branch: both have identical 150-entry arrays with no removals or additions. The claim of a reduction from 152 to 150 entries appears to be a miscount. This code change can be approved as-is.
Likely an incorrect or invalid review comment.
crates/biome_js_analyze/src/lint/a11y/use_valid_lang.rs (2)
70-93: LGTM! Three-part validation correctly implements BCP 47 structure.
The validation properly handles language-script-country tags by checking each component in order and returning the appropriate `InvalidKind` for each failure case. The check for extra components (4+) correctly rejects malformed tags.
171-180: LGTM! Diagnostic footer follows established pattern.
The `InvalidKind::Script` diagnostic correctly mirrors the Language and Country cases, providing helpful suggestions to users.
…s with script subtags Fixes biomejs#8117.
d53fb6f to
208d530
Actionable comments posted: 0
🧹 Nitpick comments (2)
crates/biome_js_analyze/tests/specs/a11y/useValidLang/valid.jsx (1)
8-9: Good script-tag coverage; consider one more sample
The `zh-Hant` and `zh-Hans-CN` cases nicely exercise script-only and language–script–region. You might also add a `sr-Cyrl-RS` case to mirror the example from the linked issue, but this is strictly optional.
crates/biome_js_analyze/src/lint/a11y/use_valid_lang.rs (1)
70-93: Script-aware parsing behaves correctly for the targeted tag shapes
The new branches handle the key cases as expected: three-part `language-script-country` (`zh-Hans-CN`, `sr-Cyrl-RS`) and two-part `language-script` (`zh-Hant`) or `language-country` (`en-US`), while still rejecting extra subtags and clearly distinguishing invalid language/script/country/value. The length-based fallback in the two-part arm is a neat way to tailor the error kind without over-parsing BCP 47.
Also applies to: 95-127
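The behaviour described above can be approximated with the standalone sketch below. It is not the code from the PR: the three lookup functions are tiny hypothetical stand-ins for the `biome_aria_metadata` helpers, `validate_lang` is an invented name, and the order of the checks is an assumption.

```rust
#[derive(Debug, PartialEq, Eq)]
enum InvalidKind {
    Language,
    Script,
    Country,
    Value,
}

// Hypothetical stand-ins for the metadata lookups; real lists are much longer.
fn is_valid_language(s: &str) -> bool {
    matches!(s, "en" | "zh" | "sr")
}
fn is_valid_script(s: &str) -> bool {
    matches!(s, "Hans" | "Hant" | "Cyrl" | "Latn")
}
fn is_valid_country(s: &str) -> bool {
    matches!(s, "US" | "GB" | "CN" | "TW" | "RS")
}

/// Validates `language`, `language-script`, `language-country`,
/// and `language-script-country`; anything longer is rejected.
fn validate_lang(value: &str) -> Result<(), InvalidKind> {
    let mut parts = value.split('-');
    match (parts.next(), parts.next(), parts.next()) {
        (Some(language), Some(script), Some(country)) => {
            // A fourth subtag means the value has too many components.
            if parts.next().is_some() {
                return Err(InvalidKind::Value);
            }
            if !is_valid_language(language) {
                return Err(InvalidKind::Language);
            }
            if !is_valid_script(script) {
                return Err(InvalidKind::Script);
            }
            if !is_valid_country(country) {
                return Err(InvalidKind::Country);
            }
            Ok(())
        }
        (Some(language), Some(script_or_country), None) => {
            if !is_valid_language(language) {
                return Err(InvalidKind::Language);
            }
            if is_valid_country(script_or_country) || is_valid_script(script_or_country) {
                return Ok(());
            }
            // Neither lookup matched: use the length to pick the error kind.
            match script_or_country.len() {
                4 => Err(InvalidKind::Script),
                2 | 3 => Err(InvalidKind::Country),
                _ => Err(InvalidKind::Value),
            }
        }
        (Some(language), None, None) => {
            if is_valid_language(language) {
                Ok(())
            } else {
                Err(InvalidKind::Language)
            }
        }
        // Not reachable with `split`, which always yields a first item.
        _ => Err(InvalidKind::Value),
    }
}
```

Under these stand-in tables, `validate_lang("zh-Hans-CN")` and `validate_lang("zh-Hant")` succeed, while `zh-Xxxx`, `zh-Hans-ZZ`, and `en-US-GB-Extra` fail with `Script`, `Country`, and `Value` respectively, matching the invalid cases listed in the review.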
🚧 Files skipped from review as they are similar to previous changes (2)
- .changeset/angry-carpets-switch.md
- crates/biome_js_analyze/tests/specs/a11y/useValidLang/invalid.jsx
🧰 Additional context used
🧬 Code graph analysis (2)
crates/biome_js_analyze/src/lint/a11y/use_valid_lang.rs (1)
crates/biome_aria_metadata/src/lib.rs (4)
is_valid_country (43-45), is_valid_language (48-50), is_valid_script (53-55), scripts (68-70)
crates/biome_js_analyze/tests/specs/a11y/useValidLang/valid.jsx (1)
crates/biome_js_analyze/tests/specs/a11y/useValidLang/invalid.jsx (6)
a (1-1), a (2-2), a (3-3), a (4-4), a (5-5), a (6-6)
🔇 Additional comments (3)
crates/biome_aria_metadata/build.rs (1)
33-44: IsoScripts generation looks consistent; double-check language list changes
The new `ISO_SCRIPTS` slice and `IsoScripts` enum generation follow the existing countries/languages pattern and should integrate cleanly with the `include!`-generated types. The only thing I'd sanity-check is the `ISO_LANGUAGES` edits here (some entries were removed/reordered per the summary) to ensure no language codes were accidentally dropped compared to the previous list or lib.rs.
Also applies to: 46-50, 273-283
crates/biome_js_analyze/src/lint/a11y/use_valid_lang.rs (1)
3-4: Nice improvement in error specificity for script failures
Pulling in `is_valid_script` and adding `InvalidKind::Script` (with a dedicated footer listing `scripts()`) makes diagnostics much less confusing for values like `zh-Xxxx` or `zh-Hans-ZZ`. This also addresses the earlier review concern about mislabelling script errors as country issues.
Also applies to: 43-48, 171-180
crates/biome_aria_metadata/src/lib.rs (1)
23-40: Script metadata API aligns well with existing country/language helpers
`ISO_SCRIPTS`, `is_valid_script`, and `scripts()` mirror the existing country/language pattern nicely and give `useValidLang` exactly what it needs. Given that `ISO_LANGUAGES` was also adjusted here, it's worth quickly cross-checking the language/script lists against whatever ISO source you're using to confirm no desired codes went missing.
Also applies to: 52-55, 67-70
@dyc3 I made the requested changes and added a changeset.
dyc3
left a comment
Thank you!
…s with script subtags (biomejs#8118)
Summary
Closes #8117.
This PR fixes an issue where the `useValidLang` rule incorrectly rejected valid BCP 47 tags that include a script subtag (such as `zh-Hans-CN`).
Test Plan
New test cases were added to:
- crates/biome_js_analyze/tests/specs/a11y/useValidLang/invalid.jsx
- crates/biome_js_analyze/tests/specs/a11y/useValidLang/valid.jsx