Selector: Decode invalid escape code points to U+FFFD #5845

Open
rootvector2 wants to merge 1 commit into jquery:main from rootvector2:unescape-invalid-codepoints

@rootvector2

unescapeSelector incorrectly decodes CSS hex escapes that the spec maps to U+FFFD:

\0       -> U+0000   (literal NUL)
\D800    -> U+D800   (lone surrogate)
\110000  -> garbage  (above the Unicode max)

Return U+FFFD for null, surrogate, and out-of-range escapes. Reachable via the seeded .filter() matcher; test added alongside the others.
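
To make the three cases above concrete, here is a minimal sketch of the classification the description refers to. `isReplacedCodePoint` is a hypothetical name used only for illustration and is not part of jQuery; the ranges are the ones named in this PR (null, surrogates, values above U+10FFFF).

```javascript
// Hypothetical helper for illustration only; not jQuery's implementation.
// Returns true when the numeric value of a hex escape should decode to U+FFFD.
function isReplacedCodePoint( codePoint ) {
	return codePoint === 0 ||                             // NULL
		( codePoint >= 0xD800 && codePoint <= 0xDFFF ) || // surrogate range
		codePoint > 0x10FFFF;                             // above the Unicode maximum
}

console.log( isReplacedCodePoint( 0x0 ) );      // true
console.log( isReplacedCodePoint( 0xD800 ) );   // true
console.log( isReplacedCodePoint( 0x110000 ) ); // true
console.log( isReplacedCodePoint( 0x41 ) );     // false
```

Any other value is decoded to the character it names, so `\41` would still yield "A".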

@timmywil added the "Discuss in Meeting" label (reserved for issues and PRs that anyone would like to discuss in the weekly meeting) on Jun 1, 2026
@timmywil requested a review from gibson042 on June 15, 2026 16:08
@timmywil added the "Needs review" label and removed the "Discuss in Meeting" label on Jun 15, 2026
@timmywil

Member

The workflow failures are unrelated. Chrome has been updated in CI and the failures we saw in beta related to :enabled and :disabled pseudos have made it to Chrome stable.

@gibson042 left a comment

Member

Thanks! This should be good to go after a few small tweaks.

Comment thread src/selector/unescapeSelector.js Outdated
Comment on lines +8 to +21
var codePoint = parseInt( escape.slice( 1 ), 16 );

if ( nonHex ) {

	// Strip the backslash prefix from a non-hex escape sequence
	return nonHex;
}

// Per the CSS spec, a NULL, surrogate, or out-of-range code point is
// replaced with the REPLACEMENT CHARACTER (U+FFFD).
if ( codePoint === 0 || codePoint > 0x10FFFF ||
	( codePoint >= 0xD800 && codePoint <= 0xDFFF ) ) {
	return "\uFFFD";
}

Member

gzippability improvements:

Suggested change
- var codePoint = parseInt( escape.slice( 1 ), 16 );
- if ( nonHex ) {
- 	// Strip the backslash prefix from a non-hex escape sequence
- 	return nonHex;
- }
- // Per the CSS spec, a NULL, surrogate, or out-of-range code point is
- // replaced with the REPLACEMENT CHARACTER (U+FFFD).
- if ( codePoint === 0 || codePoint > 0x10FFFF ||
- 	( codePoint >= 0xD800 && codePoint <= 0xDFFF ) ) {
- 	return "\uFFFD";
- }
+ var codePoint = "0x" + escape.slice( 1 ) - 0;
+ if ( nonHex ) {
+ 	// Strip the backslash prefix from a non-hex escape sequence
+ 	return nonHex;
+ }
+ // Per the CSS spec, a NULL, surrogate, or out-of-range code point is
+ // replaced with the REPLACEMENT CHARACTER (U+FFFD).
+ // https://www.w3.org/TR/css-syntax-3/#consume-escaped-code-point
+ if ( !codePoint || codePoint > 0x10FFFF ||
+ 	( codePoint >= 0xD800 && codePoint < 0xE000 ) ) {
+ 	return "\uFFFD";
+ }

Author

applied. went with the "0x" + escape.slice( 1 ) - 0 form, !codePoint, and < 0xE000, and added the spec link.
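
For readers comparing the two forms, the following sketch checks that `"0x" + hex - 0` and `parseInt( hex, 16 )` produce the same number for the hex strings discussed in this thread. `parseBothWays` is a hypothetical helper written for this illustration; the trailing-space sample assumes the matched escape may include one whitespace terminator, as CSS hex escapes allow.

```javascript
// Hypothetical comparison helper; the names here are illustrative only.
function parseBothWays( hex ) {
	return {
		viaParseInt: parseInt( hex, 16 ),

		// "0x" + hex builds a hex literal string; subtracting 0 coerces it to a number
		viaCoercion: "0x" + hex - 0
	};
}

[ "0", "1", "D7FF", "D800", "DFFF", "E000", "10FFFF", "110000", "41 " ].forEach(
	function( hex ) {
		var result = parseBothWays( hex );

		// Logs the sample followed by whether both approaches agree
		console.log( JSON.stringify( hex ), result.viaParseInt === result.viaCoercion );
	}
);
```

Because the parsed values are integers, `codePoint <= 0xDFFF` and `codePoint < 0xE000` select the same range, and `!codePoint` is true for the value 0.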

Comment thread src/selector/unescapeSelector.js Outdated
Comment on lines +27 to +32
return codePoint > 0xFFFF ?
	String.fromCharCode(
		( codePoint - 0x10000 ) >> 10 | 0xD800,
		( codePoint - 0x10000 ) & 0x3FF | 0xDC00
	) :
	String.fromCharCode( codePoint );

Member

gzippability improvements:

Suggested change
- return codePoint > 0xFFFF ?
- 	String.fromCharCode(
- 		( codePoint - 0x10000 ) >> 10 | 0xD800,
- 		( codePoint - 0x10000 ) & 0x3FF | 0xDC00
- 	) :
- 	String.fromCharCode( codePoint );
+ return codePoint < 0x10000 ?
+ 	String.fromCharCode( codePoint ) :
+ 	String.fromCharCode(
+ 		( codePoint - 0x10000 ) >> 10 | 0xD800,
+ 		( codePoint - 0x10000 ) & 0x3FF | 0xDC00
+ 	);

Author

done, BMP branch first now.
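
As a quick sanity check of the BMP-first ordering, the sketch below wraps the expression from the suggestion in a hypothetical function, `toUtf16`, and compares it with the built-in `String.fromCodePoint`. The wrapper name is an assumption for illustration; the PR keeps the expression inline.

```javascript
// Hypothetical wrapper around the expression from the suggested change.
function toUtf16( codePoint ) {
	return codePoint < 0x10000 ?

		// BMP code points fit in a single UTF-16 code unit
		String.fromCharCode( codePoint ) :

		// Otherwise build a surrogate pair: the high 10 bits of the offset
		// go into the lead surrogate, the low 10 bits into the trail surrogate
		String.fromCharCode(
			( codePoint - 0x10000 ) >> 10 | 0xD800,
			( codePoint - 0x10000 ) & 0x3FF | 0xDC00
		);
}

console.log( toUtf16( 0x41 ) === "A" );                               // true
console.log( toUtf16( 0x10FFFF ) === "\uDBFF\uDFFF" );                // true
console.log( toUtf16( 0x1F600 ) === String.fromCodePoint( 0x1F600 ) ); // true
```

`String.fromCodePoint` is used here only as a reference for comparison; it is an ES2015 method, so it may not be available in every browser the library supports.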

Comment thread test/unit/selector.js
"Long numeric escape (non-BMP)" );
} );

QUnit.test( "attributes - invalid escaped code points", function( assert ) {

Member

Let's also include a test case with complete and off-by-one coverage, e.g. that [data-attr='\0 \1 \D7FF \D800 \DFFF \E000 \10FFFF \110000'] matches an element with attribute value "\uFFFD\u0001\uD7FF\uFFFD\uFFFD\uE000\uDBFF\uDFFF\uFFFD".

Author

added it as a fifth assertion: seeded an element with value "\uFFFD\u0001\uD7FF\uFFFD\uFFFD\uE000\uDBFF\uDFFF\uFFFD" and matched it with the full \0 \1 \D7FF \D800 \DFFF \E000 \10FFFF \110000 list, so both sides of each boundary are covered.
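
The boundary list can also be checked outside the test suite. The sketch below is a standalone approximation, not the code in this PR: `runescapeSketch` and `unescapeSketch` are hypothetical names, and the regular expression is an assumed pattern for CSS escapes (1 to 6 hex digits plus one optional whitespace terminator, or a backslash followed by any non-newline character).

```javascript
// Assumed pattern for CSS escapes; jQuery's actual regex may differ.
var runescapeSketch = /\\[\da-fA-F]{1,6}[\x20\t\r\n\f]?|\\([^\r\n\f])/g;

// Hypothetical standalone unescape function combining the reviewed pieces.
function unescapeSketch( sel ) {
	return sel.replace( runescapeSketch, function( escape, nonHex ) {
		var codePoint = "0x" + escape.slice( 1 ) - 0;

		if ( nonHex ) {

			// Strip the backslash prefix from a non-hex escape sequence
			return nonHex;
		}

		// NULL, surrogates, and out-of-range values become U+FFFD
		if ( !codePoint || codePoint > 0x10FFFF ||
			( codePoint >= 0xD800 && codePoint < 0xE000 ) ) {
			return "\uFFFD";
		}

		return codePoint < 0x10000 ?
			String.fromCharCode( codePoint ) :
			String.fromCharCode(
				( codePoint - 0x10000 ) >> 10 | 0xD800,
				( codePoint - 0x10000 ) & 0x3FF | 0xDC00
			);
	} );
}

var input = "\\0 \\1 \\D7FF \\D800 \\DFFF \\E000 \\10FFFF \\110000";
var expected = "\uFFFD\u0001\uD7FF\uFFFD\uFFFD\uE000\uDBFF\uDFFF\uFFFD";

console.log( unescapeSketch( input ) === expected ); // true
```

Each space in the input acts as the terminator of the preceding escape, which is why the expected value contains no spaces.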

@rootvector2 force-pushed the unescape-invalid-codepoints branch from 52d7b8a to 694ba83 on June 15, 2026 19:19
3 participants