Fix internalSkeleton #31

ohhithere · 2025-02-23T16:12:18Z

While going over UTS #39 I noticed that the internalSkeleton function is defined as:

1. Convert X to NFD format, as described in UAX15.
2. Remove any characters in X that have the property Default_Ignorable_Code_Point.
3. Concatenate the prototypes for each character in X according to the specified data, producing a string of exemplar characters.
4. Reapply NFD.
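
For illustration, the four steps can be sketched in Python roughly as follows. This is not the crate's implementation: `internal_skeleton`, `DEFAULT_IGNORABLE`, and `PROTOTYPES` are hypothetical names, and the two tables are tiny hand-picked samples standing in for the full data from DerivedCoreProperties.txt and confusables.txt.

```python
import unicodedata

# Hypothetical, heavily abridged sample of Default_Ignorable_Code_Point:
# soft hyphen, zero width space/non-joiner/joiner, word joiner, BOM.
DEFAULT_IGNORABLE = {0x00AD, 0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF}

# Hypothetical, heavily abridged confusables mapping:
# Cyrillic a, e, o, er -> Latin a, e, o, p.
PROTOTYPES = {"\u0430": "a", "\u0435": "e", "\u043E": "o", "\u0440": "p"}


def internal_skeleton(s: str) -> str:
    # Step 1: convert to NFD.
    x = unicodedata.normalize("NFD", s)
    # Step 2: drop Default_Ignorable_Code_Point characters.
    x = "".join(ch for ch in x if ord(ch) not in DEFAULT_IGNORABLE)
    # Step 3: replace each character with its prototype (or itself if none).
    x = "".join(PROTOTYPES.get(ch, ch) for ch in x)
    # Step 4: reapply NFD, since prototypes may not be in NFD themselves.
    return unicodedata.normalize("NFD", x)


# With step 2 in place, an embedded zero width space no longer
# distinguishes the two strings.
print(internal_skeleton("pay\u200Bpal") == internal_skeleton("paypal"))
```

Without step 2, the zero width space would survive into the skeleton, so the two strings above would compare as different even though they render identically.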

However, the current implementation leaves out step 2. This probably does not change the behaviour in most cases, but it is still a good idea to fix it.

This pull request adds this step.

Notes:

- I took the liberty to refactor load_properties in unicode.py to effectively remove a copy of the function.
- I regenerated the tables as is. There seem to have been some changes to the IDENTIFIER_TYPE table upstream which are included here in addition to the new table.
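
As a rough idea of what generating such a table involves, the sketch below extracts code point ranges for one property from text in the UCD property-file format (`XXXX ; Property # comment` or `XXXX..YYYY ; Property # comment`). It is an assumption-laden illustration, not the `load_properties` function from unicode.py: `parse_property_ranges` and `SAMPLE` are hypothetical, and the sample contains only a few abridged lines.

```python
import re

# Matches "XXXX ; Prop" or "XXXX..YYYY ; Prop"; trailing comments are ignored.
LINE_RE = re.compile(r"^([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s*(\w+)")


def parse_property_ranges(text, wanted):
    """Return sorted inclusive (lo, hi) ranges for the property `wanted`."""
    ranges = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m or m.group(3) != wanted:
            continue  # comment, blank line, or a different property
        lo = int(m.group(1), 16)
        hi = int(m.group(2), 16) if m.group(2) else lo
        ranges.append((lo, hi))
    return sorted(ranges)


# Hypothetical abridged sample in DerivedCoreProperties.txt style.
SAMPLE = """\
# DerivedCoreProperties-style sample (abridged)
00AD          ; Default_Ignorable_Code_Point # Cf       SOFT HYPHEN
115F..1160    ; Default_Ignorable_Code_Point # Lo   [2] HANGUL CHOSEONG FILLER..HANGUL JUNGSEONG FILLER
0041..005A    ; Uppercase # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
"""

for lo, hi in parse_property_ranges(SAMPLE, "Default_Ignorable_Code_Point"):
    print(f"{lo:04X}..{hi:04X}")
```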

Manishearth · 2025-02-24T18:29:22Z

Thanks!

ohhithere added 5 commits February 23, 2025 16:34

unicode-script: Refactor load_properties

1955790

confusables: Add failing test

526c6f2

This test fails due to the code currently not removing characters with the Default_Ignorable_Code_Point property.

unicode-script: Add default ignorable code point detection module

4594fbc

Regenerate tables

ec837b2

confusables: Fix internal skeleton

78707a7

Manishearth approved these changes Feb 24, 2025

Manishearth merged commit eb9d304 into unicode-rs:master Feb 24, 2025
3 checks passed

ohhithere deleted the fix-internal-skeleton branch February 24, 2025 20:46
