Check script of combining marks during font fallback #6857
Merged
Fixes #6801
Description
Font fallback does not check the combining marks attached to a base character, because the shaping engine might compose the base and its mark into a single precomposed character that is present in the font.
However, when the marks belong to a different script than the base character, the text does not reach the correct shaping engine, and inappropriate clusters are formed (e.g. punctuation combined with Indic vowel signs).
This PR compares the script of the base character with that of each combining mark, using the existing character classification tables, and skips the font fallback check only when the scripts match.
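The decision described above can be sketched roughly as follows. This is an illustrative Python model, not the WPF code: `SCRIPT_RANGES`, `script_of`, `can_skip_fallback_check`, `mark_needs_fallback`, and the `font_covers` callback are all hypothetical names, and the tiny script table (including treating U+0300..U+036F as Latin) is an assumption made only to keep the example self-contained. The actual change relies on the existing character classification tables.

```python
import unicodedata

# Hypothetical, heavily simplified script table: (first, last, script).
# Assumption: generic combining diacritics are classified as Latin here.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latin"),
    (0x0061, 0x007A, "Latin"),
    (0x0300, 0x036F, "Latin"),
    (0x0900, 0x097F, "Devanagari"),
]


def script_of(ch):
    """Return the script of a character, or "Common" if it is not in the table."""
    cp = ord(ch)
    for first, last, script in SCRIPT_RANGES:
        if first <= cp <= last:
            return script
    return "Common"


def is_combining_mark(ch):
    """True for Unicode general categories Mn, Mc, and Me."""
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def can_skip_fallback_check(base, mark):
    """The mark may follow its base without its own font check
    only when both belong to the same script."""
    return is_combining_mark(mark) and script_of(base) == script_of(mark)


def mark_needs_fallback(base, mark, font_covers):
    """font_covers is a hypothetical callback: character -> bool.
    A mark from a different script is checked against the font and
    goes to fallback only if the font does not contain it."""
    if can_skip_fallback_check(base, mark):
        return False
    return not font_covers(mark)
```

In this model, a Latin base followed by U+0301 skips the check as before, while a quotation mark followed by the Devanagari vowel sign U+093E is checked against the font and, if missing there, is sent to fallback on its own.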
Customer Impact
Customers who do not take this fix will see incorrectly shaped text and will have difficulty editing it, because in non-Latin scripts the cursor cannot be placed between a base character and its combining mark.
Visual Studio is a notable impacted customer: source code is more likely than ordinary text to place isolated combining marks inside quotation marks, as char and string literals, and the marks then get glued to the quotation marks. Depending on how the editor colorizes and tokenizes the string, customers will see inconsistent rendering that might cause a reflow as the cursor is moved around.
Regression
No.
Testing
Built and verified that the test case in #6801 is shaped as expected in 7.0.0-preview.4.22229.4: Indic vowel marks are no longer shaped together with punctuation as base characters, and the font specified for fallback is selected. Ad-hoc verified that Latin combining marks still work with Latin bases as before.
Risk
Runs containing marks from a different script than their base character will be split if and only if the marks are not present in the font. This could theoretically affect some combinations that are valid via script extensions; however, neither DWrite nor WPF currently supports script extensions.
There will be a slight performance decrease for combining mark characters (two more lookups of character attributes). This is a trade-off that leaves base characters (including standard Latin text going through the fast path) unaffected.
The existing code looks up attributes several times for each character. That is a potentially interesting opportunity for a performance improvement; however, this PR does not address it.