Independent vowels are confusing #95

@r12a

Like other Indic scripts, Gurmukhi has independent vowels that can be visualised as sequences of 2 code points, although Unicode provides a precomposed code point for each independent vowel. The precomposed code points and the decomposed sequences, which may render identically, are not canonically equivalent in Unicode, and this can cause problems for users who are unaware of the difference.
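As a quick illustration of the non-equivalence, the following minimal Python sketch compares the two spellings of the vowel ਆ using the standard `unicodedata` module. The helper name `canonically_equivalent` is introduced here for illustration only and is not part of any library.

```python
import unicodedata

# Decomposed spelling: GURMUKHI LETTER A (U+0A05) + GURMUKHI VOWEL SIGN AA (U+0A3E)
decomposed = "\u0A05\u0A3E"
# Precomposed spelling recommended by Unicode: GURMUKHI LETTER AA (U+0A06)
precomposed = "\u0A06"


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: two strings are canonically equivalent
    if their NFD normalizations are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# U+0A06 has no canonical decomposition, so normalization does not
# bring the two spellings together.
print(canonically_equivalent(decomposed, precomposed))           # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # False
```

Because neither NFC nor NFD unifies the two forms, ordinary normalization before comparison is not enough to make them match.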

This is particularly pronounced for Gurmukhi because in principle independent vowels are (visually) a vowel carrier plus a vowel sign. For more information see Standalone vowels.

Searching Google for the word ਅਾਲੂ (potato), where the initial vowel is written with 2 code points rather than the precomposed code point recommended by Unicode, returns 2,570 pages, compared to 361,000 for the spelling with the precomposed character. While this is small in comparison (0.7%), it is large enough to indicate an issue.

Browsers should be able to recognise the decomposed sequences and treat them as equivalent to the precomposed code points for sorting, search, collation, etc.
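One way an application could approximate this equivalence for search is to fold the decomposed sequences to their precomposed code points before comparing strings. The sketch below is only an illustration: the function name `fold_independent_vowels` and the mapping table are assumptions made here, not an existing API, and the table should be checked against the Gurmukhi vowel letter table in the Unicode Standard before being relied on.

```python
import re

# Assumed mapping of decomposed sequences (vowel carrier + vowel sign)
# to precomposed independent vowels. Verify against the Unicode Standard.
GURMUKHI_VOWEL_FOLDS = {
    "\u0A05\u0A3E": "\u0A06",  # A + sign AA      -> AA
    "\u0A72\u0A3F": "\u0A07",  # IRI + sign I     -> I
    "\u0A72\u0A40": "\u0A08",  # IRI + sign II    -> II
    "\u0A73\u0A41": "\u0A09",  # URA + sign U     -> U
    "\u0A73\u0A42": "\u0A0A",  # URA + sign UU    -> UU
    "\u0A72\u0A47": "\u0A0F",  # IRI + sign EE    -> EE
    "\u0A05\u0A48": "\u0A10",  # A + sign AI      -> AI
    "\u0A73\u0A4B": "\u0A13",  # URA + sign OO    -> OO
    "\u0A05\u0A4C": "\u0A14",  # A + sign AU      -> AU
}

# One alternation pattern matching any of the decomposed sequences.
_FOLD_PATTERN = re.compile("|".join(re.escape(k) for k in GURMUKHI_VOWEL_FOLDS))


def fold_independent_vowels(text: str) -> str:
    """Replace each decomposed sequence with its precomposed code point."""
    return _FOLD_PATTERN.sub(lambda m: GURMUKHI_VOWEL_FOLDS[m.group(0)], text)


# The word for "potato" in both spellings.
query = "\u0A05\u0A3E\u0A32\u0A42"    # decomposed initial vowel
indexed = "\u0A06\u0A32\u0A42"        # precomposed initial vowel

print(query == indexed)                            # False
print(fold_independent_vowels(query) == indexed)   # True
```

Folding toward the precomposed form follows the Unicode recommendation mentioned above; a real implementation would apply the same fold to both the query and the indexed text.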

Many fonts produce a dotted circle or fail to correctly align the glyphs of the decomposed sequence, which helps reduce this issue by making the mistake visible. However, some fonts do not (such as the Gurmukhi MN Mac system font).


Status: Issue identified, needing investigation
