Improve: sort emoji's in room symbols dialog (up to unicode 10)#6407

Merged
vadi2 merged 4 commits into development from update-normalization-unicode-10
Nov 17, 2022

Member

@vadi2 vadi2 commented Nov 4, 2022

Brief overview of PR changes/additions

Updates the Unicode version used for normalization to 10, up from 8, as our minimum supported Qt version is now more recent.

Motivation for adding to Mudlet

Moving on with the times.

Other info (issues closed, discussion etc)

The same change will need to be applied to #6354.

@vadi2 vadi2 requested a review from a team as a code owner November 4, 2022 06:27
@vadi2 vadi2 requested review from a team November 4, 2022 06:27

add-deployment-links bot commented Nov 4, 2022

Hey there! Thanks for helping Mudlet improve. 🌟

Test versions

You can directly test the changes here:

No need to install anything - just unzip and run.
Let us know if it works well, and if it doesn't, please give details.

Member

@SlySven SlySven left a comment

LGTM.

As a slightly related matter: with the minimum version of Qt now being higher, there is somewhere in the TTextEdit class where a run-time check against the Qt version was being done so as to enable/disable some changes that came about with character widths between Unicode 8.0 and 9.0. I suppose that the code to handle the older versions can probably be ~~ripped out~~ carefully excised now... 😆

Ah, yeah, look for the (bool) TTextEdit::mUseOldUnicode8 flag...

Contributor

wiploo commented Nov 4, 2022

Hello, maybe I am doing something wrong, but with this PR I can't see any emoji using the label:echo syntax.

Member

SlySven commented Nov 4, 2022

> Hello, maybe I am doing something wrong, but with this PR I can't see any emoji using the label:echo syntax.

Which OS? If it is Windows or macOS, the (colour) emoji handling is entirely down to the OS. OTOH, if it is Linux then you need to use the same font as the main console is set to use (IIRC), because that font is forced to use the Noto Color Emoji font as the alternative source for glyphs that are not in the main font, which is what tends to happen for the coloured emojis. That setting is applied to that font on an application-wide basis, so the coloured ones will show up anywhere that font is used, but not if a different font is used (unless it is used for the main console of a different profile!).

Contributor

Kebap commented Nov 4, 2022

Maybe @wiploo could show a minimal working example for command / setup which does not show emoji?
Maybe @SlySven could show an MWE which does show?
Bonus question: Did they do the same before this PR?

Contributor

Kebap commented Nov 4, 2022

Looking at the code, this seems to only relate to the symbols displayed on rooms in the mapper, not about labels, etc. at all.
How are these the only points in Mudlet where we respect players' input being Unicode, or am I misinterpreting the code?

Contributor

wiploo commented Nov 5, 2022

Sorry, today I tried to get more information, but I didn't detect any error. It's clearly my mistake. Have a nice day.

Member Author

vadi2 commented Nov 5, 2022

Thank you for reporting anyhow! Better to report a false positive than to let a potential error fly under the radar.

Member Author

vadi2 commented Nov 5, 2022

> How are these the only points in Mudlet where we respect players' input being Unicode, or am I misinterpreting the code?

I think you raise a fair question. Also, what problem exactly does the code fix?

Member

SlySven commented Nov 5, 2022

> I think you raise a fair question. Also, what problem exactly does the code fix?

It is to handle a corner case. When a grapheme is composed of multiple Unicode code points (Unicode allows for a limit of around 30, IIRC, but I cannot find where that was), it is important that those code points are assembled in a consistent manner so that the grapheme can be compared with others. Normalisation (particularly, in this case, the Canonical form) is the process that ensures this, see: https://www.unicode.org/reports/tr15/

It is being done in these places in the Mudlet code so that, whatever order the end-user inserts the codepoints in at one of the entry points in the symbol-related parts, the result can be accurately compared to any existing symbols entered before by them or by someone else who might not have ordered the codepoints in the same manner but intended the same grapheme.
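
To illustrate the ordering point, here is a minimal sketch. It uses Python's standard `unicodedata` module rather than Qt's `QString::normalized()` (which is what the Mudlet code calls), and the two sample strings and the `canonical` helper are made up for illustration. The same grapheme is typed with its two combining marks in opposite orders; the raw strings differ, but their canonical forms compare equal.

```python
import unicodedata

# Hypothetical input: the same grapheme typed with its two combining marks
# in opposite orders (U+0323 COMBINING DOT BELOW, U+0301 COMBINING ACUTE ACCENT).
typed_first = "a\u0323\u0301"
typed_second = "a\u0301\u0323"


def canonical(symbol: str) -> str:
    """Return the canonical composed form (NFC) of a symbol."""
    return unicodedata.normalize("NFC", symbol)


# The raw codepoint sequences are different strings...
print(typed_first == typed_second)
# ...but after normalisation they can be compared reliably.
print(canonical(typed_first) == canonical(typed_second))
```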

Member Author

vadi2 commented Nov 5, 2022

Thanks for the refresher! That's for the room symbol dialog to collate a list of rooms that use particular symbols?

Member

SlySven commented Nov 5, 2022

Yes.

Contributor

Kebap commented Nov 8, 2022

For testing purposes I have researched a few emoji from the different Unicode versions, which should or should not function in the current and updated Mudlet, right?

Unicode 8 released: 2015 June 17

Unicode 9 released: 2016 June 21

Unicode 10 released: 2017 June 20

Unicode 11 released: 2018 June 5

@SlySven
Now, to test this PR, should I expect Mudlet 4.16 to display only Unicode 8 emoji and not Unicode 9 or 10 ones?
And should this PR be able to display all emoji from Unicode 8, 9 and 10, but not from Unicode 11?
Or should we test this in a completely different manner, because I have likely misunderstood something?

src/T2DMap.cpp Outdated
Comment on lines 3714 to 3718
- // 8.0 is the maximum supported by all the Qt versions (>= 5.7.0) we
+ // 10.0 is the maximum supported by all the Qt versions (5.14+) we
  // handle/use/allow - by normalising the symbol we can ensure that
  // all the entered ones are decomposed and recomposed in a
  // "standard" way and will have the same sequence of codepoints:
  newSymbol = newSymbol.normalized(QString::NormalizationForm_C, QChar::Unicode_8_0);
Contributor

Please copy the change to slot_setRoomProperties as well, thanks!

Member

SlySven commented Nov 14, 2022

> Now, to test this PR, should I expect Mudlet 4.16 to display only Unicode 8 emoji and not Unicode 9 or 10 ones?
> And should this PR be able to display all emoji from Unicode 8, 9 and 10, but not from Unicode 11?
> Or should we test this in a completely different manner, because I have likely misunderstood something?

No, this is about updating the process of re-ordering the Unicode codepoints that are used to make (normally) a single grapheme, though there are some corner cases involving multiples, so that they are in the "official" (correct) order for comparisons. It also standardises the way that characters that can be composed both of a single codepoint and of a base character with one or more combining diacriticals are "normalised", so that they can be compared and seen to be the same or not. E.g.

  • é is U+0065 {LATIN SMALL LETTER E} followed by U+0301 {COMBINING ACUTE ACCENT}
  • é is U+00E9 {LATIN SMALL LETTER E WITH ACUTE}

However, literally comparing those strings would say that they are different; normalising them (Canonical form) first would convert the first into the second, and then they WOULD be the same. What this PR is doing is updating the formulas that are used to do this so that they come up to the Unicode 10 standard (the highest that Qt 5.14 supports), whereas they were at 8 before.
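
The é example can be reproduced with a short sketch. This uses Python's standard `unicodedata` module as a stand-in for Qt's `QString::normalized(QString::NormalizationForm_C, ...)`; the variable names and the `nfc` helper are illustrative only, and Python normalises against whatever Unicode version it ships with rather than a version you pass in.

```python
import unicodedata

decomposed = "e\u0301"   # U+0065 LATIN SMALL LETTER E + U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE


def nfc(text: str) -> str:
    """Canonical decomposition followed by canonical composition (NFC)."""
    return unicodedata.normalize("NFC", text)


# A literal comparison sees two different codepoint sequences.
print(decomposed == precomposed)
# Normalising the decomposed form first makes the comparison succeed.
print(nfc(decomposed) == precomposed)
# Show the codepoints that remain after normalisation.
print([f"U+{ord(ch):04X}" for ch in nfc(decomposed)])
```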

To actually test this you would need to find a sequence of codepoints whose normalisation was changed between Unicode 8 and 10, try using it as a room symbol, and then try a different sequence that should compare the same under version 10 but not under 8. I do not know how to determine such a thing...
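
One possible way to hunt for such sequences, sketched under loud assumptions: Python's standard library happens to ship two Unicode databases, the frozen `unicodedata.ucd_3_2_0` (Unicode 3.2) and the current one, so the scan below compares 3.2 against whatever version the Python build uses, not 8 against 10, and it does not involve Qt at all. The `changed_codepoints` function is hypothetical; the same idea would need two normalisers pinned at Unicode 8 and 10 to answer the question for this PR.

```python
import unicodedata

old_ucd = unicodedata.ucd_3_2_0  # frozen Unicode 3.2 database (assumption: stand-in for "old")
new_ucd = unicodedata            # Unicode version shipped with this Python build


def changed_codepoints(form: str = "NFD", limit: int = 0x30000) -> list:
    """Return codepoints below `limit` whose normalisation differs between the two databases."""
    changed = []
    for cp in range(limit):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # skip surrogates, they are not characters
        ch = chr(cp)
        if old_ucd.normalize(form, ch) != new_ucd.normalize(form, ch):
            changed.append(cp)
    return changed


if __name__ == "__main__":
    changed = changed_codepoints()
    print(f"comparing Unicode {old_ucd.unidata_version} with {new_ucd.unidata_version}")
    print(f"{len(changed)} codepoints normalise differently")
    for cp in changed[:5]:
        print(f"U+{cp:04X}", unicodedata.name(chr(cp), "<unnamed>"))
```

Any codepoint this reports is a candidate test case: entering it and its decomposed sequence as two room symbols should compare equal only under the newer rules.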

Member Author

vadi2 commented Nov 14, 2022

In short, this is only for the 'room symbol' feature, so that in case you manage to assemble the same emoji in two or more different ways, this will order everything under the hood in the same way and the number of times an emoji was used can be counted properly.
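
As a rough sketch of why this matters for counting, assuming a made-up mapping of room IDs to symbols (`room_symbols`) and a hypothetical `count_symbols` helper, written with Python's `unicodedata` rather than the Qt calls Mudlet uses: without normalisation the same symbol is split across several buckets, with it the usages are counted together.

```python
import unicodedata
from collections import Counter

# Hypothetical room id -> room symbol data; rooms 1, 2 and 4 are meant to
# carry the same symbol, but it was entered in two different ways.
room_symbols = {
    1: "\u00e9",        # precomposed é
    2: "e\u0301",       # e + combining acute accent
    3: "\U0001F600",    # an emoji, unaffected by normalisation
    4: "e\u0301",
}


def count_symbols(symbols: dict, normalise: bool = True) -> Counter:
    """Count how many rooms use each symbol, optionally normalising to NFC first."""
    counts = Counter()
    for symbol in symbols.values():
        key = unicodedata.normalize("NFC", symbol) if normalise else symbol
        counts[key] += 1
    return counts


print(count_symbols(room_symbols, normalise=False))
print(count_symbols(room_symbols))
```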

@vadi2 vadi2 added this to the 4.17.0 milestone Nov 14, 2022
Contributor

Kebap commented Nov 14, 2022

We can't test if it is working as expected, as we don't have any examples to test.
I am not even sure if it was working before, as again there are no test cases.

For now, I say let's merge and hope it works as expected, otherwise review once a reproducible error is reported.

Contributor

Kebap commented Nov 14, 2022

For some reason, my suggestion was marked as outdated, but it still holds true:

> Please copy the change to slot_setRoomProperties as well, thanks!

@vadi2 vadi2 changed the title from "Improve: improve support for newer emoji's" to "Improve: sort emoji's in room symbols dialog (up to unicode 10)" Nov 15, 2022
Member Author

vadi2 commented Nov 15, 2022

Sure, done.

I've updated the title of the PR to be less misleading. It still isn't quite on point, but the factually correct description would be lost on most people.

@vadi2 vadi2 merged commit e28220b into development Nov 17, 2022
@vadi2 vadi2 deleted the update-normalization-unicode-10 branch November 17, 2022 13:36