[Intl] [Emoji] Move emoji data in a new component #53096
I have no memory of this PR, but there is a thumbs-up from me, so... 😮💨 |
Fabbot fails, but I cannot fix it (it wants to edit the emoji files to add a licence, fix typos, etc.).
In the end, I renamed IntlEmoji to just Emoji, as nothing in Emoji uses or interacts with the Intl component. Moreover, Emoji requires the native PHP intl extension.
My understanding of this topic:
About symfony/intl, I think it does what it should: make the intl data easily available.
I think this approach works, thanks for proposing.
Here are some comments.
symfony new test-webapp --webapp
du -sh test-webapp
# displays 154MB (uncompressed, 141MB just for vendor)
rm -rf test-webapp/vendor/symfony/intl/
du -sh test-webapp
# displays 109MB (still uncompressed, 96MB just for vendor)
this is not true. The polyfills are in |
That is interesting. Really. Because I clearly do not get the same numbers, and I wonder why.
(yes i let you enjoy my |
I'm not sure how I can fix the remaining test... if someone can help me or point me in the right direction, that'd be nice :)
@smnandre just a random comment related to this. In Symfony we have a config option called enabled_locales. We could delete (automatically, or by letting developers do it explicitly via a command provided by us) all the data for languages not included in that config option. We'd delete 95% of the Intl data (emoji transliterators, names of countries/languages/currencies in other languages, etc.) without affecting the application.
I suppose that could solve the "final deployable size" question... even more so if combined with the "compress script". But... that would still download every emoji variant × every language every time you create a webapp, run composer install, or need the CountryValidator. I won't push too hard on this, even less so if I'm far from being in the majority in thinking that this data should not be in the main repo. I guess my POV on this is more "philosophical" or "opinionated" than I imagined... :) As I said while investigating this issue, I have some alternative ideas to reduce/optimise those files; it can be a step in the good* direction! (*according to me :) )
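A minimal dry-run sketch of the pruning idea above. It is not an existing Symfony command: the function name `list_unused_locale_files` is hypothetical, and it assumes that per-locale data files are named `<locale>.<ext>` inside a single directory and that `meta` and `root` files must always be kept. It only prints candidates and deletes nothing.

```shell
# Dry run: print the per-locale data files that are NOT needed by the
# enabled locales. Nothing is deleted; review the output before acting on it.
#
# Assumptions (not confirmed by this thread): files are named
# "<locale>.<ext>" in the given directory, and "meta"/"root" are always kept.
list_unused_locale_files() {
    data_dir=$1
    shift
    for file in "$data_dir"/*; do
        [ -f "$file" ] || continue
        name=$(basename "$file")
        locale=${name%%.*}              # "fr_CA.php" -> "fr_CA"
        keep=no
        for enabled in meta root "$@"; do
            # Keep the enabled locale itself and its parents
            # ("fr" is kept when "fr_CA" is enabled).
            case $enabled in
                "$locale"|"$locale"_*) keep=yes ;;
            esac
        done
        [ "$keep" = yes ] || printf '%s\n' "$file"
    done
}
```

For example, `list_unused_locale_files path/to/data/dir en fr_CA` would list every file except those for `en`, `fr`, `fr_CA`, `meta` and `root`; the directory path is a placeholder to adapt.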
Let's continue with this approach. 7% compressed diff also means less time to uncompress the app so that can lead to a slightly better DX. |
Ok, will do tomorrow. I can reduce the uncompressed data files by 25% by removing all spaces (only 3 to 4% when compressed, as expected). Do we need to run php-cs-fixer on those files?
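A rough way to reproduce this kind of measurement, as a sketch only: `size_report` and `strip_indent` are hypothetical helper names, and `strip_indent` removes leading indentation only (an assumption, so that spaces inside strings stay untouched), which is narrower than "removing all spaces".

```shell
# Print "<raw bytes> <gzip bytes>" for whatever arrives on stdin.
size_report() {
    buf=$(mktemp)
    cat > "$buf"
    raw=$(wc -c < "$buf")
    packed=$(gzip -c < "$buf" | wc -c)
    rm -f "$buf"
    # Arithmetic expansion normalises the padding some wc versions add.
    echo "$((raw + 0)) $((packed + 0))"
}

# Remove leading indentation only; content inside the lines is unchanged.
strip_indent() {
    sed 's/^[[:space:]]*//'
}
```

Running `size_report < some-data-file` and then `strip_indent < some-data-file | size_report` gives the before/after pair. Since gzip already encodes repeated indentation cheaply, the compressed saving is expected to be much smaller than the raw one.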
9696acc
to
fa77c33
Compare
not worth it to me
nope, fabbot can be ignored on those
Ok! I'll fix what I can figure out... and will probably call for a bit of help :)
We should be able to solve the deps=high failure by stating that intl conflicts with string < 7.1
src/Symfony/Component/Intl/Transliterator/EmojiTransliterator.php
LGTM, just one minor thing 🚀
src/Symfony/Component/Intl/Transliterator/EmojiTransliterator.php
💥
9937ab8
to
1fdac4d
Compare
If possible, could you configure fabbot to ignore (cf https://fabbot.io/report/symfony/symfony/53096/25c5b345a64dbce84dd180c8e8c779af5e2d8fe0 ) |
25c5b34
to
bd03efd
Compare
Transfert emoji data from Intl to emoji component
Update main composer.json
Fix phpunit config
Update composer and README descriptions
Fix LICENCE date
Update src/Symfony/Component/Intl/CHANGELOG.md
Co-authored-by: Oskar Stark <[email protected]>
Fix Changelog
I feel that cool resolve some of the recent issues linked to the Profiler.
Rename component Emoji + unlink from Intl (no shared resp/code)
Isolated commit to move data
Update Github worflows
Use Emoji in String component
Update src/Symfony/Component/Emoji/CHANGELOG.md
Co-authored-by: Nicolas Grekas <[email protected]>
Update src/Symfony/Component/Emoji/README.md
Co-authored-by: Nicolas Grekas <[email protected]>
Present the compress command in both README's
Update src/Symfony/Component/Intl/CHANGELOG.md
Co-authored-by: Nicolas Grekas <[email protected]>
Fix main composer.json
Revert symfony/intl requires symfony/emoji
Remove EmojiTransliteratorTrait
Move emoji data
Add "symfony/deprecation-contracts" to Intl
Revert data test split
Add symfony/emoji to String (dev)
Fix String Test namespace
Fix .gitattributes hides "bin/compress" script
Please Psalm ?
Compute quickCheck once
Update LICENCE
Add Intl conflict with string < 7.1
Fix Int changelog
Fix composer.json CS
Throw exception in Intl BC layer when symfony/emoji is not installed
Test Intl & Emoji in the same job
Remove useless check
Remove useless check (without breaking things)
bd03efd
to
f5ba7e3
Compare
(squashed / rebased on 7.1)
Is there anything else I can do here? :)
To make the description fairer, installing |
@stof I updated the description
Feel free to rephrase or edit directly if you have another idea in mind!
Thank you @smnandre.
        "symfony/var-exporter": "^6.4|^7.0"
    },
    "conflict": {
        "symfony/string": "<7.1"
Why do we need the conflict?
Better to link to this message than to try rephrasing it (I'd fail):
#53096 (review)
@@ -24,8 +24,9 @@
    },
    "require-dev": {
        "symfony/error-handler": "^6.4|^7.0",
        "symfony/intl": "^6.4|^7.0",
        "symfony/emoji": "^7.0",
^7.1
(fixed in f0ddd8d)
Here is a small bc break if we do not require symfony/emoji component inside |
This PR moves all the emoji data & code from the Intl component into its own new Emoji component.
Objectives/reasons:
Thanks to all the reviewers for the feedback, opinions, and advice ❤️
--- Original (obsolete) post below ---
This PR moves all the emoji data & code from the Intl component into its own new IntlEmoji component.
... and hopefully opens a debate about the future of the Intl component, its role, and the way we handle "data" in the framework and the repositories
Important
🎙️ DISCLAIMER: This PR contains both metrics and opinions. The metrics were collected this morning in a neutral and transparent manner, ensuring they can be reproduced by anyone. However, the opinions I present here are just that – opinions, and should not be interpreted as objective truths or claims of fact.
Update: details/summary added to improve the readability
What is symfony/intl ?
Repository: https://github.com/symfony/intl
Documentation: https://symfony.com/doc/current/components/intl.html
Unicode: https://home.unicode.org/technical-quick-start-guide/
Responsibilities
Currently, it seems to me this component:
Opinionated remarks
Two comments highlight the "blurred lines" I believe this component navigates:
1) Access or data ?
Maybe my English is at fault, but it seems to me it does not provide access to the data... it provides the data.
2) Unicode = CLDR + ICU + UTC
CLDR (where the emoji data comes from) is not in the ICU library. I'm quibbling over details here, I know. But I think that illustrates the volatile "scope" and "responsibilities" of this component.
So we come close to the problem...
Symfony/Intl is massive
The data included in the Intl component is massive (especially the emoji descriptions), and will grow more every semester.
I looked at the following cases
Versions:
Some metrics...
So symfony/intl accounts for 30% of the files in the monorepo ... and nearly 75% of its total disk size.
...over time
Size (in MB) of the sources
It was already big in previous versions, but since the emoji data integration, it's off the charts.
symfony/intl alone is twice as big as all the other components, all the bridges and all the bundles. Combined.
And it's not over. At all.
Why it'll grow more
The ICU components used in the component are well-defined and constrained by 'real-world' factors, so we can expect minor changes regarding countries, formatting data, etc. It's unlikely, for instance, that 200 new countries will suddenly emerge in 2024.
However, emojis may present a major challenge in the near future. New ones are added with every CLDR release. Except for a significant drop (like the upcoming 2000 hieroglyphs), this should be a gradual increase.
What bothers me more is the 'combinatorial nature' of these descriptions. We generate a line of text for every combination. And that's why this component is so large. But it's just the beginning of what could be exponential growth.
As of today, the 'hand emoji' has variations for skin color (I'm not certain, but let's say there are 6 possible colors), and emojis with multiple people often vary by gender ('boy and two girls').
In the upcoming release, a new variable is the concept of 'left-handed' versus 'right-handed'. So, we'll create a new line for every existing emoji with a visible hand. But we'll need way more than just a new line, because of every emoji where two hands are visible. I don't remember if it's already implemented, but there was discussion about including the same thing for the age of a person, or some hairstyles.
So, the symfony/intl component could very soon be 50GB, and a short while later 10^80TB. But there's no way it reduces in size... or even slows its growth.
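The multiplication behind this concern can be sketched as below. This is an illustration only: `variant_count` is a hypothetical helper, the figure of 6 skin tones is the guess made above, and it assumes every attribute combines freely for every person shown. Real CLDR data does not necessarily define every combination, so treat the result as an upper bound under that assumption.

```shell
# Upper bound on description lines for one base emoji:
# (options per person) ^ (number of people shown),
# assuming every option combines freely with every other.
variant_count() {
    options_per_person=$1
    people=$2
    total=1
    i=0
    while [ "$i" -lt "$people" ]; do
        total=$((total * options_per_person))
        i=$((i + 1))
    done
    echo "$total"
}
```

With 6 skin tones, one person gives `variant_count 6 1` = 6 lines. Adding a two-valued handedness attribute makes 12 options per person, so a two-person emoji goes up to `variant_count 12 2` = 144 lines. Each new attribute multiplies the total instead of adding to it.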
And.... where is the problem ?
I see negative effects on three very different layers.
Developer Experience
Whether these values are low or not in absolute terms (and I have no doubt that everyone will have their own opinion on this)... the reality is that users are downloading a component that is twice as heavy as all the others combined... and this inevitably affects installation times, bandwidth, update times, static analysis, IDE indexing, etc. A prime example is Docker on macOS, which was a real pain until recently with Orbstack, and the performance nightmare was directly related to the number of files mounted in a volume.
Contributor experience
I've lost count of how many times I've seen a contributor propose a feature only to be told: it's userland. (Full disclosure, I understand and share this point of view). But it can be frustrating to see closed doors for a few classes, while at the same time Symfony contains hundreds of lines like 'young woman with dark hair and kid'
Real world consequences: Ecological & financial costs
I have no desire to open a debate (on either of those topics). But again, these small things have real-world consequences. We are talking about Symfony, so the impact is enormous, even on small matters
What is the real impact ?
Downloads data, as provided by packagist (collected today)
Let's agree on: "it's not without an impact".
Why is it used ?
For its quality
Please don't misinterpret my message. I'm not criticizing the value of the component or questioning its qualities. Besides, my opinion wouldn't have any value on that matter anyway. And I'm absolutely convinced a lot of people decide to install this component knowing what they are doing.
For another reason
But there are also people who install... Symfony. The recommended installation procedure, as outlined in the documentation on the website, is to install the web application skeleton, which requires symfony/intl.
To revisit the argument from earlier, I'm really not sure anyone realizes after installation why their vendor directory is 80MB and what it's used for (young woman with...).
I'm unsure why symfony/intl is included by default in a new project, while other components are not. As a developer, I would appreciate the ability to install a small, lightweight application or to have more packages for the same amount of overhead :)
For another reason (bis)
There is a third reason, and once again, I don't fully understand the situation (and may not have all the backstory required for it).
The Country validator requires symfony/intl to validate a given string as a valid ISO alpha country code. To do this, it tries to retrieve the list of country names (indexed by code) from the locale data. Consequently, it's not possible to use BIC, Country, Currency, and probably others without symfony/intl.
So, if a developer installs symfony/validator and then wants to validate a BIC, they cannot do so without downloading 80MB of locale-specific data.
Wouldn't it be simpler to have a couple of ISO classes/methods in the Validator component? Or perhaps create a small component just for that purpose? Because having to parse giant files just to check if "FR" is a valid ISO country code seems quite inefficient to me.
Well: Suggestions
So, a personal conclusion and some suggestions...
TODO list
Sooner
Later
Discussions
--
Open to any feedback :)