Thanks to visit codestin.com
Credit goes to github.com

Skip to content

[Intl] Apply localeDisplayPattern and fix locale generation #31337

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 7, 2019
Merged

[Intl] Apply localeDisplayPattern and fix locale generation #31337

merged 2 commits into from
May 7, 2019

Conversation

ro0NL
Copy link
Contributor

@ro0NL ro0NL commented May 1, 2019

Q A
Branch? 3.4
Bug fix? yes
New feature? no
BC breaks? no (few data changes)
Deprecations? no
Tests pass? yes (including intl-data group)
Fixed tickets #...
License MIT
Doc PR symfony/symfony-docs#...

See https://github.com/unicode-org/icu/blob/e2d85306162d3a0691b070b4f0a73e4012433444/icu4c/source/data/lang/en.txt#L1281-L1285

Technically, this should be applied here:

$name .= ' ('.implode(', ', $extras).')';

This PR aims to implement it, but before it got to this point i noticed a false positive during generation (AFAIK). The current state solves this issue first.

While at it, i cleaned up dead-code in update-data.php to reduce the noise.

}

// Some languages are translated together with their region,
// i.e. "en_GB" is translated as "British English"
Copy link
Contributor Author

@ro0NL ro0NL May 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not applicable; we get the language name based on Locale::getPrimaryLanguage, which is en for en_GB already.

Thus we translate the expected language name already.


// Some languages are not translated
// Example: "az" (Azerbaijani) has no translation in "af" (Afrikaans)
if (null === ($name = $this->languageDataProvider->getName($lang, $displayLocale))) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null API is not applicable, it throws instead. This is try/catched on the calling side already

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

besides, this case is translated in fact :D

// i.e. "en_GB" is translated as "British English"
// we don't include these languages though because they mess up
// the name sorting
// $name = $this->langBundle->getLanguageName($displayLocale, $lang, $region);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dead code ($this->langBundle)

}

// "as" (Assamese) has no "Variants" block
//if (!$langBundle->get('Variants')) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dead code ($langBundle)

// For example, in German, zh_Hans is "Chinesisch (vereinfacht)".
// The latter is the script part which is already included in the
// extras and will be appended again with the other extras.
if (preg_match('/^(.+)\s+\([^\)]+\)$/', $name, $matches)) {
Copy link
Contributor Author

@ro0NL ro0NL May 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this causes the false positive, as some primary language names matches

@@ -409,8 +409,8 @@
"my": "бірманская",
"my_MM": "бірманская (М’янма (Бірма))",
"nb": "нарвежская (букмол)",
"nb_NO": "нарвежская (Нарвегія)",
"nb_SJ": "нарвежская (Шпіцберген і Ян-Маен)",
"nb_NO": "нарвежская (букмол) (Нарвегія)",
Copy link
Contributor Author

@ro0NL ro0NL May 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we look at the primary language, e.g. nb we see it already includes (букмол) (this seems consistent across the changeset).

Other locales use different/no interpunction here, e.g. en

        "nb": "Norwegian Bokmål",
        "nb_NO": "Norwegian Bokmål (Norway)",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Google confirms букмол means Bokmål :)

@@ -37,6 +37,7 @@
"ks": "kashmiri",
"ks_IN": "kashmiri (Inde)",
"lu": "luba-katanga",
"lu_CD": "luba-katanga (Congo-Kinshasa)",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"zh_MO": "кытай (Макао Махсус Идарәле Төбәге)",
"zh_SG": "кытай (Сингапур)",
"zh_TW": "кытай (Тайвань)"
"zh_CN": "кытай (тәрҗемә киңәше: аерым алганда, мандарин кытайчасы) (Кытай)",
Copy link
Contributor Author

@ro0NL ro0NL May 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks scary, but seems the legit translation of zh primary language :| (see https://github.com/unicode-org/icu/blob/master/icu4c/source/data/lang/tt.txt)

playing a bit with google translate i get Chinese (Interpretation of Interpretation: Mandarin Chinese), but im not sure if anyone speaks tt (Tatar?)

im still convinced the current regex wasnt anticipating it explicitly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as we already used this translation for zh directly (see above), im not sure we should care for now.

From here on we should rely on native speakers to add a possible "per case fix" IMHO.

@nicolas-grekas nicolas-grekas added this to the 3.4 milestone May 1, 2019
@ro0NL
Copy link
Contributor Author

ro0NL commented May 1, 2019

to obtain the localeDisplayPattern we need to get it from the region domain, as such we need to temporary cross compile it from the locale generator.

To do so, i've refactored the LocaleDataGenerator to extend from AbstractDataGenerator, just like the other data generators. It's makes it a bit easier to follow IMHO, and a decent amount of logic was duplicated.

cc @webmozart

@ro0NL ro0NL mentioned this pull request May 1, 2019
@ro0NL
Copy link
Contributor Author

ro0NL commented May 1, 2019

Don't use the characters "(" and ")", since they will be confusing in combination with countries or scripts in more complex language names. If you have to use brackets, use square ones: [ and ].
(http://cldr.unicode.org/translation/language-names)

I think this is up to ICU to follow yes/no, they don't :)

As such we can replace the () in the primary language name with [] genericly. Im hesitant to remove it as a whole, as it clearly creates a difference with other locales (en, nl) who still include it, but use different/no interpunction. Then if a native speaker confirms the [...] is nonsense, we remove it per case.

@ro0NL ro0NL changed the title [Intl] Apply localeDisplayPattern and fix locale generation [Intl] Fix locale name generation May 1, 2019
@ro0NL ro0NL marked this pull request as ready for review May 1, 2019 13:26
@ro0NL ro0NL changed the title [Intl] Fix locale name generation [Intl] Apply localeDisplayPattern and fix locale generation May 1, 2019
@ro0NL
Copy link
Contributor Author

ro0NL commented May 1, 2019

Ready for me 👍 , i'll send a follow up PR for master, to patch the timezone generator.

@ro0NL
Copy link
Contributor Author

ro0NL commented May 7, 2019

This diff should be re-applied on 4.2: https://www.diffchecker.com/m77BMN8j Sorry if this is causing inconvenience :)

The important change here is we expose BundleEntryReaderInterface to generators, this enables reading ICU data with fallback support, and helps us to get the correct display format or any other metadata needed.

I realized i've effectively rebuild this in master when adding timezones:

$accessor = static function (array $indices, &$inherited = false) use ($localeBundles, $accessor) {

This should leverage the BundleEntryReaderInterface instead.

@fabpot
Copy link
Member

fabpot commented May 7, 2019

@ro0NL Can you create a PR?

@ro0NL
Copy link
Contributor Author

ro0NL commented May 7, 2019

See #31402

@fabpot
Copy link
Member

fabpot commented May 7, 2019

Thank you @ro0NL.

@fabpot fabpot merged commit a20a6cc into symfony:3.4 May 7, 2019
fabpot added a commit that referenced this pull request May 7, 2019
…ion (ro0NL)

This PR was squashed before being merged into the 3.4 branch (closes #31337).

Discussion
----------

[Intl] Apply localeDisplayPattern and fix locale generation

| Q             | A
| ------------- | ---
| Branch?       | 3.4
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no (few data changes)
| Deprecations? | no
| Tests pass?   | yes (including intl-data group)
| Fixed tickets | #...   <!-- #-prefixed issue number(s), if any -->
| License       | MIT
| Doc PR        | symfony/symfony-docs#... <!-- required for new features -->

See https://github.com/unicode-org/icu/blob/e2d85306162d3a0691b070b4f0a73e4012433444/icu4c/source/data/lang/en.txt#L1281-L1285

Technically, this should be applied here: https://github.com/symfony/symfony/blob/2b923a7c03e1ed1540c9dba224242defd9aa75cd/src/Symfony/Component/Intl/Data/Generator/LocaleDataGenerator.php#L211

This PR aims to implement it, but before it got to this point i noticed a false positive during generation (AFAIK). The current state solves this issue first.

While at it, i cleaned up dead-code in `update-data.php` to reduce the noise.

Commits
-------

a20a6cc recompile
29e8aba [Intl] Apply localeDisplayPattern and fix locale generation
@ro0NL ro0NL deleted the locale-format branch May 7, 2019 10:00
fabpot added a commit that referenced this pull request May 7, 2019
…neration (ro0NL)

This PR was squashed before being merged into the 4.2 branch (closes #31402).

Discussion
----------

[Intl][4.2] Apply localeDisplayPattern and fix locale generation

| Q             | A
| ------------- | ---
| Branch?       | 4.2
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no     <!-- see https://symfony.com/bc -->
| Deprecations? | no
| Tests pass?   | yes    <!-- please add some, will be required by reviewers -->
| Fixed tickets | #...   <!-- #-prefixed issue number(s), if any -->
| License       | MIT
| Doc PR        | symfony/symfony-docs#... <!-- required for new features -->

Follow up of #31337 for 4.2

Commits
-------

c902c8a recompile
a3164de re-apply translator parents.json generation
294ae7a [Intl] Apply localeDisplayPattern and fix locale generation
fabpot added a commit that referenced this pull request May 7, 2019
…neration (ro0NL)

This PR was squashed before being merged into the 4.3-dev branch (closes #31403).

Discussion
----------

[Intl][4.3] Apply localeDisplayPattern and fix locale generation

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no     <!-- see https://symfony.com/bc -->
| Deprecations? | no
| Tests pass?   | yes    <!-- please add some, will be required by reviewers -->
| Fixed tickets | #...   <!-- #-prefixed issue number(s), if any -->
| License       | MIT
| Doc PR        | symfony/symfony-docs#... <!-- required for new features -->

Follow up of #31337 & #31402  for 4.3 / master

Commits
-------

ed04f24 recompile
504db27 re-apply api changes for TimezoneDataGenerator
c443e78 re-apply translator parents.json generation
bf50b61 [Intl] Apply localeDisplayPattern and fix locale generation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants