[RFC] Exclude locales from language codes #33146

Closed
ro0NL opened this issue Aug 13, 2019 · 20 comments
Labels
Intl RFC RFC = Request For Comments (proposals about features that you want to be discussed)

Comments

Contributor

ro0NL commented Aug 13, 2019

In #33136 @TerjeBr discovered a few weird "language codes", e.g.

"en_AU": "Australian English",
"en_CA": "Canadian English",
"en_GB": "British English",
"en_US": "American English",

These are a combination of language code + country code, thus a locale actually.

I don't think one should be able to select e.g. en_AU as a language code, that would be unexpected data. In this case one should simply select en (English) for a language, and e.g. compose it with a known country code to get a locale, or select a pre-defined locale from the list.

So ultimately this data would be replaced by the locale domain, e.g.

This means we lose some specific language name translations, but I tend to believe we should do it. (IMHO it's not worth preserving "Australian English" somewhere when translating a locale; "English (Australia)" could be sufficient.)

TerjeBr commented Aug 13, 2019

I disagree with this.

I agree it could be useful to exclude it from a language drop down list in a form component.
But the codes should not be excluded from the list of languages. At the very least it should be possible to get the language name with Languages::getName().

Here is the full list of the 22 entries we are talking about:

  'en_US' => 'American English',
  'en_AU' => 'Australian English',
  'de_AT' => 'Austrian German',
  'pt_BR' => 'Brazilian Portuguese',
  'en_GB' => 'British English',
  'en_CA' => 'Canadian English',
  'fr_CA' => 'Canadian French',
  'sw_CD' => 'Congo Swahili',
  'fa_AF' => 'Dari',
  'pt_PT' => 'European Portuguese',
  'es_ES' => 'European Spanish',
  'nl_BE' => 'Flemish',
  'es_419' => 'Latin American Spanish',
  'es_MX' => 'Mexican Spanish',
  'ar_001' => 'Modern Standard Arabic',
  'ro_MD' => 'Moldavian',
  'sr_ME' => 'Montenegrin',
  'zh_Hans' => 'Simplified Chinese',
  'fr_CH' => 'Swiss French',
  'de_CH' => 'Swiss High German',
  'zh_Hant' => 'Traditional Chinese',
  'nds_NL' => 'West Low German',

How can you get e.g. "Dari", "Traditional Chinese" or "Simplified Chinese" without this list?

It is also very useful as a reference for which language and locale combinations actually define a separate language.

Member

fabpot commented Aug 13, 2019

fr_CA is different from fr_FR and fr_CH. That's not the same language.

TerjeBr commented Aug 13, 2019

Or would you list e.g. "Flemish" as "Dutch (Belgium)"?

Contributor Author

ro0NL commented Aug 13, 2019

> Or would you list e.g. "Flemish" as "Dutch (Belgium)"

Using "Dutch (Belgium)" / "French (Belgium)" is common, yes (at least in technical implementations), but I agree these names (e.g. Flemish) are also common in general. So the data has value, fair :)

> At the very least it should be possible to get the language name with Languages::getName().

This takes an ISO language code, and en_AU is not a language code (en is). So I'm skeptical about solving it in the Languages domain.

However, we could leverage this data when translating a locale, e.g. a lookup for specialized language names around here:

$name = str_replace(['(', ')'], ['[', ']'], $reader->readEntry($tempDir.'/lang', $displayLocale, ['Languages', \Locale::getPrimaryLanguage($locale)]));

I agree that for the locale selector "Flemish (Belgium)" would be better than "Dutch (Belgium)"; for the language selector I'd expect "Dutch" only (https://en.wikipedia.org/wiki/Flemish - it's a dialect really).

TerjeBr commented Aug 13, 2019

What about these lines in the list of locales:

"zh_Hans": "Chinese (Simplified)",
"zh_Hans_CN": "Chinese (Simplified, China)",
"zh_Hans_HK": "Chinese (Simplified, Hong Kong SAR China)",
"zh_Hans_MO": "Chinese (Simplified, Macao SAR China)",
"zh_Hans_SG": "Chinese (Simplified, Singapore)",
"zh_Hant": "Chinese (Traditional)",
"zh_Hant_HK": "Chinese (Traditional, Hong Kong SAR China)",
"zh_Hant_MO": "Chinese (Traditional, Macao SAR China)",
"zh_Hant_TW": "Chinese (Traditional, Taiwan)",

It clearly shows that zh_Hans and zh_Hant should be considered separate language codes that can be combined with a separate country code.

Contributor Author

ro0NL commented Aug 13, 2019

Hans/Hant is the script code; to me the language is "Chinese" for both.

Same for en_US vs en_GB, we consider it the "English" language.

TerjeBr commented Aug 13, 2019

'es_419' => 'Latin American Spanish',
'ar_001' => 'Modern Standard Arabic',

are also entries that do not follow the language_country pattern.

I think all this is data that is worth exposing to the programmers.
Then they can easily filter out what they don't need. But it is impossible to add what is not there in the first place.
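
A minimal sketch of what such filtering could look like on the consumer side. The `$languages` array is a hand-picked sample rather than the real data set, and treating "contains an underscore" as "is a locale-like code" is an assumption made for illustration:

```php
<?php

// Hypothetical sample of the language data; the real list is much longer.
$languages = [
    'en' => 'English',
    'en_AU' => 'Australian English',
    'nl' => 'Dutch',
    'nl_BE' => 'Flemish',
    'zh' => 'Chinese',
    'zh_Hant' => 'Traditional Chinese',
    'es_419' => 'Latin American Spanish',
];

// Keep only plain language codes, i.e. keys without a script or region subtag.
$plainLanguages = array_filter(
    $languages,
    static function (string $code): bool {
        return false === strpos($code, '_');
    },
    ARRAY_FILTER_USE_KEY
);

// Remaining keys: en, nl, zh
print_r(array_keys($plainLanguages));
```

The full list stays available for name lookups, while a drop-down can use the shorter one.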

Contributor Author

ro0NL commented Aug 13, 2019

> are also entries that do not follow the language_country pattern.

They do :) 419 / 001 are valid region codes as well, so they are still part of the locale.
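
To illustrate how these identifiers decompose, here is a simplified sketch that splits a locale string into language, script and region subtags, accepting both two-letter and three-digit region codes. `splitLocale()` is a hypothetical helper written for this example and ignores variants and extensions; PHP's intl extension provides `\Locale::parseLocale()` for real-world use.

```php
<?php

/**
 * Splits a locale identifier such as "zh_Hant_TW" into its subtags.
 * Simplified sketch: variants and extensions are not handled.
 *
 * @return array{language: string, script: ?string, region: ?string}
 */
function splitLocale(string $locale): array
{
    // language: 2-3 lowercase letters; script: 4 letters, title-cased;
    // region: 2 uppercase letters or 3 digits (e.g. 419, 001).
    $pattern = '/^(?<language>[a-z]{2,3})'
        .'(?:_(?<script>[A-Z][a-z]{3}))?'
        .'(?:_(?<region>[A-Z]{2}|[0-9]{3}))?$/';

    if (!preg_match($pattern, $locale, $m)) {
        throw new InvalidArgumentException(sprintf('Cannot parse locale "%s".', $locale));
    }

    return [
        'language' => $m['language'],
        'script' => ($m['script'] ?? '') !== '' ? $m['script'] : null,
        'region' => ($m['region'] ?? '') !== '' ? $m['region'] : null,
    ];
}

var_dump(splitLocale('es_419'));  // language "es", no script, region "419"
var_dump(splitLocale('zh_Hans')); // language "zh", script "Hans", no region
```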

TerjeBr commented Aug 13, 2019

Yes, but my point is that these are special language/region combinations that have their own names as languages in the ISO 639-1 specification. I think we should keep them. Otherwise it will also be a BC break.

Contributor Author

ro0NL commented Aug 13, 2019

True, we have removed codes before though. But yes, it's tricky and technically a BC break / but maybe also a bugfix 😓

From a best-practice point of view, I'd question the usefulness of the language selector itself. Using the locale selector avoids these issues and will provide the language code as well (but based on context).

@javiereguiluz javiereguiluz added the RFC RFC = Request For Comments (proposals about features that you want to be discussed) label Aug 13, 2019
TerjeBr commented Aug 13, 2019

I am sorry, I do not follow you. What do you mean by "language selector", "language code" and "locale selector"?

Contributor Author

ro0NL commented Aug 13, 2019

see e.g. #28833

when we exclude "language codes" from the Intl component, we directly impact the form type (what i call the language selector)

TerjeBr commented Aug 13, 2019

Well, the form type can implement its own filter. This is useful in many other programming situations than just a language selector in a form.

What is then the "locale selector" you are talking about?

TerjeBr commented Aug 13, 2019

> > At the very least it should be possible to get the language name with Languages::getName().
>
> This takes an ISO language code, en_AU is not a language code (en is). So I'm skeptical about solving it in the Languages domain.

But ISO 639-1 claims that en_AU is a language code.

Contributor Author

ro0NL commented Aug 13, 2019

> What is then the "locale selector" you are talking about?

The locale form type :)

Not sure the form should filter the list itself; it's the same data domain. So the question should be answered in the Intl component IMHO.

> But ISO 639-1 claims that en_AU is a language code.

Can you share a reference for that?

TerjeBr commented Aug 13, 2019

Also, locale names are not the same as language names. If I want a name for a language, I would not expect to have to look for it among the locale names.

Contributor Author

ro0NL commented Aug 13, 2019

> locale names are not the same as language names

But a locale name includes the language name.

> If I want a name for a language, I would not expect to have to look for it among the locale names.

What about updating Languages::getName to take a locale then? It seems reasonable, because a language code also qualifies as a locale.

TerjeBr commented Aug 13, 2019

> > What is then the "locale selector" you are talking about?
>
> The locale form type :)

Ok. Good that we got that cleared up.

> not sure the form should filter the list itself, it's the same data domain. So the question should be answered in the Intl component IMHO.

I think it should filter the list, if it finds it useful to have a shorter list. Then other use cases can benefit from the full list.

> > But ISO 639-1 claims that en_AU is a language code.
>
> Can you share a reference for that?

Sorry, I guess you are right about that. It is not a part of the standard. I assumed it was because it was in the data we got from the source bundle.

I guess this is the authoritative list for the ISO 639 standard: http://www.loc.gov/standards/iso639-2/php/code_list.php

TerjeBr commented Aug 13, 2019

> but a locale name includes the language name

Ok. Fair enough. (But it is a bit of a roundabout way to get the language name.)

> what about updating Languages::getName to take a locale then, it seems reasonable because a language code also qualifies as a locale.

Yes, I find that reasonable. It should take a language code or a locale and return the best fit for a language that it can find.
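
A rough sketch of what such a "best fit" lookup might look like. `bestLanguageName()` and both constant arrays are hypothetical and only hold sample data; this is not how `Languages::getName()` is actually implemented:

```php
<?php

// Hypothetical sample data: plain language names plus specialized names.
const LANGUAGE_NAMES = [
    'en' => 'English',
    'nl' => 'Dutch',
    'zh' => 'Chinese',
];
const SPECIALIZED_NAMES = [
    'en_AU' => 'Australian English',
    'nl_BE' => 'Flemish',
    'zh_Hant' => 'Traditional Chinese',
];

/**
 * Returns the most specific language name known for a language code or locale.
 */
function bestLanguageName(string $codeOrLocale): string
{
    $names = SPECIALIZED_NAMES + LANGUAGE_NAMES;
    $candidate = $codeOrLocale;

    // Drop the last subtag until a name is found: zh_Hant_TW -> zh_Hant -> zh.
    while (true) {
        if (isset($names[$candidate])) {
            return $names[$candidate];
        }

        $pos = strrpos($candidate, '_');
        if (false === $pos) {
            throw new InvalidArgumentException(sprintf('No language name for "%s".', $codeOrLocale));
        }

        $candidate = substr($candidate, 0, $pos);
    }
}

echo bestLanguageName('zh_Hant_TW'), "\n"; // Traditional Chinese
echo bestLanguageName('en_US'), "\n";      // English (no specialized name in the sample)
```

Falling back subtag by subtag means a plain language code keeps working unchanged, while a locale yields the specialized name when one exists.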

TerjeBr commented Aug 13, 2019

We could also either add an argument to getNames or add another method, so we can both get an exhaustive list of all possible arguments to getName and also (less important in my opinion) a more reasonable limited list for use in a language selector.

@xabbuh xabbuh added the Intl label Aug 14, 2019
@fabpot fabpot closed this as completed Sep 27, 2019
fabpot added a commit that referenced this issue Sep 27, 2019
…lized language names) (ro0NL)

This PR was merged into the 4.4 branch.

Discussion
----------

[Intl] Excludes locale from language codes (split localized language names)

| Q             | A
| ------------- | ---
| Branch?       | 4.4
| Bug fix?      | no
| New feature?  | yes
| BC breaks?    | no     <!-- see https://symfony.com/bc -->
| Deprecations? | no
| Tests pass?   | yes    <!-- please add some, will be required by reviewers -->
| Fixed tickets | #33146
| License       | MIT
| Doc PR        | symfony/symfony-docs#... <!-- required for new features -->

(includes #33140)

Commits
-------

1a9f517 [Intl] Excludes locale from language codes (split localized language names)