[Intl] Languages and ISO 639-2 three-letter language codes #33136

Closed
@TerjeBr

Description

In my previous PR, "Support ISO 3166-1 Alpha-3 country codes", I added support for alpha-3 country codes and also extended the Languages class to support ISO 639-2 three-letter language codes. My focus was on the country codes, and with those it was easy to get it right because every country has exactly one alpha2 code and one alpha3 code.

For languages, things are a bit more complicated. The extension to the Languages class was made to simply mirror the Countries class, and not enough thought went into it. Here are some of the problems I can now see (after thinking about it) with Languages:

Wrong terminology

For languages we have ISO 639-1, which covers two-letter codes, and ISO 639-2, which covers three-letter codes. Nowhere in the ISO specifications do they talk about "alpha2" or "alpha3". Those terms are borrowed from the ISO 3166-1 spec, which covers country/region codes. So anyone familiar with ISO 639-1 and ISO 639-2 will wonder why our code has method names and variable names with "alpha2" and "alpha3" in them.

Missing languages

This is a more serious issue. Not all languages have an ISO 639-1 two-letter code. Here are some examples of the exceptions:

  • 409 languages (out of 615) have only a three-letter code and no corresponding two-letter code, for example:
 ace => Achinese
 ach => Acoli
 arz => Egyptian Arabic
  • 22 languages must be described with more than three letters, for example:
 en_AU => Australian English
 de_AT => Austrian German
 zh_Hans => Simplified Chinese
  • 4 languages have only a two-letter code and no three-letter code:
 no => Norwegian
 sh => Serbo-Croatian
 tl => Tagalog
 tw => Twi
  • Only 180 languages have a two-letter to three-letter mapping.
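
To make the four groups concrete, here is a minimal sketch in Python (not Symfony code) that sorts language keys into them. It assumes that a language is keyed by its two-letter code whenever it has one, so a three-letter key means there is no two-letter code. The `NAMES` and `TWO_TO_THREE` tables are a hypothetical sample built from the examples above, plus `en` => `eng` as one mapped code; `classify` and `count_by_group` are illustrative names only.

```python
# Hypothetical sample of language keys, taken from the examples above plus
# "en" => "eng" as one code that has a two-to-three-letter mapping.
NAMES = {
    "ace": "Achinese",
    "ach": "Acoli",
    "arz": "Egyptian Arabic",
    "en_AU": "Australian English",
    "de_AT": "Austrian German",
    "zh_Hans": "Simplified Chinese",
    "no": "Norwegian",
    "sh": "Serbo-Croatian",
    "tl": "Tagalog",
    "tw": "Twi",
    "en": "English",
}
TWO_TO_THREE = {"en": "eng"}


def classify(code: str) -> str:
    """Return which of the four groups a known language key falls into."""
    if code not in NAMES:
        raise KeyError(f"unknown language code: {code}")
    if len(code) > 3:
        return "longer than three letters"
    if len(code) == 3:
        # Assumption: a three-letter key is only used when no two-letter code exists.
        return "three-letter only"
    if code in TWO_TO_THREE:
        return "two-letter with three-letter mapping"
    return "two-letter only"


def count_by_group(codes) -> dict:
    """Count how many of the given codes fall into each group."""
    counts = {}
    for code in codes:
        group = classify(code)
        counts[group] = counts.get(group, 0) + 1
    return counts
```

Run over the full data set instead of this sample, the same counting should give the 409 / 22 / 4 / 180 split listed above, provided the keying assumption holds.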

Implications for Languages methods

  • Languages::getAlpha3Code: This is the only one that is not new. It throws MissingResourceException for all but the 180 languages in the last category, which does not seem right to me. Maybe it should accept any code longer than two letters as input and return it unchanged if it is a valid language code?
  • Languages::getAlpha2Code: Same considerations as above. Currently throws MissingResourceException for all but the 180 languages that have a two-letter code.
  • Languages::getAlpha3Codes(): Currently only returns 180 codes.
  • Languages::alpha3CodeExists: Currently returns true only for the above-mentioned 180 codes.
  • Languages::getAlpha3Name: Throws MissingResourceException if the language is not among the 180.
  • Languages::getAlpha3Names: Only returns a list of 180 languages. By contrast, Languages::getNames returns a list of 615 languages.
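
The pass-through idea floated for getAlpha3Code could look roughly like the sketch below. It is written in Python rather than as a patch to the Languages class, and it is only one possible reading of that suggestion, not current behaviour. The tables are a hypothetical sample, and `to_three_letter` and `MissingResourceError` are illustrative stand-ins for the real method and for MissingResourceException.

```python
# Hypothetical sample data: language keys with names, and the subset of
# two-letter codes that have a three-letter equivalent.
NAMES = {
    "en": "English",
    "no": "Norwegian",
    "ace": "Achinese",
    "zh_Hans": "Simplified Chinese",
}
TWO_TO_THREE = {"en": "eng"}


class MissingResourceError(LookupError):
    """Stand-in for MissingResourceException in this sketch."""


def to_three_letter(code: str) -> str:
    """Map a language code to a three-letter (or longer) code.

    Two-letter codes are translated through the mapping table. Codes that
    are already longer than two letters are returned unchanged, as long as
    they are valid language codes.
    """
    if code in TWO_TO_THREE:
        return TWO_TO_THREE[code]
    if len(code) > 2 and code in NAMES:
        return code
    raise MissingResourceError(f"no three-letter code for {code!r}")
```

Under this reading, `to_three_letter("ace")` returns `"ace"` instead of failing, while the four languages that have only a two-letter code (such as `no`) would still raise, so the open question of what to do with them remains.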

What to do

I am seeking input from others on what to do about this. Please comment below.
