Multilanguage support for markers used in data #4096

Closed
pekkaklarck opened this issue Sep 24, 2021 · 24 comments
Labels
acknowledge To be acknowledged in release notes alpha 1 enhancement help wanted Extra help appreciated priority: critical

@pekkaklarck
Member

Currently we can only use English with section headers, settings, and other such markers in the data. For example:

*** Settings ***
Documentation     Some doc here.

Supporting other languages would be great. For example, the above could look like this in Finnish:

*** Asetukset ***
Dokumentaatio    Jotain dokua täällä.

Getting this done requires:

  1. Deciding how to enable support for different languages. Probably with a command line option like --language FI.
  2. Deciding how to handle multiple languages in one execution. Could be --language FI --language EN, which would also allow customizing how a certain language is used, e.g. --language FI:some:config.
  3. Deciding what to translate. Section headers and settings for sure, but what about control structures like IF/ELSE? Perhaps we should allow translating everything and let translators choose whether to use the common IF/ELSE or some local variant. This could also be customizable when enabling support for a certain language.
  4. Getting translations for built-in support for common languages. These can be added in future releases.
  5. Deciding how to use languages that don't have built-in support. Perhaps --language mylang.txt.
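
As a rough illustration of items 1, 2 and 5, the sketch below parses a repeatable --language option with Python's argparse. Robot Framework has its own argument parser, so this is only a sketch under assumptions: the helper name parse_language_option and the colon-separated configuration syntax are hypothetical, derived from the FI:some:config idea above.

```python
import argparse


def parse_language_option(value):
    """Split 'FI:some:config' into a name and a list of config items.

    Hypothetical syntax based on the proposal above; nothing here is
    an actual Robot Framework API.
    """
    name, *config = value.split(":")
    return name, config


parser = argparse.ArgumentParser()
# action="append" collects every --language occurrence into one list.
parser.add_argument("--language", action="append", type=parse_language_option,
                    default=[])

args = parser.parse_args(["--language", "FI", "--language", "EN:some:config"])
print(args.language)
```

A custom language file such as mylang.txt could go through the same option. One design caveat: a colon separator would clash with Windows paths that contain a drive letter.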

I hope the wider community can help with translations when we get that far. Comments about the design decisions above are appreciated as well.

I consider making Robot Framework more native in non-English speaking languages pretty important and thus assign this to RF 5.0 scope with critical priority. The old issue about non-English Given/When/Then prefixes (#519) is highly related and already in RF 5.0 scope.

There are many other related enhancements like translating log and report, translating error messages, translating keywords used in generic libraries, and so on. They all require their own issues and probably need to wait for future RF versions.

@pekkaklarck pekkaklarck added this to the v5.0 milestone Sep 24, 2021
@bhirsz
Contributor

bhirsz commented Sep 24, 2021

Leaving a note here so I remember to add an option for robotidy to autotranslate tokens. It would also be interesting to add support for Polish, although I have no idea how to translate some of the keywords such as [Teardown] (probably not [Zburzyć] :D). To avoid problems with inaccurate translations, maybe it would be a good idea to let people vote on certain translations, or to use https://crowdin.com.

@pekkaklarck
Member Author

Yeah, coming up with good translations for some of these terms is going to be hard. No idea what Teardown could be in Finnish either. A major problem related to this is that changing terms would be backwards incompatible. Perhaps we should support English always so that hard-to-translate terms could be initially skipped.

@Snooz82
Member

Snooz82 commented Oct 1, 2021

Great Idea.

@pekkaklarck pekkaklarck modified the milestones: v5.0, v5.1 Jan 17, 2022
@pekkaklarck
Member Author

We decided to move this one and the highly related non-English Given/When/Then support (#519) to RF 5.1. That is not because we consider these issues unimportant, quite the contrary, but simply because RF 5.0 already contains a lot of great new features and we want to get it out as soon as possible. These translation related issues will be the highest priority issues in RF 5.1 and the plan is to start its development right after RF 5.0 has been released.

@yanne
Member

yanne commented Apr 9, 2022

We need to make some technical choices before proceeding with the implementation.
The goals here are:

  • offer a simple way for users to provide translations
  • use a file format which can be parsed easily with the Python standard library
  • introduce no performance penalty

Since this is mostly a lookup problem (does a given string, say *** testitapaukset ***, represent a valid token), some kind of configuration file is most likely appropriate. A full-blown translation framework like gettext might be too complicated.

In addition, it needs to be decided whether we support multiple forms for some tokens. For example, currently the table headers allow both singular and plural English form (test case and test cases). For the sake of simplicity and clarity, it might be better to allow just one form in the translations.

Based on all the above, some options for the configuration file format are:

Of these, INI files and JSON are the ones having parsers in the Python standard library.
JSON is quite unwieldy to edit by hand and it does not support comments. Additionally, no benefit is gained from its support for a wider range of data types than INI files.
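
To make the INI option concrete, here is a minimal sketch that reads translations with configparser from the standard library and turns them into a lookup table. The file layout (a [headers] section with English names as keys) and the function name load_header_lookup are assumptions made for illustration, not an agreed format.

```python
import configparser

# Hypothetical translation file content; the section and key names are
# illustrative assumptions, not an agreed format.
FINNISH_INI = """
[headers]
# Comments are allowed in INI files, unlike in JSON.
Settings = Asetukset
Test Cases = Testitapaukset
"""


def load_header_lookup(ini_text):
    """Return a dict mapping lower-cased translated headers to English names."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # Preserve the case of the English key names.
    parser.read_string(ini_text)
    return {translated.lower(): english
            for english, translated in parser["headers"].items()}


lookup = load_header_lookup(FINNISH_INI)
print(lookup.get("testitapaukset"))
```

Building the dictionary once at startup keeps each later check a plain dict lookup, which fits the goal of avoiding a performance penalty.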

@mkorpela
Member

mkorpela commented Apr 9, 2022

I would really like results to be considered as well. What does log.html look like,
and could we allow automatic translation of results to some extent?

@mkorpela
Member

mkorpela commented Apr 9, 2022

@yanne one thing I would also consider is existing standards and work beyond the Python standard library.
JSON is widely used in projects like i18next and all over the internet (for example in headless CMSs).
It might actually result in the best tooling for creating translation content.

@pekkaklarck
Member Author

Translating log and report is important, but it would most likely be better to update their underlying tech first. We are planning to do such a tech update with Libdoc already in RF 5.1 (#4304), and if that goes well then log/report could be next.

@d-biehl
Contributor

d-biehl commented Apr 9, 2022

Regarding technical choices:

What about the Python gettext API? It comes with Python and is a well-established standard, editing and handling PO files is easy, and there are several tools which help you write your translations.

@MoreFamed

Yeah, coming up with good translations for some of these terms is going to be hard. No idea what Teardown could be in Finnish either. A major problem related to this is that changing terms would be backwards incompatible. Perhaps we should support English always so that hard-to-translate terms could be initially skipped.

Supporting English forever would solve (or work around) the problem of deciding what to translate and what to leave in English. I would rather RF did not enforce using translations even when they are available; then everyone can decide whether they write e.g. RETURN in English or in their own language. (Checking whether a translation was used wherever possible is something I would probably want Robocop to do, optionally.)

@pekkaklarck
Member Author

gettext is an option later but overkill now. We just need to match strings during parsing.

The plan is to support English as an alternative automatically. In the future we may make that configurable if there are needs. Notice also that at least in the beginning we don't plan to support translating control structures like RETURN or IF. Support for that can be added later.
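
The "match strings during parsing, with English always accepted" idea could be sketched roughly as below. The function name resolve_header and both tables are hypothetical and only cover a few headers; this is not the actual parser code.

```python
# English headers that always work, regardless of the selected language.
ENGLISH_HEADERS = {"settings", "variables", "test cases", "tasks", "keywords"}

# Hypothetical Finnish translations mapped to the English names.
FINNISH_HEADERS = {"asetukset": "settings", "testitapaukset": "test cases"}


def resolve_header(line, translations):
    """Return the English header name for a '*** Name ***' line, or None."""
    name = line.strip("* ").lower()
    if name in translations:
        return translations[name]
    if name in ENGLISH_HEADERS:
        # English fallback: untranslated terms keep working.
        return name
    return None


print(resolve_header("*** Asetukset ***", FINNISH_HEADERS))
print(resolve_header("*** Keywords ***", FINNISH_HEADERS))
```

Because English stays valid, a translator can skip a hard term such as Teardown without breaking existing data.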

@pekkaklarck
Member Author

pekkaklarck commented May 24, 2022

📢 This comment lists everything that can be translated in RF 5.1.

Markers to translate, covered by this issue, are just headers and settings. With headers we support both singular and plural formats in English and we can support them also with other languages. For test related headers and settings we have task related aliases.

Other translations in RF 5.1 are the Given/When/Then prefixes used in behavior-driven development (BDD), which is covered by issue #519. With these prefixes it is probably a good idea to look at what translations Cucumber uses.

Headers:

  • Settings, Setting
  • Variables, Variable
  • Test Cases, Test Case
  • Tasks, Task
  • Keywords, Keyword

Settings in the Settings section:

Settings for tests/tasks:

  • Documentation
  • Tags
  • Setup
  • Teardown
  • Template
  • Timeout

Settings for keywords:

  • Documentation
  • Arguments
  • Teardown
  • Timeout
  • Tags
  • Return (Deprecated and won't support translations)

BDD prefixes (#519)

  • Given
  • When
  • Then
  • And
  • But

@pekkaklarck
Member Author

pekkaklarck commented May 24, 2022

We obviously need help from the community with translations. We have decided to use the #localization channel on our Slack as the main collaboration forum, and separate channels can be created for each language there as well. If you are interested in helping but haven't yet joined our Slack community, you can get an invite here.

@leeuwe
Member

leeuwe commented Jun 4, 2022

If anyone wants to join in on the translation of RF, please submit your translation here: https://robotframework.crowdin.com/robot-framework

Current translation status:
Crowdin

yanne added a commit that referenced this issue Jun 8, 2022
yanne added a commit that referenced this issue Jun 8, 2022
@leeuwe
Member

leeuwe commented Jun 13, 2022

This zip file contains:

pekkaklarck added a commit that referenced this issue Jun 14, 2022
This was needed for testing purposes (and tests were added as well).
Need to be still updated based on what's agreed on at
https://robotframework.crowdin.com/robot-framework
@pekkaklarck
Member Author

pekkaklarck commented Jun 14, 2022

I added initial translations for Finnish for testing and demonstration purposes in bfec5a5. Everything that was needed is the new Fi class in the languages module. The same approach obviously works also with other languages.
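
To give an idea of the class-based approach, here is a self-contained sketch. It does not import Robot Framework, and the base class and attribute names (settings_header, documentation_setting, teardown_setting) are assumptions for illustration; check the languages module for the real ones.

```python
class Language:
    """Stand-in base class for this sketch (not the real Robot Framework class)."""
    settings_header = None
    documentation_setting = None
    teardown_setting = None


class Fi(Language):
    """Finnish"""
    settings_header = "Asetukset"
    documentation_setting = "Dokumentaatio"
    # teardown_setting is left untranslated; English would still be accepted.


def translated_terms(language):
    """Collect the attributes that a language class actually translates."""
    return {name: value for name, value in vars(language).items()
            if not name.startswith("_") and value}


print(translated_terms(Fi))
```

Keeping each language as one small class means a new language is a single self-contained addition, which suits contributions via pull requests.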

I would prefer languages to be added via pull requests by their respective authors to have contact info for them automatically stored. It's fine to leave some of the terms untranslated; the English terms will nevertheless work. You may also decide not to translate [Return] because it will be deprecated and removed in the somewhat near future anyway.

@pekkaklarck
Member Author

Notice that the name of the class in the languages module will be the name of the language to use from the command line. Name matching is case-insensitive, so, for example, the aforementioned Fi class can be used like --language fi.

Language variants such as Brazilian Portuguese could have a class name like PtBr and be used like --language ptbr. If usage like pt-br or pt_br would be more convenient, we can ignore - and/or _ when matching. Alternatively we could make it possible for each class to provide a custom name to be used with them. Do you @HelioGuilherme66 have a preference?

@emanlove
Member

emanlove commented Jun 14, 2022

I would suggest we use and stick to whatever the commonly accepted syntax for describing languages is. I think that is the ISO standard, ISO 639-1, and then the additional macrolanguages. I would stick to the hyphen and be strict in interpreting it.

@HelioGuilherme66
Member

HelioGuilherme66 commented Jun 14, 2022

Better to use the hyphen then. The following should be accepted: pt (the preferred form; it would be nice to look up the regional LANG code page and switch to it, otherwise the default would be pt-pt), pt-pt and pt-br.
There may be more regional variants, for example for Angola.
(I haven't looked at the code, but maybe a specific Robot Framework language environment variable could exist):

No --language option ==> Lookup ROBOT_LANG ==> English (en) If None else ROBOT_LANG

With --language pt ==> Lookup ROBOT_LANG ==> pt-pt If None else ROBOT_LANG (user would need --language en to bypass ROBOT_LANG)

With --language pt-pt ==> no lookup for ROBOT_LANG ==> pt-pt

With --language pt-br ==> no lookup for ROBOT_LANG ==> pt-br

With --language fi ==> no lookup for ROBOT_LANG ==> fi (applicable for all languages with no regional variants)

@pekkaklarck
Member Author

I don't think we need an environment variable to control the language. We already support ROBOT_OPTIONS, and giving it a value like --language pt has the same effect as the proposed ROBOT_LANG with value pt. I'd also rather think about adding support for config files than add more environment variables. I'm not entirely against the idea, though, so you can submit a separate issue to discuss it further.

@pekkaklarck
Member Author

I don't want to make handling language codes too complicated at least in RF 5.1. For example, I don't want to do any extra lookups if someone uses pt and not pt-pt. We just need to decide which one to use or make it so that both mean the same thing.

The hyphen in pt-br is easiest to handle so that we simply ignore it when matching the given language code to the available ones. Then the class name can be just PtBr and matching it works. Alternatively we could require the class name to be Pt_Br in this case or require the class to have an attribute telling the language code to use. I don't think these have any concrete benefits over simply ignoring the hyphen in matching. That obviously means that ptbr works in addition to pt-br, but I don't think that really matters.
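
Ignoring case and hyphens when matching could look roughly like this. The names normalize, find_language and the empty placeholder classes are hypothetical; the sketch only shows the matching rule described above.

```python
class Fi:
    """Finnish (placeholder class for this sketch)."""


class PtBr:
    """Brazilian Portuguese (placeholder class for this sketch)."""


def normalize(name):
    """Lower-case the name and drop hyphens so 'pt-BR' and 'PtBr' match."""
    return name.lower().replace("-", "")


# Class names double as language codes once they are normalized.
AVAILABLE = {normalize(cls.__name__): cls for cls in (Fi, PtBr)}


def find_language(code):
    """Return the language class for a code such as 'fi', 'pt-br' or 'ptbr'."""
    try:
        return AVAILABLE[normalize(code)]
    except KeyError:
        raise ValueError(f"No language '{code}' found.") from None


print(find_language("pt-br").__name__)
```

With this rule no per-class language code attribute is needed; the cost is that ptbr is accepted alongside pt-br.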

@pekkaklarck
Member Author

Big thanks to @leeuwe for setting up Crowdin to have a place where the community can work together with translations. Thanks also to everyone who has already participated in translation efforts.

If you want to test translations created at Crowdin, you need to download the language file (seems to be named Core.yml) and convert it to Python that Robot Framework understands. I have created a script that can do the conversion automatically. Just download the script and run it like this:

python crowdin.py Core.yml lang.py

The generated Python code can be used directly with Robot Framework like robot --language lang.py tests.robot. The generated class can also be added to Robot's languages module via a pull request to get it included as a built-in language.

@pekkaklarck
Member Author

The support for localized markers ought to be done now. We still need to add actual translations and document all this, but that's covered by #4390.

@pekkaklarck
Member Author

@d-biehl helped make the public robot.api.Languages API, used by external tools like editor plugins, better. See #4494 for details.

10 participants