fluent: add initial implementation #11928
danielrainer wants to merge 6 commits into fish-shell:master
The new commits run `cargo run --package fish-fluent-check` in
Haven't looked into the code but any reason why we can't use "cargo test" ?
(I guess we'd need to declare the *.ftl files as inputs but we already do something similar for po/ in fish-gettext-maps)
I wonder if it's realistic to (long-term) move away from test_driver.py completely and do everything with `cargo test` (not sure if that's idiomatic.. handling subprocesses without a shell can be tricky).
I suppose we could use test_driver easier if we separated the "build" from the "run" step.
BTW we could also write an xtask as a common interface for running tests
cargo test-fish tests/checks/abbr.fish
cargo test-fish tests/pexpects/bind.py
cargo test-fluent
Just an idea, for discoverability.
`check.sh` and CI.
It probably wouldn't hurt to start sharing logic between the two. But of course that's unrelated.
That's a good idea. I added a test which just runs
I think replacing
I'm generally in favor of replacing (or at least wrapping) the various shell scripts we have with xtasks, to have a unified interface for running everything. Regarding the interface, I'm not sure if it's better to have several different cargo aliases (e.g. |
> I changed `unic-langid` to `0.9.5`, since `0.9.6` requires Rust 1.82+.
yeah let's update to Debian Stable's 1.85 (and see if someone complains).
AFAIK this means we still support "macOS 10.12 Sierra (First released 2016)".
Sounds good. That version also allows us to migrate to the 2024 Rust edition if we want that. I just checked some of the other distros, and it seems that Fedora keeps up with stable on all releases and Ubuntu has Rust 1.85 as the default in 25.10. Should I open a PR?
sure, both upgrading to 1.85 and 2024 edition sounds good
This means we can get rid of a bunch of
lints with the `// for old clippy` comment
and things like `// TODO: if-let-chains`.
Here's our poor man's renovatebot: #11960
Changing the edition is not that straightforward and might require a fairly large commit, including manual updates, so for now I'll just address the things which have become available in Rust 2021 with the MSRV update.
Parts of fluent are licensed apache2-only, which is incompatible with fish's GPLv2, so this is, as far as I can tell, legally unmergeable as-is.
The code we use is from https://github.com/projectfluent/fluent-rs, which includes both an apache2 and a MIT license file. I'm no expert on the legal situation here, but it seems to me that using the software under the terms of the MIT License is allowed and that this license does not require adding copyright information to our binaries. AFAICT, we don't include copyright info for any of our dependencies, only for software where the fish repo itself contains code derived from that software.
Relevant issue: projectfluent/fluent-rs#31
It's not about including copyright information, it's that some of the dependencies for fluent-rs are still apache2-only. Apache2 is incompatible with GPLv2 because IIRC it includes a patent grant and the GPLv2 has a "no further restrictions" clause. So the combined product has a license that can't be followed.
If that's the case, why can fluent-rs be MIT-licensed, but we can't use it under that license?
Because the MIT license and the Apache license don't conflict (neither of them is "viral" the way the GPL is). The GPLv2 and the Apache license do, though. We can be using the fluent-rs crate under the MIT, but we can't be using some of its dependencies. Edit: The offending dependencies are:
So MIT-licensed projects can use fluent-rs and its dependencies, including the apache-licensed ones, but we can't because fish is GPLv2 licensed? Should we ask the two apache-licensed projects about making their projects available under a license that allows us to use their software in fish? |
Most of fluent-rs is already dual-licensed. This crate is not, which can make it harder for GPL-2-only projects to use fluent-rs. Fix that by allowing use under the MIT license. All (or almost all?) nontrivial contributions seem to be from the same author, so this should be easy? Ref: projectfluent/fluent-rs#34 Ref: fish-shell/fish-shell#11928 Closes projectfluent#30
Isn't self_cell Apache OR GPLv2? If so wouldn't it be compatible?
Multiple gettext-extraction proc macro instances can run at the same time due to Rust's compilation model. In the previous implementation, where every instance appended to the same file, this resulted in corruption of the file. This was reported and discussed in #11928 (comment) for the equivalent macro for Fluent message ID extraction. The underlying problem is the same. The best way we have found to avoid such race conditions is to write each entry to a new file and concatenate the files before using them. It's not a beautiful approach, but it should be fairly robust and portable. Closes #12125
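For illustration, here is a minimal sketch of the "one file per entry, concatenate later" idea using only the standard library. It is not the code from the commit: the function names, the `.entry` suffix, and the PID-plus-counter naming scheme are assumptions made for this example, and it assumes the directory is cleared between builds.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical per-process counter, so that several entries written by the
// same process get distinct file names.
static NEXT_ENTRY: AtomicU64 = AtomicU64::new(0);

/// Write one extracted entry to its own new file inside `dir`.
/// `create_new(true)` makes the call fail instead of overwriting an
/// existing file, so concurrent writers never share a file.
pub fn write_entry(dir: &Path, entry: &str) -> io::Result<()> {
    fs::create_dir_all(dir)?;
    let n = NEXT_ENTRY.fetch_add(1, Ordering::Relaxed);
    let path = dir.join(format!("{}-{}.entry", std::process::id(), n));
    let mut file = fs::OpenOptions::new()
        .write(true)
        .create_new(true)
        .open(path)?;
    file.write_all(entry.as_bytes())
}

/// Read every entry file in `dir` and concatenate the contents.
/// Entries are sorted so the result does not depend on directory order.
pub fn collect_entries(dir: &Path) -> io::Result<String> {
    let mut entries = Vec::new();
    for dir_entry in fs::read_dir(dir)? {
        entries.push(fs::read_to_string(dir_entry?.path())?);
    }
    entries.sort();
    Ok(entries.concat())
}
```

Because no two writers ever open the same file, no locking is needed; the cost is a directory full of small files which has to be merged in a separate step.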
Rebased on latest master and reworked a bit. See PR description for outstanding work. Asan does not seem to like the intentional leaks for creating
Maybe the first step is to resurrect
Updates:
Rebased on latest master. The only change to this PR is that unnecessary allocations for FTL file name strings have been removed. The lsan suppression is still necessary.
The extracted function takes the parts which are used by gettext-extract, as well as the upcoming fluent-extract, and puts them into their own crate. This will allow having simpler proc macros for both localization systems, since it minimizes duplicated code.
Add an implementation which allows using Fluent for localization in Rust.
Fluent is significantly more expressive than gettext. It uses message
IDs which, unlike in gettext, are not necessarily the default message
string. This allows for proper support of messages which happen to be
identical in English, but not in other languages. In gettext, this could
be solved to some extent with contexts, but our gettext implementation
does not support that. In Fluent, arguments to the message are specified
as key-value pairs, which gives translators more semantic information
and allows reordering the arguments in the translation, which is
impossible with gettext. Fluent also allows for more complex grammatical
features, such as different plural forms, grammatical cases, and
adapting phrases to the correct gender.
This commit only introduces the infrastructure for using Fluent instead
of gettext, with the goal of eventually replacing gettext for
localization in Rust. Making use of the new infrastructure is left to
follow-up commits.
To localize a message with Fluent, the new `localize!` macro should be
used. Its first argument is a Fluent message ID. This can either be a
string literal, or a constant defined via the `fluent_ids!` macro. The
remaining arguments are key-value pairs, with the keys being Fluent
argument/variable identifiers, and the values their corresponding values
in the localization.
Instead of passing one key-value pair per variable, it is also possible
to pass a reference to a single `FluentArgs` struct, which is defined in
the `fluent` crate. This might be useful if repeated invocations of
`localize!` with similar variable values are desired.
The following example demonstrates the syntax:
`localize!("some-id", string_arg = "a string", number = 42)`
The result will be a `String`, formatted according to the rules in the
relevant FTL file. On errors, this macro panics.
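To make the shape of such a call more concrete, here is a toy sketch of a macro taking a message ID followed by key-value pairs. It is not the `localize!` implementation from this PR and does not use the `fluent` crate: `localize_sketch!`, `lookup_and_format`, and the in-memory message map are all hypothetical, and only plain `{ $name }` placeables are handled.

```rust
use std::collections::HashMap;

/// Toy stand-in for a Fluent lookup: finds the pattern for `id` and replaces
/// each `{ $name }` placeable with the matching argument value.
/// Panics on unknown IDs, mirroring the "panics on errors" behavior.
pub fn lookup_and_format(
    messages: &HashMap<&str, &str>,
    id: &str,
    args: &[(&str, String)],
) -> String {
    let pattern = messages
        .get(id)
        .unwrap_or_else(|| panic!("unknown message ID: {}", id));
    let mut out = pattern.to_string();
    for (name, value) in args {
        out = out.replace(&format!("{{ ${} }}", name), value);
    }
    out
}

/// Hypothetical macro: the first argument is the message map, the second the
/// message ID, the rest are `key = value` pairs keyed by variable name.
macro_rules! localize_sketch {
    ($messages:expr, $id:expr $(, $key:ident = $value:expr)* $(,)?) => {{
        let args: Vec<(&str, String)> =
            vec![$((stringify!($key), $value.to_string())),*];
        lookup_and_format(&$messages, $id, &args)
    }};
}
```

Since arguments are matched by name rather than by position, a translation is free to use them in a different order than the English message does.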
At runtime, Fluent will look up the message ID in a Fluent Translation
List (FTL) file, according to the user's language settings. These files
are stored in `localization/fluent`. There is one file per language.
Language selection works the same as for our gettext implementation.
Because the source code does not contain a default version of the
message, it is the developer's responsibility to add an entry for the
message to the `en.ftl` file. Otherwise, localizing the message would
fail at runtime. To prevent this, automated checks are added which
extract the Fluent IDs defined in the source code and compare them to
the ones defined in `en.ftl`. It is considered an error if these two sets of
IDs are not identical. Checking this at build time allows us to rely on
always having the message available in English. Similar to gettext msgid
extraction, there is a proc-macro defined in `fluent-extraction` which
extracts message IDs into a directory specified via the
`FISH_FLUENT_ID_DIR` environment variable if the `fluent-extract`
feature is active. To avoid recompilation, `build_tools/check.sh` caches
the extracted IDs. In CI, no corresponding caching mechanism exists, so
there the test checking the IDs will invoke Cargo to build fish,
extracting the IDs. The `fish-fluent-check` crate performs these ID
checks.
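The core of such a check is a comparison of two sets. The following sketch shows one way to express it; `compare_ids` and `IdMismatch` are names invented for this example and are not taken from the `fish-fluent-check` crate.

```rust
use std::collections::BTreeSet;

/// Describes how the IDs used in the source code and the IDs defined in
/// `en.ftl` differ. Hypothetical type, for illustration only.
#[derive(Debug, PartialEq, Eq)]
pub struct IdMismatch {
    /// IDs used in the source code but not defined in `en.ftl`.
    pub missing_from_ftl: Vec<String>,
    /// IDs defined in `en.ftl` but not used in the source code.
    pub unused_in_source: Vec<String>,
}

/// Succeeds only if both sets contain exactly the same IDs.
pub fn compare_ids(
    source_ids: &BTreeSet<String>,
    ftl_ids: &BTreeSet<String>,
) -> Result<(), IdMismatch> {
    let missing_from_ftl: Vec<String> =
        source_ids.difference(ftl_ids).cloned().collect();
    let unused_in_source: Vec<String> =
        ftl_ids.difference(source_ids).cloned().collect();
    if missing_from_ftl.is_empty() && unused_in_source.is_empty() {
        Ok(())
    } else {
        Err(IdMismatch {
            missing_from_ftl,
            unused_in_source,
        })
    }
}
```

Reporting both directions separately tells the developer whether an entry needs to be added to `en.ftl` or a stale one removed from it.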
`rust-embed` is used to make the FTL files available to the binary at
runtime. Files will only be parsed if they are specified in the language
precedence list, so users don't have to pay the parsing cost for
languages they don't want to see. `en.ftl` is always parsed, since it is
our implicit last fallback option.
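As a rough sketch of this selection logic, the code below parses only the languages named in the precedence list, appends `en` as the last fallback, and looks up an ID in that order. It does not use `rust-embed` or the `fluent` crate: the embedded files are modeled as a slice of `(language, source)` pairs, and `parse_toy_ftl` only understands single-line `id = value` entries, which is a strong simplification of real FTL syntax. All names are hypothetical.

```rust
use std::collections::HashMap;

type Catalog = HashMap<String, String>;

/// Simplified parser: one `id = value` entry per line, nothing else.
fn parse_toy_ftl(source: &str) -> Catalog {
    source
        .lines()
        .filter_map(|line| line.split_once('='))
        .map(|(id, value)| (id.trim().to_string(), value.trim().to_string()))
        .collect()
}

/// Parse only the languages in `precedence`, in order, followed by `en`.
/// Languages without an embedded file are skipped; others are never parsed.
pub fn load_catalogs(
    embedded: &[(&str, &str)],
    precedence: &[&str],
) -> Vec<(String, Catalog)> {
    let mut order: Vec<&str> = Vec::new();
    for lang in precedence.iter().copied().chain(std::iter::once("en")) {
        if !order.contains(&lang) {
            order.push(lang);
        }
    }
    order
        .into_iter()
        .filter_map(|lang| {
            embedded
                .iter()
                .find(|(name, _)| *name == lang)
                .map(|(_, source)| (lang.to_string(), parse_toy_ftl(source)))
        })
        .collect()
}

/// Return the message from the first catalog which defines `id`.
pub fn lookup<'a>(catalogs: &'a [(String, Catalog)], id: &str) -> Option<&'a str> {
    catalogs
        .iter()
        .find_map(|(_, catalog)| catalog.get(id).map(String::as_str))
}
```

With a precedence list of `["de"]`, only the German and English sources are parsed, and a message missing from the German catalog falls through to the English one.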
Because the Fluent ecosystem currently lacks some tooling, we use our
own. It is implemented in an external library crate (currently hosted as
a personal repo), and made available via `cargo xtask fluent`
subcommands. The currently supported commands are:
- `check`: Checks the FTL files, ensuring that they can be parsed
without errors, that no duplicate IDs are specified, that they are
formatted correctly, and that there are no extra IDs, i.e. IDs not
present in `en.ftl`, which is expected to be complete. More rigorous
checks could be added, such as checking whether the same set of
variables is used for a certain ID in all languages. The complexity
of Fluent's syntax makes this non-trivial, which is the reason it's
not already implemented.
- `format`: Formats the specified FTL files (or all by default). Also
has a mode suitable for editor integration to format files from the
editor. Examples for setting that up in Vim are provided in the
`CONTRIBUTING.rst` docs.
- `rename`: Renames IDs or associated variables across all FTL files.
- `show-missing`: Shows which IDs don't have a translation yet.
The external crate contains one additional tool for converting messages
from gettext to Fluent. This is intentionally not added to fish, since
it is only useful for the transition. Once we have ported all messages
to Fluent we won't have a use for it anymore. If you are interested in
using it to port messages, it's the `po-convert` binary in the
`fluent-ftl-tools` package. The CLI is somewhat convoluted, but can be
simplified by wrapping it with a script which hard-codes the path to the
relevant PO and FTL file directories. Then, the remaining information
which needs to be specified is:
- a line number in a PO file to identify the message to be ported
- the new message ID
- the name of each variable, in the order the formatting specifiers
appear in the gettext msgid.
Specifying the line number and invoking the wrapper script can be
partially automated by using a custom editor shortcut.
The tool will port the msgstr for each language which has one defined,
and always for English, where it can use the msgid if no msgstr exists.
The tool does not edit Rust code, but suggests a Rust code snippet on
stdout based on the specified message ID and variable names.
This tooling relies on features of the `fluent` package which are not
exported by default, so we use a fork which changes that until our PR
for adding it upstream is accepted.
This migrates the fish version info message from gettext to Fluent. It can be used to see Fluent-based localization in action. Because this commit adds new FTL files, these languages show up in the Fluent language precedence, requiring an update to the corresponding tests.
Reword zh_CN as suggested in fish-shell#11833 (comment)
fish -c 'for LC_MESSAGES in fr zh_CN zh_TW
    argparse h-
end'
This is a fairly significant update. Tooling is now integrated and documented in I think now would be a good time for both developers and translators to check out the PR and test the functionality relevant to them. Reviewing the updated "Contributing Translations" section in
Introduces Fluent localization. Refer to the commit messages for details.
TODOs:
`fluent: add first message` ports the version string to Fluent, but only for the `fish` executable, not `fish_indent`, nor `fish_key_reader`. Should all three use the same ID, or do we want a different ID for each?