fluent: add initial implementation#11928

Draft
danielrainer wants to merge 6 commits into fish-shell:master from danielrainer:fluent_localization

Conversation

@danielrainer commented Oct 10, 2025

Introduces Fluent localization. Refer to the commit messages for details.

TODOs:

  • Replace gettext calls with the new API, at least in a few instances so we can see it working in action.
  • naming convention for message IDs
  • coordination for porting messages. If people modify PO files while we port messages to Fluent, we would have to redo some porting effort manually. To avoid this, we should either coordinate with translators to ensure no major changes happen to PO files while we port messages to Fluent, or we should add tooling which records translations and is then able to apply them to PO files to automatically port the new changes to Fluent. In total, we have a bit over 500 messages in the Rust sources, so porting them all is not trivial, but given decent tooling it should not take that long either.
  • How much ID sharing do we want? Fluent gives us the opportunity to define different IDs even if the English versions of the localized strings are identical. This would allow translators to have more possibilities to use correct grammar. I think this only really affects messages which have placeholders, plus messages which have the same English value, but non-identical semantics of the value, e.g. if the same word is used as a noun in one place and as a verb in another. One concrete example is the version string. Currently, the commit fluent: add first message ports the version string to Fluent, but only for the fish executable, not fish_indent, nor fish_key_reader. Should all three use the same ID, or do we want a different ID for each?
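To make the last trade-off concrete, here is a sketch of what separate IDs for identical English strings could look like in en.ftl. The IDs and wording are hypothetical, not part of this PR:

```ftl
# Same English text under distinct IDs, so a translator can phrase each
# program's version line differently if their language requires it.
fish-version        = { $program }, version { $version }
fish-indent-version = { $program }, version { $version }
```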

@danielrainer (Author)

The new commits run cargo run --package fish-fluent-check in check.sh and CI. I added it as separate jobs because running cargo in a test started from the test driver is not ideal: cargo can use multiple threads, which might result in more timeouts in CI, and cargo prints to stderr, which is problematic for checks, especially since the checks fail by panicking, which also prints to stderr. We could work around the latter issue by redirecting the output of the cargo command, checking cargo's exit status, printing the redirected output on error, and deleting the redirect target afterwards. (The last step might be handled by the test driver if we put the file into the test's temporary HOME.) We should then also exit with cargo's exit status, although the test driver doesn't care about that at the moment.

@krobelus (Contributor) commented Oct 16, 2025 via email

@danielrainer (Author)

> any reason why we can't use "cargo test" ?

That's a good idea. I added a test which just runs main in crates/fluent-check/src/main.rs. Since main indicates errors via panics, no additional logic is needed. For check.sh, we can still pass in an env var indicating where to look for extracted Fluent IDs, so the test does not need to recompile fish. In CI, no such mechanism exists for now, so there the test will recompile fish, but that also happened in the previous implementation, the difference being that before it only happened once in a dedicated job, whereas now it happens in every job which runs the tests.

> I wonder if it's realistic to (long-term) move away from test_driver.py completely and do everything with cargo test

I think replacing test_driver.py with Rust code should be doable without too much effort, at least if it continues to be a dedicated program, instead of separate cargo tests for every script file. The harder part would be replacing littlecheck and pexpect. (I haven't looked into the latter at all.) From memory, I think the main challenges with having a cargo test per test script would be:

  • sharing compilation of the test helper
  • setting up parametrization which automatically creates a test case for every relevant script file
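For the parametrization point, here is a minimal sketch under the simpler model of one Rust test looping over discovered scripts. The directory layout and the `.fish` extension are assumptions for illustration, not fish's actual test layout:

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Discover test scripts in a directory (hypothetical layout); a single
/// #[test] could iterate over these and report all failures at once.
/// True one-test-case-per-file parametrization would instead need a custom
/// harness or build-script-generated test functions.
fn collect_scripts(dir: &Path) -> Vec<PathBuf> {
    let mut scripts: Vec<PathBuf> = fs::read_dir(dir)
        .into_iter()
        .flatten() // skip a missing directory entirely
        .flatten() // skip unreadable directory entries
        .map(|entry| entry.path())
        .filter(|p| p.extension().map_or(false, |ext| ext == "fish"))
        .collect();
    scripts.sort();
    scripts
}
```

Sorting keeps failure reports stable across runs, which matters once many scripts share one test.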

> we could also write an xtask as a common interface for running tests

I'm generally in favor of replacing (or at least wrapping) the various shell scripts we have with xtasks, to have a unified interface for running everything. Regarding the interface, I'm not sure if it's better to have several different cargo aliases (e.g. cargo test-fish), or use cargo xtask for everything and add subcommands to that as desired. I chose the latter approach for ensuring that the fish version env var is set correctly for every cargo invocation, but I think it makes sense in general. E.g., that would make it easy to have cargo xtask help, which would be difficult to implement with multiple cargo aliases. It also reduces the likelihood of choosing an alias which might clash with a future built-in cargo command.
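As a rough illustration of the single-entry-point approach (a sketch only; the subcommand names and descriptions here are hypothetical, not fish's actual xtask):

```rust
// Hypothetical sketch of an xtask dispatcher. By convention, an alias in
// .cargo/config.toml such as `xtask = "run --package xtask --"` makes this
// reachable as `cargo xtask <subcommand>`.
fn dispatch(cmd: &str) -> Result<&'static str, String> {
    match cmd {
        "fluent" => Ok("check, format, or rename FTL messages"),
        "test" => Ok("run the test suite"),
        "help" | "--help" => Ok("list available subcommands"),
        other => Err(format!("unknown subcommand: {other}")),
    }
}
```

A single dispatcher like this makes `cargo xtask help` trivial to implement and avoids claiming alias names that future built-in cargo commands might want.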

danielrainer force-pushed the fluent_localization branch 2 times, most recently from 11d1099 to 2e9b78a on October 16, 2025 at 21:34
@danielrainer (Author)

I changed unic-langid to 0.9.5, since 0.9.6 requires Rust 1.82+. While it might not make much of a difference for the concrete features in this specific instance, not updating our MSRV makes it increasingly painful to manage dependencies: it limits our ability to update existing dependencies, and we miss out on many improvements made in more recent Rust versions. I'd really appreciate it if we didn't wait until a concrete, urgent need to update some dependency forces us into a rushed MSRV update. Instead, we should finally come up with a sensible policy for updating our MSRV that's not just "we'll stick with 1.70 indefinitely". See #11679.

@krobelus (Contributor) commented Oct 16, 2025 via email

@danielrainer (Author)

> yeah let's update to Debian Stable's 1.85 (and see if someone complains).

Sounds good. That version also allows us to migrate to the 2024 Rust edition if we want that. I just checked some of the other distros, and it seems that Fedora keeps up with stable on all releases and Ubuntu has Rust 1.85 as the default in 25.10. Should I open a PR?

@krobelus (Contributor) commented Oct 16, 2025 via email

@danielrainer (Author)

> both upgrading to 1.85 and 2024 edition sounds good

#11961

Changing the edition is not that straightforward and might require a fairly large commit, including manual updates, so for now I'll just address the things which have become available in Rust 2021 with the MSRV update.

@faho (Member) commented Oct 20, 2025

Parts of fluent are licensed apache2-only, which is incompatible with fish's GPLv2, so this is, as far as I can tell, legally unmergeable as-is.

cargo deny check licenses would catch that.

@danielrainer (Author)

The code we use is from https://github.com/projectfluent/fluent-rs, which includes both an Apache-2.0 and an MIT license file. I'm no expert on the legal situation here, but it seems to me that using the software under the terms of the MIT License is allowed, and that this license does not require adding copyright information to our binaries. AFAICT, we don't include copyright info for any of our dependencies, only for software where the fish repo itself contains code derived from that software.

@danielrainer (Author)

Relevant issue: projectfluent/fluent-rs#31

@faho (Member) commented Oct 20, 2025

It's not about including copyright information; it's that some of fluent-rs's dependencies are still Apache-2.0-only.

Apache-2.0 is incompatible with GPLv2 because, IIRC, it includes a patent grant and the GPLv2 has a "no further restrictions" clause. So the combined product has a license that can't be followed.

@danielrainer (Author)

If that's the case, why can fluent-rs be MIT-licensed, but we can't use it under that license?

@faho (Member) commented Oct 20, 2025

Because the MIT license and the Apache license don't conflict (neither of them is "viral" the way the GPL is). The GPLv2 and the Apache license do, though.

We can use the fluent-rs crate itself under the MIT license, but we can't use some of its dependencies.

Edit: The offending dependencies are:

@danielrainer (Author) commented Oct 20, 2025

So MIT-licensed projects can use fluent-rs and its dependencies, including the apache-licensed ones, but we can't because fish is GPLv2 licensed?

Should we ask the two apache-licensed projects about making their projects available under a license that allows us to use their software in fish? fluent-langneg at least seems to be mostly written by people who also contribute to fluent-rs, so I would be surprised if they would object to dual-licensing. self_cell seems to be a fairly small project, both in terms of contributors and code size. If they are unwilling to use a compatible license, we could ask fluent-rs whether they would consider replacing the dependency.

krobelus added a commit to krobelus/fluent-langneg-rs that referenced this pull request Oct 21, 2025
Most of fluent-rs is already dual-licensed.  This crate is not,
which can make it harder for GPL-2-only projects to use fluent-rs.

Fix that by allowing use under the MIT license. All (or almost
all?) nontrivial contributions seem to be from the same author,
so this should be easy?

Ref: projectfluent/fluent-rs#34
Ref: fish-shell/fish-shell#11928
@emilio commented Oct 22, 2025

Isn't self_cell Apache OR GPLv2? If so wouldn't it be compatible?

danielrainer pushed a commit to danielrainer/fish-shell that referenced this pull request Dec 7, 2025
Multiple gettext-extraction proc macro instances can run at the same
time due to Rust's compilation model. In the previous implementation,
where every instance appended to the same file, this has resulted in
corruption of the file. This was reported and discussed in
fish-shell#11928 (comment)
for the equivalent macro for Fluent message ID extraction. The
underlying problem is the same.

The best way we have found to avoid such a race condition is to write
each entry to a new file, and to concatenate the files before using
them. It's not a beautiful approach, but it should be fairly robust and
portable.
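The per-entry-file scheme described in the commit message can be sketched as follows. The file naming and directory handling here are illustrative assumptions, not the PR's exact code:

```rust
use std::fs;
use std::path::Path;
use std::sync::atomic::{AtomicU64, Ordering};

// Per-process counter; combined with the process id, every writer gets a
// unique file name, so no two proc-macro instances ever share a path.
static COUNTER: AtomicU64 = AtomicU64::new(0);

/// Write one extracted entry to its own fresh file in a shared directory.
fn write_entry(dir: &Path, entry: &str) -> std::io::Result<()> {
    let unique = format!(
        "{}-{}.txt",
        std::process::id(),
        COUNTER.fetch_add(1, Ordering::Relaxed)
    );
    fs::write(dir.join(unique), entry)
}

/// Concatenate all per-invocation files into one sorted, deduplicated list.
fn concatenate(dir: &Path) -> std::io::Result<Vec<String>> {
    let mut entries = Vec::new();
    for e in fs::read_dir(dir)? {
        entries.extend(fs::read_to_string(e?.path())?.lines().map(String::from));
    }
    entries.sort();
    entries.dedup();
    Ok(entries)
}
```

Because each writer creates its own file, no append-time interleaving can corrupt an entry; the only coordination needed is that concatenation runs after all writers finish.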
krobelus pushed a commit that referenced this pull request Dec 10, 2025

Closes #12125
@danielrainer (Author)

Rebased on latest master and reworked a bit. See PR description for outstanding work.

ASan does not seem to like the intentional leaks used for creating &'static strs. A similar approach is already in use for gettext, and I remember getting leak warnings when refactoring the gettext code as well. I'm not sure how best to address these failures.

@krobelus (Contributor)

Maybe the first step is to resurrect docker/jammy-asan.Dockerfile (ideally making it work with moving ubuntu versions). Assuming this is more convenient than github actions.
If we're reasonably sure it's an asan bug, we can maybe suppress allocations from the relevant functions via build_tools/lsan_suppressions.txt

danielrainer force-pushed the fluent_localization branch 3 times, most recently from cd13c9f to fbe929d on January 5, 2026 at 15:15
danielrainer mentioned this pull request Jan 5, 2026
danielrainer force-pushed the fluent_localization branch 3 times, most recently from bd00c0c to 13f9192 on January 8, 2026 at 18:40
@danielrainer (Author) commented Jan 8, 2026

Updates:

  • The detected leak is ignored as a false positive now.
  • ToFluentValue trait added, which allows using wide-string types and chars with the Fluent macros without having to add a to_string() call. For char, it might make sense to add an upstream implementation (impl From<char> for FluentValue).
  • fluent_ids! macro added for declaring Fluent IDs to be used in multiple places. Similar to the localizable_consts! macro we have for gettext.
  • Some commits were squashed together.
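The idea behind the ToFluentValue trait can be sketched without the fluent crate like this. The names and the String return type are simplifications for illustration; the real trait presumably produces a FluentValue:

```rust
// Hypothetical stand-in for ToFluentValue: an adapter trait that lets the
// localization macros accept chars, &str, and similar types uniformly,
// without an explicit to_string() at every call site.
trait ToFluentValueDemo {
    fn to_fluent_value(&self) -> String; // real code would return a FluentValue
}

impl ToFluentValueDemo for char {
    fn to_fluent_value(&self) -> String {
        self.to_string()
    }
}

impl ToFluentValueDemo for &str {
    fn to_fluent_value(&self) -> String {
        (*self).to_string()
    }
}
```

The macro can then call `.to_fluent_value()` on whatever the caller passes, and adding support for a new type means adding one impl rather than touching every call site.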

@danielrainer (Author)

Rebased on latest master. The only change to this PR is that unnecessary allocations for FTL file name strings have been removed. The lsan suppression is still necessary.

This commit extracts the parts shared by gettext-extract and the
upcoming fluent-extract into their own crate. This will allow simpler
proc macros for both localization systems, since it minimizes duplicated
code.

Daniel Rainer and others added 5 commits February 28, 2026 03:32

Add an implementation that allows using Fluent for localization in Rust.

Fluent is significantly more expressive than gettext. It uses message
IDs which, unlike in gettext, are not necessarily the default message
string. This allows for proper support of messages which happen to be
identical in English, but not in other languages. In gettext, this could
be solved to some extent with contexts, but our gettext implementation
does not support that. In Fluent, arguments to the message are specified
as key-value pairs, which gives translators more semantic information
and allows reordering the arguments in the translation, which is
impossible with gettext. Fluent also allows for more complex grammatical
features, such as different plural forms, grammatical cases, and
adapting phrases to the correct gender.

This commit only introduces the infrastructure for using Fluent instead
of gettext, with the goal of eventually replacing gettext for
localization in Rust. Making use of the new infrastructure is left to
follow-up commits.

To localize a message with Fluent, the new `localize!` macro should be
used. Its first argument is a Fluent message ID. This can either be a
string literal, or a constant defined via the `fluent_ids!` macro. The
remaining arguments are key-value pairs, with the keys being Fluent
argument/variable identifiers, and the values their corresponding values
in the localization.
Instead of using one key-value pair per variable, it is also possible
to pass a reference to a single `FluentArgs` struct, which is defined in
the `fluent` crate. This might be useful if repeated invocations of
`localize!` with similar variable values are desired.
The following example demonstrates the syntax:
`localize!("some-id", string_arg = "a string", number = 42)`
The result will be a `String`, formatted according to the rules in the
relevant FTL file. On errors, this macro panics.
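For illustration, a matching en.ftl entry for the example above might look like this (the wording is hypothetical):

```ftl
some-id = Got { $string_arg } and the number { $number }.
```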

At runtime, Fluent will look up the message ID in a Fluent Translation
List (FTL) file, according to the user's language settings. These files
are stored in `localization/fluent`. There is one file per language.
Language selection works the same as for our gettext implementation.
Because the source code does not contain a default version of the
message, it is the developer's responsibility to add an entry for the
message to the `en.ftl` file. Otherwise, localizing the message would
fail at runtime. To prevent this, automated checks are added which
extract the Fluent IDs defined in the source code and compare them to the
ones defined in `en.ftl`. It is considered an error if these two sets of
IDs are not identical. Checking this at build time allows us to rely on
always having the message available in English. Similar to gettext msgid
extraction, there is a proc-macro defined in `fluent-extraction` which
extracts message IDs into a directory specified via the
`FISH_FLUENT_ID_DIR` environment variable if the `fluent-extract`
feature is active. To avoid recompilation, `build_tools/check.sh` caches
the extracted IDs. In CI, no corresponding caching mechanism exists, so
there the test checking the IDs will invoke Cargo to build fish,
extracting the IDs. The `fish-fluent-check` crate performs these ID
checks.
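The ID-set comparison at the heart of that check can be sketched as follows. The line-based FTL parsing here is a naive stand-in for illustration; real FTL has multi-line values and attributes that this would mishandle:

```rust
use std::collections::BTreeSet;

// Naive stand-in for an FTL parser: treat every `id = value` line as a
// message definition, skipping comments. Illustration only.
fn ftl_ids(ftl: &str) -> BTreeSet<String> {
    ftl.lines()
        .filter_map(|line| line.split_once('=').map(|(id, _)| id.trim().to_string()))
        .filter(|id| !id.is_empty() && !id.starts_with('#'))
        .collect()
}

/// Error unless the IDs extracted from the sources exactly match the IDs
/// defined in en.ftl, in either direction.
fn check_ids(source_ids: &BTreeSet<String>, en_ftl: &str) -> Result<(), String> {
    let defined = ftl_ids(en_ftl);
    if *source_ids == defined {
        Ok(())
    } else {
        let missing: Vec<_> = source_ids.difference(&defined).collect();
        let unused: Vec<_> = defined.difference(source_ids).collect();
        Err(format!("missing from en.ftl: {missing:?}; unused in sources: {unused:?}"))
    }
}
```

Requiring exact equality (rather than only "every used ID is defined") also catches stale entries left behind when a message is removed from the sources.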

`rust-embed` is used to make the FTL files available to the binary at
runtime. Files will only be parsed if they are specified in the language
precedence list, so users don't have to pay the parsing cost for
languages they don't want to see. `en.ftl` is always parsed, since it is
our implicit last fallback option.

Because the Fluent ecosystem currently lacks some tooling, we use our
own. It is implemented in an external library crate (currently hosted as
a personal repo), and made available via `cargo xtask fluent`
subcommands. The currently supported commands are:
- `check`: Checks the FTL files, ensuring that they can be parsed
  without errors, that no duplicate IDs are specified, that they are
  formatted correctly, and that there are no extra IDs, i.e. IDs not
  present in `en.ftl`, which is expected to be complete. More rigorous
  checks could be added, such as checking whether the same set of
  variables are used for a certain ID in all languages. The complexity
  of Fluent's syntax makes this non-trivial, which is the reason it's
  not already implemented.
- `format`: Formats the specified FTL files (or all by default). Also
  has a mode suitable for editor integration to format files from the
  editor. Examples for setting that up in Vim are provided in the
  `CONTRIBUTING.rst` docs.
- `rename`: Renames IDs or associated variables across all FTL files.
- `show-missing`: Shows which IDs don't have a translation yet.

The external crate contains one additional tool for converting messages
from gettext to Fluent. This is intentionally not added to fish, since
it is only useful for the transition. Once we have ported all messages
to Fluent we won't have a use for it anymore. If you are interested in
using it to port messages, it's the `po-convert` binary in the
`fluent-ftl-tools` package. The CLI is somewhat convoluted, but can be
simplified by wrapping it with a script which hard-codes the path to the
relevant PO and FTL file directories. Then, the remaining information
which needs to be specified is:
- a line number in a PO file to identify the message to be ported
- the new message ID
- the name of each variable, in the order the formatting specifiers
  appear in the gettext msgid.
Specifying the line number and invoking the wrapper script can be
partially automated by using a custom editor shortcut.
The tool will port the msgstr for each language which has one defined,
and always for English, where it can use the msgid if no msgstr exists.
The tool does not edit Rust code, but suggests a Rust code snippet on
stdout based on the specified message ID and variable names.
This tooling relies on features of the `fluent` package which are not
exported by default, so we use a fork which changes that until our PR
for adding it upstream is accepted.

This migrates the fish version info message from gettext to Fluent. It
can be used to see Fluent-based localization in action.

Because this commit adds new FTL files, these languages show up in the
Fluent language precedence, requiring an update to the corresponding
tests.

Reword zh_CN as suggested in
fish-shell#11833 (comment)

    fish -c 'for LC_MESSAGES in fr zh_CN zh_TW
        argparse h-
    end'
@danielrainer (Author)

This is a fairly significant update. Tooling is now integrated and documented in CONTRIBUTING.rst. I also ported some more messages and removed features which turned out to not be particularly useful, such as macros wrapping localize!. We might also want to drop support for passing &FluentArgs to localize!. I'm not sure yet if we have a use case for it.

I think now would be a good time for both developers and translators to check out the PR and test the functionality relevant to them. Reviewing the updated "Contributing Translations" section in CONTRIBUTING.rst should be a good start. Then, working with the relevant subcommands of cargo xtask fluent should provide an impression of the capabilities of the existing tooling.
