l10n: implement status language builtin #12106

danielrainer · 2025-11-24T23:31:03Z

Based on the discussion in #11967

Introduce a `status language` builtin, which has subcommands for
controlling and inspecting fish's message localization status.

The motivation for this is that using only the established environment
variables `LANGUAGE`, `LC_ALL`, `LC_MESSAGES`, and `LANG` can cause
problems when fish interprets them differently from GNU gettext.
In addition, these are not well-suited for users who want to override
their normal localization settings only for fish, since fish would
propagate the values of these variables to its child processes.

Configuration via these variables still works as before, but now there
is the `status language set` command, which allows overriding the
localization configuration.
If `status language set` is used, the language precedence list will be
taken from its remaining arguments.
Warnings will be shown for invalid arguments.
Once this command has been used, the localization-related environment
variables are ignored.
To go back to taking the configuration from the environment variables
after `status language set` was executed, users can run `status language unset`.

Running `status language` without arguments shows information about the
current message localization status, allowing users to better understand
how their settings are interpreted by fish.

The `status language list-available` command shows which languages are
available to choose from, which is used for completions.

This commit eliminates dependencies of the `gettext_impl` module on
code in fish's main crate, allowing for extraction of this module into
its own crate in a future commit.

krobelus · 2025-11-30T10:07:39Z

src/builtins/status.rs

+                {
+                    streams.err.append(L!("fish was built with the `localize-messages` feature disabled. The `status locale` command is unavailable.\n"));
+                    return Err(STATUS_CMD_ERROR);
+                } else {


The `else` after `return` is a bit ugly.
If it's implied by `cfg_if`, that's another (superficial) reason for moving
from `#[cfg(not(feature = "localize-messages"))]` to `if !cfg!(feature = "localize-messages")`.

As it is implemented now, it wouldn't compile otherwise, because some of the gettext-related functions are only defined if the `localize-messages` feature is active. We could change that, but I think it is more robust to remove these functions completely when localization is disabled. Otherwise, someone could end up using the useless variant in a context where `localize-messages` is disabled without realizing it.

I agree that the else after return is ugly, but I don't know a better way to implement this.
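To make the trade-off concrete, here is a minimal sketch of the `cfg!` variant. The names `localization_status` and `status_language` are hypothetical stand-ins, not fish's actual functions. Because `cfg!` expands to a plain boolean, both branches are still type-checked, so the localization function has to exist even when the feature is disabled, which is what the current implementation deliberately avoids.

```rust
// Hypothetical stand-in for a gettext-related function. With `cfg!`, it must
// be defined even when the feature is disabled, because the call below is
// still compiled in that configuration.
fn localization_status() -> String {
    String::from("Active languages: ...")
}

// Hypothetical sketch of the builtin's entry point using `cfg!` instead of
// `#[cfg(...)]` blocks. No `else` after `return` is needed here.
fn status_language() -> Result<String, &'static str> {
    if !cfg!(feature = "localize-messages") {
        return Err("fish was built with the `localize-messages` feature disabled.");
    }
    Ok(localization_status())
}
```

With `#[cfg(...)]`, `localization_status` could be removed entirely from builds without the feature, at the cost of the less pretty control flow.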


krobelus · 2025-11-30T10:07:40Z

src/wutil/gettext.rs

+        }
+        wgettext_fmt!(
+            "Language specifiers appear repeatedly: %s\n",
+            format!("{:?}", self.duplicates)


`{:?}` uses `Debug`, so it produces Rust syntax (`[x, y]` for slices, Rust string escape sequences, etc.).
We do that for FLOG output but for user-facing things it would probably make more sense to use shell-like syntax (space separated).

I think that's crate::common::escape().
(For things that might be legacy-encoded keys, we use char_to_symbol and its extension for byte slices (DisplayBytes))

Not sure if we have a convenient way to call it.
I guess it's not super important, but maybe we should add something like join_escaped_strings next to join_strings
(or lift it to an iterator-based interface)

I primarily chose `Debug` syntax here because it is a simple way of showing a `Vec`. However, it also has an important advantage over plain, space-separated strings: it makes it possible to distinguish between the following two variants:

    set LANGUAGE a b
    set LANGUAGE 'a b'

This is especially important for the malformed case. Maybe we could use space-separation + quoting?

space separating escaped strings is unambiguous for all possible input values,
but of course the fact that it's space-separation doesn't become obvious until there's at least two elements.
I think in this particular case, it's already obvious from the left-hand-side that it can be multiple values.
So as long as all of them are like this, shell syntax seems more appropriate.

If people want to parse it, they can even use read -lat tokens

Historically we haven't had a lot of need to output real machine-readable (JSON/TOML) data,
but one related thing that feels icky is the parsing of status build-info output in share/functions/__fish_posix_shell.fish (especially the spaces in the key name).
I'm sure we can find a more robust solution without needing full JSON.

> space separating escaped strings is unambiguous for all possible input values

So `a b` vs `a\ b` for my examples? That should work, although IMO it has the problem that it's less obvious that it's a list, even if there are multiple elements, and I don't find it particularly aesthetically pleasing. Having a format that supports automated parsing is certainly nice, though.

Another option would be putting each entry on its own line. That would make it obvious that there are multiple entries if there is more than one line. A single line would still be ambiguous without quoting/escaping. Parsing individual lines would be easier, but automatically determining which lines to parse would be harder.

> status build-info

That's something where a machine-readable format would certainly help. If we can't find a format that's good enough for machine and human consumption we could also add something like a --json flag to the relevant commands.

by default, `string escape a\ b` outputs `'a b'` which looks a bit nicer I guess.
I don't have super strong opinions but we should be consistent across builtins/functions.
I haven't looked at much related things; functions for example uses comma-separation if stdout is a TTY,
and newline-separation if it isn't.

For status locale, being machine-readable is probably not a priority.
Separate lines would be okay if it looks better in the TTY (and we have enough vertical space).
We could make it automatically output JSON if stdout is not a TTY so people are heavily discouraged from parsing the human-readable output, but JSON doesn't really sound like fish. Might be better to add more "get" subcommands that print one item per line (if we ever need them).

> Might be better to add more "get" subcommands that print one item per line

That's probably the best approach. Maybe one of

status locale get language-precedence

status locale messages get language-precedence

status locale get messages language-precedence

Depending on whether we want to keep options open for supporting other locale categories. Also not sure if we want several levels of subcommands or compress them into a single level by hyphenating.

For the default, not-necessarily-machine-readable format, if we go with space-separated escaped strings, should we use src/common.rs:escape?

> That's probably the best approach. Maybe one of
>
> status locale get language-precedence
> status locale messages get language-precedence
> status locale get messages language-precedence

I realized that status message-locale sounds a bit outdated, the modern term is probably
UI language, so we could call it status language?
The numeric thing could go into status number-format, if ever.

Then we would have status language precedence-list and status language list,
though I don't think we have a need for those, so I wouldn't add them until we do.

> Depending on whether we want to keep options open for supporting other locale categories. Also
> not sure if we want several levels of subcommands or compress them into a single level by
> hyphenating.

Both could work depending on how many subcommands there will be,
but it sounds like we don't need either yet.
Nested subcommands are new for us, so we'd want to make sure we don't break things
like completion and error messages. Maybe we can introduce a proper data structure for a subcommand.
In future, we should maybe switch to clap but I don't know how much work it would be to migrate to that without breaking relevant things.

> For the default, not-necessarily-machine-readable format, if we go with space-separated
> escaped strings, should we use src/common.rs:escape?

yes. So we'll need that for displaying SetLocaleLints,
and for having status language print the precedence list,
(and the list of all available languages? I think that would be fine to add for discoverability,
and it should be obvious that it's not really meant to be parsed)

I made several changes in the version I just pushed:

Lists of languages are now formatted as space-separated lists of shell-escaped strings as you suggested. I put a util function for this into src/wutil/gettext.rs, but maybe there is a more suitable place for it, or the functionality already exists somewhere else.

The command is renamed to language. I decided to stick with the multiple levels of subcommands, because I think it makes sense to group this, especially because it allows showing a smaller, more relevant set of completions.

Completions are added. These introduce some custom logic, which I think improves on the logic used for other status completions. We might want to extract these functions into global functions so that they are more widely available. The intended behavior is that once `status language` has been entered, completions only suggest the relevant subcommands, and if `status language set` has been entered, only the available languages are suggested. Ideally we would filter out languages which have already been specified, but that's not implemented in the current version.

The status language list-available command is added. Primarily useful for completions.

We could add another subcommand for listing the active language precedence in a machine-readable format, but I don't see much use for that now, so as you say, I don't think we should add it now.

If you're ok with the current interface, I'll write some docs for it.

Docs are added now.
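As a rough illustration of why space-separated escaped strings stay unambiguous where plain space separation does not, here is a minimal sketch. The helpers `escape_item` and `join_escaped` are hypothetical and much simpler than fish's own `escape` in `src/common.rs`; they only quote items that contain anything beyond a small set of plain characters.

```rust
/// Hypothetical helper: quote an item so that it reads back as a single
/// shell word. Plain items are left as-is, everything else is single-quoted.
fn escape_item(item: &str) -> String {
    let is_plain = !item.is_empty()
        && item
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | '-' | '.' | '@'));
    if is_plain {
        return item.to_owned();
    }
    let mut out = String::from("'");
    for c in item.chars() {
        // Inside single quotes, only backslash and single quote need escaping.
        if c == '\'' || c == '\\' {
            out.push('\\');
        }
        out.push(c);
    }
    out.push('\'');
    out
}

/// Hypothetical helper: join items as a space-separated list of escaped words.
fn join_escaped(items: &[&str]) -> String {
    items
        .iter()
        .map(|item| escape_item(item))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Under these assumptions, `join_escaped(&["a", "b"])` gives `a b` while `join_escaped(&["a b"])` gives `'a b'`, so the two cases from the earlier `set LANGUAGE` example remain distinguishable without resorting to `Debug` syntax.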


krobelus · 2025-11-30T10:07:40Z

src/wutil/gettext.rs

+        )),
+    }
+    localizable_consts!(
+        LANGUAGE_LIST_VARIABLE_ORIGIN "The language list is set based on the %s environment variable.\n"


so today this is only about message locale,
but it could be used for other locale-related features (today that's only LC_NUMERIC AFAIK)?
I suppose "status locale" is probably the appropriate name even if we don't end up adding more than messages.

Yes, that's something I also thought about. For getting the status, the naming is not that important IMO, but for setting it we should decide now if we want to use this builtin for localization stuff other than the message locale. If so, maybe we should be more specific in the subcommand names now, using e.g. status locale set-messages (or override-messages as you suggested). Maybe also status locale get-messages in addition to status locale where the former would then show only message-related locale info, whereas the latter would show all locale information.


krobelus · 2025-12-09T16:10:17Z

in general, moving from variables to dedicated commands is a good idea. Commands can print errors, their docs are easier to find and the serialization format is "shell commands" rather than a custom DSL. "complete" has always worked that way, "abbr" does too since we added options. With upcoming color variable changes (3e17b96) it would probably make sense to do it for those variables too.

Remove the format validation and fallback handling. Instead, only check string equality and ignore items which don't have a catalog with the exact same name. This simplifies the implementation and is easy to understand for users. The new approach also does not depend on our naming scheme for catalogs (when using the builtin command).

so "status language-override de_DE" will fail but "status language-override de" will work? I'm not sure I get it. I agree that possibly changing the builtin command syntax later (even if breaking) is fine, especially since we can always add extra logic for full backwards compatibility if we want to.

danielrainer · 2025-12-09T16:51:27Z

"status language-override de_DE" will fail but "status language-override de" will work?

Exactly. Since we can give proper feedback via warnings/error messages for commands, and we can add a way to list the available options, I think it makes sense to go with the simplest possible option, with no fallback logic. That should be easy to understand and use, even if my explanation in the comment might not have been clear. Adding completions for the command should also help.
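A minimal sketch of the exact-match rule described here, assuming a hypothetical `select_languages` function and `Selection` type (neither is taken from the fish source). Requested names are compared to catalog names by string equality only, so `de_DE` is rejected when only a `de` catalog exists.

```rust
/// Hypothetical result type: which requested languages have a catalog with
/// exactly the same name, and which do not (candidates for a warning).
#[derive(Debug, PartialEq)]
struct Selection {
    accepted: Vec<String>,
    rejected: Vec<String>,
}

/// Hypothetical sketch: keep only the requested languages that match an
/// available catalog name exactly. No fallback from `ll_CC` to `ll`.
fn select_languages(requested: &[&str], available: &[&str]) -> Selection {
    let mut selection = Selection {
        accepted: Vec::new(),
        rejected: Vec::new(),
    };
    for &lang in requested {
        if available.iter().any(|&catalog| catalog == lang) {
            selection.accepted.push(lang.to_owned());
        } else {
            selection.rejected.push(lang.to_owned());
        }
    }
    selection
}
```

The sketch preserves the order of the request, since the list is a precedence list. It does not handle duplicate entries, which the real command reports separately.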


krobelus · 2025-12-14T09:24:33Z

doc_src/cmds/status.rst

+
+    **unset**:
+    Undoes the effects of the **set** subcommand.
+    Language settings will be taken from environment variables again.


I wonder if a user would expect "unset" to exit with status 1 if there was no override.
I guess that might make sense, but I don't think it's important.
We have such smart failure returns in many builtins (set -e somevar, string join \n),
which can be very surprising when writing a nontrivial amount of fish script (fortunately few users actually need to do that)

Hm, I'm not sure I'd like the unset command to exit non-zero if it doesn't actually fail. If we want to provide a way to check if a language override is active, we should provide a dedicated command for that which does not modify any state. Is there any use to knowing whether the language was set previously with no way of accessing the value it was set to? With a command like this, I think seeing a non-zero exit status could be quite confusing.

I'd also argue that the other commands you mention shouldn't exit non-zero, but changing that has the potential to break things.

krobelus · 2025-12-14T09:24:33Z

src/wutil/gettext.rs

+                fn is_c_locale(locale: &str) -> bool {
+                    locale.starts_with('C')
+                }
+                if is_c_locale(locale) {


Maybe inline this function?
BTW I always get confused how this avoids false positives,
until I remember that valid locale specifiers start with lowercase letters.
So we don't need to check that there's a word boundary after C.
Maybe worth a comment. Looks like there exists a POSIX locale, I wonder if we should support it.

    // Locale specifiers start lowercase, only known exceptions are 'C' and 'POSIX'.
    if locale.starts_with('C') {

The reason I haven't inlined it is to provide some context why we check the first character, but I guess a comment could do the same and be even more explicit.

According to https://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_02, the POSIX and C locales should be synonyms. I think supporting POSIX is a good idea, but I'm not sure if it's better to add support in this commit, or do it separately. From a quick regex search, I haven't found any other places which read locale variables and check if it's the C locale. If there are indeed no other instances of this, I think adding support for the POSIX locale in this commit would be fine.

I added a comment to explain this, and added support for the POSIX locale. Still haven't found any other place in the code base which reads locale variables to check if the C locale is active, so I think it makes sense to include this change here.
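The prefix check discussed above can be tried out in isolation. This sketch reuses the `is_c_locale` shape from the diff in this thread; the sample locale names are illustrative only.

```rust
/// Prefix check as discussed in this thread: regular locale specifiers start
/// with a lowercase language code, so an uppercase `C` or `POSIX` prefix is
/// enough to identify the C locale, with or without a suffix like `.UTF-8`.
fn is_c_locale(locale: &str) -> bool {
    locale.starts_with('C') || locale.starts_with("POSIX")
}
```

Because `starts_with` is case-sensitive, a specifier such as `ca_ES` is not mistaken for the C locale, which is why no word-boundary check after the `C` is needed.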


krobelus · 2025-12-14T09:24:33Z

src/wutil/gettext.rs

+    let localization_state = gettext_impl::status_language();
+    let mut result = WString::new();
+    localizable_consts!(
+        LANGUAGE_LIST_VARIABLE_ORIGIN "The language list is set based on the %s environment variable.\n"


technically the variable needn't be part of the environment, I was wondering if we should drop "environment".
Though in practice it will and should be 99% of the time, so it's probably better to leave this.

Fair point. I don't have much of a preference here.

krobelus · 2025-12-14T09:24:33Z

src/wutil/gettext.rs

+        }
+    };
+    result.push_utfstr(&wgettext!(
+        "The language list is set to the following value:"


maybe a brief Language list: or Active languages: would sound better?
This would imply a change to the "origin" line, maybe even collapse them to oneline:

    # if no variable is set
    Active languages (default):
    # if LC_ALL=C
    Active languages (from $LC_ALL):
    # etc.
    Active languages (from $LC_ALL): de
    Active languages (from $LC_MESSAGES): de
    Active languages (from $LANG): de
    Active languages (from $LANGUAGE): de fr
    Active languages (from `status language set`): de fr

Maybe that's too concise but you get the idea.

The fact that English is implicit is weird, especially in the first two cases.
But that's unrelated to the decision on wording, and probably not surprising.

Yes, I like having this more concise as well.

Regarding implicit English, I agree that it's somewhat strange. With gettext, we also have the issue that there is a difference between messages taken from the source vs taken from the en catalog. The difference is fairly minor, mainly some fancy quotes in the en catalog IIRC. When we switch to Fluent, we need to decide how to handle this. I think using the msgids for English would be fine, and using the msgstrs where we have them would also be fine if we are not worried about some characters not rendering correctly on certain systems. If we really want to have a separate English catalog, we could also introduce something like default.ftl in addition to en.ftl, but I prefer the other options.

Closes fish-shell#12106

krobelus · 2025-12-18T10:19:28Z

src/wutil/gettext.rs

+                // locale name.
+                // https://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_02
+                fn is_c_locale(locale: &str) -> bool {
+                    locale.starts_with('C') || locale.starts_with("POSIX")


(this should technically have been in a separate commit. Same for the change to no longer canonicalize LANGUAGE entries)

Fair point about the POSIX locale, but the LANGUAGE variable entries, as well as the locale variables are still being canonicalized (in the sense that we strip off suffixes for locale variables and fall back from ll_CC to ll). Only the values set via status language set require exact matches.
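The canonicalization described here (stripping suffixes and falling back from `ll_CC` to `ll`) might look roughly like the following. This is a sketch under stated assumptions, not fish's implementation: the function name `candidates` is hypothetical, and it assumes that suffixes are introduced by `.` (codeset) or `@` (modifier).

```rust
/// Hypothetical sketch: turn a locale specifier from an environment variable
/// into the catalog names to try, most specific first.
fn candidates(spec: &str) -> Vec<String> {
    // Assumption: everything from the first '.' or '@' onward is a suffix.
    let base = spec
        .split(|c: char| c == '.' || c == '@')
        .next()
        .unwrap_or("");
    let mut result = Vec::new();
    if base.is_empty() {
        return result;
    }
    result.push(base.to_owned());
    // Fall back from `ll_CC` to `ll`.
    if let Some((language, _territory)) = base.split_once('_') {
        if !language.is_empty() {
            result.push(language.to_owned());
        }
    }
    result
}
```

Values passed to `status language set` would skip this step entirely, since they require exact matches.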

krobelus · 2025-12-18T10:19:28Z

tests/checks/message-localization.fish

+
+status language unset
+status language
+# CHECK: Active languages (from variable LC_MESSAGES):


the "from" phrasing is maybe not perfect, maybe something like "source: " would work better.

I'll queue this now since this is not worse than my suggestion, and I guess we don't have further changes planned.

Yes, I also prefer source: . I can push the relevant changes if you'd like.

maybe like this (also hiding the command from translators)

diff --git a/src/localization/mod.rs b/src/localization/mod.rs
index 814f600e9c..eac6ee206c 100644
--- a/src/localization/mod.rs
+++ b/src/localization/mod.rs
@@ -142,7 +142,7 @@
     let localization_state = fish_gettext::status_language();
     let mut result = WString::new();
     localizable_consts!(
-        LANGUAGE_LIST_VARIABLE_ORIGIN "from variable %s"
+        LANGUAGE_LIST_VARIABLE_ORIGIN "%s variable"
     );
     let origin_string = match localization_state.precedence_origin {
         LanguagePrecedenceOrigin::Default => wgettext!("default").to_owned(),
@@ -153,10 +153,13 @@
             wgettext_fmt!(LANGUAGE_LIST_VARIABLE_ORIGIN, "LANGUAGE")
         }
         LanguagePrecedenceOrigin::StatusLanguage => {
-            wgettext!("from command `status language set`").to_owned()
+            wgettext_fmt!("%s command", "`status language set`")
         }
     };
-    result.push_utfstr(&wgettext_fmt!("Active languages (%s):", origin_string));
+    result.push_utfstr(&wgettext_fmt!(
+        "Active languages (source: %s):",
+        origin_string
+    ));
     append_space_separated_list(&mut result, &localization_state.language_precedence);
     result.push('\n');

Yes, that seems better. Are you going to apply the patch yourself?

Ref: #12106 (comment)

danielrainer force-pushed the status_locale branch from a747278 to 9c4d8dd on November 24, 2025, 23:54

danielrainer mentioned this pull request Nov 25, 2025

gettext: move gettext_impl into dedicated crate #12108

Closed

krobelus reviewed Nov 30, 2025


danielrainer force-pushed the status_locale branch 2 times, most recently from f5ee93c to e36bf34 on November 30, 2025, 17:09

danielrainer force-pushed the status_locale branch from e36bf34 to d00db3c on December 10, 2025, 19:09

krobelus reviewed Dec 11, 2025


danielrainer force-pushed the status_locale branch from d00db3c to 135ed67 on December 11, 2025, 21:09

danielrainer changed the title ~~l10n: implement status locale builtin~~ l10n: implement status language builtin Dec 11, 2025

krobelus added this to the fish 4.3 milestone Dec 14, 2025

krobelus mentioned this pull request Dec 14, 2025

Update enums to PascalCase, global constants to SCREAMING_SNAKE_CASE, FLOG/FLOGF to flog/flogf #12156

Closed

3 tasks

krobelus reviewed Dec 14, 2025


krobelus approved these changes Dec 14, 2025


krobelus reviewed Dec 14, 2025


danielrainer force-pushed the status_locale branch 3 times, most recently from 4be3219 to 8431fe3 on December 17, 2025, 14:41

danielrainer force-pushed the status_locale branch from 8431fe3 to 607f469 on December 17, 2025, 17:17

krobelus reviewed Dec 18, 2025


krobelus closed this in aa8f5fc Dec 18, 2025

danielrainer deleted the status_locale branch December 19, 2025 03:13

krobelus mentioned this pull request Dec 25, 2025

Cygwin system test #12171

Open

krobelus added a commit that referenced this pull request Dec 25, 2025

Tweak language in "status language" output

fed0269

Ref: #12106 (comment)
