l10n: implement status language builtin #12106

Closed
danielrainer wants to merge 1 commit into fish-shell:master from danielrainer:status_locale

danielrainer commented Nov 24, 2025

Based on the discussion in
#11967

Introduce a status language builtin, which has subcommands for
controlling and inspecting fish's message localization status.

The motivation for this is that using only the established environment
variables LANGUAGE, LC_ALL, LC_MESSAGES, and LANG can cause
problems when fish interprets them differently from GNU gettext.
In addition, these are not well-suited for users who want to override
their normal localization settings only for fish, since fish would
propagate the values of these variables to its child processes.

Configuration via these variables still works as before, but now there
is the status language set command, which allows overriding the
localization configuration.
If status language set is used, the language precedence list will be
taken from its remaining arguments.
Warnings will be shown for invalid arguments.
Once this command has been used, the localization-related environment
variables are ignored.
To go back to taking the configuration from the environment variables
after status language set has been executed, users can run status language unset.

Running status language without arguments shows information about the
current message localization status, allowing users to better understand
how their settings are interpreted by fish.

The status language list-available command shows which languages are
available to choose from, which is used for completions.

This commit removes the gettext_impl module's dependencies on
code in fish's main crate, allowing this module to be extracted into
its own crate in a future commit.
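
To make the precedence described above concrete, here is a minimal Rust sketch. It is not fish's implementation: the `Origin` enum, the `resolve_languages` function, and the `HashMap` standing in for shell variables are all assumptions made for illustration. The variable order follows GNU gettext's documented behavior (`LANGUAGE` is only consulted when the locale selected by `LC_ALL`, `LC_MESSAGES`, or `LANG` is not the C locale), which fish may interpret differently. Normalization of values such as `de_DE.UTF-8` is deliberately left out.

```rust
use std::collections::HashMap;

/// Where the active language list came from (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Origin {
    Default,
    Variable(&'static str),
    StatusLanguageSet,
}

/// Variables that select the message locale, highest priority first.
const LOCALE_VARS: [&str; 3] = ["LC_ALL", "LC_MESSAGES", "LANG"];

/// Resolve the language precedence list.
/// `override_list` models the state left behind by `status language set`;
/// `vars` models the variables visible to the shell.
fn resolve_languages(
    override_list: Option<&[String]>,
    vars: &HashMap<&str, String>,
) -> (Origin, Vec<String>) {
    // An explicit override wins and the variables are ignored.
    if let Some(list) = override_list {
        return (Origin::StatusLanguageSet, list.to_vec());
    }
    // The first non-empty variable in LOCALE_VARS selects the message locale.
    let selected = LOCALE_VARS.iter().copied().find_map(|name| {
        vars.get(name)
            .filter(|value| !value.is_empty())
            .map(|value| (name, value))
    });
    let (name, value) = match selected {
        Some(pair) => pair,
        None => return (Origin::Default, Vec::new()),
    };
    // In the C/POSIX locale, messages stay untranslated and LANGUAGE is ignored.
    if value.starts_with('C') || value.starts_with("POSIX") {
        return (Origin::Variable(name), Vec::new());
    }
    // Otherwise a non-empty, colon-separated LANGUAGE takes priority.
    if let Some(list) = vars.get("LANGUAGE").filter(|value| !value.is_empty()) {
        let languages: Vec<String> = list
            .split(':')
            .filter(|entry| !entry.is_empty())
            .map(String::from)
            .collect();
        return (Origin::Variable("LANGUAGE"), languages);
    }
    (Origin::Variable(name), vec![value.clone()])
}
```

With `LANG=de_DE.UTF-8` and `LANGUAGE=fr:de`, this sketch reports `LANGUAGE` as the origin with the list `fr de`; passing an override list makes the variables irrelevant, which is the behavior the description attributes to `status language set`.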

{
    streams.err.append(L!("fish was built with the `localize-messages` feature disabled. The `status locale` command is unavailable.\n"));
    return Err(STATUS_CMD_ERROR);
} else {

Contributor

The else after return is a bit ugly.
If it's implied by cfg_if, that's another (superficial) reason for moving
from #[cfg(not(feature = "localize-messages"))] to if !cfg!(feature = "localize-messages").

Author

As it is implemented now, it wouldn't compile otherwise, because some of the gettext-related functions are only defined if the localize-messages feature is active. We could change that, but I think it is more robust to remove these functions completely when localization is disabled; otherwise someone could unknowingly use the useless variant in a context where localize-messages is disabled.

I agree that the else after return is ugly, but I don't know a better way to implement this.
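
For reference, one way to keep the gated functions undefined while avoiding the else after return is to gate two whole function definitions instead of two branches. The sketch below is only an illustration under stated assumptions: `describe_localization` and this `status_language` signature are hypothetical and not taken from fish. It relies on the fact that `#[cfg(...)]` removes an item before type checking, whereas `cfg!(...)` expands to a boolean and leaves both branches to be compiled.

```rust
/// Hypothetical stand-in for a helper that only exists when localization is compiled in.
#[cfg(feature = "localize-messages")]
fn describe_localization() -> String {
    String::from("Active languages (default):")
}

/// Variant compiled when the feature is enabled; it may call the gated helper.
#[cfg(feature = "localize-messages")]
fn status_language() -> Result<String, String> {
    Ok(describe_localization())
}

/// Variant compiled when the feature is disabled; the gated helper is never
/// referenced, so its absence cannot cause a compile error.
#[cfg(not(feature = "localize-messages"))]
fn status_language() -> Result<String, String> {
    Err(String::from(
        "fish was built with the `localize-messages` feature disabled.",
    ))
}
```

The trade-off is some duplication of the function signature, which may or may not be preferable to the current structure.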

}
wgettext_fmt!(
    "Language specifiers appear repeatedly: %s\n",
    format!("{:?}", self.duplicates)

Contributor

:? uses Debug so it uses Rust syntax ([x, y] for slices, Rust string escape sequences etc).
We do that for FLOG output but for user-facing things it would probably make more sense to use shell-like syntax (space separated).

I think that's crate::common::escape().
(For things that might be legacy-encoded keys, we use char_to_symbol and its extension for byte slices (DisplayBytes))

Not sure if we have a convenient way to call it.
I guess it's not super important, but maybe we should add something like join_escaped_strings next to join_strings
(or lift it to an iterator-based interface)

Author

I primarily chose Debug syntax here because it is a simple way of showing a Vec. However, it also has the important advantage over plain, space-separated strings that it makes it possible to distinguish between the following two variants:

set LANGUAGE a b
set LANGUAGE 'a b'

This is especially important for the malformed case. Maybe we could use space-separation + quoting?
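
The space-separation plus quoting idea can be sketched as follows. This is a simplified illustration and not fish's `escape()` from src/common.rs: `escape_element` and `join_escaped` are hypothetical names, and the set of characters left unquoted is an assumption. It uses fish's rule that inside single quotes only the backslash and the single quote need a backslash escape.

```rust
/// Quote one list element so that it reads back as a single shell token.
/// Simplified for illustration: anything outside a small safe set is single-quoted.
fn escape_element(s: &str) -> String {
    let is_safe = |c: char| c.is_ascii_alphanumeric() || matches!(c, '_' | '-' | '.' | '@');
    if !s.is_empty() && s.chars().all(is_safe) {
        return s.to_string();
    }
    let mut out = String::from("'");
    for c in s.chars() {
        // Inside single quotes, only the backslash and the quote itself are escaped.
        if c == '\\' || c == '\'' {
            out.push('\\');
        }
        out.push(c);
    }
    out.push('\'');
    out
}

/// Join the escaped elements with single spaces.
fn join_escaped(items: &[&str]) -> String {
    items
        .iter()
        .map(|item| escape_element(item))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Under these assumptions, the two-element list from `set LANGUAGE a b` is rendered as `a b`, while the single element from `set LANGUAGE 'a b'` is rendered as `'a b'`, so the two cases stay distinguishable.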

Contributor

space separating escaped strings is unambiguous for all possible input values,
but of course the fact that it's space-separation doesn't become obvious until there's at least two elements.
I think in this particular case, it's already obvious from the left-hand-side that it can be multiple values.
So as long as all of them are like this, shell syntax seems more appropriate.

If people want to parse it, they can even use read -lat tokens

Historically we haven't had a lot of need to output real machine-readable (JSON/TOML) data,
but one related thing that feels icky is the parsing of status build-info output in share/functions/__fish_posix_shell.fish (especially the spaces in the key name).
I'm sure we can find a more robust solution without needing full JSON.

Author

> space separating escaped strings is unambiguous for all possible input values

So a b vs a\ b for my examples? That should work, although it does have the problem that it's less obvious that it's a list, even if there are multiple elements, IMO, and I don't find it particularly aesthetically pleasing. Having a format that supports automated parsing is certainly nice, though.

Another option would be putting each entry on its own line. That would make it obvious that there are multiple entries if there is more than one line. A single line would still be ambiguous without quoting/escaping. Parsing individual lines would be easier, but automatically determining which lines to parse would be harder.

> status build-info

That's something where a machine-readable format would certainly help. If we can't find a format that's good enough for machine and human consumption we could also add something like a --json flag to the relevant commands.

Contributor

by default, string escape a\ b outputs 'a b' which looks a bit nicer I guess.
I don't have super strong opinions but we should be consistent across builtins/functions.
I haven't looked at many related things; functions, for example, uses comma-separation if stdout is a TTY,
and newline-separation if it isn't.

For status locale, being machine-readable is probably not a priority.
Separate lines would be okay if it looks better in the TTY (and we have enough vertical space).
We could make it automatically output JSON if stdout is not a TTY, so people are heavily discouraged from parsing the human-readable output, but JSON doesn't really sound like fish. It might be better to add more "get" subcommands that print one item per line (if we ever need them).

Author

> Might be better to add more "get" subcommands that print one item per line

That's probably the best approach. Maybe one of

  • status locale get language-precedence
  • status locale messages get language-precedence
  • status locale get messages language-precedence

Depending on whether we want to keep options open for supporting other locale categories. Also not sure if we want several levels of subcommands or compress them into a single level by hyphenating.

For the default, not-necessarily-machine-readable format, if we go with space-separated escaped strings, should we use src/common.rs:escape?

Contributor

> That's probably the best approach. Maybe one of
>
> status locale get language-precedence
> status locale messages get language-precedence
> status locale get messages language-precedence

I realized that status message-locale sounds a bit outdated, the modern term is probably
UI language, so we could call it status language?
The numeric thing could go into status number-format, if ever.

Then we would have status language precedence-list and status language list,
though I don't think we have a need for those, so I wouldn't add them until we do.

> Depending on whether we want to keep options open for supporting other locale categories. Also
> not sure if we want several levels of subcommands or compress them into a single level by
> hyphenating.

Both could work depending on how many subcommands there will be,
but it sounds like we don't need either yet.
Nested subcommands are new for us, so we'd want to make sure we don't break things
like completion and error messages. Maybe we can introduce a proper data structure for a subcommand.
In future, we should maybe switch to clap but I don't know how much work it would be to migrate to that without breaking relevant things.

> For the default, not-necessarily-machine-readable format, if we go with space-separated
> escaped strings, should we use src/common.rs:escape?

yes. So we'll need that for displaying SetLocaleLints,
and for having status language print the precedence list,
(and the list of all available languages? I think that would be fine to add for discoverability,
and it should be obvious that it's not really meant to be parsed)

Author

I made several changes in the version I just pushed:

  • Lists of languages are now formatted as space-separated lists of shell-escaped strings as you suggested. I put a util function for this into src/wutil/gettext.rs, but maybe there is a more suitable place for it, or the functionality already exists somewhere else.
  • The command is renamed to language. I decided to stick with the multiple levels of subcommands, because I think it makes sense to group this, especially because it allows showing a smaller, more relevant set of completions.
  • Completions are added. These introduce some custom logic, which I think improves on the logic used for other status completions. We might want to extract these functions into global functions so that they are more widely available. The intended behavior is that once status language has been entered, completions only suggest the relevant subcommands, and if status language set has been entered, only the available languages are suggested. Ideally we would filter out languages which have already been specified, but that's not implemented in the current version.
  • The status language list-available command is added. Primarily useful for completions.
  • We could add another subcommand for listing the active language precedence in a machine-readable format, but I don't see much use for that now, so as you say, I don't think we should add it now.

If you're ok with the current interface, I'll write some docs for it.

Author

Docs are added now.

    )),
}
localizable_consts!(
    LANGUAGE_LIST_VARIABLE_ORIGIN "The language list is set based on the %s environment variable.\n"

Contributor

so today this is only about message locale,
but it could be used for other locale-related features (today that's only LC_NUMERIC AFAIK)?
I suppose "status locale" is probably the appropriate name even if we don't end up adding more than messages.

Author

Yes, that's something I also thought about. For getting the status, the naming is not that important IMO, but for setting it we should decide now if we want to use this builtin for localization stuff other than the message locale. If so, maybe we should be more specific in the subcommand names now, using e.g. status locale set-messages (or override-messages as you suggested). Maybe also status locale get-messages in addition to status locale where the former would then show only message-related locale info, whereas the latter would show all locale information.

danielrainer force-pushed the status_locale branch 2 times, most recently from f5ee93c to e36bf34 on November 30, 2025 17:09
krobelus (Contributor) commented Dec 9, 2025 via email

danielrainer (Author)

> "status language-override de_DE" will fail but "status language-override de" will work?

Exactly. Since we can give proper feedback via warnings/error messages for commands, and we can add a way to list the available options, I think it makes sense to go with the simplest possible option, with no fallback logic. That should be easy to understand and use, even if my explanation in the comment might not have been clear. Adding completions for the command should also help.

danielrainer changed the title from "l10n: implement status locale builtin" to "l10n: implement status language builtin" on Dec 11, 2025
krobelus added this to the fish 4.3 milestone on Dec 14, 2025

**unset**:
Undoes the effects of the **set** subcommand.
Language settings will be taken from environment variables again.

Contributor

I wonder if a user would expect "unset" to exit with status 1 if there was no override.
I guess that might make sense, but I don't think it's important.
We have such smart failure returns in many builtins (set -e somevar, string join \n),
which can be very surprising when writing a nontrivial amount of fish script (fortunately few users actually need to do that)

Author

Hm, I'm not sure I'd like the unset command to exit non-zero if it doesn't actually fail. If we want to provide a way to check if a language override is active, we should provide a dedicated command for that which does not modify any state. Is there any use to knowing whether the language was set previously with no way of accessing the value it was set to? With a command like this, I think seeing a non-zero exit status could be quite confusing.

I'd also argue that the other commands you mention shouldn't exit non-zero, but changing that has the potential to break things.

fn is_c_locale(locale: &str) -> bool {
    locale.starts_with('C')
}
if is_c_locale(locale) {

Contributor

Maybe inline this function?
BTW I always get confused how this avoids false positives,
until I remember that valid locale specifiers start with lowercase letters.
So we don't need to check that there's a word boundary after C.
Maybe worth a comment. Looks like there exists a POSIX locale, I wonder if we should support it.

// Locale specifiers start lowercase, only known exceptions are 'C' and 'POSIX'.
if locale.starts_with('C') {

Author

The reason I haven't inlined it is to provide some context why we check the first character, but I guess a comment could do the same and be even more explicit.

According to https://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_02, the POSIX and C locales should be synonyms. I think supporting POSIX is a good idea, but I'm not sure if it's better to add support in this commit, or do it separately. From a quick regex search, I haven't found any other places which read locale variables and check if it's the C locale. If there are indeed no other instances of this, I think adding support for the POSIX locale in this commit would be fine.

Author

I added a comment to explain this, and added support for the POSIX locale. Still haven't found any other place in the code base which reads locale variables to check if the C locale is active, so I think it makes sense to include this change here.
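
As a self-contained illustration of the check discussed here, the following sketch assumes the prefix test is all that is needed, based on the observation in this thread that other locale specifiers start with a lowercase letter. The function body mirrors the one in the commit; the doc comment is added for this example.

```rust
/// Returns true for the C locale and its POSIX synonym, including
/// variants with a suffix such as `C.UTF-8`.
/// No word-boundary check follows the prefix because other locale
/// specifiers are expected to start with a lowercase letter.
fn is_c_locale(locale: &str) -> bool {
    locale.starts_with('C') || locale.starts_with("POSIX")
}
```

With this definition, `is_c_locale("C.UTF-8")` and `is_c_locale("POSIX")` return true, while `is_c_locale("ca_ES")` returns false because of the lowercase first letter.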

let localization_state = gettext_impl::status_language();
let mut result = WString::new();
localizable_consts!(
    LANGUAGE_LIST_VARIABLE_ORIGIN "The language list is set based on the %s environment variable.\n"

Contributor

technically the variable needn't be part of the environment, I was wondering if we should drop "environment".
Though in practice it will and should be 99% of the time, so it's probably better to leave this.

Author

Fair point. I don't have much of a preference here.

    }
};
result.push_utfstr(&wgettext!(
    "The language list is set to the following value:"

Contributor

maybe a brief Language list: or Active languages: would sound better?
This would imply a change to the "origin" line, maybe even collapsing them into one line:

# if no variable is set
Active languages (default):
# if LC_ALL=C
Active languages (from $LC_ALL):
# etc.
Active languages (from $LC_ALL): de
Active languages (from $LC_MESSAGES): de
Active languages (from $LANG): de
Active languages (from $LANGUAGE): de fr
Active languages (from `status language set`): de fr

Maybe that's too concise but you get the idea.

The fact that English is implicit is weird, especially in the first two cases.
But that's unrelated to the decision on wording, and probably not surprising.

Author

Yes, I like having this more concise as well.

Regarding implicit English, I agree that it's somewhat strange. With gettext, we also have the issue that there is a difference between messages taken from the source vs taken from the en catalog. The difference is fairly minor, mainly some fancy quotes in the en catalog IIRC. When we switch to Fluent, we need to decide how to handle this. I think using the msgids for English would be fine, and using the msgstrs where we have them would also be fine if we are not worried about some characters not rendering correctly on certain systems. If we really want to have a separate English catalog, we could also introduce something like default.ftl in addition to en.ftl, but I prefer the other options.

danielrainer force-pushed the status_locale branch 3 times, most recently from 4be3219 to 8431fe3 on December 17, 2025 14:41

Closes fish-shell#12106
// locale name.
// https://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_02
fn is_c_locale(locale: &str) -> bool {
    locale.starts_with('C') || locale.starts_with("POSIX")

Contributor

(this should technically have been in a separate commit. Same for the change to now stop canonicalizing LANGUAGE entries)

Author

Fair point about the POSIX locale, but the LANGUAGE variable entries, as well as the locale variables are still being canonicalized (in the sense that we strip off suffixes for locale variables and fall back from ll_CC to ll). Only the values set via status language set require exact matches.
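
The distinction described here can be sketched roughly as below. This is an illustration under assumptions, not the code from the commit: `candidates_from_variable` and `accept_set_argument` are hypothetical names, and the suffix handling is reduced to cutting at the first `.` or `@`.

```rust
/// Candidate catalog names for a value taken from a locale variable:
/// strip the codeset/modifier suffix, then fall back from `ll_CC` to `ll`.
fn candidates_from_variable(value: &str) -> Vec<String> {
    // Everything from the first '.' or '@' on is a suffix such as ".UTF-8" or "@euro".
    let base = value
        .split(|c: char| c == '.' || c == '@')
        .next()
        .unwrap_or("");
    let mut candidates = Vec::new();
    if base.is_empty() {
        return candidates;
    }
    candidates.push(base.to_string());
    // Fall back to the bare language code when a country code is present.
    if let Some((language, _country)) = base.split_once('_') {
        candidates.push(language.to_string());
    }
    candidates
}

/// Values passed to `status language set` must name an available catalog
/// exactly; no suffix stripping or fallback is applied.
fn accept_set_argument(arg: &str, available: &[&str]) -> bool {
    available.contains(&arg)
}
```

So a variable value of `de_DE.UTF-8` yields the candidates `de_DE` and `de`, whereas `status language set de_DE` would be rejected when only `de` is available.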


status language unset
status language
# CHECK: Active languages (from variable LC_MESSAGES):

Contributor

the "from" phrasing is maybe not perfect, maybe something like "source: " would work better.

I'll queue this now since this is not worse than my suggestion, and I guess we don't have further changes planned.

Author

Yes, I also prefer source: . I can push the relevant changes if you'd like.

Contributor

maybe like this (also hiding the command from translators)

diff --git a/src/localization/mod.rs b/src/localization/mod.rs
index 814f600e9c..eac6ee206c 100644
--- a/src/localization/mod.rs
+++ b/src/localization/mod.rs
@@ -142,7 +142,7 @@
     let localization_state = fish_gettext::status_language();
     let mut result = WString::new();
     localizable_consts!(
-        LANGUAGE_LIST_VARIABLE_ORIGIN "from variable %s"
+        LANGUAGE_LIST_VARIABLE_ORIGIN "%s variable"
     );
     let origin_string = match localization_state.precedence_origin {
         LanguagePrecedenceOrigin::Default => wgettext!("default").to_owned(),
@@ -153,10 +153,13 @@
             wgettext_fmt!(LANGUAGE_LIST_VARIABLE_ORIGIN, "LANGUAGE")
         }
         LanguagePrecedenceOrigin::StatusLanguage => {
-            wgettext!("from command `status language set`").to_owned()
+            wgettext_fmt!("%s command", "`status language set`")
         }
     };
-    result.push_utfstr(&wgettext_fmt!("Active languages (%s):", origin_string));
+    result.push_utfstr(&wgettext_fmt!(
+        "Active languages (source: %s):",
+        origin_string
+    ));
     append_space_separated_list(&mut result, &localization_state.language_precedence);
     result.push('\n');

Author

Yes, that seems better. Are you going to apply the patch yourself?
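
For readers who want to see the resulting line format in isolation, here is a standalone sketch of the "source:" wording from the patch above. It uses plain `String` instead of fish's `WString` and gettext macros; `PrecedenceSource` and `status_line` are hypothetical names, the exact spacing before the list is an assumption, and shell-escaping of the list entries is omitted.

```rust
/// Hypothetical, simplified mirror of the origins named in the patch.
enum PrecedenceSource {
    Default,
    Variable(&'static str),
    StatusLanguageSet,
}

/// Build the status line: origin in parentheses, then the space-separated list.
fn status_line(source: &PrecedenceSource, languages: &[&str]) -> String {
    let source_text = match source {
        PrecedenceSource::Default => String::from("default"),
        PrecedenceSource::Variable(name) => format!("{} variable", name),
        PrecedenceSource::StatusLanguageSet => String::from("`status language set` command"),
    };
    let mut line = format!("Active languages (source: {}):", source_text);
    for language in languages {
        // Each entry is preceded by a single space.
        line.push(' ');
        line.push_str(language);
    }
    line.push('\n');
    line
}
```

Keeping the command name out of the translatable text, as the patch does with its `%s command` format, means translators only see the surrounding words.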

krobelus closed this in aa8f5fc on Dec 18, 2025
danielrainer deleted the status_locale branch on December 19, 2025 03:13
krobelus mentioned this pull request on Dec 25, 2025
krobelus added a commit that referenced this pull request on Dec 25, 2025