@@ -438,8 +438,10 @@ Miscellaneous options
438438 * Set the :attr: `~sys.flags.dev_mode ` attribute of :attr: `sys.flags ` to
439439 ``True ``
440440
441- * ``-X utf8 `` enables the UTF-8 mode, whereas ``-X utf8=0 `` disables the
442- UTF-8 mode.
441+ * ``-X utf8 `` enables UTF-8 mode for operating system interfaces, overriding
442+ the default locale-aware mode. ``-X utf8=0 `` explicitly disables UTF-8
443+ mode (even when it would otherwise activate automatically).
444+ See :envvar: `PYTHONUTF8 ` for more details.
443445
444446 It also allows passing arbitrary values and retrieving them through the
445447 :data: `sys._xoptions ` dictionary.
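As a rough illustration of the ``-X`` behaviour described above, the following sketch launches child interpreters with different ``-X`` options and reports what each child sees. The helper name ``run_with_x_options`` and the option ``demo_key`` are hypothetical, chosen only for this example. It assumes Python 3.7 or later.

```python
import json
import os
import subprocess
import sys

# Code executed in the child interpreter: report the UTF-8 mode flag and
# the value of a made-up -X option named "demo_key" (hypothetical).
CHILD_CODE = (
    "import json, sys; "
    "print(json.dumps({'utf8_mode': sys.flags.utf8_mode, "
    "'demo_key': sys._xoptions.get('demo_key')}))"
)


def run_with_x_options(*x_options):
    """Run a child interpreter with the given -X options and return its report."""
    env = dict(os.environ)
    # Remove PYTHONUTF8 so only the command line options are in play.
    env.pop("PYTHONUTF8", None)
    cmd = [sys.executable]
    for opt in x_options:
        cmd += ["-X", opt]
    cmd += ["-c", CHILD_CODE]
    result = subprocess.run(
        cmd, env=env, capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


enabled = run_with_x_options("utf8", "demo_key=demo_value")
disabled = run_with_x_options("utf8=0")
print("with -X utf8:   ", enabled)
print("with -X utf8=0: ", disabled)
```

The child processes are used because UTF-8 mode is fixed at interpreter startup and cannot be toggled from inside a running program.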
@@ -789,36 +791,49 @@ conflict.
789791.. envvar :: PYTHONCOERCECLOCALE
790792
791793 If set to the value ``0 ``, causes the main Python command line application
792- to skip coercing the legacy ASCII-based C locale to a more capable UTF-8
793- based alternative.
794+ to skip coercing the legacy ASCII-based C and POSIX locales to a more
795+ capable UTF-8 based alternative.
794796
795- If this variable is *not * set, or is set to a value other than ``0 ``, and
796- the current locale reported for the ``LC_CTYPE `` category is the default
797- ``C `` locale, then the Python CLI will attempt to configure the following
798- locales for the ``LC_CTYPE `` category in the order listed before loading the
799- interpreter runtime:
797+ If this variable is *not * set (or is set to a value other than ``0 ``), the
798+ ``LC_ALL `` locale override environment variable is also not set, and the
799+ current locale reported for the ``LC_CTYPE `` category is either the default
800+ ``C `` locale, or else the explicitly ASCII-based ``POSIX `` locale, then the
801+ Python CLI will attempt to configure the following locales for the
802+ ``LC_CTYPE `` category in the order listed before loading the interpreter
803+ runtime:
800804
801805 * ``C.UTF-8 ``
802806 * ``C.utf8 ``
803807 * ``UTF-8 ``
804808
805809 If setting one of these locale categories succeeds, then the ``LC_CTYPE ``
806810 environment variable will also be set accordingly in the current process
807- environment before the Python runtime is initialized. This ensures the
808- updated setting is seen in subprocesses, as well as in operations that
809- query the environment rather than the current C locale (such as Python's
810- own :func: `locale.getdefaultlocale `).
811+ environment before the Python runtime is initialized. This ensures that in
812+ addition to being seen by both the interpreter itself and other locale-aware
813+ components running in the same process (such as the GNU ``readline ``
814+ library), the updated setting is also seen in subprocesses (regardless of
815+ whether or not those processes are running a Python interpreter), as well as
816+ in operations that query the environment rather than the current C locale
817+ (such as Python's own :func: `locale.getdefaultlocale `).
811818
812819 Configuring one of these locales (either explicitly or via the above
813- implicit locale coercion) will automatically set the error handler for
814- :data: `sys.stdin ` and :data: `sys.stdout ` to ``surrogateescape ``. This
815- behavior can be overridden using :envvar: `PYTHONIOENCODING ` as usual.
820+ implicit locale coercion) automatically enables the ``surrogateescape ``
821+ :ref: `error handler <error-handlers >` for :data: `sys.stdin ` and
822+ :data: `sys.stdout ` (:data: `sys.stderr ` continues to use ``backslashreplace ``
823+ as it does in any other locale). This stream handling behavior can be
824+ overridden using :envvar: `PYTHONIOENCODING ` as usual.
816825
817826 For debugging purposes, setting ``PYTHONCOERCECLOCALE=warn `` will cause
818827 Python to emit warning messages on ``stderr `` if either the locale coercion
819828 activates, or else if a locale that *would * have triggered coercion is
820829 still active when the Python runtime is initialized.
821830
831+ Also note that even when locale coercion is disabled, or when it fails to
832+ find a suitable target locale, :envvar: `PYTHONUTF8 ` will still activate by
833+ default in legacy ASCII-based locales. Both features must be disabled in
834+ order to force the interpreter to use ``ASCII `` instead of ``UTF-8 `` for
835+ system interfaces.
836+
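The interaction between locale coercion and UTF-8 mode can be explored with a sketch like the one below. It starts child interpreters in the ``C`` locale and reports whether ``LC_CTYPE`` was set by coercion and which encoding ``sys.stdout`` ended up with. The helper name ``run_in_c_locale`` is hypothetical, and the results depend on the platform and on which target locales are installed, so no particular output is implied.

```python
import json
import os
import subprocess
import sys

# Code executed in the child: report the LC_CTYPE environment variable
# (set by coercion when it succeeds) and the stdout encoding.
CHILD_CODE = (
    "import json, os, sys; "
    "print(json.dumps({'LC_CTYPE': os.environ.get('LC_CTYPE'), "
    "'stdout_encoding': sys.stdout.encoding}))"
)


def run_in_c_locale(coerce_setting=None, utf8_setting=None):
    """Run a child interpreter in the C locale with optional overrides."""
    env = dict(os.environ)
    # Start from a clean slate so only the settings below apply.
    for name in ("LC_ALL", "LC_CTYPE", "PYTHONCOERCECLOCALE",
                 "PYTHONUTF8", "PYTHONIOENCODING"):
        env.pop(name, None)
    env["LANG"] = "C"
    if coerce_setting is not None:
        env["PYTHONCOERCECLOCALE"] = coerce_setting
    if utf8_setting is not None:
        env["PYTHONUTF8"] = utf8_setting
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env, capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


default_run = run_in_c_locale()
both_disabled = run_in_c_locale(coerce_setting="0", utf8_setting="0")
print("defaults:      ", default_run)
print("both disabled: ", both_disabled)
```

Comparing the two reports on a given system shows whether coercion found a target locale there and what encoding remains once both features are switched off.
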
822837 Availability: \* nix
823838
824839 .. versionadded :: 3.7
@@ -834,10 +849,56 @@ conflict.
834849
835850.. envvar :: PYTHONUTF8
836851
837- If set to ``1 ``, enable the UTF-8 mode. If set to ``0 ``, disable the UTF-8
838- mode. Any other non-empty string cause an error.
852+ If set to ``1 ``, enables the interpreter's UTF-8 mode, where ``UTF-8 `` is
853+ used as the text encoding for system interfaces, regardless of the
854+ current locale setting.
855+
856+ This means that:
857+
858+ * :func: `sys.getfilesystemencoding() ` returns ``'UTF-8' `` (the locale
859+ encoding is ignored).
860+ * :func: `locale.getpreferredencoding() ` returns ``'UTF-8' `` (the locale
861+ encoding is ignored, and the function's ``do_setlocale `` parameter has no
862+ effect).
863+ * :data: `sys.stdin `, :data: `sys.stdout `, and :data: `sys.stderr ` all use
864+ UTF-8 as their text encoding, with the ``surrogateescape ``
865+ :ref: `error handler <error-handlers >` being enabled for :data: `sys.stdin `
866+ and :data: `sys.stdout ` (:data: `sys.stderr ` continues to use
867+ ``backslashreplace `` as it does in the default locale-aware mode).
868+
869+ As a consequence of the changes in those lower level APIs, other higher
870+ level APIs also exhibit different default behaviours:
871+
872+ * Command line arguments, environment variables and filenames are decoded
873+ to text using the UTF-8 encoding.
874+ * :func: `os.fsdecode() ` and :func: `os.fsencode() ` use the UTF-8 encoding.
875+ * :func: `open() `, :func: `io.open() `, and :func: `codecs.open() ` use the UTF-8
876+ encoding by default. However, they still use the strict error handler by
877+ default so that attempting to open a binary file in text mode is likely
878+ to raise an exception rather than producing nonsense data.
879+
880+ Note that the standard stream settings in UTF-8 mode can be overridden by
881+ :envvar: `PYTHONIOENCODING ` (just as they can be in the default locale-aware
882+ mode).
883+
884+ If set to ``0 ``, the interpreter runs in its default locale-aware mode.
885+
886+ Setting any other non-empty string causes an error during interpreter
887+ initialisation.
888+
889+ If this environment variable is not set at all, then the interpreter defaults
890+ to using the current locale settings, *unless * the current locale is
891+ identified as a legacy ASCII-based locale
892+ (as described for :envvar: `PYTHONCOERCECLOCALE `), and locale coercion is
893+ either disabled or fails. In such legacy locales, the interpreter will
894+ default to enabling UTF-8 mode unless explicitly instructed not to do so.
895+
896+ Also available as the :option: `-X ` ``utf8 `` option.
897+
898+ Availability: \* nix
839899
840900 .. versionadded :: 3.7
901+ See :pep: `540 ` for more details.
841902
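The effects listed for ``PYTHONUTF8=1`` can be checked with a small sketch such as the following, which runs a child interpreter with the variable set and collects the values of the APIs mentioned in the entry. The helper name ``report_utf8_mode`` is hypothetical. Encoding names are compared case-insensitively here because their spelling can vary between Python versions.

```python
import json
import os
import subprocess
import sys

# Code executed in the child: gather the settings that UTF-8 mode affects.
CHILD_CODE = """
import json, locale, sys
print(json.dumps({
    "utf8_mode": sys.flags.utf8_mode,
    "filesystem_encoding": sys.getfilesystemencoding(),
    "preferred_encoding": locale.getpreferredencoding(),
    "stdout_encoding": sys.stdout.encoding,
    "stdin_errors": sys.stdin.errors,
    "stdout_errors": sys.stdout.errors,
    "stderr_errors": sys.stderr.errors,
}))
"""


def report_utf8_mode(value):
    """Run a child interpreter with PYTHONUTF8 set to ``value``."""
    env = dict(os.environ)
    env["PYTHONUTF8"] = value
    # PYTHONIOENCODING would override the stream settings, so remove it.
    env.pop("PYTHONIOENCODING", None)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        stdin=subprocess.DEVNULL,  # make sure the child has a usable stdin
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


report = report_utf8_mode("1")
for key, value in sorted(report.items()):
    print(f"{key}: {value}")
```

Running the same helper with ``"0"`` gives the locale-aware values for comparison on the same machine.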
842903
843904Debug-mode variables