bpo-47000: Make io.text_encoding() respect UTF-8 mode #32003

Merged: 11 commits, Apr 4, 2022
11 changes: 8 additions & 3 deletions Doc/library/io.rst
@@ -198,12 +198,13 @@ High-level Module Interface
This is a helper function for callables that use :func:`open` or
:class:`TextIOWrapper` and have an ``encoding=None`` parameter.

This function returns *encoding* if it is not ``None`` and ``"locale"`` if
*encoding* is ``None``.
This function returns *encoding* if it is not ``None``.
Otherwise, it returns ``"locale"`` or ``"utf-8"`` depending on
:ref:`UTF-8 Mode <utf8-mode>`.

This function emits an :class:`EncodingWarning` if
:data:`sys.flags.warn_default_encoding <sys.flags>` is true and *encoding*
is None. *stacklevel* specifies where the warning is emitted.
is ``None``. *stacklevel* specifies where the warning is emitted.
For example::

def read_text(path, encoding=None):
@@ -218,6 +219,10 @@ High-level Module Interface

.. versionadded:: 3.10

.. versionchanged:: 3.11
:func:`text_encoding` returns "utf-8" when UTF-8 mode is enabled and
*encoding* is ``None``.


.. exception:: BlockingIOError

1 change: 1 addition & 0 deletions Include/internal/pycore_global_strings.h
@@ -48,6 +48,7 @@ struct _Py_global_strings {
STRUCT_FOR_STR(newline, "\n")
STRUCT_FOR_STR(open_br, "{")
STRUCT_FOR_STR(percent, "%")
STRUCT_FOR_STR(utf_8, "utf-8")
} literals;

struct {
1 change: 1 addition & 0 deletions Include/internal/pycore_runtime_init.h
@@ -672,6 +672,7 @@ extern "C" {
INIT_STR(newline, "\n"), \
INIT_STR(open_br, "{"), \
INIT_STR(percent, "%"), \
INIT_STR(utf_8, "utf-8"), \
}, \
.identifiers = { \
INIT_ID(False), \
10 changes: 7 additions & 3 deletions Lib/_pyio.py
@@ -44,8 +44,9 @@ def text_encoding(encoding, stacklevel=2):
"""
A helper function to choose the text encoding.

When encoding is not None, just return it.
Otherwise, return the default text encoding (i.e. "locale").
When encoding is not None, this function returns it.
Otherwise, this function returns the default text encoding
(i.e. "locale" or "utf-8" depends on UTF-8 mode).

This function emits an EncodingWarning if *encoding* is None and
sys.flags.warn_default_encoding is true.
@@ -55,7 +56,10 @@ def text_encoding(encoding, stacklevel=2):
However, please consider using encoding="utf-8" for new APIs.
"""
if encoding is None:
encoding = "locale"
if sys.flags.utf8_mode:
encoding = "utf-8"
else:
encoding = "locale"
if sys.flags.warn_default_encoding:
import warnings
warnings.warn("'encoding' argument not specified.",
11 changes: 11 additions & 0 deletions Lib/test/test_io.py
@@ -4289,6 +4289,17 @@ def test_check_encoding_warning(self):
self.assertTrue(
warnings[1].startswith(b"<string>:8: EncodingWarning: "))

def test_text_encoding(self):
# PEP 597, bpo-47000. io.text_encoding() returns "locale" or "utf-8"
# based on sys.flags.utf8_mode
code = "import io; print(io.text_encoding(None))"

proc = assert_python_ok('-X', 'utf8=0', '-c', code)
self.assertEqual(b"locale", proc.out.strip())

proc = assert_python_ok('-X', 'utf8=1', '-c', code)
self.assertEqual(b"utf-8", proc.out.strip())
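Outside the CPython tree, where assert_python_ok may not be importable, a similar check can be sketched with subprocess. This is only an approximation of the test above and not part of the PR: run_snippet is a hypothetical helper, and the snippet prints sys.flags.utf8_mode rather than calling io.text_encoding() so that it also runs on interpreters older than 3.10.

```python
import subprocess
import sys


def run_snippet(utf8_flag, code):
    """Hypothetical helper: run `code` in a child interpreter with -X utf8=<flag>."""
    result = subprocess.run(
        [sys.executable, "-X", f"utf8={utf8_flag}", "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Print the UTF-8 mode flag as seen by each child interpreter.
    code = "import sys; print(sys.flags.utf8_mode)"
    for flag in (0, 1):
        print(flag, run_snippet(flag, code))
```

On an interpreter that includes this change, passing "import io; print(io.text_encoding(None))" as the snippet should reproduce the two values the test asserts.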

@support.cpython_only
# Depending if OpenWrapper was already created or not, the warning is
# emitted or not. For example, the attribute is already created when this
6 changes: 3 additions & 3 deletions Lib/test/test_utf8_mode.py
@@ -161,7 +161,7 @@ def test_io(self):
filename = __file__

out = self.get_output('-c', code, filename, PYTHONUTF8='1')
self.assertEqual(out, 'UTF-8/strict')
self.assertEqual(out.lower(), 'utf-8/strict')

def _check_io_encoding(self, module, encoding=None, errors=None):
filename = __file__
@@ -183,10 +183,10 @@ def _check_io_encoding(self, module, encoding=None, errors=None):
PYTHONUTF8='1')

if not encoding:
encoding = 'UTF-8'
encoding = 'utf-8'
if not errors:
errors = 'strict'
self.assertEqual(out, f'{encoding}/{errors}')
self.assertEqual(out.lower(), f'{encoding}/{errors}')

def check_io_encoding(self, module):
self._check_io_encoding(module, encoding="latin1")
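The switch to out.lower() in this file presumably reflects that the reported encoding name may now be spelled "utf-8" rather than "UTF-8"; that reading is an inference, not something stated in the PR. Codec lookup itself is indifferent to the spelling, which the sketch below illustrates with the standard codecs module (same_codec is a hypothetical helper introduced for illustration).

```python
import codecs


def same_codec(name_a, name_b):
    """Hypothetical helper: True if both names resolve to the same codec."""
    # codecs.lookup() normalizes aliases and case, so the canonical
    # .name attribute can be compared directly.
    return codecs.lookup(name_a).name == codecs.lookup(name_b).name


if __name__ == "__main__":
    print(same_codec("UTF-8", "utf-8"))
    print(same_codec("latin1", "utf-8"))
```

Comparing lowercased strings, as the test does, is a lighter-weight way to get the same tolerance when only the letter case can differ.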
@@ -0,0 +1 @@
Make :func:`io.text_encoding` returns "utf-8" when UTF-8 mode is enabled.
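For callers, the effect of this entry can be sketched as follows. This is a minimal illustration rather than code from the PR: resolve_encoding is a hypothetical stand-in that mirrors the branch added to Lib/_pyio.py above, so it also runs on interpreters that predate the change. The EncodingWarning part of the real function is deliberately omitted.

```python
import sys


def resolve_encoding(encoding=None):
    """Hypothetical stand-in for io.text_encoding() after this change."""
    if encoding is not None:
        # An explicit encoding is always returned unchanged.
        return encoding
    # UTF-8 mode (-X utf8 or PYTHONUTF8=1) selects "utf-8"; otherwise
    # the "locale" placeholder is returned, as before the change.
    return "utf-8" if sys.flags.utf8_mode else "locale"


if __name__ == "__main__":
    print(resolve_encoding("latin1"))
    print(resolve_encoding())
```

A wrapper such as the read_text() example in the documentation hunk would pass the result straight to open() or TextIOWrapper.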
16 changes: 12 additions & 4 deletions Modules/_io/_iomodule.c
@@ -457,8 +457,9 @@ _io.text_encoding

A helper function to choose the text encoding.

When encoding is not None, just return it.
Otherwise, return the default text encoding (i.e. "locale").
When encoding is not None, this function returns it.
Otherwise, this function returns the default text encoding
(i.e. "locale" or "utf-8" depends on UTF-8 mode).

This function emits an EncodingWarning if encoding is None and
sys.flags.warn_default_encoding is true.
@@ -469,7 +470,7 @@ However, please consider using encoding="utf-8" for new APIs.

static PyObject *
_io_text_encoding_impl(PyObject *module, PyObject *encoding, int stacklevel)
/*[clinic end generated code: output=91b2cfea6934cc0c input=bf70231213e2a7b4]*/
/*[clinic end generated code: output=91b2cfea6934cc0c input=4999aa8b3d90f3d4]*/
{
if (encoding == NULL || encoding == Py_None) {
PyInterpreterState *interp = _PyInterpreterState_GET();
@@ -479,7 +480,14 @@ _io_text_encoding_impl(PyObject *module, PyObject *encoding, int stacklevel)
return NULL;
}
}
return &_Py_ID(locale);
Review thread on the line above:

Member Author: @ericsnowcurrently We need to incref this until we implement immortal objects. Am I correct?

Member: Sort of. Currently all statically allocated global objects have their refcount initialized to a really large value. So we haven't been bothering with the incref if we know the object is one of those, which it is in this case. Incref'ing it won't hurt. However, it is effectively unnecessary.

Member Author: I am afraid of breaking the refleak buildbot. Could you tell me why returning &_Py_ID(locale) directly doesn't break the refleak test?

Member: The refcount of static objects is not directly included in _Py_RefTotal, which is used for refleak detection. (_Py_RefTotal will still ensure that refcount operations are balanced though, regardless of the objects involved.)

Member Author:
> (_Py_RefTotal will still ensure that refcount operations are balanced though, regardless of the objects involved.)

So skipping the INCREF here might unbalance _Py_RefTotal. Doesn't that break the refleak test?

Member: Good point. It didn't cause a problem before. I'd have to look into why not.

const PyPreConfig *preconfig = &_PyRuntime.preconfig;
if (preconfig->utf8_mode) {
_Py_DECLARE_STR(utf_8, "utf-8");
encoding = &_Py_STR(utf_8);
}
else {
encoding = &_Py_ID(locale);
}
}
Py_INCREF(encoding);
return encoding;
7 changes: 4 additions & 3 deletions Modules/_io/clinic/_iomodule.c.h

Some generated files are not rendered by default.

5 changes: 4 additions & 1 deletion Python/sysmodule.c
@@ -841,7 +841,10 @@ static PyObject *
sys_getdefaultencoding_impl(PyObject *module)
/*[clinic end generated code: output=256d19dfcc0711e6 input=d416856ddbef6909]*/
{
return PyUnicode_FromString(PyUnicode_GetDefaultEncoding());
_Py_DECLARE_STR(utf_8, "utf-8");
PyObject *ret = &_Py_STR(utf_8);
Py_INCREF(ret);
return ret;
}
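From Python code this hunk should be invisible: sys.getdefaultencoding() reports "utf-8" on Python 3 either way, and only the way the result object is produced changes. A quick sanity check, using a hypothetical helper name:

```python
import sys


def default_string_encoding():
    """Hypothetical helper: wrap sys.getdefaultencoding() for a quick check."""
    return sys.getdefaultencoding()


if __name__ == "__main__":
    print(default_string_encoding())
```

Note that this value is unrelated to the locale or to UTF-8 mode; it describes the default used by str.encode() and bytes.decode().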

/*[clinic input]