gh-82045: Correct and deduplicate "isprintable" docs; add test. #130118

Merged 2 commits on Feb 14, 2025
9 changes: 2 additions & 7 deletions Doc/c-api/unicode.rst
@@ -256,13 +256,8 @@ the Python configuration.

 .. c:function:: int Py_UNICODE_ISPRINTABLE(Py_UCS4 ch)

-   Return ``1`` or ``0`` depending on whether *ch* is a printable character.
-   Nonprintable characters are those characters defined in the Unicode character
-   database as "Other" or "Separator", excepting the ASCII space (0x20) which is
-   considered printable. (Note that printable characters in this context are
-   those which should not be escaped when :func:`repr` is invoked on a string.
-   It has no bearing on the handling of strings written to :data:`sys.stdout` or
-   :data:`sys.stderr`.)
+   Return ``1`` or ``0`` depending on whether *ch* is a printable character,
+   in the sense of :meth:`str.isprintable`.


 These APIs can be used for fast direct character conversions:
20 changes: 13 additions & 7 deletions Doc/library/stdtypes.rst
@@ -2012,13 +2012,19 @@ expression support in the :mod:`re` module).

 .. method:: str.isprintable()

-   Return ``True`` if all characters in the string are printable or the string is
-   empty, ``False`` otherwise. Nonprintable characters are those characters defined
-   in the Unicode character database as "Other" or "Separator", excepting the
-   ASCII space (0x20) which is considered printable. (Note that printable
-   characters in this context are those which should not be escaped when
-   :func:`repr` is invoked on a string. It has no bearing on the handling of
-   strings written to :data:`sys.stdout` or :data:`sys.stderr`.)
+   Return true if all characters in the string are printable, false if it
+   contains at least one non-printable character.
+
+   Here "printable" means the character is suitable for :func:`repr` to use in
+   its output; "non-printable" means that :func:`repr` on built-in types will
+   hex-escape the character. It has no bearing on the handling of strings
+   written to :data:`sys.stdout` or :data:`sys.stderr`.
+
+   The printable characters are those which in the Unicode character database
+   (see :mod:`unicodedata`) have a general category in group Letter, Mark,
+   Number, Punctuation, or Symbol (L, M, N, P, or S); plus the ASCII space 0x20.
+   Nonprintable characters are those in group Separator or Other (Z or C),
+   except the ASCII space.


 .. method:: str.isspace()
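
As a quick illustration of the reworded `str.isprintable` entry, the sketch below compares three views of a single character: the result of `isprintable()`, whether `repr()` leaves the character unescaped, and whether its general category falls outside groups C and Z. The helper name `describe` and the sample characters are hypothetical and not part of this PR. The `repr()` comparison is deliberately naive and only makes sense for characters other than quotes and backslashes.

```python
import unicodedata


def describe(char):
    """Return (isprintable, kept verbatim by repr, category rule) for one character."""
    category = unicodedata.category(char)
    # Category rule from the new docs: not in group C or Z, or the ASCII space.
    by_category = category[0] not in ("C", "Z") or char == " "
    # Naive check: repr() of a printable character is the character in single quotes.
    kept_by_repr = repr(char) == "'" + char + "'"
    return char.isprintable(), kept_by_repr, by_category


for sample in ("a", " ", "\xa0", "\n", "\u2028", "\U0001F46F"):
    # ascii() keeps the output readable even for non-printable samples.
    print(ascii(sample), describe(sample))

# The empty string contains no non-printable character, so it counts as printable.
print("".isprintable())
```

For each of these samples the three values should agree with one another, which is the relationship the new documentation text describes.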
9 changes: 9 additions & 0 deletions Lib/test/test_str.py
@@ -853,6 +853,15 @@ def test_isprintable(self):
         self.assertTrue('\U0001F46F'.isprintable())
         self.assertFalse('\U000E0020'.isprintable())

+    @support.requires_resource('cpu')
+    def test_isprintable_invariant(self):
+        for codepoint in range(sys.maxunicode + 1):
+            char = chr(codepoint)
+            category = unicodedata.category(char)
+            self.assertEqual(char.isprintable(),
+                             category[0] not in ('C', 'Z')
+                             or char == ' ')
+
     def test_surrogates(self):
         for s in ('a\uD800b\uDFFF', 'a\uDFFFb\uD800',
                   'a\uD800b\uDFFFa', 'a\uDFFFb\uD800a'):
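
The invariant in `test_isprintable_invariant` can also be tried outside the test suite. The following is a minimal standalone sketch, not the PR's code: the function name `find_mismatches` and its `stop` parameter are hypothetical, and it collects disagreements instead of asserting so that any mismatch can be inspected together with its category.

```python
import sys
import unicodedata


def find_mismatches(stop=sys.maxunicode + 1):
    """Collect code points where isprintable() disagrees with the category rule."""
    mismatches = []
    for codepoint in range(stop):
        char = chr(codepoint)
        category = unicodedata.category(char)
        # Same rule as the test: printable unless in group C or Z, plus ASCII space.
        expected = category[0] not in ("C", "Z") or char == " "
        if char.isprintable() != expected:
            mismatches.append((hex(codepoint), category))
    return mismatches


if __name__ == "__main__":
    # Sweep only the Basic Multilingual Plane to keep the run short.
    print(find_mismatches(0x10000))
```

Calling `find_mismatches()` with no argument covers the same range as the test, which is why the test is guarded by the `cpu` resource.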
7 changes: 3 additions & 4 deletions Objects/clinic/unicodeobject.c.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

16 changes: 4 additions & 12 deletions Objects/unicodectype.c
@@ -142,18 +142,10 @@ int _PyUnicode_IsNumeric(Py_UCS4 ch)
     return (ctype->flags & NUMERIC_MASK) != 0;
 }

-/* Returns 1 for Unicode characters to be hex-escaped when repr()ed,
-   0 otherwise.
-   All characters except those characters defined in the Unicode character
-   database as following categories are considered printable.
-      * Cc (Other, Control)
-      * Cf (Other, Format)
-      * Cs (Other, Surrogate)
-      * Co (Other, Private Use)
-      * Cn (Other, Not Assigned)
-      * Zl Separator, Line ('\u2028', LINE SEPARATOR)
-      * Zp Separator, Paragraph ('\u2029', PARAGRAPH SEPARATOR)
-      * Zs (Separator, Space) other than ASCII space('\x20').
+/* Returns 1 for Unicode characters that repr() may use in its output,
+   and 0 for characters to be hex-escaped.
+
+   See documentation of `str.isprintable` for details.
 */
 int _PyUnicode_IsPrintable(Py_UCS4 ch)
 {
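
The category list dropped from this comment now lives in the `str.isprintable` documentation. As a rough check from the Python side, the sketch below takes one character from each listed category and prints its category and `isprintable()` result. The `SAMPLES` mapping and the specific characters are assumptions chosen for illustration and do not appear in the PR.

```python
import unicodedata

# One sample character per category named in the removed comment.
# The sample characters are illustrative choices, not taken from the PR.
SAMPLES = {
    "Cc": "\x00",    # NULL
    "Cf": "\u200b",  # ZERO WIDTH SPACE
    "Cs": "\ud800",  # lone high surrogate
    "Co": "\ue000",  # first private use code point
    "Cn": "\uffff",  # noncharacter, never assigned
    "Zl": "\u2028",  # LINE SEPARATOR
    "Zp": "\u2029",  # PARAGRAPH SEPARATOR
    "Zs": "\xa0",    # NO-BREAK SPACE
}

for listed_category, char in SAMPLES.items():
    actual_category = unicodedata.category(char)
    print(f"U+{ord(char):04X} listed={listed_category} "
          f"actual={actual_category} printable={char.isprintable()}")

# ASCII space is also in Zs, yet it is the one documented exception.
print(unicodedata.category(" "), " ".isprintable())
```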
7 changes: 3 additions & 4 deletions Objects/unicodeobject.c
@@ -12452,15 +12452,14 @@ unicode_isidentifier_impl(PyObject *self)
 /*[clinic input]
 str.isprintable as unicode_isprintable

-Return True if the string is printable, False otherwise.
+Return True if all characters in the string are printable, False otherwise.

-A string is printable if all of its characters are considered printable in
-repr() or if it is empty.
+A character is printable if repr() may use it in its output.
 [clinic start generated code]*/

 static PyObject *
 unicode_isprintable_impl(PyObject *self)
-/*[clinic end generated code: output=3ab9626cd32dd1a0 input=98a0e1c2c1813209]*/
+/*[clinic end generated code: output=3ab9626cd32dd1a0 input=4e56bcc6b06ca18c]*/
 {
     Py_ssize_t i, length;
     int kind;