gh-46236: PyUnicode docs improvements #129966
Merged
Commits (6), all by encukou:
38c64c2 Move deprecated PyUnicode API docs to new section
1a75655 Document PyUnicode_IS_ASCII, PyUnicode_CHECK_INTERNED
fc5322c PyUnicode_New docs: Clarify requirements for "fresh" strings
e12ad3a PyUnicodeWriter_DecodeUTF8Stateful: Link "error-handlers"
e73652e Apply suggestions from code review
d21b676 Merge branch 'main' into pyunicode-docs
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -31,6 +31,12 @@ Unicode Type | |
These are the basic Unicode object types used for the Unicode implementation in | ||
Python: | ||
|
||
.. c:var:: PyTypeObject PyUnicode_Type | ||
|
||
This instance of :c:type:`PyTypeObject` represents the Python Unicode type. It | ||
is exposed to Python code as :py:class:`str`. | ||
|
||
|
||
.. c:type:: Py_UCS4 | ||
Py_UCS2 | ||
Py_UCS1 | ||
|
@@ -42,19 +48,6 @@ Python: | |
.. versionadded:: 3.3 | ||
|
||
|
||
.. c:type:: Py_UNICODE | ||
|
||
This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type | ||
depending on the platform. | ||
|
||
.. versionchanged:: 3.3 | ||
In previous versions, this was a 16-bit type or a 32-bit type depending on | ||
whether you selected a "narrow" or "wide" Unicode version of Python at | ||
build time. | ||
|
||
.. deprecated-removed:: 3.13 3.15 | ||
|
||
|
||
.. c:type:: PyASCIIObject | ||
PyCompactUnicodeObject | ||
PyUnicodeObject | ||
|
@@ -66,12 +59,6 @@ Python: | |
.. versionadded:: 3.3 | ||
|
||
|
||
.. c:var:: PyTypeObject PyUnicode_Type | ||
|
||
This instance of :c:type:`PyTypeObject` represents the Python Unicode type. It | ||
is exposed to Python code as ``str``. | ||
|
||
|
||
The following APIs are C macros and static inlined functions for fast checks and | ||
access to internal read-only data of Unicode objects: | ||
|
||
|
@@ -87,16 +74,6 @@ access to internal read-only data of Unicode objects: | |
subtype. This function always succeeds. | ||
|
||
|
||
.. c:function:: int PyUnicode_READY(PyObject *unicode) | ||
|
||
Returns ``0``. This API is kept only for backward compatibility. | ||
|
||
.. versionadded:: 3.3 | ||
|
||
.. deprecated:: 3.10 | ||
This API does nothing since Python 3.12. | ||
|
||
|
||
.. c:function:: Py_ssize_t PyUnicode_GET_LENGTH(PyObject *unicode) | ||
|
||
Return the length of the Unicode string, in code points. *unicode* has to be a | ||
|
@@ -149,12 +126,16 @@ access to internal read-only data of Unicode objects: | |
.. c:function:: void PyUnicode_WRITE(int kind, void *data, \ | ||
Py_ssize_t index, Py_UCS4 value) | ||
|
||
Write into a canonical representation *data* (as obtained with | ||
:c:func:`PyUnicode_DATA`). This function performs no sanity checks, and is | ||
intended for usage in loops. The caller should cache the *kind* value and | ||
*data* pointer as obtained from other calls. *index* is the index in | ||
the string (starts at 0) and *value* is the new code point value which should | ||
be written to that location. | ||
Write the code point *value* to the given zero-based *index* in a string. | ||
|
||
The *kind* value and *data* pointer must have been obtained from a | ||
string using :c:func:`PyUnicode_KIND` and :c:func:`PyUnicode_DATA` | ||
respectively. You must hold a reference to that string while calling | ||
:c:func:`!PyUnicode_WRITE`. All requirements of | ||
:c:func:`PyUnicode_WriteChar` also apply. | ||
|
||
The function performs no checks for any of its requirements, | ||
and is intended for usage in loops. | ||
|
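The *kind*/*data* contract described above can be illustrated without CPython at all. The following is a minimal standalone sketch, not CPython's implementation: `model_kind`, `model_write` and `model_fill` are hypothetical names, and the fixed-width integer types stand in for `Py_UCS1`, `Py_UCS2` and `Py_UCS4`. It shows why the caller caches *kind* and *data* once and then writes in a loop, and why nothing is checked along the way.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the storage widths; with the real API the
   kind is obtained from PyUnicode_KIND and the buffer from PyUnicode_DATA. */
enum model_kind { MODEL_1BYTE = 1, MODEL_2BYTE = 2, MODEL_4BYTE = 4 };

/* Store `value` at `index`, choosing the element width from `kind`.
   Like the documented function, this performs no checks at all:
   a wrong kind, a bad index, or a too-large value is the caller's bug. */
static void
model_write(enum model_kind kind, void *data, size_t index, uint32_t value)
{
    switch (kind) {
    case MODEL_1BYTE:
        ((uint8_t *)data)[index] = (uint8_t)value;
        break;
    case MODEL_2BYTE:
        ((uint16_t *)data)[index] = (uint16_t)value;
        break;
    case MODEL_4BYTE:
        ((uint32_t *)data)[index] = value;
        break;
    }
}

/* Fill a buffer in a loop, with `kind` and `data` cached by the caller. */
static void
model_fill(enum model_kind kind, void *data, size_t length, uint32_t value)
{
    for (size_t i = 0; i < length; i++) {
        model_write(kind, data, i, value);
    }
}
```

With the real API, the buffer belongs to a string object, so the reference to that string must be held for as long as *data* is in use.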
||
.. versionadded:: 3.3 | ||
|
||
|
@@ -196,6 +177,14 @@ access to internal read-only data of Unicode objects: | |
is not ready. | ||
|
||
|
||
.. c:function:: unsigned int PyUnicode_IS_ASCII(PyObject *unicode) | ||
|
||
Return true if the string only contains ASCII characters. | ||
Equivalent to :py:meth:`str.isascii`. | ||
|
||
.. versionadded:: 3.2 | ||
|
||
|
||
Unicode Character Properties | ||
"""""""""""""""""""""""""""" | ||
|
||
|
@@ -330,11 +319,29 @@ APIs: | |
to be placed in the string. As an approximation, it can be rounded up to the | ||
nearest value in the sequence 127, 255, 65535, 1114111. | ||
|
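The rounding approximation mentioned above can be sketched as a small standalone function. `round_up_maxchar` is a hypothetical helper for illustration, not part of the C API, and it assumes the input is a valid code point (at most 0x10FFFF).

```c
#include <stdint.h>

/* Hypothetical helper: round a code point up to the nearest value in the
   sequence 127, 255, 65535, 1114111. Assumes ch <= 0x10FFFF. */
static uint32_t
round_up_maxchar(uint32_t ch)
{
    if (ch <= 127) {
        return 127;      /* ASCII */
    }
    if (ch <= 255) {
        return 255;      /* Latin-1 */
    }
    if (ch <= 65535) {
        return 65535;    /* Basic Multilingual Plane */
    }
    return 1114111;      /* full Unicode range, 0x10FFFF */
}
```

A caller would typically take the largest code point it intends to store and pass either that value or its rounded form as *maxchar*.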
||
This is the recommended way to allocate a new Unicode object. Objects | ||
created using this function are not resizable. | ||
|
||
On error, set an exception and return ``NULL``. | ||
|
||
After creation, the string can be filled by :c:func:`PyUnicode_WriteChar`, | ||
:c:func:`PyUnicode_CopyCharacters`, :c:func:`PyUnicode_Fill`, | ||
:c:func:`PyUnicode_WRITE` or similar. | ||
Since strings are supposed to be immutable, take care to not “use” the | ||
result while it is being modified. In particular, before it's filled | ||
with its final contents, a string: | ||
|
||
- must not be hashed, | ||
- must not be :c:func:`converted to UTF-8 <PyUnicode_AsUTF8AndSize>`, | ||
or another non-"canonical" representation, | ||
- must not have its reference count changed, | ||
- must not be shared with code that might do one of the above. | ||
|
||
This list is not exhaustive. Avoiding these uses is your responsibility; | ||
Python does not always check these requirements. | ||
|
||
To avoid accidentally exposing a partially-written string object, prefer | ||
using the :c:type:`PyUnicodeWriter` API, or one of the ``PyUnicode_From*`` | ||
functions below. | ||
|
||
|
||
.. versionadded:: 3.3 | ||
|
||
|
||
|
@@ -636,6 +643,9 @@ APIs: | |
possible. Returns ``-1`` and sets an exception on error, otherwise returns | ||
the number of copied characters. | ||
|
||
The string must not have been “used” yet. | ||
See :c:func:`PyUnicode_New` for details. | ||
|
||
.. versionadded:: 3.3 | ||
|
||
|
||
|
@@ -648,6 +658,9 @@ APIs: | |
Fail if *fill_char* is bigger than the string maximum character, or if the | ||
string has more than 1 reference. | ||
|
||
The string must not have been “used” yet. | ||
See :c:func:`PyUnicode_New` for details. | ||
|
||
Return the number of written characters, or return ``-1`` and raise an | ||
exception on error. | ||
|
||
|
@@ -657,15 +670,16 @@ APIs: | |
.. c:function:: int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, \ | ||
Py_UCS4 character) | ||
|
||
Write a character to a string. The string must have been created through | ||
:c:func:`PyUnicode_New`. Since Unicode strings are supposed to be immutable, | ||
the string must not be shared, or have been hashed yet. | ||
Write a *character* to the string *unicode* at the zero-based *index*. | ||
Return ``0`` on success, ``-1`` on error with an exception set. | ||
|
||
This function checks that *unicode* is a Unicode object, that the index is | ||
not out of bounds, and that the object can be modified safely (i.e. that it | ||
its reference count is one). | ||
not out of bounds, and that the object's reference count is one. | ||
See :c:func:`PyUnicode_WRITE` for a version that skips these checks, | ||
making them your responsibility. | ||
|
||
Return ``0`` on success, ``-1`` on error with an exception set. | ||
The string must not have been “used” yet. | ||
See :c:func:`PyUnicode_New` for details. | ||
|
||
.. versionadded:: 3.3 | ||
|
||
|
@@ -1649,6 +1663,20 @@ They all return ``NULL`` or ``-1`` if an exception occurs. | |
Strings interned this way are made :term:`immortal`. | ||
|
||
|
||
.. c:function:: unsigned int PyUnicode_CHECK_INTERNED(PyObject *str) | ||
|
||
Return a non-zero value if *str* is interned, zero if not. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Most documentation uses "Return true" (60 occurrences), some use "Return non-zero" (13 occurrences) and one uses "Return a non-zero". In this case using "Return a non-zero" looks justified, as it may encode additional information. |
||
The *str* argument must be a string; this is not checked. | ||
This function always succeeds. | ||
|
||
.. impl-detail:: | ||
|
||
A non-zero return value may carry additional information | ||
about *how* the string is interned. | ||
The meaning of such non-zero values, as well as each specific string's | ||
intern-related details, may change between CPython versions. | ||
|
||
|
||
PyUnicodeWriter | ||
^^^^^^^^^^^^^^^ | ||
|
||
|
@@ -1769,8 +1797,8 @@ object. | |
*size* is the string length in bytes. If *size* is equal to ``-1``, call | ||
``strlen(str)`` to get the string length. | ||
|
||
*errors* is an error handler name, such as ``"replace"``. If *errors* is | ||
``NULL``, use the strict error handler. | ||
*errors* is an :ref:`error handler <error-handlers>` name, such as | ||
``"replace"``. If *errors* is ``NULL``, use the strict error handler. | ||
|
||
If *consumed* is not ``NULL``, set *\*consumed* to the number of decoded | ||
bytes on success. | ||
|
@@ -1781,3 +1809,49 @@ object. | |
On error, set an exception, leave the writer unchanged, and return ``-1``. | ||
|
||
See also :c:func:`PyUnicodeWriter_WriteUTF8`. | ||
|
||
Deprecated API | ||
^^^^^^^^^^^^^^ | ||
|
||
The following API is deprecated. | ||
|
||
.. c:type:: Py_UNICODE | ||
|
||
This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type | ||
depending on the platform. | ||
Please use :c:type:`wchar_t` directly instead. | ||
|
||
.. versionchanged:: 3.3 | ||
In previous versions, this was a 16-bit type or a 32-bit type depending on | ||
whether you selected a "narrow" or "wide" Unicode version of Python at | ||
build time. | ||
|
||
.. deprecated-removed:: 3.13 3.15 | ||
|
||
|
||
.. c:function:: int PyUnicode_READY(PyObject *unicode) | ||
|
||
Do nothing and return ``0``. | ||
This API is kept only for backward compatibility, but there are no plans | ||
to remove it. | ||
|
||
.. versionadded:: 3.3 | ||
|
||
.. deprecated:: 3.10 | ||
This API does nothing since Python 3.12. | ||
Previously, this needed to be called for each string created using | ||
the old API (:c:func:`!PyUnicode_FromUnicode` or similar). | ||
|
||
|
||
.. c:function:: unsigned int PyUnicode_IS_READY(PyObject *unicode) | ||
|
||
Do nothing and return ``1``. | ||
This API is kept only for backward compatibility, but there are no plans | ||
to remove it. | ||
|
||
.. versionadded:: 3.3 | ||
|
||
.. deprecated:: next | ||
This API does nothing since Python 3.12. | ||
Previously, this could be called to check if | ||
:c:func:`PyUnicode_READY` is necessary. |
Review comment: "Return true" is common for such functions (see for example PyUnicode_Check()).