Commit b965b39

Elaborate on representations and canonical/legacy unicode objects
1 parent e6b99a1 commit b965b39

1 file changed

Lines changed: 15 additions & 1 deletion

Doc/c-api/unicode.rst

@@ -18,7 +18,21 @@ for strings where all code points are below 128, 256, or 65536; otherwise, code
 points must be below 1114112 (which is the full Unicode range).
 
 :c:type:`Py_UNICODE*` and UTF-8 representations are created on demand and cached
-in the Unicode object.
+in the Unicode object. The :c:type:`Py_UNICODE*` representation is deprecated
+and inefficient; it should be avoided in performance- or memory-sensitive
+situations.
+
+Due to the transition between the old APIs and the new APIs, unicode objects
+can internally be in two states depending on how they were created:
+
+* "canonical" unicode objects are all objects created by a non-deprecated
+  unicode API. They use the most efficient representation allowed by the
+  implementation.
+
+* "legacy" unicode objects have been created through one of the deprecated
+  APIs (typically :c:func:`PyUnicode_FromUnicode`) and only bear the
+  :c:type:`Py_UNICODE*` representation; you will have to call
+  :c:func:`PyUnicode_READY` on them before calling any other API.
 
 
 Unicode Type
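
The "most efficient representation" that the added text attributes to canonical objects can be observed from Python. The following is a minimal sketch and not part of this commit. The helper name `bytes_per_code_point` is hypothetical, and the measurement assumes CPython 3.3 or later, where `sys.getsizeof` reflects the internal string layout (an implementation detail that other interpreters need not share).

```python
import sys


def bytes_per_code_point(ch, n=1000):
    """Estimate the storage used per code point for a string made only of `ch`.

    Assumption: CPython 3.3+. Two strings of the same kind are compared so
    that the fixed object header cancels out of the difference.
    """
    shorter = ch * n
    longer = ch * (2 * n)
    return (sys.getsizeof(longer) - sys.getsizeof(shorter)) // n


# One sample character from each range named in the documentation text.
samples = [
    ("code points below 128", "a"),
    ("code points below 256", "\xe9"),
    ("code points below 65536", "\u20ac"),
    ("code points below 1114112", "\U0001F600"),
]

for label, ch in samples:
    print(label, "->", bytes_per_code_point(ch), "byte(s) per code point")
```

On CPython this should report 1, 1, 2 and 4 bytes per code point for the four ranges. Legacy objects are only produced by the deprecated C APIs such as `PyUnicode_FromUnicode`, so this sketch illustrates the canonical case only.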
