Commit 0d81c13

Author: Victor Stinner

Issue #13617: Document that the result of the conversion of a Unicode object to
wchar*, Py_UNICODE* and bytes may contain embedded null characters/bytes. Patch written by Arnaud Calmettes.
2 parents 0f694d7 + 6fbd525 commit 0d81c13

2 files changed

Lines changed: 25 additions & 13 deletions


Doc/ACKS.txt

Lines changed: 1 addition & 0 deletions
@@ -33,6 +33,7 @@ [email protected]), and we'll be glad to correct the problem.
 * Keith Briggs
 * Ian Bruntlett
 * Lee Busby
+* Arnaud Calmettes
 * Lorenzo M. Catucci
 * Carl Cerecke
 * Mauro Cicognini

Doc/c-api/unicode.rst

Lines changed: 24 additions & 13 deletions
@@ -649,9 +649,11 @@ Extension modules can continue using them, as they will not be removed in Python
 .. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)
 
 Return a read-only pointer to the Unicode object's internal
-:c:type:`Py_UNICODE` buffer, *NULL* if *unicode* is not a Unicode object.
-This will create the :c:type:`Py_UNICODE` representation of the object if it
-is not yet available.
+:c:type:`Py_UNICODE` buffer, or *NULL* on error. This will create the
+:c:type:`Py_UNICODE*` representation of the object if it is not yet
+available. Note that the resulting :c:type:`Py_UNICODE` string may contain
+embedded null characters, which would cause the string to be truncated when
+used in most C functions.
 
 Please migrate to using :c:func:`PyUnicode_AsUCS4`,
 :c:func:`PyUnicode_Substring`, :c:func:`PyUnicode_ReadChar` or similar new
@@ -668,7 +670,9 @@ Extension modules can continue using them, as they will not be removed in Python
 .. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size)
 
 Like :c:func:`PyUnicode_AsUnicode`, but also saves the :c:func:`Py_UNICODE`
-array length in *size*.
+array length in *size*. Note that the resulting :c:type:`Py_UNICODE*` string
+may contain embedded null characters, which would cause the string to be
+truncated when used in most C functions.
 
 .. versionadded:: 3.3
 
@@ -677,8 +681,10 @@ Extension modules can continue using them, as they will not be removed in Python
 
 Create a copy of a Unicode string ending with a nul character. Return *NULL*
 and raise a :exc:`MemoryError` exception on memory allocation failure,
-otherwise return a new allocated buffer (use :c:func:`PyMem_Free` to free the
-buffer).
+otherwise return a new allocated buffer (use :c:func:`PyMem_Free` to free
+the buffer). Note that the resulting :c:type:`Py_UNICODE*` string may
+contain embedded null characters, which would cause the string to be
+truncated when used in most C functions.
 
 .. versionadded:: 3.2
 
@@ -817,7 +823,8 @@ used, passing :c:func:`PyUnicode_FSDecoder` as the conversion function:
 
 Encode a Unicode object to :c:data:`Py_FileSystemDefaultEncoding` with the
 ``'surrogateescape'`` error handler, or ``'strict'`` on Windows, and return
-:class:`bytes`.
+:class:`bytes`. Note that the resulting :class:`bytes` object may contain
+null bytes.
 
 If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
 locale encoding.
@@ -850,10 +857,12 @@ wchar_t Support
 Copy the Unicode object contents into the :c:type:`wchar_t` buffer *w*. At most
 *size* :c:type:`wchar_t` characters are copied (excluding a possibly trailing
 0-termination character). Return the number of :c:type:`wchar_t` characters
-copied or -1 in case of an error. Note that the resulting :c:type:`wchar_t`
+copied or -1 in case of an error. Note that the resulting :c:type:`wchar_t*`
 string may or may not be 0-terminated. It is the responsibility of the caller
-to make sure that the :c:type:`wchar_t` string is 0-terminated in case this is
-required by the application.
+to make sure that the :c:type:`wchar_t*` string is 0-terminated in case this is
+required by the application. Also, note that the :c:type:`wchar_t*` string
+might contain null characters, which would cause the string to be truncated
+when used with most C functions.
 
 
 .. c:function:: wchar_t* PyUnicode_AsWideCharString(PyObject *unicode, Py_ssize_t *size)
@@ -863,9 +872,11 @@ wchar_t Support
 of wide characters (excluding the trailing 0-termination character) into
 *\*size*.
 
-Returns a buffer allocated by :c:func:`PyMem_Alloc` (use :c:func:`PyMem_Free`
-to free it) on success. On error, returns *NULL*, *\*size* is undefined and
-raises a :exc:`MemoryError`.
+Returns a buffer allocated by :c:func:`PyMem_Alloc` (use
+:c:func:`PyMem_Free` to free it) on success. On error, returns *NULL*,
+*\*size* is undefined and raises a :exc:`MemoryError`. Note that the
+resulting :c:type:`wchar_t` string might contain null characters, which
+would cause the string to be truncated when used with most C functions.
 
 .. versionadded:: 3.2
 
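
The truncation that this patch documents can be illustrated without the Python headers. The following is a minimal sketch in plain C, not code from the commit. It assumes a buffer shaped like the one `PyUnicode_AsWideCharString` returns: `sample` stands in for the converted string, `sample_size` plays the role of the value stored in `*size`, and `has_embedded_null` is a hypothetical helper name, not part of the CPython API.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not a CPython function): return 1 if the first
 * `size` characters of `buf` contain a null character, 0 otherwise. */
static int
has_embedded_null(const wchar_t *buf, size_t size)
{
    assert(buf != NULL);
    /* wmemchr scans exactly `size` characters; unlike wcslen, it does
     * not stop at the first null character it meets. */
    return wmemchr(buf, L'\0', size) != NULL;
}

/* Stand-in for a converted string: "ab", a null character, then "cd".
 * The array holds 6 elements: 5 content characters plus the terminator. */
static const wchar_t sample[] = L"ab\0cd";

/* Stand-in for the length reported through *size (terminator excluded). */
static const size_t sample_size = sizeof(sample) / sizeof(sample[0]) - 1;
```

With this data, `wcslen(sample)` stops at the embedded null and sees only the leading `ab`, while `sample_size` still covers all five stored characters. A caller holding a real buffer and its reported size could apply the same bounded check before handing the string to functions that expect null-terminated input.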
