@@ -10,11 +10,12 @@ Unicode Objects and Codecs
1010Unicode Objects
1111^^^^^^^^^^^^^^^
1212
13+ Unicode Type
14+ """"""""""""
15+
1316These are the basic Unicode object types used for the Unicode implementation in
1417Python:
1518
16- .. % --- Unicode Type -------------------------------------------------------
17-
1819
1920.. ctype :: Py_UNICODE
2021
@@ -89,12 +90,13 @@ access internal read-only data of Unicode objects:
8990 Clear the free list. Return the total number of freed items.
9091
9192
93+ Unicode Character Properties
94+ """"""""""""""""""""""""""""
95+
9296Unicode provides many different character properties. The most often needed ones
9397are available through these macros which are mapped to C functions depending on
9498the Python configuration.
9599
96- .. % --- Unicode character properties ---------------------------------------
97-
98100
99101.. cfunction :: int Py_UNICODE_ISSPACE(Py_UNICODE ch)
100102
@@ -192,11 +194,13 @@ These APIs can be used for fast direct character conversions:
192194 Return the character *ch * converted to a double. Return ``-1.0 `` if this is not
193195 possible. This macro does not raise exceptions.
194196
197+
198+ Plain Py_UNICODE
199+ """"""""""""""""
200+
195201To create Unicode objects and access their basic sequence properties, use these
196202APIs:
197203
198- .. % --- Plain Py_UNICODE ---------------------------------------------------
199-
200204
201205.. cfunction :: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)
202206
@@ -346,9 +350,47 @@ Python can interface directly to this type using the following functions.
346350Support is optimized if Python's own :ctype: `Py_UNICODE ` type is identical to
347351the system's :ctype: `wchar_t `.
348352
349- .. % --- wchar_t support for platforms which support it ---------------------
353+
354+ File System Encoding
355+ """"""""""""""""""""
356+
357+ To encode and decode file names and other environment strings,
358+ :cdata:`Py_FileSystemDefaultEncoding` should be used as the encoding, and
359+ ``"surrogateescape"`` should be used as the error handler (:pep:`383`). To
360+ encode file names during argument parsing, the ``"O&"`` converter should be
361+ used, passing :func:`PyUnicode_FSConverter` as the conversion function:
362+
363+ .. cfunction:: int PyUnicode_FSConverter(PyObject* obj, void* result)
364+
365+ Convert *obj* into *result*, using :cdata:`Py_FileSystemDefaultEncoding`
366+ and the ``"surrogateescape"`` error handler. *result* must point to a
367+ ``PyObject*``, which receives a :func:`bytes` object that must be released
368+ when it is no longer used.
369+
370+ .. versionadded:: 3.1
371+
372+ .. cfunction:: PyObject* PyUnicode_DecodeFSDefaultAndSize(const char *s, Py_ssize_t size)
373+
374+ Decode a string of *size* bytes using :cdata:`Py_FileSystemDefaultEncoding`
375+ and the ``"surrogateescape"`` error handler.
376+
377+ If :cdata:`Py_FileSystemDefaultEncoding` is not set, fall back to UTF-8.
378+
379+ .. cfunction:: PyObject* PyUnicode_DecodeFSDefault(const char *s)
380+
381+ Decode a null-terminated string using :cdata:`Py_FileSystemDefaultEncoding`
382+ and the ``"surrogateescape"`` error handler.
383+
384+ If :cdata:`Py_FileSystemDefaultEncoding` is not set, fall back to UTF-8.
385+
386+ Use :func:`PyUnicode_DecodeFSDefaultAndSize` if you know the string length.
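
The ``"surrogateescape"`` handler named in this section can be illustrated without the Python headers. The following is a minimal sketch, assuming an ASCII file system encoding for simplicity (the real encoding depends on the platform). The two ``sketch_`` functions are hypothetical helpers, not part of the Python C API. They show the :pep:`383` rule that an undecodable byte ``0xNN`` becomes the lone surrogate ``U+DCNN`` on decoding and is turned back into the same byte on encoding, so arbitrary file names survive a round trip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Python API): decode `size` bytes as ASCII.
   Bytes >= 0x80 cannot be decoded, so each one is escaped as the lone
   surrogate U+DC00 + byte, following the PEP 383 rule.
   `out` must have room for `size` code points. Returns the count written. */
size_t
sketch_decode_surrogateescape(const unsigned char *s, size_t size,
                              uint32_t *out)
{
    assert(s != NULL && out != NULL);
    for (size_t i = 0; i < size; i++) {
        out[i] = (s[i] < 0x80) ? (uint32_t)s[i] : 0xDC00u + s[i];
    }
    return size;
}

/* Hypothetical helper (not a Python API): inverse of the function above.
   Escaped surrogates U+DC80..U+DCFF are turned back into the original
   bytes. Returns the number of bytes written, or (size_t)-1 if a code
   point is neither ASCII nor an escaped byte. */
size_t
sketch_encode_surrogateescape(const uint32_t *cp, size_t n,
                              unsigned char *out)
{
    assert(cp != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (cp[i] < 0x80) {
            out[i] = (unsigned char)cp[i];
        }
        else if (cp[i] >= 0xDC80u && cp[i] <= 0xDCFFu) {
            out[i] = (unsigned char)(cp[i] - 0xDC00u);
        }
        else {
            return (size_t)-1;
        }
    }
    return n;
}
```

Decoding a file name and encoding the result again yields the original bytes, which is why the handler is recommended here for file names and environment strings.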
350387
351388
389+ wchar_t Support
390+ """""""""""""""
391+
392+ :ctype:`wchar_t` support for platforms which provide it:
393+
352394.. cfunction :: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
353395
354396 Create a Unicode object from the :ctype: `wchar_t ` buffer *w * of the given size.
@@ -395,9 +437,11 @@ built-in codecs is "strict" (:exc:`ValueError` is raised).
395437The codecs all use a similar interface. Only deviation from the following
396438generic ones are documented for simplicity.
397439
398- These are the generic codec APIs:
399440
400- .. % --- Generic Codecs -----------------------------------------------------
441+ Generic Codecs
442+ """"""""""""""
443+
444+ These are the generic codec APIs:
401445
402446
403447.. cfunction :: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, const char *encoding, const char *errors)
@@ -426,9 +470,11 @@ These are the generic codec APIs:
426470 using the Python codec registry. Return *NULL * if an exception was raised by
427471 the codec.
428472
429- These are the UTF-8 codec APIs:
430473
431- .. % --- UTF-8 Codecs -------------------------------------------------------
474+ UTF-8 Codecs
475+ """"""""""""
476+
477+ These are the UTF-8 codec APIs:
432478
433479
434480.. cfunction :: PyObject* PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)
@@ -458,9 +504,11 @@ These are the UTF-8 codec APIs:
458504 object. Error handling is "strict". Return *NULL * if an exception was
459505 raised by the codec.
460506
461- These are the UTF-32 codec APIs:
462507
463- .. % --- UTF-32 Codecs ------------------------------------------------------ */
508+ UTF-32 Codecs
509+ """""""""""""
510+
511+ These are the UTF-32 codec APIs:
464512
465513
466514.. cfunction :: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
@@ -525,9 +573,10 @@ These are the UTF-32 codec APIs:
525573 Return *NULL * if an exception was raised by the codec.
526574
527575
528- These are the UTF-16 codec APIs:
576+ UTF-16 Codecs
577+ """""""""""""
529578
530- .. % --- UTF-16 Codecs ------------------------------------------------------ */
579+ These are the UTF-16 codec APIs:
531580
532581
533582.. cfunction :: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
@@ -591,9 +640,11 @@ These are the UTF-16 codec APIs:
591640 order. The string always starts with a BOM mark. Error handling is "strict".
592641 Return *NULL * if an exception was raised by the codec.
593642
594- These are the "Unicode Escape" codec APIs:
595643
596- .. % --- Unicode-Escape Codecs ----------------------------------------------
644+ Unicode-Escape Codecs
645+ """""""""""""""""""""
646+
647+ These are the "Unicode Escape" codec APIs:
597648
598649
599650.. cfunction :: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
@@ -615,9 +666,11 @@ These are the "Unicode Escape" codec APIs:
615666 string object. Error handling is "strict". Return *NULL * if an exception was
616667 raised by the codec.
617668
618- These are the "Raw Unicode Escape" codec APIs:
619669
620- .. % --- Raw-Unicode-Escape Codecs ------------------------------------------
670+ Raw-Unicode-Escape Codecs
671+ """""""""""""""""""""""""
672+
673+ These are the "Raw Unicode Escape" codec APIs:
621674
622675
623676.. cfunction :: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
@@ -639,11 +692,13 @@ These are the "Raw Unicode Escape" codec APIs:
639692 Python string object. Error handling is "strict". Return *NULL * if an exception
640693 was raised by the codec.
641694
695+
696+ Latin-1 Codecs
697+ """"""""""""""
698+
642699These are the Latin-1 codec APIs: Latin-1 corresponds to the first 256 Unicode
643700ordinals and only these are accepted by the codecs during encoding.
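
The range rule stated above (only the first 256 Unicode ordinals are accepted during encoding) can be sketched in plain C. This is an illustrative sketch only: ``sketch_encode_latin1`` is a hypothetical helper, not one of the codec APIs in this section, and it reports the position of the first unencodable code point rather than raising an exception the way the real codec does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Python API): encode `n` code points as
   Latin-1. Each code point <= 0xFF maps to the byte with the same value.
   Encoding stops at the first code point above 0xFF.
   Returns the number of bytes written; a result smaller than `n` is the
   index of the first code point that Latin-1 cannot represent. */
size_t
sketch_encode_latin1(const uint32_t *cp, size_t n, unsigned char *out)
{
    assert(cp != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (cp[i] > 0xFFu) {
            return i;   /* outside the first 256 ordinals */
        }
        out[i] = (unsigned char)cp[i];
    }
    return n;
}
```

For example, the sequence U+0041, U+00E9, U+20AC encodes its first two code points and stops at index 2, because U+20AC lies outside the Latin-1 range.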
644701
645- .. % --- Latin-1 Codecs -----------------------------------------------------
646-
647702
648703.. cfunction :: PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)
649704
@@ -664,11 +719,13 @@ ordinals and only these are accepted by the codecs during encoding.
664719 object. Error handling is "strict". Return *NULL * if an exception was
665720 raised by the codec.
666721
722+
723+ ASCII Codecs
724+ """"""""""""
725+
667726These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other
668727codes generate errors.
669728
670- .. % --- ASCII Codecs -------------------------------------------------------
671-
672729
673730.. cfunction :: PyObject* PyUnicode_DecodeASCII(const char *s, Py_ssize_t size, const char *errors)
674731
@@ -689,9 +746,11 @@ codes generate errors.
689746 object. Error handling is "strict". Return *NULL * if an exception was
690747 raised by the codec.
691748
692- These are the mapping codec APIs:
693749
694- .. % --- Character Map Codecs -----------------------------------------------
750+ Character Map Codecs
751+ """"""""""""""""""""
752+
753+ These are the mapping codec APIs:
695754
696755This codec is special in that it can be used to implement many different codecs
697756(and this is in fact what was done to obtain most of the standard codecs
@@ -760,7 +819,9 @@ use the Win32 MBCS converters to implement the conversions. Note that MBCS (or
760819DBCS) is a class of encodings, not just one. The target encoding is defined by
761820the user settings on the machine running the codec.
762821
763- .. % --- MBCS codecs for Windows --------------------------------------------
822+
823+ MBCS codecs for Windows
824+ """""""""""""""""""""""
764825
765826
766827.. cfunction :: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)
@@ -790,20 +851,9 @@ the user settings on the machine running the codec.
790851 object. Error handling is "strict". Return *NULL * if an exception was
791852 raised by the codec.
792853
793- For decoding file names and other environment strings, :cdata: `Py_FileSystemEncoding `
794- should be used as the encoding, and ``"surrogateescape" `` should be used as the error
795- handler. For encoding file names during argument parsing, the ``O& `` converter should
796- be used, passsing PyUnicode_FSConverter as the conversion function:
797-
798- .. cfunction :: int PyUnicode_FSConverter(PyObject* obj, void* result)
799-
800- Convert *obj * into *result *, using the file system encoding, and the ``surrogateescape ``
801- error handler. *result * must be a ``PyObject* ``, yielding a bytes or bytearray object
802- which must be released if it is no longer used.
803-
804- .. versionadded :: 3.1
805854
806- .. % --- Methods & Slots ----------------------------------------------------
855+ Methods & Slots
856+ """""""""""""""
807857
808858
809859.. _unicodemethodsandslots :