Commit f567727

Merge with 3.3
2 parents 39295e7 + c7b6c50 commit f567727

5 files changed

Lines changed: 41 additions & 15 deletions

Doc/library/codecs.rst

Lines changed: 5 additions & 1 deletion
@@ -78,7 +78,11 @@ It defines the following functions:
 reference (for encoding only)
 * ``'backslashreplace'``: replace with backslashed escape sequences (for
 encoding only)
-* ``'surrogateescape'``: replace with surrogate U+DCxx, see :pep:`383`
+* ``'surrogateescape'``: on decoding, replace with code points in the Unicode
+Private Use Area ranging from U+DC80 to U+DCFF. These private code
+points will then be turned back into the same bytes when the
+``surrogateescape`` error handler is used when encoding the data.
+(See :pep:`383` for more.)

 as well as any other error handling name defined via :func:`register_error`.
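
The round trip described in the new ``'surrogateescape'`` entry can be sketched as follows. This is a minimal illustration rather than part of the commit; the sample bytes are made up, and the escaped code points U+DC80 to U+DCFF appear in the decoded string as lone low surrogates, as described in PEP 383.

```python
# Bytes that are not valid UTF-8: 0xFF and 0x80 can never start a UTF-8 sequence.
raw = b"abc\xff\x80"

# Decoding maps each undecodable byte 0xNN to the code point U+DCNN.
text = raw.decode("utf-8", errors="surrogateescape")
print(ascii(text))  # 'abc\udcff\udc80'

# Encoding with the same handler turns the escapes back into the original bytes.
restored = text.encode("utf-8", errors="surrogateescape")

# Without the handler, the escaped code points cannot be encoded.
try:
    text.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

Because the escaped characters only encode under the same handler, text decoded this way should be written back with ``errors="surrogateescape"`` as well.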

Doc/library/functions.rst

Lines changed: 30 additions & 10 deletions
@@ -905,16 +905,36 @@ are always available. They are listed here in alphabetical order.
 the list of supported encodings.

 *errors* is an optional string that specifies how encoding and decoding
-errors are to be handled--this cannot be used in binary mode. Pass
-``'strict'`` to raise a :exc:`ValueError` exception if there is an encoding
-error (the default of ``None`` has the same effect), or pass ``'ignore'`` to
-ignore errors. (Note that ignoring encoding errors can lead to data loss.)
-``'replace'`` causes a replacement marker (such as ``'?'``) to be inserted
-where there is malformed data. When writing, ``'xmlcharrefreplace'``
-(replace with the appropriate XML character reference) or
-``'backslashreplace'`` (replace with backslashed escape sequences) can be
-used. Any other error handling name that has been registered with
-:func:`codecs.register_error` is also valid.
+errors are to be handled--this cannot be used in binary mode.
+A variety of standard error handlers are available, though any
+error handling name that has been registered with
+:func:`codecs.register_error` is also valid. The standard names
+are:
+
+* ``'strict'`` to raise a :exc:`ValueError` exception if there is
+an encoding error. The default value of ``None`` has the same
+effect.
+
+* ``'ignore'`` ignores errors. Note that ignoring encoding errors
+can lead to data loss.
+
+* ``'replace'`` causes a replacement marker (such as ``'?'``) to be inserted
+where there is malformed data.
+
+* ``'surrogateescape'`` will represent any incorrect bytes as code
+points in the Unicode Private Use Area ranging from U+DC80 to
+U+DCFF. These private code points will then be turned back into
+the same bytes when the ``surrogateescape`` error handler is used
+when writing data. This is useful for processing files in an
+unknown encoding.
+
+* ``'xmlcharrefreplace'`` is only supported when writing to a file.
+Characters not supported by the encoding are replaced with the
+appropriate XML character reference ``&#nnn;``.
+
+* ``'backslashreplace'`` (also only supported when writing)
+replaces unsupported characters with Python's backslashed escape
+sequences.

 .. index::
 single: universal newlines; open() built-in function
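
To make the handler list above concrete, here is a small sketch of how the *errors* argument of ``open()`` behaves when writing and when reading. It is an illustration only, not code from the commit: the scratch file ``demo.txt``, the sample strings, and the helper ``read_with`` are all assumptions made for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical scratch file

# Writing: characters that ASCII cannot represent are escaped instead of rejected.
with open(path, "w", encoding="ascii", errors="xmlcharrefreplace") as f:
    f.write("caf\u00e9")
with open(path, "rb") as f:
    xml_bytes = f.read()          # b'caf&#233;'

with open(path, "w", encoding="ascii", errors="backslashreplace") as f:
    f.write("caf\u00e9")
with open(path, "rb") as f:
    backslash_bytes = f.read()    # b'caf\\xe9'

# Reading: store Latin-1 bytes, then decode them as UTF-8 with different handlers.
with open(path, "wb") as f:
    f.write(b"caf\xe9!")

def read_with(errors):
    """Read the scratch file as UTF-8 using the given error handler name."""
    with open(path, encoding="utf-8", errors=errors) as f:
        return f.read()

ignored = read_with("ignore")            # 'caf!'
replaced = read_with("replace")          # 'caf\ufffd!'
escaped = read_with("surrogateescape")   # 'caf\udce9!'

# 'strict' (the default) raises UnicodeDecodeError, a subclass of ValueError.
try:
    read_with("strict")
    strict_raised = False
except ValueError:
    strict_raised = True
```

Only ``'surrogateescape'`` keeps enough information to reproduce the original byte; ``'ignore'`` and ``'replace'`` both discard it.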

Lib/codecs.py

Lines changed: 1 addition & 0 deletions
@@ -105,6 +105,7 @@ class Codec:
 Python will use the official U+FFFD REPLACEMENT
 CHARACTER for the builtin Unicode codecs on
 decoding and '?' on encoding.
+'surrogateescape' - replace with private codepoints U+DCnn.
 'xmlcharrefreplace' - Replace with the appropriate XML
 character reference (only for encoding).
 'backslashreplace' - Replace with backslashed escape sequences
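
The handler names listed in this docstring can be looked up at runtime with ``codecs.lookup_error``, and further names can be added with ``codecs.register_error``. The sketch below is an illustration only: the handler name ``demo-underscore`` and the function ``underscore_errors`` are hypothetical and do not come from the commit.

```python
import codecs

# Each standard name resolves to a callable; an unknown name raises LookupError.
standard = ["strict", "ignore", "replace", "surrogateescape",
            "xmlcharrefreplace", "backslashreplace"]
handlers = {name: codecs.lookup_error(name) for name in standard}

# Hypothetical custom handler: substitute "_" for each character or byte
# that the codec could not process, then resume after the failing span.
def underscore_errors(exc):
    if isinstance(exc, (UnicodeEncodeError, UnicodeDecodeError)):
        return ("_" * (exc.end - exc.start), exc.end)
    raise exc

codecs.register_error("demo-underscore", underscore_errors)

encoded = "na\u00efve".encode("ascii", errors="demo-underscore")   # b'na_ve'
decoded = b"na\xefve".decode("ascii", errors="demo-underscore")    # 'na_ve'
```

Once registered, the custom name is accepted anywhere an *errors* argument is, including ``open()``.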

Modules/_io/_iomodule.c

Lines changed: 2 additions & 2 deletions
@@ -168,8 +168,8 @@ PyDoc_STRVAR(open_doc,
 "'strict' to raise a ValueError exception if there is an encoding error\n"
 "(the default of None has the same effect), or pass 'ignore' to ignore\n"
 "errors. (Note that ignoring encoding errors can lead to data loss.)\n"
-"See the documentation for codecs.register for a list of the permitted\n"
-"encoding error strings.\n"
+"See the documentation for codecs.register or run 'help(codecs.Codec)'\n"
+"for a list of the permitted encoding error strings.\n"
 "\n"
 "newline controls how universal newlines works (it only applies to text\n"
 "mode). It can be None, '', '\\n', '\\r', and '\\r\\n'. It works as\n"

Modules/_io/textio.c

Lines changed: 3 additions & 2 deletions
@@ -642,8 +642,9 @@ PyDoc_STRVAR(textiowrapper_doc,
 "encoding gives the name of the encoding that the stream will be\n"
 "decoded or encoded with. It defaults to locale.getpreferredencoding(False).\n"
 "\n"
-"errors determines the strictness of encoding and decoding (see the\n"
-"codecs.register) and defaults to \"strict\".\n"
+"errors determines the strictness of encoding and decoding (see\n"
+"help(codecs.Codec) or the documentation for codecs.register) and\n"
+"defaults to \"strict\".\n"
 "\n"
 "newline controls how line endings are handled. It can be None, '',\n"
 "'\\n', '\\r', and '\\r\\n'. It works as follows:\n"
