Commit 20eae69

Document PEP 293.
1 parent bd5e38d commit 20eae69

1 file changed

Lines changed: 21 additions & 1 deletion

Doc/whatsnew/whatsnew23.tex

@@ -492,7 +492,27 @@ \section{PEP 285: The \class{bool} Type\label{section-bool}}
 %======================================================================
 \section{PEP 293: Codec Error Handling Callbacks}
 
-XXX write this section
+When encoding a Unicode string into a byte string, unencodable
+characters may be encountered. So far, Python has allowed specifying
+the error processing as either ``strict'' (raising \code{UnicodeError},
+the default), ``ignore'' (skipping the character), or ``replace''
+(substituting a question mark for the character). It may be desirable
+to specify alternative processing of the error, such as inserting an
+XML character reference or HTML entity reference into the converted string.
+
+Python now has a flexible framework for adding different processing
+strategies: new error handlers can be registered with
+\function{codecs.register_error}, and codecs can then retrieve them with
+\function{codecs.lookup_error}. An equivalent C API has been added for
+codecs written in C. The error handler receives an exception with details
+such as the string being converted, the position where the error was
+detected, and the target encoding. It can then either raise an exception
+or return a replacement string and the position at which to resume.
+
+Two additional error handlers have been implemented using this
+framework: ``backslashreplace'' uses Python backslash quoting to
+represent unencodable characters, and ``xmlcharrefreplace'' emits
+XML character references.
 
 \begin{seealso}
 
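A quick way to see the built-in error handlers side by side is to encode one string with each of them. This is a sketch in modern Python 3 syntax (in the Python 2.3 era the input would have been a unicode object such as u'caf\xe9'); the helper names encode_all_ways and strict_reason and the sample string are illustrative choices, not part of the codecs API.

```python
# Sketch (Python 3): encode one string with each built-in error handler.
def encode_all_ways(text, encoding="ascii"):
    """Return a dict mapping each non-strict handler name to the encoded bytes."""
    handlers = ("ignore", "replace", "backslashreplace", "xmlcharrefreplace")
    return {name: text.encode(encoding, name) for name in handlers}


def strict_reason(text, encoding="ascii"):
    """Return the reason from the 'strict' handler's exception, or None."""
    try:
        text.encode(encoding)  # errors="strict" is the default
    except UnicodeEncodeError as exc:
        return exc.reason
    return None


if __name__ == "__main__":
    sample = "caf\u00e9 \u20ac"  # neither U+00E9 nor U+20AC is ASCII
    for name, encoded in encode_all_ways(sample).items():
        print(name, encoded)
    print("strict", strict_reason(sample))
```

Run as a script, this should print b'caf ' for ignore, b'caf? ?' for replace, b'caf\\xe9 \\u20ac' for backslashreplace and b'caf&#233; &#8364;' for xmlcharrefreplace, followed by the strict handler's reason, ordinal not in range(128).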
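Registering and looking up a custom handler might look like the following sketch, which targets the HTML entity use case mentioned in the section. The handler name "htmlentityreplace" and the function html_entity_replace are hypothetical and not shipped with Python; the code assumes the Python 3 codecs module, where a handler receives a UnicodeEncodeError (whose object, start and end attributes locate the unencodable characters) and returns a (replacement, resume position) tuple.

```python
import codecs
from html.entities import codepoint2name


def html_entity_replace(exc):
    """Hypothetical error handler: replace unencodable characters with HTML
    entity references, falling back to numeric character references."""
    if not isinstance(exc, UnicodeEncodeError):
        # This sketch only knows how to repair encoding errors.
        raise exc
    pieces = []
    for ch in exc.object[exc.start:exc.end]:
        name = codepoint2name.get(ord(ch))  # e.g. 0xE9 -> "eacute"
        if name is not None:
            pieces.append("&%s;" % name)
        else:
            pieces.append("&#%d;" % ord(ch))
    # Replacement text, plus the index in exc.object where encoding resumes.
    return "".join(pieces), exc.end


# Make the handler available to every codec under a hypothetical name.
codecs.register_error("htmlentityreplace", html_entity_replace)

if __name__ == "__main__":
    # Codecs (and callers) retrieve the same function by name.
    assert codecs.lookup_error("htmlentityreplace") is html_entity_replace
    print("caf\u00e9 \u20ac \u4e2d".encode("ascii", "htmlentityreplace"))
```

With the handler registered, "caf\u00e9 \u20ac".encode("ascii", "htmlentityreplace") gives b'caf&eacute; &euro;', while a character without a named entity, such as U+4E2D, falls back to a numeric reference (b'&#20013;'). Because handlers are looked up by name, the same handler also works with other encodings: under latin-1 only the euro sign needs replacing.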
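From the codec's side, codecs.lookup_error is what turns the errors argument into a callable. The toy encoder below is a hypothetical, heavily simplified stand-in for a real codec (the built-in ASCII codec is written in C); it assumes the Python 3 handler protocol, in which the replacement may be str or bytes and a negative resume position counts from the end of the input, and it trusts the handler to make forward progress.

```python
import codecs


def toy_ascii_encode(text, errors="strict"):
    """Hypothetical pure-Python ASCII encoder that passes each run of
    unencodable characters to the error handler named by `errors`."""
    handler = codecs.lookup_error(errors)  # LookupError if the name is unknown
    out = bytearray()
    pos = 0
    while pos < len(text):
        code = ord(text[pos])
        if code < 128:
            out.append(code)  # ASCII characters map to a single byte
            pos += 1
            continue
        # Extend the error span over the whole run of non-ASCII characters.
        end = pos + 1
        while end < len(text) and ord(text[end]) >= 128:
            end += 1
        exc = UnicodeEncodeError("ascii", text, pos, end,
                                 "ordinal not in range(128)")
        replacement, pos = handler(exc)  # the "strict" handler raises exc here
        if pos < 0:
            pos += len(text)  # negative positions count from the end
        if isinstance(replacement, bytes):
            out += replacement
        else:
            # This sketch lets str.encode raise if the replacement is not ASCII.
            out += replacement.encode("ascii")
    return bytes(out)


if __name__ == "__main__":
    print(toy_ascii_encode("caf\u00e9 \u20ac", "xmlcharrefreplace"))
```

For the standard handlers its output should match the built-in codec, e.g. toy_ascii_encode(s, "xmlcharrefreplace") == s.encode("ascii", "xmlcharrefreplace"). Passing each run of unencodable characters to the handler in one call, rather than one character at a time, lets handlers such as "replace" or a custom one see the whole span.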