@@ -340,11 +340,21 @@ \section{Unicode Changes}
 
 Python's Unicode support has been enhanced a bit in 2.2. Unicode
 strings are usually stored as UCS-2, as 16-bit unsigned integers.
-Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned integers
-by supplying \longprogramopt{enable-unicode=ucs4} to the configure script.
-
-XXX explain surrogates? I have to figure out what the changes mean to users.
-
+Python 2.2 can also be compiled to use UCS-4 (32-bit unsigned
+integers) as its internal encoding by supplying
+\longprogramopt{enable-unicode=ucs4} to the configure script. When
+built to use UCS-4, Python could in theory handle Unicode characters
+from U-00000000 to U-7FFFFFFF. Using UCS-4 internally is a necessary
+step toward that, but it's not the only step, and in Python
+2.2alpha1 the work isn't complete yet. For example, the
+\function{unichr()} function still only accepts values from 0 to
+65535, and there's no \code{\e U} notation for embedding characters
+greater than 65535 in a Unicode string literal. All this is the
+province of the still-unimplemented PEP 261, ``Support for `wide'
+Unicode characters''; consult it for further details, and please offer
+comments and suggestions on the proposal it describes.
+
+Another change is much simpler to explain.
 Since their introduction, Unicode strings have supported an
 \method{encode()} method to convert the string to a selected encoding
 such as UTF-8 or Latin-1. A symmetric
@@ -375,9 +385,16 @@ \section{Unicode Changes}
 'furrfu'
 \end{verbatim}
 
-References: http://mail.python.org/pipermail/i18n-sig/2001-June/001107.html
-and following thread.
+\method{encode()} and \method{decode()} were implemented by
+Marc-Andr\'e Lemburg. The changes to support using UCS-4 internally
+were implemented by Fredrik Lundh and Martin von L\"owis.
+
+\begin{seealso}
+
+\seepep{261}{Support for `wide' Unicode characters}{PEP written by
+Paul Prescod. Not yet accepted or fully implemented.}
 
+\end{seealso}
 
 % ======================================================================
 \section{PEP 227: Nested Scopes}
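The new text says a UCS-2 build stores each character in one 16-bit unit, so `unichr()` stops at 65535. A minimal sketch of the UTF-16 surrogate-pair arithmetic may help show why. Surrogate pairs are the standard way to spell a code point above U+FFFF with two 16-bit units. This is an illustrative aside under stated assumptions. The patch does not say that Python 2.2alpha1 uses surrogates. The function names `to_surrogate_pair` and `from_surrogate_pair` are hypothetical. The code targets modern Python 3, not 2.2. Surrogates only reach U+10FFFF, not the full U-7FFFFFFF range that UCS-4 could hold.

```python
# Hypothetical helpers (not part of Python 2.2): UTF-16 surrogate-pair math,
# written for modern Python 3.

def to_surrogate_pair(code_point):
    """Split a code point in U+10000..U+10FFFF into (high, low) 16-bit units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point must be in U+10000..U+10FFFF")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a high/low surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    # U+1D11E MUSICAL SYMBOL G CLEF does not fit in one 16-bit unit.
    high, low = to_surrogate_pair(0x1D11E)
    print(hex(high), hex(low))  # 0xd834 0xdd1e
    # Cross-check against Python 3's own big-endian UTF-16 codec.
    expected = bytes([high >> 8, high & 0xFF, low >> 8, low & 0xFF])
    assert "\U0001D11E".encode("utf-16-be") == expected
    assert from_surrogate_pair(high, low) == 0x1D11E
```

A UCS-4 build sidesteps this split entirely by giving every character its own 32-bit unit.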