Commit f5fec3c

Fill out the Unicode section, somewhat uncertainly
1 parent 8cfa905 commit f5fec3c

1 file changed

Lines changed: 24 additions & 7 deletions

Doc/whatsnew/whatsnew22.tex

@@ -340,11 +340,21 @@ \section{Unicode Changes}
 
 Python's Unicode support has been enhanced a bit in 2.2. Unicode
 strings are usually stored as UCS-2, as 16-bit unsigned integers.
-Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned integers
-by supplying \longprogramopt{enable-unicode=ucs4} to the configure script.
-
-XXX explain surrogates? I have to figure out what the changes mean to users.
-
+Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned
+integers, as its internal encoding by supplying
+\longprogramopt{enable-unicode=ucs4} to the configure script. When
+built to use UCS-4, in theory Python could handle Unicode characters
+from U-00000000 to U-7FFFFFFF. Being able to use UCS-4 internally is
+a necessary step to do that, but it's not the only step, and in Python
+2.2alpha1 the work isn't complete yet. For example, the
+\function{unichr()} function still only accepts values from 0 to
+65535, and there's no \code{\e U} notation for embedding characters
+greater than 65535 in a Unicode string literal. All this is the
+province of the still-unimplemented PEP 261, ``Support for `wide'
+Unicode characters''; consult it for further details, and please offer
+comments and suggestions on the proposal it describes.
+
+Another change is much simpler to explain.
 Since their introduction, Unicode strings have supported an
 \method{encode()} method to convert the string to a selected encoding
 such as UTF-8 or Latin-1. A symmetric
@@ -375,9 +385,16 @@ \section{Unicode Changes}
 'furrfu'
 \end{verbatim}
 
-References: http://mail.python.org/pipermail/i18n-sig/2001-June/001107.html
-and following thread.
+\method{encode()} and \method{decode()} were implemented by
+Marc-Andr\'e Lemburg. The changes to support using UCS-4 internally
+were implemented by Fredrik Lundh and Martin von L\"owis.
+
+\begin{seealso}
+
+\seepep{261}{Support for `wide' Unicode characters}{PEP written by
+Paul Prescod. Not yet accepted or fully implemented.}
 
+\end{seealso}
 
 %======================================================================
 \section{PEP 227: Nested Scopes}
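
Note on the surrogates question raised by the removed XXX line: the following is only an illustrative sketch, not part of this commit. It assumes a present-day Python 3 interpreter rather than 2.2alpha1, and the helper names `to_surrogate_pair` and `utf16_units` are hypothetical. It shows why a character above 65535 does not fit in one 16-bit unit: UTF-16 splits it into a high and a low surrogate, and the built-in `utf-16-be` codec reached through `encode()` produces the same two units.

```python
def to_surrogate_pair(code_point):
    """Split a code point above 0xFFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point must be in range 0x10000..0x10FFFF")
    offset = code_point - 0x10000      # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits go into the low surrogate
    return high, low


def utf16_units(text):
    """Return the 16-bit code units produced by the built-in big-endian UTF-16 codec."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    high, low = to_surrogate_pair(0x10400)
    print(hex(high), hex(low))
    # Cross-check the arithmetic against the codec's own output.
    print([hex(unit) for unit in utf16_units(chr(0x10400))])
```

A 16-bit internal representation would need two units for such a character, while a 32-bit one could hold it in a single unit, which is the motivation for the UCS-4 build option the diff describes.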
