Commit ab01087

Revise the Unicode section after getting comments from MAL, GvR, and others.
Add new low-level API for interpreter introspection. Bump version number.
1 parent 3550dd3 commit ab01087

1 file changed

Lines changed: 49 additions & 25 deletions

Doc/whatsnew/whatsnew22.tex

@@ -3,7 +3,7 @@
 % $Id$
 
 \title{What's New in Python 2.2}
-\release{0.03}
+\release{0.04}
 \author{A.M. Kuchling}
 \authoraddress{\email{[email protected]}}
 \begin{document}
@@ -339,32 +339,46 @@ \section{PEP 255: Simple Generators}
 \section{Unicode Changes}
 
 Python's Unicode support has been enhanced a bit in 2.2. Unicode
-strings are usually stored as UCS-2, as 16-bit unsigned integers.
+strings are usually stored as UTF-16, as 16-bit unsigned integers.
 Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned
 integers, as its internal encoding by supplying
 \longprogramopt{enable-unicode=ucs4} to the configure script. When
-built to use UCS-4, in theory Python could handle Unicode characters
-from U-00000000 to U-7FFFFFFF. Being able to use UCS-4 internally is
-a necessary step to do that, but it's not the only step, and in Python
-2.2alpha1 the work isn't complete yet. For example, the
-\function{unichr()} function still only accepts values from 0 to
-65535, and there's no \code{\e U} notation for embedding characters
-greater than 65535 in a Unicode string literal. All this is the
-province of the still-unimplemented PEP 261, ``Support for `wide'
-Unicode characters''; consult it for further details, and please offer
-comments and suggestions on the proposal it describes.
-
-Another change is much simpler to explain.
-Since their introduction, Unicode strings have supported an
-\method{encode()} method to convert the string to a selected encoding
-such as UTF-8 or Latin-1. A symmetric
-\method{decode(\optional{\var{encoding}})} method has been added to
-both 8-bit and Unicode strings in 2.2, which assumes that the string
-is in the specified encoding and decodes it. This means that
-\method{encode()} and \method{decode()} can be called on both types of
-strings, and can be used for tasks not directly related to Unicode.
-For example, codecs have been added for UUencoding, MIME's base-64
-encoding, and compression with the \module{zlib} module.
+built to use UCS-4 (a ``wide Python''), the interpreter can natively
+handle Unicode characters from U+000000 to U+110000. The range of
+legal values for the \function{unichr()} function has been expanded;
+it used to only accept values up to 65535, but in 2.2 will accept
+values from 0 to 0x110000. Using a ``narrow Python'', an interpreter
+compiled to use UTF-16, values greater than 65535 will result in
+\function{unichr()} returning a string of length 2:
+
+\begin{verbatim}
+>>> s = unichr(65536)
+>>> s
+u'\ud800\udc00'
+>>> len(s)
+2
+\end{verbatim}
+
+This possibly-confusing behaviour, breaking the intuitive invariant
+that \function{chr()} and\function{unichr()} always return strings of
+length 1, may be changed later in 2.2 depending on public reaction.
+
+All this is the province of the still-unimplemented PEP 261, ``Support
+for `wide' Unicode characters''; consult it for further details, and
+please offer comments and suggestions on the proposal it describes.
+
+Another change is much simpler to explain. Since their introduction,
+Unicode strings have supported an \method{encode()} method to convert
+the string to a selected encoding such as UTF-8 or Latin-1. A
+symmetric \method{decode(\optional{\var{encoding}})} method has been
+added to 8-bit strings (though not to Unicode strings) in 2.2.
+\method{decode()} assumes that the string is in the specified encoding
+and decodes it, returning whatever is returned by the codec.
+
+Using this new feature, codecs have been added for tasks not directly
+related to Unicode. For example, codecs have been added for
+uu-encoding, MIME's base64 encoding, and compression with the
+\module{zlib} module:
 
 \begin{verbatim}
 >>> s = """Here is a lengthy piece of redundant, overly verbose,
@@ -610,6 +624,15 @@ \section{Other Changes and Fixes}
 been changed to use the new C-level interface. (Contributed by Fred
 L. Drake, Jr.)
 
+\item Another low-level API, primarily of interest to implementors
+of Python debuggers and development tools, was added.
+\cfunction{PyInterpreterState_Head()} and
+\cfunction{PyInterpreterState_Next()} let a caller walk through all
+the existing interpreter objects;
+\cfunction{PyInterpreterState_ThreadHead()} and
+\cfunction{PyThreadState_Next()} allow looping over all the thread
+states for a given interpreter. (Contributed by David Beazley.)
+
 % XXX is this explanation correct?
 \item When presented with a Unicode filename on Windows, Python will
 now correctly convert it to a string using the MBCS encoding.
@@ -668,6 +691,7 @@ \section{Acknowledgements}
 
 The author would like to thank the following people for offering
 suggestions and corrections to various drafts of this article: Fred
-Bremmer, Fred L. Drake, Jr., Tim Peters, Neil Schemenauer.
+Bremmer, Fred L. Drake, Jr., Marc-Andr\'e Lemburg,
+Tim Peters, Neil Schemenauer, Guido van Rossum.
 
 \end{document}
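
Note on the unichr() example in the Unicode hunk: the length-2 string that a narrow build returns for unichr(65536) is a UTF-16 surrogate pair. The sketch below reproduces that arithmetic. It is written for a modern Python 3 interpreter, not the 2.2 interpreter the diff documents. The helper name to_surrogate_pair is hypothetical, and the upper bound 0x10FFFF is the Unicode standard's highest code point, assumed here rather than taken from the diff (which quotes 0x110000).

```python
def to_surrogate_pair(code_point):
    """Split a code point above 0xFFFF into a UTF-16 (high, low) surrogate pair.

    Hypothetical helper for illustration; not part of the commit.
    """
    # Assumption: valid supplementary code points run from 0x10000 to 0x10FFFF.
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the supplementary range")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(65536)
print(hex(high), hex(low))  # 0xd800 0xdc00
```

The two values match the u'\ud800\udc00' shown in the verbatim block of the diff.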

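
Note on the codec paragraph in the Unicode hunk: the idea of using codecs for tasks unrelated to Unicode can be sketched as follows. This is an illustration under stated assumptions, not the 2.2 API described in the diff: it targets a modern Python 3, where the byte-to-byte codecs are reached through codecs.encode() and codecs.decode() rather than through string methods, and the sample data is invented.

```python
import codecs

# Invented sample data; redundancy makes the compression visible.
data = b"Here is a lengthy piece of redundant, overly verbose, text. " * 4

# Compress and restore through the "zlib" codec.
compressed = codecs.encode(data, "zlib")
restored = codecs.decode(compressed, "zlib")
print(len(data), len(compressed), restored == data)

# Round-trip through the "base64" codec.
b64 = codecs.encode(b"hello", "base64")
print(codecs.decode(b64, "base64"))
```

The same pattern applies to any codec that maps bytes to bytes; only the codec name changes.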