@@ -52,25 +52,27 @@ This article explains the new features in Python 3.3, compared to 3.2.
5252PEP 393: Flexible String Representation
5353=======================================
5454
55- [Abstract copied from the PEP: The Unicode string type is changed to support
56- multiple internal representations, depending on the character with the largest
57- Unicode ordinal (1, 2, or 4 bytes). This allows a space-efficient
58- representation in common cases, but gives access to full UCS-4 on all systems.
59- For compatibility with existing APIs, several representations may exist in
60- parallel; over time, this compatibility should be phased out.]
55+ The Unicode string type is changed to support multiple internal
56+ representations, depending on the character with the largest Unicode ordinal
57+ (1, 2, or 4 bytes) in the represented string. This allows a space-efficient
58+ representation in common cases, but gives access to full UCS-4 on all
59+ systems. For compatibility with existing APIs, several representations may
60+ exist in parallel; over time, this compatibility should be phased out.
6161
62- PEP 393 is fully backward compatible. The legacy API should remain
63- available at least five years. Applications using the legacy API will not
64- fully benefit of the memory reduction, or worse may use a little bit more
65- memory, because Python may have to maintain two versions of each string (in
66- the legacy format and in the new efficient storage).
62+ On the Python side, there should be no downside to this change.
6763
68- XXX Add list of changes introduced by :pep:`393` here:
64+ On the C API side, PEP 393 is fully backward compatible. The legacy API
65+ should remain available for at least five years. Applications using the legacy
66+ API will not fully benefit from the memory reduction or, worse, may use
67+ a bit more memory, because Python may have to maintain two versions of each
68+ string (in the legacy format and in the new efficient storage).
69+
70+ Changes introduced by :pep:`393` are the following:
6971
7072* Python now always supports the full range of Unicode codepoints, including
7173 non-BMP ones (i.e. from ``U+0000`` to ``U+10FFFF``). The distinction between
7274 narrow and wide builds no longer exists and Python now behaves like a wide
73- build.
75+ build, even under Windows.
7476
7577* The storage of Unicode strings now depends on the highest codepoint in the string:
7678
@@ -86,18 +88,20 @@ XXX Add list of changes introduced by :pep:`393` here:
8688 XXX The result should be moved in the PEP and a small summary about
8789 performances and a link to the PEP should be added here.
8890
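As a rough illustration of the storage rule described above (not part of the original text), the sketch below estimates how many bytes each additional character costs for strings whose highest code point falls in different ranges. It assumes CPython 3.3 or later; ``bytes_per_char`` is a hypothetical helper name, and the values reported by ``sys.getsizeof`` are an implementation detail that other interpreters need not share.

```python
import sys


def bytes_per_char(ch):
    """Estimate the per-character storage cost of a string made only of `ch`.

    Compares two strings that differ by exactly one character, so the fixed
    object header cancels out. Hypothetical helper, for illustration only.
    """
    return sys.getsizeof(ch * 1001) - sys.getsizeof(ch * 1000)


samples = [
    ("ASCII", "a"),
    ("Latin-1", "\xe9"),
    ("BMP", "\u0100"),
    ("non-BMP", "\U00010000"),
]

for label, ch in samples:
    print(label, bytes_per_char(ch))
```

On a CPython build that implements PEP 393, the reported costs should correspond to the 1, 2, and 4 byte representations mentioned in the abstract.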
89- * Some of the problems visible on narrow builds have been fixed, for example:
91+ * With the death of narrow builds, the problems specific to narrow builds have
92+ also been fixed, for example:
9093
9194 * :func:`len` now always returns 1 for non-BMP characters,
9295 so ``len('\U0010FFFF') == 1``;
9396
9497 * surrogate pairs are not recombined in string literals,
9598 so ``'\uDBFF\uDFFF' != '\U0010FFFF'``;
9699
97- * indexing or slicing a non-BMP characters doesn't return surrogates anymore,
100+ * indexing or slicing non-BMP characters returns the expected value,
98101 so ``'\U0010FFFF'[0]`` now returns ``'\U0010FFFF'`` and not ``'\uDBFF'``;
99102
100- * several other functions in the stdlib now handle correctly non-BMP codepoints.
103+ * several other functions in the standard library now correctly handle
104+ non-BMP codepoints.
101105
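The narrow-build fixes listed above can be checked interactively. The following is a minimal sketch, assuming Python 3.3 or later; ``code_points`` is a hypothetical helper introduced here for display purposes only.

```python
def code_points(s):
    """Return the code points of `s` as 'U+XXXX' labels (illustrative helper)."""
    return ["U+%04X" % ord(ch) for ch in s]


astral = "\U0010FFFF"         # a single non-BMP character
surrogates = "\uDBFF\uDFFF"   # two lone surrogates, not recombined

# One character, one code point.
print(len(astral), code_points(astral))

# The surrogate pair in a literal stays as two separate code points.
print(len(surrogates), code_points(surrogates))

# Indexing returns the whole character rather than a leading surrogate.
print(astral[0] == astral)

# The surrogate pair does not compare equal to the non-BMP character.
print(surrogates != astral)
```

On a narrow build of Python 3.2 or earlier, the first and third checks would have behaved differently, since the non-BMP character was stored as two surrogate code units.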
102106* The value of :data:`sys.maxunicode` is now always ``1114111`` (``0x10FFFF``
103107 in hexadecimal). The :c:func:`PyUnicode_GetMax` function still returns