Commit 2fd8bdb

Fix issue #15899: Make the unicode.rst doctests pass. Patch by Chris Jerdonek.
1 parent c8754a1 commit 2fd8bdb

1 file changed

Lines changed: 13 additions & 14 deletions

Doc/howto/unicode.rst

@@ -257,13 +257,13 @@ converted according to the encoding's rules. Legal values for this argument are
257257
'REPLACEMENT CHARACTER'), or 'ignore' (just leave the character out of the
258258
Unicode result). The following examples show the differences::
259259

260-
>>> b'\x80abc'.decode("utf-8", "strict")
260+
>>> b'\x80abc'.decode("utf-8", "strict") #doctest: +NORMALIZE_WHITESPACE
261261
Traceback (most recent call last):
262-
File "<stdin>", line 1, in ?
263-
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0:
264-
unexpected code byte
262+
...
263+
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0:
264+
invalid start byte
265265
>>> b'\x80abc'.decode("utf-8", "replace")
266-
'?abc'
266+
'abc'
267267
>>> b'\x80abc'.decode("utf-8", "ignore")
268268
'abc'
269269

@@ -301,11 +301,11 @@ XML's character references. The following example shows the different results::
 >>> u = chr(40960) + 'abcd' + chr(1972)
 >>> u.encode('utf-8')
 b'\xea\x80\x80abcd\xde\xb4'
->>> u.encode('ascii')
+>>> u.encode('ascii') #doctest: +NORMALIZE_WHITESPACE
 Traceback (most recent call last):
-File "<stdin>", line 1, in ?
+...
 UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in
-position 0: ordinal not in range(128)
+position 0: ordinal not in range(128)
 >>> u.encode('ascii', 'ignore')
 b'abcd'
 >>> u.encode('ascii', 'replace')
@@ -331,12 +331,11 @@ point. The ``\U`` escape sequence is similar, but expects eight hex digits,
 not four::
 
 >>> s = "a\xac\u1234\u20ac\U00008000"
-          ^^^^ two-digit hex escape
-              ^^^^^ four-digit Unicode escape
-                          ^^^^^^^^^^ eight-digit Unicode escape
->>> for c in s: print(ord(c), end=" ")
-...
-97 172 4660 8364 32768
+... #     ^^^^ two-digit hex escape
+... #         ^^^^^^ four-digit Unicode escape
+... #                     ^^^^^^^^^^ eight-digit Unicode escape
+>>> [ord(c) for c in s]
+[97, 172, 4660, 8364, 32768]
 
 Using escape sequences for code points greater than 127 is fine in small doses,
 but becomes an annoyance if you're using many accented characters, as you would
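
Reviewer note: the effect of the patch can be checked locally with a short script. The sketch below is not part of the commit. The names OLD_SAMPLE, NEW_SAMPLE and run_sample are hypothetical, and the samples only approximate the patched examples (indentation is a guess, and the "replace" case is written as a comparison so the expected output does not depend on how the terminal shows U+FFFD). It assumes a current Python 3 interpreter, where the codec reports itself as 'utf-8' and the error text is "invalid start byte".

```python
import doctest

# Hypothetical sample in the pre-patch style: stale codec name and message.
OLD_SAMPLE = r'''
>>> b'\x80abc'.decode("utf-8", "strict")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0:
                    unexpected code byte
'''

# Hypothetical sample in the post-patch style: elided traceback body,
# NORMALIZE_WHITESPACE so the wrapped message matches, list-based ord() check.
NEW_SAMPLE = r'''
>>> b'\x80abc'.decode("utf-8", "strict")  #doctest: +NORMALIZE_WHITESPACE
Traceback (most recent call last):
    ...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0:
  invalid start byte
>>> b'\x80abc'.decode("utf-8", "replace") == '\ufffdabc'
True
>>> b'\x80abc'.decode("utf-8", "ignore")
'abc'
>>> s = "a\xac\u1234\u20ac\U00008000"
>>> [ord(c) for c in s]
[97, 172, 4660, 8364, 32768]
'''


def run_sample(text, name):
    """Parse `text` as a doctest and return (failed, attempted) counts."""
    test = doctest.DocTestParser().get_doctest(text, {}, name, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the counts matter here.
    return runner.run(test, out=lambda report: None)


old_results = run_sample(OLD_SAMPLE, "old_style")
new_results = run_sample(NEW_SAMPLE, "new_style")
print("old style:", old_results.failed, "failed of", old_results.attempted)
print("new style:", new_results.failed, "failed of", new_results.attempted)
```

The old-style sample fails because doctest compares the exception type and message, and the message text changed. The new-style sample relies on doctest ignoring the traceback body and on NORMALIZE_WHITESPACE treating the line break in the wrapped message as ordinary whitespace.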
