@@ -257,13 +257,13 @@ converted according to the encoding's rules. Legal values for this argument are
 'REPLACEMENT CHARACTER'), or 'ignore' (just leave the character out of the
 Unicode result). The following examples show the differences::
 
-    >>> b'\x80abc'.decode("utf-8", "strict")
+    >>> b'\x80abc'.decode("utf-8", "strict") #doctest: +NORMALIZE_WHITESPACE
     Traceback (most recent call last):
-      File "<stdin>", line 1, in ?
-    UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0:
-                        unexpected code byte
+        ...
+    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0:
+      invalid start byte
     >>> b'\x80abc'.decode("utf-8", "replace")
-    '?abc'
+    '�abc'
     >>> b'\x80abc'.decode("utf-8", "ignore")
     'abc'
 
@@ -301,11 +301,11 @@ XML's character references. The following example shows the different results::
     >>> u = chr(40960) + 'abcd' + chr(1972)
     >>> u.encode('utf-8')
     b'\xea\x80\x80abcd\xde\xb4'
-    >>> u.encode('ascii')
+    >>> u.encode('ascii') #doctest: +NORMALIZE_WHITESPACE
     Traceback (most recent call last):
-      File "<stdin>", line 1, in ?
+        ...
     UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in
-                        position 0: ordinal not in range(128)
+      position 0: ordinal not in range(128)
     >>> u.encode('ascii', 'ignore')
     b'abcd'
     >>> u.encode('ascii', 'replace')
@@ -331,12 +331,11 @@ point. The ``\U`` escape sequence is similar, but expects eight hex digits,
 not four::
 
     >>> s = "a\xac\u1234\u20ac\U00008000"
-              ^^^^ two-digit hex escape
-                  ^^^^^ four-digit Unicode escape
-                              ^^^^^^^^^^ eight-digit Unicode escape
-    >>> for c in s: print(ord(c), end=" ")
-    ...
-    97 172 4660 8364 32768
+    ... #     ^^^^ two-digit hex escape
+    ... #         ^^^^^^ four-digit Unicode escape
+    ... #                     ^^^^^^^^^^ eight-digit Unicode escape
+    >>> [ord(c) for c in s]
+    [97, 172, 4660, 8364, 32768]
 
 Using escape sequences for code points greater than 127 is fine in small doses,
 but becomes an annoyance if you're using many accented characters, as you would
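
Reviewer note (not part of the patch): the expected outputs in the updated doctests can be checked with a short standalone script. This is a minimal sketch, assuming a recent CPython 3 interpreter in which the strict UTF-8 decoder reports "invalid start byte". The helper name `decode_strict_message` and the variable names are illustrative only and do not appear in the patch.

```python
# Standalone check of the outputs shown in the updated examples.
# Assumption: recent CPython 3; names below are illustrative, not from the patch.

def decode_strict_message(data: bytes) -> str:
    """Return the error message raised by a strict UTF-8 decode, or '' if none."""
    try:
        data.decode("utf-8", "strict")
    except UnicodeDecodeError as exc:
        return str(exc)
    return ""

raw = b"\x80abc"
strict_message = decode_strict_message(raw)
replaced = raw.decode("utf-8", "replace")  # U+FFFD stands in for the bad byte
ignored = raw.decode("utf-8", "ignore")    # the bad byte is dropped

u = chr(40960) + "abcd" + chr(1972)
utf8_bytes = u.encode("utf-8")
ascii_ignored = u.encode("ascii", "ignore")  # non-ASCII characters are dropped

s = "a\xac\u1234\u20ac\U00008000"
code_points = [ord(c) for c in s]

print(strict_message)
print(ascii(replaced), ignored)
print(utf8_bytes, ascii_ignored)
print(code_points)
```

Printing `ascii(replaced)` shows the replacement character as an escape, which avoids depending on the terminal's encoding when comparing against the doctest output.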