Commit 53a9dd7

Author: Victor Stinner
Issue #10546: UTF-16-LE and UTF-16-BE *do* support non-BMP characters
Fix the doc and add tests.
1 parent 84cc062 commit 53a9dd7

2 files changed

Lines changed: 14 additions & 2 deletions

Doc/library/codecs.rst

Lines changed: 2 additions & 2 deletions
@@ -1114,9 +1114,9 @@ particular, the following variants typically exist:
 +-----------------+--------------------------------+--------------------------------+
 | utf_16          | U16, utf16                     | all languages                  |
 +-----------------+--------------------------------+--------------------------------+
-| utf_16_be       | UTF-16BE                       | all languages (BMP only)       |
+| utf_16_be       | UTF-16BE                       | all languages                  |
 +-----------------+--------------------------------+--------------------------------+
-| utf_16_le       | UTF-16LE                       | all languages (BMP only)       |
+| utf_16_le       | UTF-16LE                       | all languages                  |
 +-----------------+--------------------------------+--------------------------------+
 | utf_7           | U7, unicode-1-1-utf-7          | all languages                  |
 +-----------------+--------------------------------+--------------------------------+

Lib/test/test_codecs.py

Lines changed: 12 additions & 0 deletions
@@ -544,6 +544,12 @@ def test_errors(self):
         self.assertRaises(UnicodeDecodeError, codecs.utf_16_le_decode,
                           b"\xff", "strict", True)
 
+    def test_nonbmp(self):
+        self.assertEqual("\U00010203".encode(self.encoding),
+                         b'\x00\xd8\x03\xde')
+        self.assertEqual(b'\x00\xd8\x03\xde'.decode(self.encoding),
+                         "\U00010203")
+
 class UTF16BETest(ReadTest):
     encoding = "utf-16-be"

@@ -566,6 +572,12 @@ def test_errors(self):
         self.assertRaises(UnicodeDecodeError, codecs.utf_16_be_decode,
                           b"\xff", "strict", True)
 
+    def test_nonbmp(self):
+        self.assertEqual("\U00010203".encode(self.encoding),
+                         b'\xd8\x00\xde\x03')
+        self.assertEqual(b'\xd8\x00\xde\x03'.decode(self.encoding),
+                         "\U00010203")
+
 class UTF8Test(ReadTest):
     encoding = "utf-8"
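
The byte strings expected by the new test_nonbmp tests can be derived by hand, which may help when reviewing them. The sketch below is not part of the commit; `surrogate_pair` is a hypothetical helper written only for illustration. It splits U+10203 into its UTF-16 surrogate pair and packs the two 16-bit units in each byte order, which should match what the utf-16-le and utf-16-be codecs produce.

```python
import struct


def surrogate_pair(code_point):
    """Split a non-BMP code point into UTF-16 (high, low) surrogates.

    Hypothetical helper for illustration; not part of the codecs module.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


high, low = surrogate_pair(0x10203)

# The two codecs differ only in the byte order of each 16-bit unit.
le_bytes = struct.pack("<HH", high, low)
be_bytes = struct.pack(">HH", high, low)

# Compare against the codecs exercised by the tests in this commit.
assert le_bytes == "\U00010203".encode("utf-16-le")
assert be_bytes == "\U00010203".encode("utf-16-be")
```

For U+10203 the offset is 0x203, so the high surrogate is 0xD800 and the low surrogate is 0xDE03, giving the bytes 00 D8 03 DE in little-endian order and D8 00 DE 03 in big-endian order, as in the tests above.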
