
Commit 67f8f2f

append(): Fixing the test for convertibility after consultation with Ben.
If s is a byte string, make sure it can be converted to unicode with the input codec, and from unicode back to a byte string with the output codec; otherwise raise a UnicodeError exception early. Skip this test (and the unicode->byte string conversion) when the charset is our faux 8bit raw charset.
1 parent 816aebd commit 67f8f2f
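
The early check described in the commit message can be sketched as follows. This is a minimal Python 3 adaptation, not the code from the commit: the function name `check_convertible` and its parameters are hypothetical, and `bytes.decode` / `str.encode` stand in for the Python 2 `unicode(s, codec)` and `ustr.encode(codec)` calls.

```python
def check_convertible(data, input_codec="us-ascii", output_codec="us-ascii"):
    """Return data unchanged if it survives the round trip, else raise.

    Hypothetical helper: decodes the byte string with the input codec,
    then re-encodes the result with the output codec. Either step may
    raise a UnicodeError subclass. The re-encoded bytes are discarded,
    mirroring the commit's "still, use the original byte string".
    """
    text = data.decode(input_codec)   # UnicodeDecodeError on bad input
    text.encode(output_codec)         # UnicodeEncodeError on bad output
    return data


if __name__ == "__main__":
    print(check_convertible(b"hello"))
    try:
        # Valid Latin-1 input that cannot be written back as US-ASCII.
        check_convertible(b"caf\xe9", "iso-8859-1", "us-ascii")
    except UnicodeError as exc:
        print("rejected early:", type(exc).__name__)
```

Failing at append time rather than at encode time means the caller sees the error next to the offending string. The sketch does not model the skip for the faux 8bit charset; a caller would simply not invoke the helper in that case.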

1 file changed

Lines changed: 28 additions & 14 deletions

Lib/email/Header.py

@@ -218,20 +218,34 @@ def append(self, s, charset=None):
             charset = self._charset
         elif not isinstance(charset, Charset):
             charset = Charset(charset)
-        # Normalize and check the string
-        if isinstance(s, StringType):
-            # Possibly raise UnicodeError if it can't be encoded
-            unicode(s, charset.get_output_charset())
-        elif isinstance(s, UnicodeType):
-            # Convert Unicode to byte string for later concatenation
-            for charset in USASCII, charset, UTF8:
-                try:
-                    s = s.encode(charset.get_output_charset())
-                    break
-                except UnicodeError:
-                    pass
-            else:
-                assert False, 'Could not encode to utf-8'
+        # If the charset is our faux 8bit charset, leave the string unchanged
+        if charset <> '8bit':
+            # We need to test that the string can be converted to unicode and
+            # back to a byte string, given the input and output codecs of the
+            # charset.
+            if isinstance(s, StringType):
+                # Possibly raise UnicodeError if the byte string can't be
+                # converted to a unicode with the input codec of the charset.
+                incodec = charset.input_codec or 'us-ascii'
+                ustr = unicode(s, incodec)
+                # Now make sure that the unicode could be converted back to a
+                # byte string with the output codec, which may be different
+                # than the iput coded. Still, use the original byte string.
+                outcodec = charset.output_codec or 'us-ascii'
+                ustr.encode(outcodec)
+            elif isinstance(s, UnicodeType):
+                # Now we have to be sure the unicode string can be converted
+                # to a byte string with a reasonable output codec. We want to
+                # use the byte string in the chunk.
+                for charset in USASCII, charset, UTF8:
+                    try:
+                        outcodec = charset.output_codec or 'us-ascii'
+                        s = s.encode(outcodec)
+                        break
+                    except UnicodeError:
+                        pass
+                else:
+                    assert False, 'utf-8 conversion failed'
         self._chunks.append((s, charset))
 
     def _split(self, s, charset, firstline=False):
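
The unicode branch of the diff tries US-ASCII first, then the requested charset, then UTF-8, and records whichever charset succeeded alongside the chunk. A rough Python 3 sketch of that fallback order is below. The name `encode_with_fallback` and the use of plain codec-name strings (instead of the `Charset` objects the diff uses) are assumptions made for illustration.

```python
from typing import Tuple


def encode_with_fallback(text: str, preferred: str) -> Tuple[bytes, str]:
    """Encode text with the first codec that works.

    Hypothetical helper: tries 'us-ascii', then the preferred codec,
    then 'utf-8', and returns the encoded bytes together with the
    codec name that succeeded, so the caller can store both.
    """
    for codec in ("us-ascii", preferred, "utf-8"):
        try:
            return text.encode(codec), codec
        except UnicodeError:
            continue  # this codec cannot represent the text; try the next
    # Corresponds to the for/else assertion in the diff.
    raise AssertionError("utf-8 conversion failed")


if __name__ == "__main__":
    for sample in ("abc", "caf\u00e9", "\u20ac"):
        print(encode_with_fallback(sample, "iso-8859-1"))
```

Trying US-ASCII first keeps pure-ASCII text in the narrowest charset, and UTF-8 serves as the last resort for text the requested charset cannot represent.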
