Commit 477efb3

#10790: make append work when output codec is different from input codec
There's still a bug here (the encode call shouldn't use the 'errors' parameter), but I'll fix that later.
1 parent ca1e7ec commit 477efb3

4 files changed

Lines changed: 26 additions & 24 deletions

Doc/library/email.header.rst

Lines changed: 9 additions & 8 deletions
@@ -94,14 +94,15 @@ Here is the :class:`Header` class description:
       decoded with that character set.

       If *s* is an instance of :class:`str`, then *charset* is a hint specifying
-      the character set of the characters in the string. In this case, when
-      producing an :rfc:`2822`\ -compliant header using :rfc:`2047` rules, the
-      Unicode string will be encoded using the following charsets in order:
-      ``us-ascii``, the *charset* hint, ``utf-8``. The first character set to
-      not provoke a :exc:`UnicodeError` is used.
-
-      Optional *errors* is passed through to any :func:`encode` or
-      :func:`ustr.encode` call, and defaults to "strict".
+      the character set of the characters in the string.
+
+      In either case, when producing an :rfc:`2822`\ -compliant header using
+      :rfc:`2047` rules, the string will be encoded using the output codec of
+      the charset. If the string cannot be encoded using the output codec, a
+      UnicodeError will be raised.
+
+      Optional *errors* is passed as the errors argument to the decode call
+      if *s* is a byte string.


    .. method:: encode(splitchars=';, \\t', maxlinelen=None, linesep='\\n')
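
The documented behaviour can be exercised with a short sketch. It assumes a Python 3 build that includes this change; the sample character and the `iso-8859-1` charset are chosen only for illustration and do not come from the documentation itself.

```python
from email.charset import Charset
from email.header import Header

# 'shift_jis' is a charset whose output codec differs from its input codec.
cs = Charset('shift_jis')
print(cs.input_codec, cs.output_codec)

# A str is encoded with the charset's output codec when the header is rendered.
h = Header('文', charset=cs)
print(h.encode())

# A str that the output codec cannot represent raises UnicodeError
# (illustrative choice: iso-8859-1 has no code point for this character).
try:
    Header('文', charset='iso-8859-1')
    raised = False
except UnicodeError:
    raised = True
print(raised)
```

The error surfaces when the text is added to the header, not later when the header is rendered, which matches the "early error" comment in `append`.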

Lib/email/header.py

Lines changed: 10 additions & 16 deletions
@@ -245,32 +245,26 @@ def append(self, s, charset=None, errors='strict'):
         that byte string, and a UnicodeError will be raised if the string
         cannot be decoded with that charset. If s is a Unicode string, then
         charset is a hint specifying the character set of the characters in
-        the string. In this case, when producing an RFC 2822 compliant header
-        using RFC 2047 rules, the Unicode string will be encoded using the
-        following charsets in order: us-ascii, the charset hint, utf-8. The
-        first character set not to provoke a UnicodeError is used.
+        the string. In either case, when producing an RFC 2822 compliant
+        header using RFC 2047 rules, the string will be encoded using the
+        output codec of the charset. If the string cannot be encoded to the
+        output codec, a UnicodeError will be raised.

-        Optional `errors' is passed as the third argument to any unicode() or
-        ustr.encode() call.
+        Optional `errors' is passed as the errors argument to the decode
+        call if s is a byte string.
         """
         if charset is None:
             charset = self._charset
         elif not isinstance(charset, Charset):
             charset = Charset(charset)
-        if isinstance(s, str):
-            # Convert the string from the input character set to the output
-            # character set and store the resulting bytes and the charset for
-            # composition later.
+        if not isinstance(s, str):
             input_charset = charset.input_codec or 'us-ascii'
-            input_bytes = s.encode(input_charset, errors)
-        else:
-            # We already have the bytes we will store internally.
-            input_bytes = s
+            s = s.decode(input_charset, errors)
         # Ensure that the bytes we're storing can be decoded to the output
         # character set, otherwise an early error is thrown.
         output_charset = charset.output_codec or 'us-ascii'
-        output_string = input_bytes.decode(output_charset, errors)
-        self._chunks.append((output_string, charset))
+        s.encode(output_charset, errors)
+        self._chunks.append((s, charset))

     def encode(self, splitchars=';, \t', maxlinelen=None, linesep='\n'):
         """Encode a message header into an RFC-compliant format.
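
The revised control flow in `append` can be restated as a standalone sketch. `normalize_chunk` is a hypothetical helper written for illustration and is not part of `email.header`; it mirrors the steps in the hunk above, including passing `errors` to the encode call, which the commit message flags as a known bug to be fixed later.

```python
from email.charset import Charset


def normalize_chunk(s, charset, errors='strict'):
    """Hypothetical helper mirroring the revised append() flow."""
    if not isinstance(charset, Charset):
        charset = Charset(charset)
    if not isinstance(s, str):
        # Byte strings are decoded with the charset's input codec.
        s = s.decode(charset.input_codec or 'us-ascii', errors)
    # Fail early if the text cannot be represented in the output codec.
    # (errors is passed here only because the commit does so.)
    s.encode(charset.output_codec or 'us-ascii', errors)
    return s, charset


# Bytes and str inputs end up as the same stored text.
text_from_bytes, cs = normalize_chunk('文'.encode('shift_jis'), 'shift_jis')
text_from_str, _ = normalize_chunk('文', 'shift_jis')
print(text_from_bytes == text_from_str, cs.output_codec)
```

Storing the decoded `str` together with its charset, instead of pre-encoded bytes, is what lets the later header encoding step use an output codec that differs from the input codec.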

Lib/email/test/test_email.py

Lines changed: 4 additions & 0 deletions
@@ -3620,6 +3620,10 @@ def test_broken_base64_header(self):
         s = 'Subject: =?EUC-KR?B?CSixpLDtKSC/7Liuvsax4iC6uLmwMcijIKHaILzSwd/H0SC8+LCjwLsgv7W/+Mj3I ?='
         raises(errors.HeaderParseError, decode_header, s)

+    def test_shift_jis_charset(self):
+        h = Header('文', charset='shift_jis')
+        self.assertEqual(h.encode(), '=?iso-2022-jp?b?GyRCSjgbKEI=?=')
+


 # Test RFC 2231 header parameters (en/de)coding
# Test RFC 2231 header parameters (en/de)coding

Misc/NEWS

Lines changed: 3 additions & 0 deletions
@@ -30,6 +30,9 @@ Core and Builtins
 Library
 -------

+- Issue #10790: email.header.Header.append's charset logic now works correctly
+  for charsets whose output codec is different from its input codec.
+
 - Issue #10819: SocketIO.name property returns -1 when its closed, instead of
   raising a ValueError, to fix repr().
