Commit 8680bcc

#14380: Have MIMEText default to utf-8 when passed non-ASCII unicode
Previously it would just accept the unicode, which would wind up as unicode in the transfer-encoded message object, which is just wrong. Patch by Jeff Knupp.
1 parent 192195a commit 8680bcc

5 files changed

Lines changed: 31 additions & 4 deletions

Doc/library/email.mime.rst

Lines changed: 3 additions & 3 deletions
@@ -175,7 +175,7 @@ Here are the classes:
 
 .. currentmodule:: email.mime.text
 
-.. class:: MIMEText(_text, _subtype='plain', _charset='us-ascii')
+.. class:: MIMEText(_text, _subtype='plain', _charset=None)
 
    Module: :mod:`email.mime.text`
 
@@ -185,5 +185,5 @@ Here are the classes:
    minor type and defaults to :mimetype:`plain`. *_charset* is the character
    set of the text and is passed as a parameter to the
    :class:`~email.mime.nonmultipart.MIMENonMultipart` constructor; it defaults
-   to ``us-ascii``. No guessing or encoding is performed on the text data.
-
+   to ``us-ascii`` if the string contains only ``ascii`` codepoints, and
+   ``utf-8`` otherwise.
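
The documented default can be checked with a short script. This is a minimal sketch, assuming a Python 3 interpreter whose `email` package includes this change; the variable names are illustrative and not part of the commit.

```python
from email.mime.text import MIMEText

# No _charset given: pure-ASCII text keeps the us-ascii label.
ascii_msg = MIMEText('plain ascii body\n')

# No _charset given: non-ASCII text is labelled utf-8 instead.
unicode_msg = MIMEText('É testabc\n')

# An explicitly passed _charset is used as given.
latin1_msg = MIMEText('É testabc\n', _charset='iso-8859-1')

for label, msg in [('ascii', ascii_msg),
                   ('unicode', unicode_msg),
                   ('latin-1', latin1_msg)]:
    print(label, msg.get_content_charset())
```

`get_content_charset()` reads the `charset` parameter back from the `Content-Type` header, so it reflects what a receiving mail client would see.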

Lib/email/mime/text.py

Lines changed: 10 additions & 0 deletions
@@ -27,4 +27,14 @@ def __init__(self, _text, _subtype='plain', _charset='us-ascii'):
         """
         MIMENonMultipart.__init__(self, 'text', _subtype,
                                   **{'charset': _charset})
+
+        # If _charset was defaulted, check to see if there are non-ascii
+        # characters present. Default to utf-8 if there are.
+        # XXX: This can be removed once #7304 is fixed.
+        if _charset == 'us-ascii':
+            try:
+                _text.encode(_charset)
+            except UnicodeEncodeError:
+                _charset = 'utf-8'
+
         self.set_payload(_text, _charset)
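
The fallback added in this hunk can be read in isolation as a small function. The sketch below is illustrative only: `pick_default_charset` is a hypothetical name, not something the commit or the `email` package defines, and it simply mirrors the `try`/`except` logic shown in the diff.

```python
def pick_default_charset(text, charset='us-ascii'):
    """Hypothetical helper mirroring the fallback logic in the patch."""
    if charset == 'us-ascii':
        try:
            # Raises UnicodeEncodeError if any character is outside ASCII.
            text.encode(charset)
        except UnicodeEncodeError:
            charset = 'utf-8'
    return charset


print(pick_default_charset('abc'))
print(pick_default_charset('É testabc\n'))
print(pick_default_charset('É testabc\n', 'iso-8859-1'))
```

Because the check compares against the literal default value rather than a sentinel, a caller who passes `'us-ascii'` explicitly together with non-ASCII text would also be switched to `utf-8` under this logic.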

Lib/test/test_email/test_email.py

Lines changed: 14 additions & 1 deletion
@@ -617,6 +617,19 @@ def test_unicode_header_defaults_to_utf8_encoding(self):
             abc
             """))
 
+    def test_unicode_body_defaults_to_utf8_encoding(self):
+        # Issue 14291
+        m = MIMEText('É testabc\n')
+        self.assertEqual(str(m),textwrap.dedent("""\
+            MIME-Version: 1.0
+            Content-Type: text/plain; charset="utf-8"
+            Content-Transfer-Encoding: base64
+
+            w4kgdGVzdGFiYwo=
+            """))
+
+
+
 # Test the email.encoders module
 class TestEncoders(unittest.TestCase):
 
@@ -642,7 +655,7 @@ def test_default_cte(self):
         eq(msg['content-transfer-encoding'], '7bit')
         # Similar, but with 8bit data
         msg = MIMEText('hello \xf8 world')
-        eq(msg['content-transfer-encoding'], '8bit')
+        eq(msg['content-transfer-encoding'], 'base64')
         # And now with a different charset
         msg = MIMEText('hello \xf8 world', _charset='iso-8859-1')
         eq(msg['content-transfer-encoding'], 'quoted-printable')
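
The changed expectation in `test_default_cte` follows from the charset choice: once the body is labelled `utf-8`, the transfer encoding becomes base64 rather than 8bit. A minimal sketch of the three cases the test covers, assuming an interpreter that includes this change (the `cases` and `observed` names are illustrative):

```python
from email.mime.text import MIMEText

# Map a short description of each case to the message it produces.
cases = {
    'ascii, default charset': MIMEText('hello world'),
    'non-ascii, default charset': MIMEText('hello \xf8 world'),
    'non-ascii, iso-8859-1': MIMEText('hello \xf8 world',
                                      _charset='iso-8859-1'),
}

# Collect the Content-Transfer-Encoding header chosen for each message.
observed = {name: msg['content-transfer-encoding']
            for name, msg in cases.items()}

for name, cte in observed.items():
    print(f'{name}: {cte}')
```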

Misc/ACKS

Lines changed: 1 addition & 0 deletions
@@ -548,6 +548,7 @@ Thomas Kluyver
 Kim Knapp
 Lenny Kneler
 Pat Knight
+Jeff Knupp
 Greg Kochanski
 Damon Kohler
 Marko Kohtala

Misc/NEWS

Lines changed: 3 additions & 0 deletions
@@ -34,6 +34,9 @@ Core and Builtins
 Library
 -------
 
+- Issue #14380: MIMEText now defaults to utf-8 when passed non-ASCII unicode
+  with no charset specified.
+
 - Issue #10340: asyncore - properly handle EINVAL in dispatcher constructor on
   OSX; avoid to call handle_connect in case of a disconnected socket which
   was not meant to connect.
