Commit b83ee30

#11454: Reduce email module load time, improve surrogate check efficiency.
The new _has_surrogates code was suggested by Serhiy Storchaka. See the issue for timings, but it is far faster than any other alternative, and also removes the load time that we previously incurred from compiling the complex regex this replaces.
1 parent dd3a6a5 commit b83ee30
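
The commit message makes two performance claims: the regex costs time to compile at import, and the encode-based check is faster per call. The following is a rough sketch for reproducing that comparison locally, not the benchmark from the issue. The function names, the sample strings, and the iteration counts are all assumptions chosen for illustration, and the numbers will vary by machine and Python version.

```python
import re
import timeit

# The pattern this commit removes. Backslashes before A and Z are doubled
# here only so the literal is free of invalid-escape warnings.
OLD_PATTERN = '([^\ud800-\udbff]|\\A)[\udc00-\udfff]([^\udc00-\udfff]|\\Z)'


def has_surrogates_regex(s, _search=re.compile(OLD_PATTERN).search):
    """Old approach: search for a low surrogate that is not part of a pair."""
    return _search(s) is not None


def has_surrogates_encode(s):
    """New approach: a string with lone surrogates cannot be encoded as UTF-8."""
    try:
        s.encode()
        return False
    except UnicodeEncodeError:
        return True


def compile_cost():
    """Time one uncached compile of the old pattern (a proxy for import cost)."""
    re.purge()  # clear the re module's cache so the compile really happens
    return timeit.timeit(lambda: re.compile(OLD_PATTERN), number=1)


# Hypothetical inputs: a clean header-like string, and the same string with
# one undecodable byte smuggled in via the surrogateescape error handler.
clean = "Subject: hello world\n" * 200
dirty = clean + b"\xe9".decode("ascii", "surrogateescape")

timings = {}
for label, text in (("clean", clean), ("dirty", dirty)):
    for func in (has_surrogates_regex, has_surrogates_encode):
        timings[(label, func.__name__)] = timeit.timeit(
            lambda: func(text), number=1000)

print("compile:", compile_cost())
for key, seconds in sorted(timings.items()):
    print(key, seconds)
```

The issue referenced in the commit message remains the authoritative source for the actual timings.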

1 file changed

Lines changed: 10 additions & 4 deletions

Lib/email/utils.py

@@ -54,10 +54,16 @@
 specialsre = re.compile(r'[][\\()<>@,:;".]')
 escapesre = re.compile(r'[\\"]')
 
-# How to figure out if we are processing strings that come from a byte
-# source with undecodable characters.
-_has_surrogates = re.compile(
-    '([^\ud800-\udbff]|\A)[\udc00-\udfff]([^\udc00-\udfff]|\Z)').search
+def _has_surrogates(s):
+    """Return True if s contains surrogate-escaped binary data."""
+    # This check is based on the fact that unless there are surrogates, utf8
+    # (Python's default encoding) can encode any string. This is the fastest
+    # way to check for surrogates, see issue 11454 for timings.
+    try:
+        s.encode()
+        return False
+    except UnicodeEncodeError:
+        return True
 
 # How to deal with a string containing bytes before handing it to the
 # application through the 'normal' interface.
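
To illustrate what "surrogate-escaped binary data" means in practice, here is a minimal standalone sketch of the same encode-based check. The name `has_surrogates` and the sample byte string are assumptions made for this example. The sketch relies on standard Python 3 behavior: decoding with `errors="surrogateescape"` maps each undecodable byte to a lone low surrogate, and a plain `str.encode()` (UTF-8) refuses to encode lone surrogates.

```python
def has_surrogates(s):
    """Return True if s contains lone surrogates (standalone copy of the check)."""
    try:
        s.encode()  # UTF-8 by default; fails on lone surrogates
        return False
    except UnicodeEncodeError:
        return True


# Hypothetical input: Latin-1 bytes that are not valid ASCII or UTF-8.
raw = b"caf\xe9"

# surrogateescape maps the undecodable byte 0xE9 to the code point U+DCE9.
decoded = raw.decode("ascii", errors="surrogateescape")

print(has_surrogates(decoded))      # the smuggled byte is detected
print(has_surrogates("caf\u00e9"))  # a properly decoded string passes

# Encoding with the same error handler recovers the original bytes.
assert decoded.encode("ascii", errors="surrogateescape") == raw
```

Because the check encodes the whole string and discards the result, it trades a temporary allocation for avoiding the regex entirely; the commit message reports that this trade comes out well ahead.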
