Commit 2940e71

#15220: simplify and speed up feedparser's line splitting.
Original patch submitted by QNX, modified for clarity by me (mostly comments). QNX reports a 30% speed up in average email parsing time.
1 parent f0bf84c commit 2940e71

2 files changed

Lines changed: 12 additions & 18 deletions


Lib/email/feedparser.py

Lines changed: 9 additions & 18 deletions
@@ -98,24 +98,15 @@ def push(self, data):
         """Push some new data into this object."""
         # Handle any previous leftovers
         data, self._partial = self._partial + data, ''
-        # Crack into lines, but preserve the newlines on the end of each
-        parts = NLCRE_crack.split(data)
-        # The *ahem* interesting behaviour of re.split when supplied grouping
-        # parentheses is that the last element of the resulting list is the
-        # data after the final RE.  In the case of a NL/CR terminated string,
-        # this is the empty string.
-        self._partial = parts.pop()
-        #GAN 29Mar09  bugs 1555570, 1721862  Confusion at 8K boundary ending with \r:
-        # is there a \n to follow later?
-        if not self._partial and parts and parts[-1].endswith('\r'):
-            self._partial = parts.pop(-2)+parts.pop()
-        # parts is a list of strings, alternating between the line contents
-        # and the eol character(s).  Gather up a list of lines after
-        # re-attaching the newlines.
-        lines = []
-        for i in range(len(parts) // 2):
-            lines.append(parts[i*2] + parts[i*2+1])
-        self.pushlines(lines)
+        # Crack into lines, but preserve the linesep characters on the end of each
+        parts = data.splitlines(True)
+        # If the last element of the list does not end in a newline, then treat
+        # it as a partial line.  We only check for '\n' here because a line
+        # ending with '\r' might be a line that was split in the middle of a
+        # '\r\n' sequence (see bugs 1555570 and 1721862).
+        if parts and not parts[-1].endswith('\n'):
+            self._partial = parts.pop()
+        self.pushlines(parts)

     def pushlines(self, lines):
         # Reverse and insert at the front of the lines.
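
The new splitting logic can be tried in isolation with a minimal sketch like the one below. It is not the feedparser code itself: `LineBuffer` and its `lines` attribute are hypothetical stand-ins for the buffering class and for `pushlines()`, and only the three statements taken from the patched `push()` come from the commit. It shows a chunk boundary falling between '\r' and '\n', the case the new comment refers to.

```python
class LineBuffer:
    """Hypothetical stand-in for the buffering part of the feed parser."""

    def __init__(self):
        self._partial = ''   # unterminated tail of the previous push
        self.lines = []      # complete lines (stand-in for pushlines())

    def push(self, data):
        # Prepend any leftover from the previous call.
        data, self._partial = self._partial + data, ''
        # Split into lines, keeping the line-ending characters.
        parts = data.splitlines(True)
        # Hold back the last piece unless it ends in '\n'; a trailing
        # '\r' may be the first half of a '\r\n' split across chunks.
        if parts and not parts[-1].endswith('\n'):
            self._partial = parts.pop()
        self.lines.extend(parts)


buf = LineBuffer()
buf.push('Subject: hi\r')          # chunk ends in the middle of '\r\n'
buf.push('\nBody line\npart')      # the '\n' arrives with the next chunk
print(buf.lines)
print(repr(buf._partial))
```

After the first call nothing is emitted, because 'Subject: hi\r' is held back as a partial line; the second call completes it as 'Subject: hi\r\n' and leaves 'part' buffered for later.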

Misc/NEWS

Lines changed: 3 additions & 0 deletions
@@ -253,6 +253,9 @@ Core and Builtins
 Library
 -------

+- Issue #15220: email.feedparser's line splitting algorithm is now simpler and
+  faster.
+
 - Issue #16743: Fix mmap overflow check on 32 bit Windows.

 - Issue #16996: webbrowser module now uses shutil.which() to find a
