Commit bc8e642

If the data read from the bytestream in readline() ends in a '\r', read one more
byte, even if the user has passed a size parameter. This extra byte shouldn't cause a buffer overflow in the tokenizer. The original plan was to return a line ending in '\r', which might be recognizable as a complete line, and to skip any '\n' that was read afterwards. Unfortunately this didn't work: the tokenizer only recognizes '\n' as a line end, which in turn led to joined lines and SyntaxErrors, so this special treatment of a split '\r\n' has been dropped. (It can only happen with a temporarily exhausted bytestream now anyway.) Fixes parts of SF bugs #1163244 and #1175396.
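
The joined-lines problem described in the message can be illustrated with a small sketch. It is not the tokenizer's code: split_on_lf_only is a hypothetical stand-in for any consumer that treats only '\n' as a line end, the sample lines are invented, and the sketch is written for Python 3.

```python
def split_on_lf_only(text):
    # Hypothetical consumer: only "\n" counts as a line end.
    return text.split("\n")

# Lines as the abandoned plan would have returned them: the "\r" is kept
# as the terminator and the "\n" that followed it is skipped.
abandoned_plan_lines = ["x = 1\r", "y = 2\r"]

# Lines with the complete "\r\n" ending kept intact.
intact_lines = ["x = 1\r\n", "y = 2\r\n"]

joined = split_on_lf_only("".join(abandoned_plan_lines))
separate = split_on_lf_only("".join(intact_lines))

print(len(joined))    # both statements end up in a single piece
print(len(separate))  # two statements plus a trailing empty piece
```

With only '\r' between them, the two statements reach the consumer as one line, which is the kind of input that would produce a SyntaxError.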
1 parent 49ab700 commit bc8e642

2 files changed

Lines changed: 10 additions & 12 deletions


Lib/codecs.py

Lines changed: 4 additions & 12 deletions
@@ -230,7 +230,6 @@ def __init__(self, stream, errors='strict'):
         self.errors = errors
         self.bytebuffer = ""
         self.charbuffer = u""
-        self.atcr = False
 
     def decode(self, input, errors='strict'):
         raise NotImplementedError
@@ -306,18 +305,12 @@ def readline(self, size=None, keepends=True):
         # If size is given, we call read() only once
         while True:
             data = self.read(readsize)
-            if self.atcr and data.startswith(u"\n"):
-                data = data[1:]
             if data:
-                self.atcr = data.endswith(u"\r")
-                # If we're at a "\r" (and are allowed to read more), read one
-                # extra character (which might be a "\n") to get a proper
-                # line ending. (If the stream is temporarily exhausted we return
-                # the wrong line ending, but at least we won't generate a bogus
-                # second line.)
-                if self.atcr and size is None:
+                # If we're at a "\r" read one # extra character # (which might
+                # be a "\n") to get a proper # line ending. If the stream is
+                # temporarily exhausted we return the wrong line ending.
+                if data.endswith(u"\r"):
                     data += self.read(size=1, chars=1)
-                    self.atcr = data.endswith(u"\r")
 
             line += data
             lines = line.splitlines(True)
@@ -367,7 +360,6 @@ def reset(self):
         """
         self.bytebuffer = ""
         self.charbuffer = u""
-        self.atcr = False
 
     def seek(self, offset, whence=0):
         """ Set the input stream's current position.
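
The simplified step in readline() can be modelled outside of codecs.py. What follows is a toy sketch in Python 3, not the library code: ChunkReader and read_for_line are hypothetical names, and an empty chunk stands for a temporarily exhausted stream.

```python
class ChunkReader:
    # Toy character stream: each chunk is what the stream can deliver
    # before it runs dry; an empty chunk models temporary exhaustion.
    def __init__(self, chunks):
        self.chunks = list(chunks)
        self.pending = ""

    def read(self, n):
        # Refill from the next chunk only when nothing is pending.
        if not self.pending and self.chunks:
            self.pending = self.chunks.pop(0)
        out, self.pending = self.pending[:n], self.pending[n:]
        return out


def read_for_line(reader, size):
    # Mirrors the post-commit step: read once, and if the data ends in
    # "\r", read one more character even though a size was given.
    data = reader.read(size)
    if data.endswith("\r"):
        data += reader.read(1)
    return data


whole = ChunkReader(["ab\r\ncd"])
first = read_for_line(whole, 3)      # "\r" at the size limit pulls in the "\n"

stalled = ChunkReader(["ab\r", "", "\ncd"])
early = read_for_line(stalled, 3)    # nothing more available: ends in "\r"
late = read_for_line(stalled, 3)     # the "\n" only arrives with the next call
```

The second case is the one the new comment warns about: when the stream is temporarily exhausted right after the '\r', the data comes back with the wrong line ending, and the removed atcr flag is no longer there to swallow the late '\n'.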

Misc/NEWS

Lines changed: 6 additions & 0 deletions
@@ -266,6 +266,12 @@ Library
 - Bug #1149508: ``textwrap`` now handles hyphenated numbers (eg. "2004-03-05")
   correctly.
 
+- Partial fixes for SF bugs #1163244 and #1175396: If a chunk read by
+  ``codecs.StreamReader.readline()`` has a trailing "\r", read one more
+  character even if the user has passed a size parameter to get a proper
+  line ending. Remove the special handling of a "\r\n" that has been split
+  between two lines.
+
 
 Build
 -----
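
The behaviour described in the NEWS entry can be checked against a current interpreter. The sketch below assumes Python 3, where the stream is an io.BytesIO and strings carry no u prefix; the sample bytes are invented, and results on the Python 2 code base this commit targets are not verified here.

```python
import codecs
import io

# Wrap an in-memory byte stream in the UTF-8 StreamReader.
reader = codecs.getreader("utf-8")(io.BytesIO(b"ab\r\ncd\r\n"))

# size=3 makes the first chunk stop right after the "\r", so the reader
# has to fetch one more character to complete the line ending.
first = reader.readline(size=3)
second = reader.readline(size=3)

print(repr(first))
print(repr(second))
```

Choosing a size that lands exactly on the '\r' is deliberate: it is the case where a size-limited read would otherwise cut a "\r\n" pair in half.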
