Commit 56066d2

Return complete lines from codec stream readers
even if there is an exception in later lines, resulting in correct line numbers for decoding errors in source code. Fixes #1178484. Will backport to 2.4.
1 parent 6d2b346 commit 56066d2

3 files changed

Lines changed: 26 additions & 4 deletions

Doc/lib/libcodecs.tex

Lines changed: 5 additions & 1 deletion
@@ -394,7 +394,7 @@ \subsubsection{StreamReader Objects \label{stream-reader-objects}}
 be extended with \function{register_error()}.
 \end{classdesc}
 
-\begin{methoddesc}{read}{\optional{size\optional{, chars}}}
+\begin{methoddesc}{read}{\optional{size\optional{, chars, \optional{firstline}}}}
 Decodes data from the stream and returns the resulting object.
 
 \var{chars} indicates the number of characters to read from the
@@ -408,12 +408,16 @@ \subsubsection{StreamReader Objects \label{stream-reader-objects}}
 decode as much as possible. \var{size} is intended to prevent having
 to decode huge files in one step.
 
+\var{firstline} indicates that it would be sufficient to only return
+the first line, if there are decoding errors on later lines.
+
 The method should use a greedy read strategy meaning that it should
 read as much data as is allowed within the definition of the encoding
 and the given size, e.g. if optional encoding endings or state
 markers are available on the stream, these should be read too.
 
 \versionchanged[\var{chars} argument added]{2.4}
+\versionchanged[\var{firstline} argument added]{2.4.2}
 \end{methoddesc}
 
 \begin{methoddesc}{readline}{\optional{size\optional{, keepends}}}

Lib/codecs.py

Lines changed: 17 additions & 3 deletions
@@ -236,7 +236,7 @@ def __init__(self, stream, errors='strict'):
     def decode(self, input, errors='strict'):
         raise NotImplementedError
 
-    def read(self, size=-1, chars=-1):
+    def read(self, size=-1, chars=-1, firstline=False):
 
         """ Decodes data from the stream self.stream and returns the
             resulting object.
@@ -253,6 +253,11 @@ def read(self, size=-1, chars=-1):
             is intended to prevent having to decode huge files in one
             step.
 
+            If firstline is true, and a UnicodeDecodeError happens
+            after the first line terminator in the input only the first line
+            will be returned, the rest of the input will be kept until the
+            next call to read().
+
             The method should use a greedy read strategy meaning that
             it should read as much data as is allowed within the
             definition of the encoding and the given size, e.g. if
@@ -275,7 +280,16 @@ def read(self, size=-1, chars=-1):
                 newdata = self.stream.read(size)
             # decode bytes (those remaining from the last call included)
             data = self.bytebuffer + newdata
-            newchars, decodedbytes = self.decode(data, self.errors)
+            try:
+                newchars, decodedbytes = self.decode(data, self.errors)
+            except UnicodeDecodeError, exc:
+                if firstline:
+                    newchars, decodedbytes = self.decode(data[:exc.start], self.errors)
+                    lines = newchars.splitlines(True)
+                    if len(lines)<=1:
+                        raise
+                else:
+                    raise
             # keep undecoded bytes until the next call
             self.bytebuffer = data[decodedbytes:]
             # put new characters in the character buffer
@@ -306,7 +320,7 @@ def readline(self, size=None, keepends=True):
         line = ""
         # If size is given, we call read() only once
         while True:
-            data = self.read(readsize)
+            data = self.read(readsize, firstline=True)
             if data:
                 # If we're at a "\r" read one extra character (which might
                 # be a "\n") to get a proper line ending. If the stream is
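
The error-handling path added to read() can be tried in isolation. The following is a minimal sketch, not the code from this commit: it uses Python 3 syntax (`except ... as exc`) instead of the commit's Python 2 syntax, and the helper name `decode_keeping_first_line` and the sample bytes are hypothetical. It assumes the UTF-8 codec and only mirrors the try/except block above: decode everything, and on a UnicodeDecodeError fall back to decoding the bytes before `exc.start`, re-raising unless that prefix splits into more than one line.

```python
import codecs


def decode_keeping_first_line(data, encoding="utf-8", errors="strict"):
    """Hypothetical helper mirroring the patched try/except in read().

    Returns (decoded_text, bytes_consumed). If decoding fails, only the
    bytes in front of the error position are decoded and returned.
    """
    decode = codecs.getdecoder(encoding)
    try:
        return decode(data, errors)
    except UnicodeDecodeError as exc:
        # Decode only the bytes that precede the offending position.
        text, consumed = decode(data[:exc.start], errors)
        lines = text.splitlines(True)
        if len(lines) <= 1:
            # Fewer than two lines were decoded before the error:
            # re-raise the original exception, as the patch does.
            raise
        return text, consumed


# Sample input (an assumption): a clean first line, then an invalid byte.
data = b"first line\nsecond \xff line\n"
text, consumed = decode_keeping_first_line(data)
first_line = text.splitlines(True)[0]
leftover = data[consumed:]  # read() would keep these bytes in self.bytebuffer
```

In this sketch the undecodable tail stays in `leftover`, which corresponds to `self.bytebuffer = data[decodedbytes:]` in the diff, so the error would surface on a later call rather than on the first line.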

Misc/NEWS

Lines changed: 4 additions & 0 deletions
@@ -191,6 +191,10 @@ Extension Modules
 Library
 -------
 
+- Bug #1178484: Return complete lines from codec stream readers
+  even if there is an exception in later lines, resulting in
+  correct line numbers for decoding errors in source code.
+
 - Bug #1192315: Disallow negative arguments to clear() in pdb.
 
 - Patch #827386: Support absolute source paths in msvccompiler.py.
