Commit 9f6cbe0

Merged revisions 88528 via svnmerge from
svn+ssh://[email protected]/python/branches/py3k

........
  r88528 | lars.gustaebel | 2011-02-23 12:42:22 +0100 (Wed, 23 Feb 2011) | 16 lines

  Issue #11224: Improved sparse file read support (r85916) introduced a
  regression in _FileInFile which is used in file-like objects returned by
  TarFile.extractfile(). The inefficient design of the _FileInFile.read()
  method causes various dramatic side-effects and errors:

    - The data segment of a file member is read completely into memory
      every(!) time a small block is accessed. This is not only slow but may
      cause unexpected MemoryErrors with very large files.
    - Reading members from compressed tar archives is even slower because of
      the excessive backwards seeking which is done when the same data
      segment is read over and over again.
    - As a backwards seek on a TarFile opened in stream mode is not possible,
      using extractfile() fails with a StreamError.
........
1 parent dcb29c9 commit 9f6cbe0

3 files changed

Lines changed: 22 additions & 3 deletions

Lib/tarfile.py

Lines changed: 2 additions & 3 deletions
@@ -760,9 +760,8 @@ def read(self, size=None):
                         self.map_index = 0
             length = min(size, stop - self.position)
             if data:
-                self.fileobj.seek(offset)
-                block = self.fileobj.read(stop - start)
-                buf += block[self.position - start:self.position + length]
+                self.fileobj.seek(offset + (self.position - start))
+                buf += self.fileobj.read(length)
             else:
                 buf += NUL * length
             size -= length
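
The effect of this change can be illustrated with a small standalone sketch. It is not the tarfile code itself: `CountingFile`, `read_old`, `read_new`, and `read_through` are hypothetical helpers written for this illustration, and the demo assumes a single data segment with `start = 0` stored at byte offset 512 of an in-memory file. `read_old` mirrors the three removed lines, `read_new` the two added ones.

```python
import io


class CountingFile(io.BytesIO):
    """In-memory file that counts the bytes returned by read() (illustrative helper)."""

    def __init__(self, data):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def read_old(fileobj, offset, start, stop, position, length):
    # Removed lines: load the whole data segment, then slice out the block.
    fileobj.seek(offset)
    block = fileobj.read(stop - start)
    return block[position - start:position + length]


def read_new(fileobj, offset, start, stop, position, length):
    # Added lines: seek straight to the requested block and read only that.
    fileobj.seek(offset + (position - start))
    return fileobj.read(length)


def read_through(strategy, raw, offset, start, stop, blocksize=512):
    """Read the segment [start, stop) in small blocks; return (data, bytes read)."""
    fileobj = CountingFile(raw)
    out = b""
    position = start
    while position < stop:
        length = min(blocksize, stop - position)
        out += strategy(fileobj, offset, start, stop, position, length)
        position += length
    return out, fileobj.bytes_read


payload = bytes(range(256)) * 16          # 4096-byte data segment
raw = b"H" * 512 + payload                # segment stored at offset 512

old_data, old_cost = read_through(read_old, raw, 512, 0, len(payload))
new_data, new_cost = read_through(read_new, raw, 512, 0, len(payload))
print(old_cost, new_cost)
```

Both strategies return the same data, but the old one re-reads the full segment for each 512-byte block and has to seek backwards to `offset` every time, while the new one only ever moves forward. That forward-only access pattern is what a stream-mode archive requires.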

Lib/test/test_tarfile.py

Lines changed: 16 additions & 0 deletions
@@ -419,6 +419,22 @@ class StreamReadTest(CommonReadTest):
 
     mode="r|"
 
+    def test_read_through(self):
+        # Issue #11224: A poorly designed _FileInFile.read() method
+        # caused seeking errors with stream tar files.
+        for tarinfo in self.tar:
+            if not tarinfo.isreg():
+                continue
+            fobj = self.tar.extractfile(tarinfo)
+            while True:
+                try:
+                    buf = fobj.read(512)
+                except tarfile.StreamError:
+                    self.fail("simple read-through using TarFile.extractfile() failed")
+                if not buf:
+                    break
+            fobj.close()
+
     def test_fileobj_regular_file(self):
         tarinfo = self.tar.next() # get "regtype" (can't use getmember)
         fobj = self.tar.extractfile(tarinfo)
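
Outside the test suite, the same read-through pattern can be tried against a small archive built in memory. This is a minimal sketch assuming a recent Python 3; the member name `example.txt` and the 2000-byte payload are made up for the illustration and are not part of the commit.

```python
import io
import tarfile

# Build a tiny uncompressed archive in memory (hypothetical member and payload).
payload = b"x" * 2000
raw = io.BytesIO()
with tarfile.open(fileobj=raw, mode="w") as tar:
    info = tarfile.TarInfo(name="example.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
raw.seek(0)

# Re-open in stream mode ("r|"), where backwards seeks are not possible,
# and read each regular member in 512-byte blocks as the test does.
chunks = []
with tarfile.open(fileobj=raw, mode="r|") as tar:
    for tarinfo in tar:
        if not tarinfo.isreg():
            continue
        fobj = tar.extractfile(tarinfo)
        while True:
            buf = fobj.read(512)
            if not buf:
                break
            chunks.append(buf)
        fobj.close()

result = b"".join(chunks)
print(len(result))
```

In stream mode each member has to be consumed while the iteration is positioned on it, which is why the read loop sits inside the `for` loop rather than after it.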

Misc/NEWS

Lines changed: 4 additions & 0 deletions
@@ -15,6 +15,10 @@ Core and Builtins
 Library
 -------
 
+- Issue #11224: Fixed a regression in tarfile that affected the file-like
+  objects returned by TarFile.extractfile() regarding performance, memory
+  consumption and failures with the stream interface.
+
 - Issue #11074: Make 'tokenize' so it can be reloaded.
 
 - Issue #4681: Allow mmap() to work on file sizes and offsets larger than
