Commit 2dbc6e6

Issue #23529: Limit the size of decompressed data when reading from
GzipFile, BZ2File or LZMAFile. This defeats denial of service attacks using compressed bombs (i.e. compressed payloads which decompress to a huge size). Patch by Martin Panter and Nikolaus Rath.
1 parent 2ce11d2 commit 2dbc6e6
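The fix relies on the *max_length* argument of `bz2.BZ2Decompressor.decompress()` (and `lzma.LZMADecompressor.decompress()`), which is available since Python 3.5. A single call returns at most *max_length* bytes, and any input not yet consumed stays buffered inside the decompressor until the next call. Below is a minimal sketch of bounded decompression. It assumes a 1 MiB run of zero bytes as a stand-in for a decompression bomb and an illustrative 64 KiB cap per call. `bounded_chunks` is a hypothetical helper, not a library function.

```python
import bz2

# Stand-in for a decompression bomb: 1 MiB of zeros compresses very small.
PAYLOAD = bz2.compress(b"\0" * (1024 * 1024))
CHUNK_LIMIT = 64 * 1024  # assumed per-call output cap (illustrative)


def bounded_chunks(data, limit=CHUNK_LIMIT):
    """Hypothetical helper: yield decompressed data in chunks of <= limit bytes."""
    decomp = bz2.BZ2Decompressor()
    pending = data
    while not decomp.eof:
        # Hand over the input once; later calls pass b"" so the decompressor
        # drains the input it is still holding internally.
        chunk = decomp.decompress(pending, limit)
        pending = b""
        if chunk:
            yield chunk
        elif decomp.needs_input:
            # No output and nothing buffered: the stream was truncated.
            raise EOFError("Compressed data ended before the "
                           "end-of-stream marker was reached")


sizes = [len(chunk) for chunk in bounded_chunks(PAYLOAD)]
print(len(sizes), max(sizes), sum(sizes))
```

With this approach, peak memory scales with the cap rather than with the total decompressed size. `DecompressReader.read()` in `Lib/_compression.py` depends on the same property when it passes *size* through as the limit.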

11 files changed

Lines changed: 501 additions & 716 deletions

Doc/library/bz2.rst

Lines changed: 4 additions & 0 deletions
@@ -120,6 +120,10 @@ All of the classes in this module may safely be accessed from multiple threads.
    .. versionchanged:: 3.4
       The ``'x'`` (exclusive creation) mode was added.
 
+   .. versionchanged:: 3.5
+      The :meth:`~io.BufferedIOBase.read` method now accepts an argument of
+      ``None``.
+
 
 Incremental (de)compression
 ---------------------------
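In practice, `read(None)` now has the same effect as `read()` or `read(-1)`: it decompresses to the end of the stream. The short sketch below uses an in-memory `io.BytesIO` buffer in place of a file on disk, for brevity.

```python
import bz2
import io

original = b"line of text\n" * 1000          # 13000 bytes of sample data
compressed = io.BytesIO(bz2.compress(original))

with bz2.BZ2File(compressed) as f:
    head = f.read(5)       # a bounded read: at most 5 bytes
    rest = f.read(None)    # None: read everything up to the end of the stream

print(head, len(rest))     # b'line ' 12995
```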

Doc/library/gzip.rst

Lines changed: 20 additions & 10 deletions
@@ -90,13 +90,9 @@ The module defines the following items:
    is no compression. The default is ``9``.
 
    The *mtime* argument is an optional numeric timestamp to be written to
-   the stream when compressing. All :program:`gzip` compressed streams are
-   required to contain a timestamp. If omitted or ``None``, the current
-   time is used. This module ignores the timestamp when decompressing;
-   however, some programs, such as :program:`gunzip`\ , make use of it.
-   The format of the timestamp is the same as that of the return value of
-   ``time.time()`` and of the ``st_mtime`` attribute of the object returned
-   by ``os.stat()``.
+   the last modification time field in the stream when compressing. It
+   should only be provided in compression mode. If omitted or ``None``, the
+   current time is used. See the :attr:`mtime` attribute for more details.
 
    Calling a :class:`GzipFile` object's :meth:`close` method does not close
    *fileobj*, since you might wish to append more material after the compressed
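In compression mode, *mtime* is written to the 4-byte MTIME field of the gzip header. RFC 1952 places that field at byte offset 4 as a little-endian unsigned integer. The sketch below writes to an in-memory buffer using an arbitrary example timestamp, then reads the field back with `struct`.

```python
import gzip
import io
import struct

TIMESTAMP = 1400000000  # arbitrary example value, in seconds since the epoch

buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb", mtime=TIMESTAMP) as f:
    f.write(b"payload")

raw = buf.getvalue()
# Header layout: ID1 ID2 CM FLG MTIME (4 bytes, little-endian) XFL OS
(header_mtime,) = struct.unpack("<I", raw[4:8])
print(raw[:2], header_mtime)
```

`close()` does not close *fileobj*, so the compressed bytes are still available through `buf.getvalue()` after the `with` block.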
@@ -108,9 +104,9 @@ The module defines the following items:
    including iteration and the :keyword:`with` statement. Only the
    :meth:`truncate` method isn't implemented.
 
-   :class:`GzipFile` also provides the following method:
+   :class:`GzipFile` also provides the following method and attribute:
 
-   .. method:: peek([n])
+   .. method:: peek(n)
 
       Read *n* uncompressed bytes without advancing the file position.
       At most one single read on the compressed stream is done to satisfy
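Here is a short sketch of `peek()` in use. The position reported by `tell()` is unchanged after the peek, and the following `read()` returns the same leading bytes. The sample data is arbitrary. Because `peek(n)` may return more or fewer than *n* bytes, the code only compares a prefix.

```python
import gzip
import io

data = gzip.compress(b"abcdefghij" * 100)  # arbitrary sample content

with gzip.GzipFile(fileobj=io.BytesIO(data)) as f:
    peeked = f.peek(4)          # length is not guaranteed to be exactly 4
    pos_after_peek = f.tell()   # still 0: peek() does not advance
    first = f.read(4)           # 4 bytes, since the stream is longer than that

print(pos_after_peek, first, peeked[:4])
```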
@@ -124,9 +120,21 @@ The module defines the following items:
 
       .. versionadded:: 3.2
 
+   .. attribute:: mtime
+
+      When decompressing, the value of the last modification time field in
+      the most recently read header may be read from this attribute, as an
+      integer. The initial value before reading any headers is ``None``.
+
+      All :program:`gzip` compressed streams are required to contain this
+      timestamp field. Some programs, such as :program:`gunzip`\ , make use
+      of the timestamp. The format is the same as the return value of
+      :func:`time.time` and the :attr:`~os.stat_result.st_mtime` attribute of
+      the object returned by :func:`os.stat`.
+
    .. versionchanged:: 3.1
       Support for the :keyword:`with` statement was added, along with the
-      *mtime* argument.
+      *mtime* constructor argument and :attr:`mtime` attribute.
 
    .. versionchanged:: 3.2
       Support for zero-padded and unseekable files was added.
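The attribute stays ``None`` until a member header has been read, so you should inspect it after reading data rather than immediately after opening the file. The sketch below round-trips an arbitrary example timestamp through an in-memory buffer.

```python
import gzip
import io

compressed = io.BytesIO()
with gzip.GzipFile(fileobj=compressed, mode="wb", mtime=1400000000) as f:
    f.write(b"example")
compressed.seek(0)

with gzip.GzipFile(fileobj=compressed, mode="rb") as f:
    before = f.mtime   # None: no header has been read yet
    body = f.read()
    after = f.mtime    # timestamp taken from the header just read

print(before, after, body)   # None 1400000000 b'example'
```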
@@ -140,6 +148,8 @@ The module defines the following items:
    .. versionchanged:: 3.5
       Added support for writing arbitrary
       :term:`bytes-like objects <bytes-like object>`.
+      The :meth:`~io.BufferedIOBase.read` method now accepts an argument of
+      ``None``.
 
 
 .. function:: compress(data, compresslevel=9)

Doc/library/lzma.rst

Lines changed: 4 additions & 0 deletions
@@ -110,6 +110,10 @@ Reading and writing compressed files
    .. versionchanged:: 3.4
       Added support for the ``"x"`` and ``"xb"`` modes.
 
+   .. versionchanged:: 3.5
+      The :meth:`~io.BufferedIOBase.read` method now accepts an argument of
+      ``None``.
+
 
 Compressing and decompressing data in memory
 --------------------------------------------
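Bounded reads also make it practical to process a large stream in fixed-size pieces, holding only one piece of decompressed data at a time. The sketch below uses `lzma.LZMAFile` and assumes in-memory input with an illustrative 4096-byte piece size. A running CRC-32 from `zlib` stands in for whatever per-piece processing a real program would do.

```python
import io
import lzma
import zlib

original = bytes(range(256)) * 64      # 16 KiB of sample data
source = io.BytesIO(lzma.compress(original))
PIECE = 4096                           # illustrative read size

total = largest = 0
crc = 0
with lzma.LZMAFile(source) as f:
    while True:
        piece = f.read(PIECE)          # never longer than PIECE bytes
        if not piece:
            break
        total += len(piece)
        largest = max(largest, len(piece))
        crc = zlib.crc32(piece, crc)   # incremental checksum of the output
    tail = f.read(None)                # at end of stream this returns b""

print(total, largest, tail)
```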

Lib/_compression.py

Lines changed: 152 additions & 0 deletions
@@ -0,0 +1,152 @@
+"""Internal classes used by the gzip, lzma and bz2 modules"""
+
+import io
+
+
+BUFFER_SIZE = io.DEFAULT_BUFFER_SIZE  # Compressed data read chunk size
+
+
+class BaseStream(io.BufferedIOBase):
+    """Mode-checking helper functions."""
+
+    def _check_not_closed(self):
+        if self.closed:
+            raise ValueError("I/O operation on closed file")
+
+    def _check_can_read(self):
+        if not self.readable():
+            raise io.UnsupportedOperation("File not open for reading")
+
+    def _check_can_write(self):
+        if not self.writable():
+            raise io.UnsupportedOperation("File not open for writing")
+
+    def _check_can_seek(self):
+        if not self.readable():
+            raise io.UnsupportedOperation("Seeking is only supported "
+                                          "on files open for reading")
+        if not self.seekable():
+            raise io.UnsupportedOperation("The underlying file object "
+                                          "does not support seeking")
+
+
+class DecompressReader(io.RawIOBase):
+    """Adapts the decompressor API to a RawIOBase reader API"""
+
+    def readable(self):
+        return True
+
+    def __init__(self, fp, decomp_factory, trailing_error=(), **decomp_args):
+        self._fp = fp
+        self._eof = False
+        self._pos = 0  # Current offset in decompressed stream
+
+        # Set to size of decompressed stream once it is known, for SEEK_END
+        self._size = -1
+
+        # Save the decompressor factory and arguments.
+        # If the file contains multiple compressed streams, each
+        # stream will need a separate decompressor object. A new decompressor
+        # object is also needed when implementing a backwards seek().
+        self._decomp_factory = decomp_factory
+        self._decomp_args = decomp_args
+        self._decompressor = self._decomp_factory(**self._decomp_args)
+
+        # Exception class to catch from decompressor signifying invalid
+        # trailing data to ignore
+        self._trailing_error = trailing_error
+
+    def close(self):
+        self._decompressor = None
+        return super().close()
+
+    def seekable(self):
+        return self._fp.seekable()
+
+    def readinto(self, b):
+        with memoryview(b) as view, view.cast("B") as byte_view:
+            data = self.read(len(byte_view))
+            byte_view[:len(data)] = data
+        return len(data)
+
+    def read(self, size=-1):
+        if size < 0:
+            return self.readall()
+
+        if not size or self._eof:
+            return b""
+        data = None  # Default if EOF is encountered
+        # Depending on the input data, our call to the decompressor may not
+        # return any data. In this case, try again after reading another block.
+        while True:
+            if self._decompressor.eof:
+                rawblock = (self._decompressor.unused_data or
+                            self._fp.read(BUFFER_SIZE))
+                if not rawblock:
+                    break
+                # Continue to next stream.
+                self._decompressor = self._decomp_factory(
+                    **self._decomp_args)
+                try:
+                    data = self._decompressor.decompress(rawblock, size)
+                except self._trailing_error:
+                    # Trailing data isn't a valid compressed stream; ignore it.
+                    break
+            else:
+                if self._decompressor.needs_input:
+                    rawblock = self._fp.read(BUFFER_SIZE)
+                    if not rawblock:
+                        raise EOFError("Compressed file ended before the "
+                                       "end-of-stream marker was reached")
+                else:
+                    rawblock = b""
+                data = self._decompressor.decompress(rawblock, size)
+            if data:
+                break
+        if not data:
+            self._eof = True
+            self._size = self._pos
+            return b""
+        self._pos += len(data)
+        return data
+
+    # Rewind the file to the beginning of the data stream.
+    def _rewind(self):
+        self._fp.seek(0)
+        self._eof = False
+        self._pos = 0
+        self._decompressor = self._decomp_factory(**self._decomp_args)
+
+    def seek(self, offset, whence=io.SEEK_SET):
+        # Recalculate offset as an absolute file position.
+        if whence == io.SEEK_SET:
+            pass
+        elif whence == io.SEEK_CUR:
+            offset = self._pos + offset
+        elif whence == io.SEEK_END:
+            # Seeking relative to EOF - we need to know the file's size.
+            if self._size < 0:
+                while self.read(io.DEFAULT_BUFFER_SIZE):
+                    pass
+            offset = self._size + offset
+        else:
+            raise ValueError("Invalid value for whence: {}".format(whence))
+
+        # Make it so that offset is the number of bytes to skip forward.
+        if offset < self._pos:
+            self._rewind()
+        else:
+            offset -= self._pos
+
+        # Read and discard data until we reach the desired position.
+        while offset > 0:
+            data = self.read(min(io.DEFAULT_BUFFER_SIZE, offset))
+            if not data:
+                break
+            offset -= len(data)
+
+        return self._pos
+
+    def tell(self):
+        """Return the current file position."""
+        return self._pos
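To show how a reader like `DecompressReader` plugs a decompressor into the `io` stack, here is a deliberately stripped-down, hypothetical analogue. `BoundedBZ2Reader` is not part of any library. It handles a single bz2 stream, has no seek support, and is wrapped in `io.BufferedReader` to provide ordinary buffered reads. The key step follows the loop in `read()` above: the length of the caller's buffer is passed as the decompressor's output limit, so a single call can never produce more than was requested.

```python
import bz2
import io


class BoundedBZ2Reader(io.RawIOBase):
    """Hypothetical, simplified analogue of DecompressReader.

    Handles a single bz2 stream only: no seeking, no concatenated streams,
    and any trailing data after the end-of-stream marker is ignored.
    """

    def __init__(self, fp, chunk_size=io.DEFAULT_BUFFER_SIZE):
        self._fp = fp
        self._chunk_size = chunk_size
        self._decomp = bz2.BZ2Decompressor()

    def readable(self):
        return True

    def readinto(self, b):
        with memoryview(b) as view, view.cast("B") as byte_view:
            size = len(byte_view)
            if size == 0:
                return 0
            while not self._decomp.eof:
                if self._decomp.needs_input:
                    rawblock = self._fp.read(self._chunk_size)
                    if not rawblock:
                        raise EOFError("Compressed file ended before the "
                                       "end-of-stream marker was reached")
                else:
                    rawblock = b""  # drain input the decompressor already holds
                # Output limit = caller's buffer size: never overfill it.
                data = self._decomp.decompress(rawblock, size)
                if data:
                    byte_view[:len(data)] = data
                    return len(data)
        return 0  # end of stream


original = b"0123456789" * 10000
reader = io.BufferedReader(BoundedBZ2Reader(io.BytesIO(bz2.compress(original))))
head = reader.read(10)
rest = reader.read()
print(head, len(rest))   # b'0123456789' 99990
```

The raw layer's only job is to honour the size it is given. Buffering, `readline()` and iteration all come from `io.BufferedReader`.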
