

Commit 3c0d0ba

committed
Issue #12319: Support for chunked encoding of HTTP request bodies
When the body object is a file, its size is no longer determined with fstat(), since that can report the wrong result (e.g. reading from a pipe). Instead, determine the size using seek(), or fall back to chunked encoding for unseekable files. Also, change the logic for detecting text files to check for TextIOBase inheritance, rather than inspecting the “mode” attribute, which may not exist (e.g. BytesIO and StringIO). The Content-Length for text files is no longer determined ahead of time, because the original logic could have been wrong depending on the codec and newline translation settings. Patch by Demian Brecht and Rolf Krahl, with a few tweaks by me.
1 parent a790fe7 commit 3c0d0ba
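
The size detection described in the commit message can be sketched roughly as follows. This is only an illustration of the idea, not the code from the patch: `body_length` is a hypothetical helper name, and it assumes a return value of `None` means "length unknown, fall back to chunked encoding".

```python
import io


def body_length(body):
    """Return the number of bytes left in a binary file object, or None.

    Hypothetical helper: None stands for "length unknown", in which case
    a caller would fall back to chunked transfer encoding.
    """
    # Text files are skipped: the encoded size depends on the codec and
    # on newline translation, so it cannot be known ahead of time.
    if isinstance(body, io.TextIOBase):
        return None
    try:
        start = body.tell()               # current read position
        end = body.seek(0, io.SEEK_END)   # seek() returns the new position
        body.seek(start)                  # restore the original position
    except (AttributeError, OSError):
        # No seek()/tell() at all, or an unseekable stream such as a pipe.
        return None
    return end - start
```

Measuring from the current position rather than from the start of the file means a partially consumed file object reports only the bytes that would actually be sent.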

9 files changed

Lines changed: 531 additions & 150 deletions


Doc/library/http.client.rst

Lines changed: 70 additions & 28 deletions
Original file line number | Diff line number | Diff line change
@@ -219,39 +219,62 @@ HTTPConnection Objects
219219
:class:`HTTPConnection` instances have the following methods:
220220

221221

222-
.. method:: HTTPConnection.request(method, url, body=None, headers={})
222+
.. method:: HTTPConnection.request(method, url, body=None, headers={}, *, \
223+
encode_chunked=False)
223224

224225
This will send a request to the server using the HTTP request
225226
method *method* and the selector *url*.
226227

227228
If *body* is specified, the specified data is sent after the headers are
228-
finished. It may be a string, a :term:`bytes-like object`, an open
229-
:term:`file object`, or an iterable of :term:`bytes-like object`\s. If
230-
*body* is a string, it is encoded as ISO-8859-1, the default for HTTP. If
231-
it is a bytes-like object the bytes are sent as is. If it is a :term:`file
232-
object`, the contents of the file is sent; this file object should support
233-
at least the ``read()`` method. If the file object has a ``mode``
234-
attribute, the data returned by the ``read()`` method will be encoded as
235-
ISO-8859-1 unless the ``mode`` attribute contains the substring ``b``,
236-
otherwise the data returned by ``read()`` is sent as is. If *body* is an
237-
iterable, the elements of the iterable are sent as is until the iterable is
238-
exhausted.
239-
240-
The *headers* argument should be a mapping of extra HTTP
241-
headers to send with the request.
242-
243-
If *headers* does not contain a Content-Length item, one is added
244-
automatically if possible. If *body* is ``None``, the Content-Length header
245-
is set to ``0`` for methods that expect a body (``PUT``, ``POST``, and
246-
``PATCH``). If *body* is a string or bytes object, the Content-Length
247-
header is set to its length. If *body* is a :term:`file object` and it
248-
works to call :func:`~os.fstat` on the result of its ``fileno()`` method,
249-
then the Content-Length header is set to the ``st_size`` reported by the
250-
``fstat`` call. Otherwise no Content-Length header is added.
229+
finished. It may be a :class:`str`, a :term:`bytes-like object`, an
230+
open :term:`file object`, or an iterable of :class:`bytes`. If *body*
231+
is a string, it is encoded as ISO-8859-1, the default for HTTP. If it
232+
is a bytes-like object, the bytes are sent as is. If it is a :term:`file
233+
object`, the contents of the file are sent; this file object should
234+
support at least the ``read()`` method. If the file object is an
235+
instance of :class:`io.TextIOBase`, the data returned by the ``read()``
236+
method will be encoded as ISO-8859-1, otherwise the data returned by
237+
``read()`` is sent as is. If *body* is an iterable, the elements of the
238+
iterable are sent as is until the iterable is exhausted.
239+
240+
The *headers* argument should be a mapping of extra HTTP headers to send
241+
with the request.
242+
243+
If *headers* contains neither Content-Length nor Transfer-Encoding, a
244+
Content-Length header will be added automatically if possible. If
245+
*body* is ``None``, the Content-Length header is set to ``0`` for
246+
methods that expect a body (``PUT``, ``POST``, and ``PATCH``). If
247+
*body* is a string or bytes-like object, the Content-Length header is
248+
set to its length. If *body* is a binary :term:`file object`
249+
supporting :meth:`~io.IOBase.seek`, this will be used to determine
250+
its size. Otherwise, the Content-Length header is not added
251+
automatically. In cases where determining the Content-Length up
252+
front is not possible, the body will be chunk-encoded and the
253+
Transfer-Encoding header will automatically be set.
254+
255+
The *encode_chunked* argument is only relevant if Transfer-Encoding is
256+
specified in *headers*. If *encode_chunked* is ``False``, the
257+
HTTPConnection object assumes that all encoding is handled by the
258+
calling code. If it is ``True``, the body will be chunk-encoded.
259+
260+
.. note::
261+
Chunked transfer encoding was introduced in HTTP protocol
262+
version 1.1. Unless the HTTP server is known to handle HTTP 1.1,
263+
the caller must either specify the Content-Length or must use a
264+
body representation whose length can be determined automatically.
251265

252266
.. versionadded:: 3.2
253267
*body* can now be an iterable.
254268

269+
.. versionchanged:: 3.6
270+
If neither Content-Length nor Transfer-Encoding is set in
271+
*headers* and Content-Length cannot be determined, *body* will now
272+
be automatically chunk-encoded. The *encode_chunked* argument
273+
was added.
274+
The Content-Length for binary file objects is determined with seek.
275+
No attempt is made to determine the Content-Length for text file
276+
objects.
277+
255278
.. method:: HTTPConnection.getresponse()
256279

257280
Should be called after a request is sent to get the response from the server.
@@ -336,13 +359,32 @@ also send your request step by step, by using the four functions below.
336359
an argument.
337360

338361

339-
.. method:: HTTPConnection.endheaders(message_body=None)
362+
.. method:: HTTPConnection.endheaders(message_body=None, *, encode_chunked=False)
340363

341364
Send a blank line to the server, signalling the end of the headers. The
342365
optional *message_body* argument can be used to pass a message body
343-
associated with the request. The message body will be sent in the same
344-
packet as the message headers if it is string, otherwise it is sent in a
345-
separate packet.
366+
associated with the request.
367+
368+
If *encode_chunked* is ``True``, the result of each iteration of
369+
*message_body* will be chunk-encoded as specified in :rfc:`7230`,
370+
Section 3.3.1. How the data is encoded is dependent on the type of
371+
*message_body*. If *message_body* implements the :ref:`buffer interface
372+
<bufferobjects>` the encoding will result in a single chunk.
373+
If *message_body* is a :class:`collections.Iterable`, each iteration
374+
of *message_body* will result in a chunk. If *message_body* is a
375+
:term:`file object`, each call to ``.read()`` will result in a chunk.
376+
The method automatically signals the end of the chunk-encoded data
377+
immediately after *message_body*.
378+
379+
.. note:: Due to the chunked encoding specification, empty chunks
380+
yielded by an iterator body will be ignored by the chunk-encoder.
381+
This is to avoid premature termination of the read of the request by
382+
the target server due to malformed encoding.
383+
384+
.. versionadded:: 3.6
385+
Chunked encoding support. The *encode_chunked* parameter was
386+
added.
387+
346388

347389
.. method:: HTTPConnection.send(data)
348390
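
As a rough illustration of the framing that the `endheaders()` text above describes, here is a minimal sketch of a chunk encoder for an iterable of bytes. `chunk_encode` is a hypothetical name chosen for this example and is not part of `http.client`. Each non-empty item becomes one chunk, empty items are skipped, and a zero-length chunk marks the end of the body.

```python
def chunk_encode(chunks):
    """Yield chunked-transfer-encoded frames for an iterable of bytes.

    Hypothetical helper for illustration; not the http.client code.
    """
    for chunk in chunks:
        if not chunk:
            # An empty chunk would look like the terminating chunk and
            # end the body early, so it is skipped.
            continue
        # Frame layout: hexadecimal size, CRLF, payload, CRLF.
        yield b"%X\r\n%b\r\n" % (len(chunk), chunk)
    # A zero-length chunk followed by a blank line ends the body.
    yield b"0\r\n\r\n"
```

Joining the frames for `[b"hello", b"", b"world!"]` gives `b"5\r\nhello\r\n6\r\nworld!\r\n0\r\n\r\n"`. The empty item in the middle leaves no trace in the output.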

Doc/library/urllib.request.rst

Lines changed: 37 additions & 23 deletions
@@ -30,18 +30,9 @@ The :mod:`urllib.request` module defines the following functions:
3030
Open the URL *url*, which can be either a string or a
3131
:class:`Request` object.
3232

33-
*data* must be a bytes object specifying additional data to be sent to the
34-
server, or ``None`` if no such data is needed. *data* may also be an
35-
iterable object and in that case Content-Length value must be specified in
36-
the headers. Currently HTTP requests are the only ones that use *data*; the
37-
HTTP request will be a POST instead of a GET when the *data* parameter is
38-
provided.
39-
40-
*data* should be a buffer in the standard
41-
:mimetype:`application/x-www-form-urlencoded` format. The
42-
:func:`urllib.parse.urlencode` function takes a mapping or sequence of
43-
2-tuples and returns an ASCII text string in this format. It should
44-
be encoded to bytes before being used as the *data* parameter.
33+
*data* must be an object specifying additional data to be sent to the
34+
server, or ``None`` if no such data is needed. See :class:`Request`
35+
for details.
4536

4637
urllib.request module uses HTTP/1.1 and includes ``Connection:close`` header
4738
in its HTTP requests.
@@ -192,14 +183,22 @@ The following classes are provided:
192183

193184
*url* should be a string containing a valid URL.
194185

195-
*data* must be a bytes object specifying additional data to send to the
196-
server, or ``None`` if no such data is needed. Currently HTTP requests are
197-
the only ones that use *data*; the HTTP request will be a POST instead of a
198-
GET when the *data* parameter is provided. *data* should be a buffer in the
199-
standard :mimetype:`application/x-www-form-urlencoded` format.
200-
The :func:`urllib.parse.urlencode` function takes a mapping or sequence of
201-
2-tuples and returns an ASCII string in this format. It should be
202-
encoded to bytes before being used as the *data* parameter.
186+
*data* must be an object specifying additional data to send to the
187+
server, or ``None`` if no such data is needed. Currently HTTP
188+
requests are the only ones that use *data*. The supported object
189+
types include bytes, file-like objects, and iterables. If no
190+
``Content-Length`` header has been provided, :class:`HTTPHandler` will
191+
try to determine the length of *data* and set this header accordingly.
192+
If this fails, ``Transfer-Encoding: chunked`` as specified in
193+
:rfc:`7230`, Section 3.3.1 will be used to send the data. See
194+
:meth:`http.client.HTTPConnection.request` for details on the
195+
supported object types and on how the content length is determined.
196+
197+
For an HTTP POST request method, *data* should be a buffer in the
198+
standard :mimetype:`application/x-www-form-urlencoded` format. The
199+
:func:`urllib.parse.urlencode` function takes a mapping or sequence
200+
of 2-tuples and returns an ASCII string in this format. It should
201+
be encoded to bytes before being used as the *data* parameter.
203202

204203
*headers* should be a dictionary, and will be treated as if
205204
:meth:`add_header` was called with each key and value as arguments.
@@ -211,8 +210,10 @@ The following classes are provided:
211210
:mod:`urllib`'s default user agent string is
212211
``"Python-urllib/2.6"`` (on Python 2.6).
213212

214-
An example of using ``Content-Type`` header with *data* argument would be
215-
sending a dictionary like ``{"Content-Type": "application/x-www-form-urlencoded"}``.
213+
An appropriate ``Content-Type`` header should be included if the *data*
214+
argument is present. If this header has not been provided and *data*
215+
is not None, ``Content-Type: application/x-www-form-urlencoded`` will
216+
be added as a default.
216217

217218
The final two arguments are only of interest for correct handling
218219
of third-party HTTP cookies:
@@ -235,15 +236,28 @@ The following classes are provided:
235236
*method* should be a string that indicates the HTTP request method that
236237
will be used (e.g. ``'HEAD'``). If provided, its value is stored in the
237238
:attr:`~Request.method` attribute and is used by :meth:`get_method()`.
238-
Subclasses may indicate a default method by setting the
239+
The default is ``'GET'`` if *data* is ``None`` or ``'POST'`` otherwise.
240+
Subclasses may indicate a different default method by setting the
239241
:attr:`~Request.method` attribute in the class itself.
240242

243+
.. note::
244+
The request will not work as expected if the data object is unable
245+
to deliver its content more than once (e.g. a file or an iterable
246+
that can produce the content only once) and the request is retried
247+
for HTTP redirects or authentication. The *data* is sent to the
248+
HTTP server right away after the headers. There is no support for
249+
a 100-continue expectation in the library.
250+
241251
.. versionchanged:: 3.3
242252
:attr:`Request.method` argument is added to the Request class.
243253

244254
.. versionchanged:: 3.4
245255
Default :attr:`Request.method` may be indicated at the class level.
246256

257+
.. versionchanged:: 3.6
258+
Do not raise an error if the ``Content-Length`` has not been
259+
provided and could not be determined. Fall back to use chunked
260+
transfer encoding instead.
247261

248262
.. class:: OpenerDirector()
249263
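
A small sketch of how a `Request` with an iterable body might be built under the behaviour documented above. The URL and the `make_upload_request` helper are assumptions made for this example. Nothing is sent over the network here; the example only shows the default request method.

```python
import urllib.request


def make_upload_request(url, parts):
    """Build a Request whose body is a one-shot iterable of bytes.

    Hypothetical helper for illustration. Because the iterator can be
    consumed only once, the request cannot be replayed if it is retried
    for a redirect or for authentication.
    """
    return urllib.request.Request(url, data=iter(parts))


# example.com is a placeholder host; no connection is opened here.
req = make_upload_request("http://example.com/upload", [b"first ", b"second"])

# With data present and no explicit method, the default method is POST.
print(req.get_method())
```

Passing such a request to `urllib.request.urlopen()` would, per the text above, send the body with chunked transfer encoding when no `Content-Length` header is supplied and the length cannot be determined.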

Doc/whatsnew/3.6.rst

Lines changed: 19 additions & 0 deletions
@@ -324,6 +324,15 @@ exceptions: see :func:`faulthandler.enable`. (Contributed by Victor Stinner in
324324
:issue:`23848`.)
325325

326326

327+
http.client
328+
-----------
329+
330+
:meth:`HTTPConnection.request() <http.client.HTTPConnection.request>` and
331+
:meth:`~http.client.HTTPConnection.endheaders` both now support
332+
chunked encoding request bodies.
333+
(Contributed by Demian Brecht and Rolf Krahl in :issue:`12319`.)
334+
335+
327336
idlelib and IDLE
328337
----------------
329338

@@ -500,6 +509,16 @@ The :class:`~unittest.mock.Mock` class has the following improvements:
500509
(Contributed by Amit Saha in :issue:`26323`.)
501510

502511

512+
urllib.request
513+
--------------
514+
515+
If an HTTP request has a non-empty body but no Content-Length header
516+
and the content length cannot be determined up front, rather than
517+
throwing an error, :class:`~urllib.request.AbstractHTTPHandler` now
518+
falls back to chunked transfer encoding.
519+
(Contributed by Demian Brecht and Rolf Krahl in :issue:`12319`.)
520+
521+
503522
urllib.robotparser
504523
------------------
505524
