Commit a510652

Add urlparse and urlsplit security warnings.
The added section describing the situation is longer than I might want, but being more brief just leaves open questions. This is a lighter worded version of my original text proposed in https://discuss.python.org/t/how-to-word-a-warning-about-security-uses-in-urllib-parse-docs/26399
1 parent 0f7f9ea commit a510652

1 file changed

Lines changed: 38 additions & 0 deletions

Doc/library/urllib.parse.rst

@@ -159,6 +159,10 @@ or on combining URL components into a URL string.
    ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                params='', query='', fragment='')
 
+   .. warning::
+
+      The :func:`urlparse` API does not perform validation. See :ref:`URL
+      parsing security <url-parsing-security>` for details.
 
    .. versionchanged:: 3.2
       Added IPv6 URL parsing capabilities.
@@ -328,6 +332,11 @@ or on combining URL components into a URL string.
    control and space characters are stripped from the URL. ``\n``,
    ``\r`` and tab ``\t`` characters are removed from the URL at any position.
 
+   .. warning::
+
+      The :func:`urlsplit` API does not perform validation. See :ref:`URL
+      parsing security <url-parsing-security>` for details.
+
    .. versionchanged:: 3.6
       Out-of-range port numbers now raise :exc:`ValueError`, instead of
       returning :const:`None`.
@@ -418,6 +427,35 @@ or on combining URL components into a URL string.
    or ``scheme://host/path``). If *url* is not a wrapped URL, it is returned
    without changes.
 
+.. _url-parsing-security:
+
+URL parsing security
+--------------------
+
+The :func:`urlsplit` and :func:`urlparse` APIs do not perform **validation**
+of inputs. They may not raise errors on inputs that other applications
+consider invalid. They may also accept inputs that would not be considered
+URLs elsewhere, passing them through as unusually split components. Their
+purpose is practical functionality rather than purity.
+
+Instead of raising an exception on unusual input, they may return some
+components as empty strings (``""``), or a component may contain more than
+it perhaps should.
+
+We recommend that users of these APIs code defensively wherever the values
+may be used with security implications. Do some verification within your
+code before trusting a returned component part. Does that ``scheme`` make
+sense? Is that a sensible ``path``? Is there anything strange about that
+``hostname``? etc.
+
+What constitutes a URL is not universally well defined. Different
+applications have different needs and desired constraints. For instance, the
+living `WHATWG spec`_ describes what user-facing web clients such as a web
+browser require, while :rfc:`3986` is more general. These functions
+incorporate some aspects of both, but cannot be claimed compliant with
+either. The APIs, and existing code that depends on their behavior, predate
+both standards, and we attempt to maintain backwards compatibility.
+
 .. _parsing-ascii-encoded-bytes:
 
 Parsing ASCII Encoded Bytes
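
The defensive checks recommended by the new "URL parsing security" section could look roughly like the sketch below. It is not part of this commit. The helper name ``is_trusted_url`` and the allow-lists ``ALLOWED_SCHEMES``, ``ALLOWED_HOSTS`` and ``ALLOWED_PORTS`` are hypothetical, chosen only for illustration; a real application would substitute its own policy.

```python
from urllib.parse import urlsplit

# Hypothetical policy for this sketch; adjust to the application's needs.
ALLOWED_SCHEMES = {"http", "https"}
ALLOWED_HOSTS = {"example.com", "www.example.com"}
ALLOWED_PORTS = {None, 80, 443}  # None means no explicit port in the URL


def is_trusted_url(url: str) -> bool:
    """Return True only if *url* passes every check in this sketch."""
    try:
        parts = urlsplit(url)
        # Accessing .port raises ValueError for a non-numeric or
        # out-of-range port, so it belongs inside the try block.
        port = parts.port
    except ValueError:
        return False

    # Does the scheme make sense? An empty scheme is rejected here too.
    if parts.scheme not in ALLOWED_SCHEMES:
        return False

    # hostname is None when there is no network location, and is
    # lowercased otherwise, so a plain set lookup is enough.
    if parts.hostname not in ALLOWED_HOSTS:
        return False

    # Reject embedded credentials such as "user@host".
    if parts.username is not None or parts.password is not None:
        return False

    return port in ALLOWED_PORTS


if __name__ == "__main__":
    samples = [
        "https://example.com/path",
        "example.com/path",                # no scheme, so netloc is empty
        "javascript:alert(1)",             # parses without error
        "https://example.com@evil.test/",  # the real host is evil.test
        "https://example.com:99999/",      # out-of-range port
    ]
    for sample in samples:
        print(f"{sample!r}: trusted={is_trusted_url(sample)}")
```

Note that ``urlsplit("example.com/path")`` does not raise: it returns an empty ``scheme`` and ``netloc`` and places the whole string in ``path``, which is why the sketch checks each component explicitly instead of relying on an exception.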
