@@ -159,6 +159,10 @@ or on combining URL components into a URL string.
159159 ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
160160 params='', query='', fragment='')
161161
162+ .. warning::
163+
164+ The :func:`urlparse` API does not perform validation. See :ref:`URL
165+ parsing security <url-parsing-security>` for details.
162166
163167 .. versionchanged:: 3.2
164168 Added IPv6 URL parsing capabilities.
@@ -328,6 +332,11 @@ or on combining URL components into a URL string.
328332 control and space characters are stripped from the URL. ``\n``,
329333 ``\r`` and tab ``\t`` characters are removed from the URL at any position.
330334
335+ .. warning::
336+
337+ The :func:`urlsplit` API does not perform validation. See :ref:`URL
338+ parsing security <url-parsing-security>` for details.
339+
331340 .. versionchanged:: 3.6
332341 Out-of-range port numbers now raise :exc:`ValueError`, instead of
333342 returning :const:`None`.
@@ -418,6 +427,35 @@ or on combining URL components into a URL string.
418427 or ``scheme://host/path``). If *url* is not a wrapped URL, it is returned
419428 without changes.
420429
430+ .. _url-parsing-security:
431+
432+ URL parsing security
433+ --------------------
434+
435+ The :func:`urlsplit` and :func:`urlparse` APIs do not perform **validation**
436+ of inputs. They may not raise errors on inputs that other applications
437+ consider invalid. They may also accept inputs that would not be considered
438+ URLs elsewhere, returning them as unusually split component parts. Their
439+ purpose is practical functionality rather than purity.
440+
441+ Instead of raising an exception on unusual input, they may return
442+ some components as empty strings (``""``), or a component may contain more
443+ than it should.
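
As a brief illustration of the empty-component case, the sketch below splits two unusual inputs. The example URLs are chosen here for illustration only; neither call raises an exception.

```python
from urllib.parse import urlsplit

# Without "//", nothing is treated as a network location, so the
# whole input ends up in ``path`` and ``netloc`` is empty.
no_scheme = urlsplit("www.example.com/path")
print(repr(no_scheme.scheme), repr(no_scheme.netloc), repr(no_scheme.path))

# An empty authority is accepted rather than rejected; ``netloc`` is
# the empty string and ``hostname`` is None.
empty_host = urlsplit("http:///path")
print(repr(empty_host.netloc), empty_host.hostname, repr(empty_host.path))
```

Callers that expect a host should therefore check ``netloc`` or ``hostname`` explicitly instead of relying on an exception.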
444+
445+ We recommend that users of these APIs code defensively wherever the values
446+ may be used with security implications. Do some verification within
447+ your code before trusting a returned component part. Does that ``scheme``
448+ make sense? Is that a sensible ``path``? Is there anything strange about
449+ that ``hostname``? And so on.
450+
451+ What constitutes a URL is not universally well defined. Different
452+ applications have different needs and desired constraints. For instance, the
453+ living `WHATWG spec`_ describes what user-facing web clients such as a web
454+ browser require, while :rfc:`3986` is more general. These functions
455+ incorporate some aspects of both, but cannot be claimed compliant with
456+ either. Our APIs, and the code that depends on their behavior, predate both
457+ standards, so we attempt to maintain backwards compatibility.
458+
421459 .. _parsing-ascii-encoded-bytes:
422460
423461 Parsing ASCII Encoded Bytes