A Python implementation of RFC 3986 including validation and authority parsing. Coming soon: Reference Resolution.
Simply use pip to install rfc3986 like so:
pip install rfc3986
To parse a URI into a convenient named tuple, call uri_reference:
from rfc3986 import uri_reference
example = uri_reference('http://example.com')
email = uri_reference('mailto:user@domain.com')
ssh = uri_reference('ssh://user@git.openstack.org:29418/openstack/keystone.git')
With a parsed URI you can access data about the components:
print(example.scheme)  # => http
print(email.path)  # => user@domain.com
print(ssh.userinfo)  # => user
print(ssh.host)  # => git.openstack.org
print(ssh.port)  # => 29418
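To make the component names above concrete, here is a minimal sketch of how a URI reference can be split using the regular expression given in RFC 3986, Appendix B. It relies only on the standard library; the names `split_uri` and `Components` are hypothetical and chosen for illustration, and this is not necessarily how rfc3986 itself is implemented.

```python
import re
from collections import namedtuple

# Regular expression taken from RFC 3986, Appendix B.
URI_PATTERN = re.compile(
    r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
)

# Hypothetical container for the five generic components.
Components = namedtuple('Components', 'scheme authority path query fragment')


def split_uri(uri):
    """Split a URI reference into its five generic components.

    Components that are absent are returned as None; the path is
    always a string (possibly empty).
    """
    match = URI_PATTERN.match(uri)
    return Components(
        scheme=match.group(2),
        authority=match.group(4),
        path=match.group(5),
        query=match.group(7),
        fragment=match.group(9),
    )


parts = split_uri('ssh://user@git.openstack.org:29418/openstack/keystone.git')
print(parts.scheme)
print(parts.authority)
print(parts.path)
```

Note that this sketch only splits the string; it does not break the authority into userinfo, host, and port, and it performs no validation.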
It can also parse URIs with unicode present:
uni = uri_reference(b'http://httpbin.org/get?utf8=\xe2\x98\x83')  # ☃
print(uni.query)  # utf8=%E2%98%83
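The query shown above is the percent-encoded form of the UTF-8 bytes for the snowman character. As a rough illustration of that encoding step, using only the standard library's `urllib.parse` rather than rfc3986 itself:

```python
from urllib.parse import quote, unquote

# UTF-8 encoding of U+2603 (SNOWMAN), the same bytes used in the query above.
snowman_bytes = b'\xe2\x98\x83'

# Each byte outside the unreserved set becomes %XX with uppercase hex digits.
encoded = quote(snowman_bytes)
print(encoded)

# Decoding reverses the process and interprets the bytes as UTF-8.
decoded = unquote(encoded)
print(decoded == '\u2603')
```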
With a parsed URI you can also validate it:
import subprocess

if ssh.is_valid():
    subprocess.call(['git', 'clone', ssh.unsplit()])
You can also take a parsed URI and normalize it:
mangled = uri_reference('hTTp://exAMPLe.COM')
print(mangled.scheme)  # => hTTp
print(mangled.authority)  # => exAMPLe.COM
normal = mangled.normalize()
print(normal.scheme)  # => http
print(normal.authority)  # => example.com
But these two URIs are (functionally) equivalent:
import webbrowser

if normal == mangled:
    webbrowser.open(normal.unsplit())
Your paths, queries, and fragments are safe with us though:
mangled = uri_reference('hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth')
normal = mangled.normalize()
assert normal == 'hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth'
assert normal == 'http://example.com/Some/reallY/biZZare/pAth'
assert normal != 'http://example.com/some/really/bizzare/path'
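The behaviour above follows from RFC 3986, which treats the scheme and host as case-insensitive while the path, query, and fragment are case-sensitive. A minimal standard-library sketch of that case normalization, assuming a hypothetical helper named `normalize_case` (this is an illustration only, not the library's implementation, and it skips other normalization steps such as percent-encoding case and dot-segment removal):

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_case(uri):
    """Lowercase the scheme and host of a URI, leaving the rest untouched."""
    parts = urlsplit(uri)
    # Userinfo is case-sensitive, so only the host[:port] part is lowercased.
    userinfo, at, hostport = parts.netloc.rpartition('@')
    netloc = userinfo + at + hostport.lower()
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )


print(normalize_case('hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth'))
```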
If you do not actually need a real reference object and just want to normalize your URI:
from rfc3986 import normalize_uri
assert (normalize_uri('hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth') ==
        'http://example.com/Some/reallY/biZZare/pAth')
You can also very simply validate a URI:
from rfc3986 import is_valid_uri
assert is_valid_uri('hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth')
You can validate that a particular string is a valid URI and also require that specific components be present:
from rfc3986 import is_valid_uri
assert is_valid_uri('http://localhost:8774/v2/resource',
                    require_scheme=True,
                    require_authority=True,
                    require_path=True)
# Assert that a mailto URI is invalid if you require an authority
# component
assert is_valid_uri('mailto:user@domain.com', require_authority=True) is False
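To illustrate why a mailto URI fails when an authority is required, here is a rough sketch of a component-presence check built on the standard library's `urlsplit`. The function name `has_required_components` is hypothetical, and the check only tests whether each component is non-empty; it does not validate the syntax of the components the way rfc3986 does.

```python
from urllib.parse import urlsplit


def has_required_components(uri, require_scheme=False,
                            require_authority=False, require_path=False):
    """Return True if every required component is present and non-empty."""
    parts = urlsplit(uri)
    checks = [
        (require_scheme, parts.scheme),
        (require_authority, parts.netloc),
        (require_path, parts.path),
    ]
    return all(bool(value) for required, value in checks if required)


# A mailto URI has a scheme and a path, but no "//authority" part.
print(has_required_components('mailto:user@example.com',
                              require_authority=True))
```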
If you have an instance of a URIReference, you can pass the same arguments to URIReference#is_valid, e.g.,
from rfc3986 import uri_reference
http = uri_reference('http://localhost:8774/v2/resource')
assert http.is_valid(require_scheme=True,
                     require_authority=True,
                     require_path=True)
# Assert that a mailto URI is invalid if you require an authority
# component
mailto = uri_reference('mailto:user@domain.com')
assert mailto.is_valid(require_authority=True) is False
-
This is a direct competitor to this library, with extra features, licensed under the GPL.
-
This can parse URIs in the manner of RFC 3986 but provides no validation and only recently added Python 3 support.
Standard library's urlparse/urllib.parse
The functions in these libraries can only split a URI (valid or not) and provide no validation.
This project follows and enforces the Python Software Foundation's Code of Conduct.
If you would like to contribute but do not have a bug or feature in mind, feel free to email Ian and find out how you can help.