A pure Python implementation of the WHATWG URL Standard that achieves 100% conformance with the Web Platform Tests.
This parser follows the living standard at https://url.spec.whatwg.org/ and correctly handles all 869 URL parsing test cases and all 87 ToASCII domain encoding tests from the official WPT suite.
WPT URL Parsing: 869/869 (100.0%)
ToASCII Domains: 87/87 (100.0%)
pip install -r requirements.txt
The only dependency is idna for internationalized domain name processing.
from url_parser import parse_url
url = parse_url("https://user:[email protected]:8080/path?query#fragment")
print(url.href) # https://user:[email protected]:8080/path?query#fragment
print(url.hostname_str) # example.com
print(url.pathname) # /path
# Relative URL resolution
url = parse_url("/new/path", base="https://example.com/old/path")
print(url.href) # https://example.com/new/path
The parse_url function returns a URLRecord containing:
scheme     URL scheme without colon (e.g., "https")
username   Username component
password   Password component
host       Domain string, IPv4Address, or IPv6Address
port       Port number, or None for default ports
path       List of path segments, or a string for opaque paths
query      Query string without leading ?
fragment   Fragment without leading #
Formatted output properties:
href           Complete serialized URL
origin         URL origin for CORS
protocol       Scheme with colon (e.g., "https:")
host_str       Host:port (port omitted if default)
hostname_str   Host without port
port_str       Port as string, or empty
pathname       Serialized path with leading /
search         Query with leading ?, or empty
hash           Fragment with leading #, or empty
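As a rough illustration of how the formatted properties relate to the record fields, the sketch below derives host_str, pathname, and search from raw components. The function names and the DEFAULT_PORTS table are hypothetical helpers written for this example (the port values are the ones the URL Standard assigns to special schemes); they are not taken from url_parser itself, whose internals may differ.

```python
# Hypothetical helpers for illustration only; not the url_parser API.
# Default ports for special schemes as listed in the WHATWG URL Standard.
DEFAULT_PORTS = {"ftp": 21, "file": None, "http": 80, "https": 443, "ws": 80, "wss": 443}


def normalize_port(scheme, port):
    """Return None when the port is the scheme's default, else the port."""
    return None if DEFAULT_PORTS.get(scheme) == port else port


def host_str(scheme, hostname, port):
    """Serialize host and port, omitting a default port."""
    port = normalize_port(scheme, port)
    return hostname if port is None else f"{hostname}:{port}"


def pathname(path):
    """Serialize a path: opaque paths are strings, others are segment lists."""
    if isinstance(path, str):
        return path
    return "".join("/" + segment for segment in path)


def search(query):
    """Return the query with a leading '?', or '' when absent or empty."""
    return f"?{query}" if query else ""


print(host_str("https", "example.com", 443))
print(host_str("https", "example.com", 8080))
print(pathname(["new", "path"]))
print(search("query"))
```

Under these assumptions a default port disappears from the serialized host, which matches the "port omitted if default" behavior described for host_str.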
The WHATWG URL Standard defines complex parsing behavior that browsers implement. This parser handles:
Special schemes       http, https, ftp, ws, wss, file with default ports
IPv4 addresses        Decimal, octal (0-prefix), and hex (0x-prefix)
IPv6 addresses        Full form, compressed (::), and embedded IPv4
Domain encoding       IDNA2008 with UTS46 compatibility processing
Percent encoding      Context-specific encode sets per spec
Windows paths         Drive letter normalization in file URLs
Opaque paths          Non-hierarchical URLs like mailto: and javascript:
Relative resolution   Full base URL resolution algorithm
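To make the IPv4 item above concrete, here is a minimal sketch of the number and address parsing steps as the URL Standard describes them: each dot-separated part may be decimal, octal, or hex, and the last part fills all remaining bytes. The function names are made up for this example and are not the functions in host.py. Unlike the spec, which first checks whether the host "ends in a number", this sketch simply raises ValueError on any invalid part.

```python
# Illustrative sketch only; names are hypothetical, not the host.py API.
def parse_ipv4_number(text):
    """Parse one dot-separated part as decimal, octal (0...) or hex (0x...)."""
    if text == "":
        raise ValueError("empty IPv4 part")
    radix, digits = 10, "0123456789"
    if len(text) >= 2 and text[:2] in ("0x", "0X"):
        text, radix, digits = text[2:], 16, "0123456789abcdefABCDEF"
    elif len(text) >= 2 and text[0] == "0":
        text, radix, digits = text[1:], 8, "01234567"
    if text == "":
        return 0  # a bare "0x" counts as zero
    if any(ch not in digits for ch in text):
        raise ValueError(f"invalid digit for radix {radix}")
    return int(text, radix)


def parse_ipv4(host):
    """Return the address as a 32-bit integer, or raise ValueError."""
    parts = host.split(".")
    if parts[-1] == "" and len(parts) > 1:
        parts.pop()  # a single trailing dot is tolerated
    if len(parts) > 4:
        raise ValueError("too many parts")
    numbers = [parse_ipv4_number(part) for part in parts]
    if any(n > 255 for n in numbers[:-1]):
        raise ValueError("part out of range")
    if numbers[-1] >= 256 ** (5 - len(numbers)):
        raise ValueError("last part out of range")
    value = numbers[-1]  # the last part fills the remaining low bytes
    for index, n in enumerate(numbers[:-1]):
        value += n * 256 ** (3 - index)
    return value


def serialize_ipv4(value):
    """Format a 32-bit integer as dotted decimal."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(serialize_ipv4(parse_ipv4("0xC0.0250.1")))
```

With these rules, "0xC0.0250.1" and "0x7f000001" normalize to "192.168.0.1" and "127.0.0.1" respectively, while "08" is rejected because 8 is not an octal digit.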
Verify conformance against the Web Platform Tests:
python wpt_runner.py
python toascii_runner.py
Results are saved to the results/ directory.
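For readers curious what such a conformance check involves, the following is a hedged sketch of comparing a parser against entries shaped like the WPT urltestdata.json file (string entries are comments; objects carry "input", "base", and either "failure" or the expected properties). It is not the code in wpt_runner.py. The FIELD_MAP names come from the property list in this README, but the assumption that a failed parse raises ValueError is purely illustrative, and fake_parse is a stand-in so the example runs on its own.

```python
from types import SimpleNamespace

# WPT expectation key -> property name listed in this README.
FIELD_MAP = {
    "href": "href",
    "protocol": "protocol",
    "username": "username",
    "password": "password",
    "host": "host_str",
    "hostname": "hostname_str",
    "port": "port_str",
    "pathname": "pathname",
    "search": "search",
    "hash": "hash",
}


def run_cases(cases, parse):
    """Return (passed, total) for a list of WPT-style test entries."""
    passed = total = 0
    for case in cases:
        if isinstance(case, str):  # comment entries in the data file
            continue
        total += 1
        try:
            url = parse(case["input"], base=case.get("base"))
        except ValueError:  # assumption: parse failure raises ValueError
            url = None
        if case.get("failure"):
            ok = url is None
        else:
            ok = url is not None and all(
                getattr(url, attr) == case[key]
                for key, attr in FIELD_MAP.items()
                if key in case
            )
        passed += ok
    return passed, total


def fake_parse(text, base=None):
    """Stand-in parser used only to exercise run_cases."""
    if text == "bad":
        raise ValueError("parse failure")
    return SimpleNamespace(href=text, protocol="https:")


cases = [
    "a comment entry",
    {"input": "https://a/", "base": None, "href": "https://a/", "protocol": "https:"},
    {"input": "bad", "base": None, "failure": True},
]
passed, total = run_cases(cases, fake_parse)
print(f"{passed}/{total} ({100.0 * passed / total:.1f}%)")
```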
url_parser/
    __init__.py       Package exports
    parser.py         State machine with 21 parsing states
    url_record.py     URL record data structure
    host.py           Host parsing, IPv4/IPv6, domain-to-ASCII
    encoding.py       Percent-encoding utilities
wpt_runner.py         WPT conformance test runner
toascii_runner.py     ToASCII test runner
wpt/                  Web Platform Tests
results/              Test output
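The 21 parsing states mentioned for parser.py correspond to the states named in the URL Standard. The sketch below lists them as an Enum and walks the first two (scheme start and scheme) to show the state-machine style. The identifiers State and scan_scheme are invented for this example; the actual names and structure in parser.py may differ.

```python
from enum import Enum, auto


class State(Enum):
    """The 21 parser states named in the WHATWG URL Standard."""
    SCHEME_START = auto()
    SCHEME = auto()
    NO_SCHEME = auto()
    SPECIAL_RELATIVE_OR_AUTHORITY = auto()
    PATH_OR_AUTHORITY = auto()
    RELATIVE = auto()
    RELATIVE_SLASH = auto()
    SPECIAL_AUTHORITY_SLASHES = auto()
    SPECIAL_AUTHORITY_IGNORE_SLASHES = auto()
    AUTHORITY = auto()
    HOST = auto()
    HOSTNAME = auto()
    PORT = auto()
    FILE = auto()
    FILE_SLASH = auto()
    FILE_HOST = auto()
    PATH_START = auto()
    PATH = auto()
    OPAQUE_PATH = auto()
    QUERY = auto()
    FRAGMENT = auto()


def scan_scheme(text):
    """Return the lowercased scheme, or None if the input has no scheme.

    Simplified: only the first two states are handled, and a missing
    scheme returns None instead of switching to the no-scheme state.
    """
    state = State.SCHEME_START
    buffer = ""
    for ch in text:
        if state is State.SCHEME_START:
            if ch.isascii() and ch.isalpha():
                buffer += ch.lower()
                state = State.SCHEME
            else:
                return None
        elif state is State.SCHEME:
            if ch.isascii() and (ch.isalnum() or ch in "+-."):
                buffer += ch.lower()
            elif ch == ":":
                return buffer
            else:
                return None
    return None


print(len(State))
print(scan_scheme("HTTPS://example.com/"))
```

A full parser would continue from the scheme state into the authority, host, path, query, and fragment states rather than returning early.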