A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine
Updated Nov 21, 2025 - Python
Summarize web archive capture index (CDX) files.
Python tools to retrieve text from Common Crawl WARC files based on a CDX index.
Enables Mac roundtrip editing for ChemDraw scheme-containing PowerPoints made in Windows
View CDX and WARC files, caching them locally as needed
The solution to extend the deadline for the virtual machines on CDX.