A small async web crawler that:
- Crawls pages on a single domain
- Respects `robots.txt` (best-effort)
- Avoids re-crawling pages via URL normalization + canonicalization (see the sketch after this list)
- Limits concurrency and total pages crawled (`max_concurrency`, `max_pages`)
- Exports results to:
  - `report.csv` (page content + internal/external link stats)
  - `site.dot` (Graphviz DOT graph of internal links)
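
To make the dedup and politeness bullets concrete, here is a minimal sketch of that logic. This is not the actual `main.py`: it assumes `aiohttp` for fetching, and every function name below is illustrative.

```python
# Illustrative sketch only -- aiohttp and all names here are assumptions,
# not the project's real implementation.
import asyncio
import urllib.robotparser
from urllib.parse import urlsplit, urlunsplit

import aiohttp


def normalize(url: str) -> str:
    """Canonicalize a URL so trivially different spellings dedup to the
    same key: lowercase scheme/host, drop the fragment, collapse a
    trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


async def load_robots(session: aiohttp.ClientSession, root: str):
    """Fetch and parse robots.txt best-effort: any failure leaves the
    parser permissive so the crawl can proceed."""
    rp = urllib.robotparser.RobotFileParser()
    lines: list[str] = []
    try:
        async with session.get(f"{root}/robots.txt") as resp:
            if resp.status == 200:
                lines = (await resp.text()).splitlines()
    except (aiohttp.ClientError, asyncio.TimeoutError):
        pass  # robots.txt unreachable -> fall through to allow-all
    rp.parse(lines)  # parsing no rules leaves can_fetch() returning True
    return rp


async def fetch(session: aiohttp.ClientSession, sem: asyncio.Semaphore, url: str) -> str:
    # The semaphore, created as asyncio.Semaphore(max_concurrency),
    # caps how many requests are in flight at once.
    async with sem:
        async with session.get(url) as resp:
            return await resp.text()
```

In this sketch, a page would only be enqueued when `rp.can_fetch("*", url)` is true and `normalize(url)` is not already in the visited set; that check is what prevents re-crawling the same page under different spellings.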
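The export step can be pictured the same way. This hypothetical sketch assumes the crawl collects `rows` (one dict per page) and `edges` (internal links as `(source, target)` pairs); the CSV column names are invented for illustration, not the project's real schema.

```python
# Hypothetical export helpers; field names and data shapes are assumptions.
import csv


def write_report(rows: list[dict], path: str = "report.csv") -> None:
    """One row per crawled page: URL, content, and link stats (assumed columns)."""
    fields = ["url", "content", "internal_links", "external_links"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)


def write_dot(edges: set[tuple[str, str]], path: str = "site.dot") -> None:
    """Internal-link graph in Graphviz DOT form: one edge per link."""
    with open(path, "w") as f:
        f.write("digraph site {\n")
        for src, dst in sorted(edges):
            f.write(f'  "{src}" -> "{dst}";\n')
        f.write("}\n")
```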
Requirements:
- Python (managed via `uv`)
- `uv` installed
Setup:

```bash
git clone <YOUR_REPO_URL>
cd web-crawler-bd
uv sync
```
Usage:

```bash
uv run main.py https://example.com
uv run main.py https://example.com 3 10
```
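
A plausible reading of the two optional arguments is that they map, in order, to `max_concurrency` and `max_pages` (matching the order in the feature list above). The sketch below shows that reading; the defaults are invented for illustration.

```python
# Hypothetical sketch of main.py's argument handling; the argument order
# and the defaults are assumptions, not the project's real values.
import sys


def parse_args(argv: list[str]) -> tuple[str, int, int]:
    start_url = argv[1]
    max_concurrency = int(argv[2]) if len(argv) > 2 else 5   # assumed default
    max_pages = int(argv[3]) if len(argv) > 3 else 100       # assumed default
    return start_url, max_concurrency, max_pages


if __name__ == "__main__":
    url, concurrency, pages = parse_args(sys.argv)
    print(f"Crawling {url}: max_concurrency={concurrency}, max_pages={pages}")
```

After a run, `site.dot` can be rendered with Graphviz, e.g. `dot -Tpng site.dot -o site.png`, and `report.csv` opens in any spreadsheet tool.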