archive

A command-line client for the Internet Archive. archive searches millions of items, reads metadata, downloads and verifies files, browses the Wayback Machine, and uploads to your own items. One pure-Go binary, no credentials needed for public data.

Install • Commands • Usage • Credentials

It talks to the public Internet Archive APIs over HTTPS: the Metadata API, the Advanced Search (Solr) endpoint, the CDX Wayback server, and the S3-like IAS3 upload interface. Every request is paced, retried on transient failures, and cached on disk. Reading public data needs no login.

archive is an independent tool. It is not affiliated with or endorsed by the Internet Archive.

Install

go install github.com/tamnd/archive-cli/cmd/archive@latest

Or grab a prebuilt binary or a Linux package (deb/rpm/apk) from the releases, install with Homebrew, or run the container image:

brew install tamnd/tap/archive
docker run --rm ghcr.io/tamnd/archive:latest search 'collection:nasa' -n 5

Shell completion is built in: archive completion bash|zsh|fish|powershell.

Commands

| Command | Does |
| --- | --- |
| `archive search <query>` | search the Solr index; any Lucene query, `--all` for cursor-based export |
| `archive item <identifier>` | a friendly summary of an item |
| `archive metadata <identifier> [subpath]` | the raw Metadata API document, or one field |
| `archive files <identifier>` | files in an item; `--format`, `--glob` to filter |
| `archive download <identifier> [files...]` | download and md5-verify; `--workers`, `-d` dir |
| `archive upload <identifier> <file...>` | upload into an item over IAS3; `--meta` |
| `archive delete <identifier> <file...>` | delete files from an item over IAS3 |
| `archive views <identifier...>` | view statistics for one or more items |
| `archive tasks <identifier>` | catalog and derive task history of an item |
| `archive wayback available <url>` | closest archived snapshot of a URL |
| `archive wayback list <url>` | capture history from the CDX server |
| `archive wayback get <url>` | fetch the content of a snapshot; `--text`, `-t` timestamp |
| `archive wayback save <url>` | trigger a fresh Save Page Now capture |
| `archive open <identifier\|url>` | open the details or Wayback URL in the browser |
| `archive configure` | store IAS3 credentials |
| `archive whoami` | show configured credentials |
| `archive config` | show resolved configuration and data paths |
| `archive cache path\|info\|clear` | inspect or clear the on-disk cache |
| `archive version` | print version information |

Full reference and guides live at archive-cli.tamnd.com.

Usage

archive search 'collection:nasa' -n 10              # find items
archive item nasa                                   # item summary
archive metadata nasa metadata/title                # one metadata field
archive files nasa --format JPEG -o url             # file listing as URLs
archive download nasa --format JPEG -d .            # download and verify
archive views nasa                                  # view statistics
archive wayback get https://example.com -t 2010     # a page as it was in 2010

Records come out as a table (the default on a terminal), JSON, JSONL, CSV, TSV, url, or raw:

archive search 'collection:nasa' --fields identifier,title,downloads -o table
archive search 'collection:nasa' --fields identifier,downloads -o csv
archive search 'collection:nasa' --fields identifier -o raw | xargs -n1 archive item
archive files nasa --format JPEG -o url | head -20
archive wayback list https://archive.org -n 50 -o jsonl | jq .timestamp
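
Wayback timestamps are 14-digit UTC values of the form YYYYMMDDhhmmss, and a snapshot is addressed on the web as `https://web.archive.org/web/<timestamp>/<url>`. As a rough sketch of working with the timestamps from a capture listing, the hypothetical helpers below (not part of archive) turn a timestamp into a replay URL and into an ISO 8601 string.

```shell
# wayback_url TIMESTAMP URL
# Build the public replay URL for one capture.
wayback_url() {
    printf 'https://web.archive.org/web/%s/%s\n' "$1" "$2"
}

# ts_to_iso TIMESTAMP
# Convert a full 14-digit timestamp to ISO 8601 (UTC).
# Input that is not exactly 14 digits is printed unchanged.
ts_to_iso() {
    printf '%s\n' "$1" | sed -E \
        's/^([0-9]{4})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})$/\1-\2-\3T\4:\5:\6Z/'
}
```

For example, `wayback_url 20100101000000 https://example.com` prints a link that can be opened in a browser or passed to another tool.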

Export a large result set with the cursor-based Scraping API:

archive search 'subject:jazz mediatype:audio' --all --fields identifier,title -o jsonl > jazz.jsonl

Global flags

-o, --output    table|json|jsonl|csv|tsv|url|raw   (auto: table on a TTY, jsonl when piped)
    --fields    comma-separated columns to include
    --no-header omit the header row in table/csv/tsv
    --template  Go text/template applied per record
-n, --limit     max records (0 = unlimited)
-q, --quiet     suppress progress output
    --color     auto|always|never
    --rate      min spacing between requests (default 250ms)
    --timeout   per-request timeout (default 2m)
    --retries   retry attempts on 429/5xx (default 5)
-j, --workers   concurrency for downloads (default 8)
    --no-cache  bypass the on-disk cache
    --dry-run   print actions without performing them
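
archive paces and retries its own requests, so scripts normally do not need a retry loop. To show the idea behind --retries, here is a minimal sketch of retrying a command with a growing delay. The doubling backoff and the `retry` helper are assumptions made for illustration; the backoff archive actually uses is not described here, and this wrapper retries on any non-zero exit rather than on 429/5xx responses.

```shell
# retry MAX_TRIES DELAY_SECONDS CMD [ARGS...]
# Run CMD until it succeeds or MAX_TRIES total tries have been made.
# Sleeps DELAY_SECONDS (whole seconds) after a failure and doubles it each time.
retry() {
    max_tries=$1
    delay=$2
    shift 2
    try=1
    while :; do
        "$@" && return 0
        # Give up once the last permitted try has failed.
        [ "$try" -ge "$max_tries" ] && return 1
        sleep "$delay"
        delay=$((delay * 2))
        try=$((try + 1))
    done
}
```

For example, `retry 3 2 archive item nasa` tries the command up to three times, waiting 2 and then 4 seconds between tries.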

Credentials

Reading public data needs no account. To upload, delete, or read a private task queue, get an IAS3 key pair from archive.org/account/s3.php and store it:

archive configure    # prompts for access and secret keys, writes ~/.config/archive/credentials
archive whoami       # verify what is configured

Credentials resolve in order: --access/--secret flags, then ARCHIVE_ACCESS_KEY/ARCHIVE_SECRET_KEY environment variables (or IA_* aliases), then the credentials file.
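
The resolution order above can be sketched as a first-non-empty lookup. This is an illustration of the precedence only, not the tool's code: the alias name `IA_ACCESS_KEY` is an assumption (only the `IA_*` prefix is stated above), as is checking the `ARCHIVE_*` name before the alias. The value read from the credentials file is passed in as an argument because the file format is not described here.

```shell
# resolve_access_key FLAG_VALUE FILE_VALUE
# Print the first non-empty candidate: flag, then environment, then file.
# Returns non-zero when nothing is configured.
resolve_access_key() {
    flag_value=${1:-}
    file_value=${2:-}
    if [ -n "$flag_value" ]; then
        printf '%s\n' "$flag_value"
    elif [ -n "${ARCHIVE_ACCESS_KEY:-}" ]; then
        printf '%s\n' "$ARCHIVE_ACCESS_KEY"
    elif [ -n "${IA_ACCESS_KEY:-}" ]; then   # assumed alias name
        printf '%s\n' "$IA_ACCESS_KEY"
    elif [ -n "$file_value" ]; then
        printf '%s\n' "$file_value"
    else
        return 1
    fi
}
```

The practical consequence is that a flag always wins, which makes one-off overrides safe, and the credentials file acts only as a fallback.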

Exit codes

0  success
1  error
2  usage error
3  no results
4  authentication required or failed
5  not found
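
Distinct exit codes let a script tell an empty result from a real failure. A small sketch of mapping the codes listed above to messages; `explain_exit` is a hypothetical helper for scripts, not a command shipped with archive.

```shell
# explain_exit CODE
# Print the documented meaning of an archive exit code.
explain_exit() {
    case "$1" in
        0) echo "success" ;;
        1) echo "error" ;;
        2) echo "usage error" ;;
        3) echo "no results" ;;
        4) echo "authentication required or failed" ;;
        5) echo "not found" ;;
        *) echo "unknown exit code $1" ;;
    esac
}
```

In a script, run an archive command and then call `explain_exit $?`; branching on code 3 or 5 separately from code 1 avoids treating "nothing matched" as a hard error.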

Development

cmd/archive/    thin main entry point
cli/            cobra commands and output rendering
ia/             HTTP client, API calls, and models
docs/           documentation site (Hugo, tago-doks theme)

make build   # ./bin/archive
make test    # go test ./...
make vet     # go vet ./...
make fmt     # gofmt -w -s .

Requires Go 1.23+.

Releasing

Push a version tag and GitHub Actions runs GoReleaser, which builds archives, Linux packages, a multi-arch GHCR image, checksums, an SBOM, a cosign signature, and Homebrew and Scoop entries:

git tag -a v0.2.0 -m "v0.2.0"
git push --tags

The image tag carries no v prefix (ghcr.io/tamnd/archive:0.2.0).

License

Apache-2.0. See LICENSE.
