Keystone is a web client for the ARCH (Archives Research Compute Hub) job server.
Note that the following features are only available in the hosted version at: https://arch.archive-it.org
- Google Colab integration
- Dataset publication to archive.org
1. Run make build-images
2. Run docker compose up
3. Surf on over to http://localhost:12342
Log in as one of the three user types that dev/entrypoint.py created for you:
- Superuser: username: system, password: password
- Admin: username: admin, password: password
- Normal: username: test, password: password
The build-images Make target will create a local arch-shared subdirectory that will be mounted
within both the running Keystone and ARCH containers to serve as the storage destination for ARCH outputs,
and as a place to add your own custom collections of WARCs for analysis.
The arch-shared directory has the structure:
arch-shared/
├── in
│   └── collections
├── log
└── out
    ├── custom-collections
    └── datasets
These subdirectories are used as follows:
- log: ARCH job logs
- out/custom-collections: ARCH Custom Collection output files
- out/datasets: ARCH Dataset output files
- in/collections: A place to make your own WARCs available to ARCH as inputs; see "Analyze Your WARCs" below
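
As a quick sanity check after building, the layout described above can be verified with a small shell function. This is only a sketch: the name check_arch_shared is hypothetical and not part of Keystone or ARCH, and it assumes the directory sits at ./arch-shared unless another path is passed.

```shell
# Hypothetical helper (not shipped with Keystone): report any expected
# arch-shared subdirectory that is missing. Returns non-zero if one is absent.
check_arch_shared() {
  root="${1:-arch-shared}"
  missing=0
  for d in in/collections log out/custom-collections out/datasets; do
    if [ ! -d "$root/$d" ]; then
      echo "missing: $root/$d" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_arch_shared from the repository root prints nothing and returns 0 when all four subdirectories exist.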
For each group of WARCs that you'd like to analyze as a collection:
- Create a new subdirectory within arch-shared/in/collections with a descriptive kebab-case name like my-test-collection and copy your *.warc.gz files into it, e.g.
arch-shared/
└── in
    └── collections
        └── my-test-collection
            └── ARCHIVEIT-22994-CRAWL_SELECTED_SEEDS-JOB1965703-SEED3267421-h3.warc.gz
- Restart both the Keystone and ARCH containers
docker compose restart keystone arch
- Your new collection will now be visible in Keystone (e.g. as My Test Collection)
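
The staging step above can be sketched as a small shell helper. The function name stage_collection and the ARCH_SHARED variable are hypothetical and not part of Keystone or ARCH; the kebab-case check is an assumption about what makes a safe collection name (lowercase letters, digits, and hyphens), not a rule enforced by ARCH itself.

```shell
# Hypothetical helper: copy WARC files into a new collection directory
# under arch-shared/in/collections. ARCH_SHARED defaults to ./arch-shared.
stage_collection() {
  if [ "$#" -lt 2 ]; then
    echo "usage: stage_collection <kebab-case-name> <file.warc.gz>..." >&2
    return 2
  fi
  name="$1"
  shift
  # Assumed naming rule: reject empty names and anything outside [a-z0-9-].
  case "$name" in
    ""|*[!a-z0-9-]*)
      echo "collection name must be kebab-case: $name" >&2
      return 1
      ;;
  esac
  dest="${ARCH_SHARED:-arch-shared}/in/collections/$name"
  mkdir -p "$dest" && cp -- "$@" "$dest"/
}
```

A typical use would be stage_collection my-test-collection /path/to/*.warc.gz, followed by docker compose restart keystone arch so that the containers pick up the new collection.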