Tool for archiving and exploring.
Built out of a need to get out of the walled gardens of Pinterest and (much less walled) Pinboard.
Alpha quality at best.
Archivist is built out of three interconnected parts (each package has its own readme file):
- archivist-cli - command line tool for configuration, fetching and querying the data
- archivist-ui - Electron UI built on top of archivist-cli
- archivist-* - various crawlers, "official" ones:
  - archivist-pinboard - API-based Pinboard archiving: screenshot and freeze-dry of the original website
  - archivist-pinterest-crawl - slowly crawl through Pinterest and archive pin image
npm install -g archivist-cli
- to archive Pinboard:
npm install -g archivist-pinboard
- to archive Pinterest:
npm install -g archivist-pinterest-crawl
archivist-ui is not on npm (it should probably be a downloadable dmg, but I didn't get around to it), so to generate the .app and put it in /Applications/ yourself:
- clone this repo
cd archivist-ui && ./scripts/install.sh
$ archivist config
Config is a JSON object of shape:
{
"crawler-1": CRAWLER_1_OPTIONS,
"crawler-2": CRAWLER_2_OPTIONS,
...
}
Example config (assuming Pinboard and Pinterest backup):
{
"archivist-pinterest-crawl": {
"loginMethod": "cookies",
"profile": "szymon_k"
},
"archivist-pinboard": {
"apiKey": "API_KEY_FOR_PINBOARD"
}
}
archivist-pinterest-crawl supports two login methods: "cookies" (which uses cookies from the local Google Chrome installation) or "password", which requires a plaintext username and password:
"archivist-pinterest-crawl": {
"loginMethod": "password",
"username": "PINTEREST_USERNAME",
"password": "PINTEREST_PASSWORD",
"profile": "szymon_k"
},
archivist-pinboard requires an API token from https://pinboard.in/settings/password to run properly.
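The config rules described above can be checked before running a fetch. This is only a minimal sketch: validateConfig is a hypothetical helper, not something archivist ships, and it encodes just what this readme states (the two login methods, the credentials needed for "password", and the Pinboard apiKey). Whether archivist itself performs similar checks is not something this sketch assumes.

```javascript
// Hypothetical helper (not part of archivist): returns a list of problems
// found in a config object, based only on the rules described in this readme.
function validateConfig(config) {
  const errors = [];

  const pinterest = config["archivist-pinterest-crawl"];
  if (pinterest) {
    const method = pinterest.loginMethod;
    if (method !== "cookies" && method !== "password") {
      errors.push('archivist-pinterest-crawl: loginMethod must be "cookies" or "password"');
    }
    // The "password" method needs plaintext credentials in the config.
    if (method === "password" && (!pinterest.username || !pinterest.password)) {
      errors.push("archivist-pinterest-crawl: username and password are required");
    }
  }

  const pinboard = config["archivist-pinboard"];
  if (pinboard && !pinboard.apiKey) {
    errors.push("archivist-pinboard: apiKey is required");
  }

  return errors;
}

// Example: "password" login without credentials is reported as a problem.
const problems = validateConfig({
  "archivist-pinterest-crawl": { loginMethod: "password", profile: "szymon_k" },
  "archivist-pinboard": { apiKey: "API_KEY_FOR_PINBOARD" }
});
console.log(problems);
```

An empty array means the config satisfies the rules above; it says nothing about whether the credentials themselves are valid.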
- backup data:
archivist fetch
(might take a long time depending on the size of the archive)
- list everything:
archivist query
- find everything about keyboards:
archivist query keyboard
query by default returns ndjson; normal JSON can be output using --json
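Since query returns ndjson (one JSON value per line), the output has to be parsed line by line rather than with a single JSON.parse call. A minimal sketch of that, assuming the output has already been captured as a string; parseNdjson is a hypothetical helper and the sample records are made up, so the real fields emitted by archivist query may differ.

```javascript
// Parse newline-delimited JSON: one JSON value per non-empty line.
function parseNdjson(text) {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));
}

// Made-up sample; the field names here are illustrative only.
const sample =
  '{"id": 1, "title": "mechanical keyboard"}\n' +
  '{"id": 2, "title": "split keyboard"}\n';

const items = parseNdjson(sample);
console.log(items.length);
```

When the --json flag is used instead, the whole output is regular JSON and can be passed to JSON.parse directly.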
- kollektor - no-ui self-hosted Pinterest clone
- gwern on archiving URLs
- freeze-dry implementation notes