OpenSpending (and GIFT) are going to pioneer the DataHub "next" approach: https://github.com/datopian/datahub-next
- Datasets are stored in Git(Hub) + cloud storage, see e.g. https://github.com/gift-data
  - Metadata in GitHub as a Data Package
  - Data either directly in GitHub (if small) or in the cloud of choice, with a pointer in GitHub using git-lfs + giftless
- Publisher UI: can edit directly in GitHub, and for Data Package generation we can hook up the datapub UI to GitHub
  - Custom DataPub flow oriented to the Fiscal Data Package spec
  - Login/auth integration with GitHub so files can be loaded from and saved to GitHub
- MetaStore: either a single file if there is a small set of datasets (e.g. GIFT), or we build a lightweight MetaStore with e.g. ElasticSearch
- Portal: use Portal.js with metadata etc. coming from the MetaStore
- Admin and user management: use the GitHub UI
graph TD
github["GitHub for metadata and (small) data"]
blob[Blob Storage for larger data]
datapub[DataPub based Publisher app with auth via GitHub]
metastore[MetaStore - may just be a single file]
portal[Portal built in Portal.js]
giftless[Giftless - Storage Access]
user[User]
github -.via git lfs.-> blob
portal --> metastore
metastore --> github
portal -.-> github
datapub -.-> github
datapub -.sign blob storage upload.-> giftless
datapub -.-> blob
user --view--> portal
user -.direct download.-> blob
user -.edit.-> datapub
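To make the "MetaStore may just be a single file" option concrete, here is a minimal sketch in Python. It assumes every dataset repo has been cloned into one local directory and that each contains a `datapackage.json`. The function names (`build_metastore`, `search`, `write_metastore`) and the fields kept in the index are hypothetical choices for illustration, not part of DataHub, Portal.js, or any existing MetaStore.

```python
import json
from pathlib import Path


def build_metastore(datasets_root):
    """Collect summary metadata from each <repo>/datapackage.json under datasets_root."""
    index = []
    for descriptor_path in sorted(Path(datasets_root).glob("*/datapackage.json")):
        descriptor = json.loads(descriptor_path.read_text(encoding="utf-8"))
        index.append({
            "repo": descriptor_path.parent.name,  # assumed: directory name == repo name
            "name": descriptor.get("name"),
            "title": descriptor.get("title"),
            "resources": [r.get("path") for r in descriptor.get("resources", [])],
        })
    return index


def search(index, term):
    """Case-insensitive substring match on dataset name and title."""
    term = term.lower()
    return [
        entry for entry in index
        if term in (entry["name"] or "").lower()
        or term in (entry["title"] or "").lower()
    ]


def write_metastore(index, out_path):
    """Persist the whole index as one JSON file that a portal could fetch."""
    Path(out_path).write_text(json.dumps(index, indent=2), encoding="utf-8")
```

A portal could load the resulting JSON file directly for a small set of datasets such as GIFT. A larger catalogue would likely need a real search backend, which is where the ElasticSearch option comes in.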
Layout of a dataset in new setup (in GitHub)
- Each OpenSpending dataset has a repo on GitHub in an org related to OpenSpending
- Each dataset is a Frictionless Dataset/Package with a datapackage.json, which is a Frictionless Tabular Dataset/Package and possibly (esp. the GIFT ones) a Fiscal Data Package
- Data, if small, could be stored on GitHub
- For larger data we store to the cloud
  - Either S3 or GCP (probably GCS)
  - If on the cloud, we store via giftless 🎁 with proper LFS info
- README (?)
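For reference, the "LFS info" kept in GitHub for a large file is a small Git LFS pointer: a text file with a `version` line, a SHA-256 `oid`, and the `size` in bytes. The sketch below builds and parses such a pointer in Python. The helper names `lfs_pointer` and `parse_lfs_pointer` are hypothetical; in practice the `git lfs` client writes these files and giftless serves the actual bytes from blob storage.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Return the Git LFS pointer text for the given file content."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


def parse_lfs_pointer(text: str) -> dict:
    """Parse pointer text into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algorithm": algorithm,
        "oid": digest,
        "size": int(fields["size"]),
    }
```

The pointer stays in the dataset repo next to datapackage.json, so the repo remains small while the `oid` identifies the object held in S3 or GCS.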
Each dataset looks like this:
README.md
datapackage.json
# data files - either in root directory or could be in subdirectories e.g. /data/
my-file.csv
my-file-2.csv
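The layout above can be checked mechanically. The following Python sketch is one possible check, under two assumptions not settled in this issue: that README.md is required (it is still marked with a question mark in the list above), and that local resources are listed under `resources[].path` in datapackage.json. The function name `check_dataset_layout` is hypothetical.

```python
import json
from pathlib import Path


def check_dataset_layout(repo_dir):
    """Return a list of layout problems for one dataset repo (empty list means OK)."""
    root = Path(repo_dir)
    problems = []

    # Assumption: both files are mandatory in every dataset repo.
    for required in ("README.md", "datapackage.json"):
        if not (root / required).is_file():
            problems.append(f"missing {required}")

    descriptor_path = root / "datapackage.json"
    if descriptor_path.is_file():
        descriptor = json.loads(descriptor_path.read_text(encoding="utf-8"))
        for resource in descriptor.get("resources", []):
            path = resource.get("path")
            # Only local string paths are checked; remote URLs are skipped.
            if isinstance(path, str) and not path.startswith(("http://", "https://")):
                if not (root / path).is_file():
                    problems.append(f"resource file not found: {path}")
    return problems
```

Data files may sit in the root directory or in a subdirectory such as data/, so the check resolves each resource path relative to the repo root rather than assuming a fixed folder.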
Tasks
- Overview of new system
- Dataset layout
- MetaStore setup
- ...