Design of New OpenSpending System #1490

@rufuspollock

Description

OpenSpending (and GIFT) are going to pioneer the DataHub "next" approach (https://github.com/datopian/datahub-next):

  • Datasets are stored in Git(Hub) + cloud storage - see e.g. https://github.com/gift-data
    • Metadata in GitHub as a Data Package
    • Data either directly in GitHub (if small) or in a cloud of choice with a pointer in GitHub using git-lfs + giftless (see the LFS sketch after this list)
  • Publisher UI: publishers can edit directly in GitHub, and for Data Package generation we can hook the DataPub UI up to GitHub
    • Custom DataPub flow oriented to the Fiscal Data Package spec
    • Login/auth integration with GitHub so files can be loaded from and saved to GitHub
  • MetaStore: either a single file if the set of datasets is small (e.g. GIFT), or we build a lightweight MetaStore with e.g. ElasticSearch
  • Portal: use Portal.js with metadata etc. coming from the MetaStore
  • Admin and user management: use the GitHub UI
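
A minimal sketch of the "pointer in GitHub" mechanism with git-lfs + giftless (the Giftless endpoint URL and repo name below are assumptions and depend on how Giftless is deployed):

# .gitattributes - track data files with git-lfs so only pointers are committed to GitHub
*.csv filter=lfs diff=lfs merge=lfs -text

# .lfsconfig - point the LFS client at the Giftless server (hypothetical URL)
[lfs]
    url = https://giftless.example.org/openspending/my-dataset

# what GitHub actually stores for a tracked file is a small pointer like
version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 1048576

The real bytes are uploaded to the configured blob storage (S3/GCS) via Giftless.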
graph TD

github["Github for metadata and (small) data"]
blob[Blob Storage for larger data]
datapub[DataPub based Publisher app with auth via GitHub]
metastore[MetaStore - may just be a single file]
portal[Portal built in Portal.JS]
giftless[Giftless - Storage Access]

user[User]

github -.via git lfs.-> blob
portal --> metastore
metastore --> github
portal -.-> github
datapub -.-> github
datapub -.sign blob storage upload.-> giftless
datapub -.-> blob

user --view--> portal
user -.direct download.-> blob
user -.edit.-> datapub
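
If the MetaStore stays a single file (as suggested for GIFT), it could be as simple as a JSON index that Portal.js reads at build time; the file name catalog.json and its fields below are assumptions, not a fixed spec:

[
  {
    "name": "gift-sample-dataset",
    "owner": "gift-data",
    "github": "https://github.com/gift-data/gift-sample-dataset",
    "datapackage": "https://raw.githubusercontent.com/gift-data/gift-sample-dataset/main/datapackage.json"
  }
]

Anything richer (search, faceting) would move this index into a lightweight MetaStore such as ElasticSearch, with the same records as documents.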

Layout of a dataset in the new setup (in GitHub)

  • each OpenSpending dataset has a repo on GitHub in an org related to OpenSpending
  • each dataset is a Frictionless Dataset/Package with a datapackage.json following the Frictionless Tabular Data Package spec, and possibly (esp. the GIFT ones) the Fiscal Data Package spec
    • Data, if small, could be stored on GitHub
    • For larger data we store to the cloud
      • Either S3 or GCP (probably GCS)
      • If on cloud we store with giftless 🎁 with proper LFS info
    • README (?)

Each dataset looks like this:

README.md
datapackage.json
# data files - either in the root directory or in subdirectories e.g. /data/
my-file.csv
my-file-2.csv
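
A minimal datapackage.json for such a repo could look like the following (a sketch of a Frictionless Tabular Data Package; fiscal-specific metadata per the Fiscal Data Package spec would be layered on top, and the names shown are illustrative):

{
  "name": "my-dataset",
  "title": "My example spending dataset",
  "profile": "tabular-data-package",
  "resources": [
    {
      "name": "my-file",
      "path": "my-file.csv",
      "schema": {
        "fields": [
          { "name": "date", "type": "date" },
          { "name": "amount", "type": "number" }
        ]
      }
    }
  ]
}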

Tasks

  • Overview of new system
  • Dataset layout
  • MetaStore setup
  • ...
