

Overview of Platform and Key Flows as of July 2020 #1485

@rufuspollock

Description


Before doing the migration to the new setup, I want an understanding of the current platform, so that we both know what needs to be migrated and how we could do it.

Acceptance

  • Diagram / overview of key components and flows

Tasks

  • Interview @akariv about this
  • Summarize results

Outputs

Docs: https://openspending.readthedocs.io/en/latest/developers/ are good, though they have no overview diagram or information on flows.

FAQs

  • Where are the data (and the packages) stored? ANS: Everything is stored in S3.
  • How are they imported into the API? ANS: Via the os-api method.
  • Are all datasets Fiscal Data Packages? ANS: Not sure, probably not all ...

```mermaid
graph TD

start[Start - User has a file]

subgraph Import
  dgp[Data Genus Processor]
  packager[Packager]
  dpp[Data Package Pipelines]
end

start --> dgp
start -.-> packager

storage[Storage - S3]
db[Structured DB]

dgp --> storage
packager --dp.json + csv--> storage
dgp --> db

storage --OS API method--> db
storage --dpp os-data-importers--> db

db --> dataapi[Data API]

dgp -.-> metastore[MetaStore]
```
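To make the flows in the diagram easier to trace, the sketch below transcribes its edges into Python and enumerates every route from an uploaded file to the Data API. The node names are the diagram's own shorthand; the "dotted" label for the dashed arrows is our own reading (the issue does not say what the dashed arrows mean), and the `routes` helper is purely illustrative.

```python
from typing import List, Tuple

# Edges transcribed from the flow diagram as (source, target, label).
# "dotted" marks the diagram's dashed arrows; treating them as optional or
# secondary paths is an assumption, not something the issue states.
EDGES: List[Tuple[str, str, str]] = [
    ("start", "dgp", ""),
    ("start", "packager", "dotted"),
    ("dgp", "storage", ""),
    ("packager", "storage", "dp.json + csv"),
    ("dgp", "db", ""),
    ("storage", "db", "OS API method"),
    ("storage", "db", "dpp os-data-importers"),
    ("db", "dataapi", ""),
    ("dgp", "metastore", "dotted"),
]


def routes(node: str, goal: str, trail: List[str]) -> List[List[str]]:
    """Depth-first enumeration of every route from node to goal (the graph is acyclic)."""
    if node == goal:
        return [trail]
    found: List[List[str]] = []
    for src, dst, label in EDGES:
        if src == node:
            arrow = f"-[{label}]->" if label else "->"
            found.extend(routes(dst, goal, trail + [arrow, dst]))
    return found


if __name__ == "__main__":
    for route in routes("start", "dataapi", ["start"]):
        print(" ".join(route))
```

Running it lists five routes: three that start with the Data Genus Processor (two via S3 and one writing directly to the DB) and two via the Packager, which always pass through S3 and then one of the two importers.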

Storage

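Packages are stored under one of two URL layouts (base URL followed by path):

  • S3: https://s3.amazonaws.com/ (or https://s3.amazonaws.com:443/) followed by /datastore.openspending.org/dataset-id/name/datapackage.json
  • datastore.openspending.org: http://datastore.openspending.org/ followed by /dataset-id-part-1/dataset-id-part-2/final/datapackage.json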

In Elasticsearch, each package is described by fields such as _id. For instance, the id 525a389a219b4aa66f2b2b94311c76e5:peruntukan-kpd, stored on S3, has the following storage URL:

http://s3.amazonaws.com/datastore.openspending.org/525a389a219b4aa66f2b2b94311c76e5/peruntukan-kpd/datapackage.json

The id 38f2a72864f6414ccdf0f58a50663b9b:mongolia_budget, stored on datastore.openspending.org, has the following URL (note the extra /final segment):

http://datastore.openspending.org/38f2a72864f6414ccdf0f58a50663b9b/mongolia_budget/final/datapackage.json
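The id-to-URL mapping in the two examples can be sketched as follows. This is a minimal illustration, assuming that the first ":" in an Elasticsearch _id always separates its two parts and that the two layouts are exactly the ones shown in the example URLs. The issue does not say how to tell which layout a given package uses, so the sketch builds both candidates rather than choosing one; the function names are hypothetical.

```python
from typing import Tuple

# Base URLs taken from the two example storage URLs in this issue.
S3_BASE = "http://s3.amazonaws.com/datastore.openspending.org"
DATASTORE_BASE = "http://datastore.openspending.org"


def split_id(package_id: str) -> Tuple[str, str]:
    """Split an Elasticsearch _id of the form '<part-1>:<part-2>'.

    Assumption: the first ':' is the separator.
    """
    part1, sep, part2 = package_id.partition(":")
    if not sep or not part1 or not part2:
        raise ValueError(f"unexpected package id: {package_id!r}")
    return part1, part2


def s3_url(package_id: str) -> str:
    """Candidate URL under the S3 layout (no /final segment)."""
    part1, part2 = split_id(package_id)
    return f"{S3_BASE}/{part1}/{part2}/datapackage.json"


def datastore_url(package_id: str) -> str:
    """Candidate URL under the datastore.openspending.org layout (extra /final segment)."""
    part1, part2 = split_id(package_id)
    return f"{DATASTORE_BASE}/{part1}/{part2}/final/datapackage.json"
```

Applied to the two example ids, `s3_url` and `datastore_url` reproduce the two URLs quoted above.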

Plan:

  • Take the dataset list from Elasticsearch => this is our catalog
    • Try to throw away irrelevant entries ...
    • Then publish a public list (without private info such as emails)
  • Then check the files on S3 ...
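The catalog step of the plan (filter the Elasticsearch hits, then strip private info before publishing) could look roughly like this. It is a sketch under stated assumptions: the hits are taken to be standard Elasticsearch hit dicts with `_id` and `_source`, the set of private key names is a guess (the plan only names emails), and the relevance test is left to a caller-supplied predicate because the issue does not define "irrelevant". `scrub` and `public_catalog` are hypothetical helpers.

```python
import re
from typing import Any, Callable, Dict, Iterable, List

# Assumption: which keys count as private. The plan only names emails explicitly.
PRIVATE_KEYS = {"email", "emails", "author_email", "maintainer_email"}
# Simple e-mail pattern used to mask addresses embedded in free-text fields.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")


def scrub(value: Any) -> Any:
    """Recursively drop private keys and mask e-mail addresses in strings."""
    if isinstance(value, dict):
        return {k: scrub(v) for k, v in value.items() if k.lower() not in PRIVATE_KEYS}
    if isinstance(value, list):
        return [scrub(v) for v in value]
    if isinstance(value, str):
        return EMAIL_RE.sub("[email removed]", value)
    return value


def public_catalog(
    hits: Iterable[Dict[str, Any]],
    is_relevant: Callable[[Dict[str, Any]], bool],
) -> List[Dict[str, Any]]:
    """Turn raw Elasticsearch hits into a publishable list of packages."""
    catalog: List[Dict[str, Any]] = []
    for hit in hits:
        source = hit.get("_source", {})
        if not is_relevant(source):
            continue  # the plan's "throw irrelevant away" step
        catalog.append({"_id": hit["_id"], "package": scrub(source)})
    return catalog
```

Keeping the relevance rule as a parameter means the filtering criteria can be agreed on separately from the scrubbing, which has to happen regardless of what ends up in the list.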
