Open
Description
When migrating to the new setup, I want an understanding of the current platform so that we both know what needs to be migrated and how we could do it.
Acceptance
- Diagram / overview of key components and flows
Tasks
- Interview @akariv about this
- Summarize results
Outputs
Docs: https://openspending.readthedocs.io/en/latest/developers/ (good, though there is no overview diagram or information on flows).
FAQs
- Where is the data stored (and the packages)? ANS: Everything gets stored to S3.
- How are they imported into the API? ANS: Via an os-api method.
- Are all datasets packaged as Fiscal Data Packages? ANS: Not sure, probably not all ...
graph TD
start[Start - User has a file]
subgraph Import
dgp[Data Genus Processor]
packager[Packager]
dpp[Data Package Pipelines]
end
start --> dgp
start -.-> packager
storage[Storage - S3]
db[Structured DB]
dgp --> storage
packager --dp.json + csv--> storage
dgp --> db
storage --OS API method--> db
storage --dpp os-data-importers--> db
db --> dataapi[Data API]
dgp -.-> metastore[MetaStore]
- https://github.com/openspending/os-data-importers
- https://github.com/openspending/os-api
- DGP: processes each file separately ... (Mexico has 15 years)
- importer: processes all files together ...
- DPP is deprecated in favour of the DGP
Storage
https://s3.amazonaws.com/ or https://s3.amazonaws.com:443/
/datastore.openspending.org/dataset-id/name/datapackage.json
http://datastore.openspending.org/
/dataset-id-part-1/dataset-id-part-2/final/datapackage.json
From Elasticsearch, we find fields such as _id that describe a package. For instance, for the id 525a389a219b4aa66f2b2b94311c76e5:peruntukan-kpd with S3, we have the following storage URL:
http://s3.amazonaws.com/datastore.openspending.org/525a389a219b4aa66f2b2b94311c76e5/peruntukan-kpd/datapackage.json
For the id 38f2a72864f6414ccdf0f58a50663b9b:mongolia_budget with datastore.openspending.org, we have (note the extra part /final in the URL):
http://datastore.openspending.org/38f2a72864f6414ccdf0f58a50663b9b/mongolia_budget/final/datapackage.json
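The mapping from an Elasticsearch _id to the two URL layouts can be sketched in Python as follows. This is a minimal sketch based only on the two examples in this issue; the function name `storage_urls` and the dictionary keys are hypothetical, and it assumes every _id has the form `part-1:part-2`.

```python
def storage_urls(package_id):
    """Build candidate datapackage.json URLs from an Elasticsearch _id.

    Assumes the _id looks like "<part-1>:<part-2>", as in the two
    examples in this issue. The function name and dict keys are
    illustrative, not part of any OpenSpending API.
    """
    part_1, sep, part_2 = package_id.partition(":")
    if not sep or not part_1 or not part_2:
        raise ValueError("expected an id of the form 'part-1:part-2': %r" % package_id)
    return {
        # Direct S3 layout: no extra path segment
        "s3": "http://s3.amazonaws.com/datastore.openspending.org/"
              "%s/%s/datapackage.json" % (part_1, part_2),
        # datastore.openspending.org layout: note the extra /final segment
        "datastore": "http://datastore.openspending.org/"
                     "%s/%s/final/datapackage.json" % (part_1, part_2),
    }


urls = storage_urls("38f2a72864f6414ccdf0f58a50663b9b:mongolia_budget")
print(urls["datastore"])
```

Which of the two layouts actually holds a given package is not known in advance, so a caller would likely need to try both candidates.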
Plan:
- Take the dataset list from Elasticsearch => this is our catalog
- Try to throw away irrelevant entries ...
- Then publish a public list (without private info like emails)
- Then check files on S3 ...
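The filtering and publishing steps of the plan could look roughly like the Python sketch below. It assumes the catalog is a list of Elasticsearch hits, each with `_id` and `_source` keys. The private field names in `PRIVATE_FIELDS`, the `is_relevant` predicate, and the sample records are hypothetical placeholders, since the real field names and relevance criteria are not settled in this issue.

```python
# Hypothetical names of private fields to strip before publishing.
PRIVATE_FIELDS = {"email", "owner_email"}


def public_catalog(hits, is_relevant):
    """Return a publishable list of datasets from Elasticsearch hits.

    hits: iterable of dicts with "_id" and "_source" keys.
    is_relevant: caller-supplied predicate deciding which hits to keep.
    """
    public = []
    for hit in hits:
        if not is_relevant(hit):
            continue  # throw irrelevant entries away
        source = hit.get("_source", {})
        # Copy every field except the private ones
        entry = {k: v for k, v in source.items() if k not in PRIVATE_FIELDS}
        entry["_id"] = hit["_id"]
        public.append(entry)
    return public


# Made-up sample records, for illustration only.
sample_hits = [
    {"_id": "abc:budget-2015", "_source": {"title": "Budget 2015", "email": "a@example.org"}},
    {"_id": "abc:test", "_source": {"title": "test", "email": "b@example.org"}},
]
catalog = public_catalog(sample_hits, lambda hit: hit["_source"].get("title") != "test")
print(catalog)
```

Passing the relevance check in as a function keeps the sketch neutral about what counts as irrelevant, which the plan leaves open.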