Metadata services

This repository provides entrypoint scripts to generate configurations for PyCSW and PyGeoAPI in a composite and modular way. Such configurations use file partitioning on object storage solutions to read partial configurations from multiple sources.

Workflow example

PyCSW

PyCSW entrypoint uses duckdb to populate a SQLite database with metadata. Different services can then write different parquet files to expose such metadata, and DuckDB uses partitioned read to aggregate them and insert into the SQLite DB.

A metadata row should follow the Metadata model reference of pycsw. The XML property can be generated with PyGeoMeta.

PyGeoAPI

PyGeoAPI entrypoint reads a series of JSON files containing arrays of pygeoapi resources (https://docs.pygeoapi.io/en/latest/data-publishing/ogcapi-features.html). All the resources are joined together with the base configuration.

Reference for creating resources
Reference for creating OGR providers of vector resources

Setup

Setup the variables accordingly.

docker compose up --build

Rationale

pygeoapi and pycsw are very good software, but hard to integrate in other python softwares. They are excellent when used statically (pygeoapi in particular is not very dynamic at runtime). So the idea is to decouple the data sharing from the data storage, and use modern tools like DuckDB and Object Buckets as source of truth for the services. Data should be provided in a static way (using Cloud Native Formats), and should be updated through some recurrent task. Metadata and Configuration should also be treated as data, but stored in a different file set in the Object Storage.

The current implementation just add a thin wrapper over the original services, tipically a python or a bash script executed as entrypoint. Such scripts will fetch the metadata and the configurations and adjust them in the way that the service requires:

a SQLite DB is created for PyCSW, injected with the content of the parquet files stored in the Object Storage
a YML file is created for PyGeoAPI, injected with the configurations read by JSON files stored in the Object Storage

Pros of this approach

Easy to update the original service
Services are idempotent, with the same set of data/coonfiguration they produce the same result.
Complexity is moved to a specialized task, that can depend on the source
More control over what is published

Cons of this approach

Services must be restarted to get updates
More infrastructure burden
You need to have good control of your buckets :D
Slower startup

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
.github/workflows		.github/workflows
pycsw		pycsw
pygeoapi		pygeoapi
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
docker-compose.yml		docker-compose.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Metadata services

Workflow example

PyCSW

PyGeoAPI

Setup

Rationale

Pros of this approach

Cons of this approach

About

Uh oh!

Packages

Uh oh!

Uh oh!

Contributors 2

Uh oh!

Languages

NINAnor/metadata_services

Folders and files

Latest commit

History

Repository files navigation

Metadata services

Workflow example

PyCSW

PyGeoAPI

Setup

Rationale

Pros of this approach

Cons of this approach

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Packages 0

Uh oh!

Uh oh!

Contributors 2

Uh oh!

Languages

Packages