feat: separate structure and profile in tables.yaml #337

@adrienaury

Problem

The tables.yaml file mixes two types of information:

  1. information about the datasource structure (table names, primary keys, dbinfo types)
  2. information about extract or load operations (list of columns, export format, import format)

For example, this file:

version: "1"
tables:
  - name: "film"
    keys: ["film_id"]
    columns:
      - name: "film_id"
        dbinfo:
          type: "bigserial"
      - name: "title"
        dbinfo:
          type: "varchar"
          length: 30
          bytes: true
      - name: "picture"
        export: "presence"
        import: "file"
        dbinfo:
          type: "BLOB"

It contains information about the datasource structure (table names, primary keys, dbinfo types):

version: "1"
tables:
  - name: "film"
    keys: ["film_id"]
    columns:
      - name: "film_id"
        dbinfo:
          type: "bigserial"
      - name: "title"
        dbinfo:
          type: "varchar"
          length: 30
          bytes: true
      - name: "picture"
        dbinfo:
          type: "BLOB"

It also contains information about extract or load operations (list of columns to export, export formats, import formats):

version: "1"
tables:
  - name: "film"
    columns:
      - name: "film_id"
      - name: "title"
      - name: "picture"
        export: "presence"
        import: "file"

The two types of information differ in how they evolve:

  1. information about the datasource structure never changes from one use case to another
  2. information about extract or load operations varies depending on the use case

Therefore, it would be useful to separate these concerns into different files.
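
The separation described above can be sketched in Python on the parsed content of the example file. This is only an illustration under stated assumptions: `split_tables` and the two key sets are hypothetical names, not part of LINO, and the configuration is written as a plain dict instead of being loaded from YAML.

```python
# Hypothetical helper, not part of LINO: splits a parsed tables.yaml document
# into a structure part and a profile part.
STRUCTURE_COLUMN_KEYS = {"name", "dbinfo"}           # describes the datasource
PROFILE_COLUMN_KEYS = {"name", "export", "import"}   # describes extract/load operations


def split_tables(config):
    """Return (structure, profile) dicts built from a mixed tables config."""
    structure = {"version": config["version"], "tables": []}
    profile = {"version": config["version"], "tables": []}
    for table in config["tables"]:
        columns = table.get("columns", [])
        # Structure keeps the primary keys and the dbinfo of each column.
        structure_table = {"name": table["name"]}
        if "keys" in table:
            structure_table["keys"] = table["keys"]
        structure_table["columns"] = [
            {k: v for k, v in col.items() if k in STRUCTURE_COLUMN_KEYS}
            for col in columns
        ]
        structure["tables"].append(structure_table)
        # Profile keeps the column list and the export/import formats.
        profile["tables"].append({
            "name": table["name"],
            "columns": [
                {k: v for k, v in col.items() if k in PROFILE_COLUMN_KEYS}
                for col in columns
            ],
        })
    return structure, profile


# The example from this issue, written as a Python dict.
mixed = {
    "version": "1",
    "tables": [{
        "name": "film",
        "keys": ["film_id"],
        "columns": [
            {"name": "film_id", "dbinfo": {"type": "bigserial"}},
            {"name": "title", "dbinfo": {"type": "varchar", "length": 30, "bytes": True}},
            {"name": "picture", "export": "presence", "import": "file",
             "dbinfo": {"type": "BLOB"}},
        ],
    }],
}

structure, profile = split_tables(mixed)
```

With this input, `structure` matches the first extracted file shown above (keys and dbinfo only) and `profile` matches the second one (column names plus the export/import formats of `picture`).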

Solution

The proposed solution does not impact existing configurations.

Information about extract or load operations should be managed by the existing ingress-descriptor configuration. This configuration is loaded by the pull and push commands via the existing flag: --ingress-descriptor <filename> or -i <filename>.

The ingress descriptor file already manages the list of columns to select. The only information missing to fully describe extract/load operations is the import/export formats.

When using the --ingress-descriptor flag, import/export formats defined in the ingress-descriptor file will override the information loaded from the root tables.yaml file. This keeps backward compatibility with the current behavior.
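
The override rule can be sketched as follows. This is a minimal illustration, not LINO's implementation: `effective_formats` is a hypothetical name, the `formats` entries follow the layout proposed in this issue (a `columns` field holding a single column name), and merging key by key (a descriptor `export` leaves the `import` from tables.yaml untouched) is an assumption the issue does not spell out.

```python
def effective_formats(table_columns, descriptor_formats=None):
    """Return {column: {"export": ..., "import": ...}} after applying overrides.

    table_columns: the "columns" list of one table from tables.yaml.
    descriptor_formats: the proposed "formats" list of the ingress descriptor,
    or None when --ingress-descriptor is not used.
    """
    formats = {}
    # Start from the formats declared in tables.yaml (current behavior).
    for col in table_columns:
        entry = {k: col[k] for k in ("export", "import") if k in col}
        if entry:
            formats[col["name"]] = entry
    # Formats from the ingress descriptor take precedence, key by key (assumption).
    for fmt in descriptor_formats or []:
        name = fmt["columns"]
        override = {k: fmt[k] for k in ("export", "import") if k in fmt}
        formats[name] = {**formats.get(name, {}), **override}
    return formats


# Columns as they could appear in an existing tables.yaml.
legacy_columns = [
    {"name": "film_id"},
    {"name": "picture", "export": "presence", "import": "file"},
]

# Without a descriptor, the tables.yaml formats apply unchanged.
without_descriptor = effective_formats(legacy_columns)

# "other-format" is a placeholder value, not a real LINO format.
with_descriptor = effective_formats(
    legacy_columns,
    [{"columns": "picture", "export": "other-format"}],
)
```

Here `without_descriptor` keeps `presence` and `file` for `picture`, while `with_descriptor` replaces only the export format and keeps the import format from tables.yaml.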

The previous example could be configured like this:

tables.yaml

version: "1"
tables:
  - name: "film"
    keys: ["film_id"]
    columns:
      - name: "film_id"
        dbinfo:
          type: "bigserial"
      - name: "title"
        dbinfo:
          type: "varchar"
          length: 30
          bytes: true
      - name: "picture"
        dbinfo:
          type: "BLOB"

ingress-descriptor.yaml

version: v1
IngressDescriptor:
    startTable: "film"
    select: ["film_id", "title", "picture"]
    formats:
      - columns: "picture"
        export: "presence"
        import: "file"

The following command would extract data using the list of columns to export and the export formats defined in ingress-descriptor.yaml:

$ lino pull source --ingress-descriptor ingress-descriptor.yaml

The following command would load data using the list of columns to import and the import formats defined in ingress-descriptor.yaml:

$ lino push source --ingress-descriptor ingress-descriptor.yaml

Labels: enhancement (New feature or request)
