Node output mapping #6

@ztaylor54

💪 Motivation

Currently, all KDP pipelines are essentially parallelized linked lists: each node load-balances its output to all later steps. We don't yet support conditional branching, which would be a useful feature to implement. It would allow far more complex pipeline structures and give pipeline designers much more flexibility to adapt existing workflows to KDP.

📖 Additional Details

We'll need to add conditional branching to the edge definitions in a pipeline. There are probably many ways to go about this: some OK, most bad. I'd like to be thoughtful here so that we don't hack something together that breaks the KDP "core tenets," if you will.

The specification should likely go in pipeline.spec, since the conditionals should be immutable. We should also enforce that all nodes are reachable, i.e. that each node has a path from the root given some set of conditionals. The pipeline validator (yet to be linked to the operator) should also check for possible cycles and emit a warning, possibly even blocking application of the pipeline unless a specific flag such as has_cycles (or something to that effect) is set in the pipeline definition.
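A rough sketch of what those two validator checks could look like, in Python for brevity. The function name validate_graph and the explicit root argument are hypothetical (the spec does not name an entry node yet). Reachability here ignores conditions entirely, which assumes every condition can be satisfied by some output.

```python
from collections import defaultdict


def validate_graph(nodes, edges, root):
    """Return (unreachable_nodes, has_cycle) for a pipeline graph.

    nodes: list of node names
    edges: list of {"source": ..., "target": ...} dicts, as in graph.edges
    root:  name of the entry node (hypothetical parameter)
    """
    adjacency = defaultdict(list)
    for edge in edges:
        adjacency[edge["source"]].append(edge["target"])

    # Reachability: iterative depth-first walk from the root.
    # Conditions are ignored, so this finds nodes reachable under
    # *some* set of conditionals, assuming each one is satisfiable.
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adjacency[node])
    unreachable = sorted(set(nodes) - seen)

    # Cycle detection: three-color DFS. Meeting a GRAY node again
    # means we followed a back edge, i.e. the graph has a cycle.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {name: WHITE for name in nodes}

    def visit(node):
        color[node] = GRAY
        for nxt in adjacency[node]:
            state = color.get(nxt, WHITE)
            if state == GRAY:
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    has_cycle = any(color[name] == WHITE and visit(name) for name in nodes)
    return unreachable, has_cycle
```

The validator could then reject a pipeline with a non-empty unreachable list, and warn (or block, absent the has_cycles flag) when has_cycle is true.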

The goal here is to be as concise and readable as possible so as not to bloat the pipeline definition schema, while maintaining high extensibility and flexibility for pipeline developers.

Ideas for implementation

Consider the following pipeline definition:

              +-------+      +--------+      +------+
 datainput -> | start | ---> | middle | ---> | last | -> end
              +-------+      +--------+      +------+

With the following corresponding yaml:

graph:
    nodes:
      start:
        # <...>
      middle:
        # <...>
      last:
        # <...>
    edges:
    - source: start
      target: middle
    - source: middle
      target: last

A valid use case might look like the following:

              +-------+      +--------+      +--------+
 datainput -> | start | ---> | middle | ---> | last_0 | -> end
              +-------+      +--------+      +--------+
                                 |
                                 |           +--------+
                                 └---------> | last_1 | -> end
                                             +--------+

Here, input flows to last_0 in the nominal case, and to last_1 on an error condition or some other result. Let's say that we set a boolean flag got_error in the output of middle, at the top level of the JSON object.

We could extend graph.edges to include the mapping in the following manner:

edges:
- source: start
  target: middle
- source: middle
  target: last_0
  # lack of "conditions" could imply default branch
- source: middle
  target: last_1
  conditions:
  - match_value: # support multiple types of conditions
      key: "got_error"
      val: "true"
  # array for multiple? or perhaps take an approach similar to Elasticsearch boolean queries
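To make the proposal concrete, here is a minimal sketch of how a node's output might be routed against such edges. Several things here are assumptions rather than decided behavior: the helper names select_targets and _matches are made up, multiple conditions on one edge are ANDed together, and the default (condition-less) branch is taken only when no conditional edge matches.

```python
def select_targets(edges, source, output):
    """Pick the target nodes for one JSON output object emitted by `source`."""
    matched, defaults = [], []
    for edge in edges:
        if edge["source"] != source:
            continue
        conditions = edge.get("conditions")
        if not conditions:
            # Lack of "conditions" implies the default branch.
            defaults.append(edge["target"])
        elif all(_matches(cond, output) for cond in conditions):
            # Assumption: every condition on the edge must hold (AND).
            matched.append(edge["target"])
    # Assumption: defaults are used only when nothing else matched.
    return matched or defaults


def _matches(condition, output):
    """Evaluate one condition; only match_value is sketched here."""
    rule = condition.get("match_value")
    if rule is None:
        raise ValueError(f"unsupported condition type: {sorted(condition)}")
    if rule["key"] not in output:
        return False
    # Compare as lowercase strings so a YAML "true" matches a JSON true.
    return str(output[rule["key"]]).lower() == str(rule["val"]).lower()
```

With the edges above parsed into a list of dicts, an output of {"got_error": true} from middle would be routed to last_1, and anything else to last_0.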

As is evidenced above, there's still plenty to think about. This seems like a good start.
