Node output mapping

## 💪 Motivation

Currently, every KDP pipeline is essentially a parallelized linked list: each node load-balances its output to all later steps. We don't support conditional branching yet, but it would be a useful feature to implement. It would allow far more complex pipeline structures, giving pipeline designers much more flexibility to adapt existing workflows to KDP.

## 📖 Additional Details


We're going to need to add conditional branching to the edge definitions in a pipeline. There are probably a bunch of ways to go about this: some OK, most _bad_. I'd like to be thoughtful here so that we don't hack something together that breaks a lot of the KDP "core tenets," if you will.

The specification should likely go in the `pipeline.spec`, as the conditionals should be immutable. We should also require that every node be reachable, i.e. that it has a path from the root given _some_ set of conditionals. The pipeline validator (yet to be linked to the operator) should also check for possible cycles and emit a warning, or possibly even block application of the pipeline unless a specific flag such as `has_cycles` (or something to that effect) is set in the pipeline definition.
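To make the validator requirements concrete, here is a rough Python sketch of the two checks. Everything in it is hypothetical: the function name `validate_graph`, the `root` argument, and the `allow_cycles` parameter (standing in for the proposed `has_cycles` flag) are assumptions for illustration, not existing KDP code. Reachability treats every conditional edge as potentially taken, which corresponds to "some set of conditionals".

```python
from collections import deque


def validate_graph(nodes, edges, root, allow_cycles=False):
    """Return (errors, warnings) for a pipeline graph.

    nodes: iterable of node names; edges: list of {"source", "target"} dicts.
    allow_cycles stands in for the proposed `has_cycles` flag (hypothetical).
    """
    adjacency = {name: [] for name in nodes}
    errors, warnings = [], []

    for edge in edges:
        src, dst = edge["source"], edge["target"]
        if src not in adjacency or dst not in adjacency:
            errors.append(f"edge {src} -> {dst} references an unknown node")
            continue
        adjacency[src].append(dst)

    if root not in adjacency:
        errors.append(f"root node {root} is not defined")
        return errors, warnings

    # Reachability: breadth-first search from the root, following every edge
    # regardless of its conditions.
    seen = {root}
    queue = deque([root])
    while queue:
        for nxt in adjacency[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    for name in adjacency:
        if name not in seen:
            errors.append(f"node {name} is unreachable from {root}")

    # Cycle detection: Kahn's algorithm. If a topological ordering cannot
    # consume every node, the leftover nodes sit on or behind a cycle.
    indegree = {name: 0 for name in adjacency}
    for targets in adjacency.values():
        for target in targets:
            indegree[target] += 1
    ready = deque(name for name, degree in indegree.items() if degree == 0)
    ordered = 0
    while ready:
        node = ready.popleft()
        ordered += 1
        for target in adjacency[node]:
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)
    if ordered < len(adjacency):
        message = "graph contains at least one cycle"
        # Warn when cycles were explicitly allowed, otherwise block.
        (warnings if allow_cycles else errors).append(message)

    return errors, warnings
```

A caller would reject the pipeline when `errors` is non-empty and surface `warnings` to the user otherwise.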

The goal here is to be as concise and readable as possible so as not to bloat the pipeline definition schema, while maintaining high extendability & flexibility for pipeline developers.

## Ideas for implementation

Consider the following pipeline definition:

```
              +-------+      +--------+      +------+
 datainput -> | start | ---> | middle | ---> | last | -> end
              +-------+      +--------+      +------+
```

With the following corresponding yaml:

```yaml
graph:
    nodes:
      start:
        # <...>
      middle:
        # <...>
      last:
        # <...>
    edges:
    - source: start
      target: middle
    - source: middle
      target: last
```

A valid use case might look like the following: 

```
              +-------+      +--------+      +--------+
 datainput -> | start | ---> | middle | ---> | last_0 | -> end
              +-------+      +--------+      +--------+
                                 |
                                 |           +--------+
                                 └---------> | last_1 | -> end
                                             +--------+
```

Here, input flows to `last_0` in the nominal case, and to `last_1` on an error condition or some other result. Let's say that `middle` sets a boolean flag `got_error` at the top level of the JSON object it outputs.

We could extend `graph.edges` to include the mapping in the following manner:

```yaml
edges:
- source: start
  target: middle
- source: middle
  target: last_0
  # lack of "conditions" could imply default branch
- source: middle
  target: last_1
  conditions:
  - match_value: # support multiple types of conditions
      key: "got_error"
      val: "true"
  # array for multiple? or perhaps take an approach similar to Elasticsearch boolean queries
```
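As a sketch of how a node's output could be routed against edges of this shape, the following Python shows one possible reading. The helper names `edge_matches` and `select_targets` are hypothetical, and the semantics are assumptions rather than decisions: all conditions on an edge must hold (logical AND), values are compared as lowercase strings so that the YAML string `"true"` matches a JSON boolean `true`, and unconditioned edges act as the default branch only when no conditional edge matches.

```python
def edge_matches(edge, output):
    """Return True if every condition on the edge holds for the node output."""
    for condition in edge.get("conditions", []):
        if "match_value" in condition:
            rule = condition["match_value"]
            actual = output.get(rule["key"])
            # Assumption: compare as lowercase strings so that the string
            # "true" in the spec matches a boolean True in the output.
            if str(actual).lower() != str(rule["val"]).lower():
                return False
        else:
            # Other condition types are left open in the proposal.
            raise ValueError(f"unsupported condition type: {sorted(condition)}")
    return True


def select_targets(edges, source, output):
    """Pick the target nodes for `output` emitted by `source`."""
    outgoing = [edge for edge in edges if edge["source"] == source]
    matched = [
        edge["target"]
        for edge in outgoing
        if edge.get("conditions") and edge_matches(edge, output)
    ]
    if matched:
        return matched
    # Assumption: edges without "conditions" are the default branch.
    return [edge["target"] for edge in outgoing if not edge.get("conditions")]
```

With the edges above, an output of `{"got_error": true}` from `middle` would be routed to `last_1`, and any other output would fall through to `last_0`.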

As the open questions above show, there's still plenty to think about, but this seems like a good start.
