Provide inputs to non-root nodes

## 💪 Motivation

Situations often arise where it would be useful to inject inputs at a node further down the pipeline than the root node. This is especially helpful during testing, where the behavior of an individual pipeline step needs to be examined without first running data and inputs through the entire pipeline.

It is also useful when pipeline steps fail or must be re-run due to misconfiguration or other issues, such as a failure in an externally-configured service. In cases like these, it would be desirable to execute a partial re-run of a pipeline, starting from where the previous run left off. This would avoid duplication of (possibly expensive) work performed by earlier pipeline steps.

**Note**: The partial re-run use case likely warrants some method of "replaying" pipeline inputs. This could be achieved by caching inputs in the manager's work queues, or something similar.

## 📖 Additional Details


For a more concrete example, consider the following pipeline:

```
              +-------+      +--------+      +------+
 datainput -> | start | ---> | middle | ---> | last | -> end
              +-------+      +--------+      +------+
```

If an error occurs in `middle`, we might have reason to send data from the datainput directly to `middle`, thus bypassing `start`. This might be implemented in a DataInput spec as follows:

```yaml
spec:
  data: 
    <data block>
  target: middle # add target: <node>
```

This would result in the DataInput's container pushing data to `middle`'s work queue instead of the root node's.
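The routing described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: `resolve_target`, `push_input`, the `PIPELINE_NODES` list, and the use of in-memory deques as work queues are all hypothetical stand-ins chosen for the example.

```python
from collections import deque

# Hypothetical node names taken from the example diagram; the first is the root.
PIPELINE_NODES = ["start", "middle", "last"]
ROOT_NODE = PIPELINE_NODES[0]


def resolve_target(spec, nodes=PIPELINE_NODES, root=ROOT_NODE):
    """Return the node whose work queue should receive the DataInput's data.

    Falls back to the root node when `target` is missing or empty.
    """
    target = spec.get("target") or root
    if target not in nodes:
        raise ValueError(f"unknown target node: {target!r}")
    return target


def push_input(spec, queues):
    """Append the spec's data to the resolved node's (in-memory) work queue."""
    target = resolve_target(spec)
    queues[target].append(spec["data"])
    return target


# One in-memory queue per node, standing in for the real work queues.
queues = {name: deque() for name in PIPELINE_NODES}

push_input({"data": "payload-a"}, queues)                      # no target: goes to root
push_input({"data": "payload-b", "target": "middle"}, queues)  # bypasses `start`
```

Rejecting unknown node names up front is one possible design choice here; silently falling back to the root would hide typos in the spec.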

There are a few considerations / caveats:
  - The DataInput schema will need to be updated to include the `target: <node>` option, specifying that the DataInput should push to the work queue of a node other than the root. It will default to the root node if `target` is not specified.
  - With the current implementation, a given node may have more than one work queue (one per incoming edge), from which it reads inputs in round-robin order. Shortcut inputs could be evenly distributed across these queues, all placed in one queue, or handled separately; the correct approach is unclear.
  - While the DataInput can fairly easily be configured to pass data to a different step in the pipeline, it is less straightforward to get the underlying container to pass inputs that `middle` would care about (i.e. emulate `start`'s output).
    - This is where an input "replay" will come in handy, but there is still the case where inputs are unavailable, such as during a test of a single pipeline step. This likely requires a new DataInput container to be created specifically for this purpose.
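To make the work-queue question above more concrete, here is a small sketch of the three options for shortcut inputs, using in-memory deques in place of real work queues. All function names are hypothetical, and `round_robin_read` is only an assumed model of how a node consumes its incoming queues, not the actual consumer.

```python
from collections import deque
from itertools import cycle


def distribute_evenly(items, queues):
    """Option 1: spread shortcut inputs across the incoming queues in turn."""
    if not queues:
        raise ValueError("node has no incoming work queues")
    targets = cycle(queues)
    for item in items:
        next(targets).append(item)


def put_in_one(items, queues, index=0):
    """Option 2: append every shortcut input to a single existing queue."""
    queues[index].extend(items)


def add_dedicated_queue(items, queues):
    """Option 3: handle shortcut inputs separately in their own extra queue."""
    shortcut = deque(items)
    queues.append(shortcut)
    return shortcut


def round_robin_read(queues):
    """Assumed consumer model: visit queues in order, taking one item from each."""
    while any(queues):  # an empty deque is falsy
        for queue in queues:
            if queue:
                yield queue.popleft()


# Two incoming edges with pending work, plus two shortcut inputs.
incoming = [deque(["a1", "a2"]), deque(["b1"])]
distribute_evenly(["s1", "s2"], incoming)
order = list(round_robin_read(incoming))
```

Under this model, the three options differ mainly in when shortcut inputs are processed relative to work already queued: appending to existing queues places them behind pending items, while a dedicated queue lets them interleave with pending items from the first read cycle.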
