
Provide inputs to non-root nodes #9

@ztaylor54

Description


💪 Motivation

Situations often arise where it would be useful to inject inputs at a node further along the pipeline than the root node. This is especially helpful during testing, where the behavior of an individual pipeline step needs to be examined without first running data and inputs through every preceding step.

It is also useful when pipeline steps fail or must be re-run because of misconfiguration or other issues, such as a failure in an externally configured service. In such cases it would be desirable to perform a partial re-run of the pipeline, starting from where the previous run left off, to avoid repeating (possibly expensive) work already done by earlier steps.

Note: The partial re-run use case likely calls for some way to "replay" pipeline inputs; this could be achieved by caching inputs in the manager's work queues, or something similar.
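One way to picture the "replay" idea is a work queue that keeps a copy of every input it has received, so the same inputs can be re-enqueued for a partial re-run. The sketch below is hypothetical: the issue does not describe how the manager's work queues are implemented, and `ReplayingWorkQueue`, `push`, `pop`, and `replay` are illustrative names only.

```python
from collections import deque
from typing import Any, Deque, List


class ReplayingWorkQueue:
    """Hypothetical work queue that caches every input it receives.

    Items are consumed from `pending` as usual, but `history` is never
    drained, so a later partial re-run can replay what this node saw.
    """

    def __init__(self) -> None:
        self.pending: Deque[Any] = deque()
        self.history: List[Any] = []

    def push(self, item: Any) -> None:
        self.pending.append(item)
        self.history.append(item)  # keep a copy for later replay

    def pop(self) -> Any:
        return self.pending.popleft()

    def replay(self) -> None:
        """Re-enqueue every cached input in its original order."""
        self.pending.extend(self.history)


# A first run consumes both inputs; a re-run replays them.
q = ReplayingWorkQueue()
q.push("input-1")
q.push("input-2")
first_run = [q.pop(), q.pop()]
q.replay()
```

Keeping the history unbounded is the simplest choice for a sketch; a real queue would presumably need a retention limit or persistent storage.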

📖 Additional Details

For a more concrete example, consider the following pipeline:

              +-------+      +--------+      +------+
 datainput -> | start | ---> | middle | ---> | last | -> end
              +-------+      +--------+      +------+

If an error occurs in middle, we might want to send data from the DataInput directly to middle, bypassing start. This could be expressed in a DataInput spec as follows:

spec:
  data: 
    <data block>
  target: middle # add target: <node>

This would cause the DataInput's container to push data to middle's work queue instead of the root node's.
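The routing rule implied by the spec above can be sketched as follows, assuming each node owns a simple in-memory queue. A plain dict stands in for the parsed YAML so no YAML library is assumed, and `resolve_target`, `push_data_input`, and the example payload are hypothetical names and data, not part of any existing API.

```python
from collections import deque
from typing import Any, Deque, Dict, List

# Hypothetical model of the example pipeline: start -> middle -> last.
PIPELINE: List[str] = ["start", "middle", "last"]
ROOT = PIPELINE[0]


def resolve_target(spec: Dict[str, Any]) -> str:
    """Return the node a DataInput should feed, defaulting to the root."""
    target = spec.get("target", ROOT)
    if target not in PIPELINE:
        raise ValueError(f"unknown target node: {target!r}")
    return target


def push_data_input(spec: Dict[str, Any], queues: Dict[str, Deque[Any]]) -> str:
    """Push the spec's data onto the target node's work queue."""
    target = resolve_target(spec)
    queues[target].append(spec["data"])
    return target


queues: Dict[str, Deque[Any]] = {node: deque() for node in PIPELINE}
# Equivalent to the YAML spec with `target: middle` (placeholder payload).
spec = {"data": {"example": 1}, "target": "middle"}
routed_to = push_data_input(spec, queues)
```

Validating the target up front means a typo in `target` fails immediately rather than silently falling back to the root.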

There are a few considerations / caveats:

  • The DataInput schema will need a new target: <node> option, specifying that the DataInput should feed a node other than the root. If target is not specified, it defaults to the root node.
  • In the current implementation, a node may have more than one work queue (one per incoming edge), which it reads from in round-robin order. Shortcut inputs could be distributed evenly across these queues, placed entirely in one queue, or handled separately; the correct approach is unclear.
  • While the DataInput can be configured fairly easily to pass data to a different pipeline step, it is less straightforward to get the underlying container to produce inputs that middle would actually care about (i.e., to emulate start's output).
    • This is where input "replay" would help, but there is still the case where no inputs are available, such as when testing a single pipeline step in isolation. That likely requires a new DataInput container created specifically for this purpose.
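To make the queue-placement question concrete, here is a hypothetical sketch of one of the options listed above: shortcut inputs get their own dedicated queue, which joins the node's round-robin rotation as if it were one more incoming edge. `NodeInputs` and its methods are illustrative names; the issue does not say how the existing round-robin reader is written.

```python
from collections import deque
from typing import Any, Deque, List, Optional


class NodeInputs:
    """Hypothetical node that reads its incoming-edge queues round-robin.

    Shortcut inputs go into a dedicated queue that is appended to the
    rotation as if it were one more incoming edge.
    """

    def __init__(self, num_edges: int) -> None:
        self.queues: List[Deque[Any]] = [deque() for _ in range(num_edges)]
        self.shortcut: Deque[Any] = deque()
        self.queues.append(self.shortcut)
        self._next = 0  # index of the queue to try first on the next read

    def next_input(self) -> Optional[Any]:
        """Return the next item in round-robin order, or None if all are empty."""
        n = len(self.queues)
        for offset in range(n):
            idx = (self._next + offset) % n
            if self.queues[idx]:
                self._next = (idx + 1) % n  # resume after this queue next time
                return self.queues[idx].popleft()
        return None


# Two regular incoming edges plus one shortcut input.
node = NodeInputs(num_edges=2)
node.queues[0].extend(["a1", "a2"])
node.queues[1].append("b1")
node.shortcut.append("s1")
order = [node.next_input() for _ in range(5)]
```

With this choice the shortcut queue gets the same share of reads as any single edge; the alternatives (spreading shortcut inputs across the edge queues, or draining the shortcut queue with priority) would change only where `push` happens or how `next_input` picks a queue.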
