vine: maintain a DAG structure inside of TaskVine

The workflow's DAG structure is currently maintained in DaskVine, while TaskVine only iterates over submitted tasks to check whether each one can be committed.

However, the data dependencies between tasks are fully determined at submission time, and TaskVine ignores this information.

This makes scheduling inefficient: we repeatedly evaluate tasks for committability even when none of their inputs have been materialized, which delays finding the tasks that are actually runnable.

For example, tasks with unmaterialized inputs should be enqueued in `q->pending_tasks`, while all others are enqueued in `q->ready_tasks`. A `cache-update` message for a file allows its producer tasks to be pruned and its consumer tasks to be scheduled, and an `unlink` message moves the affected task from the ready queue back to the pending queue.
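A minimal sketch of the pending/ready bookkeeping described above. Every name here (`sketch_task`, `sketch_file`, `sketch_on_cache_update`, `sketch_on_unlink`) is hypothetical and is not part of the TaskVine API. It assumes each file keeps a list of its consumer tasks, that each task counts its unmaterialized inputs, and that `unlink` means the last replica of the file is gone. Pruning of producer tasks and handling of already-running tasks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CONSUMERS 8

/* Stand-ins for membership in q->pending_tasks and q->ready_tasks. */
enum sketch_state { SKETCH_PENDING, SKETCH_READY };

struct sketch_task {
    int missing_inputs;        /* inputs not materialized anywhere yet */
    enum sketch_state state;   /* queue the task conceptually sits in */
};

struct sketch_file {
    bool materialized;
    struct sketch_task *consumers[MAX_CONSUMERS];
    size_t n_consumers;
};

/* A task with no inputs is runnable immediately. */
void sketch_task_init(struct sketch_task *t)
{
    t->missing_inputs = 0;
    t->state = SKETCH_READY;
}

/* Record at submission time that task t consumes file f. */
void sketch_add_input(struct sketch_task *t, struct sketch_file *f)
{
    assert(f->n_consumers < MAX_CONSUMERS);
    f->consumers[f->n_consumers++] = t;
    if (!f->materialized) {
        t->missing_inputs++;
        t->state = SKETCH_PENDING;
    }
}

/* cache-update: the file now exists, so consumers may become ready. */
void sketch_on_cache_update(struct sketch_file *f)
{
    if (f->materialized) {
        return;
    }
    f->materialized = true;
    for (size_t i = 0; i < f->n_consumers; i++) {
        struct sketch_task *t = f->consumers[i];
        if (--t->missing_inputs == 0) {
            t->state = SKETCH_READY;
        }
    }
}

/* unlink (assumed: last replica removed): consumers go back to pending. */
void sketch_on_unlink(struct sketch_file *f)
{
    if (!f->materialized) {
        return;
    }
    f->materialized = false;
    for (size_t i = 0; i < f->n_consumers; i++) {
        f->consumers[i]->missing_inputs++;
        f->consumers[i]->state = SKETCH_PENDING;
    }
}
```

With this bookkeeping the scheduler only needs to look at tasks in the ready state, and each message touches only the consumers of the file it names.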

In addition, graph operations are more efficient in C than in Python, which in principle allows more complex graph optimizations without Python overhead becoming the bottleneck.

For example, if the only worker holding a file is lost, we can easily compute the recovery cost by iterating over the file's upstream tasks. We can also merge a subgraph of tasks and commit it as a single task to reduce scheduling latency and improve data locality.
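The recovery-cost computation mentioned above could look roughly like the following. This is only a sketch under assumptions: the structs and `rc_recovery_cost` are hypothetical names, each task is assumed to carry a known execution time, and the cost is defined here as the total execution time of the upstream tasks that must be rerun. A file with no producer (supplied from outside the workflow) is treated as free, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_INPUTS 4

struct rc_task;

struct rc_file {
    int replicas;              /* workers currently holding the file */
    struct rc_task *producer;  /* NULL if the file comes from outside the DAG */
};

struct rc_task {
    double exec_seconds;       /* assumed known, e.g. measured on a prior run */
    struct rc_file *inputs[MAX_INPUTS];
    size_t n_inputs;
    bool visited;              /* avoids counting a shared ancestor twice */
};

/*
 * Cost of regenerating file f: zero if a replica still exists, otherwise
 * the producer's execution time plus the cost of regenerating whichever
 * of the producer's inputs are also lost. The caller must clear the
 * visited flags before starting a new query.
 */
double rc_recovery_cost(struct rc_file *f)
{
    if (f->replicas > 0 || f->producer == NULL) {
        return 0.0;
    }
    struct rc_task *t = f->producer;
    if (t->visited) {
        return 0.0;
    }
    t->visited = true;

    double cost = t->exec_seconds;
    for (size_t i = 0; i < t->n_inputs; i++) {
        cost += rc_recovery_cost(t->inputs[i]);
    }
    return cost;
}
```

The walk stops at the first file on each path that still has a replica, so the cost reflects only the part of the graph that actually has to be recomputed.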

Under this design, all graph operations are handled in TaskVine, while DaskVine serves as an additional layer that uses the exposed APIs.




vine: maintain a DAG structure inside of TaskVine #4114
