vine: maintain a DAG structure inside of TaskVine #4114

@JinZhou5042

Description

Currently, the workflow's DAG structure is maintained in DaskVine, while TaskVine only iterates over the submitted tasks to evaluate whether each one can be committed to a worker.

However, the data dependencies between tasks are already fully determined at submission time, yet TaskVine ignores this information.

This causes scheduling inefficiency: we repeatedly evaluate tasks for committability even when their inputs have not been materialized at all, which delays finding the tasks that are actually runnable.

For example, tasks whose inputs are not yet materialized would be enqueued in q->pending_tasks, and all other tasks in q->ready_tasks. A cache-update message for a file enables pruning of that file's producer tasks and scheduling of its consumer tasks, while an unlink message moves the affected consumer tasks from the ready queue back to the pending queue.
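A minimal C sketch of the pending/ready bookkeeping described above, assuming each task keeps a count of its unmaterialized inputs and each file keeps a list of the tasks that consume it. The names `task_node`, `file_node`, `add_edge`, `on_cache_update`, and `on_unlink` are hypothetical and not part of TaskVine; queue membership is modeled as a flag rather than the real `q->pending_tasks` and `q->ready_tasks` lists, and producer pruning is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch; these are not TaskVine's real structures. */
#define MAX_CONSUMERS 8

typedef enum { TASK_PENDING, TASK_READY } task_queue_t;

struct task_node {
    int id;
    int missing_inputs;  /* number of inputs not yet materialized */
    task_queue_t queue;  /* stands in for q->pending_tasks / q->ready_tasks */
};

struct file_node {
    bool materialized;
    int num_consumers;
    struct task_node *consumers[MAX_CONSUMERS];  /* tasks that read this file */
};

/* A task belongs in the ready queue only when all of its inputs exist. */
static void refresh_queue(struct task_node *t) {
    t->queue = (t->missing_inputs == 0) ? TASK_READY : TASK_PENDING;
}

/* Record at submission time that task t consumes file f. */
static void add_edge(struct file_node *f, struct task_node *t) {
    assert(f->num_consumers < MAX_CONSUMERS);
    f->consumers[f->num_consumers++] = t;
    if (!f->materialized)
        t->missing_inputs++;
    refresh_queue(t);
}

/* Cache-update for f: each consumer now has one fewer missing input. */
static void on_cache_update(struct file_node *f) {
    if (f->materialized)
        return;
    f->materialized = true;
    for (int i = 0; i < f->num_consumers; i++) {
        struct task_node *t = f->consumers[i];
        t->missing_inputs--;
        assert(t->missing_inputs >= 0);
        refresh_queue(t);
    }
}

/* Unlink of f: its consumers gain a missing input and fall back to pending. */
static void on_unlink(struct file_node *f) {
    if (!f->materialized)
        return;
    f->materialized = false;
    for (int i = 0; i < f->num_consumers; i++) {
        struct task_node *t = f->consumers[i];
        t->missing_inputs++;
        refresh_queue(t);
    }
}
```

Keeping a per-task counter means a cache-update or unlink only touches the consumers of that one file, instead of rescanning every submitted task to re-check committability.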

In addition, graph operations implemented in C are faster than their Python equivalents, which in principle allows more sophisticated graph optimizations without Python overhead becoming the bottleneck.

For example, if the only worker holding a file is lost, we can compute the recovery cost by walking the file's upstream tasks. We can also merge a subgraph of tasks and commit it as a single task to reduce scheduling latency and improve data locality.
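The recovery-cost idea can be sketched as a depth-first walk from the lost file to its producer tasks, summing the execution time of every task that would have to rerun. This sketch assumes each file records its producer, that files without a producer (source inputs) can be refetched at no cost, and that per-task execution times are known. All names (`rfile`, `rtask`, `recovery_cost`) are hypothetical, and the caller must clear the `visited` flags before each query.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch; not TaskVine's real structures. */
#define MAX_INPUTS 4

struct rtask;

struct rfile {
    bool available;          /* some live worker (or the manager) still holds it */
    struct rtask *producer;  /* NULL for source files, assumed refetchable */
};

struct rtask {
    double exec_time;        /* assumed known, e.g. from a previous run */
    int num_inputs;
    struct rfile *inputs[MAX_INPUTS];
    bool visited;            /* scratch flag; clear before each query */
};

/* Total execution time needed to regenerate file f. */
static double recovery_cost(struct rfile *f) {
    if (f->available || f->producer == NULL)
        return 0.0;
    struct rtask *t = f->producer;
    if (t->visited)          /* shared ancestor already counted on another path */
        return 0.0;
    t->visited = true;
    double cost = t->exec_time;
    for (int i = 0; i < t->num_inputs; i++)
        cost += recovery_cost(t->inputs[i]);  /* recurse into lost upstream files */
    return cost;
}
```

The `visited` flag ensures that a task feeding several lost files (a diamond in the DAG) is counted once, matching the fact that it would only need to rerun once.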

All graph operations are handled in TaskVine, and DaskVine serves only as an additional layer on top of the exposed APIs.
