[Data/Autoscaler] Proposal: Release all resources of an upstream op at once when it finishes, instead of step-wise scale down #63299

@allendang001

Description

In the current Ray Data / autoscaler behavior, when an upstream operator (op) finishes processing,
its resources are scaled down gradually, one step at a time. This makes scale-down slow, so
downstream ops cannot quickly acquire the resources they need to process data, which leads to
pipeline stalls and underutilization.

Problem

  • Upstream op scale-down is very slow (step-wise).
  • Downstream ops wait a long time before they can get enough resources.
  • Overall pipeline throughput suffers because of this delayed resource handoff.

Proposal

Once an op has fully finished processing its data, release all resources occupied by that op
in a single shot, instead of scaling them down incrementally per step. This would:

  • Allow downstream ops to immediately claim freed resources.
  • Reduce end-to-end latency of the pipeline.
  • Simplify the resource handoff logic between ops.
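
To make the difference concrete, here is a minimal toy model of the two release policies. It is only a sketch and does not use or mirror Ray Data's actual autoscaler code: `ReleasePolicy`, `ToyOp`, `ticks_until_downstream_ready`, the notion of a "tick", and the `step` size are all hypothetical names and assumptions introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ReleasePolicy(Enum):
    """Hypothetical policies; not part of any Ray API."""
    STEPWISE = "stepwise"  # free at most `step` slots per tick
    ONE_SHOT = "one_shot"  # free every held slot on the first tick


@dataclass
class ToyOp:
    """Toy stand-in for a pipeline operator holding resource slots."""
    name: str
    held: int              # resource slots currently held
    finished: bool = False


def ticks_until_downstream_ready(upstream, needed, policy, step=1):
    """Count scale-down ticks until `needed` slots have been freed.

    The upstream op is not mutated; the release is only simulated.
    """
    if not upstream.finished:
        raise ValueError("upstream op must be finished before releasing")
    if step < 1:
        raise ValueError("step must be at least 1")
    if needed > upstream.held:
        raise ValueError("downstream needs more slots than upstream holds")

    remaining = upstream.held
    freed = 0
    ticks = 0
    while freed < needed:
        ticks += 1
        if policy is ReleasePolicy.ONE_SHOT:
            released = remaining             # everything at once
        else:
            released = min(step, remaining)  # incremental scale-down
        remaining -= released
        freed += released
    return ticks


upstream = ToyOp(name="read", held=8, finished=True)
stepwise = ticks_until_downstream_ready(upstream, 8, ReleasePolicy.STEPWISE)
one_shot = ticks_until_downstream_ready(upstream, 8, ReleasePolicy.ONE_SHOT)
print(f"step-wise: {stepwise} ticks, one-shot: {one_shot} tick")
```

In this toy model, with 8 held slots and one slot freed per tick, the downstream op waits 8 ticks under the step-wise policy but only 1 tick under one-shot release. The gap grows linearly with the number of slots the finished op holds.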

Question

Is there a specific reason the current design prefers step-wise scale-down over one-shot release
after an op completes? Would the community be open to changing this behavior (or making it
configurable)?

Use case

No response

Metadata

    Labels

    • community-backlog
    • data: Ray Data-related issues
    • enhancement: Request for new feature and/or capability
    • performance
    • question: Just a question :)
    • triage: Needs triage (eg: priority, bug/not-bug, and owning component)
