
Conversation

@ichekrygin (Contributor) commented Aug 4, 2025

Ensure that scaling down an ElasticJob updates the reclaimablePods field in the workload status, allowing the scheduler to detect freed capacity and requeue pending workloads.

This resolves a bug where pending jobs remained unadmitted even after capacity was released by a scale-down operation.

What type of PR is this?

/kind bug

What this PR does / why we need it:

This resolves a bug where pending jobs remained unadmitted even after capacity was released by a scale-down operation.

Which issue(s) this PR fixes:

Fixes #6384

Special notes for your reviewer:

Does this PR introduce a user-facing change?

ElasticJobs: Fix a bug where scheduling of Pending workloads was not triggered on scale-down of a running elastic Job, even though the freed capacity could allow one or more of the queued workloads to be admitted.

Ensure that scaling down an ElasticJob updates the reclaimablePods field in the workload status, allowing the scheduler to detect freed capacity and requeue pending workloads.

This resolves a bug where pending jobs remained unadmitted even after capacity was released by a scale-down operation.

Signed-off-by: ichekrygin <[email protected]>
@k8s-ci-robot added the release-note (Denotes a PR that will be considered when it comes time to generate release notes), kind/bug (Categorizes issue or PR as related to a bug), and cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA) labels on Aug 4, 2025.
netlify bot commented Aug 4, 2025

Deploy Preview for kubernetes-sigs-kueue canceled.

🔨 Latest commit: 905580f
🔍 Latest deploy log: https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/68905b717aa54700080b088b

@k8s-ci-robot added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files) on Aug 4, 2025.
gomega.Eventually(func(g gomega.Gomega) {
	testJobA.Spec.Parallelism = ptr.To(int32(1))
	g.Expect(k8sClient.Update(ctx, testJobA)).Should(gomega.Succeed())
}).Should(gomega.Succeed())
Contributor (reviewer) commented:

Please add some timeout; I also noticed they are missing in the old tests, so you could specify them there too.
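
A minimal sketch of the requested change, assuming timeout and polling-interval constants are available to the test (util.Timeout and util.Interval are illustrative names, not taken from this PR; any time.Duration values work):

```go
// Sketch only: the same assertion as above, with an explicit timeout and
// polling interval passed to Eventually.
gomega.Eventually(func(g gomega.Gomega) {
	testJobA.Spec.Parallelism = ptr.To(int32(1))
	g.Expect(k8sClient.Update(ctx, testJobA)).Should(gomega.Succeed())
}, util.Timeout, util.Interval).Should(gomega.Succeed())
```

The chained form gomega.Eventually(...).WithTimeout(...).WithPolling(...).Should(...) is equivalent.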

  }
- case prevStatus == workload.StatusAdmitted && status == workload.StatusAdmitted && !equality.Semantic.DeepEqual(e.ObjectOld.Status.ReclaimablePods, e.ObjectNew.Status.ReclaimablePods):
+ case prevStatus == workload.StatusAdmitted && status == workload.StatusAdmitted && !equality.Semantic.DeepEqual(e.ObjectOld.Status.ReclaimablePods, e.ObjectNew.Status.ReclaimablePods),
+ 	features.Enabled(features.ElasticJobsViaWorkloadSlices) && workloadslicing.ScaledDown(workload.ExtractPodSetCountsFromWorkload(e.ObjectOld), workload.ExtractPodSetCountsFromWorkload(e.ObjectNew)):
@mimowo (Contributor) commented Aug 4, 2025

This looks good, but let me use this as an occasion to get an understanding of how scale-down works.

When a user requests a scale-down:

  1. Do we release some quota during the scale-down before the pods are actually deleted, resulting in pods temporarily running over quota?
  2. Do we keep more quota than needed until the scale-down is finished?

IIUC, ideally we would gradually release the quota (as the pods terminate and are accounted for in reclaimablePods) until the scale-down is finished. Once finished, we clean up reclaimablePods and update the pod set counts, so that the scaled-down workload looks as if it had been created anew. This workload property would make debugging easier.

Let me know if I'm potentially missing something. If you agree but assess that this is more work, I can lgtm / approve this one so that we fix one issue at a time; let me know.

@ichekrygin (PR author) commented:

Currently, batch/v1.Job is the only framework that supports the ElasticJobsViaWorkloadSlices feature.
A scale-down is triggered when the user (or automation) updates the Job's spec.parallelism to a lower value. Kueue's handling of this event is entirely reactive. Once the spec.parallelism field is decreased, the Kubernetes Job controller proceeds to complete and remove the excess pods. Kueue, in parallel, observes the change and updates the associated Workload's spec.podSets[].count to reflect the new, lower value.

Since both the K8s Job controller and the Kueue controller respond independently to the same job change, the pod removal and quota release happen concurrently. In practice, pod deletion often completes slightly before the Workload update and subsequent quota release. However, there are no guardrails to enforce strict ordering, nor is there coordination between the two controllers to ensure that quota is only released after the pods are gone.

By the same token, there's no concept of gradual scale-down in this context. When a Job’s parallelism is reduced, for example, from 10 to 2, the Kubernetes Job controller will proceed to delete the 8 excess pods almost immediately. The deletions happen as fast as the controller and API server allow, and there's no stepwise or progressive reduction. So in practice, the scale-down is effectively instantaneous from the controller's point of view.
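
To make the detection concrete, here is a minimal, self-contained sketch (not the PR's actual implementation) of the idea behind the workloadslicing.ScaledDown check in the diff above: compare the pod-set counts extracted from the old and new Workload objects, and treat the update as a scale-down only if at least one pod set shrank and none grew. The podSetCounts type and scaledDown function are illustrative names.

```go
package main

import "fmt"

// podSetCounts maps a pod-set name to its requested pod count,
// e.g. as extracted from a Workload's spec.podSets.
type podSetCounts map[string]int32

// scaledDown reports whether the update is a pure scale-down:
// at least one pod set shrank and no pod set grew.
func scaledDown(oldCounts, newCounts podSetCounts) bool {
	shrank := false
	for name, oldCount := range oldCounts {
		newCount, ok := newCounts[name]
		if !ok {
			continue // pod set removed; ignored in this sketch
		}
		if newCount > oldCount {
			return false // any growth means this is not a pure scale-down
		}
		if newCount < oldCount {
			shrank = true
		}
	}
	return shrank
}

func main() {
	// Parallelism reduced from 10 to 2, as in the example above: a scale-down.
	fmt.Println(scaledDown(podSetCounts{"main": 10}, podSetCounts{"main": 2})) // true
	// No change in counts: not a scale-down.
	fmt.Println(scaledDown(podSetCounts{"main": 2}, podSetCounts{"main": 2})) // false
}
```

Whether a removed pod set or a mixed update should count as a scale-down is a design choice; the real predicate in this PR is the one shown in the diff above.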

@mimowo (Contributor) commented Aug 4, 2025

I see; actually, terminated pods are only accounted for in reclaimablePods when they have Succeeded. We should not make that assumption for pods deleted due to scale-down, as they may end with a zero or non-zero exit code. So I think it is correct not to rely on reclaimablePods.

I see this approach allows pods to briefly run beyond quota at the same time. Since this is only temporary, I'm not very concerned about it. It would be ideal to release quota gradually, but that might be hard in practice, so we can leave it as a follow-up feature of its own.

@mimowo (Contributor) commented Aug 4, 2025

/lgtm
/approve
/cherrypick release-0.13
Thanks 👍

@k8s-infra-cherrypick-robot commented:

@mimowo: once the present PR merges, I will cherry-pick it on top of release-0.13 in a new PR and assign it to you.

In response to this:

/lgtm
/approve
/cherrypick release-0.13
Thanks 👍

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on Aug 4, 2025.
@k8s-ci-robot commented:

LGTM label has been added.

Git tree hash: c4ab65448e17b899628541389b1620ac857a3e3d

@k8s-ci-robot commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ichekrygin, mimowo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files) on Aug 4, 2025.
@k8s-ci-robot merged commit f9ff9f9 into kubernetes-sigs:main on Aug 4, 2025
22 checks passed
@k8s-ci-robot added this to the v0.14 milestone on Aug 4, 2025
@k8s-infra-cherrypick-robot commented:

@mimowo: new pull request created: #6407

In response to this:

/lgtm
/approve
/cherrypick release-0.13
Thanks 👍

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mimowo (Contributor) commented Aug 4, 2025

Proposal, ptal
/release-note-edit

ElasticJobs: Fix a bug where scheduling of Pending workloads was not triggered on scale-down of a running elastic Job, even though the freed capacity could allow one or more of the queued workloads to be admitted.

@ichekrygin deleted the job-scale-down-fix branch on August 4, 2025, 15:40
kannon92 pushed a commit to openshift-kannon92/kubernetes-sigs-kueue that referenced this pull request Aug 11, 2025
…gs#6395)

* Fix: trigger workload requeue on ElasticJob scale-down
Ensure that scaling down an ElasticJob updates the reclaimablePods field in the workload status, allowing the scheduler to detect freed capacity and requeue pending workloads.

This resolves a bug where pending jobs remained unadmitted even after capacity was released by a scale-down operation.

Signed-off-by: ichekrygin <[email protected]>

* Add missing "Eventually" timeouts and retry intervals.

Signed-off-by: ichekrygin <[email protected]>

---------

Signed-off-by: ichekrygin <[email protected]>
@mimowo mentioned this pull request Sep 30, 2025
36 tasks
Labels: approved, cncf-cla: yes, kind/bug, lgtm, release-note, size/L

Successfully merging this pull request may close the following issue:
ElasticJob scale-down does not trigger re-scheduling of pending inadmissible workloads (#6384)

4 participants