Increase the concurrency share of workload-low priority level #95259
Conversation
Hi @tkashem. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test` on its own line. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I would like your thoughts on it; there might be some scenarios I am missing, but it looks like the concurrency share of `workload-low` should be increased.

/assign @MikeSpreitzer @yue9944882
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project.
Force-pushed from 7d9fcfb to fd7bf9a
/retest

1 similar comment

/retest
This seems like a plausible improvement to me.
When designing API Priority and Fairness, we initially wanted to have some borrowing between priority levels. But it was a bit tricky to agree on a good definition for the behavior. We decided to do something simpler first, and later add that borrowing.
Please also note the following remark from https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190228-priority-and-fairness.md#non-goals:
This KEP does not introduce threading of additional information through webhooks and/or along other paths to support avoidance of priority inversions. While that is an attractive thing to consider in the future, this KEP is deliberately limited in its ambition. The intent for this KEP is to document that for the case of requests that are secondary to some other requests the configuration should identify those secondary requests and give them sufficiently high priority to avoid priority inversion problems. That will necessarily be approximate, and we settle for that now.
```diff
 Type: flowcontrol.PriorityLevelEnablementLimited,
 Limited: &flowcontrol.LimitedPriorityLevelConfiguration{
-	AssuredConcurrencyShares: 20,
+	AssuredConcurrencyShares: 100,
```
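For context on what these shares buy: per the priority-and-fairness KEP linked above, each limited priority level is assured a proportional slice of the server's total concurrency limit, ACV(l) = ceil(SCL * ACS(l) / sum over levels k of ACS(k)). Below is a minimal Go sketch of that formula; the helper name is mine, and the full share table is assumed from the suggested defaults of this era (only the 20 and 100 values come from this PR's diff).

```go
package main

import (
	"fmt"
	"math"
)

// assuredConcurrencyValue mirrors the APF formula from the KEP:
// ACV(l) = ceil(SCL * ACS(l) / sum over levels k of ACS(k)),
// where SCL is the apiserver's total concurrency limit and ACS(l) is the
// AssuredConcurrencyShares of limited priority level l.
func assuredConcurrencyValue(scl int, shares map[string]int, level string) int {
	total := 0
	for _, s := range shares {
		total += s
	}
	return int(math.Ceil(float64(scl) * float64(shares[level]) / float64(total)))
}

func main() {
	// Suggested shares before this PR; only workload-low (20) and
	// global-default (100) are taken from the PR itself, the rest are
	// assumed defaults of the same era.
	shares := map[string]int{
		"system":          30,
		"leader-election": 10,
		"workload-high":   40,
		"workload-low":    20,
		"global-default":  100,
		"catch-all":       5,
	}
	const scl = 600 // e.g. sum of --max-requests-inflight and --max-mutating-requests-inflight
	fmt.Println(assuredConcurrencyValue(scl, shares, "workload-low"))   // 59, roughly 10% of 600
	fmt.Println(assuredConcurrencyValue(scl, shares, "global-default")) // 293, roughly 49% of 600
}
```

Under these assumptions, a server with a concurrency limit of 600 assures `workload-low` only 59 slots before this change, while the mostly idle `global-default` is assured 293.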
The belief is that we have many more of these than we do workload-high clients?
@deads2k `global-default` matches workloads that run as non-service-account users. If I am not mistaken, those are fewer compared to `service-accounts`, which covers basically all workloads (minus `workload-high`) running inside the cluster with service accounts. The `service-accounts` flow schema matches the `workload-low` priority level. The concurrency pool of `workload-low` is 20, whereas that of `global-default` is 100; this PR swaps the values so `service-accounts` workloads have more concurrency share.

As far as `workload-high` is concerned, I have not seen `workload-high` workloads starve in my testing, where we had about 13K Pods running at any time and a high namespace and pod churn rate on a 250-node cluster. At the same time, I recently came across a CI job where `workload-high` was starving because a certain admission controller, `OwnerReferencesPermissionEnforcement`, was taking on the order of seconds to respond, triggering P&F to throttle `workload-high`.
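To put rough numbers on the swap: assuming the other suggested levels keep their defaults of this era (`system` 30, `leader-election` 10, `workload-high` 40, `catch-all` 5; only the 20 and 100 values appear in this PR), the shares total 205, so before this change `workload-low` is assured 20/205 ≈ 9.8% of the server's concurrency while `global-default` is assured 100/205 ≈ 48.8%. After the swap the two fractions trade places, which matches the ~10% and ~50% figures in the PR description.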
LGTM
/LGTM

/ok-to-test

/retest

1 similar comment

/retest

/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lavalamp, MikeSpreitzer, tkashem

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Approvers can indicate their approval by writing `/approve` in a comment.
/retest

1 similar comment

/retest

/sig api-machinery

/triage accepted

/priority important-soon
What type of PR is this?

/kind bug

What this PR does / why we need it:

All workloads running with a service account (except for the ones distinguished by P&F with a logically higher matching precedence) match the `service-accounts` flow schema and are assigned to the `workload-low` priority level, and thus have only 20 concurrency shares (~10% of the total).

On the other hand, the `global-default` flow schema is assigned to the `global-default` priority level configuration and thus has 100 concurrency shares (~50% of the total). If I am not mistaken, `global-default` goes pretty much unused, since workloads running as a user (not a service account) fall into this category, which is not very common.

Workloads with service accounts do not have enough concurrency share and may starve. This PR increases the concurrency share of `workload-low` from 20 to 100 and reduces that of `global-default` from 100 to 20.

Another potential solution: maybe when a concurrency pool is starving, it can borrow from another pool that has concurrency shares to spare?