tkashem
Contributor

@tkashem tkashem commented Oct 2, 2020

What type of PR is this?

/kind bug

What this PR does / why we need it:

All workloads running with a service account (except those that p&f matches to a flow schema with a logically higher matching precedence) match the service-accounts flow schema and are assigned to the workload-low priority level, so they get only 20 concurrency shares (~10% of the total).

On the other hand, the global-default flow schema is assigned to the global-default priority level configuration and thus has 100 concurrency shares (~50% of the total). If I am not mistaken, global-default goes pretty much unused, since only workloads running as a user (not a service account) fall into this category, and that is not very common.

Workloads running with service accounts do not have enough concurrency shares and may starve. This PR increases the concurrency shares of workload-low from 20 to 100 and reduces those of global-default from 100 to 20.

Another potential solution: maybe when a concurrency pool is starving, it can borrow from another pool that has concurrency shares to spare?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 2, 2020
@k8s-ci-robot
Contributor

Hi @tkashem. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Oct 2, 2020
@k8s-ci-robot k8s-ci-robot requested review from piosz and sttts October 2, 2020 18:13
@k8s-ci-robot k8s-ci-robot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Oct 2, 2020
@tkashem
Contributor Author

tkashem commented Oct 2, 2020

I would like your thoughts on this. There might be some scenarios I am missing, but it looks like the concurrency share of workload-low is very low compared to global-default.

/assign @MikeSpreitzer @yue9944882

@fejta-bot

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@tkashem tkashem changed the title from "allocate service-account flowschema to global-default" to "assign service-account flowschema to global-default priority level" Oct 2, 2020
@tkashem tkashem changed the title from "assign service-account flowschema to global-default priority level" to "[WIP] assign service-account flowschema to global-default priority level" Oct 4, 2020
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 4, 2020
@tkashem tkashem changed the title from "[WIP] assign service-account flowschema to global-default priority level" to "Increase the concurrency share of workload-low priority level" Oct 7, 2020
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 7, 2020
@tkashem
Contributor Author

tkashem commented Oct 7, 2020

/assign @deads2k @lavalamp

@tkashem
Contributor Author

tkashem commented Oct 7, 2020

/retest

@tkashem
Contributor Author

tkashem commented Oct 8, 2020

/retest

Member

@MikeSpreitzer MikeSpreitzer left a comment


This seems like a plausible improvement to me.

When designing API Priority and Fairness, we initially wanted to have some borrowing between priority levels. But it was a bit tricky to agree on a good definition for the behavior. We decided to do something simpler first, and later add that borrowing.

Please also note the following remark from https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190228-priority-and-fairness.md#non-goals :

This KEP does not introduce threading of additional information through webhooks and/or along other paths to support avoidance of priority inversions. While that is an attractive thing to consider in the future, this KEP is deliberately limited in its ambition. The intent for this KEP is to document that for the case of requests that are secondary to some other requests the configuration should identify those secondary requests and give them sufficiently high priority to avoid priority inversion problems. That will necessarily be approximate, and we settle for that now.

     Type: flowcontrol.PriorityLevelEnablementLimited,
     Limited: &flowcontrol.LimitedPriorityLevelConfiguration{
-        AssuredConcurrencyShares: 20,
+        AssuredConcurrencyShares: 100,
Contributor

@deads2k deads2k Oct 15, 2020


The belief is that we have many more of these than we do workload-high clients?

Contributor Author


@deads2k global-default matches workloads that run as non-service-account users. If I am not mistaken, there are fewer of those than service-account workloads, which are basically all workloads (minus workload-high) running inside the cluster with service accounts.

The service-accounts flow schema maps to the workload-low priority level.
The concurrency share of workload-low is 20 whereas that of global-default is 100. This PR swaps the values so that service-account workloads get a larger concurrency share.
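A rough sketch of the matching described above, assuming that flow schemas are tried in order of ascending matching precedence and the first match decides the priority level. The `flowSchema` type, the `classify` and `isServiceAccount` helpers, and the precedence values 9000 and 9900 are hypothetical simplifications for illustration, not the real API types; the real service-accounts flow schema selects subjects differently than the username-prefix check used here.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// flowSchema is a hypothetical, heavily simplified stand-in for a FlowSchema.
type flowSchema struct {
	name          string
	precedence    int // lower values are evaluated first
	priorityLevel string
	matches       func(user string) bool
}

// isServiceAccount reports whether the username has the service-account form
// "system:serviceaccount:<namespace>:<name>".
func isServiceAccount(user string) bool {
	return strings.HasPrefix(user, "system:serviceaccount:")
}

// classify returns the priority level of the first matching schema in
// ascending precedence order, or "catch-all" if nothing matches.
func classify(schemas []flowSchema, user string) string {
	sorted := append([]flowSchema(nil), schemas...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].precedence < sorted[j].precedence
	})
	for _, fs := range sorted {
		if fs.matches(user) {
			return fs.priorityLevel
		}
	}
	return "catch-all"
}

func main() {
	// Precedence values are assumptions; only their relative order matters.
	schemas := []flowSchema{
		{name: "global-default", precedence: 9900, priorityLevel: "global-default",
			matches: func(string) bool { return true }},
		{name: "service-accounts", precedence: 9000, priorityLevel: "workload-low",
			matches: isServiceAccount},
	}
	for _, user := range []string{"system:serviceaccount:demo:builder", "alice"} {
		fmt.Printf("%s -> %s\n", user, classify(schemas, user))
	}
}
```

Under these assumptions every service-account request is caught by service-accounts before global-default is considered, which is why workload-low carries most in-cluster traffic.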

As far as workload-high is concerned, I have not seen workload-high workloads starve in my testing, where we had about 13K Pods running at any time and a high namespace and pod churn rate on a 250-node cluster.
At the same time, I recently came across a CI job where workload-high was starving because an admission controller, OwnerReferencesPermissionEnforcement, was taking on the order of seconds to respond, which triggered p&f to throttle workload-high.

Member

@yue9944882 yue9944882 left a comment


LGTM

@MikeSpreitzer
Member

/LGTM

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 22, 2020
@MikeSpreitzer
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 22, 2020
@tkashem
Contributor Author

tkashem commented Oct 22, 2020

/retest

@tkashem
Contributor Author

tkashem commented Oct 22, 2020

/retest

@lavalamp
Contributor

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lavalamp, MikeSpreitzer, tkashem

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 22, 2020
@tkashem
Contributor Author

tkashem commented Oct 23, 2020

/retest

@tkashem
Contributor Author

tkashem commented Oct 23, 2020

/retest

@tkashem
Contributor Author

tkashem commented Oct 23, 2020

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 23, 2020
@tkashem
Contributor Author

tkashem commented Oct 23, 2020

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 23, 2020
@tkashem
Contributor Author

tkashem commented Oct 23, 2020

/priority important-soon

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
7 participants