Make similar buckets for api and etcd request duration histogram #94134
Make similar buckets for the apiserver_request_duration_seconds and the etcd_request_duration_seconds histograms so that the results are more comparable side by side. etcd_request_duration_seconds uses the default buckets provided by the prometheus client library: DefBuckets = []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10}. apiserver_request_duration_seconds, on the other hand, uses more fine-grained buckets, and its maximum bucket size is 60s. Both histograms should use similar bucket sizes so they are more comparable side by side.
Hi @tkashem. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/assign @brancz |
/assign @wojtek-t |
/assign @logicalhan |
+1 this makes a lot of sense to me, thanks for doing this @tkashem |
/lgtm
/cc @brancz
(since we're changing buckets)
/ok-to-test |
hey 👋 afaik @brancz is currently still vacationing :-) |
I like this a lot. We have latencies on our dashboards for the API server and etcd too but didn't try to correlate those just yet. If we can make this change, this will become a lot easier indeed. 💯 |
Looks great! This would be very handy (: |
/approve also from my side 👍 (not sure if that approval works or not) |
Looks good from instrumentation side. /lgtm |
/assign @lavalamp |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: s-urbaniak, tkashem, wojtek-t The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
What type of PR is this?
/kind bug
What this PR does / why we need it:
etcd_request_duration_seconds uses the default buckets provided by the prometheus client library: DefBuckets = []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10}. The maximum bucket size is 10s. On the other hand, apiserver_request_duration_seconds uses more fine-grained bucket sizes and the maximum bucket size is 60s.
The left panel shows latency for the Deployment-DELETE API (metric=apiserver_request_duration_seconds), which is taking about 40s to complete. On the other hand, etcd latency (metric=etcd_request_duration_seconds) for the same object, apps.Deployment-delete, is capped at 10s. Now the difference in latency is hard to account for. It could be latency from etcd, but we can't answer this question by looking at the metrics. If the etcd metric had similar bucket sizes, we could account for the difference in latency.
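To make the capping effect concrete, here is a minimal stdlib-only sketch of how an observation maps to a histogram bucket. It assumes the usual Prometheus convention that an observation is counted in the smallest bucket whose upper bound is greater than or equal to it. defBuckets copies the DefBuckets values quoted above; extendedBuckets and bucketFor are hypothetical names for illustration, and the extended layout is not the bucket list this PR actually introduces.

```go
package main

import (
	"fmt"
	"sort"
)

// defBuckets mirrors the DefBuckets values quoted in the PR description.
var defBuckets = []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10}

// extendedBuckets is a HYPOTHETICAL layout: every default bucket is kept and
// a few larger upper bounds are appended, up to 60s. It is an assumption for
// illustration only, not the layout chosen by the PR.
var extendedBuckets = append(append([]float64{}, defBuckets...), 15, 20, 30, 45, 60)

// bucketFor returns the upper bound of the smallest bucket that contains the
// observation. ok is false when the observation exceeds every finite bound,
// meaning it is only visible in the implicit +Inf bucket.
func bucketFor(buckets []float64, seconds float64) (le float64, ok bool) {
	// SearchFloat64s returns the smallest index i with buckets[i] >= seconds.
	i := sort.SearchFloat64s(buckets, seconds)
	if i == len(buckets) {
		return 0, false
	}
	return buckets[i], true
}

func main() {
	const observed = 40.0 // seconds, similar to the slow DELETE described above

	if le, ok := bucketFor(defBuckets, observed); ok {
		fmt.Printf("default buckets: le=%v\n", le)
	} else {
		fmt.Println("default buckets: only in +Inf (all we know is > 10s)")
	}

	if le, ok := bucketFor(extendedBuckets, observed); ok {
		fmt.Printf("extended buckets: le=%v\n", le)
	} else {
		fmt.Println("extended buckets: only in +Inf")
	}
}
```

With the default buckets, a 40s observation lands only in the +Inf bucket, so the metric can say no more than "longer than 10s". With the hypothetical extended layout, the same observation falls into a finite bucket and can be compared with the API server histogram.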
This PR makes the bucket sizes for both metrics similar. Also, no existing bucket for etcd_request_duration_seconds was dropped.
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: