Deflake tests that need to grab metrics from controller-manager or scheduler #101960
@knight42: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/test pull-kubernetes-e2e-gce-csi-serial
/test pull-kubernetes-e2e-gce-ubuntu-containerd
b9cfbaa to a1e1511
this pod has a well known name, can we run it only when it is necessary?
if metricsProxyPod {
	return nil
}
I'm afraid that we could set a precedent with this approach: new components could ask to add more "helper" pods at the beginning of the e2e run, even though most tests don't need them.
E.g. a Conformance run will create this pod even though it doesn't really need it, IIUC.
This pod is only required if the tests need to fetch metrics from the scheduler or controller-manager, but it is unclear to me how we can know at this point whether the tests need to grab metrics.
I was assuming this was only used by the metrics grabber, so I assumed, maybe wrongly, that we can do it as part of the initialisation of the MetricsGrabber
// NewMetricsGrabber returns new metrics which are initialized.
func NewMetricsGrabber(c clientset.Interface, ec clientset.Interface, kubelets bool, scheduler bool, controllers bool, apiServer bool, clusterAutoscaler bool) (*Grabber, error) {
	SetupMetricsProxy(c)
	// ... (remaining initialisation elided)
}
and inside SetupMetricsProxy(), only create the pod if it doesn't exist
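To make the "only create the pod if it doesn't exist" suggestion concrete, here is a minimal sketch of the idea. It is not the code from this PR: `podStore`, `memStore`, `errNotFound` and `ensureProxyPod` are hypothetical names, and an in-memory map stands in for the real API client so the example stays self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for the API server's "not found" error.
var errNotFound = errors.New("pod not found")

// podStore is a hypothetical, minimal stand-in for the pod client.
type podStore interface {
	Get(name string) error
	Create(name string) error
}

// memStore is an in-memory podStore used only for this illustration.
type memStore struct {
	pods    map[string]bool
	creates int
}

func (m *memStore) Get(name string) error {
	if !m.pods[name] {
		return errNotFound
	}
	return nil
}

func (m *memStore) Create(name string) error {
	m.pods[name] = true
	m.creates++
	return nil
}

// ensureProxyPod creates the pod only when the lookup reports it as missing.
// It returns true if this call created the pod.
func ensureProxyPod(s podStore, name string) (bool, error) {
	err := s.Get(name)
	if err == nil {
		return false, nil // already there, nothing to do
	}
	if !errors.Is(err, errNotFound) {
		return false, fmt.Errorf("looking up %s: %w", name, err)
	}
	if err := s.Create(name); err != nil {
		return false, fmt.Errorf("creating %s: %w", name, err)
	}
	return true, nil
}

func main() {
	s := &memStore{pods: map[string]bool{}}
	// Several callers asking for the proxy pod should trigger a single create.
	for i := 0; i < 3; i++ {
		created, err := ensureProxyPod(s, "metrics-proxy")
		fmt.Println("created:", created, "err:", err)
	}
	fmt.Println("create calls:", s.creates)
}
```

With a real client there is still a window between the lookup and the create, so a real version would presumably also have to tolerate an "already exists" error from the create call.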
yeah, I think we could do that.
this doesn't seem to be used later
Line 70 logs the component's name, but if you like, I could remove this field, since we can tell which component it is from the port number.
No no no, I missed that, sorry. I thought it wasn't used at all.
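For reference, the remark above about telling the component from the port number could look roughly like the sketch below. The names `componentByPort` and `componentName` are hypothetical, and the ports are the upstream default secure ports of the two components (10257 for kube-controller-manager, 10259 for kube-scheduler); whether this PR uses exactly these values is an assumption, since the ports are not shown in this thread.

```go
package main

import "fmt"

// componentByPort maps default secure ports to component names.
// Assumption: the cluster runs both components on their default ports.
var componentByPort = map[int]string{
	10257: "kube-controller-manager",
	10259: "kube-scheduler",
}

// componentName returns a human-readable name for logging purposes.
func componentName(port int) string {
	if name, ok := componentByPort[port]; ok {
		return name
	}
	return fmt.Sprintf("unknown component on port %d", port)
}

func main() {
	for _, p := range []int{10257, 10259, 8080} {
		fmt.Printf("%d -> %s\n", p, componentName(p))
	}
}
```

Keeping an explicit name field, as the PR does, avoids relying on this mapping when a cluster uses non-default ports.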
currently in-tree storage tests for Windows are failing. Seems related to this? https://testgrid.kubernetes.io/google-windows#gce-windows-2019-containerd-master The error is
/cc
/reopen
@knight42: Reopened this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/priority critical-urgent Would you update the description with |
/lgtm
/approve
Let's merge this as an interim improvement while discussing in #102050 what the long-term solution should look like.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: knight42, pacoxu, pohly The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Hmm, even with this PR there was a "MetricsGrabber should grab all metrics from a ControllerManager" failure in https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/101960/pull-kubernetes-e2e-gce-ubuntu-containerd/1397803874319863808/
@pohly According to the logs https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/101960/pull-kubernetes-e2e-gce-ubuntu-containerd/1397803874319863808/build-log.txt
the scheduler pod showed up with an empty IP, so nginx failed to forward the request. It seems that we have to wait until the IPs of both pods are populated.
pod.Status.PodIP is not set correctly in the metrics-proxy.
/hold
IIRC the WaitForRunningAndReady helper will make it
What type of PR is this?
/kind cleanup
/kind flake
What this PR does / why we need it:
According to the logs in this test https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/101960/pull-kubernetes-e2e-gce-ubuntu-containerd/1392808375145730048/:
It seems that the scheduler pod had not shown up yet when we set up the e2e test suite, so the nginx config for the forwarder pod was incomplete.
This PR makes the SetupMetricsProxy function wait for the desired component pods to show up first.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: