simonswine · 2016-11-18T15:12:25Z

What this PR does / why we need it:

We are using preStop lifecycle hooks to gracefully remove a node from a cluster. This hook is potentially long-running, and once the preStop hook has fired, DNS resolution for the soon-to-be-stopped Pod fails, which causes the hook to fail.

Special notes for your reviewer:

Would be great to backport that to 1.4, 1.3

Release note:

Endpoints that tolerate unready Pods now list Pods in state Terminating as well

@bprashanth

k8s-ci-robot · 2016-11-18T15:59:16Z

Jenkins kops AWS e2e failed for commit c44e81f3a5e3bd7da9358790efeea8bc3d69168b. Full PR test history.

The magic incantation to run this job again is @k8s-bot kops aws e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

k8s-ci-robot · 2016-11-18T16:02:46Z

Jenkins GCI GCE e2e failed for commit c44e81f3a5e3bd7da9358790efeea8bc3d69168b. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

k8s-ci-robot · 2016-11-18T16:04:45Z

Jenkins GCE e2e failed for commit c44e81f3a5e3bd7da9358790efeea8bc3d69168b. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

k8s-ci-robot · 2016-11-18T16:05:16Z

Jenkins GCE etcd3 e2e failed for commit c44e81f3a5e3bd7da9358790efeea8bc3d69168b. Full PR test history.

The magic incantation to run this job again is @k8s-bot gce etcd3 e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

bprashanth · 2016-11-18T16:39:14Z

I'm fine doing this, the annotation effectively means: keep all service entries and dns records for the pod around for as long as you possibly can (ignore readiness, ignore deletion grace etc). I don't think it can make 1.5 though, since we're well past the feature freeze date and this isn't a stabilization fix. Maybe we can just fold this behavior into #25283? We need to graduate the tolerate-unready somehow, anyway.

simonswine · 2016-11-18T17:56:57Z

@bprashanth: tbh I just expected that behaviour when I wrote my preStop hooks, and I was pretty surprised when I found out that DNS is gone and that this is the reason for the balance to fail. So I would see it more as a bug fix than a feature.

Anyhow, I won't be around much for the next few weeks. @mattbates, can you have a look at this PR?

mattbates · 2016-11-25T14:48:30Z

@bprashanth just following up here. Re @simonswine's comment, can we progress this PR as a fix and merge, cherry-picking into 1.3 and 1.4 too ideally?

bprashanth

I'm fine with the PR; it's low risk. I don't know if it's going to make 1.5 because of the timing. It's an alpha feature that we are, with high probability, going to remodel before beta (#25283), so I don't think we should be cherry-picking it into older releases.

The original annotation was to tolerate "unreadiness". This pr bends the definition of "unreadiness" from "ignore failing readiness probes" to "ignore failing readiness probes AND deletion timestamps". A slightly more correct way to do this would be to "ignore readiness AND deletion timestamps on not ready pods", but given that the feature is going to change soon, I'm not sure the distinction matters.

bprashanth · 2016-11-27T19:46:31Z

 				continue
 			}
-			if pod.DeletionTimestamp != nil {
+			if !tolerateUnreadyEndpoints && pod.DeletionTimestamp != nil {


please augment the comment above the annotation definition with:
// Endpoints of Services bearing this annotation retain their DNS
// records and continue receiving traffic for the Service from the moment
// the kubelet starts all containers in the pod and marks it "Running", till the
// kubelet stops all containers and deletes the pod from the apiserver.

smarterclayton · 2016-12-20T19:19:53Z

Please add a test and then this LGTM. You can add it to test/e2e/services.go under It("should create endpoints for unready pods"). There is an image, network-tester, that can arbitrarily delay shutdown; it takes a flag, -delay-shutdown, which is the number of seconds to hold before graceful shutdown.

simonswine · 2017-01-03T13:04:25Z

@smarterclayton thanks for your input on how to prevent test flakes. I've implemented it as you suggested, by modifying the annotation of the service and waiting for the test wget to time out.

smarterclayton · 2017-01-03T15:21:59Z

LGTM thanks

deads2k · 2017-01-03T17:41:01Z

@k8s-bot test this

k8s-github-robot · 2017-01-03T17:45:23Z

Automatic merge from submit-queue (batch tested with PRs 39092, 39126, 37380, 37093, 39237)

smarterclayton · 2017-01-06T03:07:38Z

After reflecting on this a bit more, I think it should be possible for a consumer to request that terminating pods continue to be in the load balancer rotation independent of the annotation. Many applications that can control their shutdown (like one with a very long graceful shutdown period) want to keep taking traffic. I think that's orthogonal to the "ready immediately" setting. It should be possible during shutdown to control when traffic is diverted away; that is not automatically at the very end of termination, nor at the very beginning. I'll spawn an issue, but it may be that tolerate unready becomes a policy (EndpointInclusionPolicy) or a set of orthogonal flags.
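
One way to picture the "set of orthogonal flags" idea is the sketch below. Everything here is an assumption made for illustration: no such API is defined in this PR, only the name `EndpointInclusionPolicy` appears in the comment, and the field names and `Includes` method are invented.

```go
package main

import "fmt"

// EndpointInclusionPolicy is hypothetical: it treats "unready" and
// "terminating" as two independent switches instead of one annotation.
type EndpointInclusionPolicy struct {
	IncludeUnready     bool // list pods whose readiness probe is failing
	IncludeTerminating bool // list pods that have a deletion timestamp
}

// PodState is a hypothetical summary of the two conditions discussed above.
type PodState struct {
	Ready       bool
	Terminating bool
}

// Includes reports whether a pod in the given state stays in rotation.
func (p EndpointInclusionPolicy) Includes(s PodState) bool {
	if !s.Ready && !p.IncludeUnready {
		return false
	}
	if s.Terminating && !p.IncludeTerminating {
		return false
	}
	return true
}

func main() {
	// Keep serving during a long graceful shutdown, but still honour readiness.
	drain := EndpointInclusionPolicy{IncludeTerminating: true}

	fmt.Println(drain.Includes(PodState{Ready: true, Terminating: true}))   // true
	fmt.Println(drain.Includes(PodState{Ready: false, Terminating: false})) // false
}
```

Setting both flags would correspond to the behaviour of the annotation after this PR; setting only `IncludeTerminating` expresses the case described here, which the single annotation cannot.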

simonswine · 2017-01-17T16:29:18Z

@smarterclayton: I think this sounds like the unready handling should have more states than just true and false before being promoted to a spec field in the Service object. Is there an issue to track this effort?

And another thing: do you think this could be cherry-picked into 1.5, or is it not seen as a bugfix as such? If so, I think I can't initiate this myself, as I am not able to add and remove labels.

smarterclayton · 2017-01-17T20:29:51Z

I did not open an issue yet.

It's reasonable to backport, tagging.

k8s-cherrypick-bot · 2017-01-17T20:31:34Z

Removing label cherrypick-candidate because no release milestone was set. This is an invalid state and thus this PR is not being considered for cherry-pick to any release branch. Please add an appropriate release milestone and then re-add the label.

…7093-upstream-release-1.5 Automatic merge from submit-queue Automated cherry pick of #37093 Cherry pick of #37093 on release-1.5. #37093: Fix: With TolerateUnready set, endpoints are still listed for

k8s-cherrypick-bot · 2017-01-20T19:42:17Z

Commit found in the "release-1.5" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

xiang90 · 2017-07-07T06:57:29Z

+	// create a headless Service just for the StatefulSet, and clients shouldn't
+	// be using this Service for anything so unready endpoints don't matter.
+	// Endpoints of these Services retain their DNS records and continue
+	// receiving traffic for the Service from the moment the kubelet starts all


To make self-hosted etcd reliable (an important part of the self-hosted k8s effort), we want DNS to be resolvable from the pod initialization phase (init container) onward. The current implementation does not prevent us from doing that. Basically, I hope that DNS is resolvable from the moment the pod gets its IP until the Pod terminates.

googlebot added the cla: yes label Nov 18, 2016

k8s-github-robot assigned bprashanth Nov 18, 2016

k8s-github-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. release-note-label-needed labels Nov 18, 2016

simonswine force-pushed the fix-tolerate-unready-endpoints-pods-terminating branch from c44e81f to d282f47 Compare November 18, 2016 16:03

bprashanth reviewed Nov 27, 2016

k8s-github-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels Dec 10, 2016

simonswine force-pushed the fix-tolerate-unready-endpoints-pods-terminating branch from d282f47 to e68f748 Compare December 20, 2016 17:53

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 20, 2016

k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Dec 20, 2016

simonswine force-pushed the fix-tolerate-unready-endpoints-pods-terminating branch from e68f748 to a92a0d1 Compare December 28, 2016 16:36

k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 28, 2016

Fix: With TolerateUnready set, endpoints are still listed for a Pod in state terminating

* Otherwise it prevents long running task in a preStop hook to succeed, that require DNS resolution

b44de1e

simonswine force-pushed the fix-tolerate-unready-endpoints-pods-terminating branch from a92a0d1 to b44de1e Compare January 3, 2017 13:00

k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 3, 2017

smarterclayton added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 3, 2017

k8s-github-robot merged commit d6dbd50 into kubernetes:master Jan 3, 2017

bprashanth mentioned this pull request Jan 4, 2017

StatefulSet should allow optional burst mode (don't wait for readiness) #39363

Closed

smarterclayton added the cherrypick-candidate label Jan 17, 2017

k8s-cherrypick-bot removed the cherrypick-candidate label Jan 17, 2017

smarterclayton added this to the v1.5 milestone Jan 17, 2017

smarterclayton added the cherrypick-candidate label Jan 17, 2017

simonswine mentioned this pull request Jan 18, 2017

Automated cherry pick of #37093 #40069

Merged

saad-ali added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Jan 20, 2017

k8s-cherrypick-bot removed the cherrypick-candidate label Jan 20, 2017

xiang90 reviewed Jul 7, 2017
