
shashidharatd · 2016-09-23T07:24:48Z

What this PR does / why we need it: Fixes a memory leak

Which issue this PR fixes (optional, in fixes #<issue number>(, #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #33186

Special notes for your reviewer: New goroutines are created every second and block waiting for the lock in the event queue. Only one worker gets the lock when there are events to process, so all the other goroutines created each second wait for the lock forever, which causes the memory/goroutine leak.

As a fix, a new worker is created only when no worker exists for the cluster. Each cluster has at most one worker, which either waits for an event or processes all pending events and then exits.

Fixes memory/goroutine leak in Federation Service controller.
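
A minimal sketch of the one-worker-per-cluster idea described above, not the controller's actual code. The `controller` type, `tick`, `workerMap`, and `workerDone` are hypothetical names loosely modelled on the `endpointWorkerMap` and `endpointWorkerDoneChan` identifiers visible in the diff. It assumes a periodic caller that first drains exit notifications and then starts a worker only for clusters that have none.

```go
package main

import (
	"fmt"
	"sync"
)

// controller is a hypothetical stand-in for the service controller.
type controller struct {
	workerMap  map[string]bool // cluster name -> a worker is currently running
	workerDone chan string     // workers announce their exit on this channel
	started    int             // goroutines launched so far (illustration only)
	wg         sync.WaitGroup
}

func newController(maxClusters int) *controller {
	return &controller{
		workerMap: make(map[string]bool),
		// Buffered so an exiting worker never blocks on the send:
		// at most one exit notice per cluster can be pending.
		workerDone: make(chan string, maxClusters),
	}
}

// tick models one periodic invocation (once per second in the PR).
func (c *controller) tick(clusters []string, work func(cluster string)) {
	// Process all pending exit notifications without blocking.
	eventPending := true
	for eventPending {
		select {
		case name := <-c.workerDone:
			c.workerMap[name] = false
		default:
			eventPending = false
		}
	}
	for _, name := range clusters {
		if c.workerMap[name] {
			continue // a worker already exists; do not start another
		}
		c.workerMap[name] = true
		c.started++
		c.wg.Add(1)
		go func(name string) {
			defer c.wg.Done()
			work(name)
			c.workerDone <- name
		}(name)
	}
}

func main() {
	c := newController(100)
	release := make(chan struct{})
	clusters := []string{"cluster-a", "cluster-b"}
	// Five ticks while the workers are still blocked start only two goroutines.
	for i := 0; i < 5; i++ {
		c.tick(clusters, func(string) { <-release })
	}
	fmt.Println("goroutines started:", c.started)
	close(release)
	c.wg.Wait()
}
```

Without the `workerMap` check, each of the five ticks in `main` would start a fresh pair of goroutines that block behind the first pair, which is the shape of the leak this PR describes.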


k8s-ci-robot · 2016-09-23T07:24:49Z

Can a kubernetes member verify that this patch is reasonable to test? If so, please reply with "@k8s-bot ok to test" on its own line.

Regular contributors should join the org to skip this step.

While we transition away from the Jenkins GitHub PR Builder plugin, "ok to test" commenters will need to be on the admin list defined in this file.

googlebot · 2016-09-23T07:24:50Z

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.

If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
If you signed the CLA as a corporation, please let us know the company's name.

shashidharatd · 2016-09-23T07:37:41Z

I signed it!

googlebot · 2016-09-23T07:37:45Z

CLAs look good, thanks!

madhusudancs

This PR drives home the urgency to move the federated service controller to the new FederatedInformer architecture.

madhusudancs · 2016-09-23T07:51:08Z

 func (sc *ServiceController) clusterEndpointWorker() {
-	fedClient := sc.federationClient
+	// process all pending events in endpointWorkerDoneChan
+	eventPending := true


What's the point of this variable? It's not really useful. Just get rid of it and change the loop to

for { ... }

The break statement in the default case only breaks out of the select statement, so the goroutine would stay stuck in the for loop forever. We need this variable to exit the for loop.

Oops, you are right. This isn't a common idiom, which is why it threw me off guard.

What you have is fine, but if you want to stay with a commonly used idiom, please see https://golang.org/ref/spec#Break_statements.
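
For reference, a small sketch comparing the two ways of leaving a `for` loop from inside a `select` that this thread discusses: the flag variable used in the diff and the labeled `break` described in the Go spec section linked above. The function names `drainWithFlag` and `drainWithLabel` are made up for illustration and do not appear in the PR.

```go
package main

import "fmt"

// drainWithFlag mirrors the approach in the diff: a boolean controls the loop.
func drainWithFlag(ch <-chan string) []string {
	var got []string
	pending := true
	for pending {
		select {
		case v := <-ch:
			got = append(got, v)
		default:
			// A bare "break" here would only leave the select,
			// so the loop condition is cleared instead.
			pending = false
		}
	}
	return got
}

// drainWithLabel uses a labeled break to leave the enclosing for loop.
func drainWithLabel(ch <-chan string) []string {
	var got []string
drain:
	for {
		select {
		case v := <-ch:
			got = append(got, v)
		default:
			break drain // terminates the labeled for statement, not just the select
		}
	}
	return got
}

func main() {
	ch := make(chan string, 3)
	ch <- "a"
	ch <- "b"
	fmt.Println(drainWithFlag(ch))
	ch <- "c"
	fmt.Println(drainWithLabel(ch))
}
```

Both functions return as soon as the channel has nothing buffered; the choice between them is a matter of style.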

madhusudancs · 2016-09-23T08:01:41Z

+	}
+
 	for clusterName, cache := range sc.clusterCache.clientMap {
+		workerExist, keyFound := sc.endpointWorkerMap[clusterName]


Change keyFound to found.

madhusudancs · 2016-09-23T08:02:21Z

+		if keyFound && workerExist {
+			continue
+		}
+		sc.endpointWorkerMap[clusterName] = true


Set this after starting the goroutine, just to be safe.

madhusudancs · 2016-09-23T08:04:24Z

-	fedClient := sc.federationClient
+	// process all pending events in serviceWorkerDoneChan
+	eventPending := true
+	for eventPending {


Same comment as above.

madhusudancs · 2016-09-23T08:04:48Z

+	}
+
 	for clusterName, cache := range sc.clusterCache.clientMap {
+		workerExist, keyFound := sc.serviceWorkerMap[clusterName]


Same as above. Change to found.

madhusudancs · 2016-09-23T08:18:18Z

 		go func(cache *clusterCache, clusterName string) {
+			fedClient := sc.federationClient
 			for {
-				func() {


This function is important here.

Notice the defer statement inside this block. This func() exists for that sole reason. At the end of each loop, we need to say we are done for that key, so we defer cache.endpointQueue.Done(key). If you remove the func() {} wrapping, defer will be run at the end of the enclosing goroutine and the keys won't be removed from the queue until then. So I don't think you should be removing the func() {} wrapping.

agreed, missed that, will change
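
The point about `defer` can be seen in a small sketch. `fakeQueue` below is a hypothetical, single-goroutine stand-in for the real work queue (it is not the Kubernetes workqueue API); it only tracks which keys are marked as in flight between `Get` and `Done`.

```go
package main

import "fmt"

// fakeQueue is a hypothetical stand-in for the controller's work queue.
type fakeQueue struct {
	pending    []string
	processing map[string]bool // keys handed out by Get and not yet Done
}

func newFakeQueue(keys ...string) *fakeQueue {
	return &fakeQueue{pending: keys, processing: make(map[string]bool)}
}

func (q *fakeQueue) Get() (string, bool) {
	if len(q.pending) == 0 {
		return "", false
	}
	key := q.pending[0]
	q.pending = q.pending[1:]
	q.processing[key] = true
	return key, true
}

func (q *fakeQueue) Done(key string) { delete(q.processing, key) }

// peakWrapped wraps each iteration in a function literal, so the deferred
// Done runs at the end of every iteration. It returns the largest number of
// keys that were in flight at the same time.
func peakWrapped(q *fakeQueue) int {
	peak := 0
	for {
		more := func() bool {
			key, ok := q.Get()
			if !ok {
				return false
			}
			defer q.Done(key) // runs when this function literal returns
			if n := len(q.processing); n > peak {
				peak = n
			}
			return true
		}()
		if !more {
			return peak
		}
	}
}

// peakUnwrapped defers directly in the loop body, so every Done is postponed
// until the whole function returns.
func peakUnwrapped(q *fakeQueue) int {
	peak := 0
	for {
		key, ok := q.Get()
		if !ok {
			return peak
		}
		defer q.Done(key) // runs only when peakUnwrapped returns
		if n := len(q.processing); n > peak {
			peak = n
		}
	}
}

func main() {
	fmt.Println(peakWrapped(newFakeQueue("a", "b", "c")))   // 1
	fmt.Println(peakUnwrapped(newFakeQueue("a", "b", "c"))) // 3
}
```

In a worker goroutine that never returns, the unwrapped form would never call `Done` at all, which is why the `func() {}` wrapping matters here.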

madhusudancs · 2016-09-23T08:18:30Z

-					if err != nil {
-						glog.Errorf("Failed to sync service: %+v", err)
-					}
-				}()


Same comment as above.

madhusudancs · 2016-09-23T08:20:03Z

 	KubeAPIQPS    = 20.0
 	KubeAPIBurst  = 30
+
+	maxNoOfClusters = 256


We have informally talked about this before but we have never established this formally. We plan to support 100 clusters initially. 256 seems too high to start with.

sure will change that to 100


shashidharatd · 2016-09-23T09:46:29Z

@madhusudancs handled the review comments in the second commit, plz check

madhusudancs · 2016-09-23T19:20:30Z

@k8s-bot federation gce e2e test this

madhusudancs · 2016-09-24T05:10:56Z

@k8s-bot federation gce e2e test this

k8s-ci-robot · 2016-09-24T06:15:44Z

Jenkins Federation GCE e2e failed for commit 690a06b. Full PR test history.

The magic incantation to run this job again is @k8s-bot federation gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

madhusudancs · 2016-09-24T06:53:19Z

Federation tests that are failing are known failures. These changes seem to not have caused any regressions, so LGTM'ing the PR.

@shashidharatd thanks for these changes!

madhusudancs · 2016-09-24T06:56:40Z

1.4.0 train has already left, so this must be cherry-picked to v1.4.1+.

k8s-github-robot · 2016-09-24T07:53:44Z

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

k8s-ci-robot · 2016-09-24T08:09:01Z

Jenkins GKE smoke e2e failed for commit 690a06b. Full PR test history.

The magic incantation to run this job again is @k8s-bot gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

k8s-github-robot · 2016-09-24T08:30:27Z

Automatic merge from submit-queue

shashidharatd · 2016-09-24T15:44:58Z

Thanks @madhusudancs

ghost · 2016-09-24T17:02:18Z

Added cherrypick-candidate label. As @madhusudancs mentions above, this can wait for v1.4.1. Not sure what labelling/milestone convention we're using to represent that.

jessfraz · 2016-10-06T00:46:21Z

@shashidharatd can you open the PR to cherry-pick this into the release-1.4 branch

madhusudancs · 2016-10-06T03:36:58Z

@jessfraz are we not doing automated cherry-picks any more?

#33163-#33227-#33359-#33605-#33967-#33977-#34158-origin-release-1.4

Automatic merge from submit-queue

Automated cherry pick of #32914 #33163 #33227 #33359 #33605 #33967 #33977 #34158 origin release 1.4

Cherry pick of #32914 #33163 #33227 #33359 #33605 #33967 #33977 #34158 on release-1.4.

#32914: Limit the number of names per image reported in the node
#33163: fix the appending bug
#33227: remove cpu limits for dns pod. The current limits are not
#33359: Fix goroutine leak in federation service controller
#33605: Add periodic ingress reconciliations.
#33967: scheduler: cache.delete deletes the pod from node specified
#33977: Heal the namespaceless ingresses in federation e2e.
#34158: Add missing argument to log message in federated ingress

k8s-cherrypick-bot · 2016-10-06T23:59:30Z

Commit found in the "release-1.4" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.


Fix goroutine leak in federation service controller

d8ff487

googlebot added the cla: no label Sep 23, 2016

k8s-github-robot assigned madhusudancs Sep 23, 2016

k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed labels Sep 23, 2016

googlebot added cla: yes and removed cla: no labels Sep 23, 2016

madhusudancs suggested changes Sep 23, 2016


Handle review comments for Fix goroutine leak in federation service controller

690a06b

k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 23, 2016

madhusudancs added lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels Sep 24, 2016

madhusudancs added this to the v1.4 milestone Sep 24, 2016

k8s-github-robot mentioned this pull request Sep 24, 2016

[k8s.io] Downward API volume should update labels on modification [Conformance] {E2eNode Suite} #33423

Closed

k8s-github-robot merged commit 46c36fc into kubernetes:master Sep 24, 2016

ghost added the cherrypick-candidate label Sep 24, 2016

jessfraz added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Oct 6, 2016

jessfraz mentioned this pull request Oct 6, 2016

Automated cherry pick of #32914 #33163 #33227 #33359 #33605 #33967 #33977 #34158 origin release 1.4 #34266

Merged

k8s-cherrypick-bot removed the cherrypick-candidate label Oct 6, 2016

shashidharatd deleted the federation branch October 19, 2016 13:17


7 participants
