Deflake tests in staging/src/k8s.io/kube-aggregator/pkg/apiserver
#115859
Conversation
In a super quick look, I can observe that you have getters to the controller cache; maybe that can give you a way to confirm that all items were processed?
While this blocks until all items in the queue have synced, I don't think it is the right general solution. waitForEmptyQueue needs to be modified to wait for the queue to stop processing; otherwise the other tests that use waitForEmptyQueue will still be affected.
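For context, a minimal sketch of the race being described here, using the generic client-go workqueue (names like syncAPIService are illustrative, not the kube-aggregator code): a worker removes an item with Get() before it starts syncing it, so a waiter that only polls Len() == 0 can return while the sync is still in flight.

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/workqueue"
)

// syncAPIService stands in for the real per-item sync; it is hypothetical.
func syncAPIService(item interface{}) {
	time.Sleep(100 * time.Millisecond) // simulate slow reconciliation
}

func main() {
	q := workqueue.New()
	q.Add("apiservice-a")

	go func() {
		item, _ := q.Get()   // the item leaves the queue here...
		syncAPIService(item) // ...but its sync has not finished yet
		q.Done(item)
	}()

	time.Sleep(10 * time.Millisecond)
	// Likely prints 0 even though syncAPIService is still running,
	// which is why polling Len() == 0 alone is not enough.
	fmt.Println("queue length:", q.Len())
}
```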
I think so too. I am trying to solve this in a more general way. See #115859 (comment)
I think the cache-based way does not look clear for now. Actually, I am struggling to solve this issue in a more general way (the current fix looks like just a hack to me). I feel that there are a few workarounds,
Option 3 looks like the better approach to me... what do you think?
Applied 3. One concern is that I am not sure it is OK to add a method to
There are some other implementations for
Added completerWorkQueue to check if the workqueue is complete for testing purposes.
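For readers following along, a rough sketch of the completer-workqueue idea under discussion (assumed shape only; the implementation in the PR may differ): wrap the work queue, remember every added item, and only report completion once Done has been called for all of them.

```go
package apiserver

import (
	"sync"

	"k8s.io/client-go/util/workqueue"
)

// completerWorkqueue wraps a workqueue and tracks every added item until it
// has been marked Done, so a test can ask whether all work has truly finished.
// This is a sketch; the merged version may wrap a rate-limiting queue instead.
type completerWorkqueue struct {
	workqueue.Interface

	lock       sync.Mutex
	processing map[interface{}]struct{}
}

func newCompleterWorkqueue(q workqueue.Interface) *completerWorkqueue {
	return &completerWorkqueue{Interface: q, processing: map[interface{}]struct{}{}}
}

func (q *completerWorkqueue) Add(item interface{}) {
	q.lock.Lock()
	q.processing[item] = struct{}{}
	q.lock.Unlock()
	q.Interface.Add(item)
}

func (q *completerWorkqueue) Done(item interface{}) {
	q.lock.Lock()
	delete(q.processing, item)
	q.lock.Unlock()
	q.Interface.Done(item)
}

// isComplete reports whether every item that was ever added has been Done'd.
func (q *completerWorkqueue) isComplete() bool {
	q.lock.Lock()
	defer q.lock.Unlock()
	return len(q.processing) == 0
}
```

Tracking items from Add rather than from Get avoids the window in which an item has already left the queue but its sync is still running.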
Force-pushed from f2d6989 to 8a2bc2a: waitForQueueComplete to check if the workqueue is complete
Added the test that you suggested and changed the commit message.
/lgtm
LGTM label has been added. Git tree hash: 815e55710d8c07d51424a4e0f0fc2b08200cb486
diff --git a/staging/src/k8s.io/kube-aggregator/pkg/apiserver/handler_discovery_test.go b/staging/src/k8s.io/kube-aggregator/pkg/apiserver/handler_discovery_test.go
index 9bd48972404..c07993779d7 100644
--- a/staging/src/k8s.io/kube-aggregator/pkg/apiserver/handler_discovery_test.go
+++ b/staging/src/k8s.io/kube-aggregator/pkg/apiserver/handler_discovery_test.go
@@ -18,6 +18,7 @@ package apiserver
import (
"context"
"net/http"
"net/http/httptest"
"strconv"
@@ -44,12 +45,14 @@ func newDiscoveryManager(rm discoveryendpoint.ResourceManager) *discoveryManager
return NewDiscoveryManager(rm).(*discoveryManager)
}
-// Returns true if the queue of services to sync empty this means everything has
-// been reconciled and placed into merged document
-func waitForEmptyQueue(stopCh <-chan struct{}, dm *discoveryManager) bool {
+// Returns true when it reaches the number of the cached services
+func waitForCacheNumber(stopCh <-chan struct{}, n int, dm *discoveryManager) bool {
return cache.WaitForCacheSync(stopCh, func() bool {
// Once items have successfully synced they are removed from queue.
- return dm.dirtyAPIServiceQueue.Len() == 0
+ dm.resultsLock.Lock()
+ defer dm.resultsLock.Unlock()
+ return len(dm.cachedResults) == n
})
}
@@ -63,7 +66,6 @@ func TestBasic(t *testing.T) {
service2.SetGroups(apiGroup2.Items)
aggregatedResourceManager := discoveryendpoint.NewResourceManager()
aggregatedManager := newDiscoveryManager(aggregatedResourceManager)
-
for _, g := range apiGroup1.Items {
for _, v := range g.Versions {
aggregatedManager.AddAPIService(&apiregistrationv1.APIService{
@@ -103,7 +105,7 @@ func TestBasic(t *testing.T) {
go aggregatedManager.Run(testCtx.Done())
- require.True(t, waitForEmptyQueue(testCtx.Done(), aggregatedManager))
+ require.True(t, waitForCacheNumber(testCtx.Done(), 2, aggregatedManager))
response, _, parsed := fetchPath(aggregatedResourceManager, "")
if response.StatusCode != 200 {
@@ -159,7 +161,9 @@ func TestDirty(t *testing.T) {
defer cancel()
go aggregatedManager.Run(testCtx.Done())
- require.True(t, waitForEmptyQueue(testCtx.Done(), aggregatedManager))
+ require.True(t, waitForCacheNumber(testCtx.Done(), 1, aggregatedManager))
+ // time.Sleep(1 * time.Second)
+ fmt.Println("POST", aggregatedManager.apiServices, aggregatedManager.cachedResults)
// immediately check for ping, since Run() should block for local services
if !pinged.Load() {
@@ -211,7 +215,7 @@ func TestRemoveAPIService(t *testing.T) {
aggregatedManager.RemoveAPIService(s.Name)
}
- require.True(t, waitForEmptyQueue(testCtx.Done(), aggregatedManager))
+ require.True(t, waitForCacheNumber(testCtx.Done(), 0, aggregatedManager))
response, _, parsed := fetchPath(aggyService, "")
if response.StatusCode != 200 {
@@ -293,7 +297,7 @@ func TestLegacyFallback(t *testing.T) {
defer cancel()
go aggregatedManager.Run(testCtx.Done())
- require.True(t, waitForEmptyQueue(testCtx.Done(), aggregatedManager))
+ require.True(t, waitForCacheNumber(testCtx.Done(), 1, aggregatedManager))
// At this point external services have synced. Check if discovery document
// includes the legacy resources
@@ -362,7 +366,7 @@ func TestNotModified(t *testing.T) {
// Important to wait here to ensure we prime the cache with the initial list
// of documents in order to exercise 304 Not Modified
- require.True(t, waitForEmptyQueue(testCtx.Done(), aggregatedManager))
+ require.True(t, waitForCacheNumber(testCtx.Done(), 1, aggregatedManager))
// Now add all groups. We excluded one group before so that AllServicesSynced
// could include it in this round. Now, if AllServicesSynced ever returns
@@ -373,7 +377,7 @@ func TestNotModified(t *testing.T) {
}
// This would wait the full timeout on 1.26.0.
- require.True(t, waitForEmptyQueue(testCtx.Done(), aggregatedManager))
+ require.True(t, waitForCacheNumber(testCtx.Done(), 1, aggregatedManager))
}

I tried the cache approach and it looks easier than the queue.
@aojea while that is a good approach, I think that if we in the future added a test for updating a pre-existing entry (which I'm now likely to add very soon), your alternative would not be sufficient, if I'm reading correctly?
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: alexzielenski, gjkim42, liggitt
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Looks like there are verify errors to resolve.
/hold
This uses atomic.Bool as updating and reading a boolean-type variable concurrently is not thread-safe.
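As a hedged illustration of that fix (the variable and handler names here only stand in for the test's, they are not copied from it), the pattern looks like this:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

func main() {
	// The handler runs on the test server's goroutine while the test reads
	// the flag from its own goroutine; a plain bool would be a data race,
	// atomic.Bool makes both the write and the read safe.
	var pinged atomic.Bool

	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		pinged.Store(true)
		w.WriteHeader(http.StatusOK)
	}))
	defer server.Close()

	resp, err := http.Get(server.URL)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()

	fmt.Println("pinged:", pinged.Load()) // read without a race
}
```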
`waitForEmptyQueue` cannot guarantee that all items in the queue have been synced completely; it only guarantees that all items have been started. This adds `waitForQueueComplete` and implements `completerWorkqueue` to check if the workqueue is complete, to deflake the tests in staging/src/k8s.io/kube-aggregator/pkg/apiserver.
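A sketch of how such a helper could be wired up on top of a completer queue (assumed wiring; the field name dirtyAPIServiceQueue comes from the diff above, and the type assertion is hypothetical):

```go
// Inside the apiserver test package; cache is k8s.io/client-go/tools/cache.
// waitForQueueComplete polls until everything added to the queue has also
// been marked Done, instead of merely waiting for the queue to look empty.
func waitForQueueComplete(stopCh <-chan struct{}, dm *discoveryManager) bool {
	return cache.WaitForCacheSync(stopCh, func() bool {
		return dm.dirtyAPIServiceQueue.(*completerWorkqueue).isComplete()
	})
}
```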
Force-pushed from c40fa3e to e24e3de
That is because #115770 has been merged, which introduced a new test to fix. I have now rebased this PR and fixed the newly added test as well.
Could you re-LGTM this?
/lgtm
LGTM label has been added. Git tree hash: 50f28cfc357d258afd1c1b1a552e24aa905c1da1
/hold cancel
@alexzielenski it was just an alternative because I saw your comment about the workqueue completion, but I don't have a strong opinion; the current option LGTM too.
What type of PR is this?
/kind failing-test
/kind flake
What this PR does / why we need it:
This PR addresses two issues in `handler_discovery_test.go`: (1) it uses `atomic.Bool` where a boolean is updated and read concurrently, and (2) it ensures that `discoveryManager` completes the sync on all items in the queue. More details on issue 2 below:
`waitForEmptyQueue` cannot guarantee that `discoveryManager` completes the sync on all items in the queue; it only guarantees that all items have started to sync. This PR adds `waitForQueueComplete` and implements `completerWorkqueue` to check if the workqueue is complete.
Which issue(s) this PR fixes:
Fixes #115858
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: