Fix TestAggregatedAPIServer flake from cleanup ordering#137336

Open
Jefftree wants to merge 1 commit into kubernetes:master from Jefftree:fix-wardle-test-cleanup-ordering

Jefftree (Member) commented Mar 2, 2026

What type of PR is this?

/kind flake

What this PR does / why we need it:

Fixes a test flake caused by incorrect cleanup ordering. During teardown, defer os.RemoveAll(wardleCertDir) ran before t.Cleanup(cancel), because Go runs a test function's deferred calls as it returns, before any functions registered with t.Cleanup. As a result, the wardle server's cert files were deleted while the server was still running, and the fsnotify file watcher then failed trying to re-watch the deleted files.

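The fix swaps the two mechanisms: defer cancel() cancels the context as the test function returns, signaling the server to shut down, and t.Cleanup then calls os.RemoveAll(wardleCertDir), so the cert files are removed only after the context has been cancelled.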

Relevant lines from the log:

I0223 16:12:33.409391   40862 dynamic_serving_content.go:195] "Failed to remove file watch, it may have been deleted" file="/tmp/test-integration-wardle-server2347956986/apiserver.key" err="fsnotify: can't remove non-existent watch: /tmp/test-integration-wardle-server2347956986/apiserver.key"
...
E0223 16:12:33.409701   40862 dynamic_serving_content.go:221] "Unhandled Error" err="key failed with : open /tmp/test-integration-wardle-server2347956986/apiserver.key: no such file or directory" logger="UnhandledError"
...
E0223 16:12:33.413012   40862 dynamic_serving_content.go:144] "Failed to watch cert and key file, will retry later" err="error adding watch for file /tmp/test-integration-wardle-server2347956986/apiserver.key: no such file or directory"

Not 100% sure this will fix the flake, but I think it improves our edge case handling.

Which issue(s) this PR is related to:

Fixes #137207

Special notes for your reviewer:

N/A

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

NONE

k8s-ci-robot (Contributor) commented:

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

k8s-ci-robot added labels Mar 2, 2026:
do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.)
release-note-none (Denotes a PR that doesn't merit a release note.)
size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.)
kind/flake (Categorizes issue or PR as related to a flaky test.)
do-not-merge/needs-sig (Indicates an issue or PR lacks a `sig/foo` label and requires one.)
needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.)

k8s-ci-robot (Contributor) commented:

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot added the needs-priority label (Indicates a PR lacks a `priority/foo` label and requires one.) Mar 2, 2026

k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Jefftree
Once this PR has been reviewed and has the lgtm label, please assign msau42 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the cncf-cla: yes label (Indicates the PR's author has signed the CNCF CLA.) Mar 2, 2026
k8s-ci-robot requested review from SataQiu and wojtek-t March 2, 2026 03:59
k8s-ci-robot added the area/test and sig/testing (Categorizes an issue or PR as relevant to SIG Testing.) labels and removed the do-not-merge/needs-sig (Indicates an issue or PR lacks a `sig/foo` label and requires one.) label Mar 2, 2026
Jefftree marked this pull request as ready for review March 2, 2026 15:11
k8s-ci-robot removed the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) Mar 2, 2026

During test teardown, defer os.RemoveAll(wardleCertDir) ran before
t.Cleanup(cancel), deleting the wardle server's certificate files while
the server was still running. The fsnotify file watcher would then fail
trying to re-add watches on the deleted files.

Fix the ordering by using defer cancel() (runs in the defer phase) and
t.Cleanup for os.RemoveAll (runs after all defers). This ensures the
context is cancelled first, signaling the wardle server to shut down,
before its certificate files are removed.

Jefftree (Member, Author) commented Mar 2, 2026

/assign @BenTheElder

Labels

area/test
cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
kind/flake (Categorizes issue or PR as related to a flaky test.)
needs-priority (Indicates a PR lacks a `priority/foo` label and requires one.)
needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.)
release-note-none (Denotes a PR that doesn't merit a release note.)
sig/testing (Categorizes an issue or PR as relevant to SIG Testing.)
size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.)

Development

Successfully merging this pull request may close these issues.

TestAggregatedAPIServer/WithWardleFeatureGateAtV1.0 flake

3 participants