alexzielenski
Member

@alexzielenski commented Jan 24, 2023

Discovered by the flaking test TestRemoveAPIService.

Caused by the following sequence:

  1. Add the APIService to the map
  2. Begin the async fetch (which takes a long time in the flaking case)
  3. Remove the APIService from the map
  4. Finish the async fetch and store the APIService back in the map to update its lastReconciled time
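The four steps can be reproduced deterministically in a simplified model. This is a minimal sketch, not the controller's code: `manager`, `serviceInfo`, `syncBuggy` and the service name are hypothetical, and channels replace the sleeps used in this PR to force the interleaving.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// serviceInfo is a hypothetical stand-in for one map entry; lastReconciled
// mirrors the field this PR removes.
type serviceInfo struct {
	lastReconciled time.Time
}

// manager is a hypothetical, simplified model of the controller's map.
type manager struct {
	mu       sync.Mutex
	services map[string]serviceInfo
}

func (m *manager) add(name string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.services[name] = serviceInfo{}
}

func (m *manager) remove(name string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.services, name)
}

func (m *manager) has(name string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	_, ok := m.services[name]
	return ok
}

// syncBuggy copies the entry, runs a slow "fetch" without holding the lock,
// then unconditionally writes the copy back to record lastReconciled.
func (m *manager) syncBuggy(name string, started chan<- struct{}, release <-chan struct{}) {
	m.mu.Lock()
	info, ok := m.services[name]
	m.mu.Unlock()
	close(started) // signal that the fetch has begun (step 2)
	if !ok {
		return
	}
	<-release // stands in for a long-running fetchFreshDiscoveryForService
	info.lastReconciled = time.Now()
	m.mu.Lock()
	m.services[name] = info // step 4: stale write-back re-inserts the entry
	m.mu.Unlock()
}

// reproduce forces steps 1-4 in order and reports whether the removed
// service is back in the map afterwards.
func reproduce() bool {
	const name = "v1.example.com"
	m := &manager{services: map[string]serviceInfo{}}
	started := make(chan struct{})
	release := make(chan struct{})
	done := make(chan struct{})

	m.add(name) // step 1
	go func() {
		m.syncBuggy(name, started, release) // step 2
		close(done)
	}()
	<-started
	m.remove(name) // step 3
	close(release) // let the fetch finish
	<-done         // step 4 has run
	return m.has(name)
}

func main() {
	fmt.Println("removed service resurrected:", reproduce())
}
```

Running it prints "removed service resurrected: true": the removal in step 3 is silently undone by the write-back in step 4.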

Fixes this by removing lastReconciled (only used for testing), so the entry no longer needs to be written back after a fetch, and by switching the tests to simply wait until the dirty queue is empty.

I had to add sleeps in TestRemoveAPIService, and after fetchFreshDiscoveryForService to simulate a long-running request, to trigger the bug, since stress could not reproduce it on my system.

What type of PR is this?

/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes:

flaky test initially discovered in #115230 (comment)

but it's also a bug outside of tests

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot added the release-note-none, kind/bug, size/M, cncf-cla: yes, do-not-merge/needs-sig, needs-triage and needs-priority labels Jan 24, 2023
@alexzielenski
Member Author

/sig api-machinery
/cc @Jefftree @apelisse

@k8s-ci-robot added the sig/api-machinery label and removed the do-not-merge/needs-sig label Jan 24, 2023
Member Author

@alexzielenski Jan 24, 2023

This was the problematic line of code.

If the APIService was removed (for instance, during a long-running fetchFreshDiscoveryForService call), this line would erroneously add it back.

Contributor

Is lastReconciled ever set now?

Member Author

@alexzielenski Jan 25, 2023

It has been removed. It was only used for tests and has been replaced by AllServicesSynced.

Contributor

ah, great.

@alexzielenski changed the title from "fix race in aggregated discovery handler" to "fix race in aggregated discovery controller" Jan 24, 2023
@apelisse
Member

I think that's good, thanks!

@apelisse
Member

/lgtm

@k8s-ci-robot added the lgtm label Jan 25, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: aaf23b9733607a1ea9a2d1394dcc19a0ce4d0541

@alexzielenski
Member Author

/assign @lavalamp
Please take a look if you have time.

Contributor

If it is only for testing, can you make it unexported and/or move it into the _test.go file?

Member Author

That would require making the tests white-box. I wasn't sure which side of the tradeoff to take. I'll change it over, then.

Contributor

That's unfortunate, but I don't think you want external callers of this as it is.

Member

I'd rather have the test use an internal method than a function exported for the sake of an external test?

Member Author

Correct. I'm moving it over.

Member Author

I've removed AllServicesSynced and replaced it with an internal test function.

Caused by the following sequence:

1. Add the APIService to the map
2. Begin the async fetch
3. Remove the APIService from the map
4. Finish the async fetch and store the APIService back in the map to update its lastReconciled time

Fixes this by removing lastReconciled (only used for testing) and switching the tests to simply wait until the dirty queue is empty.
@alexzielenski force-pushed the apiserver/discovery/lastreconciled-race branch from 08cb6d3 to cff4d07 January 25, 2023 01:33
@k8s-ci-robot removed the lgtm label Jan 25, 2023
@k8s-ci-robot requested a review from lavalamp January 25, 2023 01:33
@alexzielenski
Member Author

/retest

PTAL again

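func newDiscoveryManager(rm discoveryendpoint.ResourceManager) *discoveryManager {
	// Unwraps the interface returned by NewDiscoveryManager so white-box
	// tests can reach the concrete *discoveryManager.
	return NewDiscoveryManager(rm).(*discoveryManager)
}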
Contributor

Something to think about for a follow-up: why does this package need both an interface and an implementation struct? Why not just an exported implementation struct?
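To picture the two options in that question, here is a sketch with hypothetical `Manager`/`manager`/`ExportedManager` names (not the package's real types): an interface-returning constructor, the shape of NewDiscoveryManager, where white-box tests must type-assert, versus a constructor that returns an exported struct directly.

```go
package main

import "fmt"

// Option 1 (hypothetical names): the constructor returns an interface, so
// tests that need the concrete type must type-assert.
type Manager interface {
	AddService(name string)
}

type manager struct {
	names []string
}

func (m *manager) AddService(name string) { m.names = append(m.names, name) }

func NewManager() Manager { return &manager{} }

// Option 2: export the struct and return it directly. Callers that want an
// interface can still declare one where they consume it.
type ExportedManager struct {
	names []string
}

func (m *ExportedManager) AddService(name string) { m.names = append(m.names, name) }

func NewExportedManager() *ExportedManager { return &ExportedManager{} }

func main() {
	m := NewManager().(*manager) // assertion needed to reach unexported state
	m.AddService("v1.example.com")

	var _ Manager = NewExportedManager() // the struct still satisfies the interface
	e := NewExportedManager()             // no assertion needed
	e.AddService("v1.example.com")

	fmt.Println(len(m.names), len(e.names)) // 1 1
}
```

Returning the struct removes the type assertion, at the cost of exposing every exported method and field to all callers; the interface keeps the public surface narrower.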

@lavalamp
Contributor

/lgtm
/approve
/retest

@k8s-ci-robot added the lgtm label Jan 25, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 366bcfee710586f5919056696d681f1416554868

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alexzielenski, lavalamp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label Jan 25, 2023
@k8s-triage-robot

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does not have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest
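The retest rules listed above read as a single predicate. A minimal sketch, assuming a hypothetical `pr` type that holds label names and a flag for failing required tests; it paraphrases the listed rules (reading the first one as "does not have any do-not-merge/* labels") and is not the bot's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// pr is a hypothetical summary of the PR state the rules refer to.
type pr struct {
	labels          []string
	failingRequired bool
}

func (p pr) has(label string) bool {
	for _, l := range p.labels {
		if l == label {
			return true
		}
	}
	return false
}

// eligibleForRetest encodes the five listed rules.
func eligibleForRetest(p pr) bool {
	for _, l := range p.labels {
		if strings.HasPrefix(l, "do-not-merge/") {
			return false // rule 1: no do-not-merge/* labels
		}
	}
	if p.has("needs-ok-to-test") || p.has("needs-rebase") {
		return false // rules 2 and 3
	}
	if !p.has("cncf-cla: yes") || !p.has("lgtm") || !p.has("approved") {
		return false // rule 4: approved
	}
	return p.failingRequired // rule 5: failing tests required for merge
}

func main() {
	p := pr{labels: []string{"cncf-cla: yes", "lgtm", "approved", "kind/bug"}, failingRequired: true}
	fmt.Println(eligibleForRetest(p)) // true
}
```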

@alexzielenski
Member Author

/retest

@alexzielenski
Member Author

/retest

@k8s-ci-robot merged commit eb4e2a2 into kubernetes:master Jan 26, 2023
@k8s-ci-robot added this to the v1.27 milestone Jan 26, 2023
@cici37
Contributor

cici37 commented Jan 26, 2023

/triage accepted

@k8s-ci-robot added the triage/accepted label and removed the needs-triage label Jan 26, 2023
@liggitt
Member

liggitt commented Feb 14, 2023

Thanks for the fix. The test is still flaking on release-1.26 (https://storage.googleapis.com/k8s-triage/index.html?pr=1&test=TestRemoveAPIService). Can we backport the fix, if only to stabilize CI?

Labels: approved, cncf-cla: yes, kind/bug, lgtm, needs-priority, release-note-none, sig/api-machinery, size/M, triage/accepted

7 participants