@sspaink sspaink commented Apr 18, 2025

Why the changes in this PR are needed?

resolve: #7492

As described in the issue above, if a status API is slow to respond it can cause OPA to be blocked writing to the bulkBundleCh.

What are the changes in this PR?

@mjungsbluth thank you for the suggested change 🥳

Changed the channels to be buffered channels. Reusing the same logic as the decision log event buffer, the UpdateBundleStatus and BulkUpdateBundleStatus methods now add to the buffered channel without ever blocking. If the buffered channel is full, the oldest status update is dropped. If the freed slot is taken by another concurrent call to BulkUpdateBundleStatus, the send is retried up to 1000 times, after which the incoming status event is dropped instead. The call should never block.
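The drop-oldest behavior described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `pushDropOldest` and `maxRetries` are hypothetical names, and only the retry count of 1000 is taken from the description.

```go
package main

import "fmt"

// maxRetries bounds the number of send attempts (hypothetical constant;
// the value 1000 comes from the PR description).
const maxRetries = 1000

// pushDropOldest is a hypothetical helper, not OPA's actual implementation.
// It tries to enqueue v without blocking. When the buffer is full it discards
// the oldest element and tries again. It reports whether v was enqueued.
func pushDropOldest[T any](ch chan T, v T) bool {
	for i := 0; i < maxRetries; i++ {
		select {
		case ch <- v:
			return true
		default:
			// Buffer is full; fall through and make room.
		}
		select {
		case <-ch:
			// Dropped the oldest element. A concurrent sender may take
			// the freed slot, which is why the loop retries.
		default:
			// A concurrent receiver already drained the buffer.
		}
	}
	// Give up and drop the incoming value rather than block.
	return false
}

func main() {
	ch := make(chan int, 3)
	for i := 1; i <= 5; i++ {
		pushDropOldest(ch, i)
	}
	close(ch)
	for v := range ch {
		fmt.Println(v) // prints 3, 4, 5: the two oldest values were dropped
	}
}
```

Both `select` statements have a `default` case, so neither the send nor the drop can block, whatever the consumer is doing.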

The limit is configurable with a new option, buffer_status_limit, which defaults to 10. I'm not sure whether there is a better default; if the status API is slow, it is probably better not to overwhelm it by default?

… blocking

If a status API is slow to respond it can cause OPA to be blocked writing to an unbuffered channel. This fixes it by using a buffered channel that never blocks but drops the oldest status update if full.

Signed-off-by: sspaink <[email protected]>

netlify bot commented Apr 18, 2025

Deploy Preview for openpolicyagent ready!

Name Link
🔨 Latest commit 5d7225f
🔍 Latest deploy log https://app.netlify.com/sites/openpolicyagent/deploys/680817205e3e4500087cbfe9
😎 Deploy Preview https://deploy-preview-7522--openpolicyagent.netlify.app


@johanfylling johanfylling left a comment

LGTM 👍

Just some nits.


// make sure the lastBundleStatuses has been written so the trigger sends the expected status
// otherwise there could be a race condition before the bundle status is written
time.Sleep(10 * time.Millisecond)

This could fail in CI because of an extremely slow agent. Might not be an issue, though.
If this becomes an issue, we could revisit and wait for the right condition rather than sleeping, perhaps by looping in the goroutine below until len(fixture.plugin.bulkBundleCh) >= 1 (with an extremely short sleep per iteration) before calling Trigger() (see e.g. test.Eventually()).

@sspaink sspaink Apr 22, 2025

Unfortunately, the race detector doesn't like test.Eventually also reading the lastBundleStatuses variable, and I'm not sure how to resolve that without adding a lock 🤔 We could have p.loop accept a function that sets the channels, so the test can swap in a version that uses a mutex while the regular code uses one that doesn't.

@sspaink sspaink Apr 22, 2025

I just pushed a commit with a possible alternative: 5afa765

It removes the need for p.loop and performs the same steps within the test itself, so we don't have to worry about the uncertainty of another goroutine. I think it still tests the same thing overall? I split the tests because only the second half needs this. Not having a sleep in a test would help me sleep better at night, haha.

@johanfylling johanfylling left a comment

Nice! 👍

go func() {
_ = fixture.plugin.Trigger(ctx)
_ = fixture.plugin.Trigger(context.Background())
}()

I might not see the full picture here, but I don't understand why we need this trigger routine and can't simply call fixture.plugin.oneShot(context.Background()) in the other routine. I think the test asserts what it's supposed to assert though, so won't hold this PR up because of this nit.

@johanfylling johanfylling merged commit d65888c into open-policy-agent:main Apr 23, 2025
70 of 74 checks passed
Development

Successfully merging this pull request may close these issues.

Slow Status Service API Delays Bundle Activation
3 participants