Move retries to tink-server: #196
Having retries in both tink-server and tink-agent caused noticeable delays in running Workflow Actions. With retries in tink-server only, the delays disappeared. Benchmarking showed improved performance as well: a rudimentary benchmark showed that close to 2000 concurrent Agents performed fine. This is not an official benchmark, as the backing Kubernetes cluster plays a significant part in the performance. Signed-off-by: Jacob Weinstock <[email protected]>
Codecov Report
Coverage diff, main vs. #196:
| Metric | main | #196 | +/- |
|---|---|---|---|
| Coverage | 46.13% | 46.08% | -0.05% |
| Files | 100 | 100 | |
| Lines | 8439 | 8410 | -29 |
| Hits | 3893 | 3876 | -17 |
| Misses | 4308 | 4297 | -11 |
| Partials | 238 | 237 | -1 |
Signed-off-by: Jacob Weinstock <[email protected]>
Pull Request Overview
This PR centralizes retry logic in the server by removing client-side retries in the agent and standardizing server backoff behavior.
- Remove `RetryOptions` and retry loops from agent transport and tests
- Update server-side gRPC handlers to use a constant backoff with 1 s intervals and a 1 min max elapsed time
- Adjust tests to reflect removal of agent-side retry configuration
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tink/server/internal/grpc/grpc.go | Switched to constant backoff and reduced max elapsed time values |
| tink/agent/internal/transport/grpc/grpc.go | Removed client retry logic, related fields, and imports |
| tink/agent/internal/transport/grpc/grpc_test.go | Deleted test retry configurations and backoff import |
| tink/agent/agent.go | Dropped RetryInterval setting |
Comments suppressed due to low confidence (1)
tink/server/internal/grpc/grpc.go:61
- Consider adding unit tests for retry behavior in `Handler.GetAction` and `Handler.ReportActionStatus` to verify the configured backoff parameters and ensure transient errors are retried as expected.
if len(h.RetryOptions) == 0 {
fe84a6e to 2ab3836
This provides a more responsive Workflow run. Manual scale tests didn't seem to be affected. Signed-off-by: Jacob Weinstock <[email protected]>
Description
Having retries in both tink-server and tink-agent caused noticeable delays in running Workflow Actions. With retries in tink-server only, the delays disappeared. Benchmarking showed improved performance as well: a rudimentary benchmark showed that over 1000 concurrent Agents performed fine. This is not an official benchmark, as the backing Kubernetes cluster plays a significant part in the performance and the benchmark was done with a local k3d cluster.
Why is this needed
Fixes: #
How Has This Been Tested?
How are existing users impacted? What migration steps/scripts do we need?