


@jacobweinstock jacobweinstock commented Jun 6, 2025

Description

Having retries in both tink-server and tink-agent caused noticeable delays in running Workflow Actions. With retries in tink-server only, the delays disappeared. Benchmarking also showed improved performance: a rudimentary benchmark showed that over 1000 concurrent Agents performed fine. This is not an official benchmark, as the backing Kubernetes cluster plays a significant part in performance and the benchmark was run against a local k3d cluster.

Why is this needed

Fixes: #

How Has This Been Tested?

How are existing users impacted? What migration steps/scripts do we need?

Checklist:

I have:

  • updated the documentation and/or roadmap (if required)
  • added unit or e2e tests
  • provided instructions on how to upgrade

Having retries in both tink-server
and tink-agent caused noticeable delays
in running Workflow Actions. With retries
in tink-server only, the delays disappeared.
Benchmarking showed improved performance as well.
A rudimentary benchmark showed close to 2000
concurrent Agents performed fine. This is
not an official benchmark, as the backing Kubernetes
cluster plays a significant part in the performance.

Signed-off-by: Jacob Weinstock <[email protected]>

This comment was marked as outdated.


codecov bot commented Jun 6, 2025

Codecov Report

Attention: Patch coverage is 33.33333% with 4 lines in your changes missing coverage. Please review.

Project coverage is 46.08%. Comparing base (9c18a1d) to head (e7e9c4f).
Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| tink/server/internal/grpc/grpc.go | 0.00% | 4 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #196      +/-   ##
==========================================
- Coverage   46.13%   46.08%   -0.05%     
==========================================
  Files         100      100              
  Lines        8439     8410      -29     
==========================================
- Hits         3893     3876      -17     
+ Misses       4308     4297      -11     
+ Partials      238      237       -1     


@jacobweinstock jacobweinstock requested a review from Copilot June 6, 2025 16:26
Signed-off-by: Jacob Weinstock <[email protected]>

Copilot AI left a comment

Pull Request Overview

This PR centralizes retry logic in the server by removing client-side retries in the agent and standardizing server backoff behavior.

  • Remove RetryOptions and retry loops from agent transport and tests
  • Update server-side gRPC handlers to use a constant backoff with 1 s intervals and a 1 min max elapsed time
  • Adjust tests to reflect removal of agent-side retry configuration

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| tink/server/internal/grpc/grpc.go | Switched to constant backoff and reduced max elapsed time values |
| tink/agent/internal/transport/grpc/grpc.go | Removed client retry logic, related fields, and imports |
| tink/agent/internal/transport/grpc/grpc_test.go | Deleted test retry configurations and backoff import |
| tink/agent/agent.go | Dropped RetryInterval setting |

Comments suppressed due to low confidence (1)

tink/server/internal/grpc/grpc.go:61

  • Consider adding unit tests for retry behavior in Handler.GetAction and Handler.ReportActionStatus to verify the configured backoff parameters and ensure transient errors are retried as expected.
if len(h.RetryOptions) == 0 {

This provides a more responsive Workflow
run. Manual scale tests didn't seem to be
affected.

Signed-off-by: Jacob Weinstock <[email protected]>
@jacobweinstock jacobweinstock added the ready-to-merge label (Signal Mergify to merge the PR) Jun 6, 2025
@mergify mergify bot merged commit b2666a2 into tinkerbell:main Jun 6, 2025
12 checks passed
@jacobweinstock jacobweinstock deleted the rework-backoffs branch June 6, 2025 21:37

Labels

ready-to-merge (Signal Mergify to merge the PR), tink-agent, tink-server

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant