Move retries to tink-server: #196

jacobweinstock · 2025-06-06T03:26:14Z

Description

Having retries in both tink-server and tink-agent caused noticeable delays in running Workflow Actions. With retries only in tink-server, the delays disappeared, and benchmarking showed improved performance. A rudimentary benchmark showed that over 1000 concurrent Agents performed fine. This is not an official benchmark: the backing Kubernetes cluster plays a significant part in performance, and the benchmark was run against a local k3d cluster.

Why is this needed

Fixes: #

How Has This Been Tested?

How are existing users impacted? What migration steps/scripts do we need?

Checklist:

I have:

updated the documentation and/or roadmap (if required)
added unit or e2e tests
provided instructions on how to upgrade

Having retries in both tink-server and tink-agent caused noticeable delays in running Workflow Actions. With retries only in tink-server, the delays disappeared, and benchmarking showed improved performance. A rudimentary benchmark showed that close to 2000 concurrent Agents performed fine. This is not an official benchmark, as the backing Kubernetes cluster plays a significant part in performance.

Signed-off-by: Jacob Weinstock <[email protected]>

codecov · 2025-06-06T03:28:42Z

Codecov Report

Attention: Patch coverage is 33.33333% with 4 lines in your changes missing coverage. Please review.

Project coverage is 46.08%. Comparing base (9c18a1d) to head (e7e9c4f).
Report is 4 commits behind head on main.

Files with missing lines	Patch %	Lines
tink/server/internal/grpc/grpc.go	0.00%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #196      +/-   ##
==========================================
- Coverage   46.13%   46.08%   -0.05%     
==========================================
  Files         100      100              
  Lines        8439     8410      -29     
==========================================
- Hits         3893     3876      -17     
+ Misses       4308     4297      -11     
+ Partials      238      237       -1



Copilot

Pull Request Overview

This PR centralizes retry logic in the server by removing client-side retries in the agent and standardizing server backoff behavior.

Remove RetryOptions and retry loops from agent transport and tests
Update server-side gRPC handlers to use a constant backoff with 1 s intervals and a 1 min max elapsed time
Adjust tests to reflect removal of agent-side retry configuration
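A minimal sketch of what a constant backoff with a fixed interval and a maximum elapsed time means in practice. It uses only the Go standard library and is not the code from this PR; `retryConstant`, `failNTimes`, `runDemo`, and `errTransient` are hypothetical names introduced for illustration, and the demo uses much shorter durations than the 1 s / 1 min values mentioned above so that it finishes quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errTransient stands in for a temporary failure (hypothetical).
var errTransient = errors.New("transient failure")

// retryConstant calls op until it succeeds, the context is cancelled, or the
// maxElapsed budget is used up, waiting the same interval between attempts.
// It returns the number of attempts made and the final error, if any.
func retryConstant(ctx context.Context, interval, maxElapsed time.Duration, op func() error) (int, error) {
	deadline := time.Now().Add(maxElapsed)
	attempts := 0
	for {
		attempts++
		err := op()
		if err == nil {
			return attempts, nil
		}
		// Give up if waiting one more interval would exceed the time budget.
		if time.Now().Add(interval).After(deadline) {
			return attempts, fmt.Errorf("giving up after %d attempts: %w", attempts, err)
		}
		select {
		case <-ctx.Done():
			return attempts, ctx.Err()
		case <-time.After(interval):
		}
	}
}

// failNTimes returns an operation that fails n times and then succeeds.
func failNTimes(n int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= n {
			return errTransient
		}
		return nil
	}
}

// runDemo retries an operation that fails the given number of times,
// using a 5 ms interval and a 250 ms budget (assumed demo values).
func runDemo(failures int) (int, error) {
	return retryConstant(context.Background(), 5*time.Millisecond, 250*time.Millisecond, failNTimes(failures))
}

func main() {
	attempts, err := runDemo(2)
	fmt.Println("attempts:", attempts, "err:", err)
}
```

With a constant interval, the worst-case wait before the next attempt never grows, which may be one reason a Workflow run feels more responsive than with an exponentially increasing delay.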

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
tink/server/internal/grpc/grpc.go	Switched to constant backoff and reduced max elapsed time values
tink/agent/internal/transport/grpc/grpc.go	Removed client retry logic, related fields, and imports
tink/agent/internal/transport/grpc/grpc_test.go	Deleted test retry configurations and backoff import
tink/agent/agent.go	Dropped `RetryInterval` setting

Comments suppressed due to low confidence (1)

tink/server/internal/grpc/grpc.go:61

Consider adding unit tests for retry behavior in Handler.GetAction and Handler.ReportActionStatus to verify the configured backoff parameters and ensure transient errors are retried as expected.

if len(h.RetryOptions) == 0 {
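One way to read the `len(h.RetryOptions) == 0` check is as a "use defaults when nothing is configured" pattern, which also gives a unit test something concrete to assert on. The sketch below is an assumption, not the actual tink-server code: `retryConfig`, `RetryOption`, `WithInterval`, `WithMaxElapsed`, and `effectiveConfig` are hypothetical, and only the `RetryOptions` field name and the 1 s / 1 min defaults come from this page.

```go
package main

import (
	"fmt"
	"time"
)

// retryConfig holds the two backoff parameters named in the review summary.
type retryConfig struct {
	Interval   time.Duration
	MaxElapsed time.Duration
}

// RetryOption mutates a retryConfig (hypothetical; the real option type may differ).
type RetryOption func(*retryConfig)

// WithInterval sets the fixed wait between attempts.
func WithInterval(d time.Duration) RetryOption {
	return func(c *retryConfig) { c.Interval = d }
}

// WithMaxElapsed sets the overall time budget for retries.
func WithMaxElapsed(d time.Duration) RetryOption {
	return func(c *retryConfig) { c.MaxElapsed = d }
}

// Handler mirrors only the RetryOptions field referenced in the review comment.
type Handler struct {
	RetryOptions []RetryOption
}

// effectiveConfig applies the configured options, falling back to a
// 1 s interval and 1 min budget only when no options are set at all.
func (h *Handler) effectiveConfig() retryConfig {
	opts := h.RetryOptions
	if len(opts) == 0 {
		opts = []RetryOption{WithInterval(time.Second), WithMaxElapsed(time.Minute)}
	}
	var cfg retryConfig
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	cfg := (&Handler{}).effectiveConfig()
	fmt.Println("interval:", cfg.Interval, "max elapsed:", cfg.MaxElapsed)
}
```

A unit test along the lines the review suggests could then assert on the returned values for both the empty and the explicitly configured case.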


jacobweinstock added the tink-server and tink-agent labels Jun 6, 2025

jacobweinstock requested a review from Copilot June 6, 2025 03:26


jacobweinstock requested a review from Copilot June 6, 2025 16:26

Remove unused struct fields:

2ab3836

Signed-off-by: Jacob Weinstock <[email protected]>

Copilot AI reviewed Jun 6, 2025


jacobweinstock force-pushed the rework-backoffs branch from fe84a6e to 2ab3836 Compare June 6, 2025 16:28

Use constant backoff:

e7e9c4f

This provides a more responsive Workflow run. Manual scale tests didn't seem to be affected.

Signed-off-by: Jacob Weinstock <[email protected]>

jacobweinstock added the ready-to-merge label (Signal Mergify to merge the PR) Jun 6, 2025

mergify bot merged commit b2666a2 into tinkerbell:main Jun 6, 2025
12 checks passed

jacobweinstock deleted the rework-backoffs branch June 6, 2025 21:37
