mosonyi · 2026-03-16T09:08:27Z

Fixes #292

eldara-cruncher · 2026-03-16T09:17:20Z

Audit of PR #294

The approach is sound — comparing error vs running timestamps per slot is the right way to detect failed rolling updates where old replicas mask the failure. A few findings:

1. Duplicated function

activeDeploymentErrorsByService() is copy-pasted identically across views/services/update.go and views/stacks/update.go. Same issue already exists with latestTasksByServiceKey(). Any future bug fix needs to be applied in two places.

2. No time cutoff

The existing error-detection code uses a 5-minute cutoff (newestTaskTime.Add(-5 * time.Minute)) to avoid surfacing stale historical errors. The new function has no cutoff — if a slot has an old failed task from hours/days ago and no running task for that slot, it will be flagged indefinitely.
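A minimal sketch of what such a cutoff could look like, assuming a simplified `Task` struct and a helper named `recentFailures` (both hypothetical, not the project's real types). It mirrors the `newestTaskTime.Add(-5 * time.Minute)` expression quoted above; measuring the window back from the newest task rather than from wall-clock time is an assumption.

```go
package main

import "time"

// Task is a simplified, hypothetical stand-in for a task record.
type Task struct {
	State     string // e.g. "running", "failed"
	Err       string
	Timestamp time.Time
}

// staleErrorWindow matches the 5-minute cutoff mentioned in the audit.
const staleErrorWindow = 5 * time.Minute

// recentFailures returns the failed tasks that are no older than
// staleErrorWindow, measured back from the newest task of any state.
func recentFailures(tasks []Task) []Task {
	var newestTaskTime time.Time
	for _, t := range tasks {
		if t.Timestamp.After(newestTaskTime) {
			newestTaskTime = t.Timestamp
		}
	}
	cutoff := newestTaskTime.Add(-staleErrorWindow)

	var recent []Task
	for _, t := range tasks {
		// Keep failures at or after the cutoff; drop older ones as stale.
		if t.State == "failed" && !t.Timestamp.Before(cutoff) {
			recent = append(recent, t)
		}
	}
	return recent
}
```

If the only task left for a slot is the old failure itself, a newest-task reference point would still keep it, so a wall-clock reference may be the safer choice for that case.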

3. Empty `errMsg` edge case

A task with State == TaskStateFailed but Status.Err == "" will set serviceHasError = true while leaving serviceErrorText as an empty string. The services view guards against this at line 684, but the stacks view path may not.
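One way to close this gap is to never let the error text be empty. The sketch below assumes a hypothetical helper `errorText` that takes the task state and error message as plain strings; the `"task <state>"` fallback format follows the follow-up commit message, everything else is illustrative.

```go
package main

import "strings"

// errorText returns the message to show for a failed task. When the task
// carries no error message, it falls back to "task <state>" so callers
// never pair an error flag with an empty string.
func errorText(state, errMsg string) string {
	if strings.TrimSpace(errMsg) != "" {
		return errMsg
	}
	return "task " + state
}
```

With this in place, both views can treat a non-empty return value as the single signal that an error should be displayed.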

4. Non-deterministic error selection

The `if _, seen := result[s.serviceID]; !seen` guard keeps only the first error recorded per service, so which slot's error surfaces for a multi-slot service depends on Go map iteration order.
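A deterministic alternative is to compare timestamps instead of relying on which entry is visited first. The sketch below assumes a hypothetical `slotError` record and helper `newestErrorByService`; the tie-break on the lowest slot number is an added assumption so that equal timestamps also resolve the same way every time.

```go
package main

import "time"

// slotError is a hypothetical record of one slot's failure.
type slotError struct {
	serviceID string
	slot      int
	errMsg    string
	failedAt  time.Time
}

// newestErrorByService keeps one error per service: the most recent one,
// with ties broken by the lowest slot number. The result is the same
// regardless of the order in which errs is visited.
func newestErrorByService(errs []slotError) map[string]slotError {
	result := make(map[string]slotError)
	for _, e := range errs {
		cur, seen := result[e.serviceID]
		newer := e.failedAt.After(cur.failedAt)
		tie := e.failedAt.Equal(cur.failedAt) && e.slot < cur.slot
		if !seen || newer || tie {
			result[e.serviceID] = e
		}
	}
	return result
}
```

Feeding the same errors in a different order yields the same map, which is the property the `!seen` guard lacks.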

5. No tests

Both latestTasksByServiceKey and activeDeploymentErrorsByService are pure functions — highly testable. Given this is the second fix attempt for the same bug, tests would prevent regression.

I'll open a PR into this branch addressing these points.
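For reference, the per-slot comparison endorsed at the top of this audit can be sketched roughly as follows. The `Task` and `slotKey` types and the `slotsWithActiveError` name are hypothetical simplifications, not the code in this PR: a slot counts as failing when its newest failed task is more recent than its newest running task.

```go
package main

import "time"

// Task is a simplified, hypothetical stand-in for a task record.
type Task struct {
	ServiceID string
	Slot      int
	State     string // e.g. "running", "failed"
	Err       string
	Timestamp time.Time
}

// slotKey identifies one replica slot of one service.
type slotKey struct {
	serviceID string
	slot      int
}

// slotsWithActiveError returns the slots whose newest failed task is more
// recent than their newest running task. An older replica that is still
// running in another slot therefore cannot mask the failure.
func slotsWithActiveError(tasks []Task) map[slotKey]bool {
	newestFailed := make(map[slotKey]time.Time)
	newestRunning := make(map[slotKey]time.Time)
	for _, t := range tasks {
		k := slotKey{serviceID: t.ServiceID, slot: t.Slot}
		switch t.State {
		case "failed":
			if t.Timestamp.After(newestFailed[k]) {
				newestFailed[k] = t.Timestamp
			}
		case "running":
			if t.Timestamp.After(newestRunning[k]) {
				newestRunning[k] = t.Timestamp
			}
		}
	}

	active := make(map[slotKey]bool)
	for k, failedAt := range newestFailed {
		// A slot with no running task has the zero time here, so any
		// recorded failure counts as active.
		if failedAt.After(newestRunning[k]) {
			active[k] = true
		}
	}
	return active
}
```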

Extracts taskKeyForService, latestTasksByServiceKey, and activeDeploymentErrorsByService into views/taskutil so both services and stacks views share one implementation.

Fixes:
- Add 5-minute cutoff to activeDeploymentErrorsByService to avoid surfacing stale historical errors
- Handle empty errMsg on failed tasks (fallback to "task <state>")
- Pick most recent error when multiple slots fail for same service
- Add unit tests for all three functions

Co-authored-by: eldara-cruncher <[email protected]>
Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>

clangenb

LGTM!

Fix showing errors

3c5f145

mosonyi added A2-bugfix B2-high-priority C0-breaks-nothing labels Mar 16, 2026

mosonyi marked this pull request as ready for review March 16, 2026 09:08

mosonyi requested a review from clangenb March 16, 2026 09:10

clangenb approved these changes Mar 16, 2026

clangenb merged commit 0b57e35 into main Mar 16, 2026
11 checks passed

eldara-cruncher deleted the zm/fixerrorshow2 branch June 5, 2026 12:18

Fix showing errors #294

clangenb merged 2 commits into main from zm/fixerrorshow2

3 participants
