Fix showing errors #294

Merged: clangenb merged 2 commits into main from zm/fixerrorshow2 on Mar 16, 2026

Conversation

@mosonyi (Collaborator) commented Mar 16, 2026

Fixes #292

@mosonyi marked this pull request as ready for review March 16, 2026 09:08
@mosonyi requested a review from clangenb March 16, 2026 09:10

@eldara-cruncher (Collaborator) commented

Audit of PR #294

The approach is sound — comparing error vs running timestamps per slot is the right way to detect failed rolling updates where old replicas mask the failure. A few findings:

1. Duplicated function

activeDeploymentErrorsByService() is copy-pasted identically across views/services/update.go and views/stacks/update.go. Same issue already exists with latestTasksByServiceKey(). Any future bug fix needs to be applied in two places.

2. No time cutoff

The existing error-detection code uses a 5-minute cutoff (newestTaskTime.Add(-5 * time.Minute)) to avoid surfacing stale historical errors. The new function has no cutoff — if a slot has an old failed task from hours/days ago and no running task for that slot, it will be flagged indefinitely.
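A minimal sketch of what such a cutoff could look like. The `Task` type, `recentFailures`, `staleAfter`, and `sampleTasks` are hypothetical names for illustration, and taking the newest task across the whole input as the reference point is an assumption; the real code may scope it differently.

```go
package main

import "time"

// Task is a simplified, hypothetical stand-in for the real task type.
type Task struct {
	ID        string
	State     string
	Timestamp time.Time
}

// staleAfter mirrors the 5-minute window used by the existing code.
const staleAfter = 5 * time.Minute

// recentFailures returns the failed tasks that are no older than
// staleAfter relative to the newest task in the input.
func recentFailures(tasks []Task) []Task {
	var newest time.Time
	for _, t := range tasks {
		if t.Timestamp.After(newest) {
			newest = t.Timestamp
		}
	}
	cutoff := newest.Add(-staleAfter)

	var out []Task
	for _, t := range tasks {
		if t.State == "failed" && !t.Timestamp.Before(cutoff) {
			out = append(out, t)
		}
	}
	return out
}

// sampleTasks holds one failure from two days ago and one from a minute ago.
func sampleTasks() []Task {
	now := time.Date(2026, 3, 16, 9, 0, 0, 0, time.UTC)
	return []Task{
		{ID: "old", State: "failed", Timestamp: now.Add(-48 * time.Hour)},
		{ID: "recent", State: "failed", Timestamp: now.Add(-time.Minute)},
		{ID: "live", State: "running", Timestamp: now},
	}
}
```

With the sample data, only the failure from a minute ago survives the filter; the two-day-old one is dropped instead of being flagged indefinitely.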

3. Empty errMsg edge case

A task with State == TaskStateFailed but Status.Err == "" will set serviceHasError = true while leaving serviceErrorText as an empty string. The services view guards against this at line 684, but the stacks view path may not.
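One possible guard, sketched under assumptions: `errorText` is a hypothetical helper, and trimming whitespace before the emptiness check is an extra choice made here rather than something the existing views are known to do.

```go
package main

import "strings"

// errorText returns a non-empty message for a failed task. When the task
// reports no error string, it falls back to a generic "task <state>" text.
func errorText(state, errMsg string) string {
	if msg := strings.TrimSpace(errMsg); msg != "" {
		return msg
	}
	return "task " + state
}
```

Routing both the services and stacks paths through one helper like this would keep `serviceHasError` and `serviceErrorText` consistent without a separate check in each view.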

4. Non-deterministic error selection

if _, seen := result[s.serviceID]; !seen means which slot's error surfaces for a multi-slot service depends on Go map iteration order.
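A sketch of an order-independent alternative: keep the newest error per service and break timestamp ties by the lowest slot number. The types and names below (`slotKey`, `slotError`, `newestErrorByService`, `sampleSlotErrors`) are hypothetical, and the tie-break rule is an assumption.

```go
package main

import "time"

// slotKey and slotError are hypothetical, simplified types.
type slotKey struct {
	ServiceID string
	Slot      int
}

type slotError struct {
	Timestamp time.Time
	Err       string
}

// newestErrorByService picks, per service, the error with the latest
// timestamp. Ties go to the lowest slot, so the result does not depend
// on map iteration order.
func newestErrorByService(bySlot map[slotKey]slotError) map[string]slotError {
	result := map[string]slotError{}
	chosenSlot := map[string]int{}
	for k, e := range bySlot {
		cur, seen := result[k.ServiceID]
		newer := e.Timestamp.After(cur.Timestamp)
		tie := e.Timestamp.Equal(cur.Timestamp) && k.Slot < chosenSlot[k.ServiceID]
		if !seen || newer || tie {
			result[k.ServiceID] = e
			chosenSlot[k.ServiceID] = k.Slot
		}
	}
	return result
}

// sampleSlotErrors has three failing slots; slots 2 and 3 share the
// newest timestamp.
func sampleSlotErrors() map[slotKey]slotError {
	base := time.Date(2026, 3, 16, 9, 0, 0, 0, time.UTC)
	return map[slotKey]slotError{
		{"web", 1}: {Timestamp: base, Err: "slot 1 failed"},
		{"web", 2}: {Timestamp: base.Add(time.Minute), Err: "slot 2 failed"},
		{"web", 3}: {Timestamp: base.Add(time.Minute), Err: "slot 3 failed"},
	}
}
```

For the sample input the selected error is always the one from slot 2, whichever order the map is walked in.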

5. No tests

Both latestTasksByServiceKey and activeDeploymentErrorsByService are pure functions — highly testable. Given this is the second fix attempt for the same bug, tests would prevent regression.

I'll open a PR into this branch addressing these points.
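For reference, the per-slot comparison this audit calls sound can be sketched roughly as below. This is an illustration only, not the PR's code: the `Task` fields, the string states, and `slotsWithActiveError` are assumed names.

```go
package main

import "time"

// Task is a hypothetical, simplified task record.
type Task struct {
	ServiceID string
	Slot      int
	State     string // "running", "failed", ...
	Timestamp time.Time
	Err       string
}

type slotKey struct {
	serviceID string
	slot      int
}

// slotsWithActiveError returns, per slot, the newest failed task when it
// is newer than the slot's newest running task, or when the slot has no
// running task at all.
func slotsWithActiveError(tasks []Task) map[slotKey]Task {
	lastRunning := map[slotKey]time.Time{}
	lastFailed := map[slotKey]Task{}
	for _, t := range tasks {
		k := slotKey{t.ServiceID, t.Slot}
		switch t.State {
		case "running":
			if t.Timestamp.After(lastRunning[k]) {
				lastRunning[k] = t.Timestamp
			}
		case "failed":
			if prev, ok := lastFailed[k]; !ok || t.Timestamp.After(prev.Timestamp) {
				lastFailed[k] = t
			}
		}
	}

	active := map[slotKey]Task{}
	for k, f := range lastFailed {
		run, ok := lastRunning[k]
		if !ok || f.Timestamp.After(run) {
			active[k] = f
		}
	}
	return active
}

// sampleTasks models a rolling update: slot 1 failed and then recovered,
// slot 2 still has its old replica running while the new task failed.
func sampleTasks() []Task {
	base := time.Date(2026, 3, 16, 9, 0, 0, 0, time.UTC)
	return []Task{
		{ServiceID: "web", Slot: 1, State: "failed", Timestamp: base, Err: "boom"},
		{ServiceID: "web", Slot: 1, State: "running", Timestamp: base.Add(time.Minute)},
		{ServiceID: "web", Slot: 2, State: "running", Timestamp: base},
		{ServiceID: "web", Slot: 2, State: "failed", Timestamp: base.Add(2 * time.Minute), Err: "no such image"},
	}
}
```

In the sample, slot 1 is not reported because a running task followed its failure, while slot 2 is reported even though an older replica is still running there.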

Extracts taskKeyForService, latestTasksByServiceKey, and
activeDeploymentErrorsByService into views/taskutil so both
services and stacks views share one implementation.

Fixes:
- Add 5-minute cutoff to activeDeploymentErrorsByService to
  avoid surfacing stale historical errors
- Handle empty errMsg on failed tasks (fallback to "task <state>")
- Pick most recent error when multiple slots fail for same service
- Add unit tests for all three functions

Co-authored-by: eldara-cruncher <[email protected]>
Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>

@clangenb (Collaborator) left a comment

LGTM!

@clangenb merged commit 0b57e35 into main Mar 16, 2026
11 checks passed
@eldara-cruncher deleted the zm/fixerrorshow2 branch June 5, 2026 12:18

Development

Successfully merging this pull request may close these issues.

Wrong interpretation of error
