fix(template-commit): enforce requestID uniqueness and add idempotent commit reuse #336
fslongjin wants to merge 3 commits into …
… commit reuse

- Validate requestID is non-empty in HTTP handler and service layer
- Add unique (request_id, operation) index handling for COMMIT jobs
- Reuse existing job on duplicate request payload; reject on payload drift
- Introduce commit-specific job record builder and duplicate-key detection

Signed-off-by: jinlong <[email protected]>
Claude finished @fslongjin's task in 1m 10s
Code Review: enforce requestID uniqueness and add idempotent commit reuse
Overview
This PR closes #332 by enforcing …
Review Findings
1. …
| Category | Count | Details |
|---|---|---|
|  | 1 | Missing ErrTemplateAttemptInProgress error mapping (should be 400, currently 500) |
|  | 2 | Error swallowing in fingerprint, test coverage gaps |
| ✅ Strengths | 4 | Concurrent idempotency design, duplicate key detection, legacy migration, clean separation |
The PR is well-structured and the core idempotency design is sound. The main actionable finding is the missing error mapping in the HTTP handler (#1).
Branch: fix-template-commit-duplicate-key
Treat cubelet's success code as ErrorCode_Success, run commit jobs asynchronously, and preserve actionable failure messages so successful snapshots are not recorded as failed jobs.

Signed-off-by: jinlong <[email protected]>
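The bug this commit fixes (comparing the cubelet return code against 0 instead of ErrorCode_Success) can be reproduced with a small stand-in. Everything below is an assumption made for illustration: the ErrorCode type, the Ret struct and its RetCode field, and the value 200 for ErrorCode_Success. The only property that matters is that the success code is not 0.

```go
package main

import "fmt"

// ErrorCode is a hypothetical stand-in for the generated errorcode enum.
type ErrorCode int32

// Assumed value for illustration only; the real constant is only known to be non-zero.
const ErrorCode_Success ErrorCode = 200

// Ret mimics the status block of a cubelet RPC response (hypothetical shape).
type Ret struct {
	RetCode ErrorCode
	RetMsg  string
}

// failedBuggy reproduces the original check: any non-zero code counts as a
// failure, so a successful snapshot is recorded as a failed job.
func failedBuggy(r *Ret) bool { return r == nil || r.RetCode != 0 }

// failedFixed compares against the named success code instead.
func failedFixed(r *Ret) bool { return r == nil || r.RetCode != ErrorCode_Success }

func main() {
	ok := &Ret{RetCode: ErrorCode_Success}
	fmt.Println(failedBuggy(ok), failedFixed(ok)) // true false
}
```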
Claude finished @fslongjin's task in 1m 19s
Incremental Review: Commit …
| Change | File:Line | Assessment |
|---|---|---|
| Asynchronous job dispatch (go runTemplateCommitJob(...)) | template_commit.go:155 | ✅ Matches SubmitTemplateImage() pattern; prevents HTTP handler from blocking |
| RPC timeout for CommitSandbox (5 min) | template_commit.go:33 | ✅ Prevents hung cubelet from leaving job stuck in SNAPSHOTTING forever |
| Panic recovery in job goroutine | template_commit.go:177-188 | ✅ Logs stack trace and marks job FAILED |
| buildCommitFailureMessage with detailed error messages | template_commit.go:326-345 | ✅ Never returns empty string; handles nil response, nil Ret, whitespace-only RetMsg |
| Cleanup RPC timeout (1 min) | template_commit.go:37,245-251 | ✅ Prevents cleanup path from hanging |
| Test for buildCommitFailureMessage | template_commit_test.go:109-162 | ✅ Covers nil, non-success codes, whitespace-only messages |
Previously flagged issues — status update
1. ‼️ Missing ErrTemplateAttemptInProgress error mapping — STILL UNFIXED
template_commit.go:129-136 still only maps ErrTemplateIDRequired, ErrDuplicateTemplate, and ErrTemplateStoreNotInitialized. The ErrTemplateAttemptInProgress error (returned from SubmitTemplateCommit lines 77, 99, 135) falls through to ErrorCode_MasterInternalError (500) instead of ErrorCode_MasterParamsError (400).
The sibling handler in template.go:107-108 already handles this correctly:
```go
case errors.Is(err, templatecenter.ErrTemplateAttemptInProgress):
	code = int(errorcode.ErrorCode_MasterParamsError)
```

2. Warning: Error swallowing in buildCommitTemplateSpecFingerprint — STILL PRESENT
template_image.go:1965-1969:
```go
func buildCommitTemplateSpecFingerprint(req *types.CreateCubeSandboxReq) string {
	payload, _ := marshalTemplateCommitJobRequest(req) // error discarded
	sum := sha256.Sum256([]byte(payload))
	return hex.EncodeToString(sum[:])
}
```

If marshaling fails, the fingerprint becomes the SHA-256 of the empty string, so every request that fails to marshal shares the same fingerprint, which could cause false-positive collisions. A log line on error would mitigate this without changing the signature.
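A minimal sketch of the suggested mitigation, assuming JSON marshaling: commitRequest and fingerprintRequest are hypothetical stand-ins for types.CreateCubeSandboxReq and buildCommitTemplateSpecFingerprint. The function still returns only a string, but the marshal error is logged instead of silently discarded. The last two calls in main reproduce the collision described above: two different requests that both fail to marshal end up with the same fingerprint.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"log"
)

// commitRequest is a hypothetical, trimmed-down stand-in for the real request type.
type commitRequest struct {
	TemplateID string            `json:"template_id"`
	Labels     map[string]string `json:"labels,omitempty"`
	Extra      any               `json:"extra,omitempty"`
}

// fingerprintRequest keeps the string-only signature but logs the marshal
// error, so a degenerate fingerprint is at least visible in the logs.
func fingerprintRequest(req *commitRequest) string {
	payload, err := json.Marshal(req) // map keys are sorted, so output is deterministic
	if err != nil {
		log.Printf("fingerprint: marshal commit request: %v", err)
	}
	sum := sha256.Sum256(payload) // payload is nil on error: hash of empty input
	return hex.EncodeToString(sum[:])
}

func main() {
	a := fingerprintRequest(&commitRequest{TemplateID: "t1"})
	b := fingerprintRequest(&commitRequest{TemplateID: "t1"})
	c := fingerprintRequest(&commitRequest{TemplateID: "t2"})
	fmt.Println(a == b, a == c) // true false

	// Channels cannot be JSON-encoded, so both calls fail and collide.
	x := fingerprintRequest(&commitRequest{TemplateID: "t1", Extra: make(chan int)})
	y := fingerprintRequest(&commitRequest{TemplateID: "t2", Extra: make(chan int)})
	fmt.Println(x == y) // true
}
```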
3. Test coverage gaps — PARTIALLY ADDRESSED
The new TestBuildCommitFailureMessageNeverEmpty is good. However, the following scenarios remain untested:
- Idempotent reuse (same requestID + same payload → returns existing job)
- Payload drift rejection (same requestID + different payload → ErrTemplateAttemptInProgress)
- Active job conflict (pending/running job for template)
- Duplicate key race fallback (concurrent writer path at template_commit.go:126-139)
These require integration-level testing with a database and are likely tracked separately.
New findings in this commit
4. ✅ Clean separation of concerns
The error message construction in buildCommitFailureMessage and its new test, TestBuildCommitFailureMessageNeverEmpty, are well-structured. The function correctly handles all edge cases (nil response, nil Ret, empty RetMsg, whitespace-only RetMsg).
5. ✅ Idempotent path returns full job info
When an existing job is reused (lines 73-76, 95-98, 131-134), the function returns GetTemplateImageJobInfo(ctx, jobID) on line 145, which provides the caller with the current job status — not just a confirmation of reuse. This is correct behavior.
6. ⚠️ buildCommitFailureMessage test: t.Fatalf hides remaining cases
The test function TestBuildCommitFailureMessageNeverEmpty at template_commit_test.go:113 checks wantParts with partial string matching (strings.Contains) but reports failures with t.Fatalf, which aborts on the first failing sub-case and hides the results of the remaining cases. Using t.Errorf, or running each case as a subtest with t.Run, would surface all failures in one run. This is a minor testing style point.
Summary
| Category | Count | Details |
|---|---|---|
|  | 2 | Missing error mapping, error swallowing in fingerprint |
| ✅ Fixed in this commit | 1 | Cubelet success code check (!= 0 → ErrorCode_Success) |
| ✅ Strengths added | 5 | Async dispatch, timeouts, panic recovery, error message quality, tests |
|  | 1 | t.Fatalf hides remaining sub-case failures |
The new commit addresses the root cause of #332 and adds several important hardening measures (timeouts, panic recovery, async dispatch). The two remaining items from the previous review (error mapping, fingerprint error swallowing) are pre-existing and should be addressed in a follow-up.
…commit

- Extract error code mapping into `commitTemplateErrorCode` function, adding support for `ErrTemplateAttemptInProgress`
- Refactor fingerprint generation to use request snapshot instead of stored request, reducing data dependency
- Update corresponding tests to cover new error cases and fingerprint logic

Signed-off-by: jinlong <[email protected]>
Claude finished @fslongjin's task in 1m 27s
Review: enforce requestID uniqueness and add idempotent commit reuse
Previous review findings — status update
New review findings
✅ Robust concurrent idempotency design
The "check-then-act with duplicate-key fallback" pattern in …
The ✅
| Category | Count | Details |
|---|---|---|
| ✅ Previously fixed | 2 | Error mapping (ErrTemplateAttemptInProgress → 400), fingerprint error swallowing |
|  | 1 | t.Fatalf style in table-driven tests |
| ✅ Strengths | 5 | Concurrent idempotency, duplicate key detection, legacy migration, async dispatch with timeouts/panic recovery, error message quality |
|  | 2 | Ordering dependency for attempt calculation, broad "1062" substring match |
The PR is well-structured and addresses the root cause of issue #332 (the != 0 vs != ErrorCode_Success bug) along with robust idempotent job reuse. The previous review's two actionable issues have both been resolved. No blocking issues remain.
Thanks for the review. The actionable items from the previous AI review have been addressed in commit …
What changed:
Validation performed:
I also deployed the current branch to the target test machine and verified the user-facing behavior:
The migration concurrency concern and deeper DB-backed scenarios (idempotent reuse, duplicate-key race fallback, legacy migration disambiguation) are valid broader coverage items, but they require integration-level database tests and are intentionally not folded into this focused PR fix.

Close: #332