fix: prevent self-reinforcing error loop in deployment creation #3860
Open
vcode-sh wants to merge 2 commits into Dokploy:canary
When deployment creation fails (e.g. a transient DB/network error during removeLastTenDeployments), the catch block writes an error deployment record with logPath: "". On the next deploy attempt, cleanup tries to delete this record, runs `rm -f ` with an empty path, fails, and the cycle repeats, permanently blocking deployments.

Changes:
- Make removeDeployment() idempotent (return undefined instead of throwing when already deleted; guard against empty/invalid logPath)
- Fix copy-paste error message "Error creating" → "Error removing"
- Wrap each deployment removal in removeLastTenDeployments() and removeLastFiveDeployments() in an individual try-catch so cleanup of old records never blocks creation of new deployments
- Use logPath: "none" instead of "" in error deployment records to prevent path.join() producing ".", which bypasses guards

Fixes Dokploy#3752

Co-Authored-By: Claude Opus 4.6 <[email protected]>
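A minimal sketch of the idempotent removal and `logPath` guard described in this commit message might look like the following. Everything here is an assumption for illustration, not Dokploy's code: the in-memory `Map` store, the simplified `Deployment` shape, the `isRemovableLogPath` and `shellQuote` helpers, and the injected `runCommand` callback standing in for the real local/remote shell helpers.

```typescript
import * as path from "node:path";

// Hypothetical, simplified record; the real Dokploy schema has more fields.
interface Deployment {
  deploymentId: string;
  logPath: string;
}

// Hypothetical in-memory store standing in for the deployments table.
const deployments = new Map<string, Deployment>();

// Only real log files may reach `rm -f`. "" and "none" mark records that never
// had a log file, and "." is what path.join("") produces from an empty path.
function isRemovableLogPath(logPath: string): boolean {
  return logPath !== "" && logPath !== "." && logPath !== "none";
}

// Single-quote a value for a POSIX shell (an embedded ' becomes '\'').
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Idempotent removal: a missing record is treated as already removed.
// `runCommand` is a hypothetical stand-in for a local or remote shell helper.
async function removeDeployment(
  deploymentId: string,
  runCommand: (command: string) => Promise<void>,
): Promise<Deployment | undefined> {
  const deployment = deployments.get(deploymentId);
  if (!deployment) {
    // Already deleted, e.g. by a concurrent cleanup: nothing left to do.
    return undefined;
  }
  try {
    // Build the command only for a real log file; never run an empty command.
    const command = isRemovableLogPath(deployment.logPath)
      ? `rm -f ${shellQuote(deployment.logPath)}`
      : "";
    if (command !== "") {
      await runCommand(command);
    }
    deployments.delete(deploymentId);
    return deployment;
  } catch (error) {
    throw new Error(`Error removing the deployment: ${String(error)}`);
  }
}

// path.join("") returns ".", which is why "." must be rejected as well.
console.log(isRemovableLogPath(path.join(""))); // false
```

In this sketch, removing the same ID twice returns the record the first time and `undefined` the second, and a record whose `logPath` is `"none"` is deleted from the store without any shell command being issued.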
Summary
Fixes #3752
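A transient failure during `removeLastTenDeployments()` (called at the start of every `createDeployment()`) creates a permanent, self-reinforcing error loop that prevents all future deployments for the affected application.

Root Cause
1. `createDeployment()` calls `removeLastTenDeployments()` first.
2. If that call fails transiently, the catch block writes an error deployment record with `logPath: ""`.
3. `path.join("")` returns `"."`, so the `logPath !== "."` guard is bypassed.
4. On the next deploy attempt, `removeLastTenDeployments()` tries to clean up this poisoned record.
5. `removeDeployment()` runs `rm -f .` (or `rm -f` with an empty string), which fails.
6. The catch block writes another error record with `logPath: ""`, and the cycle repeats.

Changes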
1. Make `removeDeployment()` idempotent
- Return `undefined` instead of throwing when a deployment is already deleted (race condition / concurrent cleanup)
- Guard against an empty or invalid `logPath` before running the shell `rm -f` command
- Fix the copy-paste error message: `"Error creating the deployment"` → `"Error removing the deployment"`

2. Make `removeLastTenDeployments()` resilient
- Wrap each deployment removal in its own try-catch; log failures with `console.error` (including the deployment ID) but don't propagate them
- Guard `execAsyncRemote` against empty command strings

3. Make `removeLastFiveDeployments()` resilient
- Wrap each deployment removal in its own try-catch, as above
- Add `logPath !== "."` and `logPath !== "none"` guards (consistent with `removeLastTenDeployments`)

4. Fix poisoned `logPath: ""` in error records
- Change `logPath: ""` to `logPath: "none"` in all catch blocks of:
  - `createDeployment()`
  - `createDeploymentPreview()`
  - `createDeploymentCompose()`
  - `createDeploymentBackup()`
  - `createDeploymentSchedule()`
  - `createDeploymentVolumeBackup()`
- Update the `removeDeployment()` `logPath` guard to also skip `"none"`

How to Reproduce
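1. Trigger a transient failure in `removeLastTenDeployments()` (e.g. a DB connection timeout).
2. The catch block writes an error deployment record with `logPath: ""`.
3. Every subsequent deploy attempt fails while cleaning up that record and adds another error record.

Verification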
- `pnpm check` passes (Biome formatting/linting)
- `pnpm --filter=server typecheck` passes (TypeScript)
- The `removeDeployment` return type changes from `Deployment` to `Deployment | undefined`; no callers use the return value from cleanup paths

Test Plan
- `pnpm check` passes
- `pnpm --filter=server typecheck` passes
- With an existing error deployment that has `logPath: ""` or `logPath: "none"`, verify the next deployment succeeds instead of looping
- … `removeDeployment` throwing when already deleted

Greptile Summary
Fixed critical self-reinforcing error loop that permanently blocked deployments after transient failures.
Key Changes:
- Changed `logPath: ""` to `logPath: "none"` to prevent `path.join("")` returning `"."`, which bypassed safety guards and caused `rm -f .` commands to fail
- Made `removeDeployment()` idempotent by returning `undefined` instead of throwing when the deployment is already deleted (handles race conditions in concurrent cleanup)
- Added resilient cleanup to `removeLastTenDeployments()` and `removeLastFiveDeployments()`: each deployment removal is wrapped in a try-catch and logged on failure, but doesn't block cleanup of other deployments
- Added validation to skip empty, `"."`, or `"none"` paths before executing shell commands
- Skipped `execAsyncRemote()` when the command string is empty

Impact:
The fix prevents a scenario where a single transient failure during deployment cleanup creates a poisoned database record that causes all subsequent deployments to fail, with the error count growing on each attempt. The changes make the system self-healing by ensuring cleanup operations never block new deployments.
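The "cleanup never blocks new deployments" property comes from isolating each removal. Below is a minimal sketch of that per-record try-catch pattern under stated assumptions: `removeOldDeployments()`, the in-memory `DeploymentRecord[]` list, the injected `removeOne` callback, and the simplified `createDeployment()` flow with a default retention of 10 are all hypothetical. The real `removeLastTenDeployments()` / `removeLastFiveDeployments()` query the database and call `removeDeployment()` instead.

```typescript
// Hypothetical, simplified record used only for ordering.
interface DeploymentRecord {
  deploymentId: string;
  createdAt: number;
}

// Remove every record except the newest `keep`. Each removal runs in its own
// try/catch, so one failure is logged and skipped instead of aborting the loop.
async function removeOldDeployments(
  records: DeploymentRecord[],
  keep: number,
  removeOne: (deploymentId: string) => Promise<void>,
): Promise<string[]> {
  const failed: string[] = [];
  const stale = [...records]
    .sort((a, b) => b.createdAt - a.createdAt) // newest first
    .slice(keep);
  for (const record of stale) {
    try {
      await removeOne(record.deploymentId);
    } catch (error) {
      // Log with the deployment ID, but do not propagate.
      console.error(`Failed to remove deployment ${record.deploymentId}:`, error);
      failed.push(record.deploymentId);
    }
  }
  return failed;
}

// Hypothetical creation flow: a failing cleanup can no longer prevent the
// new record from being created.
async function createDeployment(
  records: DeploymentRecord[],
  deploymentId: string,
  removeOne: (deploymentId: string) => Promise<void>,
  keep = 10,
): Promise<DeploymentRecord> {
  await removeOldDeployments(records, keep, removeOne);
  const created: DeploymentRecord = { deploymentId, createdAt: Date.now() };
  records.push(created);
  return created;
}
```

Returning the failed IDs instead of throwing keeps the error visible in the logs while letting the caller proceed. A stale record that fails to delete stays in the list, so the next deployment's cleanup simply tries it again.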
API Note:
The `deployment.removeDeployment` tRPC endpoint now returns `Deployment | undefined` instead of throwing when the deployment doesn't exist. This makes the endpoint idempotent, and the frontend handles this correctly.

Confidence Score: 5/5
Last reviewed commit: bf1f1cf