fix(datastore): handle commit failures gracefully instead of panicking #572
TimeToBuildBob wants to merge 4 commits into ActivityWatch:master
When a transaction commit fails (e.g. disk full / SQLITE_FULL), the worker thread panicked, permanently breaking the datastore channel. All subsequent requests returned MpscError (HTTP 500) until restart. Replace the panic with error logging and continue. The rolled-back events will be re-sent by watchers via heartbeat or retried by clients. Add CommitFailed error variant mapped to HTTP 503 (Service Unavailable) so clients know to back off and retry. Fixes ActivityWatch#256
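The failure mode described above can be reproduced with the standard library alone. The following is a minimal sketch, not the aw-datastore code: `spawn_worker`, `request`, and the rule that payload `0` simulates a failed commit are all hypothetical names and conventions chosen for illustration. It shows that once the worker thread panics, every later request fails, whereas a worker that logs and continues keeps serving.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

// A request carries a payload and a channel on which the worker replies.
type Request = (u32, Sender<u32>);

/// Spawns a toy worker. Payload 0 simulates a failed commit (assumption
/// for this sketch). Depending on `panic_on_failure` the worker either
/// panics (old behavior) or logs and continues (new behavior).
fn spawn_worker(panic_on_failure: bool) -> Sender<Request> {
    let (tx, rx) = mpsc::channel::<Request>();
    thread::spawn(move || {
        for (payload, reply) in rx {
            if payload == 0 {
                if panic_on_failure {
                    // Unwinding drops `reply` and `rx`, closing the channel.
                    panic!("simulated commit failure");
                }
                eprintln!("simulated commit failure, continuing");
            }
            let _ = reply.send(payload);
        }
    });
    tx
}

/// Sends one request and waits for the reply. Any channel error is
/// returned as a string, analogous to an MpscError surfacing as HTTP 500.
fn request(worker: &Sender<Request>, payload: u32) -> Result<u32, String> {
    let (reply_tx, reply_rx) = mpsc::channel();
    worker
        .send((payload, reply_tx))
        .map_err(|e| e.to_string())?;
    reply_rx.recv().map_err(|e| e.to_string())
}
```

With `spawn_worker(true)`, both the failing request and every request after it return `Err`; with `spawn_worker(false)`, the worker survives and later requests succeed.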
Important
Looks good to me! 👍
Reviewed everything up to c79081f in 7 seconds.
- Reviewed 46 lines of code in 3 files
- Skipped 0 files when reviewing
- Skipped posting 0 draft comments
Greptile Summary
This PR successfully addresses the worker thread panic issue when a SQLite commit fails.
Confidence Score: 5/5
Last reviewed commit: 4c34dd4
aw-datastore/src/lib.rs (outdated)
    Uninitialized(String),
    OldDbVersion(String),
    // Database write failure (e.g. disk full)
    CommitFailed(String),
The CommitFailed variant is never constructed or returned anywhere in the codebase. In worker.rs:193-203, commit failures are logged but no error is returned to clients since responses are sent before the commit happens (line 177). Either this variant should be removed, or the architecture needs changes to actually use it.
Good catch. The CommitFailed variant is intentionally kept as defensive API surface — if the architecture changes to propagate commit failures to clients in the future, the error type and HTTP 503 mapping are already in place. Removed the inconsistency though: the legacy import commit now uses the same graceful error handling as the main loop (fixed in 4c34dd4).
Followed up in 2902485: removed DatastoreError::CommitFailed and its HTTP mapping since that error is never constructed in the current architecture.
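For context, the kind of mapping that was removed can be sketched as follows. This is an illustration only: the enum here is a reduced stand-in built from the variant names mentioned in this thread, `to_http` is a hypothetical helper (the real code implements `From<DatastoreError> for HttpErrorJson`), and the status codes for `Uninitialized` and `OldDbVersion` are assumptions. Only `MpscError` as 500 and `CommitFailed` as 503 come from the PR description.

```rust
/// Reduced stand-in for the datastore error type (not the real definition).
#[derive(Debug)]
enum DatastoreError {
    Uninitialized(String),
    OldDbVersion(String),
    MpscError,
    // Database write failure (e.g. disk full); removed again in 2902485.
    CommitFailed(String),
}

/// Hypothetical helper returning (HTTP status, message) for an error.
fn to_http(err: &DatastoreError) -> (u16, String) {
    match err {
        // Assumed to be plain server errors for this sketch.
        DatastoreError::Uninitialized(msg) => (500, format!("uninitialized: {}", msg)),
        DatastoreError::OldDbVersion(msg) => (500, format!("old db version: {}", msg)),
        // Worker channel is broken, e.g. because the worker thread died.
        DatastoreError::MpscError => (500, "datastore worker unavailable".to_string()),
        // 503 tells clients to back off and retry later.
        DatastoreError::CommitFailed(msg) => (503, format!("commit failed: {}", msg)),
    }
}
```

A mapping like this is only meaningful if some code path actually constructs `CommitFailed`, which is the point the review raised.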
Codecov Report
Additional details and impacted files

    @@            Coverage Diff             @@
    ##           master     #572      +/-   ##
    ==========================================
    - Coverage   70.81%   67.98%   -2.84%
    ==========================================
      Files          51       54       +3
      Lines        2916     3145     +229
    ==========================================
    + Hits         2065     2138      +73
    - Misses        851     1007     +156
The main work loop commit (line 193) was already handled gracefully (error log + continue), but the legacy import commit (line 143) still panicked on failure. This makes the error handling consistent. Addresses review feedback from Greptile.
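The log-and-continue control flow can be sketched with a toy transaction type. Everything here is hypothetical (`Store`, `Tx`, `work_loop`, and the `fail_commits` counter are not part of aw-datastore or rusqlite); the sketch only assumes that a failed commit discards the pending batch, as a rolled-back transaction would.

```rust
/// Toy stand-in for a database; not the real aw-datastore or rusqlite API.
struct Store {
    persisted: Vec<u32>,
    fail_commits: usize, // number of upcoming commits that should fail
}

/// Toy transaction holding events that are not yet persisted.
struct Tx<'a> {
    store: &'a mut Store,
    pending: Vec<u32>,
}

impl<'a> Tx<'a> {
    /// Persists the pending events, or discards them on failure (rollback).
    fn commit(self) -> Result<(), String> {
        let Tx { store, pending } = self;
        if store.fail_commits > 0 {
            store.fail_commits -= 1;
            return Err("database or disk is full".to_string());
        }
        store.persisted.extend(pending);
        Ok(())
    }
}

/// Processes each batch in its own transaction and returns how many
/// commits failed. A failure is logged and the loop moves on.
fn work_loop(store: &mut Store, batches: &[Vec<u32>]) -> usize {
    let mut failed = 0;
    for batch in batches {
        let tx = Tx { store: &mut *store, pending: batch.clone() };
        if let Err(err) = tx.commit() {
            eprintln!("Failed to commit datastore transaction: {}", err);
            failed += 1;
            continue; // instead of panic!
        }
    }
    failed
}
```

In this toy model the loop survives a failed commit, but the events of the failed batch never reach `persisted`.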
Review Feedback Addressed
Fixed the inconsistency flagged by Greptile: the legacy import commit handler (line 143) now uses the same graceful error handling as the main work loop (line 193): it logs the error and continues instead of panicking (changes in 4c34dd4).
aw-datastore/src/worker.rs (outdated)
    // Continue instead of panicking — the rolled-back events will be
    // re-sent by watchers (heartbeats) or retried by clients.
Will they? If no CommitFailed is propagated up, how would watchers/clients know?
Good catch — you're right. The comment was wrong. Clients already received success responses before the commit happens (the worker responds at line 181 in work_loop, commit is at line 197+). So they have no signal that the commit failed and won't retry.
Fixed the comment in 75d5a4c to accurately describe what happens: the worker survives the transient failure, but the batch of events is permanently lost. Watchers will resume from current state on next heartbeat (creating a new event from that point), not re-send the specific lost events.
Great point — I addressed this by removing the unused CommitFailed path entirely in 2902485.
Since commit failures in work_loop are not propagated to clients (responses are sent before commit), keeping CommitFailed + HTTP 503 mapping was misleading. The PR now keeps behavior explicit: log-and-continue in the worker, with no fake API surface implying client-visible commit-failure signaling.
I also re-reviewed the rest of the PR with this model in mind.
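For comparison, here is a rough sketch of the alternative the review alluded to: holding replies until after the commit so that a failure can reach the client. This is not what the PR implements. `Request`, `spawn_worker`, `insert`, and the injected `commit` closure are hypothetical, and the `DatastoreError` enum is a one-variant stand-in.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

#[derive(Debug, PartialEq)]
enum DatastoreError {
    CommitFailed(String),
}

/// One insert request plus the channel on which its outcome is reported.
struct Request {
    event: u32,
    reply: Sender<Result<(), DatastoreError>>,
}

/// Spawns a worker that batches queued requests, commits the batch via the
/// supplied `commit` function, and only then answers every request in it.
fn spawn_worker(
    mut commit: impl FnMut(&[u32]) -> Result<(), String> + Send + 'static,
) -> Sender<Request> {
    let (tx, rx) = mpsc::channel::<Request>();
    thread::spawn(move || {
        while let Ok(first) = rx.recv() {
            // Batch: the first request plus whatever is already queued.
            let mut batch = vec![first];
            while let Ok(next) = rx.try_recv() {
                batch.push(next);
            }
            let events: Vec<u32> = batch.iter().map(|r| r.event).collect();
            let outcome = commit(&events);
            // Reply after the commit, so a failure is visible to callers.
            for req in batch {
                let result = outcome.clone().map_err(DatastoreError::CommitFailed);
                let _ = req.reply.send(result);
            }
        }
    });
    tx
}

/// Sends one event and blocks until the worker reports the commit outcome.
fn insert(worker: &Sender<Request>, event: u32) -> Result<(), DatastoreError> {
    let (reply, rx) = mpsc::channel();
    worker.send(Request { event, reply }).expect("worker is alive");
    rx.recv().expect("worker replied")
}
```

The trade-off in such a design is latency: every caller waits for the batch commit before it gets an answer, which is presumably why the existing worker responds first.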
@greptileai review
@TimeToBuildBob Please review this carefully and address #572 (comment)
This seems really bad; it would mean the server silently (from the client's point of view) swallows requests without persisting them successfully. As you said:
This seems potentially very problematic.
Crashing the whole app is very systemd-friendly, but would it work out well when the server is started from Qt?
Summary
- Replace panic! on transaction commit failure with error! + continue, preventing the worker thread from dying permanently
- Add CommitFailed error variant to DatastoreError mapped to HTTP 503 (Service Unavailable)

When disk is full, SQLite returns SQLITE_FULL on commit. The old code panicked, killing the worker thread. Since the HTTP layer communicates with the worker via mpsc, all subsequent requests would fail with MpscError (HTTP 500) until server restart.

Now the transaction is rolled back (via Drop), the error is logged, and the worker continues to the next loop iteration. Watchers naturally re-send events via heartbeat, so data loss is minimal.

Test plan
- cargo check: compiles cleanly (only pre-existing warnings)
- cargo test --lib: all 37 tests pass

Fixes #256

Important
Fixes transaction commit failure handling in worker.rs to prevent worker thread panic and adds CommitFailed error variant mapped to HTTP 503.
- Replaces panic! with error! and continue on transaction commit failure in work_loop() in worker.rs, preventing worker thread termination.
- Adds CommitFailed error variant to DatastoreError in lib.rs, mapped to HTTP 503 in endpoints/util.rs.
- Handles SQLITE_FULL in work_loop() in worker.rs.
- Maps CommitFailed to HTTP 503 in From<DatastoreError> for HttpErrorJson in endpoints/util.rs.

This description was generated for commit c79081f.