fix(session): prevent data loss with atomic writes and corrupt-file repair#3312

Merged
Re-bin merged 3 commits into HKUDS:main from aiguozhi123456:fix/atomic-session-save on Apr 19, 2026

@aiguozhi123456 aiguozhi123456 commented Apr 19, 2026

Contributor

Summary

Session saves are now atomic and corrupt session files are automatically repaired on load. A crash during save (OOM kill, disk full, SIGKILL) no longer destroys conversation history.

What changed:

  • save() writes to a .jsonl.tmp file then atomically replaces the target via os.replace(), matching the pattern already used in qq.py. On failure the temp file is cleaned up, leaving the previous version intact.
  • New _repair() method recovers valid JSONL lines from partially-written files by skipping malformed lines instead of discarding the entire session.
  • _load() falls back to _repair() before returning None, so a truncated file yields a session with surviving messages rather than a blank conversation.

Why os.replace over open("w"): The old open(path, "w") truncates the file before writing begins. If the process dies between truncation and the final write(), the file is left empty or partial. os.replace instead renames the finished temp file over the target, and on POSIX that rename is atomic: the directory entry switches from the old file to the new one in a single step, so a reader sees either the complete old contents or the complete new contents. The same pattern is already in use at qq.py:675.
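
For readers who have not seen the pattern, here is a minimal sketch of a write-then-replace save. It is not the code from this PR: the function name `atomic_write_jsonl` and the `os.fsync` call are assumptions added for illustration, and only the `.jsonl.tmp` naming, the `os.replace` swap, and the cleanup on failure come from the description above.

```python
import json
import os
from pathlib import Path


def atomic_write_jsonl(path: Path, records: list) -> None:
    """Write records as JSONL so that `path` is never left half-written.

    Hypothetical helper; the real SessionManager.save() may differ.
    """
    # The temp file sits next to the target, so os.replace never has to
    # cross a filesystem boundary (e.g. session.jsonl -> session.jsonl.tmp).
    tmp_path = path.with_name(path.name + ".tmp")
    try:
        with open(tmp_path, "w", encoding="utf-8") as f:
            for record in records:
                f.write(json.dumps(record, ensure_ascii=False) + "\n")
            f.flush()
            # Assumption: force the data to disk before the swap.
            os.fsync(f.fileno())
        # The target now holds either the old content or the new content.
        os.replace(tmp_path, path)
    except BaseException:
        # Also runs for KeyboardInterrupt and SystemExit. The previous
        # version of `path` has not been touched at this point.
        tmp_path.unlink(missing_ok=True)
        raise
```

If serialization fails halfway through, the exception propagates, the temp file is removed, and the existing session file still holds its last good contents.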

Design decisions:

  • Temp file lives in the same directory as the target (guarantees same filesystem for os.replace).
  • BaseException catch in save() covers KeyboardInterrupt and SystemExit, not just Exception.
  • _repair() tolerates missing metadata and bad timestamps, rebuilding defaults for anything it cannot parse.
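
The repair behaviour can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's `_repair()`: the function name `repair_jsonl`, the `"_type": "metadata"` marker, the `created_at` field, and the returned dict shape are all hypothetical stand-ins for whatever the real session format uses.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Optional


def repair_jsonl(path: Path) -> Optional[dict]:
    """Recover whatever still parses from a damaged session file."""
    metadata = {}
    messages = []
    try:
        raw = path.read_text(encoding="utf-8", errors="replace")
    except OSError:
        return None
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip the malformed line and keep the rest
        if not isinstance(record, dict):
            continue
        # Assumption: the metadata line carries "_type": "metadata".
        if record.get("_type") == "metadata":
            metadata = record
        else:
            messages.append(record)
    if not metadata and not messages:
        return None  # nothing survived, so the caller sees no session
    try:
        created_at = datetime.fromisoformat(metadata["created_at"])
    except (KeyError, TypeError, ValueError):
        # Missing metadata or a bad timestamp: rebuild a default.
        created_at = datetime.now()
    return {"created_at": created_at, "messages": messages}
```

A file whose last line was cut off mid-write therefore loads with every earlier message intact, and only the truncated line is lost.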

Tests: tests/agent/test_session_atomic.py covers atomic save correctness, temp-file cleanup on failure, and repair of truncated, corrupt, and mixed-validity JSONL files (12 tests). All 45 session-related tests pass.


Built with OpenCode
Compound Engineering
HARNESS

SessionManager.save() previously used bare open("w") which could
truncate the JSONL file if the process crashed mid-write. Now writes
to a .tmp file and atomically replaces via os.replace(), matching the
pattern already used in qq.py.

_load() now attempts _repair() before returning None, recovering
valid lines from partially-written files. 12 new tests cover atomic
save correctness, temp-file cleanup on failure, and repair of
truncated/corrupt JSONL.

cowork-with:opencode(glm-5.1)
@aiguozhi123456 aiguozhi123456 changed the title from "fix(session): atomic writes and corrupt-file repair" to "fix(session): prevent data loss with atomic writes and corrupt-file repair" Apr 19, 2026

@Re-bin Re-bin left a comment

Collaborator

This is the right kind of fix.

The intent is narrow, the boundary is clear, and the atomic-write change is the right way to prevent truncation-driven session loss.

During review I found one missing read-side path: the PR originally repaired corrupt files in SessionManager._load(), but read_session_file() and list_sessions() still treated the same corrupt JSONL as unreadable, which meant WebUI / HTTP views could still behave as if the session had disappeared. I patched that gap, added focused regression coverage in tests/agent/test_session_atomic.py, and pushed the fix back to this PR branch (b490fd87).
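
One way to picture the read-side gap is a single tolerant reader shared by every path that opens a session file, so loading and listing cannot disagree about whether a file is readable. The sketch below is hypothetical: `read_records` and `list_session_keys` are illustrative names, not the project's `read_session_file()` or `list_sessions()`, and the actual patch in b490fd87 may be structured differently.

```python
import json
from pathlib import Path
from typing import Optional


def _parse_lines(path: Path, strict: bool) -> list:
    """Parse a JSONL file; in lenient mode, drop lines that fail to parse."""
    records = []
    text = path.read_text(encoding="utf-8", errors="replace")
    for line in text.splitlines():
        if not line.strip():
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            if strict:
                raise
    return records


def read_records(path: Path) -> Optional[list]:
    """Strict parse first, lenient parse as fallback.

    Returns None only when the file is corrupt and nothing survives.
    """
    try:
        return _parse_lines(path, strict=True)
    except json.JSONDecodeError:
        return _parse_lines(path, strict=False) or None


def list_session_keys(directory: Path) -> list:
    """List every session file the shared reader can make sense of."""
    return sorted(
        p.stem for p in directory.glob("*.jsonl") if read_records(p) is not None
    )
```

Because the listing goes through the same fallback as the loader, a truncated file still appears in the list instead of vanishing from the UI.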

I pulled in the latest origin/main and reran both targeted and full tests locally:

  • python -m pytest tests/agent/test_session_atomic.py tests/agent/test_session_delete.py tests/agent/test_session_manager_history.py tests/agent/test_unified_session.py -q -> 53 passed
  • python -m pytest -q -> 2139 passed

From my side, this is now clean and ready to merge.

@Re-bin Re-bin merged commit 56a779c into HKUDS:main Apr 19, 2026
8 checks passed