fix(optimizer): add mid-flight free-disk watchdog (#4297) #8948

Open
MilosM348 wants to merge 1 commit into qdrant:dev from MilosM348:fix/optimizer-ood-watchdog

Conversation

@MilosM348

Summary

Adds a mid-flight free-disk watchdog to the optimizer's slow path so that an out-of-disk condition between the pre-flight check and the segment-builder write surface fails with a clean "No space left on device:" error instead of crashing inside segment_builder.update / populate_vector_storages / segment_builder.build.

This addresses the open part of #4297 - the up-front-only check_segments_size (added in #4578) is correct, but it is also fundamentally racy against external disk consumers. xhjkl flagged exactly this in the review of #4578:

> We still might run into OOD down the line because FS is non-atomic.
> (...) some external process can occupy the disk and we can do nothing about it

There is no graceful fix for "the disk filled up while we were holding the permit", but we can still detect it before the next slow phase begins and abort the optimization in the same shape as the WAL/insertion path (DiskUsageWatcher).

/claim #4297

What changed

  • check_segments_size now returns the computed space_needed estimate so callers can re-use it for mid-flight checks. Existing call site in execute_optimization is the only consumer.
  • New recheck_free_space helper in lib/shard/src/optimize.rs takes the same temp_path and space_needed and aborts the optimization with a canonical "No space left on device:" error if available space has dropped below the larger of (estimate, 8 MiB safety floor).
  • The watchdog is invoked twice in build_new_segment:
    • after segment_builder.update (i.e. before the HNSW indexing phase that historically blows past the conservative 2x pre-flight estimate when link tables are large),
    • after populate_vector_storages (i.e. immediately before segment_builder.build, which is where ENOSPC has historically surfaced as a panic).
  • The pre-flight error message is also normalized to lead with "No space left on device:" so it is logged in the same shape as the DiskUsageWatcher path on insertion and matches the assertion in tests/e2e_tests/test_low_disk.py:
    ```python
    expected_msg = "No space left on device:"
    assert expected_msg in logs
    ```
    Without this normalization, an OOD that trips the optimizer's own pre-flight check (rather than the WAL writer) would log "Not enough space available for optimization" instead, and the e2e assertion would only pass by accident through unrelated WAL log lines.
  • Three new unit tests in disk_watchdog_tests:
    • watchdog_passes_when_disk_has_room: healthy tempdir accepts both None and small estimates.
    • watchdog_fails_when_estimate_exceeds_available: u64::MAX estimate trips the watchdog and the rendered error contains the canonical OOD prefix, the optimizer name, and the temp path (so logs stay diagnostic).
    • watchdog_uses_max_of_estimate_and_safety_buffer: pins OPTIMIZER_DISK_WATCHDOG_BUFFER_BYTES into the safe range [1 MiB, 64 MiB] so future contributors don't accidentally make the buffer either pointless or false-positive-prone.
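
The core of the helper described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the names (`recheck_free_space_with`, `OPTIMIZER_DISK_WATCHDOG_BUFFER_BYTES`) are taken from the PR description, and the real code queries `fs4::available_space` where this sketch takes an injected closure for testability.

```rust
/// Assumed constant name from the PR; 8 MiB safety floor.
const OPTIMIZER_DISK_WATCHDOG_BUFFER_BYTES: u64 = 8 * 1024 * 1024;

/// Mid-flight watchdog sketch: abort (Err) if free space at `temp_path`
/// has dropped below the larger of the pre-flight estimate and the floor.
/// The available-space lookup is injected so tests don't need a full disk.
fn recheck_free_space_with(
    available_space: impl Fn() -> std::io::Result<u64>,
    space_needed: Option<u64>,
    temp_path: &str,
) -> Result<(), String> {
    // A failing statvfs-style lookup is treated as "skip", not "abort":
    // a transient metadata error must not kill a healthy optimization.
    let Ok(available) = available_space() else {
        return Ok(());
    };
    let required = space_needed
        .unwrap_or(0)
        .max(OPTIMIZER_DISK_WATCHDOG_BUFFER_BYTES);
    if available < required {
        return Err(format!(
            "No space left on device: optimizer needs {required} bytes at {temp_path}, only {available} available"
        ));
    }
    Ok(())
}
```

The injectable lookup is what lets the unit tests exercise the `u64::MAX` estimate and the skip-on-error path without manipulating a real filesystem.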

What this is not

  • It is not a replacement for check_segments_size. The pre-flight check is still the primary guard.
  • It is not a periodic timer. We piggy-back on the existing phase boundaries in build_new_segment rather than spinning up a background task - the maintainer's review on #4578 (Fail early when encountering out-of-storage during optimization) was explicit that an async watchdog "is fake anyway" on top of a FS that already isn't atomic. Two synchronous checks at the obvious phase boundaries match the existing style and add no new threads/locks.
  • It is not a graceful mid-write ENOSPC handler. If a write(2) returns ENOSPC inside the segment builder while we're already mid-phase, the existing error path still applies - the watchdog just shrinks the window in which that can happen.
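
The placement of the two checkpoints can be illustrated with stubs; the phase names come from the PR, everything else here is a simulation standing in for real disk measurements.

```rust
/// Trivial stand-in for the watchdog: fail with the canonical prefix when
/// the observed free bytes drop below what the pre-flight check required.
fn recheck_free_space(available: u64, required: u64) -> Result<(), String> {
    if available < required {
        return Err("No space left on device:".to_string());
    }
    Ok(())
}

/// Stubbed build_new_segment: `free_at_phase` simulates the free bytes
/// observed at each of the two synchronous checkpoints.
fn build_new_segment(free_at_phase: [u64; 2], required: u64) -> Result<(), String> {
    // segment_builder.update(...)        -- slow phase 1
    recheck_free_space(free_at_phase[0], required)?; // before HNSW indexing
    // populate_vector_storages(...)      -- slow phase 2
    recheck_free_space(free_at_phase[1], required)?; // before segment_builder.build
    // segment_builder.build(...)
    Ok(())
}
```

If an external process fills the disk between the phases, the second checkpoint turns what would have been a mid-build panic into a clean error.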

Why "8 MiB safety floor"

When space_needed is None (estimate failed, e.g. an unreadable segment dir), the watchdog falls back to a small fixed buffer so it isn't completely toothless. 8 MiB matches the smallest reasonable single-vector-storage write that a real optimization will perform and is consistent with DiskUsageWatcher::min_free_disk_size_mb defaults elsewhere in the codebase. The unit test pins this constant into a sane range so future contributors don't drift it.
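
The range-pinning test mentioned above amounts to a one-line invariant on the constant (a sketch; the constant name follows the PR description):

```rust
/// Assumed constant name from the PR; 8 MiB fallback buffer.
const OPTIMIZER_DISK_WATCHDOG_BUFFER_BYTES: u64 = 8 * 1024 * 1024;

/// The buffer must stay large enough to matter (>= 1 MiB) and small enough
/// not to false-positive on modestly full disks (<= 64 MiB).
fn buffer_in_safe_range() -> bool {
    (1024 * 1024..=64 * 1024 * 1024).contains(&OPTIMIZER_DISK_WATCHDOG_BUFFER_BYTES)
}
```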

Risks / things to look at in review

  1. No new dependencies. All used APIs (fs4::available_space, bytes_to_human, OperationError::service_error) were already imported by this file.
  2. No public API changes. check_segments_size, recheck_free_space, and build_new_segment are all crate-private.
  3. Rebased onto dev as required by CONTRIBUTING.md.

Test plan

  • cargo test -p shard --test disk_watchdog_tests (added; passes locally on a healthy fs)
  • tests/e2e_tests/test_low_disk.py::TestLowDisk::test_low_disk_handling[indexing] - requires the docker e2e harness, please run on CI
  • No regression in existing optimizer tests (the watchdog only fires on < required available space, and existing tests run on hosts with plenty of free disk)

Closes #4297 if accepted.


Co-authored-by: Cursor [email protected]


@MilosM348 MilosM348 force-pushed the fix/optimizer-ood-watchdog branch from c115ed8 to 79f8d60 Compare May 7, 2026 18:51
@qdrant qdrant deleted a comment from coderabbitai Bot May 8, 2026
The pre-flight check_segments_size only runs once at the start of an
optimization, but the slow phases (segment_builder.update,
populate_vector_storages, segment_builder.build) can take many
minutes. During that window other parallel optimizations, snapshots,
WAL growth, or unrelated processes on the same volume can fill the disk
and crash the segment builder on a raw ENOSPC. xhjkl flagged exactly
this in the review of PR qdrant#4578 (we still might run into OOD down the
line because FS is non-atomic).

Changes
-------

* check_segments_size now returns an OptimizationSpaceEstimate carrying
  both the space_needed estimate AND the precheck-time available bytes,
  so the mid-flight watchdog can enforce headroom rather than the full
  initial estimate (the optimizer itself is expected to consume the
  estimate by design).
* New recheck_free_space helper aborts the optimization with a canonical
  No space left on device: error if available space has dropped below
  max(precheck_available - space_needed, 8 MiB safety floor). Per-IO
  available_space lookup is injectable via recheck_free_space_with for
  testability.
* The watchdog is invoked twice in build_new_segment: once after
  segment_builder.update and once after populate_vector_storages, i.e.
  before the two slow phases that historically exceed the conservative
  2x pre-flight estimate.
* The pre-flight error message is also normalized to lead with
  No space left on device: so it is logged in the same shape as the
  WAL/insertion path (DiskUsageWatcher) and matches the assertion in
  tests/e2e_tests/test_low_disk.py.
* Seven unit tests in disk_watchdog_tests pin the headroom semantics,
  the OOD message format, the statvfs-failure skip behaviour, and the
  one-call-per-checkpoint contract on available_space.

The watchdog only triggers when available space drops below the headroom
the up-front check accepted, and treats fs4::available_space errors as
skip, so neither the optimizer's own writes nor a transient statvfs
failure can abort an otherwise healthy optimization.
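
The headroom arithmetic described above reduces to a small pure function. This is an illustrative sketch of the semantics, not the actual code; the function names are hypothetical.

```rust
const SAFETY_FLOOR: u64 = 8 * 1024 * 1024; // 8 MiB, per the commit message

/// The optimizer is expected to consume `space_needed` itself, so the
/// watchdog only enforces the headroom the pre-flight check accepted:
/// max(precheck_available - space_needed, 8 MiB floor).
fn headroom_threshold(precheck_available: u64, space_needed: u64) -> u64 {
    precheck_available
        .saturating_sub(space_needed)
        .max(SAFETY_FLOOR)
}

/// Abort only when currently available space drops below that headroom,
/// i.e. when something other than the optimizer has eaten the disk.
fn should_abort(available_now: u64, precheck_available: u64, space_needed: u64) -> bool {
    available_now < headroom_threshold(precheck_available, space_needed)
}
```

With 100 GiB available at pre-flight and a 10 GiB estimate, the threshold is 90 GiB: the optimizer writing its own 10 GiB never trips the check, but an external consumer pushing free space below 90 GiB does.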

Refs: qdrant#4297, qdrant#4578
Co-authored-by: Cursor <[email protected]>
@MilosM348 MilosM348 force-pushed the fix/optimizer-ood-watchdog branch from 79f8d60 to dbf318d Compare May 8, 2026 10:13
@qdrant qdrant deleted a comment from coderabbitai Bot May 8, 2026