Verify snapshot integrity with checksum in shard snapshot transfer (Fixes #3372) #8765
Open
Artur-Sulej wants to merge 1 commit into
During shard snapshot transfer, the received .snapshot file was restored without verifying its checksum. A corrupted transfer could silently produce a broken shard.

- Add `checksum` parameter to `recover_shard_snapshot_from_url`
- Capture `snapshot_description.checksum` on the sender side and forward it in the gRPC `RecoverShardSnapshotRequest` to the receiver
- The receiver already verifies the checksum in `recover_shard_snapshot` when a non-None value is provided

Fixes qdrant#3372
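The flow described in the commit message can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption made for illustration: `RecoverRequest`, `build_request`, `verify_download`, and `toy_digest` are hypothetical names, not Qdrant's actual types or functions, and `toy_digest` is an FNV-1a stand-in rather than the SHA256 digest the real code uses. The sketch only shows the shape of the idea: the sender forwards an optional checksum, and the receiver verifies the downloaded bytes only when a checksum is present.

```rust
/// Hypothetical stand-in for the gRPC `RecoverShardSnapshotRequest`.
/// Only the `checksum` field is named in the PR; the rest is illustrative.
#[derive(Debug, Clone)]
struct RecoverRequest {
    snapshot_url: String,
    checksum: Option<String>,
}

/// Stand-in digest (64-bit FNV-1a), NOT SHA256. It keeps the sketch free of
/// external crates; a real implementation would use a cryptographic hash.
fn toy_digest(data: &[u8]) -> String {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for byte in data {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    format!("{hash:016x}")
}

/// Sender side: forward the checksum captured at snapshot creation time.
fn build_request(snapshot_url: &str, checksum: Option<String>) -> RecoverRequest {
    RecoverRequest {
        snapshot_url: snapshot_url.to_string(),
        checksum,
    }
}

/// Receiver side: verify the downloaded bytes only when a checksum was sent.
fn verify_download(request: &RecoverRequest, downloaded: &[u8]) -> Result<(), String> {
    match &request.checksum {
        // No checksum supplied: behave as before and skip verification.
        None => Ok(()),
        Some(expected) => {
            let actual = toy_digest(downloaded);
            if actual == *expected {
                Ok(())
            } else {
                Err(format!(
                    "checksum mismatch for {}: expected {expected}, got {actual}",
                    request.snapshot_url
                ))
            }
        }
    }
}

fn main() {
    let snapshot = b"snapshot bytes".to_vec();
    let url = "http://example.invalid/shard.snapshot"; // placeholder URL

    // Intact transfer: the digest matches, so recovery may proceed.
    let request = build_request(url, Some(toy_digest(&snapshot)));
    assert!(verify_download(&request, &snapshot).is_ok());

    // Corrupted transfer: flipping the last byte changes the digest.
    let mut corrupted = snapshot.clone();
    *corrupted.last_mut().unwrap() ^= 0xff;
    assert!(verify_download(&request, &corrupted).is_err());

    // No checksum forwarded: verification is skipped entirely.
    let legacy = build_request(url, None);
    assert!(verify_download(&legacy, &corrupted).is_ok());
}
```

Keeping the checksum as an `Option<String>` means a sender that has no checksum can still transfer a snapshot, at the cost of skipping the integrity check in that case.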
Verify snapshot integrity with checksum in shard snapshot transfer
Fixes #3372
Problem
During shard snapshot transfer, the `.snapshot` file received by the remote node was restored without any integrity check. A corrupted or truncated transfer would silently produce a broken shard with no indication of failure.

The checksum infrastructure was already fully in place: `SnapshotDescription` includes a SHA256 checksum computed at snapshot creation time, and the receiver-side `recover_shard_snapshot` function already has checksum verification logic (used by user-facing snapshot recovery). The checksum simply wasn't being passed through the transfer call chain.

Solution
Thread the snapshot checksum from the sender through to the receiver:
- `lib/collection/src/shards/remote_shard.rs`: Added a `checksum: Option<String>` parameter to `recover_shard_snapshot_from_url` and forwarded it into the `RecoverShardSnapshotRequest` gRPC message (the `checksum` field already exists in the proto definition).
- `lib/collection/src/shards/transfer/snapshot.rs`: On the sender side, captured `snapshot_description.checksum` (populated during snapshot creation) and passed it to `recover_shard_snapshot_from_url`.

The receiver already handles a non-`None` checksum in `recover_shard_snapshot`: it computes the SHA256 of the downloaded file and compares it, returning a `bad_input` error on mismatch. No changes were needed on the receiver side.

Scope
- `SnapshotDescription`, and is now forwarded to the receiver for verification.
- `snapshot_checksum` stays `None` and behavior is identical to before.

All Submissions:
- `dev` branch. Did you create your branch from `dev`?

New Feature Submissions:
- `cargo +nightly fmt --all` command prior to submission?
- `cargo clippy --workspace --all-features` command?

Changes to Core Features: