@wjones127
Contributor

Summary

  • Fix vector search using ANN index regardless of query's metric type
  • Query.metric_type is now Option<DistanceType> (None = use index default)
  • If user specifies a metric that doesn't match the index, fall back to flat search
  • Explain plan now shows the metric being used

Test plan

  • Added regression test test_knn_metric_mismatch_falls_back_to_flat_search
  • Added test test_knn_no_metric_uses_index_metric
  • Existing tests pass with updated explain plan expectations

Fixes #5608

🤖 Generated with Claude Code

Previously, vector search would use an ANN index regardless of whether
the index's metric type matched the query's requested metric. This
produced incorrect distances when, for example, an index built with
metric="dot" was used for a query with metric="l2".
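
To make the failure mode concrete, here is a small self-contained sketch showing that L2 and dot product can disagree about which candidate is "nearest" to the same query. The helper names (`l2_squared`, `dot`, `nearest_by_l2`, `best_by_dot`) are hypothetical and written for this illustration; they are not Lance functions, and the library's own dot distance may be defined differently from the raw dot product used here.

```rust
/// Squared Euclidean (L2) distance; smaller means closer.
pub fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Plain dot product; larger means more similar.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Position of the candidate with the smallest L2 distance to `query`.
pub fn nearest_by_l2(query: &[f32], candidates: &[Vec<f32>]) -> Option<usize> {
    candidates
        .iter()
        .enumerate()
        .min_by(|(_, a), (_, b)| l2_squared(query, a).total_cmp(&l2_squared(query, b)))
        .map(|(i, _)| i)
}

/// Position of the candidate with the largest dot product against `query`.
pub fn best_by_dot(query: &[f32], candidates: &[Vec<f32>]) -> Option<usize> {
    candidates
        .iter()
        .enumerate()
        .max_by(|(_, a), (_, b)| dot(query, a).total_cmp(&dot(query, b)))
        .map(|(i, _)| i)
}
```

With the query `[1.0, 1.0]` and candidates `[1.0, 1.0]` and `[3.0, 3.0]`, `nearest_by_l2` picks the first candidate (distance 0) while `best_by_dot` picks the second (dot product 6 versus 2), so an index ordered by one metric cannot simply be reused to answer a query phrased in the other.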

Now the scanner checks if the index's metric matches the user's requested
metric. If they don't match, it silently falls back to flat search. If
the user doesn't specify a metric, the index's metric is used.
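
The "no metric means use the index's metric" rule can be sketched as below. This is a minimal illustration, not the actual Lance code: `DistanceType` and `Query` are trimmed-down stand-ins defined here, and `effective_metric` is a hypothetical helper name.

```rust
/// Simplified stand-in for the library's distance enum (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DistanceType {
    L2,
    Cosine,
    Dot,
    Hamming,
}

/// Hypothetical, trimmed-down query holding only the field discussed here.
#[derive(Debug, Clone, Default)]
pub struct Query {
    /// `None` means "no preference; use the index's metric".
    pub metric_type: Option<DistanceType>,
}

/// Metric the search ends up using: the user's choice if given,
/// otherwise whatever the index was built with.
pub fn effective_metric(query: &Query, index_metric: DistanceType) -> DistanceType {
    query.metric_type.unwrap_or(index_metric)
}
```

Modeling the field as an `Option` lets the scanner distinguish "the user asked for L2" from "the user did not ask for anything", which a field with a baked-in default cannot express.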

Changes:
- Query.metric_type is now Option<DistanceType> (None = use index default)
- Scanner checks metric compatibility before using an index
- Explain plan now shows the metric being used
- Java bindings updated to make distanceType optional

Fixes lance-format#5608

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@github-actions (bot) added the "bug" (Something isn't working) and "java" labels on Dec 31, 2025
@github-actions
Contributor

Code Review

Summary: This PR fixes a critical bug (#5608) where vector search incorrectly used the ANN index regardless of the query's metric type, returning wrong distances. The fix changes Query.metric_type from DistanceType to Option<DistanceType>, enabling metric compatibility checking before index selection.

No Major Issues Found

The implementation is correct and well-tested. The logic properly:

  1. Falls back to flat search when user-specified metric doesn't match the index
  2. Uses the index's metric when no preference is specified
  3. Resolves to a default metric for flat search when needed

Minor Observations (Non-blocking)

1. Double index open in scanner.rs (lines ~3037-3066)

The index is opened to check metric compatibility, but then _idx is discarded and the index is used again only via matching_index. This is fine since the opened index object isn't needed after getting the metric type, but you might consider documenting why the pattern is structured this way if it's intentional for future extensibility.

2. Java API backward compatibility

The change from required distanceType (defaulting to L2) to Optional<DistanceType> is a semantic change. Previously:

new Query.Builder().setKey(...).build()  // Used L2

Now the same call uses the index default or the data-type default. This is the intended behavior per the fix, but downstream Java users relying on implicit L2 may see different results if their index uses a different metric. This seems acceptable for a bug fix, but it is worth noting in the release notes.

Tests

Good test coverage with test_knn_metric_mismatch_falls_back_to_flat_search and test_knn_no_metric_uses_index_metric.

LGTM

@wjones127 added the "critical-fix" label (Bugs that cause crashes, security vulnerabilities, or incorrect data) on Dec 31, 2025
@codecov

codecov bot commented Dec 31, 2025

Codecov Report

❌ Patch coverage is 95.45455% with 5 lines in your changes missing coverage. Please review.

Files with missing lines             Patch %   Lines
rust/lance/src/dataset/scanner.rs    96.96%    0 Missing and 3 partials ⚠️
rust/lance/src/io/exec/knn.rs        77.77%    1 Missing and 1 partial ⚠️

When a user specifies a metric that is incompatible with the data type
(e.g., L2 on binary vectors), use the index with its own metric rather
than falling back to flat search, which would fail.

The logic now is:
- If metrics match: use the index
- If user metric is incompatible with data type: use the index
- If user metric is compatible but different from index: flat search
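
The three rules above, plus the earlier "no metric specified" case, can be sketched as a single decision function. This is an illustrative sketch under stated assumptions, not the scanner's real code: `VectorKind`, `SearchPlan`, `is_compatible`, and `choose_plan` are hypothetical names, and the compatibility rule (Hamming only on binary vectors, everything else only on float vectors) is an assumption made for the example.

```rust
/// Simplified stand-in for the library's distance enum (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DistanceType {
    L2,
    Cosine,
    Dot,
    Hamming,
}

/// Hypothetical coarse classification of the vector column's element type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VectorKind {
    Float,
    Binary,
}

/// Outcome of the decision, carrying the metric that will be used.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SearchPlan {
    /// ANN search through the index, using the index's own metric.
    Index(DistanceType),
    /// Brute-force scan using the user's requested metric.
    Flat(DistanceType),
}

/// Assumed rule for this sketch: Hamming is valid only on binary vectors,
/// and the other metrics are valid only on float vectors.
fn is_compatible(metric: DistanceType, kind: VectorKind) -> bool {
    match kind {
        VectorKind::Binary => metric == DistanceType::Hamming,
        VectorKind::Float => metric != DistanceType::Hamming,
    }
}

pub fn choose_plan(
    requested: Option<DistanceType>,
    index_metric: DistanceType,
    kind: VectorKind,
) -> SearchPlan {
    match requested {
        // No preference: use the index with its own metric.
        None => SearchPlan::Index(index_metric),
        // Metrics match: use the index.
        Some(m) if m == index_metric => SearchPlan::Index(index_metric),
        // Requested metric cannot run on this data type: flat search
        // would fail, so use the index with its own metric.
        Some(m) if !is_compatible(m, kind) => SearchPlan::Index(index_metric),
        // Compatible but different from the index: flat search.
        Some(m) => SearchPlan::Flat(m),
    }
}
```

For example, under these assumptions `choose_plan(Some(DistanceType::L2), DistanceType::Dot, VectorKind::Float)` yields `SearchPlan::Flat(DistanceType::L2)`, while requesting L2 against a Hamming index on binary vectors yields `SearchPlan::Index(DistanceType::Hamming)`.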

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Development

Successfully merging this pull request may close these issues.

Vector search uses ANN index regardless of metric
