perf: improve SQ query speed #5596

BubbleCal · 2025-12-30T11:22:16Z

create_hnsw_sq(100000x128)
                        time:   [7.1499 s 7.1644 s 7.1840 s]
                        change: [-1.1794% -0.9172% -0.6107%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe

search_hnsw_sq100000x128
                        time:   [253.49 µs 253.87 µs 254.24 µs]
                        change: [-3.6161% -3.4660% -3.3038%] (p = 0.00 < 0.05)
                        Performance has improved.

github-actions · 2025-12-30T11:24:03Z

Code Review

P1: Inconsistent optimization in `distance_all()` method

The PR optimizes `distance()` by pre-computing the scale factor and using direct multiplication (`dist * self.scale`), but the `distance_all()` method at lines 446-468 (in the new file) still uses `inverse_scalar_dist()`, which allocates a Vec and recomputes the scale on every call.

For consistency and to get the full performance benefit, `distance_all()` should also use the pre-computed `self.scale` field:

```rust
fn distance_all(&self, _k_hint: usize) -> Vec<f32> {
    match self.storage.distance_type {
        DistanceType::L2 | DistanceType::Cosine => self
            .storage
            .chunks
            .iter()
            .flat_map(|c| {
                c.sq_codes
                    .values()
                    .chunks_exact(c.dim())
                    .map(|sq_codes| l2_distance_uint_scalar(sq_codes, &self.query_sq_code) * self.scale)
            })
            .collect(),
        DistanceType::Dot => self
            .storage
            .chunks
            .iter()
            .flat_map(|c| {
                c.sq_codes
                    .values()
                    .chunks_exact(c.dim())
                    .map(|sq_codes| dot_distance(sq_codes, &self.query_sq_code) * self.scale)
            })
            .collect(),
        _ => panic!("We should not reach here: sq distance can only be L2 or Dot"),
    }
}
```

This would:

- Eliminate the per-call scale computation in `inverse_scalar_dist`
- Avoid the intermediate Vec allocation from `collect_vec()` inside `inverse_scalar_dist`
- Maintain consistency between the two distance calculation paths
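
To make the suggestion concrete outside of the Lance types, here is a minimal standalone sketch of the pattern. Every name in it (`SqDistCalculator`, `l2_u8`, the `min`/`max` bounds) is hypothetical, and the scale formula assumes plain 8-bit scalar quantization over `[min, max]`; the actual fields and formula in the PR may differ.

```rust
/// Hypothetical calculator over row-major u8 codes; not the Lance type.
struct SqDistCalculator {
    codes: Vec<u8>,
    dim: usize,
    query_code: Vec<u8>,
    /// Computed once at construction and reused by every distance call.
    scale: f32,
}

/// Squared L2 distance in code space, accumulated in integers.
fn l2_u8(a: &[u8], b: &[u8]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| {
            let d = x as i32 - y as i32;
            (d * d) as u32
        })
        .sum::<u32>() as f32
}

impl SqDistCalculator {
    fn new(codes: Vec<u8>, dim: usize, query_code: Vec<u8>, min: f32, max: f32) -> Self {
        // Assumption: one quantization step spans (max - min) / 255,
        // so squared distances scale by the square of that step.
        let step = (max - min) / 255.0;
        Self { codes, dim, query_code, scale: step * step }
    }

    /// Single-row path: one multiply by the cached scale.
    fn distance(&self, id: usize) -> f32 {
        let row = &self.codes[id * self.dim..(id + 1) * self.dim];
        l2_u8(row, &self.query_code) * self.scale
    }

    /// Bulk path: same cached scale, a single output allocation,
    /// and no intermediate Vec of unscaled distances.
    fn distance_all(&self) -> Vec<f32> {
        self.codes
            .chunks_exact(self.dim)
            .map(|row| l2_u8(row, &self.query_code) * self.scale)
            .collect()
    }
}
```

Both paths multiply by the same cached field, so they cannot drift apart if the scale formula changes later.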

Otherwise the changes look good - the dot_u8_scalar optimization with unrolling and the pre-computed scale factor are solid performance improvements.
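
For readers unfamiliar with the unrolling technique mentioned above, a generic sketch follows. It is not the PR's `dot_u8_scalar` (that code is not shown here); the function name `dot_u8_unrolled` and the unroll factor of 4 are assumptions chosen for illustration.

```rust
/// Dot product of two u8 slices, unrolled by 4 with independent accumulators.
/// Accumulates in u32, which is safe for inputs shorter than about 66,000
/// elements (each product is at most 255 * 255).
fn dot_u8_unrolled(a: &[u8], b: &[u8]) -> u32 {
    assert_eq!(a.len(), b.len());
    let a_chunks = a.chunks_exact(4);
    let b_chunks = b.chunks_exact(4);
    // Elements left over when the length is not a multiple of 4.
    let a_rem = a_chunks.remainder();
    let b_rem = b_chunks.remainder();

    // Four separate accumulators avoid a serial dependency on one sum.
    let mut acc = [0u32; 4];
    for (x, y) in a_chunks.zip(b_chunks) {
        acc[0] += x[0] as u32 * y[0] as u32;
        acc[1] += x[1] as u32 * y[1] as u32;
        acc[2] += x[2] as u32 * y[2] as u32;
        acc[3] += x[3] as u32 * y[3] as u32;
    }

    let tail: u32 = a_rem
        .iter()
        .zip(b_rem)
        .map(|(&x, &y)| x as u32 * y as u32)
        .sum();
    acc.iter().sum::<u32>() + tail
}
```

Whether manual unrolling beats what the compiler already auto-vectorizes depends on the target and should be confirmed with the benchmark rather than assumed.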

codecov · 2025-12-30T12:12:20Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


Signed-off-by: BubbleCal <[email protected]>

BubbleCal added 2 commits December 30, 2025 16:05

Optimize SQ distance and u8 dot

2ab62b3

Add SQ HNSW benchmark

37a6df0

github-actions bot added the performance label Dec 30, 2025

BubbleCal added 2 commits December 30, 2025 19:25

Revert u8 dot changes

a02b95d

Optimize SQ distance_all scaling

f5ca19b

fmt

ae1ca69

Signed-off-by: BubbleCal <[email protected]>

Xuanwo approved these changes Dec 30, 2025
