@mnorris11 mnorris11 commented Aug 18, 2025

Summary:
Description

Adds an InvertedListScanner for IVFPQFastScan. IVFPQFastScan currently does not support a scanner because the codes are PQFS packed.

Summary of changes

  • Update HeapHandler in simd_result_handlers.h to return number of heap updates nup. Also add a bunch of junk aka io_simd_dis and io_simd_ids handling. More on that below.
  • Move make_knn_handler to the IndexIVFFastScan.h declaration so it can be used outside just IndexIVFFastScan.cpp. It is moved out of the anonymous namespace in the .cpp file too.
  • Add the IVFPQFastScanScanner to IndexIVFPQFastScan. It takes in SearchParams where the void* context is a struct containing the max heap size to use with this scanner (aka k). The scan_codes implementation is translated directly from search_implem_10 in IndexIVFPQFastScan, because both search_implem_10 and scan_codes process 1 query and 1 invlist at a time. (Technically, search_implem_10 processes n queries at a time, just not batched: it is a simple loop over n, so it can be reduced to 1 query for this scanner easily.) The only changes are the removal of extraneous code from search_implem_10.
  • Changes to test_lowlevel_ivf.cpp to test this. Refactored the test because IVFPQFastScanScanner does not behave like other scanners: it requires io_simd_dis and io_simd_ids handling.

Why do we need this io_simd_dis and io_simd_ids?

Other scanners can scan over multiple lists by calling set_query, then set_list, then scan_codes. This works because they take the float* distances: the heap can operate on these floats and reuse the values from the last iteration.

IVFPQFastScan is different. Codes are pq4 packed. The HeapHandler has to be initialized with the values from scanning the prior list.

If the user only sets 1 list and 1 query, this does not matter. But if they want to use it like the other scanners, they can follow the example in test_lowlevel_ivf.cpp.
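The carry-over idea can be sketched outside of Faiss. The block below is a minimal illustration, not the PR's code: `TopK`, `scan_list`, and `sorted_ids` are hypothetical names, and distances are plain floats rather than the SIMD values the real HeapHandler works with. It shows a top-k heap that is kept between two list scans, and a return value that plays the role of `nup`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the state carried between scans: the k best
// (distance, id) pairs so far, kept as a max-heap (worst candidate in front).
struct TopK {
    std::size_t k;
    std::vector<std::pair<float, std::int64_t>> heap;
};

// Fold one list's candidates into the carried-over heap.
// Returns the number of heap updates (the role `nup` plays in the PR).
std::size_t scan_list(
        TopK& state,
        const std::vector<float>& dis,
        const std::vector<std::int64_t>& ids) {
    assert(dis.size() == ids.size());
    if (state.k == 0) {
        return 0;
    }
    std::size_t nup = 0;
    for (std::size_t i = 0; i < dis.size(); i++) {
        if (state.heap.size() < state.k) {
            // Heap not full yet: always accept.
            state.heap.emplace_back(dis[i], ids[i]);
            std::push_heap(state.heap.begin(), state.heap.end());
            nup++;
        } else if (dis[i] < state.heap.front().first) {
            // Better than the current worst: replace it.
            std::pop_heap(state.heap.begin(), state.heap.end());
            state.heap.back() = {dis[i], ids[i]};
            std::push_heap(state.heap.begin(), state.heap.end());
            nup++;
        }
    }
    return nup;
}

// Ids of the current top-k, best (smallest distance) first.
std::vector<std::int64_t> sorted_ids(TopK state) {
    std::sort_heap(state.heap.begin(), state.heap.end());
    std::vector<std::int64_t> out;
    for (const auto& p : state.heap) {
        out.push_back(p.second);
    }
    return out;
}
```

Calling `scan_list` twice on the same `TopK` yields the best k over both lists. Starting the second call from an empty heap would only yield the best k of the second list, which is why prior results have to be fed back in when more than one list is scanned.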

We can implement a scanner based on search_implem_10. We can't do distance_to_code, but we can do the whole scan (that is what this diff does in the new IVFPQFastScanScanner.scan_codes).

void IndexIVFFastScan::search_implem_10(
        idx_t n,
        const float* x,
        SIMDResultHandlerToFloat& handler,
        const CoarseQuantized& cq,
        size_t* ndis_out,
        size_t* nlist_out,
        const NormTableScaler* scaler,
        const IVFSearchParameters* params) const {
    size_t dim12 = ksub * M2;
    AlignedTable<uint8_t> dis_tables;
    AlignedTable<uint16_t> biases;
    std::unique_ptr<float[]> normalizers(new float[2 * n]);

    compute_LUT_uint8(n, x, cq, dis_tables, biases, normalizers.get());

    bool single_LUT = !lookup_table_is_3d();

    size_t ndis = 0, nlist_visited = 0;
    int qmap1[1];
    handler.q_map = qmap1;
    handler.begin(skip & 16 ? nullptr : normalizers.get());
    size_t nprobe = cq.nprobe;

    for (idx_t i = 0; i < n; i++) {
        const uint8_t* LUT = nullptr;
        qmap1[0] = i;

        if (single_LUT) {
            LUT = dis_tables.get() + i * dim12;
        }
        for (idx_t j = 0; j < nprobe; j++) {
            size_t ij = i * nprobe + j;
            if (!single_LUT) {
                LUT = dis_tables.get() + ij * dim12;
            }
            if (biases.get()) {
                handler.dbias = biases.get() + ij;
            }

            idx_t list_no = cq.ids[ij];
            if (list_no < 0) {
                continue;
            }
            size_t ls = invlists->list_size(list_no);
            if (ls == 0) {
                continue;
            }

            InvertedLists::ScopedCodes codes(invlists, list_no);
            InvertedLists::ScopedIds ids(invlists, list_no);

            handler.ntotal = ls;
            handler.id_map = ids.get();

            pq4_accumulate_loop(
                    1,
                    roundup(ls, bbs),
                    bbs,
                    M2,
                    codes.get(),
                    LUT,
                    handler,
                    scaler);

            ndis += ls;
            nlist_visited++;
        }
    }

    handler.end();
    *ndis_out = ndis;
    *nlist_out = nlist_visited;
}
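
To make the table indexing in the loop above concrete, here is a small standalone sketch. `lut_offset` is a hypothetical helper, not a Faiss function. It only reproduces the offset arithmetic from the listing: with a single LUT per query the offset depends on `i` alone, otherwise it depends on the flattened (query, probe) index `ij = i * nprobe + j`.

```cpp
#include <cstddef>

// Offset into dis_tables for query i and probe j, mirroring the listing:
//   single_LUT:  i  * dim12
//   otherwise:   ij * dim12, with ij = i * nprobe + j
std::size_t lut_offset(
        bool single_lut,
        std::size_t i,
        std::size_t j,
        std::size_t nprobe,
        std::size_t dim12) {
    std::size_t ij = i * nprobe + j;
    return (single_lut ? i : ij) * dim12;
}
```

A scanner that handles 1 query and 1 list per call corresponds to fixing `i` and `j` for that call, which is why the outer loops can be dropped.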

Differential Revision: D80114737

@meta-cla meta-cla bot added the CLA Signed label Aug 18, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D80114737

mnorris11 pushed a commit to mnorris11/faiss that referenced this pull request Aug 18, 2025
Summary:

Description
-
Adds an InvertedListScanner for IVFPQFastScan. IVFPQFastScan currently does not support a scanner because the codes are PQFS packed. This is required by Unicorn to simplify the integration layer.

Summary of changes
-
- Update HeapHandler in simd_result_handlers.h to return number of heap updates `nup`. Also add a bunch of junk aka `io_simd_dis` and `io_simd_ids` handling. More on that below.
- Move `make_knn_handler` to the IndexIVFFastScan.h declaration so it can be used outside just IndexIVFFastScan.cpp. It is moved out of the anonymous namespace in the .cpp file too.
- Add the IVFPQFastScanScanner to IndexIVFPQFastScan. It takes in SearchParams where the void* context is a struct containing the max heap size to use with this scanner (aka `k`). The scan_codes implementation is translated directly from search_implem_10 in IndexIVFPQFastScan, because both search_implem_10 and scan_codes process 1 query and 1 invlist at a time. The only changes are the removal of extraneous code from search_implem_10.
- Changes to test_lowlevel_ivf.cpp to test this. Refactored the test because IVFPQFastScanScanner does not behave like other scanners: it requires `io_simd_dis` and `io_simd_ids` handling.

Why do we need this io_simd_dis and io_simd_ids?
-
Other scanners can scan over multiple lists by calling `set_query`, then `set_list`, then `scan_codes`. This works because they take the `float* distances`: the heap can operate on these floats and reuse the values from the last iteration.

IVFPQFastScan is different. Codes are pq4 packed. The HeapHandler has to be initialized with the values from scanning the prior list.

If the user only sets 1 list and 1 query, this does not matter. But if they want to use it like the other scanners, they can follow the example in test_lowlevel_ivf.cpp.

We can implement a scanner based on search_implem_10. We can't do distance_to_code, but we can do the whole scan (that is what this diff does in the new `IVFPQFastScanScanner.scan_codes`).

https://github.com/facebookresearch/faiss/blob/514b44fca8542bafe8640adcbf1cccce1900f74c/faiss/IndexIVFFastScan.cpp#L918-L992

Differential Revision: D80114737

mnorris11 pushed a commit to mnorris11/faiss that referenced this pull request Aug 18, 2025

mnorris11 pushed a commit to mnorris11/faiss that referenced this pull request Aug 21, 2025
Summary:

Description
-
Adds an InvertedListScanner for IVFPQFastScan. IVFPQFastScan currently does not support a scanner because the codes are PQFS packed. This is required by Unicorn to simplify the integration layer.

We can implement a scanner based on search_implem_10. We can't do distance_to_code, but we can do the whole scan (that is what this diff does in the new `IVFPQFastScanScanner.scan_codes`).

Summary of changes
-
- Update HeapHandler in simd_result_handlers.h to return number of heap updates `nup`. Also add a bunch of junk aka `io_simd_dis` and `io_simd_ids` handling. More on that below.
- Move `make_knn_handler` to the IndexIVFFastScan.h declaration so it can be used outside just IndexIVFFastScan.cpp. It is moved out of the anonymous namespace in the .cpp file too.
- Add the IVFPQFastScanScanner to IndexIVFPQFastScan. It takes in SearchParams where the void* context is a struct containing the max heap size to use with this scanner (aka `k`). The scan_codes implementation is translated directly from search_implem_10 in IndexIVFPQFastScan, because both search_implem_10 and scan_codes process 1 query and 1 invlist at a time. (Technically, search_implem_10 processes `n` queries at a time, just not batched: it is a simple loop over `n`, so it can be reduced to 1 query for this scanner easily.) The only changes are the removal of extraneous code from search_implem_10.
- Changes to test_lowlevel_ivf.cpp to test this. Refactored the test because IVFPQFastScanScanner does not behave like other scanners: it requires `io_simd_dis` and `io_simd_ids` handling.

Why do we need this io_simd_dis and io_simd_ids?
-
Other scanners can scan over multiple lists by calling `set_query`, then `set_list`, then `scan_codes`. This works because they take the `float* distances`: the heap can operate on these floats and reuse the values from the last iteration.

IVFPQFastScan is different. Codes are pq4 packed. The HeapHandler has to be initialized with the values from scanning the prior list.

If the user only sets 1 list and 1 query for the lifetime of this scanner (which I believe is Unicorn's use case), this does not matter because there is no need to use prior results, and the user can follow the test named IVFPQFS_null_params. But if the user wants to use it like the other scanners, they can follow the other 2 tests in test_lowlevel_ivf.cpp that pass the heap size.

https://github.com/facebookresearch/faiss/blob/514b44fca8542bafe8640adcbf1cccce1900f74c/faiss/IndexIVFFastScan.cpp#L918-L992

How to use
-
If scanning multiple lists (mimicking nprobe > 1):
1. Specify the search params (see create_ivfpqfs_params).
2. Pass them when initializing the scanner.
3. Use as normal.

If scanning a single list over the lifetime of this scanner:
1. Pass nullptr for params when initializing the scanner.
2. Use as normal.

Differential Revision: D80114737

mnorris11 pushed a commit to mnorris11/faiss that referenced this pull request Aug 22, 2025

mnorris11 pushed a commit to mnorris11/faiss that referenced this pull request Aug 26, 2025

mnorris11 pushed a commit to mnorris11/faiss that referenced this pull request Aug 26, 2025

mnorris11 pushed a commit to mnorris11/faiss that referenced this pull request Aug 26, 2025
Summary:

Description
-
Adds an InvertedListScanner for IVFPQFastScan. Right now IVFPQFastScan does not support a scanner because the codes are PQFS packed. This is required by Unicorn to simplify the integration layer.

We can implement a scanner based on search_implem_10. We can't do distance_to_code, but we can do the whole scan (that is what this diff does in the new `IVFPQFastScanScanner.scan_codes`).

Summary of changes
-
- Update HeapHandler in simd_result_handlers.h to return number of heap updates `nup`. Also add a bunch of junk aka `io_simd_dis` and `io_simd_ids` handling. More on that below.
- Move `make_knn_handler` to the IndexIVFFastScan.h declaration so it can be used outside just IndexIVFFastScan.cpp. It is moved out of the anonymous namespace in the .cpp file too.
- Add the IVFPQFastScanScanner to IndexIVFPQFastScan. IVFPQFastScanScanner takes in SearchParams where the void* context is a struct containing the max heap size to use with this scanner (aka `k`). The scan_codes implementation is translated directly from search_implem_10 in IndexIVFPQFastScan, because both search_implem_10 and scan_codes process 1 query and 1 invlist at a time. (Technically, search_implem_10 processes `n` queries at a time, just not batched: it is a simple loop over `n`, so it can be reduced to 1 query for this scanner easily.) The only changes are the removal of extraneous code from search_implem_10.
- Changes to test_lowlevel_ivf.cpp to test this. Refactored the test because IVFPQFastScanScanner does not behave like other scanners, as IVFPQFastScanScanner requires `io_simd_dis` and `io_simd_ids` handling.

Why do we need this io_simd_dis and io_simd_ids?
-
Other scanners can scan over multiple lists by calling `set_query`, then `set_list`, then `scan_codes`. This works because they take the `float* distances`: the heap can operate on these floats and reuse the values from the last iteration.

IVFPQFastScan is different. Codes are pq4 packed. The HeapHandler has to be initialized with the values from scanning the prior list.

If the user only sets 1 list and 1 query for the lifetime of this scanner (which I believe is Unicorn's use case), this does not matter because there is no need to use prior results, and the user can follow the test named IVFPQFS_null_params. But if the user wants to use it like the other scanners, they can follow the other 2 tests in test_lowlevel_ivf.cpp that pass the heap size.

https://github.com/facebookresearch/faiss/blob/514b44fca8542bafe8640adcbf1cccce1900f74c/faiss/IndexIVFFastScan.cpp#L918-L992

How to use
-
If scanning multiple lists (mimicking nprobe > 1):
1. Specify the search params (see create_ivfpqfs_params).
2. Pass them when initializing the scanner.
3. Use as normal.

If scanning a single list over the lifetime of this scanner:
1. Pass nullptr for params when initializing the scanner.
2. Use as normal.

ntotal vs number of blocks?
-
- index.bbs is the number of codes per block. It is often 32: https://github.com/facebookresearch/faiss/wiki/Fast-accumulation-of-PQ-and-AQ-codes-(FastScan)#implementation
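
As a rough illustration of the difference: in `search_implem_10` the handler is told the real list size (`handler.ntotal = ls`), while the accumulation loop is given `roundup(ls, bbs)`, i.e. the list size padded to a whole number of blocks. The helpers below are hypothetical stand-ins written for this note, assuming `roundup` rounds up to the next multiple of its second argument.

```cpp
#include <cassert>
#include <cstddef>

// Round n up to the next multiple of block (assumed behavior of
// roundup(ls, bbs) in the listing).
std::size_t round_up_to_multiple(std::size_t n, std::size_t block) {
    assert(block > 0);
    return (n + block - 1) / block * block;
}

// Number of blocks needed to hold n codes with `block` codes per block.
std::size_t num_blocks(std::size_t n, std::size_t block) {
    return round_up_to_multiple(n, block) / block;
}
```

For example, a list of 100 codes with bbs = 32 is scanned as 4 blocks covering 128 slots, while ntotal stays at 100.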

Differential Revision: D80114737