Vectorizing vector_concat for improved performance #861
On an ARM chip this should generate SIMD instructions to copy the two incoming vectors to the new vector, instead of copying the elements one at a time in a scalar loop.
@binarycleric Do you have any before/after benchmarks on this? There are some examples of benchmarks to run and what to test here.
Thanks @jkatz. Working on that now.
The following was generated using https://github.com/binarycleric/pgvectorbench/blob/main/benchmarks/checks/vector_concat.sql and tested against Postgres 16 on macOS with an M4 Pro chip. I'm hitting EoD right now, but tomorrow I'm going to run these tests on r7i/r7g instances and post the results.
With Change
Source from the
Without Change
Latest source from the
Here are some more benchmarks, this time on EC2.
x86_64
Instance Type: r7i.2xlarge
main branch
vectorize-vector-concat branch
ARM
Instance Type: r7g.2xlarge
main branch
vectorize-vector-concat branch
@binarycleric Can you please use some of the tests mentioned in that blog post (e.g. ANN Benchmark, VectorDBBench)? It's important to capture the before/after recall measurement, to see if any of the changes impact overall result quality.
@jkatz Sure thing. Sorry about that.
This function isn't used for nearest neighbor search, so additional benchmarks shouldn't be needed. Will take a look at this after #860.
Looks good. Let's remove
Sounds good. I'll update that in the morning.
Thanks
* Vectorizing vector_concat for improved performance
  On an ARM chip this should generate SIMD instructions to copy the two incoming vectors to the new vector, instead of copying the elements one at a time in a scalar loop.
* Moving declarations to above CheckDim
* Removing const from dims
* Formatting
Vectorizing vector_concat for some improved performance. On an ARM chip this should generate SIMD instructions to copy the two incoming vectors to the new vector, instead of copying the elements one at a time in a scalar loop.
I'm a little rusty at assembly, so forgive me if I make any mistakes. I used Cursor to help document what each line was doing as a sanity check.
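To make the idea concrete, here is a minimal C sketch of a concat routine written so that an optimizing compiler has a chance to vectorize the two copies. It is not the diff from this PR: `SketchVector`, `sketch_vector_new`, and `sketch_vector_concat` are hypothetical names, the struct is a simplified stand-in for pgvector's vector type, and `malloc` replaces Postgres memory allocation. Hoisting the dimensions and data pointers into locals is an assumption about what helps the compiler; whether SIMD or paired load/store instructions are actually emitted depends on the compiler, its flags, and the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for pgvector's vector type. */
typedef struct
{
	int			dim;	/* number of elements */
	float		x[];	/* element data (flexible array member) */
} SketchVector;

/* Allocate an uninitialized vector with room for dim elements. */
SketchVector *
sketch_vector_new(int dim)
{
	SketchVector *v;

	assert(dim >= 0);
	v = malloc(sizeof(SketchVector) + (size_t) dim * sizeof(float));
	if (v != NULL)
		v->dim = dim;
	return v;
}

/* Return a new vector holding the elements of a followed by those of b. */
SketchVector *
sketch_vector_concat(const SketchVector *a, const SketchVector *b)
{
	/* Assumption: locals make the loop bounds and pointers loop-invariant. */
	int			dim_a = a->dim;
	int			dim_b = b->dim;
	const float *ax = a->x;
	const float *bx = b->x;
	SketchVector *result = sketch_vector_new(dim_a + dim_b);
	float	   *rx;

	if (result == NULL)
		return NULL;
	rx = result->x;

	/* First copy: a->x to result->x */
	for (int i = 0; i < dim_a; i++)
		rx[i] = ax[i];

	/* Second copy: b->x to result->x + a->dim */
	for (int i = 0; i < dim_b; i++)
		rx[dim_a + i] = bx[i];

	return result;
}
```

Compiling a file like this with optimizations enabled and inspecting the generated assembly is one way to check whether the copies were vectorized on a given target.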
With the change
The interesting bits are the uses of ARM's ldp and stp instructions, which load/store pairs of CPU registers as opposed to doing them one at a time.
For the first vector copy (a->x to result->x):
For the second vector copy (b->x to result->x + a->dim):
Without the change
For the first vector copy (a->x to result->x):
For the second vector copy (b->x to result->x + a->dim):