
Conversation

@binarycleric (Contributor) commented Jun 17, 2025

The functions VectorSumCenter and HalfvecSumCenter were not being vectorized by the compiler. A few slight changes allow these optimizations to take place, yielding a performance boost from SIMD instructions.

This optimization helps improve performance of vector operations in IVF index building and updating.

Verified that the loops are being vectorized by building with `PG_CFLAGS += -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize` and inspecting the compiler remarks.
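The diff itself isn't quoted in this comment, but the remarks point at loops bounded by `vec->dim` whose trip count the compiler could not determine. A minimal sketch of the kind of change involved (the `Vec` struct and function names here are hypothetical stand-ins, not pgvector's actual definitions):

```c
#include <stddef.h>

/* Hypothetical stand-in for pgvector's Vector type. */
typedef struct
{
	int			dim;
	float	   *x;
} Vec;

/*
 * Before: the store through `agg` may alias the header that holds
 * `vec->dim`, so the compiler must reload the bound each iteration and
 * cannot determine the trip count, which blocks auto-vectorization.
 */
void
sum_center_before(Vec *vec, float *agg)
{
	for (int k = 0; k < vec->dim; k++)
		agg[k] += vec->x[k];
}

/*
 * After: hoisting the dimension into a local makes the loop bound
 * invariant, so the vectorizer knows the iteration count and can emit
 * SIMD code.
 */
void
sum_center_after(Vec *vec, float *agg)
{
	int			dim = vec->dim;

	for (int k = 0; k < dim; k++)
		agg[k] += vec->x[k];
}
```

Both versions compute the same result; only the second gives the optimizer a provable trip count.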

Before

src/ivfutils.c:262:2: remark: vectorized loop (vectorization width: 4, interleaved count: 4) [-Rpass=loop-vectorize]
  262 |         for (int k = 0; k < dimensions; k++)
      |         ^
src/ivfutils.c:299:2: remark: loop not vectorized: could not determine number of loop iterations [-Rpass-analysis=loop-vectorize]
  299 |         for (int k = 0; k < vec->dim; k++)
      |         ^
src/ivfutils.c:274:2: remark: vectorized loop (vectorization width: 8, interleaved count: 4) [-Rpass=loop-vectorize]
  274 |         for (int k = 0; k < dimensions; k++)
      |         ^
src/ivfutils.c:308:2: remark: loop not vectorized: could not determine number of loop iterations [-Rpass-analysis=loop-vectorize]
  308 |         for (int k = 0; k < vec->dim; k++)
      |         ^
src/ivfutils.c:287:2: remark: vectorized loop (vectorization width: 16, interleaved count: 4) [-Rpass=loop-vectorize]
  287 |         for (uint32 k = 0; k < VARBITBYTES(vec); k++)
      |         ^
src/ivfutils.c:291:3: remark: loop not vectorized: cannot identify array bounds [-Rpass-analysis=loop-vectorize]
  291 |                 nx[k / 8] |= (x[k] > 0.5 ? 1 : 0) << (7 - (k % 8));
      |                 ^
src/ivfutils.c:318:22: remark: loop not vectorized: cannot identify array bounds [-Rpass-analysis=loop-vectorize]
  318 |                 x[k] += (float) (((VARBITS(vec)[k / 8]) >> (7 - (k % 8))) & 0x01);
      |                                    ^
src/ivfutils.c:317:2: remark: loop not vectorized: could not determine number of loop iterations [-Rpass-analysis=loop-vectorize]
  317 |         for (int k = 0; k < VARBITLEN(vec); k++)
      |         ^

After

src/ivfutils.c:262:2: remark: vectorized loop (vectorization width: 4, interleaved count: 4) [-Rpass=loop-vectorize]
  262 |         for (int k = 0; k < dimensions; k++)
      |         ^
src/ivfutils.c:301:2: remark: vectorized loop (vectorization width: 4, interleaved count: 4) [-Rpass=loop-vectorize]
  301 |         for (int k = 0; k < dim; k++)
      |         ^
src/ivfutils.c:274:2: remark: vectorized loop (vectorization width: 8, interleaved count: 4) [-Rpass=loop-vectorize]
  274 |         for (int k = 0; k < dimensions; k++)
      |         ^
src/ivfutils.c:312:2: remark: vectorized loop (vectorization width: 8, interleaved count: 4) [-Rpass=loop-vectorize]
  312 |         for (int k = 0; k < dim; k++)
      |         ^
In file included from src/ivfutils.c:7:
src/halfutils.h:68:9: remark: floating point conversion changes vector width. Mixed floating point precision requires an up/down cast that will negatively impact performance. [-Rpass-analysis=loop-vectorize]
   68 |         return (float) num;
      |                ^
src/ivfutils.c:287:2: remark: vectorized loop (vectorization width: 16, interleaved count: 4) [-Rpass=loop-vectorize]
  287 |         for (uint32 k = 0; k < VARBITBYTES(vec); k++)
      |         ^
src/ivfutils.c:291:3: remark: loop not vectorized: cannot identify array bounds [-Rpass-analysis=loop-vectorize]
  291 |                 nx[k / 8] |= (x[k] > 0.5 ? 1 : 0) << (7 - (k % 8));
      |                 ^
src/ivfutils.c:322:22: remark: loop not vectorized: cannot identify array bounds [-Rpass-analysis=loop-vectorize]
  322 |                 x[k] += (float) (((VARBITS(vec)[k / 8]) >> (7 - (k % 8))) & 0x01);
      |                                    ^
src/ivfutils.c:321:2: remark: loop not vectorized: could not determine number of loop iterations [-Rpass-analysis=loop-vectorize]
  321 |         for (int k = 0; k < VARBITLEN(vec); k++)
      |         ^

@ankane (Member) commented Jun 17, 2025

Hi @binarycleric, thanks for the PR. Can you share data on how this impacts the k-means stage of the index build time?

@binarycleric (Contributor, author) commented Jun 17, 2025

@ankane Sure thing. I was just about to publish a test when I saw your comment. With this commit we're seeing a significant drop in k-means computation time.

Given the following SQL script:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE OR REPLACE FUNCTION generate_random_floats(size integer)
RETURNS float[] AS $$
BEGIN
    RETURN (
        SELECT array_agg(random() * 2 - 1)
        FROM generate_series(1, size)
    );
END;
$$ LANGUAGE plpgsql;

CREATE TABLE t (val vector(1536));

INSERT INTO t (val)
SELECT generate_random_floats(1536)
FROM generate_series(1, 10000);

CREATE INDEX ON t USING ivfflat (val vector_l2_ops) WITH (lists = 1);

CREATE TABLE t2 (val halfvec(1536));

INSERT INTO t2 (val)
SELECT generate_random_floats(1536)
FROM generate_series(1, 10000);

CREATE INDEX ON t2 USING ivfflat (val halfvec_l2_ops) WITH (lists = 1);

DROP FUNCTION generate_random_floats(integer);
DROP TABLE t;
DROP TABLE t2;

Without autovectorization in the sum functions:

psql:ivfflat_bench.sql:1: NOTICE:  extension "vector" already exists, skipping
CREATE EXTENSION
CREATE FUNCTION
CREATE TABLE
INSERT 0 10000
psql:ivfflat_bench.sql:23: INFO:  k-means: 9.446 ms
psql:ivfflat_bench.sql:23: INFO:  assign tuples: 17.452 ms
psql:ivfflat_bench.sql:23: INFO:  sort tuples: 0.010 ms
psql:ivfflat_bench.sql:23: INFO:  load tuples: 330.834 ms
CREATE INDEX
CREATE TABLE
INSERT 0 10000
psql:ivfflat_bench.sql:32: INFO:  k-means: 9.079 ms
psql:ivfflat_bench.sql:32: INFO:  assign tuples: 11.688 ms
psql:ivfflat_bench.sql:32: INFO:  sort tuples: 0.010 ms
psql:ivfflat_bench.sql:32: INFO:  load tuples: 177.941 ms
CREATE INDEX
DROP FUNCTION
DROP TABLE
DROP TABLE

With autovectorization in the sum functions:

psql:ivfflat_bench.sql:1: NOTICE:  extension "vector" already exists, skipping
CREATE EXTENSION
CREATE FUNCTION
CREATE TABLE
INSERT 0 10000
psql:ivfflat_bench.sql:23: INFO:  k-means: 2.859 ms
psql:ivfflat_bench.sql:23: INFO:  assign tuples: 18.163 ms
psql:ivfflat_bench.sql:23: INFO:  sort tuples: 0.010 ms
psql:ivfflat_bench.sql:23: INFO:  load tuples: 327.269 ms
CREATE INDEX
CREATE TABLE
INSERT 0 10000
psql:ivfflat_bench.sql:32: INFO:  k-means: 2.420 ms
psql:ivfflat_bench.sql:32: INFO:  assign tuples: 12.136 ms
psql:ivfflat_bench.sql:32: INFO:  sort tuples: 0.012 ms
psql:ivfflat_bench.sql:32: INFO:  load tuples: 150.017 ms
CREATE INDEX
DROP FUNCTION
DROP TABLE
DROP TABLE

@binarycleric (Contributor, author) commented

I see that my change introduced the following compiler warning.

In file included from src/ivfutils.c:7:
src/halfutils.h:68:9: remark: floating point conversion changes vector width. Mixed floating point precision requires an up/down cast that will negatively impact performance. [-Rpass-analysis=loop-vectorize]
   68 |         return (float) num;
      |                ^

I'm looking into whether there is anything I can do to address this. It looks like any change to use native ARM functions breaks vectorization. Despite the warning, this commit does seem to improve performance (at least on ARM, which is what I have available to test).
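The remark at src/halfutils.h:68 is about mixing element widths in a vectorized loop: each narrow lane has to be widened before the arithmetic, so fewer elements fit per vector instruction. Since `_Float16` support is compiler-specific, the same effect can be illustrated with `float` and `double`; this is an analogy, not pgvector code:

```c
/*
 * Mixed precision: each float lane must be widened to double before the
 * add, roughly halving the number of elements processed per vector
 * instruction (the "up/down cast" the remark warns about).
 */
double
sum_mixed_width(const float *x, int n)
{
	double		s = 0.0;

	for (int i = 0; i < n; i++)
		s += (double) x[i];
	return s;
}

/*
 * Same width throughout: all lanes stay float, so the full vector width
 * is available to the auto-vectorizer.
 */
float
sum_same_width(const float *x, int n)
{
	float		s = 0.0f;

	for (int i = 0; i < n; i++)
		s += x[i];
	return s;
}
```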

@ankane (Member) left a comment

Some results from my testing for the k-means stage (with 1,000 lists):

| Platform     | Dataset | Type    | Before (sec) | After (sec) |
|--------------|---------|---------|--------------|-------------|
| Linux x86-64 | SIFT    | vector  | 7.3          | 7.2         |
| Mac arm64    | SIFT    | vector  | 3.52         | 3.48        |
| Mac arm64    | SIFT    | halfvec | 4.33         | 4.25        |
| Linux x86-64 | GIST    | vector  | 36.0         | 34.8        |
| Mac arm64    | GIST    | vector  | 14.6         | 14.1        |
| Mac arm64    | GIST    | halfvec | 15.2         | 14.5        |

It's not a big difference (especially when you factor in the rest of the index build time), but seems fine to include. Added a few comments inline.

@binarycleric (Contributor, author) commented

Updated to remove const.

@ankane ankane merged commit fe697e8 into pgvector:master Jun 18, 2025
@ankane (Member) commented Jun 18, 2025

Great, thanks!

@binarycleric binarycleric deleted the vectorize-ivfutils branch June 19, 2025 01:52
klmckeig pushed a commit to klmckeig/pgvector that referenced this pull request Dec 8, 2025
* vectorize: optimize VectorSumCenter and HalfvecSumCenter


* Removing const, commenting that it is only vectorized on ARM
