
Conversation

@binarycleric (Contributor) commented Jun 17, 2025

The functions VectorSumCenter and HalfvecSumCenter were not being vectorized by the compiler. A few slight changes allow these optimizations to take place, yielding a performance boost from SIMD instructions.

This optimization helps improve performance of vector operations in IVF index building and updating.

Verified that the loops are being vectorized by building with `PG_CFLAGS += -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize` and inspecting the compiler remarks.
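The diff itself isn't quoted in this comment, but the remarks point at loops bounded by `vec->dim` whose trip count the compiler could not determine. A minimal sketch of the kind of change involved (the `Vec` struct and function names here are hypothetical stand-ins, not pgvector's actual definitions):

```c
#include <stddef.h>

/* Hypothetical stand-in for pgvector's Vector type. */
typedef struct
{
	int			dim;
	float	   *x;
} Vec;

/*
 * Before: the store through `agg` may alias the header that holds
 * `vec->dim`, so the compiler must reload the bound each iteration and
 * cannot determine the trip count, which blocks auto-vectorization.
 */
void
sum_center_before(Vec *vec, float *agg)
{
	for (int k = 0; k < vec->dim; k++)
		agg[k] += vec->x[k];
}

/*
 * After: hoisting the dimension into a local makes the loop bound
 * invariant, so the vectorizer knows the iteration count and can emit
 * SIMD code.
 */
void
sum_center_after(Vec *vec, float *agg)
{
	int			dim = vec->dim;

	for (int k = 0; k < dim; k++)
		agg[k] += vec->x[k];
}
```

Both versions compute the same result; only the second gives the optimizer a provable trip count.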

Before

src/ivfutils.c:262:2: remark: vectorized loop (vectorization width: 4, interleaved count: 4) [-Rpass=loop-vectorize]
  262 |         for (int k = 0; k < dimensions; k++)
      |         ^
src/ivfutils.c:299:2: remark: loop not vectorized: could not determine number of loop iterations [-Rpass-analysis=loop-vectorize]
  299 |         for (int k = 0; k < vec->dim; k++)
      |         ^
src/ivfutils.c:274:2: remark: vectorized loop (vectorization width: 8, interleaved count: 4) [-Rpass=loop-vectorize]
  274 |         for (int k = 0; k < dimensions; k++)
      |         ^
src/ivfutils.c:308:2: remark: loop not vectorized: could not determine number of loop iterations [-Rpass-analysis=loop-vectorize]
  308 |         for (int k = 0; k < vec->dim; k++)
      |         ^
src/ivfutils.c:287:2: remark: vectorized loop (vectorization width: 16, interleaved count: 4) [-Rpass=loop-vectorize]
  287 |         for (uint32 k = 0; k < VARBITBYTES(vec); k++)
      |         ^
src/ivfutils.c:291:3: remark: loop not vectorized: cannot identify array bounds [-Rpass-analysis=loop-vectorize]
  291 |                 nx[k / 8] |= (x[k] > 0.5 ? 1 : 0) << (7 - (k % 8));
      |                 ^
src/ivfutils.c:318:22: remark: loop not vectorized: cannot identify array bounds [-Rpass-analysis=loop-vectorize]
  318 |                 x[k] += (float) (((VARBITS(vec)[k / 8]) >> (7 - (k % 8))) & 0x01);
      |                                    ^
src/ivfutils.c:317:2: remark: loop not vectorized: could not determine number of loop iterations [-Rpass-analysis=loop-vectorize]
  317 |         for (int k = 0; k < VARBITLEN(vec); k++)
      |         ^

After

src/ivfutils.c:262:2: remark: vectorized loop (vectorization width: 4, interleaved count: 4) [-Rpass=loop-vectorize]
  262 |         for (int k = 0; k < dimensions; k++)
      |         ^
src/ivfutils.c:301:2: remark: vectorized loop (vectorization width: 4, interleaved count: 4) [-Rpass=loop-vectorize]
  301 |         for (int k = 0; k < dim; k++)
      |         ^
src/ivfutils.c:274:2: remark: vectorized loop (vectorization width: 8, interleaved count: 4) [-Rpass=loop-vectorize]
  274 |         for (int k = 0; k < dimensions; k++)
      |         ^
src/ivfutils.c:312:2: remark: vectorized loop (vectorization width: 8, interleaved count: 4) [-Rpass=loop-vectorize]
  312 |         for (int k = 0; k < dim; k++)
      |         ^
In file included from src/ivfutils.c:7:
src/halfutils.h:68:9: remark: floating point conversion changes vector width. Mixed floating point precision requires an up/down cast that will negatively impact performance. [-Rpass-analysis=loop-vectorize]
   68 |         return (float) num;
      |                ^
src/ivfutils.c:287:2: remark: vectorized loop (vectorization width: 16, interleaved count: 4) [-Rpass=loop-vectorize]
  287 |         for (uint32 k = 0; k < VARBITBYTES(vec); k++)
      |         ^
src/ivfutils.c:291:3: remark: loop not vectorized: cannot identify array bounds [-Rpass-analysis=loop-vectorize]
  291 |                 nx[k / 8] |= (x[k] > 0.5 ? 1 : 0) << (7 - (k % 8));
      |                 ^
src/ivfutils.c:322:22: remark: loop not vectorized: cannot identify array bounds [-Rpass-analysis=loop-vectorize]
  322 |                 x[k] += (float) (((VARBITS(vec)[k / 8]) >> (7 - (k % 8))) & 0x01);
      |                                    ^
src/ivfutils.c:321:2: remark: loop not vectorized: could not determine number of loop iterations [-Rpass-analysis=loop-vectorize]
  321 |         for (int k = 0; k < VARBITLEN(vec); k++)
      |         ^

@ankane (Member) commented Jun 17, 2025

Hi @binarycleric, thanks for the PR. Can you share data on how this impacts the k-means stage of the index build time?

@binarycleric (Contributor, author) commented Jun 17, 2025

@ankane Sure thing. I was just about to publish a test when I saw your comment. With this commit we're seeing a significant drop in k-means computation time.

Given the following SQL script:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE OR REPLACE FUNCTION generate_random_floats(size integer)
RETURNS float[] AS $$
BEGIN
    RETURN (
        SELECT array_agg(random() * 2 - 1)
        FROM generate_series(1, size)
    );
END;
$$ LANGUAGE plpgsql;

CREATE TABLE t (val vector(1536));

INSERT INTO t (val)
SELECT generate_random_floats(1536)
FROM generate_series(1, 10000);

CREATE INDEX ON t USING ivfflat (val vector_l2_ops) WITH (lists = 1);

CREATE TABLE t2 (val halfvec(1536));

INSERT INTO t2 (val)
SELECT generate_random_floats(1536)
FROM generate_series(1, 10000);

CREATE INDEX ON t2 USING ivfflat (val halfvec_l2_ops) WITH (lists = 1);

DROP FUNCTION generate_random_floats(integer);
DROP TABLE t;
DROP TABLE t2;

Without autovectorization in the sum functions:

psql:ivfflat_bench.sql:1: NOTICE:  extension "vector" already exists, skipping
CREATE EXTENSION
CREATE FUNCTION
CREATE TABLE
INSERT 0 10000
psql:ivfflat_bench.sql:23: INFO:  k-means: 9.446 ms
psql:ivfflat_bench.sql:23: INFO:  assign tuples: 17.452 ms
psql:ivfflat_bench.sql:23: INFO:  sort tuples: 0.010 ms
psql:ivfflat_bench.sql:23: INFO:  load tuples: 330.834 ms
CREATE INDEX
CREATE TABLE
INSERT 0 10000
psql:ivfflat_bench.sql:32: INFO:  k-means: 9.079 ms
psql:ivfflat_bench.sql:32: INFO:  assign tuples: 11.688 ms
psql:ivfflat_bench.sql:32: INFO:  sort tuples: 0.010 ms
psql:ivfflat_bench.sql:32: INFO:  load tuples: 177.941 ms
CREATE INDEX
DROP FUNCTION
DROP TABLE
DROP TABLE

With autovectorization in the sum functions:

psql:ivfflat_bench.sql:1: NOTICE:  extension "vector" already exists, skipping
CREATE EXTENSION
CREATE FUNCTION
CREATE TABLE
INSERT 0 10000
psql:ivfflat_bench.sql:23: INFO:  k-means: 2.859 ms
psql:ivfflat_bench.sql:23: INFO:  assign tuples: 18.163 ms
psql:ivfflat_bench.sql:23: INFO:  sort tuples: 0.010 ms
psql:ivfflat_bench.sql:23: INFO:  load tuples: 327.269 ms
CREATE INDEX
CREATE TABLE
INSERT 0 10000
psql:ivfflat_bench.sql:32: INFO:  k-means: 2.420 ms
psql:ivfflat_bench.sql:32: INFO:  assign tuples: 12.136 ms
psql:ivfflat_bench.sql:32: INFO:  sort tuples: 0.012 ms
psql:ivfflat_bench.sql:32: INFO:  load tuples: 150.017 ms
CREATE INDEX
DROP FUNCTION
DROP TABLE
DROP TABLE

@binarycleric (Contributor, author) commented

I see that my change introduced the following compiler warning.

In file included from src/ivfutils.c:7:
src/halfutils.h:68:9: remark: floating point conversion changes vector width. Mixed floating point precision requires an up/down cast that will negatively impact performance. [-Rpass-analysis=loop-vectorize]
   68 |         return (float) num;
      |                ^

I'm looking into whether there is anything I can do to address this. It looks like any change to use native ARM functions breaks vectorization. Despite the warning, this commit does seem to improve performance (at least on ARM, which is what I have available to test).
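The remark at src/halfutils.h:68 is about mixing element widths in a vectorized loop: each narrow lane has to be widened before the arithmetic, so fewer elements fit per vector instruction. Since `_Float16` support is compiler-specific, the same effect can be illustrated with `float` and `double`; this is an analogy, not pgvector code:

```c
/*
 * Mixed precision: each float lane must be widened to double before the
 * add, roughly halving the number of elements processed per vector
 * instruction (the "up/down cast" the remark warns about).
 */
double
sum_mixed_width(const float *x, int n)
{
	double		s = 0.0;

	for (int i = 0; i < n; i++)
		s += (double) x[i];
	return s;
}

/*
 * Same width throughout: all lanes stay float, so the full vector width
 * is available to the auto-vectorizer.
 */
float
sum_same_width(const float *x, int n)
{
	float		s = 0.0f;

	for (int i = 0; i < n; i++)
		s += x[i];
	return s;
}
```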

@ankane (Member) left a comment

Some results from my testing for the k-means stage (with 1,000 lists):

| Platform     | Dataset | Type    | Before (sec) | After (sec) |
|--------------|---------|---------|--------------|-------------|
| Linux x86-64 | SIFT    | vector  | 7.3          | 7.2         |
| Mac arm64    | SIFT    | vector  | 3.52         | 3.48        |
| Mac arm64    | SIFT    | halfvec | 4.33         | 4.25        |
| Linux x86-64 | GIST    | vector  | 36.0         | 34.8        |
| Mac arm64    | GIST    | vector  | 14.6         | 14.1        |
| Mac arm64    | GIST    | halfvec | 15.2         | 14.5        |

It's not a big difference (especially when you factor in the rest of the index build time), but seems fine to include. Added a few comments inline.

@binarycleric (Contributor, author) commented

Updated to remove const.

@ankane ankane merged commit fe697e8 into pgvector:master Jun 18, 2025
@ankane (Member) commented Jun 18, 2025

Great, thanks!

@binarycleric binarycleric deleted the vectorize-ivfutils branch June 19, 2025 01:52
klmckeig pushed a commit to klmckeig/pgvector that referenced this pull request Dec 8, 2025
* vectorize: optimize VectorSumCenter and HalfvecSumCenter


* Removing const, commenting that it is only vectorized on ARM
