@jkatz
Contributor

@jkatz jkatz commented Jan 5, 2026

This commit adds the ability to cast JSONB arrays directly into vector and halfvec vector types, for example:

SELECT '[1,2,3]'::jsonb::vector(3);
SELECT '[1,2,3]'::jsonb::halfvec(3);

There are several TODOs if the overall idea of the patch is acceptable:

  • Determine if the method should be applied to sparsevec (and possibly bit vectors, but maybe a separate discussion given bit is a core type).
  • Update README and other docs with examples.
  • For discussion: to go from vector to jsonb, one must first cast to an array (to_jsonb(vector::real[])); otherwise the value is stored as a string. It may be helpful to have a vector_to_jsonb function, similar to how PostgreSQL has array_to_json.

Motivation

There are several vector data sources that come from JSON documents, whether the embeddings are stored in them directly or are transformed from other sources, such as data lake files. Additionally, some users prefer not to duplicate the vector data between the JSON document and a separate vector column, and are fine with using the vector as an expression in an HNSW/IVFFlat index. The previous discussion concluded that the jsonb::text::vector cast would work, but for bulk imports or transformations it adds nontrivial overhead.

Testing

The tests show how the jsonb_to_vector function performs compared to the jsonb::text::vector cast. They were executed as k=10 exact nearest-neighbor queries on a dataset of 100,000 vectors that fits entirely in memory. Each test ran until 50 or 500 transactions had completed, and the average time across those transactions was taken. All times are in milliseconds.

I went through a few different variations of the tests including:

  1. Baselining against a regular, non-casted K-NN query
  2. Casting from JSONB to an array to a vector type. This was 10x slower overall than the other variations, so its results are omitted.
  3. Casting from JSONB to text to a vector type
  4. Casting directly from JSONB to a vector type

The queries used for the last two tests are shown below, followed by their results.

jsonb::text::vector

SELECT id,
    (embedding_jsonb::text::vector(1536)) <=>
        (SELECT embedding_vector FROM data WHERE id = ?) as distance
FROM data
ORDER BY distance
LIMIT 10;

jsonb_to_vector (similar for jsonb_to_halfvec)

SELECT id,
    (embedding_jsonb::vector(1536)) <=>
        (SELECT embedding_vector FROM data WHERE id = ?) as distance
FROM data
ORDER BY distance
LIMIT 10;

Results

The direct jsonb-to-vector cast reduced average query time by 15.8% to 20.2% relative to the jsonb::text::vector method, with most tests showing close to a 20% speedup.

vector

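Dimension       | jsonb::text::vector (ms) | jsonb_to_vector (ms) | % Speedup
----------------|--------------------------|----------------------|----------
128             | 282.594                  | 237.999              | 15.8%
768             | 2343.711                 | 1881.062             | 19.7%
1536 (external) | 4566.775                 | 3723.697             | 18.5%
1536 (plain)    | 3139.608                 | 2532.739             | 19.3%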

halfvec

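Dimension       | jsonb::text::halfvec (ms) | jsonb_to_halfvec (ms) | % Speedup
----------------|---------------------------|-----------------------|----------
128             | 290.947                   | 237.163               | 18.5%
768             | 1623.813                  | 1295.505              | 20.2%
1536 (external) | 4591.033                  | 3695.503              | 19.5%
1536 (plain)    | 3139.492                  | 2509.346              | 20.1%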
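
For readers checking the numbers: the "% Speedup" column appears to be the relative reduction in average query time, (text cast time - direct cast time) / text cast time. That formula is inferred from the figures and is not stated explicitly in the description. The Python sketch below recomputes the column from the timings in the tables above; the names percent_speedup and speedup_table are hypothetical helpers used only for illustration.

```python
# Average query times in milliseconds, copied from the results tables:
# (jsonb::text::<type> cast, direct jsonb::<type> cast)
timings_ms = {
    "vector": {
        "128": (282.594, 237.999),
        "768": (2343.711, 1881.062),
        "1536 (external)": (4566.775, 3723.697),
        "1536 (plain)": (3139.608, 2532.739),
    },
    "halfvec": {
        "128": (290.947, 237.163),
        "768": (1623.813, 1295.505),
        "1536 (external)": (4591.033, 3695.503),
        "1536 (plain)": (3139.492, 2509.346),
    },
}


def percent_speedup(text_cast_ms, direct_cast_ms):
    """Relative reduction in average query time, as a percentage.

    Assumption: this is the formula behind the "% Speedup" column.
    """
    return (text_cast_ms - direct_cast_ms) / text_cast_ms * 100.0


def speedup_table(timings):
    """Recompute the speedup for every row, rounded to one decimal place."""
    return {
        vec_type: {
            dim: round(percent_speedup(text_ms, direct_ms), 1)
            for dim, (text_ms, direct_ms) in rows.items()
        }
        for vec_type, rows in timings.items()
    }


if __name__ == "__main__":
    for vec_type, rows in speedup_table(timings_ms).items():
        for dim, pct in rows.items():
            print(f"{vec_type:8} {dim:16} {pct:5.1f}%")
```

Under this definition, a value near 20% means the direct cast takes about four fifths of the time of the text round trip. This differs from a throughput ratio, which would give slightly larger numbers for the same timings.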

@ankane
Member

ankane commented Jan 12, 2026

Thanks @jkatz. I'm hesitant about including a direct cast for this, since I don't think it's super common and the existing cast is fairly performant.
