upgrade to 0.5.2; Increased max dimensions for index from 2000 to 4096 #402
Conversation
This PR will fail without additional changes. Currently, you can only index vectors up to 2K dimensions because the PostgreSQL page size is 8KB by default. However, should the work from https://github.com/pgvector/pgvector/tree/tinyint and/or https://github.com/pgvector/pgvector/tree/half be merged in, those smaller element types would make vectors of higher dimensions indexable.
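For reference, the limit applies to indexes, not to storage. A minimal sketch of where it bites (assuming a local database named `test` with the extension installed and the `psycopg` driver; the table is made up for illustration):

```python
# Demonstrates the 2000-dimension index limit: storing a 2048-dim vector
# works, but building an HNSW index on the column fails.
import psycopg

with psycopg.connect("dbname=test") as conn:  # hypothetical local database
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.execute("CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(2048))")
    # Storing vectors above 2000 dimensions is fine.
    conn.execute(
        "INSERT INTO items (embedding) VALUES (%s::vector)",
        ("[" + ",".join(["0"] * 2048) + "]",),
    )
    # This raises an error: HNSW (and IVFFlat) indexes are capped at 2000 dimensions.
    conn.execute("CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)")
```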
Increasing the block size won't help either (#120), so the only option right now is dimensionality reduction.
My mistake on that -- it shows how often I've tried to recompile to increase block size :)
Hi again, other than PCA, is there a method to create indexes on vectors with more than 2000 dimensions? I have 2048-dimensional ones, right at the limit :(. If PCA is the only solution, what are the likely outcomes? Does it work efficiently?
Nice try! 😄
@ankane @jkatz for my understanding, does this mean it's technically impossible to go above 2000 dimensions in pgvector, or that it would require significant development? Most requests to raise the dimension limit for indexing come from wanting to avoid dimensionality reduction, but there may be use cases where that many dimensions are necessary to ensure a minimum search relevance. A greater limit, as many competitors offer, would therefore be more than welcome. For instance, do you know how pgvecto.rs manages to raise the vector dimension limit for indexing to 65,535? See their comparison with pgvector.
@TutubanaS with pgvector it's currently not possible to go above 2000 dimensions while also having an index. You can go up to 16,000 dimensions without an index, but then the queries-per-second (QPS) performance will very likely be too poor for a production use case. Dimensionality reduction may lose information (and most probably will, if you're using a pretrained model) and degrade the relevance of your search results. It's hard to evaluate how much without more information, so I'd recommend performing the dimensionality reduction and evaluating the search relevance with and without it. In your case the quality loss will most probably be very small, since going from 2048 to 2000 dimensions is a small reduction.
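That evaluation is straightforward to sketch. A minimal example (with random stand-in data; substitute your own 2048-dimensional embeddings) that measures how many exact top-10 neighbors survive a PCA reduction from 2048 to 2000 dimensions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 2048), dtype=np.float32)  # stand-in for real embeddings

# Fit PCA on the corpus and project everything down to 2000 dimensions.
pca = PCA(n_components=2000).fit(X)
X_red = pca.transform(X)

def top10(data, queries):
    """Exact top-10 nearest neighbors (Euclidean) for each query."""
    nn = NearestNeighbors(n_neighbors=10).fit(data)
    return nn.kneighbors(queries, return_distance=False)

before = top10(X, X[:100])
after = top10(X_red, X_red[:100])

# Fraction of the original top-10 neighbors that survive the reduction.
overlap = np.mean([len(set(a) & set(b)) / 10 for a, b in zip(before, after)])
print(f"mean top-10 overlap after PCA: {overlap:.1%}")
```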
When I last checked, this is because pgvecto.rs does not use the PostgreSQL storage system. The data is stored in an independent format, but the tradeoff is that it does not use PostgreSQL's durability mechanisms.
Increased max dimensions for index from 2000 to 4096
Ollama generates embedding vectors with 4096 dimensions, so I increased the constant.
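For context on where 4096 comes from, a minimal sketch (assuming a local Ollama server on its default port with a Llama-2-class model pulled; the model name is an example):

```python
import requests

# Ollama's embeddings endpoint returns a vector of the model's hidden size;
# Llama-2-class models emit 4096-dimensional embeddings.
resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "llama2", "prompt": "hello world"},
)
print(len(resp.json()["embedding"]))  # expect 4096
```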