Define vector store interface #969

Merged
edwinyyyu merged 14 commits into MemMachine:main from edwinyyyu:vector_store_interface
Feb 4, 2026

Conversation

@edwinyyyu
Contributor

@edwinyyyu edwinyyyu commented Jan 22, 2026

Purpose of the change

Whereas:

  • Graph features are not heavily used now.
  • Current memory components primarily depend on vector database features.
  • It is more difficult to work with a completely schemaless data store.
  • There may be asynchronous setup logic for data storage.
  • Querying logic for different search methods can be better unified.
  • The existing VectorGraphStore interface is harder to implement with a pure vector database.

Thus:

  • Add a simpler interface to implement for different database providers.

Description

Add a new ABC named VectorStore to wrap vector databases.

This will be part of a larger refactor.

  • If keeping support for VectorGraphStore, we can do any of the following:
    • VectorGraphStore-based VectorStore (VectorGraphStore will be redefined but externally it will be mostly the same)
    • new memory implementation
      • based on what exists in the config, route to either new or old implementation

Type of change

[Please delete options that are not relevant.]

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (does not change functionality, e.g., code style improvements, linting)
  • Documentation update
  • Project Maintenance (updates to build scripts, CI, etc., that do not affect the main project)
  • Security (improves security without changing functionality)

How Has This Been Tested?

No tests for the interface.

  • Unit Test
  • Integration Test
  • End-to-end Test
  • Test Script (please provide)
  • Manual verification (list step-by-step instructions)

Checklist

  • I have signed the commit(s) within this pull request
  • My code follows the style guidelines of this project (See STYLE_GUIDE.md)
  • I have performed a self-review of my own code
  • I have commented my code
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added unit tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Maintainer Checklist

  • Confirmed all checks passed
  • Contributor has signed the commit(s)
  • Reviewed the code
  • Run, Tested, and Verified the change(s) work as expected

@edwinyyyu
Contributor Author

edwinyyyu commented Jan 22, 2026

@o-love Any need for pure filtered search without similarity score? I think just SQL would be better for that. I've removed the filter-only search.

@edwinyyyu edwinyyyu force-pushed the vector_store_interface branch from 33cf8ec to 86c39a1 Compare January 22, 2026 22:30
self,
*,
collection: str,
query_vector: list[float],
Contributor

I was wondering if it's possible to accept an array as well as a list by using the Sequence[float] type.

I feel like we are generally overusing/enforcing Python's lists instead of using arrays (either NumPy or stdlib arrays).

Contributor Author

@edwinyyyu edwinyyyu Jan 23, 2026

NumPy arrays don't work with Sequence[float].

I was thinking of using list[float] because it's easier to guarantee that behaviors work.

If we allow NumPy/PyTorch, it's more work to check the type at runtime, since the type hint would say nothing about the dimensions of the NDArray/Tensor.

Contributor Author

I've changed it to Sequence to support tuples and array.array.
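
As a rough illustration of the trade-off discussed in this thread, the sketch below shows a function annotated with Sequence[float] accepting a list, a tuple, and an array.array, and normalizing each to list[float] at the boundary. The function name `validate_vector` and its `dimensions` parameter are hypothetical and are not taken from the PR.

```python
from array import array
from collections.abc import Sequence


def validate_vector(vector: Sequence[float], *, dimensions: int) -> list[float]:
    """Hypothetical helper: check the length and normalize to list[float]."""
    # len() works on any Sequence, so a one-dimensional length check is cheap.
    if len(vector) != dimensions:
        raise ValueError(f"expected {dimensions} dimensions, got {len(vector)}")
    # Copy into a plain list so downstream code sees one concrete type.
    return [float(x) for x in vector]


# The same call works for list, tuple, and array.array inputs.
as_list = validate_vector([0.5, 0.25], dimensions=2)
as_tuple = validate_vector((0.5, 0.25), dimensions=2)
as_array = validate_vector(array("d", [0.5, 0.25]), dimensions=2)
```

Normalizing once at the interface boundary keeps provider implementations simple, at the cost of one copy per vector.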

Comment thread src/memmachine/common/vector_store/vector_store.py Outdated
@edwinyyyu edwinyyyu marked this pull request as draft January 23, 2026 22:13
@edwinyyyu
Contributor Author

Before merging, I want to review the index creation parameters.

@edwinyyyu edwinyyyu marked this pull request as ready for review January 23, 2026 22:53
@edwinyyyu edwinyyyu force-pushed the vector_store_interface branch 3 times, most recently from a05d1f0 to 3cc5c7e Compare January 27, 2026 00:41
@jealous
Contributor

jealous commented Jan 27, 2026

@edwinyyyu do you want to merge it?

@edwinyyyu edwinyyyu marked this pull request as draft January 27, 2026 01:27
@edwinyyyu
Contributor Author

I reviewed several vector databases and it appears that there's a mix of ones that can take a collection name directly for performing data operations and ones that return a handle to a collection.

I will change this to use handles.
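
A minimal sketch of what a handle-based design could look like, assuming a store that hands out a Collection object on which data operations are then called. All names here (`get_collection`, `upsert`, `query`, the in-memory classes) are illustrative assumptions and are not the interface merged in this PR. The in-memory store creates a collection on first access purely to keep the example short.

```python
import asyncio
import math
from abc import ABC, abstractmethod
from collections.abc import Sequence


class Collection(ABC):
    """Hypothetical handle to one logical collection."""

    @abstractmethod
    async def upsert(
        self, *, ids: Sequence[str], vectors: Sequence[Sequence[float]]
    ) -> None: ...

    @abstractmethod
    async def query(
        self, *, query_vector: Sequence[float], limit: int = 10
    ) -> list[tuple[str, float]]: ...


class VectorStore(ABC):
    """Hypothetical store that returns handles instead of taking names per call."""

    @abstractmethod
    async def get_collection(self, *, name: str) -> Collection: ...


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryCollection(Collection):
    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    async def upsert(self, *, ids, vectors) -> None:
        # Upsert semantics: an existing ID is overwritten, a new ID is added.
        for record_id, vector in zip(ids, vectors):
            self._vectors[record_id] = [float(x) for x in vector]

    async def query(self, *, query_vector, limit=10):
        scored = [
            (record_id, _cosine(query_vector, vector))
            for record_id, vector in self._vectors.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:limit]


class InMemoryVectorStore(VectorStore):
    def __init__(self) -> None:
        self._collections: dict[str, InMemoryCollection] = {}

    async def get_collection(self, *, name: str) -> Collection:
        # Simplification for this sketch: create on first access.
        return self._collections.setdefault(name, InMemoryCollection())


async def demo() -> list[tuple[str, float]]:
    store = InMemoryVectorStore()
    episodes = await store.get_collection(name="episodes")
    await episodes.upsert(ids=["a", "b"], vectors=[[1.0, 0.0], (0.0, 1.0)])
    return await episodes.query(query_vector=[1.0, 0.1], limit=1)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A provider that only addresses collections by name could implement the handle as a thin object that stores the name, while a provider with native collection objects could wrap its own handle.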

@edwinyyyu
Contributor Author

I'm making it very bare-bones because there's a lot of fragmentation between different vector database providers.

@edwinyyyu edwinyyyu changed the title from "Create vector store interface" to "Define vector store interface" Jan 28, 2026
@edwinyyyu edwinyyyu requested review from jealous and o-love January 28, 2026 18:17
@edwinyyyu
Contributor Author

edwinyyyu commented Jan 28, 2026

Notes:

  • A Collection doesn't necessarily map to a native collection for each vector database provider. See the recommended multitenancy approach for each provider.
  • The Collection object (a handle) should be able to recreate itself if a provider-native collection of the same name gets deleted and recreated. This allows consistent behavior across providers (the Chroma Python Collection object seems to persist, which could be a bug; Qdrant only uses collection names).
  • Some providers don't support an add operation, only upsert/put (Amazon S3 Vectors, Qdrant).
  • To get/delete by filter, an external database (likely SQL) will be maintained.
  • The consumer is responsible for ensuring that a collection is only created once.
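
One way the get/delete-by-filter note might work in practice is sketched below: a SQL table maps record IDs to properties, a filter is resolved to IDs there, and the vector store is then asked to delete by ID only. The table name `record_properties`, its columns, and the `delete_by_filter` helper are assumptions made for illustration, and a plain dict stands in for the vector store.

```python
import sqlite3


def ids_matching(
    conn: sqlite3.Connection, *, collection: str, prop_name: str, prop_value: str
) -> list[str]:
    """Resolve a simple equality filter to record IDs using the SQL side table."""
    rows = conn.execute(
        "SELECT record_id FROM record_properties "
        "WHERE collection = ? AND prop_name = ? AND prop_value = ? "
        "ORDER BY record_id",
        (collection, prop_name, prop_value),
    ).fetchall()
    return [row[0] for row in rows]


def delete_by_filter(
    conn: sqlite3.Connection,
    vectors: dict[str, list[float]],
    *,
    collection: str,
    prop_name: str,
    prop_value: str,
) -> list[str]:
    """Delete matching records from the stand-in vector store and the side table."""
    ids = ids_matching(
        conn, collection=collection, prop_name=prop_name, prop_value=prop_value
    )
    for record_id in ids:
        # Stand-in for a delete-by-ID call on a real vector store.
        vectors.pop(record_id, None)
    conn.executemany(
        "DELETE FROM record_properties WHERE collection = ? AND record_id = ?",
        [(collection, record_id) for record_id in ids],
    )
    conn.commit()
    return ids


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE record_properties ("
    "collection TEXT NOT NULL, record_id TEXT NOT NULL, "
    "prop_name TEXT NOT NULL, prop_value TEXT NOT NULL, "
    "PRIMARY KEY (collection, record_id, prop_name))"
)
conn.executemany(
    "INSERT INTO record_properties VALUES (?, ?, ?, ?)",
    [
        ("episodes", "r1", "session", "s1"),
        ("episodes", "r2", "session", "s2"),
        ("episodes", "r3", "session", "s1"),
    ],
)
vectors = {"r1": [1.0, 0.0], "r2": [0.0, 1.0], "r3": [0.5, 0.5]}

deleted = delete_by_filter(
    conn, vectors, collection="episodes", prop_name="session", prop_value="s1"
)
```

Keeping filter resolution in SQL means a provider only needs to support upsert, query, and delete by ID, which fits the bare-bones goal described above.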

@edwinyyyu edwinyyyu force-pushed the vector_store_interface branch from 745acd0 to a41437c Compare January 28, 2026 20:55
@edwinyyyu edwinyyyu marked this pull request as ready for review February 3, 2026 00:46
@edwinyyyu edwinyyyu linked an issue Feb 4, 2026 that may be closed by this pull request
@edwinyyyu edwinyyyu merged commit 8072b31 into MemMachine:main Feb 4, 2026
40 checks passed

Development

Successfully merging this pull request may close these issues.

[Bug]: Development experience/workflow friction

3 participants