pingcap/tidb-vector-python

tidb-vector-python

This is a Python client for TiDB Vector.

Both TiDB Cloud Serverless (doc) and the open-source version of TiDB (>= 8.4 DMR) support the vector data type.

Installation

pip install tidb-vector

Usage

TiDB Vector supports the following distance functions:

  • L1Distance
  • L2Distance
  • CosineDistance
  • NegativeInnerProduct
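
As a rough illustration of what these four functions measure, the sketch below implements the standard mathematical definitions in plain Python. It is not TiDB's implementation, the helper names are made up for this example, and TiDB's handling of edge cases (zero vectors, NULL values, mismatched dimensions) may differ.

```python
import math


def l1_distance(a, b):
    """Sum of absolute coordinate differences (Manhattan distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean (straight-line) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 minus the cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def negative_inner_product(a, b):
    """Negated dot product, so that a smaller value means a closer match."""
    return -sum(x * y for x, y in zip(a, b))
```

For all four, a smaller value means the vectors are closer, which is why the queries in this README sort by distance in ascending order.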

It also supports HNSW indexes with L2 or cosine distance to speed up the search. For more details, see Vector Search Indexes in TiDB.

The following ORMs and frameworks are supported:

  • SQLAlchemy
  • Django
  • Peewee
  • TiDB Vector Client

SQLAlchemy

Learn how to connect to TiDB Serverless in the TiDB Cloud documentation.

Define a table with a vector field

from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import declarative_base
from tidb_vector.sqlalchemy import VectorType

engine = create_engine('mysql://****.root:******@gateway01.xxxxxx.shared.aws.tidbcloud.com:4000/test')
Base = declarative_base()

class Document(Base):
    __tablename__ = 'sqlalchemy_documents'
    id = Column(Integer, primary_key=True)
    embedding = Column(VectorType(3))

Base.metadata.create_all(engine)

Insert vector data

from sqlalchemy.orm import Session

session = Session(engine)
doc = Document(embedding=[1, 2, 3])
session.add(doc)
session.commit()

Get the nearest neighbors

from sqlalchemy import select

session.scalars(select(Document).order_by(Document.embedding.l2_distance([1, 2, 3.1])).limit(5))

Get the distance

session.scalars(select(Document.embedding.l2_distance([1, 2, 3.1])))

Get within a certain distance

session.scalars(select(Document).filter(Document.embedding.l2_distance([1, 2, 3.1]) < 0.2))

Add a vector index to speed up queries

from sqlalchemy import Index, func, text

# the vector index currently depends on TiFlash
session.execute(text('ALTER TABLE sqlalchemy_documents SET TIFLASH REPLICA 1'))
index = Index(
    'idx_embedding',
    func.vec_cosine_distance(Document.embedding),
    mysql_prefix="vector",
)
index.create(engine)

Django

To use a vector field in Django, use django-tidb.

Peewee

Define a Peewee table with a vector field

from peewee import Model, MySQLDatabase
from tidb_vector.peewee import VectorField

# Option 1: using `pymysql` as the driver
connect_kwargs = {
    'ssl_verify_cert': True,
    'ssl_verify_identity': True,
}

# Option 2: using `mysqlclient` as the driver (keep only the block that matches your driver)
connect_kwargs = {
    'ssl_mode': 'VERIFY_IDENTITY',
    'ssl': {
        # Root certificate default path
        # https://docs.pingcap.com/tidbcloud/secure-connections-to-serverless-clusters/#root-certificate-default-path
        'ca': '/etc/ssl/cert.pem'  # macOS
    },
}

db = MySQLDatabase(
    'peewee_test',
    user='xxxxxxxx.root',
    password='xxxxxxxx',
    host='xxxxxxxx.shared.aws.tidbcloud.com',
    port=4000,
    **connect_kwargs,
)

class DocumentModel(Model):
    embedding = VectorField(3)
    class Meta:
        database = db
        table_name = 'peewee_documents'

db.connect()
db.create_tables([DocumentModel])

Insert vector data

DocumentModel.create(embedding=[1, 2, 3])

Get the nearest neighbors

DocumentModel.select().order_by(DocumentModel.embedding.l2_distance([1, 2, 3.1])).limit(5)

Get the distance

DocumentModel.select(DocumentModel.embedding.cosine_distance([1, 2, 3.1]).alias('distance'))

Get within a certain distance

DocumentModel.select().where(DocumentModel.embedding.l2_distance([1, 2, 3.1]) < 0.5)

Add a vector index to speed up queries

from peewee import SQL

# the vector index currently depends on TiFlash
# execute_sql takes a plain SQL string
db.execute_sql(
    "ALTER TABLE peewee_documents SET TIFLASH REPLICA 1;"
)
DocumentModel.add_index(SQL(
    "CREATE VECTOR INDEX idx_embedding ON peewee_documents ((vec_cosine_distance(embedding)))"
))

TiDB Vector Client

Within a framework, you can use the built-in TiDBVectorClient directly, as the LangChain and LlamaIndex integrations do, to interact with TiDB Vector. This removes the need to manage the underlying ORM yourself and simplifies your interaction with the vector store.

TiDBVectorClient is based on SQLAlchemy. To install it, run pip install tidb-vector[client].

Create a TiDBVectorClient instance:

from tidb_vector.integrations import TiDBVectorClient

TABLE_NAME = 'vector_test'
CONNECTION_STRING = 'mysql+pymysql://<USER>:<PASSWORD>@<HOST>:4000/<DB>?ssl_verify_cert=true&ssl_verify_identity=true'

tidb_vs = TiDBVectorClient(
    # the table which will store the vector data
    table_name=TABLE_NAME,
    # tidb connection string
    connection_string=CONNECTION_STRING,
    # the vector dimension; this example uses the ada model, which has 1536 dimensions
    vector_dimension=1536,
    # whether to drop and recreate the table if it already exists
    drop_existing_table=True,
)
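
If the user name or password contains characters such as `@` or `/`, they need to be URL-escaped before being placed in the connection string. The sketch below shows one way to assemble the string with the standard library. `build_connection_string` is a hypothetical helper written for this example, not part of tidb-vector, and it assumes the `mysql+pymysql` scheme and the two SSL query parameters shown above.

```python
from urllib.parse import quote_plus


def build_connection_string(user, password, host, database, port=4000):
    """Hypothetical helper: assemble a TiDB connection URL for the pymysql driver."""
    return (
        # Escape credentials so characters such as '@' or '/' do not break the URL
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
        "?ssl_verify_cert=true&ssl_verify_identity=true"
    )
```

The result can be passed as `connection_string` when creating the client. Reading the credentials from environment variables rather than hard-coding them keeps secrets out of the source.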

Bulk insert:

ids = [
    "f8e7dee2-63b6-42f1-8b60-2d46710c1971",
    "8dde1fbc-2522-4ca2-aedf-5dcb2966d1c6",
    "e4991349-d00b-485c-a481-f61695f2b5ae",
]
documents = ["foo", "bar", "baz"]
embeddings = [
    text_to_embedding("foo"),
    text_to_embedding("bar"),
    text_to_embedding("baz"),
]
metadatas = [
    {"page": 1, "category": "P1"},
    {"page": 2, "category": "P1"},
    {"page": 3, "category": "P2"},
]

tidb_vs.insert(
    ids=ids,
    texts=documents,
    embeddings=embeddings,
    metadatas=metadatas,
)
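
The snippets in this section call `text_to_embedding`, which this README does not define; in a real application it would call an embedding model. To try the client without a model, a placeholder along these lines may be enough. It is purely an assumption for testing: it returns a deterministic pseudo-random vector of the right length with no semantic meaning, so search results will not reflect text similarity.

```python
import hashlib
import random


def text_to_embedding(text, dimension=1536):
    """Hypothetical stand-in for an embedding model (NOT a real embedding).

    Returns the same pseudo-random vector every time for the same text.
    """
    # Derive a stable seed from the text so repeated calls agree across runs
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dimension)]
```

The default `dimension` matches the `vector_dimension=1536` used when the client was created; the two must agree.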

Query:

tidb_vs.query(text_to_embedding("foo"), k=3)

# query with filter
tidb_vs.query(text_to_embedding("foo"), k=3, filter={"category": "P1"})
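
Conceptually, a filtered query narrows the rows by metadata and then returns the `k` closest vectors. The plain-Python sketch below illustrates that idea only; it is not how TiDBVectorClient is implemented (the client runs the search in TiDB). `filtered_top_k` and the row layout are hypothetical, cosine distance is an assumption, and only exact-match filters like the one above are modelled.

```python
import math


def cosine_distance(a, b):
    """1 minus the cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def filtered_top_k(rows, query_vector, k=5, filter=None):
    """Hypothetical illustration: keep rows whose metadata matches, then take the k nearest.

    Each row is a dict with 'id', 'embedding', and 'metadata' keys.
    """
    filter = filter or {}
    candidates = [
        row for row in rows
        if all(row["metadata"].get(key) == value for key, value in filter.items())
    ]
    # Smaller distance means a closer match, so sort ascending
    candidates.sort(key=lambda row: cosine_distance(row["embedding"], query_vector))
    return candidates[:k]
```

Note that the filter is applied before ranking, so a row that is very close to the query vector is still excluded if its metadata does not match.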

Bulk delete:

tidb_vs.delete(["f8e7dee2-63b6-42f1-8b60-2d46710c1971"])

# delete with filter
tidb_vs.delete(["f8e7dee2-63b6-42f1-8b60-2d46710c1971"], filter={"category": "P1"})

Examples

The repository includes examples that show how to use tidb-vector-python to interact with TiDB Vector in different scenarios.

For more examples, see the examples directory.

Contributing

Please feel free to reach out to the maintainers if you have any questions or need help with the project. Before contributing, please read the CONTRIBUTING.md file.

About

TiDB Vector SDK for Python, including code examples. Join our Discord: https://discord.gg/XzSW23Jg9p
