- 🔹 `pgvector/pgvector`: Original repository, serving as the upstream for `fk-pgvector`.
- 🔹 `robsoncombr/fk-pgvector`: A direct fork of `pgvector` without modifications, used as an upstream for this repository.
- 🔹 `robsoncombr/rbs-pgvector`: This repository, maintaining self-driven enhancements and improvements.

- `._rbs/`: A dedicated space for project-related files, including a base template for quickly starting `pgvector` with Docker Compose.
- `README.upstream.md`: The original `README.md` file from the upstream repository.
- `README.md`: This file.

- `rbs-pgvector/upstream`: The upstream branch, regularly merged from `robsoncombr/fk-pgvector`.
- `rbs-pgvector/*`: All other branches contain custom changes, fixes, or new features, typically committed with the `[rbs]` prefix.

- Efficient Vector Similarity Search (L2, Cosine, Inner Product).
- Optimized for Performance on PostgreSQL.
- Scalable & Production-Ready with advanced indexing support.
- Seamless Integration with AI applications and machine learning models.
- Fully Compatible with `pgvector` and enhanced for better performance.
- Geo Extensions: Includes PostGIS, pgRouting, and related extensions for geospatial data.
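
As a rough illustration of the three similarity measures listed above, the sketch below writes a small SQL file that exercises pgvector's distance operators: `<->` (L2 distance), `<=>` (cosine distance), and `<#>` (negative inner product). The table name `demo_items`, the 3-dimensional vectors, and the file name `distance_ops.sql` are placeholders invented for this example and are not part of the repository.

```sh
# Write a demo query file; nothing touches the database until you run psql.
cat > distance_ops.sql <<'SQL'
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical demo table with tiny 3-dimensional vectors.
CREATE TEMP TABLE demo_items (id serial PRIMARY KEY, embedding vector(3));
INSERT INTO demo_items (embedding) VALUES ('[1,0,0]'), ('[0,1,0]'), ('[1,1,0]');

-- Compare every row against the query vector [1,0,0].
-- Smaller distances mean closer matches, so ORDER BY ascending.
SELECT id,
       embedding <-> '[1,0,0]' AS l2_distance,          -- Euclidean distance
       embedding <=> '[1,0,0]' AS cosine_distance,      -- 1 - cosine similarity
       (embedding <#> '[1,0,0]') * -1 AS inner_product  -- <#> returns the negative inner product
FROM demo_items
ORDER BY embedding <-> '[1,0,0]';
SQL

# To run it (assumes psql is installed and your_database exists):
# psql -d your_database -f distance_ops.sql
```

Because the table is temporary, it disappears when the psql session ends, so the demo should leave no objects behind apart from the extension itself.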

Ensure PostgreSQL is installed, then enable the pgvector extension in your database:

```sh
# Enable the pgvector extension (it is registered under the name "vector")
psql -d your_database -c "CREATE EXTENSION IF NOT EXISTS vector;"
```

### 2️⃣ Clone the Repository

```sh
git clone https://github.com/robsoncombr/rbs-pgvector.git
cd rbs-pgvector
```

Run schema migrations:

```sh
psql -d your_database -f schema.sql
```

```sql
CREATE TABLE embeddings (
    id SERIAL PRIMARY KEY,
    data VECTOR(1536) -- Example for OpenAI's 1536-dimension embeddings
);

-- Insert a vector
INSERT INTO embeddings (data) VALUES ('[0.1, 0.2, 0.3, ..., 0.9]');

-- Find the 5 closest matches by cosine distance (the <=> operator)
SELECT * FROM embeddings ORDER BY data <=> '[0.1, 0.2, 0.3, ..., 0.9]' LIMIT 5;
```

The Dockerfile in this repository rebuilds the pgvector image with additional geo extensions, including PostGIS, pgRouting, and related tools. Here's how it works:

```dockerfile
ARG PG_MAJOR=17

# Stage 1: Use postgres image to get the apt lists
FROM postgres:$PG_MAJOR AS postgres_lists
ARG PG_MAJOR

# Stage 2: Use pgvector image and copy the apt lists from Stage 1
FROM pgvector/pgvector:0.8.0-pg17

# Copy the apt lists from the postgres image
COPY --from=postgres_lists /var/lib/apt/lists /var/lib/apt/lists

# Install the required packages
RUN apt-get update && \
    apt-mark hold locales && \
    apt-get install -y --no-install-recommends \
        curl \
        wget \
        unzip \
        osm2pgsql \
        osmium-tool \
        postgis \
        postgresql-17-pgrouting && \
    apt-get autoremove -y && \
    apt-mark unhold locales && \
    apt-get clean autoclean && \
    apt-get autoremove --yes && \
    rm -rf /var/lib/{apt,dpkg,cache,log}/
```

- PostGIS: For geospatial data support.
- pgRouting: For routing and network analysis.
- osm2pgsql & osmium-tool: For OpenStreetMap data integration.
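
Building the image from this Dockerfile could look like the sketch below. The helper name `build_image` is made up for this example, and the default tag `pgvector` is chosen only because the Compose file in this README references `image: pgvector`. It assumes you run it from the directory that contains the Dockerfile.

```sh
# Build the image from the Dockerfile in the current directory.
# build_image is a hypothetical helper; the tag defaults to "pgvector".
build_image() {
  tag="${1:-pgvector}"
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker is not installed or not on PATH" >&2
    return 1
  fi
  if [ ! -f Dockerfile ]; then
    echo "no Dockerfile found in $(pwd)" >&2
    return 1
  fi
  docker build -t "$tag" .
}

# Usage:
# build_image            # tags the image as "pgvector"
# build_image my-tag     # or pick another tag
```

Note that overriding `PG_MAJOR` with `--build-arg` would change only the first stage, since the second stage's base image and the `postgresql-17-pgrouting` package name both pin version 17.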
The `init.sql` script is used to initialize the database with extensions, tables, and sample data. It is mounted in the `docker-compose.yml` file instead of being copied into the image, allowing for easy customization.
- Enables the `pgvector`, `PostGIS`, and `pgRouting` extensions.
- Creates tables for testing vector embeddings and geospatial data.
- Inserts sample data for testing.
- Includes a cleanup block to ensure no stale data is left behind.
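
The repository's actual `init.sql` is not reproduced in this README. As a minimal sketch of what a script following the steps above might contain, the example below writes a stand-in file. The extension and table names are the ones this README reports. The column layout, the 3-dimensional vectors, the sample values, the approximate city coordinates, the index names, the `DROP TABLE` cleanup, and the file name `init.example.sql` are all assumptions made for illustration.

```sh
# Write a hypothetical stand-in for init.sql (not the repository's real script).
cat > init.example.sql <<'SQL'
-- Extensions
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS postgis_topology;
CREATE EXTENSION IF NOT EXISTS postgis_raster;
CREATE EXTENSION IF NOT EXISTS pgrouting;

-- Cleanup (assumed form): drop test tables so reruns start from a known state.
DROP TABLE IF EXISTS document_embeddings;
DROP TABLE IF EXISTS locations;

-- Test tables (column layout is assumed).
CREATE TABLE document_embeddings (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding VECTOR(3)
);
CREATE TABLE locations (
    id SERIAL PRIMARY KEY,
    name TEXT,
    geom GEOMETRY(Point, 4326)
);

-- Sample data: 2 embeddings and 4 cities (longitude, latitude; approximate).
INSERT INTO document_embeddings (content, embedding) VALUES
    ('first document', '[0.1, 0.2, 0.3]'),
    ('second document', '[0.4, 0.5, 0.6]');
INSERT INTO locations (name, geom) VALUES
    ('São Paulo', ST_SetSRID(ST_MakePoint(-46.6333, -23.5505), 4326)),
    ('New York', ST_SetSRID(ST_MakePoint(-74.0060, 40.7128), 4326)),
    ('London', ST_SetSRID(ST_MakePoint(-0.1276, 51.5072), 4326)),
    ('Tokyo', ST_SetSRID(ST_MakePoint(139.6917, 35.6895), 4326));

-- Indexes. With only 2 rows, pgvector is expected to print a notice
-- that the ivfflat index was created with little data.
CREATE INDEX document_embeddings_embedding_idx
    ON document_embeddings USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
CREATE INDEX locations_geom_idx ON locations USING gist (geom);

-- Success message
DO $$ BEGIN RAISE NOTICE 'Extensions, tables, and test data initialized'; END $$;
SQL
```

To try the sketch, it could be mounted at `/docker-entrypoint-initdb.d/` in place of the real `init.sql`.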

```yaml
services:
  pgvector:
    image: pgvector
    volumes:
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
```

When the container starts for the first time (with an empty data directory), the `init.sql` script runs and initializes the database. Here's what happens:

- Extensions Created:
  - All required extensions (`vector`, `postgis`, `postgis_topology`, `postgis_raster`, and `pgrouting`) are created successfully.
- Tables Created:
  - Tables for testing (`document_embeddings` and `locations`) are created.
- Data Inserted:
  - Sample data is inserted into both tables:
    - `document_embeddings`: 2 rows inserted.
    - `locations`: 4 rows inserted (including São Paulo, New York, London, and Tokyo).
- Indexes Created:
  - Indexes are created for both tables to optimize performance.
- Warnings:
  - A warning about the `ivfflat` index being created with little data is expected. This is because the table only contains 2 rows of sample data. The index is more effective with larger datasets.
- Success Message:
  - A success message confirms that all extensions, tables, and test data have been initialized successfully.
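
To check the outcome described above by hand, a few queries can be run with psql against the running container. The sketch below only writes those queries to a file. The file name `verify.sql` is a placeholder, and the commented command assumes the default `postgres` user and database, which this README does not confirm. `pg_extension` and `pg_indexes` are standard PostgreSQL catalog objects.

```sh
# Write verification queries; nothing runs against the database yet.
cat > verify.sql <<'SQL'
-- Extensions that init.sql is expected to have enabled.
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('vector', 'postgis', 'postgis_topology', 'postgis_raster', 'pgrouting')
ORDER BY extname;

-- Row counts: this README reports 2 and 4 rows respectively.
SELECT 'document_embeddings' AS table_name, count(*) AS row_count FROM document_embeddings
UNION ALL
SELECT 'locations', count(*) FROM locations;

-- Indexes created on the two test tables.
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE tablename IN ('document_embeddings', 'locations')
ORDER BY tablename, indexname;
SQL

# To run it against the Compose service named "pgvector":
# docker compose exec -T pgvector psql -U postgres < verify.sql
```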
This project follows a dual licensing structure:
- `._rbs/`: Licensed under the terms specified in `._rbs/LICENSE`.
- `pgvector`: Maintained under its original license as defined in `LICENSE`.

For full details, please refer to the respective license files.

If you find this project useful, please ⭐ star the repository and share your feedback!

❤️ Maintained by github.com/robsoncombr