robsoncombr/rbs-pgvector

🎉 Welcome to rbs-pgvector! 🚀✨


🔗 Related Repositories

  • 🔹 pgvector/pgvector → Original repository, serving as the upstream for fk-pgvector.
  • 🔹 robsoncombr/fk-pgvector → A direct fork of pgvector without modifications, used as an upstream for this repository.
  • 🔹 robsoncombr/rbs-pgvector → This repository, maintaining self-driven enhancements and improvements.

Repository Highlights

📂 Artifacts

  • ._rbs/ → 🚀 A dedicated space for project-related files, including a base template for quickly starting pgvector with Docker Compose.
  • README.upstream.md → The original README.md file from the upstream repository.
  • README.md → This file.

🌿 Branches

  • rbs-pgvector/upstream → The upstream branch, regularly merged from robsoncombr/fk-pgvector.
  • rbs-pgvector/* → All other branches contain custom changes, fixes, or new features, typically committed with the [rbs] prefix.

✨ Features

  • 🧠 Efficient Vector Similarity Search (L2, Cosine, Inner Product).
  • ⚡ Optimized for Performance on PostgreSQL.
  • 🏗 Scalable & Production-Ready with advanced indexing support.
  • 🛠 Seamless Integration with AI applications and machine learning models.
  • Fully Compatible with pgvector and enhanced for better performance.
  • 🌍 Geo Extensions → Includes PostGIS, pgRouting, and related extensions for geospatial data.

📦 Installation

1️⃣ Install PostgreSQL and pgvector

Ensure PostgreSQL is installed and activate the pgvector extension:

# Enable the pgvector extension (pgvector itself must already be installed on the server)
psql -d your_database -c "CREATE EXTENSION IF NOT EXISTS vector;"

2️⃣ Clone the Repository

```sh
git clone https://github.com/robsoncombr/rbs-pgvector.git
cd rbs-pgvector
```

3️⃣ Set Up the Database

Run schema migrations:

psql -d your_database -f schema.sql

🚀 Usage Example

Storing & Searching Vectors

CREATE TABLE embeddings (
    id SERIAL PRIMARY KEY,
    data VECTOR(1536) -- Example for OpenAI's 1536-dimension embeddings
);

-- Insert a vector (the "..." stands for the remaining components; a real literal needs all 1536 values)
INSERT INTO embeddings (data) VALUES ('[0.1, 0.2, 0.3, ..., 0.9]');

-- Find the five closest matches by cosine distance (<=>); a smaller distance means a more similar vector
SELECT * FROM embeddings ORDER BY data <=> '[0.1, 0.2, 0.3, ..., 0.9]' LIMIT 5;
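
To make the operators concrete, here is a minimal Python sketch of the arithmetic behind pgvector's three distance operators (`<->` for L2 distance, `<#>` for negative inner product, `<=>` for cosine distance), plus a helper that formats a list of floats as a vector literal. The function names are hypothetical and chosen only for illustration; this is not code from the repository, and in practice the database computes these values for you.

```python
import math

def to_vector_literal(values, dim):
    """Format floats as a pgvector text literal such as '[0.1,0.2,0.3]'.

    Raises if the length does not match the column's declared dimension,
    mirroring the fact that a VECTOR(dim) column rejects other lengths.
    """
    if len(values) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(values)}")
    return "[" + ",".join(repr(float(v)) for v in values) + "]"

def l2_distance(a, b):
    """What <-> computes: Euclidean (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def negative_inner_product(a, b):
    """What <#> computes: the inner product, negated so smaller is closer."""
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """What <=> computes: 1 - cosine similarity (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy 3-dimensional data, standing in for real 1536-dimension embeddings.
query = [1.0, 0.0, 0.0]
rows = {"same direction": [2.0, 0.0, 0.0], "orthogonal": [0.0, 3.0, 0.0]}

# Equivalent in spirit to: ORDER BY data <=> query
ranked = sorted(rows, key=lambda name: cosine_distance(query, rows[name]))
print(ranked)
print(to_vector_literal(query, dim=3))
```

Because cosine distance ignores vector length, the row pointing in the same direction as the query ranks first even though its magnitude differs.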

🛠 Rebuilding pgvector with Geo Extensions

The Dockerfile in this repository rebuilds the pgvector image with additional geo extensions, including PostGIS, pgRouting, and related tools. Here's how it works:

Dockerfile Overview

ARG PG_MAJOR=17

# Stage 1: Use postgres image to get the apt lists
FROM postgres:$PG_MAJOR AS postgres_lists
ARG PG_MAJOR

# Stage 2: Use pgvector image and copy the apt lists from Stage 1
FROM pgvector/pgvector:0.8.0-pg17

# Copy the apt lists from the postgres image
COPY --from=postgres_lists /var/lib/apt/lists /var/lib/apt/lists

# Install the required packages
RUN apt-get update && \
    apt-mark hold locales && \
    apt-get install -y --no-install-recommends \
        curl \
        wget \
        unzip \
        osm2pgsql \
        osmium-tool \
        postgis \
        postgresql-17-pgrouting && \
    apt-get autoremove -y && \
    apt-mark unhold locales && \
    apt-get clean autoclean && \
    apt-get autoremove --yes && \
    rm -rf /var/lib/{apt,dpkg,cache,log}/

Key Additions

  • PostGIS: For geospatial data support.
  • pgRouting: For routing and network analysis.
  • osm2pgsql & osmium-tool: For OpenStreetMap data integration.

📂 Initialization Script (init.sql)

The init.sql script is used to initialize the database with extensions, tables, and sample data. It is mounted in the docker-compose.yml file instead of being copied into the image, allowing for easy customization.

Key Features of init.sql

  • Enables pgvector, PostGIS, and pgRouting extensions.
  • Creates tables for testing vector embeddings and geospatial data.
  • Inserts sample data for testing.
  • Includes a cleanup block to ensure no stale data is left behind.
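
As a rough illustration of the first bullet, the sketch below builds the `CREATE EXTENSION` statements for the extensions this README names. It is not the contents of init.sql, which may be written differently; the list and the helper name `extension_statements` are assumptions made for illustration.

```python
# Extension names as listed in this README. Order matters here because
# postgis_topology, postgis_raster, and pgrouting all depend on postgis.
EXTENSIONS = ["vector", "postgis", "postgis_topology", "postgis_raster", "pgrouting"]

def extension_statements(names):
    """Return one idempotent CREATE EXTENSION statement per extension name."""
    return [f"CREATE EXTENSION IF NOT EXISTS {name};" for name in names]

print("\n".join(extension_statements(EXTENSIONS)))
```

Using `IF NOT EXISTS` keeps the statements safe to re-run against a database where some extensions are already enabled.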

Mounting in docker-compose.yml

services:
  pgvector:
    image: pgvector
    volumes:
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql

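📖 Initialization Analysis

When the container starts for the first time with an empty data directory, the init.sql script runs and initializes the database. Scripts in /docker-entrypoint-initdb.d are skipped once the data directory has been initialized. Here's what happens: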

  1. Extensions Created:

    • All required extensions (vector, postgis, postgis_topology, postgis_raster, and pgrouting) are created successfully.
  2. Tables Created:

    • Tables for testing (document_embeddings and locations) are created.
  3. Data Inserted:

    • Sample data is inserted into both tables:
      • document_embeddings: 2 rows inserted.
      • locations: 4 rows inserted (including São Paulo, New York, London, and Tokyo).
  4. Indexes Created:

    • Indexes are created for both tables to optimize performance.
  5. Warnings:

    • A warning about the ivfflat index being created with little data is expected. This is because the table only contains 2 rows of sample data. The index is more effective with larger datasets.
  6. Success Message:

    • A success message confirms that all extensions, tables, and test data have been initialized successfully.
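
The ivfflat warning in step 5 is easier to understand with numbers. The pgvector documentation suggests, as a starting point, `lists = rows / 1000` for up to 1M rows and `sqrt(rows)` beyond that, with `probes` around `sqrt(lists)` at query time. The Python sketch below applies that rule of thumb. The function names and the floor of 1 are assumptions for illustration, and the `lists` value actually used by init.sql is not shown in this README.

```python
import math

def suggested_lists(row_count):
    """Starting point for ivfflat `lists`, per pgvector's documented rule of thumb."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)  # floor of 1 is an assumption
    return max(1, round(math.sqrt(row_count)))

def suggested_probes(lists):
    """Starting point for `ivfflat.probes` at query time: sqrt(lists)."""
    return max(1, round(math.sqrt(lists)))

for rows in (2, 50_000, 4_000_000):
    lists = suggested_lists(rows)
    print(f"rows={rows} lists={lists} probes={suggested_probes(lists)}")
```

With only the 2 sample rows there is nothing meaningful to cluster, so the notice is harmless for a smoke test; on a real dataset, building the index after the data is loaded gives it representative rows to work with.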

📜 License

This project follows a dual licensing structure:

  • 📂 ._rbs/ → Licensed under the terms specified in ._rbs/LICENSE.
  • 🛠 pgvector → Maintained under its original license as defined in LICENSE.

For full details, please refer to the respective license files.


🌟 Support & Community

If you find this project useful, please ⭐ star the repository and share your feedback!


❤️ Maintained by github.com/robsoncombr
