From 2bd415cbd30a69a1e0772df2b70a502b2f4b936a Mon Sep 17 00:00:00 2001
From: Jongho Lee
Date: Sun, 4 May 2025 01:24:15 +0900
Subject: [PATCH 1/5] pgvector revised

---
 09-VectorStore/08-PGVector.ipynb           | 990 +++++++++++----------
 09-VectorStore/utils/pgvector_interface.py | 180 ++--
 2 files changed, 572 insertions(+), 598 deletions(-)

diff --git a/09-VectorStore/08-PGVector.ipynb b/09-VectorStore/08-PGVector.ipynb
index 3608d1790..07c465595 100644
--- a/09-VectorStore/08-PGVector.ipynb
+++ b/09-VectorStore/08-PGVector.ipynb
@@ -2,6 +2,7 @@
 "cells": [
 {
 "cell_type": "markdown",
+ "id": "25733da0",
 "metadata": {},
 "source": [
 "# PGVector\n",
@@ -13,20 +14,23 @@
 "\n",
 "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/09-VectorStore/07-PGVector.ipynb) [![Open in GitHub](https://img.shields.io/badge/Open%20in%20GitHub-181717?style=flat-square&logo=github&logoColor=white)](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/09-VectorStore/07-PGVector.ipynb)\n",
 "\n",
- "## Overview \n",
+ "## Overview\n",
+ "\n",
+ "This tutorial covers how to use ```PGVector``` with **LangChain** .\n",
 "\n",
 "[```PGVector```](https://github.com/pgvector/pgvector) is an open-source extension for PostgreSQL that allows you to store and search vector data alongside your regular database information.\n",
 "\n",
- "This notebook shows how to use functionality related to ```PGVector```, implementing LangChain vectorstore abstraction using postgres as the backend and utilizing the pgvector extension.\n",
+ "This tutorial walks you through using **CRUD** operations with ```PGVector``` : **storing** , **updating** , and **deleting** documents, and performing **similarity-based retrieval** .\n",
 "\n",
 "### Table of Contents\n",
 "\n",
 "- [Overview](#overview)\n",
 "- [Environment Setup](#environment-setup)\n",
- "- [What is PGVector?](#what-is-pgvector)\n",
- "- [Initialization](#initialization)\n",
- "- [Manage vector store](#manage-vector-store)\n",
- "- [Similarity search](#similarity-search)\n",
+ "- [What is PGVector?](#what-is-pgvector?)\n",
+ "- [Data](#data)\n",
+ "- [Initial Setting PGVector](#initial-setting-pgvector)\n",
+ "- [Document Manager](#document-manager)\n",
+ "\n",
 "\n",
 "### References\n",
 "\n",
@@ -40,6 +44,7 @@
 },
 {
 "cell_type": "markdown",
+ "id": "c1fac085",
 "metadata": {},
 "source": [
 "## Environment Setup\n",
@@ -47,13 +52,14 @@
 "Set up the environment. You may refer to [Environment Setup](https://wikidocs.net/257836) for more details.\n",
 "\n",
 "**[Note]**\n",
- "- `langchain-opentutorial` is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials. \n",
- "- You can checkout the [`langchain-opentutorial`](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) for more details."
+ "- ```langchain-opentutorial``` is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials. \n",
+ "- You can check out the [```langchain-opentutorial```](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) for more details."
] }, { "cell_type": "code", "execution_count": 1, + "id": "98da7994", "metadata": {}, "outputs": [], "source": [ @@ -63,7 +69,8 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 2, + "id": "800c732b", "metadata": {}, "outputs": [], "source": [ @@ -88,6 +95,7 @@ { "cell_type": "code", "execution_count": 3, + "id": "5b36bafa", "metadata": {}, "outputs": [ { @@ -113,9 +121,20 @@ ")" ] }, + { + "cell_type": "markdown", + "id": "8011a0c7", + "metadata": {}, + "source": [ + "You can alternatively set API keys such as ```OPENAI_API_KEY``` in a ```.env``` file and load them.\n", + "\n", + "[Note] This is not necessary if you've already set the required API keys in previous steps." + ] + }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 4, + "id": "70d7e764", "metadata": {}, "outputs": [ { @@ -124,7 +143,7 @@ "True" ] }, - "execution_count": 5, + "execution_count": 4, "metadata": {}, "output_type": "execute_result" } @@ -137,39 +156,154 @@ }, { "cell_type": "markdown", + "id": "6890920d", + "metadata": {}, + "source": [ + "Please write down what you need to set up the Vectorstore here." + ] + }, + { + "cell_type": "markdown", + "id": "6f3b5bd2", + "metadata": {}, + "source": [ + "## Data\n", + "\n", + "This part walks you through the **data preparation process** .\n", + "\n", + "This section includes the following components:\n", + "\n", + "- Introduce Data\n", + "\n", + "- Preprocessing Data\n" + ] + }, + { + "cell_type": "markdown", + "id": "508ae7f7", + "metadata": {}, + "source": [ + "### Introduce Data\n", + "\n", + "In this tutorial, we will use the fairy tale **πŸ“— The Little Prince** in PDF format as our data.\n", + "\n", + "This material complies with the **Apache 2.0 license** .\n", + "\n", + "The data is used in a text (.txt) format converted from the original PDF.\n", + "\n", + "You can view the data at the link below.\n", + "- [Data Link](https://huggingface.co/datasets/sohyunwriter/the_little_prince)" + ] + }, + { + "cell_type": "markdown", + "id": "004ea4f4", + "metadata": {}, + "source": [ + "### Preprocessing Data\n", + "\n", + "In this tutorial section, we will preprocess the text data from The Little Prince and convert it into a list of ```LangChain Document``` objects with metadata. \n", + "\n", + "Each document chunk will include a ```title``` field in the metadata, extracted from the first line of each section." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "8e4cac64", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.schema import Document\n", + "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", + "import re\n", + "from typing import List\n", + "\n", + "\n", + "def preprocessing_data(content: str) -> List[Document]:\n", + " # 1. Split the text by double newlines to separate sections\n", + " blocks = content.split(\"\\n\\n\")\n", + "\n", + " # 2. Initialize the text splitter\n", + " text_splitter = RecursiveCharacterTextSplitter(\n", + " chunk_size=500, # Maximum number of characters per chunk\n", + " chunk_overlap=50, # Overlap between chunks to preserve context\n", + " separators=[\"\\n\\n\", \"\\n\", \" \"], # Order of priority for splitting\n", + " )\n", + "\n", + " documents = []\n", + "\n", + " # 3. 
Loop through each section\n", + " for block in blocks:\n", + " lines = block.strip().splitlines()\n", + " if not lines:\n", + " continue\n", + "\n", + " # Extract title from the first line using square brackets [ ]\n", + " first_line = lines[0]\n", + " title_match = re.search(r\"\\[(.*?)\\]\", first_line)\n", + " title = title_match.group(1).strip() if title_match else None\n", + "\n", + " # Remove the title line from content\n", + " body = \"\\n\".join(lines[1:]).strip()\n", + " if not body:\n", + " continue\n", + "\n", + " # 4. Chunk the section using the text splitter\n", + " chunks = text_splitter.split_text(body)\n", + "\n", + " # 5. Create a LangChain Document for each chunk with the same title metadata\n", + " for chunk in chunks:\n", + " documents.append(Document(page_content=chunk, metadata={\"title\": title}))\n", + "\n", + " print(f\"Generated {len(documents)} chunked documents.\")\n", + "\n", + " return documents" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "1d091a51", "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Generated 262 chunked documents.\n" + ] + } + ], "source": [ - "## What is PGVector?\n", + "# Load the entire text file\n", + "with open(\"./data/the_little_prince.txt\", \"r\", encoding=\"utf-8\") as f:\n", + " content = f.read()\n", "\n", - "`PGVector` is a ```PostgreSQL``` extension that enables vector similarity search directly within your ```PostgreSQL``` database, making it ideal for AI applications, semantic search, and recommendation systems.\n", + "# Preprocessing Data\n", "\n", - "This is particularly valuable for who already use ```PostgreSQL``` who want to add vector search capabilities without managing separate infrastructure or learning new query languages.\n", + "docs = preprocessing_data(content=content)" + ] + }, + { + "cell_type": "markdown", + "id": "1977d4ff", + "metadata": {}, + "source": [ + "## Initial Setting PGVector\n", "\n", - "**Features** :\n", - "1. Native ```PostgreSQL``` integration with standard SQL queries\n", - "2. Multiple similarity search methods including L2, Inner Product, Cosine\n", - "3. Several indexing options including HNSW and IVFFlat\n", - "4. Support for up to 2,000 dimensions per vector\n", - "5. ACID compliance inherited from ```PostgreSQL```\n", + "This part walks you through the initial setup of ```PGVector```\n", "\n", - "**Advantages** :\n", + "This section includes the following components:\n", "\n", - "1. Free and open-source\n", - "2. Easy integration with existing ```PostgreSQL``` databases\n", - "3. Full SQL functionality and transactional support\n", - "4. No additional infrastructure needed\n", - "5. Supports hybrid searches combining vector and traditional SQL queries\n", + "- Load Embedding Model\n", "\n", - "**Disadvantages** :\n", - "1. Performance limitations with very large datasets (billions of vectors)\n", - "2. Limited to single-node deployment\n", - "3. Memory-intensive for large vector dimensions\n", - "4. Requires manual optimization for best performance\n", - "5. 
Less specialized features compared to dedicated vector databases" + "- Load ```PGVector``` Client" ] }, { "cell_type": "markdown", + "id": "835e5c9e", "metadata": {}, "source": [ "### Set up PGVector\n", @@ -186,7 +320,7 @@ "\n", "For more detailed instructions, please refer to [the official documentation](https://github.com/pgvector/pgvector) \n", "\n", - "** [ NOTE ] **\n", + "**[ NOTE ]**\n", "* If you want to maintain the stored data even after container being deleted, you must mount volume like below:\n", "```bash\n", "docker run --name pgvector-container -v {/mount/path}:/var/lib/postgresql/data -e POSTGRES_USER=langchain -e POSTGRES_PASSWORD=langchain -e POSTGRES_DB=langchain -p 6024:5432 -d pgvector/pgvector:pg16\n", @@ -195,32 +329,133 @@ }, { "cell_type": "markdown", + "id": "7eee56b2", "metadata": {}, "source": [ - "## Initialization\n", + "### Load Embedding Model\n", + "\n", + "In the **Load Embedding Model** section, you'll learn how to load an embedding model.\n", + "\n", + "This tutorial uses **OpenAI's** **API-Key** for loading the model.\n", + "\n", + "*πŸ’‘ If you prefer to use another embedding model, see the instructions below.*\n", + "- [Embedding Models](https://python.langchain.com/docs/integrations/text_embedding/)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "5bd5c3c9", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "from langchain_openai import OpenAIEmbeddings\n", "\n", - "If you are successfully running the pgvector container, you can use ```pgVectorIndexManager``` from ```pgvector_interface``` in utils directory to handle collections.\n", + "embedding = OpenAIEmbeddings(model=\"text-embedding-3-large\")" + ] + }, + { + "cell_type": "markdown", + "id": "40f65795", + "metadata": {}, + "source": [ + "### Load PGVector Client\n", "\n", - "To initialize ```pgVectorIndexManager``` you can pass full connection string or pass each parameter separately." 
+ "In the **Load ```PGVector``` Client** section, we cover how to load the **database client object** using the **Python SDK** for ```PGVector``` .\n", + "- [PGVector Python SDK Docs](https://github.com/pgvector/pgvector)" ] }, { "cell_type": "code", "execution_count": 8, + "id": "eed0ebad", "metadata": {}, "outputs": [], "source": [ - "from utils.pgvector_interface import pgVectorIndexManager\n", + "from sqlalchemy import create_engine\n", + "\n", + "# Create Database Client Object Function\n", + "\n", + "\n", + "def get_db_client(conn_str):\n", + " \"\"\"\n", + "\n", "\n", - "# Setup connection infomation\n", - "conn_str = \"postgresql+psycopg://langchain:langchain@localhost:6024/langchain\"\n", + " Initializes and returns a VectorStore client instance.\n", + "\n", + "\n", + "\n", + " This function loads configuration (e.g., API key, host) from environment\n", + "\n", + "\n", + " variables or default values and creates a client object to interact\n", + "\n", + "\n", + " with the {vectordb} Python SDK.\n", + "\n", + "\n", + "\n", + " Returns:\n", + "\n", + "\n", + " client:ClientType - An instance of the {vectordb} client.\n", + "\n", + "\n", + "\n", + " Raises:\n", + "\n", + "\n", + " ValueError: If required configuration is missing.\n", + "\n", + "\n", + " \"\"\"\n", + " try:\n", + " client = create_engine(url=conn_str, **({}))\n", + " except Exception as e:\n", + " raise e\n", + " else:\n", + " return client" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "2b5f4116", + "metadata": {}, + "outputs": [], + "source": [ + "# Get DB Client Object\n", + "conn_str = \"postgresql+psycopg://langchain:langchain@localhost:6088/langchain\"\n", + "client = get_db_client(conn_str)" + ] + }, + { + "cell_type": "markdown", + "id": "2e8f4075", + "metadata": {}, + "source": [ + "If you are successfully running the ```PGVector``` container and get client objecct, you can use ```pgVectorIndexManager``` from ```pgvector_interface``` in utils directory to handle collections.\n", + "\n", + "You can also initialize ```pgVectorIndexManager``` by passing full connection string or each parameter separately instead of passing client." 
+ ]
 },
 {
 "cell_type": "code",
 "execution_count": 10,
+ "id": "ba8f2308",
 "metadata": {},
 "outputs": [],
 "source": [
 "from utils.pgvector_interface import pgVectorIndexManager\n",
 "\n",
 "# Initialize pgVectorIndexManager\n",
- "index_manager = pgVectorIndexManager(connection=conn_str)"
+ "index_manager = pgVectorIndexManager(client=client)"
 ]
 },
 {
 "cell_type": "markdown",
+ "id": "734dc3da",
 "metadata": {},
 "source": [
 "When you initialize ```pgVectorIndexManager```, the procedure will automatically create two tables\n",
@@ -247,6 +482,7 @@
 },
 {
 "cell_type": "markdown",
+ "id": "f83b661d",
 "metadata": {},
 "source": [
 "## Create collection\n",
@@ -263,13 +499,13 @@
 },
 {
 "cell_type": "code",
- "execution_count": 6,
+ "execution_count": 11,
+ "id": "4742c2ff",
 "metadata": {},
 "outputs": [],
 "source": [
 "import getpass\n",
 "import os\n",
- "\n",
 "if not os.environ.get(\"OPENAI_API_KEY\"):\n",
 "    os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"Enter API key for OpenAI: \")\n",
 "\n",
@@ -280,115 +516,46 @@
 },
 {
 "cell_type": "code",
- "execution_count": null,
+ "execution_count": 12,
+ "id": "d92c6846",
 "metadata": {},
 "outputs": [],
 "source": [
 "# create new collection\n",
- "col_manager = index_manager.create_index(\n",
- "    collection_name=\"langchain_opentutorial\", embedding=embeddings\n",
+ "_ = index_manager.create_index(\n",
+ "    collection_name=\"tutorial_collection\", embedding=embeddings\n",
 ")"
 ]
 },
 {
 "cell_type": "markdown",
+ "id": "3a5a97a0",
 "metadata": {},
 "source": [
- "### List collections\n",
+ "## Document Manager\n",
 "\n",
- "As we have created a new collection, we will call the ```list_indexes``` method to check if the collection is created."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 8,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "['langchain_opentutorial']\n"
- ]
- }
- ],
- "source": [
- "# check collections\n",
- "indexes = index_manager.list_indexes()\n",
- "print(indexes)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Delete collections\n",
+ "To support the **Langchain-Opentutorial** , we implemented a custom set of **CRUD** functionalities for VectorDBs. \n",
 "\n",
- "We can also delete collection by calling the ```delete_index``` method by pass the name of the collection to delete.\n",
+ "The following operations are included:\n",
 "\n",
- "We delete **langchain_opentutorial** collection, and then create it again."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[]\n" - ] - } - ], - "source": [ - "# delete collection\n", - "index_manager.delete_index(\"langchain_opentutorial\")\n", + "- ```upsert``` : Update existing documents or insert if they don’t exist\n", "\n", - "# check collections\n", - "indexes = index_manager.list_indexes()\n", - "print(indexes)\n", + "- ```upsert_parallel``` : Perform upserts in parallel for large-scale data\n", "\n", - "# Create again\n", - "col_manager_tmp1 = index_manager.create_index(\n", - " collection_name=\"langchain_opentutorial\", embedding=embeddings\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Get collection\n", - "As we said, when you create a new collection by calling the ```create_index``` method, this will automatically return ```pgVectorDocumentManager``` instance.\n", + "- ```similarity_search``` : Search for similar documents based on embeddings\n", "\n", - "But if you want to re-use already created collection, you can call the ```get_index``` method with name of the collection and embedding model you used to create the collection to get manager." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Get collection\n", - "col_manager_tmp2 = index_manager.get_index(\n", - " embedding=embeddings, collection_name=\"langchain_opentutorial\"\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Manage vector store\n", + "- ```delete``` : Remove documents based on filter conditions\n", "\n", - "Once you have created your vector store, we can interact with it by adding and deleting different items." + "Each of these features is implemented as class methods specific to each VectorDB.\n", + "\n", + "In this tutorial, you can easily utilize these methods to interact with your VectorDB.\n", + "\n", + "*We plan to continuously expand the functionality by adding more common operations in the future.*" ] }, { "cell_type": "markdown", + "id": "4e89549b-fbb4-4a9d-b01d-1898d129b1e2", "metadata": {}, "source": [ "### Filtering\n", @@ -410,7 +577,7 @@ "| \\$and | Logical (and) |\n", "| \\$or | Logical (or) |\n", "\n", - "Filter can be used with ```scroll```, ```delete```, and ```search``` methods.\n", + "Filter can be used with ```delete```, and ```search``` methods.\n", "\n", "To apply filter, we create a dictionary and pass it to ```filter``` parameter like the following\n", "```python\n", @@ -420,513 +587,382 @@ }, { "cell_type": "markdown", + "id": "65a40601", "metadata": {}, "source": [ - "### Connect to index\n", - "To add, delete, search items, we need to initialize an object which connected to the index we operate on.\n", + "### Create Instance\n", "\n", - "We will connect to **langchain_opentutorial** . Recall that we used basic ```OpenAIEmbedding``` as a embedding function, and thus we need to pass it when we initialize ```index_manager``` object.\n", + "First, we create an instance of the **{vectordb}** helper class to use its CRUD functionalities.\n", "\n", - "Remember that we also can get ```pgVectorDocumentManager``` object when we create an index with ```pgVectorIndexManager``` object or ```pgVectorIndexManager.get_index``` method, but this time we call it directly to get an ```pgVectorDocumentManager``` object." 
+ "This class is initialized with the **{vectordb} Python SDK client instance** and the **embedding model instance** , both of which were defined in the previous section." ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 13, + "id": "dccab807", "metadata": {}, "outputs": [], "source": [ - "from utils.pgvector_interface import pgVectorDocumentManager\n", + "from utils.pgvector_interface import pgVectorCRUDManager\n", "\n", - "# Get document manager\n", - "col_manager = pgVectorDocumentManager(\n", - " embedding=embeddings,\n", - " connection_info=conn_str,\n", - " collection_name=\"langchain_opentutorial\",\n", + "crud_manager = pgVectorCRUDManager(\n", + " client=client, embedding=embedding, collection_name=\"tutorial_collection\"\n", ")" ] }, { "cell_type": "markdown", + "id": "c1c0c67f", "metadata": {}, "source": [ - "### Data Preprocessing\n", + "Now you can use the following **CRUD** operations with the ```crud_manager``` instance.\n", "\n", - "Below is the preprocessing process for general documents.\n", - "\n", - "- Need to extract **metadata** from documents\n", - "- Filter documents by minimum length.\n", - " \n", - "- Determine whether to use ```basename``` or not. Default is ```False```.\n", - " - ```basename``` denotes the last value of the filepath.\n", - " - For example, **document.pdf** will be the ```basename``` for the filepath **./data/document.pdf** ." + "These instance allow you to easily manage documents in your ```PGVector```." ] }, { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "# This is a long document we can split up.\n", - "data_path = \"./data/the_little_prince.txt\"\n", - "with open(data_path, encoding=\"utf8\") as f:\n", - " raw_text = f.read()" - ] - }, - { - "cell_type": "code", - "execution_count": 13, + "cell_type": "markdown", + "id": "7c6c53c5", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "page_content='The Little Prince\n", - "Written By Antoine de Saiot-Exupery (1900γ€œ1944)'\n" - ] - } - ], "source": [ - "from langchain_text_splitters import RecursiveCharacterTextSplitter\n", - "from uuid import uuid4\n", + "### Upsert Document\n", "\n", - "# define text splitter\n", - "text_splitter = RecursiveCharacterTextSplitter(\n", - " # Set a really small chunk size, just to show.\n", - " chunk_size=100,\n", - " chunk_overlap=20,\n", - " length_function=len,\n", - " is_separator_regex=False,\n", - ")\n", + "**Update** existing documents or **insert** if they don’t exist\n", + "\n", + "**βœ… Args**\n", + "\n", + "- ```texts``` : Iterable[str] – List of text contents to be inserted/updated.\n", + "\n", + "- ```metadatas``` : Optional[List[Dict]] – List of metadata dictionaries for each text (optional).\n", + "\n", + "- ```ids``` : Optional[List[str]] – Custom IDs for the documents. If not provided, IDs will be auto-generated.\n", "\n", - "# split raw text by splitter.\n", - "split_docs = text_splitter.create_documents([raw_text])\n", + "- ```**kwargs``` : Extra arguments for the underlying vector store.\n", "\n", - "# print one of documents to check its structure\n", - "print(split_docs[0])" + "**πŸ”„ Return**\n", + "\n", + "- ```ids``` : IDs of the upserted documents." 
] }, { "cell_type": "code", "execution_count": 14, + "id": "f3a6c32b", "metadata": {}, "outputs": [], "source": [ - "# define document preprocessor\n", - "def preprocess_documents(\n", - " split_docs, metadata_keys, min_length, use_basename=False, **kwargs\n", - "):\n", - " metadata = kwargs\n", + "from uuid import uuid4\n", "\n", - " if use_basename:\n", - " assert metadata.get(\"source\", None) is not None, \"source must be provided\"\n", - " metadata[\"source\"] = metadata[\"source\"].split(\"/\")[-1]\n", + "ids = [str(uuid4()) for _ in docs]\n", "\n", - " result_docs = []\n", - " for idx, doc in enumerate(split_docs):\n", - " if len(doc.page_content) < min_length:\n", - " continue\n", - " for k in metadata_keys:\n", - " doc.metadata.update({k: metadata.get(k, \"\")})\n", - " doc.metadata.update({\"page\": idx + 1, \"id\": str(uuid4())})\n", - " result_docs.append(doc)\n", "\n", - " return result_docs" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": { - "tags": [] - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "page_content='The Little Prince\n", - "Written By Antoine de Saiot-Exupery (1900γ€œ1944)' metadata={'source': 'the_little_prince.txt', 'page': 1, 'author': 'Saiot-Exupery', 'id': 'cc23e228-2540-4e5c-8eb3-be6df7a3bf77'}\n" - ] - } - ], - "source": [ - "# preprocess raw documents\n", - "processed_docs = preprocess_documents(\n", - " split_docs=split_docs,\n", - " metadata_keys=[\"source\", \"page\", \"author\"],\n", - " min_length=5,\n", - " use_basename=True,\n", - " source=data_path,\n", - " author=\"Saiot-Exupery\",\n", - ")\n", + "args = {\n", + " \"texts\": [doc.page_content for doc in docs[:2]],\n", + " \"metadatas\": [doc.metadata for doc in docs[:2]],\n", + " \"ids\": ids[:2],\n", + "}\n", + "\n", "\n", - "# print one of preprocessed document to chekc its structure\n", - "print(processed_docs[0])" + "upsert_result = crud_manager.upsert(**args)" ] }, { "cell_type": "markdown", + "id": "278fe1ed", "metadata": {}, "source": [ - "### Add items to vector store\n", + "### Upsert Parallel Document\n", "\n", - "We can add items to our vector store by using the ```upsert``` or ```upsert_parallel``` method.\n", + "Perform **upserts** in **parallel** for large-scale data\n", "\n", - "If you pass ids along with documents, then ids will be used, but if you do not pass ids, it will be created based `page_content` using md5 hash function.\n", + "**βœ… Args**\n", "\n", - "Basically, ```upsert``` and ```upsert_parallel``` methods do upsert not insert, based on **id** of the item.\n", + "- ```texts``` : Iterable[str] – List of text contents to be inserted/updated.\n", "\n", - "So if you provided id and want to update data, you must provide the same id that you provided at first upsertion.\n", + "- ```metadatas``` : Optional[List[Dict]] – List of metadata dictionaries for each text (optional).\n", "\n", - "We will upsert data to collection, **langchain_opentutorial** , with ```upsert``` method for the first half, and with ```upsert_parallel``` for the second half." 
- ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Number of documents: 1359\n" - ] - } - ], - "source": [ - "# Gather uuids, texts, metadatas\n", - "uuids = [doc.metadata[\"id\"] for doc in processed_docs]\n", - "texts = [doc.page_content for doc in processed_docs]\n", - "metadatas = [doc.metadata for doc in processed_docs]\n", + "- ```ids``` : Optional[List[str]] – Custom IDs for the documents. If not provided, IDs will be auto-generated.\n", "\n", - "# Get total number of documents\n", - "total_number = len(processed_docs)\n", - "print(\"Number of documents:\", total_number)" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "CPU times: user 1.57 s, sys: 140 ms, total: 1.71 s\n", - "Wall time: 5.46 s\n" - ] - } - ], - "source": [ - "%%time\n", - "# upsert documents\n", - "upsert_result = col_manager.upsert(\n", - " \n", - " texts=texts[:total_number//2], metadatas=metadatas[:total_number//2], ids=uuids[:total_number//2]\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "CPU times: user 1.79 s, sys: 82.9 ms, total: 1.88 s\n", - "Wall time: 4.96 s\n" - ] - } - ], - "source": [ - "%%time\n", - "# upsert documents parallel\n", - "upsert_parallel_result = col_manager.upsert_parallel(\n", - " texts = texts[total_number//2 :],\n", - " metadatas = metadatas[total_number//2:],\n", - " ids = uuids[total_number//2:],\n", - " batch_size=32,\n", - " max_workers=8\n", - ")" + "- ```batch_size``` : int – Number of documents per batch (default: 32).\n", + "\n", + "- ```workers``` : int – Number of parallel workers (default: 10).\n", + "\n", + "- ```**kwargs``` : Extra arguments for the underlying vector store.\n", + "\n", + "**πŸ”„ Return**\n", + "\n", + "- ```ids``` : IDs of the upserted documents." ] }, { "cell_type": "code", - "execution_count": 19, + "execution_count": 15, + "id": "a89dd8e0", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "1359\n", - "Manual Ids == Output Ids: True\n" - ] - } - ], + "outputs": [], "source": [ - "result = upsert_result + upsert_parallel_result\n", - "\n", - "# check number of ids upserted\n", - "print(len(result))\n", + "args = {\n", + " \"texts\": [doc.page_content for doc in docs],\n", + " \"metadatas\": [doc.metadata for doc in docs],\n", + " \"ids\": ids,\n", + " \"batch_size\": 32,\n", + " \"max_workers\": 8,\n", + "}\n", "\n", - "# check manual ids are the same as output ids\n", - "print(\"Manual Ids == Output Ids:\", sorted(result) == sorted(uuids))" + "upsert_parallel_result = crud_manager.upsert_parallel(**args)" ] }, { "cell_type": "markdown", + "id": "6beea197", "metadata": {}, "source": [ - "**[ NOTE ]**\n", + "### Similarity Search\n", "\n", - "As we have only one table, **langchain_pg_embedding** to store data, we have only one column **cmetadata** to store metadata for each document.\n", + "Search for **similar documents** based on **embeddings** .\n", "\n", - "The **cmetadata** column is jsonb type, and thus if you want to update the metadata, you should provide not only the new metadata key-value you want to update, but with all the metadata already stored." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Scroll items from vector store\n", - "As we have added some items to our first vector store, named **langchain_opentutorial** , we can scroll items from the vector store.\n", + "This method uses **\"cosine similarity\"** .\n", "\n", - "This can be done by calling ```scroll``` method.\n", "\n", - "When we scroll items from the vector store we can pass ```ids``` or ```filter``` to get items that we want, or just call ```scroll``` to get ```k```(*default 10*) items.\n", + "**βœ… Args**\n", "\n", - "We can get embedded vector values of each items by set ```include_embedding``` True." - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Number of items scrolled: 10\n", - "{'content': 'The Little Prince\\nWritten By Antoine de Saiot-Exupery (1900γ€œ1944)', 'metadata': {'id': 'cc23e228-2540-4e5c-8eb3-be6df7a3bf77', 'page': 1, 'author': 'Saiot-Exupery', 'source': 'the_little_prince.txt'}, 'embedding': None}\n" - ] - } - ], - "source": [ - "# Do scroll without ids or filter\n", - "scroll_result = col_manager.scroll()\n", + "- ```query``` : str – The text query for similarity search.\n", + "\n", + "- ```k``` : int – Number of top results to return (default: 10).\n", "\n", - "# print the number of items scrolled and first item that returned.\n", - "print(f\"Number of items scrolled: {len(scroll_result)}\")\n", - "print(scroll_result[0])" + "```**kwargs``` : Additional search options (e.g., filters).\n", + "\n", + "**πŸ”„ Return**\n", + "\n", + "- ```results``` : List[Document] – A list of LangChain Document objects ranked by similarity." ] }, { "cell_type": "code", - "execution_count": 21, + "execution_count": 16, + "id": "5859782b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Number of items scrolled: 3\n", - "{'content': 'The Little Prince\\nWritten By Antoine de Saiot-Exupery (1900γ€œ1944)', 'metadata': {'id': 'cc23e228-2540-4e5c-8eb3-be6df7a3bf77', 'page': 1, 'author': 'Saiot-Exupery', 'source': 'the_little_prince.txt'}, 'embedding': None}\n", - "{'content': '[ Antoine de Saiot-Exupery ]', 'metadata': {'id': 'd4bf8981-2af4-4288-8aaf-6586381973c4', 'page': 2, 'author': 'Saiot-Exupery', 'source': 'the_little_prince.txt'}, 'embedding': None}\n", - "{'content': 'Over the past century, the thrill of flying has inspired some to perform remarkable feats of', 'metadata': {'id': '31dc52cf-530b-449c-a3db-ec64d9e1a10c', 'page': 3, 'author': 'Saiot-Exupery', 'source': 'the_little_prince.txt'}, 'embedding': None}\n" + "Rank 1\n", + "Contents : And he went back to meet the fox. \n", + "\"Goodbye,\" he said. \n", + "\"Goodbye,\" said the fox. \"And now here is my secret, a very simple secret: It is only with the heart that one can see rightly; what is essential is invisible to the eye.\" \n", + "\"What is essential is invisible to the eye,\" the little prince repeated, so that he would be sure to remember.\n", + "\"It is the time you have wasted for your rose that makes your rose so important.\"\n", + "Metadata: {'title': 'Chapter 21'}\n", + "Similarity Score: 0.5095177281477812\n", + "\n", + "Rank 2\n", + "Contents : \"Yes,\" I said to the little prince. 
\"The house, the stars, the desert-- what gives them their beauty is something that is invisible!\" \n", + "\"I am glad,\" he said, \"that you agree with my fox.\"\n", + "Metadata: {'title': 'Chapter 24'}\n", + "Similarity Score: 0.4950920951146853\n", + "\n", + "Rank 3\n", + "Contents : \"The men where you live,\" said the little prince, \"raise five thousand roses in the same garden-- and they do not find in it what they are looking for.\" \n", + "\"They do not find it,\" I replied. \n", + "\"And yet what they are looking for could be found in one single rose, or in a little water.\" \n", + "\"Yes, that is true,\" I said. \n", + "And the little prince added: \n", + "\"But the eyes are blind. One must look with the heart...\"\n", + "Metadata: {'title': 'Chapter 25'}\n", + "Similarity Score: 0.4223722219467283\n", + "\n" ] } ], "source": [ - "# Do scroll with filter\n", - "scroll_result = col_manager.scroll(filter={\"page\": {\"$in\": [1, 2, 3]}})\n", + "# Search by Query\n", "\n", - "# print the number of items scrolled and all items that returned.\n", - "print(f\"Number of items scrolled: {len(scroll_result)}\")\n", - "for r in scroll_result:\n", - " print(r)" + "results = crud_manager.search(query=\"What is essential is invisible to the eye.\", k=3)\n", + "for idx, result in enumerate(results):\n", + " print(f\"Rank {idx+1}\")\n", + " print(f\"Contents : {result['content']}\")\n", + " print(f\"Metadata: {result['metadata']}\")\n", + " print(f\"Similarity Score: {result['score']}\")\n", + " print()" ] }, { "cell_type": "code", - "execution_count": 22, + "execution_count": 17, + "id": "2577dd4a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Number of items scrolled: 3\n", - "{'content': 'The Little Prince\\nWritten By Antoine de Saiot-Exupery (1900γ€œ1944)', 'metadata': {'id': 'cc23e228-2540-4e5c-8eb3-be6df7a3bf77', 'page': 1, 'author': 'Saiot-Exupery', 'source': 'the_little_prince.txt'}, 'embedding': None}\n", - "{'content': '[ Antoine de Saiot-Exupery ]', 'metadata': {'id': 'd4bf8981-2af4-4288-8aaf-6586381973c4', 'page': 2, 'author': 'Saiot-Exupery', 'source': 'the_little_prince.txt'}, 'embedding': None}\n", - "{'content': 'Over the past century, the thrill of flying has inspired some to perform remarkable feats of', 'metadata': {'id': '31dc52cf-530b-449c-a3db-ec64d9e1a10c', 'page': 3, 'author': 'Saiot-Exupery', 'source': 'the_little_prince.txt'}, 'embedding': None}\n" + "Rank 1\n", + "Contents : \"The men where you live,\" said the little prince, \"raise five thousand roses in the same garden-- and they do not find in it what they are looking for.\" \n", + "\"They do not find it,\" I replied. \n", + "\"And yet what they are looking for could be found in one single rose, or in a little water.\" \n", + "\"Yes, that is true,\" I said. \n", + "And the little prince added: \n", + "\"But the eyes are blind. One must look with the heart...\"\n", + "Metadata: {'title': 'Chapter 25'}\n", + "Similarity Score: 0.4223722219467283\n", + "\n", + "Rank 2\n", + "Contents : \"The men where you live,\" said the little prince, \"raise five thousand roses in the same garden-- and they do not find in it what they are looking for.\" \n", + "\"They do not find it,\" I replied. \n", + "\"And yet what they are looking for could be found in one single rose, or in a little water.\" \n", + "\"Yes, that is true,\" I said. \n", + "And the little prince added: \n", + "\"But the eyes are blind. 
One must look with the heart...\"\n",
 "Metadata: {'title': 'Chapter 25'}\n",
 "Similarity Score: 0.4223722219467283\n",
 "\n",
 "Rank 3\n",
 "Contents : \"The men where you live,\" said the little prince, \"raise five thousand roses in the same garden-- and they do not find in it what they are looking for.\" \n",
 "\"They do not find it,\" I replied. \n",
 "\"And yet what they are looking for could be found in one single rose, or in a little water.\" \n",
 "\"Yes, that is true,\" I said. \n",
 "And the little prince added: \n",
 "\"But the eyes are blind. One must look with the heart...\"\n",
 "Metadata: {'title': 'Chapter 25'}\n",
 "Similarity Score: 0.4223722219467283\n",
 "\n"
 ]
 }
 ],
 "source": [
 "# Filter Search\n",
 "\n",
 "results = crud_manager.search(\n",
 "    query=\"Which asteroid did the little prince come from?\",\n",
 "    k=3,\n",
 "    filter={\"title\": \"Chapter 4\"},\n",
 ")\n",
 "for idx, doc in enumerate(results):\n",
 "    print(f\"Rank {idx+1}\")\n",
 "    print(f\"Contents : {doc['content']}\")\n",
 "    print(f\"Metadata: {doc['metadata']}\")\n",
 "    print(f\"Similarity Score: {doc['score']}\")\n",
 "    print()"
 ]
 },
 {
 "cell_type": "markdown",
+ "id": "9ad0ed0c",
 "metadata": {},
 "source": [
- "### Delete items from vector store\n",
+ "### Delete Document\n",
 "\n",
- "We can delete items by filter or ids with ```delete``` method.\n",
+ "Remove documents based on filter conditions\n",
 "\n",
+ "**βœ… Args**\n",
 "\n",
- "For example, we will delete **the first page**, that is ```page``` 1, of the little prince, and try to scroll it."
+ "- ```ids``` : Optional[List[str]] – List of document IDs to delete. If None, deletion is based on filter.\n",
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 23,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Delete done successfully\n",
- "[]\n"
- ]
- }
- ],
- "source": [
- "# delete an item\n",
- "col_manager.delete(filter={\"page\": {\"$eq\": 1}})\n",
- "\n",
- "# check if it remains in DB.\n",
- "print(col_manager.scroll(filter={\"page\": {\"$eq\": 1}}))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Now we delete 5 items using ```ids```."
+ "- ```filters``` : Optional[Dict] – Dictionary specifying filter conditions (e.g., metadata match).\n", + "\n", + "- ```**kwargs``` : Any additional parameters.\n", + "\n", + "**πŸ”„ Return**\n", + "\n", + "- Boolean" ] }, { "cell_type": "code", - "execution_count": 24, + "execution_count": 18, + "id": "0e3a2c33", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Delete done successfully\n", - "[]\n" + "Delete done successfully\n" ] + }, + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" } ], "source": [ - "# delete item by ids\n", - "ids = uuids[1:6]\n", - "\n", - "# call delete_node method\n", - "col_manager.delete(ids=ids)\n", - "\n", - "# check if it remains in DB.\n", - "print(col_manager.scroll(ids=ids))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Similarity search\n", - "\n", - "As a vector store, ```pgVector``` support similarity search with various distance metric, **l2** , **inner** (max inner product), **cosine** .\n", - "\n", - "By default, distance strategy is set to **cosine.** \n", - "\n", - "Similarity search can be done by calling the ```search``` method.\n", + "# Delete by ids\n", "\n", - "You can set the number of retrieved documents by passing ```k```(*default to 4*)." + "crud_manager.delete(ids=ids[:10])" ] }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 19, + "id": "60bcb4cf", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "{'content': '\"My friend the fox--\" the little prince said to me.', 'metadata': {'id': 'b02aaaa0-9352-403a-8924-cfff4973b926', 'page': 1087, 'author': 'Saiot-Exupery', 'source': 'the_little_prince.txt'}, 'embedding': None, 'score': 0.631413271508214}\n", - "{'content': '\"No,\" said the little prince. \"I am looking for friends. 
What does that mean-- β€˜tameβ€˜?\"', 'metadata': {'id': '48adae15-36ba-4384-8762-0ef3f0ac33a3', 'page': 958, 'author': 'Saiot-Exupery', 'source': 'the_little_prince.txt'}, 'embedding': None, 'score': 0.6050397117589812}\n",
- "{'content': 'the little prince returns to his planet', 'metadata': {'id': '4ed37f54-5619-4fc9-912b-4a37fb5a5625', 'page': 1202, 'author': 'Saiot-Exupery', 'source': 'the_little_prince.txt'}, 'embedding': None, 'score': 0.5846221199406966}\n",
- "{'content': 'midst of the Sahara where he meets a tiny prince from another world traveling the universe in order', 'metadata': {'id': '28b44d4b-cf4e-4cb9-983b-7fb3ec735609', 'page': 25, 'author': 'Saiot-Exupery', 'source': 'the_little_prince.txt'}, 'embedding': None, 'score': 0.5682375512406654}\n",
- "{'content': '[ Chapter 2 ]\\n- the narrator crashes in the desert and makes the acquaintance of the little prince', 'metadata': {'id': '2a4e0184-bc2c-4558-8eaa-63a1a13da3a0', 'page': 85, 'author': 'Saiot-Exupery', 'source': 'the_little_prince.txt'}, 'embedding': None, 'score': 0.555493427632688}\n"
+ "Delete done successfully\n"
 ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "True"
+ ]
+ },
+ "execution_count": 19,
+ "metadata": {},
+ "output_type": "execute_result"
 }
 ],
 "source": [
- "results = col_manager.search(query=\"Does the little prince have a friend?\", k=5)\n",
- "for doc in results:\n",
- "    print(doc)"
+ "# Delete by filters\n",
+ "\n",
+ "crud_manager.delete(filters={\"title\": {\"$eq\": \"Chapter 4\"}})"
 ]
 },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Similarity search with filters\n",
- "\n",
- "You can also do similarity search with filter as we have done in ```scroll``` or ```delete```."
- ]
- },
 {
 "cell_type": "code",
- "execution_count": null,
+ "execution_count": 20,
+ "id": "30d42d2e",
 "metadata": {},
 "outputs": [
 {
 "name": "stdout",
 "output_type": "stream",
 "text": [
- "{'content': 'inhabited region. And yet my little man seemed neither to be straying uncertainly among the sands,', 'metadata': {'id': '1be69712-f0f4-4728-b6f2-d4cf12cddfdb', 'page': 107, 'author': 'Saiot-Exupery', 'source': 'the_little_prince.txt'}, 'embedding': None, 'score': 0.23158187113240447}\n",
- "{'content': 'Nothing about him gave any suggestion of a child lost in the middle of the desert, a thousand miles', 'metadata': {'id': 'df4ece8c-dcb6-400e-9d8e-0eb5820a5c4e', 'page': 109, 'author': 'Saiot-Exupery', 'source': 'the_little_prince.txt'}, 'embedding': None, 'score': 0.18018012822748797}\n",
- "{'content': 'among the sands, nor to be fainting from fatigue or hunger or thirst or fear. Nothing about him', 'metadata': {'id': '71b4297c-3b76-43cb-be6a-afca5f59388d', 'page': 108, 'author': 'Saiot-Exupery', 'source': 'the_little_prince.txt'}, 'embedding': None, 'score': 0.17715921622781305}\n",
- "{'content': 'less charming than its model.', 'metadata': {'id': '507267bc-7076-42f7-ad7c-ed1f835663f2', 'page': 100, 'author': 'Saiot-Exupery', 'source': 'the_little_prince.txt'}, 'embedding': None, 'score': 0.16131896837723747}\n",
- "{'content': 'a thousand miles from any human habitation. 
When at last I was able to speak, I said to him:', 'metadata': {'id': '524af6ff-1370-4c20-ad94-1b37e45fe0c5', 'page': 110, 'author': 'Saiot-Exupery', 'source': 'the_little_prince.txt'}, 'embedding': None, 'score': 0.15769872390077566}\n" + "Delete done successfully\n" ] + }, + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" } ], "source": [ - "# search with filter\n", - "result_with_filter = col_manager.search(\n", - " \"Does the little prince have a friend?\",\n", - " filter={\"page\": {\"$between\": [100, 110]}},\n", - " k=5,\n", - ")\n", + "# Delete All\n", "\n", - "for doc in result_with_filter:\n", - " print(doc)" + "crud_manager.delete()" ] } ], "metadata": { "kernelspec": { - "display_name": "testbed", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -944,5 +980,5 @@ } }, "nbformat": 4, - "nbformat_minor": 4 + "nbformat_minor": 5 } diff --git a/09-VectorStore/utils/pgvector_interface.py b/09-VectorStore/utils/pgvector_interface.py index ed6002d9f..470f9aa27 100644 --- a/09-VectorStore/utils/pgvector_interface.py +++ b/09-VectorStore/utils/pgvector_interface.py @@ -188,6 +188,7 @@ class EmbeddingStore(Base): class pgVectorIndexManager: def __init__( self, + client=None, connection=None, host=None, port=None, @@ -198,28 +199,34 @@ def __init__( dbname=None, db=None, ): - if connection is not None: - self.connection_str = connection - + if client is not None: + self.client = client + self.connection_str = None + self._engine = client else: - assert host is not None, "host is missing" - assert port is not None, "port is missing" - assert ( - username is not None or user is not None - ), "username(or user) is missing" - assert ( - password is not None or passwd is not None - ), "password(or passwd) is missing" - assert dbname is not None or db is not None, "dbname(or db) is missing" - - self.host = host - self.port = port - self.userName = username if username is not None else user - self.passWord = password if password is not None else passwd - self.dbName = dbname if dbname is not None else db - self.connection_str = f"postgresql+psycopg://{self.userName}:{self.passWord}@{self.host}:{self.port}/{self.dbName}" - - self._engine = create_engine(url=self.connection_str, **({})) + self.client = None + if connection is not None: + self.connection_str = connection + + else: + assert host is not None, "host is missing" + assert port is not None, "port is missing" + assert ( + username is not None or user is not None + ), "username(or user) is missing" + assert ( + password is not None or passwd is not None + ), "password(or passwd) is missing" + assert dbname is not None or db is not None, "dbname(or db) is missing" + + self.host = host + self.port = port + self.userName = username if username is not None else user + self.passWord = password if password is not None else passwd + self.dbName = dbname if dbname is not None else db + self.connection_str = f"postgresql+psycopg://{self.userName}:{self.passWord}@{self.host}:{self.port}/{self.dbName}" + + self._engine = create_engine(url=self.connection_str, **({})) self.session_maker: scoped_session self.session_maker = scoped_session(sessionmaker(bind=self._engine)) self.collection_metadata = None @@ -334,29 +341,44 @@ def create_index(self, collection_name, embedding=None, dimension=None): ) return False else: - return pgVectorDocumentManager( - embedding=embedding, - connection_info=self.connection_str, - 
collection_name=collection_name, - ) + if self.client is not None: + return pgVectorCRUDManager( + embedding=embedding, + client=self.client, + collection_name=collection_name, + ) + else: + return pgVectorCRUDManager( + embedding=embedding, + connection_info=self.connection_str, + collection_name=collection_name, + ) def get_index(self, embedding, collection_name): - return pgVectorDocumentManager( + return pgVectorCRUDManager( embedding=embedding, connection_info=self.connection_str, collection_name=collection_name, ) -class pgVectorDocumentManager(DocumentManager): +class pgVectorCRUDManager(DocumentManager): def __init__( - self, embedding, connection_info=None, collection_name=None, distance="cosine" + self, + embedding, + client=None, + connection_info=None, + collection_name=None, + distance="cosine", ): - if isinstance(connection_info, str): - self.connection_info = connection_info - elif isinstance(connection_info, dict): - self.connection_info = self._make_conn_string(connection_info) - self._engine = create_engine(url=self.connection_info, **({})) + if client is not None: + self._engine = client + else: + if isinstance(connection_info, str): + self.connection_info = connection_info + elif isinstance(connection_info, dict): + self.connection_info = self._make_conn_string(connection_info) + self._engine = create_engine(url=self.connection_info, **({})) self.session_maker: scoped_session self.session_maker = scoped_session(sessionmaker(bind=self._engine)) self.collection_metadata = None @@ -791,13 +813,15 @@ def delete(self, ids=None, filter=None, **kwargs): stmt = stmt.where(self.EmbeddingStore.id.in_(ids)) session.execute(stmt) - elif filter: + elif filter is not None: filter_by = [self.EmbeddingStore.collection_id == collection.uuid] filter_clauses = self._create_filter_clause(filter) if filter_clauses is not None: filter_by.append(filter_clauses) stmt = stmt.where(filter_clauses) session.execute(stmt) + else: + session.execute(stmt) session.commit() except Exception as e: msg = f"Delete failed due to {type(e)} {str(e)}" @@ -815,10 +839,6 @@ def _get_retriever_tags(self) -> list[str]: tags.append(self.embeddings.__class__.__name__) return tags - def as_retriever(self, **kwargs): - tags = kwargs.pop("tags", None) or [] + self._get_retriever_tags() - return pgVectorRetriever(vectorstore=self, tags=tags, **kwargs) - def scroll(self, ids=None, filter=None, k=10, **kwargs): with self._make_sync_session() as session: # type: ignore[arg-type] collection = self.CollectionStore.get_by_name( @@ -857,85 +877,3 @@ def scroll(self, ids=None, filter=None, k=10, **kwargs): ] return docs - - -class pgVectorRetriever(BaseRetriever): - vectorstore: pgVectorDocumentManager - search_type: str = "similarity" - search_kwargs: dict = Field(default_factory=dict) - allowed_search_types: ClassVar[Collection[str]] = ( - "similarity", - "similarity_score_threshold", - "mmr", - ) - - model_config = ConfigDict( - arbitrary_types_allowed=True, - ) - - @model_validator(mode="before") - @classmethod - def validate_search_type(cls, values: dict) -> Any: - search_type = values.get("search_type", "similarity") - if search_type not in cls.allowed_search_types: - msg = ( - f"search_type of {search_type} not allowed. 
Valid values are: " - f"{cls.allowed_search_types}" - ) - raise ValueError(msg) - if search_type == "similarity_score_threshold": - score_threshold = values.get("search_kwargs", {}).get("score_threshold") - if (score_threshold is None) or (not isinstance(score_threshold, float)): - msg = ( - "`score_threshold` is not specified with a float value(0~1) " - "in `search_kwargs`." - ) - raise ValueError(msg) - return values - - def _get_ls_params(self, **kwargs: Any) -> LangSmithRetrieverParams: - """Get standard params for tracing.""" - - _kwargs = self.search_kwargs | kwargs - - ls_params = super()._get_ls_params(**_kwargs) - ls_params["ls_vector_store_provider"] = self.vectorstore.__class__.__name__ - - if self.vectorstore.embeddings: - ls_params["ls_embedding_provider"] = ( - self.vectorstore.embeddings.__class__.__name__ - ) - elif hasattr(self.vectorstore, "embedding") and isinstance( - self.vectorstore.embedding, Embeddings - ): - ls_params["ls_embedding_provider"] = ( - self.vectorstore.embedding.__class__.__name__ - ) - - return ls_params - - def _get_relevant_documents( - self, query: str, *, run_manager, **kwargs: Any - ) -> list[Document]: - _kwargs = self.search_kwargs | kwargs - print(f"_kwargs: {_kwargs}") - if self.search_type == "similarity": - docs = self.vectorstore.search(query, **_kwargs) - else: - msg = f"search_type of {self.search_type} not allowed." - raise ValueError(msg) - return docs - - def add_documents(self, documents: list[Document], **kwargs: Any) -> list[str]: - """Add documents to the vectorstore. - - Args: - documents: Documents to add to the vectorstore. - **kwargs: Other keyword arguments that subclasses might use. - - Returns: - List of IDs of the added texts. - """ - texts = [doc.page_content for doc in documents] - metadatas = [doc.metadata for doc in documents] - return self.vectorstore.upsert(texts, metadatas, **kwargs) From 5a1803ba5e43b613d882beea9f5da56e08672d1d Mon Sep 17 00:00:00 2001 From: XaviereKU Date: Sun, 4 May 2025 16:47:28 +0900 Subject: [PATCH 2/5] Revision --- 09-VectorStore/08-PGVector.ipynb | 55 ++++++++++++++++---------------- 1 file changed, 28 insertions(+), 27 deletions(-) diff --git a/09-VectorStore/08-PGVector.ipynb b/09-VectorStore/08-PGVector.ipynb index 07c465595..247fa85c2 100644 --- a/09-VectorStore/08-PGVector.ipynb +++ b/09-VectorStore/08-PGVector.ipynb @@ -159,7 +159,34 @@ "id": "6890920d", "metadata": {}, "source": [ - "Please write down what you need to set up the Vectorstore here." + "### Set up PGVector\n", + "\n", + "If you are using Windows and have installed postgresql for Windows, you are required to install **vector** extension for postgresql. The following may help [Install pgvector on Windows](https://dev.to/mehmetakar/install-pgvector-on-windows-6gl).\n", + "\n", + "But in this tutorial, we will use ```Docker``` container. 
If you are using Mac or Windows, check [Docker Desktop for Mac](https://docs.docker.com/desktop/setup/install/mac-install/) or [Docker Desktop for Windows](https://docs.docker.com/desktop/setup/install/windows-install).\n", + "\n", + "If you are using ```Docker``` desktop, you can easily set up `PGVector` by running the following command that spins up a ```Docker``` container:\n", + "\n", + "```bash\n", + "docker run --name pgvector-container -e POSTGRES_USER=langchain -e POSTGRES_PASSWORD=langchain -e POSTGRES_DB=langchain -p 6024:5432 -d pgvector/pgvector:pg16\n", + "```\n", + "\n", + "For more detailed instructions, please refer to [the official documentation](https://github.com/pgvector/pgvector) \n", + "\n", + "**[ NOTE ]**\n", + "* If you want to maintain the stored data even after container being deleted, you must mount volume like below:\n", + "```bash\n", + "docker run --name pgvector-container -v {/mount/path}:/var/lib/postgresql/data -e POSTGRES_USER=langchain -e POSTGRES_PASSWORD=langchain -e POSTGRES_DB=langchain -p 6024:5432 -d pgvector/pgvector:pg16\n", + "```\n" + ] + }, + { + "cell_type": "markdown", + "id": "8afc0863", + "metadata": {}, + "source": [ + "## What is PGVector?\n", + "\n" ] }, { @@ -301,32 +328,6 @@ "- Load ```PGVector``` Client" ] }, - { - "cell_type": "markdown", - "id": "835e5c9e", - "metadata": {}, - "source": [ - "### Set up PGVector\n", - "\n", - "If you are using Windows and have installed postgresql for Windows, you are required to install **vector** extension for postgresql. The following may help [Install pgvector on Windows](https://dev.to/mehmetakar/install-pgvector-on-windows-6gl).\n", - "\n", - "But in this tutorial, we will use ```Docker``` container. If you are using Mac or Windows, check [Docker Desktop for Mac](https://docs.docker.com/desktop/setup/install/mac-install/) or [Docker Desktop for Windows](https://docs.docker.com/desktop/setup/install/windows-install).\n", - "\n", - "If you are using ```Docker``` desktop, you can easily set up `PGVector` by running the following command that spins up a ```Docker``` container:\n", - "\n", - "```bash\n", - "docker run --name pgvector-container -e POSTGRES_USER=langchain -e POSTGRES_PASSWORD=langchain -e POSTGRES_DB=langchain -p 6024:5432 -d pgvector/pgvector:pg16\n", - "```\n", - "\n", - "For more detailed instructions, please refer to [the official documentation](https://github.com/pgvector/pgvector) \n", - "\n", - "**[ NOTE ]**\n", - "* If you want to maintain the stored data even after container being deleted, you must mount volume like below:\n", - "```bash\n", - "docker run --name pgvector-container -v {/mount/path}:/var/lib/postgresql/data -e POSTGRES_USER=langchain -e POSTGRES_PASSWORD=langchain -e POSTGRES_DB=langchain -p 6024:5432 -d pgvector/pgvector:pg16\n", - "```\n" - ] - }, { "cell_type": "markdown", "id": "7eee56b2", From 74066d145f8b6765d02b024fc87b7f7cd3e6773b Mon Sep 17 00:00:00 2001 From: Jongho Lee Date: Sun, 4 May 2025 21:05:17 +0900 Subject: [PATCH 3/5] Update --- 09-VectorStore/08-PGVector.ipynb | 36 +++++++++++++++++++++++++++----- 1 file changed, 31 insertions(+), 5 deletions(-) diff --git a/09-VectorStore/08-PGVector.ipynb b/09-VectorStore/08-PGVector.ipynb index 247fa85c2..8e6299ae7 100644 --- a/09-VectorStore/08-PGVector.ipynb +++ b/09-VectorStore/08-PGVector.ipynb @@ -26,7 +26,7 @@ "\n", "- [Overview](#overview)\n", "- [Environment Setup](#environment-setup)\n", - "- [What is PGVector?](#what-is-pgvector?)\n", + "- [What is PGVector?](#what-is-pgvector)\n", "- 
[Data](#data)\n",
 "- [Initial Setting PGVector](#initial-setting-pgvector)\n",
 "- [Document Manager](#document-manager)\n",
 "\n",
 "\n",
@@ -186,7 +186,32 @@
 "metadata": {},
 "source": [
 "## What is PGVector?\n",
 "\n",
+ "`PGVector` is a ```PostgreSQL``` extension that enables vector similarity search directly within your ```PostgreSQL``` database, making it ideal for AI applications, semantic search, and recommendation systems.\n",
+ "\n",
+ "This is particularly valuable for teams that already use ```PostgreSQL``` and want to add vector search capabilities without managing separate infrastructure or learning new query languages.\n",
+ "\n",
+ "**Features** :\n",
+ "1. Native ```PostgreSQL``` integration with standard SQL queries\n",
+ "2. Multiple similarity search methods including L2, Inner Product, Cosine\n",
+ "3. Several indexing options including HNSW and IVFFlat\n",
+ "4. Support for up to 2,000 dimensions per vector\n",
+ "5. ACID compliance inherited from ```PostgreSQL```\n",
+ "\n",
+ "**Advantages** :\n",
+ "\n",
+ "1. Free and open-source\n",
+ "2. Easy integration with existing ```PostgreSQL``` databases\n",
+ "3. Full SQL functionality and transactional support\n",
+ "4. No additional infrastructure needed\n",
+ "5. Supports hybrid searches combining vector and traditional SQL queries\n",
+ "\n",
+ "**Disadvantages** :\n",
+ "1. Performance limitations with very large datasets (billions of vectors)\n",
+ "2. Limited to single-node deployment\n",
+ "3. Memory-intensive for large vector dimensions\n",
+ "4. Requires manual optimization for best performance\n",
+ "5. Less specialized features compared to dedicated vector databases"
 ]
 },
 {
 "cell_type": "markdown",
 "id": "f83b661d",
 "metadata": {},
 "source": [
- "## Create collection\n",
+ "### Create collection\n",
 "Now we can create collection with ```index_manager```.\n",
 },
 {
 "cell_type": "code",
- "execution_count": 11,
+ "execution_count": null,
 "id": "4742c2ff",
 "metadata": {},
 "outputs": [],
 "source": [
 "import getpass\n",
 "import os\n",
+ "\n",
 "if not os.environ.get(\"OPENAI_API_KEY\"):\n",
 "    os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"Enter API key for OpenAI: \")\n",
 ],
 "metadata": {
 "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
+ "display_name": "testbed",
 "language": "python",
 "name": "python3"
 },

From f3a8561a5595548cb6f7ab9c8ad017b93463a77a Mon Sep 17 00:00:00 2001
From: Jongho Lee
Date: Sun, 4 May 2025 21:09:41 +0900
Subject: [PATCH 4/5] update

---
 09-VectorStore/08-PGVector.ipynb | 22 ++--------------------
 1 file changed, 2 insertions(+), 20 deletions(-)

diff --git a/09-VectorStore/08-PGVector.ipynb b/09-VectorStore/08-PGVector.ipynb
index 8e6299ae7..d7039edf2 100644
--- a/09-VectorStore/08-PGVector.ipynb
+++ b/09-VectorStore/08-PGVector.ipynb
@@ -394,7 +394,7 @@
 },
 {
 "cell_type": "code",
- "execution_count": 8,
+ "execution_count": null,
 "id": "eed0ebad",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -406,36 +406,18 @@
 "\n",
 "def get_db_client(conn_str):\n",
 "    \"\"\"\n",
- "\n",
- "\n",
 "    Initializes and returns a VectorStore client instance.\n",
- "\n",
- "\n",
- "\n",
 "    This function loads configuration (e.g., API key, host) from environment\n",
- "\n",
- "\n",
 "    variables or default values and creates a client object to interact\n",
- "\n",
- "\n",
 "    with the {vectordb} Python SDK.\n",
 "\n",
- 
"\n", " client:ClientType - An instance of the {vectordb} client.\n", "\n", - "\n", - "\n", " Raises:\n", - "\n", - "\n", " ValueError: If required configuration is missing.\n", - "\n", - "\n", " \"\"\"\n", + "\n", " try:\n", " client = create_engine(url=conn_str, **({}))\n", " except Exception as e:\n", From c497b02d39f77429cad278822909793da4664697 Mon Sep 17 00:00:00 2001 From: Jongho Lee Date: Sun, 4 May 2025 22:16:44 +0900 Subject: [PATCH 5/5] Match port --- 09-VectorStore/08-PGVector.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/09-VectorStore/08-PGVector.ipynb b/09-VectorStore/08-PGVector.ipynb index d7039edf2..fd0103df6 100644 --- a/09-VectorStore/08-PGVector.ipynb +++ b/09-VectorStore/08-PGVector.ipynb @@ -168,7 +168,7 @@ "If you are using ```Docker``` desktop, you can easily set up `PGVector` by running the following command that spins up a ```Docker``` container:\n", "\n", "```bash\n", - "docker run --name pgvector-container -e POSTGRES_USER=langchain -e POSTGRES_PASSWORD=langchain -e POSTGRES_DB=langchain -p 6024:5432 -d pgvector/pgvector:pg16\n", + "docker run --name pgvector-container -e POSTGRES_USER=langchain -e POSTGRES_PASSWORD=langchain -e POSTGRES_DB=langchain -p 6088:5432 -d pgvector/pgvector:pg16\n", "```\n", "\n", "For more detailed instructions, please refer to [the official documentation](https://github.com/pgvector/pgvector) \n", @@ -176,7 +176,7 @@ "**[ NOTE ]**\n", "* If you want to maintain the stored data even after container being deleted, you must mount volume like below:\n", "```bash\n", - "docker run --name pgvector-container -v {/mount/path}:/var/lib/postgresql/data -e POSTGRES_USER=langchain -e POSTGRES_PASSWORD=langchain -e POSTGRES_DB=langchain -p 6024:5432 -d pgvector/pgvector:pg16\n", + "docker run --name pgvector-container -v {/mount/path}:/var/lib/postgresql/data -e POSTGRES_USER=langchain -e POSTGRES_PASSWORD=langchain -e POSTGRES_DB=langchain -p 6088:5432 -d pgvector/pgvector:pg16\n", "```\n" ] },