From 81e4d2ffd82992248687f1be3d8ecb33c1e277f7 Mon Sep 17 00:00:00 2001 From: Gwangwon Jung Date: Sun, 6 Apr 2025 02:11:58 +0900 Subject: [PATCH 1/5] [N-2] 09-Vector Store / 99-Master-Template - vectorstore interface notebook template file - unified content structure --- 09-VectorStore/99-Master-Template.ipynb | 693 ++++++++++++++++++++++++ 1 file changed, 693 insertions(+) create mode 100644 09-VectorStore/99-Master-Template.ipynb diff --git a/09-VectorStore/99-Master-Template.ipynb b/09-VectorStore/99-Master-Template.ipynb new file mode 100644 index 000000000..24003d355 --- /dev/null +++ b/09-VectorStore/99-Master-Template.ipynb @@ -0,0 +1,693 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "25733da0", + "metadata": {}, + "source": [ + "# {VectorStore Name}\n", + "\n", + "- Author: [Author Name](#Author's-Profile-Link)\n", + "- Design: [Designer](#Designer's-Profile-Link)\n", + "- Peer Review: [Reviewer Name](#Reviewer-Profile-Link)\n", + "- This is a part of [LangChain Open Tutorial](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial)\n", + "\n", + "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/09-VectorStore/your-notebook-file-name) [![Open in GitHub](https://img.shields.io/badge/Open%20in%20GitHub-181717?style=flat-square&logo=github&logoColor=white)](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/09-VectorStore/your-notebook-file-name)\n", + "\n", + "## Overview\n", + "\n", + "This tutorial covers how to use **{Vector Store Name}** with **LangChain** .\n", + "\n", + "{A short introduction to vectordb}\n", + "\n", + "This tutorial walks you through using **CRUD** operations with **{VectorDB}** : **storing** , **updating** , and **deleting** documents, and performing **similarity-based retrieval** .\n", + "\n", + "### Table of Contents\n", + "\n", + "- [Overview](#overview)\n", + "- 
[Environment Setup](#environment-setup)\n", + "- [What is {vectordb}?](#what-is-{vectordb}?)\n", + "- [Data](#data)\n", + " - [Introduce Data](#introduce-data)\n", + " - [Preprocessing Data](#preprocessing-data)\n", + "- [Initial Setting {vectordb}](#initial-setting-{vectordb})\n", + " - [Load Embedding Model](#load-embedding-model)\n", + " - [Load {vectordb} Client](#load-{vectordb}-client)\n", + "- [Document Manager](#document-manager)\n", + " - [Create Instance](#create-instance)\n", + " - [Upsert Document](#upsert-document)\n", + " - [Upsert Parallel Document](#upsert-parallel-document)\n", + " - [Similarity Search](#similarity-search)\n", + " - [Delete Document](#delete-document)\n", + "\n", + "\n", + "### References\n", + "----" + ] + }, + { + "cell_type": "markdown", + "id": "c1fac085", + "metadata": {}, + "source": [ + "## Environment Setup\n", + "\n", + "Set up the environment. You may refer to [Environment Setup](https://wikidocs.net/257836) for more details.\n", + "\n", + "**[Note]**\n", + "- `langchain-opentutorial` is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials. \n", + "- You can checkout the [`langchain-opentutorial`](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) for more details." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "98da7994", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "%%capture --no-stderr\n", + "%pip install langchain-opentutorial" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "800c732b", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "# Install required packages\n", + "from langchain_opentutorial import package\n", + "\n", + "package.install(\n", + " [\n", + " \"langsmith\",\n", + " \"langchain-core\",\n", + " \"python-dotenv\",\n", + " ],\n", + " verbose=False,\n", + " upgrade=False,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5b36bafa", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "# Set environment variables\n", + "from langchain_opentutorial import set_env\n", + "\n", + "set_env(\n", + " {\n", + " \"OPENAI_API_KEY\": \"\",\n", + " \"LANGCHAIN_API_KEY\": \"\",\n", + " \"LANGCHAIN_TRACING_V2\": \"true\",\n", + " \"LANGCHAIN_ENDPOINT\": \"https://api.smith.langchain.com\",\n", + " \"LANGCHAIN_PROJECT\": \"{Project Name}\",\n", + " }\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "8011a0c7", + "metadata": {}, + "source": [ + "You can alternatively set API keys such as `OPENAI_API_KEY` in a `.env` file and load them.\n", + "\n", + "[Note] This is not necessary if you've already set the required API keys in previous steps." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "70d7e764", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "from dotenv import load_dotenv\n", + "\n", + "load_dotenv(override=True)" + ] + }, + { + "cell_type": "markdown", + "id": "6890920d", + "metadata": {}, + "source": [ + "Please write down what you need to set up the Vectorstore here." 
+ ] + }, + { + "cell_type": "markdown", + "id": "6f3b5bd2", + "metadata": {}, + "source": [ + "## Data\n", + "\n", + "This part walks you through the **data preparation process** .\n", + "\n", + "This section includes the following components:\n", + "\n", + "- Introduce Data\n", + "\n", + "- Preprocessing Data\n" + ] + }, + { + "cell_type": "markdown", + "id": "508ae7f7", + "metadata": {}, + "source": [ + "### Introduce Data\n", + "\n", + "In this tutorial, we will use the fairy tale **πŸ“— The Little Prince** in PDF format as our data.\n", + "\n", + "This material complies with the **Apache 2.0 license** .\n", + "\n", + "The data is used in a text (.txt) format converted from the original PDF.\n", + "\n", + "You can view the data at the link below.\n", + "- [Data Link](https://huggingface.co/datasets/sohyunwriter/the_little_prince)" + ] + }, + { + "cell_type": "markdown", + "id": "004ea4f4", + "metadata": {}, + "source": [ + "### Preprocessing Data\n", + "\n", + "In this tutorial section, we will preprocess the text data from The Little Prince and convert it into a list of `LangChain Document` objects with metadata. \n", + "\n", + "Each document chunk will include a `title` field in the metadata, extracted from the first line of each section." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8e4cac64", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "from langchain.schema import Document\n", + "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", + "import re\n", + "from typing import List\n", + "\n", + "def preprocessing_data(content:str)->List[Document]:\n", + " # 1. Split the text by double newlines to separate sections\n", + " blocks = content.split(\"\\n\\n\")\n", + "\n", + " # 2. 
Initialize the text splitter\n", + " text_splitter = RecursiveCharacterTextSplitter(\n", + " chunk_size=500, # Maximum number of characters per chunk\n", + " chunk_overlap=50, # Overlap between chunks to preserve context\n", + " separators=[\"\\n\\n\", \"\\n\", \" \"] # Order of priority for splitting\n", + " )\n", + "\n", + " documents = []\n", + "\n", + " # 3. Loop through each section\n", + " for block in blocks:\n", + " lines = block.strip().splitlines()\n", + " if not lines:\n", + " continue\n", + "\n", + " # Extract title from the first line using square brackets [ ]\n", + " first_line = lines[0]\n", + " title_match = re.search(r\"\\[(.*?)\\]\", first_line)\n", + " title = title_match.group(1).strip() if title_match else None\n", + "\n", + " # Remove the title line from content\n", + " body = \"\\n\".join(lines[1:]).strip()\n", + " if not body:\n", + " continue\n", + "\n", + " # 4. Chunk the section using the text splitter\n", + " chunks = text_splitter.split_text(body)\n", + "\n", + " # 5. 
Create a LangChain Document for each chunk with the same title metadata\n", + " for chunk in chunks:\n", + " documents.append(Document(page_content=chunk, metadata={\"title\": title}))\n", + "\n", + " print(f\"Generated {len(documents)} chunked documents.\")\n", + "\n", + " return documents" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1d091a51", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "# Load the entire text file\n", + "with open(\"the_little_prince.txt\", \"r\", encoding=\"utf-8\") as f:\n", + " content = f.read()\n", + "\n", + "# Preprocessing Data\n", + "\n", + "docs = preprocessing_data(content=content)" + ] + }, + { + "cell_type": "markdown", + "id": "1977d4ff", + "metadata": {}, + "source": [ + "## Initial Setting {vectordb}\n", + "\n", + "This part walks you through the initial setup of **{vectordb}** .\n", + "\n", + "This section includes the following components:\n", + "\n", + "- Load Embedding Model\n", + "\n", + "- Load {vectordb} Client" + ] + }, + { + "cell_type": "markdown", + "id": "7eee56b2", + "metadata": {}, + "source": [ + "### Load Embedding Model\n", + "\n", + "In the **Load Embedding Model** section, you'll learn how to load an embedding model.\n", + "\n", + "This tutorial uses **OpenAI** 's **API-Key** for loading the model.\n", + "\n", + "*πŸ’‘ If you prefer to use another embedding model, see the instructions below.*\n", + "- [Embedding Models](https://python.langchain.com/docs/integrations/text_embedding/)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5bd5c3c9", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "import os\n", + "from langchain_openai import OpenAIEmbeddings\n", + "\n", + "embedding = OpenAIEmbeddings(model=\"text-embedding-3-large\")" + ] + }, + { + "cell_type": "markdown", + "id": "40f65795", + "metadata": {}, + "source": [ + "### Load {vectordb} Client\n", + 
"\n", + "In the **Load {vectordb} Client** section, we cover how to load the **database client object** using the **Python SDK** for **{vectordb}** .\n", + "- [Python SDK Docs]()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eed0ebad", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "# Create Database Client Object Function\n", + "\n", + "def get_db_client():\n", + " \"\"\"\n", + " Initializes and returns a VectorStore client instance.\n", + "\n", + " This function loads configuration (e.g., API key, host) from environment\n", + " variables or default values and creates a client object to interact\n", + " with the {vectordb} Python SDK.\n", + "\n", + " Returns:\n", + " client:ClientType - An instance of the {vectordb} client.\n", + "\n", + " Raises:\n", + " ValueError: If required configuration is missing.\n", + " \"\"\"\n", + " return client" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2b5f4116", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "# Get DB Client Object\n", + "\n", + "client = get_db_client()" + ] + }, + { + "cell_type": "markdown", + "id": "3a5a97a0", + "metadata": {}, + "source": [ + "## Document Manager\n", + "\n", + "To support the **Langchain-Opentutorial** , we implemented a custom set of **CRUD** functionalities for VectorDBs. 
\n", + "\n", + "The following operations are included:\n", + "\n", + "- `upsert` : Update existing documents or insert if they don’t exist\n", + "\n", + "- `upsert_parallel` : Perform upserts in parallel for large-scale data\n", + "\n", + "- `similarity_search` : Search for similar documents based on embeddings\n", + "\n", + "- `delete` : Remove documents based on filter conditions\n", + "\n", + "Each of these features is implemented as a class method specific to each VectorDB.\n", + "\n", + "In this tutorial, you can easily utilize these methods to interact with your VectorDB.\n", + "\n", + "*We plan to continuously expand the functionality by adding more common operations in the future.*" + ] }, { "cell_type": "markdown", "id": "65a40601", "metadata": {}, "source": [ + "### Create Instance\n", + "\n", + "First, we create an instance of the **{vectordb}** helper class to use its CRUD functionalities.\n", + "\n", + "This class is initialized with the **{vectordb} Python SDK client instance** and the **embedding model instance** , both of which were defined in the previous section." + ] }, { "cell_type": "code", "execution_count": null, "id": "dccab807", "metadata": { "vscode": { "languageId": "plaintext" } }, "outputs": [], "source": [ + "# crud_manager = (client=client, embedding=embedding)" + ] }, { "cell_type": "markdown", "id": "c1c0c67f", "metadata": {}, "source": [ + "Now you can use the following **CRUD** operations with the `crud_manager` instance.\n", + "\n", + "This instance allows you to easily manage documents in your **{vectordb}** ." 
+ ] + }, + { + "cell_type": "markdown", + "id": "7c6c53c5", + "metadata": {}, + "source": [ + "### Upsert Document\n", + "\n", + "**Update** existing documents or **insert** if they don’t exist\n", + "\n", + "**βœ… Args**\n", + "\n", + "- `texts` : Iterable[str] – List of text contents to be inserted/updated.\n", + "\n", + "- `metadatas` : Optional[List[Dict]] – List of metadata dictionaries for each text (optional).\n", + "\n", + "- `ids` : Optional[List[str]] – Custom IDs for the documents. If not provided, IDs will be auto-generated.\n", + "\n", + "- `**kwargs` : Extra arguments for the underlying vector store.\n", + "\n", + "**πŸ”„ Return**\n", + "\n", + "- None" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f3a6c32b", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "from uuid import uuid4\n", + "\n", + "args = {\n", + " \"texts\": [doc.page_content for doc in docs[:2]],\n", + " \"metadatas\": [doc.metadata[\"title\"] for doc in docs[:2]],\n", + " \"ids\": [str(uuid4()) for _ in docs[:2]]\n", + " # if you want args, add params.\n", + "}\n", + "\n", + "# crud_manager.upsert(**args)" + ] + }, + { + "cell_type": "markdown", + "id": "278fe1ed", + "metadata": {}, + "source": [ + "### Upsert Parallel Document\n", + "\n", + "Perform **upserts** in **parallel** for large-scale data\n", + "\n", + "**βœ… Args**\n", + "\n", + "- `texts` : Iterable[str] – List of text contents to be inserted/updated.\n", + "\n", + "- `metadatas` : Optional[List[Dict]] – List of metadata dictionaries for each text (optional).\n", + "\n", + "- `ids` : Optional[List[str]] – Custom IDs for the documents. 
If not provided, IDs will be auto-generated.\n", + "\n", + "- `batch_size` : int – Number of documents per batch (default: 32).\n", + "\n", + "- `workers` : int – Number of parallel workers (default: 10).\n", + "\n", + "- `**kwargs` : Extra arguments for the underlying vector store.\n", + "\n", + "**πŸ”„ Return**\n", + "\n", + "- None" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a89dd8e0", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "from uuid import uuid4\n", + "\n", + "args = {\n", + " \"texts\": [doc.page_content for doc in docs],\n", + " \"metadatas\": [doc.metadata[\"title\"] for doc in docs],\n", + " \"ids\": [str(uuid4()) for _ in docs]\n", + " # if you want args, add params.\n", + "}\n", + "\n", + "# crud_manager.upsert_parallel(**args)" + ] + }, + { + "cell_type": "markdown", + "id": "6beea197", + "metadata": {}, + "source": [ + "### Similarity Search\n", + "\n", + "Search for **similar documents** based on **embeddings** .\n", + "\n", + "This method uses **\"cosine similarity\"** .\n", + "\n", + "\n", + "**βœ… Args**\n", + "\n", + "- `query` : str – The text query for similarity search.\n", + "\n", + "- `k` : int – Number of top results to return (default: 10).\n", + "\n", + "`**kwargs` : Additional search options (e.g., filters).\n", + "\n", + "**πŸ”„ Return**\n", + "\n", + "- `results` : List[Document] – A list of LangChain Document objects ranked by similarity." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5859782b", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "# Search by Query\n", + "\n", + "# results = crud_manager.search(query=\"\",k=3)\n", + "# for idx,doc in enumerate(results):\n", + "# print(f\"Rank {idx} | Title : {doc.metadata['title']}\")\n", + "# print(f\"Contents : {doc.page_content}\")\n", + "# print()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2577dd4a", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "# Filter Search\n", + "\n", + "# results = crud_manager.search(query=\"\",k=3,={\"title\":\"Chapter 4\"})\n", + "# for idx,doc in enumerate(results):\n", + "# print(f\"Rank {idx} | Title : {doc.metadata['title']}\")\n", + "# print(f\"Contents : {doc.page_content}\")\n", + "# print()" + ] + }, + { + "cell_type": "markdown", + "id": "9ad0ed0c", + "metadata": {}, + "source": [ + "### Delete Document\n", + "\n", + "Remove documents based on filter conditions\n", + "\n", + "**βœ… Args**\n", + "\n", + "- `ids` : Optional[List[str]] – List of document IDs to delete. 
If None, deletion is based on filter.\n", + "\n", + "- `filters` : Optional[Dict] – Dictionary specifying filter conditions (e.g., metadata match).\n", + "\n", + "- `**kwargs` : Any additional parameters.\n", + "\n", + "**πŸ”„ Return**\n", + "\n", + "- None" + ] }, { "cell_type": "code", "execution_count": null, "id": "0e3a2c33", "metadata": { "vscode": { "languageId": "plaintext" } }, "outputs": [], "source": [ "# Delete by ids\n", "\n", "# ids = [] # The 'ids' value you want to delete\n", "# crud_manager.delete(ids=ids)" ] }, { "cell_type": "code", "execution_count": null, "id": "60bcb4cf", "metadata": { "vscode": { "languageId": "plaintext" } }, "outputs": [], "source": [ "# Delete by ids with filters\n", "\n", "# ids = [] # The `ids` value corresponding to chapter 6\n", "# crud_manager.delete(ids=ids,filters={\"title\":\"chapter 6\"}) " ] }, { "cell_type": "code", "execution_count": null, "id": "30d42d2e", "metadata": { "vscode": { "languageId": "plaintext" } }, "outputs": [], "source": [ "# Delete All\n", "\n", "# crud_manager.delete()" ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 5 +} From fb93ff8c2291ddfe98e80a9ec8d73d0f9b4fa21e Mon Sep 17 00:00:00 2001 From: Gwangwon Jung Date: Sun, 6 Apr 2025 02:24:26 +0900 Subject: [PATCH 2/5] [N-2] 09-Vector Store / 99-Master-Template - Add a query to the `search` usage code. 
--- 09-VectorStore/99-Master-Template.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/09-VectorStore/99-Master-Template.ipynb b/09-VectorStore/99-Master-Template.ipynb index 24003d355..b12e264a2 100644 --- a/09-VectorStore/99-Master-Template.ipynb +++ b/09-VectorStore/99-Master-Template.ipynb @@ -583,7 +583,7 @@ "source": [ "# Search by Query\n", "\n", - "# results = crud_manager.search(query=\"\",k=3)\n", + "# results = crud_manager.search(query=\"What is essential is invisible to the eye.\",k=3)\n", "# for idx,doc in enumerate(results):\n", "# print(f\"Rank {idx} | Title : {doc.metadata['title']}\")\n", "# print(f\"Contents : {doc.page_content}\")\n", @@ -603,7 +603,7 @@ "source": [ "# Filter Search\n", "\n", - "# results = crud_manager.search(query=\"\",k=3,={\"title\":\"Chapter 4\"})\n", + "# results = crud_manager.search(query=\"Which asteroid did the little prince come from?\",k=3,={\"title\":\"Chapter 4\"})\n", "# for idx,doc in enumerate(results):\n", "# print(f\"Rank {idx} | Title : {doc.metadata['title']}\")\n", "# print(f\"Contents : {doc.page_content}\")\n", From e117dfd655f6ad1533f344c15cb1611dbdba686a Mon Sep 17 00:00:00 2001 From: Gwangwon Jung Date: Tue, 29 Apr 2025 22:57:50 +0900 Subject: [PATCH 3/5] [N-2] 09-Vector Store / 99-Master-Template - Remove `###` content in `Table of content`. 
- ` -> ``` changed backtick --- 09-VectorStore/99-Master-Template.ipynb | 65 +++++++++++-------------- 1 file changed, 28 insertions(+), 37 deletions(-) diff --git a/09-VectorStore/99-Master-Template.ipynb b/09-VectorStore/99-Master-Template.ipynb index b12e264a2..25fb37698 100644 --- a/09-VectorStore/99-Master-Template.ipynb +++ b/09-VectorStore/99-Master-Template.ipynb @@ -28,17 +28,8 @@ "- [Environment Setup](#environment-setup)\n", "- [What is {vectordb}?](#what-is-{vectordb}?)\n", "- [Data](#data)\n", - " - [Introduce Data](#introduce-data)\n", - " - [Preprocessing Data](#preprocessing-data)\n", "- [Initial Setting {vectordb}](#initial-setting-{vectordb})\n", - " - [Load Embedding Model](#load-embedding-model)\n", - " - [Load {vectordb} Client](#load-{vectordb}-client)\n", "- [Document Manager](#document-manager)\n", - " - [Create Instance](#create-instance)\n", - " - [Upsert Document](#upsert-document)\n", - " - [Upsert Parallel Document](#upsert-parallel-document)\n", - " - [Similarity Search](#similarity-search)\n", - " - [Delete Document](#delete-document)\n", "\n", "\n", "### References\n", @@ -55,8 +46,8 @@ "Set up the environment. You may refer to [Environment Setup](https://wikidocs.net/257836) for more details.\n", "\n", "**[Note]**\n", - "- `langchain-opentutorial` is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials. \n", - "- You can checkout the [`langchain-opentutorial`](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) for more details." + "- ```langchain-opentutorial``` is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials. \n", + "- You can checkout the [```langchain-opentutorial```](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) for more details." 
] }, { @@ -129,7 +120,7 @@ "id": "8011a0c7", "metadata": {}, "source": [ - "You can alternatively set API keys such as `OPENAI_API_KEY` in a `.env` file and load them.\n", + "You can alternatively set API keys such as ```OPENAI_API_KEY``` in a ```.env``` file and load them.\n", "\n", "[Note] This is not necessary if you've already set the required API keys in previous steps." ] @@ -198,9 +189,9 @@ "source": [ "### Preprocessing Data\n", "\n", - "In this tutorial section, we will preprocess the text data from The Little Prince and convert it into a list of `LangChain Document` objects with metadata. \n", + "In this tutorial section, we will preprocess the text data from The Little Prince and convert it into a list of ```LangChain Document``` objects with metadata. \n", "\n", - "Each document chunk will include a `title` field in the metadata, extracted from the first line of each section." + "Each document chunk will include a ```title``` field in the metadata, extracted from the first line of each section." 
] }, { @@ -305,7 +296,7 @@ "\n", "In the **Load Embedding Model** section, you'll learn how to load an embedding model.\n", "\n", - "This tutorial uses **OpenAI** 's **API-Key** for loading the model.\n", + "This tutorial uses **OpenAI's** **API-Key** for loading the model.\n", "\n", "*πŸ’‘ If you prefer to use another embedding model, see the instructions below.*\n", "- [Embedding Models](https://python.langchain.com/docs/integrations/text_embedding/)" @@ -396,13 +387,13 @@ "\n", "The following operations are included:\n", "\n", - "- `upsert` : Update existing documents or insert if they don’t exist\n", + "- ```upsert``` : Update existing documents or insert if they don’t exist\n", "\n", - "- `upsert_parallel` : Perform upserts in parallel for large-scale data\n", + "- ```upsert_parallel``` : Perform upserts in parallel for large-scale data\n", "\n", - "- `similarity_search` : Search for similar documents based on embeddings\n", + "- ```similarity_search``` : Search for similar documents based on embeddings\n", "\n", - "- `delete` : Remove documents based on filter conditions\n", + "- ```delete``` : Remove documents based on filter conditions\n", "\n", "Each of these features is implemented as class methods specific to each VectorDB.\n", "\n", @@ -442,7 +433,7 @@ "id": "c1c0c67f", "metadata": {}, "source": [ - "Now you can use the following **CRUD** operations with the `crud_manager` instance.\n", + "Now you can use the following **CRUD** operations with the ```crud_manager``` instance.\n", "\n", "These instance allow you to easily manage documents in your **{vectordb}** ." 
] @@ -458,13 +449,13 @@ "\n", "**βœ… Args**\n", "\n", - "- `texts` : Iterable[str] – List of text contents to be inserted/updated.\n", + "- ```texts``` : Iterable[str] – List of text contents to be inserted/updated.\n", "\n", - "- `metadatas` : Optional[List[Dict]] – List of metadata dictionaries for each text (optional).\n", + "- ```metadatas``` : Optional[List[Dict]] – List of metadata dictionaries for each text (optional).\n", "\n", - "- `ids` : Optional[List[str]] – Custom IDs for the documents. If not provided, IDs will be auto-generated.\n", + "- ```ids``` : Optional[List[str]] – Custom IDs for the documents. If not provided, IDs will be auto-generated.\n", "\n", - "- `**kwargs` : Extra arguments for the underlying vector store.\n", + "- ```**kwargs``` : Extra arguments for the underlying vector store.\n", "\n", "**πŸ”„ Return**\n", "\n", @@ -505,17 +496,17 @@ "\n", "**βœ… Args**\n", "\n", - "- `texts` : Iterable[str] – List of text contents to be inserted/updated.\n", + "- ```texts``` : Iterable[str] – List of text contents to be inserted/updated.\n", "\n", - "- `metadatas` : Optional[List[Dict]] – List of metadata dictionaries for each text (optional).\n", + "- ```metadatas``` : Optional[List[Dict]] – List of metadata dictionaries for each text (optional).\n", "\n", - "- `ids` : Optional[List[str]] – Custom IDs for the documents. If not provided, IDs will be auto-generated.\n", + "- ```ids``` : Optional[List[str]] – Custom IDs for the documents. 
If not provided, IDs will be auto-generated.\n", "\n", - "- `batch_size` : int – Number of documents per batch (default: 32).\n", + "- ```batch_size``` : int – Number of documents per batch (default: 32).\n", "\n", - "- `workers` : int – Number of parallel workers (default: 10).\n", + "- ```workers``` : int – Number of parallel workers (default: 10).\n", "\n", - "- `**kwargs` : Extra arguments for the underlying vector store.\n", + "- ```**kwargs``` : Extra arguments for the underlying vector store.\n", "\n", "**πŸ”„ Return**\n", "\n", @@ -559,15 +550,15 @@ "\n", "**βœ… Args**\n", "\n", - "- `query` : str – The text query for similarity search.\n", + "- ```query``` : str – The text query for similarity search.\n", "\n", - "- `k` : int – Number of top results to return (default: 10).\n", + "- ```k``` : int – Number of top results to return (default: 10).\n", "\n", - "`**kwargs` : Additional search options (e.g., filters).\n", + "```**kwargs``` : Additional search options (e.g., filters).\n", "\n", "**πŸ”„ Return**\n", "\n", - "- `results` : List[Document] – A list of LangChain Document objects ranked by similarity." + "- ```results``` : List[Document] – A list of LangChain Document objects ranked by similarity." ] }, { @@ -621,11 +612,11 @@ "\n", "**βœ… Args**\n", "\n", - "- `ids` : Optional[List[str]] – List of document IDs to delete. If None, deletion is based on filter.\n", + "- ```ids``` : Optional[List[str]] – List of document IDs to delete. 
If None, deletion is based on filter.\n", "\n", - "- `filters` : Optional[Dict] – Dictionary specifying filter conditions (e.g., metadata match).\n", + "- ```filters``` : Optional[Dict] – Dictionary specifying filter conditions (e.g., metadata match).\n", "\n", - "- `**kwargs` : Any additional parameters.\n", + "- ```**kwargs``` : Any additional parameters.\n", "\n", "**πŸ”„ Return**\n", "\n", From ea4014e7f4e6054fc996cede0ce0957a2ac72b5b Mon Sep 17 00:00:00 2001 From: Gwangwon Jung Date: Mon, 5 May 2025 10:53:37 +0900 Subject: [PATCH 4/5] [N-2] 09-Vector Store / 99-Master-Template - add `as_retriever` : LightCustomRetriever for tutorial - Fixed Master-Template - vectordbinterface.py --- 09-VectorStore/99-Master-Template.ipynb | 60 +++++++++++++++++++++-- 09-VectorStore/utils/vectordbinterface.py | 41 +++++++++++++++- 2 files changed, 97 insertions(+), 4 deletions(-) diff --git a/09-VectorStore/99-Master-Template.ipynb b/09-VectorStore/99-Master-Template.ipynb index 25fb37698..bd54df77e 100644 --- a/09-VectorStore/99-Master-Template.ipynb +++ b/09-VectorStore/99-Master-Template.ipynb @@ -232,7 +232,7 @@ " # Extract title from the first line using square brackets [ ]\n", " first_line = lines[0]\n", " title_match = re.search(r\"\\[(.*?)\\]\", first_line)\n", - " title = title_match.group(1).strip() if title_match else None\n", + " title = title_match.group(1).strip() if title_match else \"\"\n", "\n", " # Remove the title line from content\n", " body = \"\\n\".join(lines[1:]).strip()\n", @@ -477,7 +477,7 @@ "\n", "args = {\n", " \"texts\": [doc.page_content for doc in docs[:2]],\n", - " \"metadatas\": [doc.metadata[\"title\"] for doc in docs[:2]],\n", + " \"metadatas\": [doc.metadata for doc in docs[:2]],\n", " \"ids\": [str(uuid4()) for _ in docs[:2]]\n", " # if you want args, add params.\n", "}\n", @@ -528,7 +528,7 @@ "\n", "args = {\n", " \"texts\": [doc.page_content for doc in docs],\n", - " \"metadatas\": [doc.metadata[\"title\"] for doc in docs],\n", + " 
\"metadatas\": [doc.metadata for doc in docs],\n", " \"ids\": [str(uuid4()) for _ in docs]\n", " # if you want args, add params.\n", "}\n", @@ -601,6 +601,60 @@ "# print()" ] }, + { + "cell_type": "markdown", + "id": "f140c0e2", + "metadata": {}, + "source": [ + "### As Retriever\n", + "\n", + "The ```as_retriever()``` method creates a LangChain-compatible retriever wrapper.\n", + "\n", + "This function allows a ```DocumentManager``` class to return a retriever object by wrapping the internal ```search()``` method, while staying lightweight and independent from full LangChain ```VectorStore``` dependencies.\n", + "\n", + "The retriever obtained through this function can be used just like an existing LangChain retriever and is **compatible with LangChain pipelines (e.g., RetrievalQA, ConversationalRetrievalChain, Tool, ...)**.\n", + "\n", + "**βœ… Args**\n", + "\n", + "- ```search_fn``` : Callable - The function used to retrieve relevant documents. Typically this is ```self.search``` from a ```DocumentManager``` instance.\n", + "\n", + "- ```search_kwargs``` : Optional[Dict] - A dictionary of keyword arguments passed to ```search_fn```, such as ```k``` for top-K results or metadata filters.\n", + "\n", + "**πŸ”„ Return**\n", + "\n", + "- ```LightCustomRetriever``` : BaseRetriever - A lightweight LangChain-compatible retriever that internally uses the given ```search_fn``` and ```search_kwargs```." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "86de7842", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "# ret = crud_manager.as_retriever(\n", + "# search_fn=crud_manager.search, search_kwargs= # e.g. 
{\"k\": 1, \"where\": {\"title\": \"\"}}\n", + "# )" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7142d29c", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "# ret.invoke(\"Which asteroid did the little prince come from?\")" + ] + }, { "cell_type": "markdown", "id": "9ad0ed0c", diff --git a/09-VectorStore/utils/vectordbinterface.py b/09-VectorStore/utils/vectordbinterface.py index bc304bde3..23c330cdc 100644 --- a/09-VectorStore/utils/vectordbinterface.py +++ b/09-VectorStore/utils/vectordbinterface.py @@ -67,7 +67,18 @@ def delete( New Interface for VectorDB CRUD """ -from typing import Optional, List, Iterable, Any, Dict +from typing import Optional, List, Iterable, Any, Dict, Callable +from langchain_core.retrievers import BaseRetriever +from langchain_core.documents import Document +from pydantic import Field + + +class LightCustomRetriever(BaseRetriever): + search_fn: Callable + search_kwargs: Dict = Field(default_factory=dict) + + def get_relevant_documents(self, query: str) -> List[Document]: + return self.search_fn(query, **self.search_kwargs) class DocumentManager(ABC): @@ -128,3 +139,31 @@ def delete( """ pass + + def as_retriever( + self, search_fn: Callable, search_kwargs: Dict = {} + ) -> LightCustomRetriever: + """ + Create a LangChain-compatible retriever using a custom search function. + + This method wraps a provided search function and its keyword arguments + into a `LightCustomRetriever` object that conforms to LangChain's `BaseRetriever` interface. + Useful for integrating lightweight, SDK-based CRUD search implementations with LangChain chains. + + Args: + search_fn (Callable): + The function that performs the similarity search and returns a list of `Document` objects. + Typically this is the `search()` method of the DocumentManager. + search_kwargs (Dict, optional): + Additional keyword arguments to pass into the `search_fn`. 
+ Example: {'k': 5} to retrieve top 5 similar documents. + + Returns: + LightCustomRetriever: + A retriever instance that can be used with LangChain chains like `RetrievalQA` + or `ConversationalRetrievalChain`. + """ + retriever = LightCustomRetriever( + search_fn=search_fn, search_kwargs=search_kwargs + ) + return retriever From ddb35c5fb71bee28a91809112d9f473e32c15095 Mon Sep 17 00:00:00 2001 From: Gwangwon Jung Date: Tue, 6 May 2025 21:52:40 +0900 Subject: [PATCH 5/5] [N-2] 09-Vector Store / 99-Master-Template - HotFix - Add `What is {vectordb}?` - Changed `with open("the_little_prince.txt"~` -> `with open("./data/the_little_prince.txt"~` --- 09-VectorStore/99-Master-Template.ipynb | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/09-VectorStore/99-Master-Template.ipynb b/09-VectorStore/99-Master-Template.ipynb index bd54df77e..1fe051821 100644 --- a/09-VectorStore/99-Master-Template.ipynb +++ b/09-VectorStore/99-Master-Template.ipynb @@ -146,6 +146,8 @@ "id": "6890920d", "metadata": {}, "source": [ + "## What is {vectordb}?\n", + "\n", "Please write down what you need to set up the Vectorstore here." ] }, @@ -263,7 +265,7 @@ "outputs": [], "source": [ "# Load the entire text file\n", - "with open(\"the_little_prince.txt\", \"r\", encoding=\"utf-8\") as f:\n", + "with open(\"./data/the_little_prince.txt\", \"r\", encoding=\"utf-8\") as f:\n", " content = f.read()\n", "\n", "# Preprocessing Data\n",
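The template above leaves every `crud_manager` call commented out, because each vector store ships its own `DocumentManager` subclass. Purely as an illustration of the contract these patches describe (`upsert`, `upsert_parallel`, `search` with filters, and `delete`), the following is a dependency-free, in-memory sketch. The `toy_embedding` hash-bucket function and the `InMemoryDocumentManager` class are invented for this example and are not part of the PR; a real implementation would wrap the {vectordb} SDK client and a real embedding model such as `OpenAIEmbeddings`.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, Iterable, List, Optional
from uuid import uuid4


def toy_embedding(text: str, dim: int = 64) -> List[float]:
    # Hash-bucket bag-of-words vector, L2-normalized.
    # A cheap stand-in for a real embedding model.
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class InMemoryDocumentManager:
    """Toy in-memory realization of the upsert/search/delete contract."""

    def __init__(self, embed=toy_embedding):
        self.embed = embed
        self._store: Dict[str, Dict[str, Any]] = {}  # id -> {text, metadata, vector}

    def upsert(self, texts: Iterable[str], metadatas: Optional[List[Dict]] = None,
               ids: Optional[List[str]] = None, **kwargs) -> None:
        texts = list(texts)
        metadatas = metadatas or [{} for _ in texts]
        ids = ids or [str(uuid4()) for _ in texts]
        for doc_id, text, meta in zip(ids, texts, metadatas):
            self._store[doc_id] = {"text": text, "metadata": meta,
                                   "vector": self.embed(text)}

    def upsert_parallel(self, texts: Iterable[str],
                        metadatas: Optional[List[Dict]] = None,
                        ids: Optional[List[str]] = None,
                        batch_size: int = 32, workers: int = 10, **kwargs) -> None:
        texts = list(texts)
        metadatas = metadatas or [{} for _ in texts]
        ids = ids or [str(uuid4()) for _ in texts]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(self.upsert, texts[i:i + batch_size],
                                   metadatas[i:i + batch_size], ids[i:i + batch_size])
                       for i in range(0, len(texts), batch_size)]
            for f in futures:
                f.result()  # propagate any worker exception

    def search(self, query: str, k: int = 10,
               filters: Optional[Dict] = None) -> List[Dict[str, Any]]:
        # Dot product of unit vectors == cosine similarity.
        qv = self.embed(query)
        hits = []
        for doc in self._store.values():
            if filters and any(doc["metadata"].get(key) != val
                               for key, val in filters.items()):
                continue  # metadata filter, applied before ranking
            score = sum(a * b for a, b in zip(qv, doc["vector"]))
            hits.append((score, doc))
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in hits[:k]]

    def delete(self, ids: Optional[List[str]] = None,
               filters: Optional[Dict] = None, **kwargs) -> None:
        if ids is None and filters is None:
            self._store.clear()  # delete all, as in the template's last cell
            return
        for doc_id in list(self._store):
            if ids is not None and doc_id not in ids:
                continue
            if filters and any(self._store[doc_id]["metadata"].get(key) != val
                               for key, val in filters.items()):
                continue
            del self._store[doc_id]


# Mirror the template's usage: upsert a few chunks, then search.
mgr = InMemoryDocumentManager()
mgr.upsert(
    texts=["The little prince came from asteroid B-612.",
           "What is essential is invisible to the eye.",
           "Grown-ups love figures."],
    metadatas=[{"title": "Chapter 4"}, {"title": "Chapter 21"}, {"title": "Chapter 4"}],
    ids=["d1", "d2", "d3"],
)
top = mgr.search("Which asteroid did the little prince come from?", k=1)
```

Against a real backend, `search` would embed the query once and push the metadata filter down into the vector store's query API rather than scanning in Python; the in-memory scan here only exists to keep the sketch self-contained.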