From 94f9c2b4f77cd792a44bc9c75a787c17c4632126 Mon Sep 17 00:00:00 2001 From: solon Date: Sat, 18 Jan 2025 21:55:48 +0900 Subject: [PATCH 1/2] Address issue feedback MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- 10-Retriever/06-MultiQueryRetriever.ipynb | 126 ++++++++++++---------- 1 file changed, 72 insertions(+), 54 deletions(-) diff --git a/10-Retriever/06-MultiQueryRetriever.ipynb b/10-Retriever/06-MultiQueryRetriever.ipynb index 2f98bd114..540f2fe10 100644 --- a/10-Retriever/06-MultiQueryRetriever.ipynb +++ b/10-Retriever/06-MultiQueryRetriever.ipynb @@ -12,32 +12,34 @@ "- Peer Review: \n", "- This is a part of [LangChain OpenTutorial](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial)\n", "\n", - "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain-academy/blob/main/module-4/sub-graph.ipynb) [![Open in LangChain Academy](https://cdn.prod.website-files.com/65b8cd72835ceeacd4449a53/66e9eba12c7b7688aa3dbb5e_LCA-badge-green.svg)](https://academy.langchain.com/courses/take/intro-to-langgraph/lessons/58239937-lesson-2-sub-graphs)\n", + "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/10-Retriever/06-MultiQueryRetriever.ipynb) [![Open in GitHub](https://img.shields.io/badge/Open%20in%20GitHub-181717?style=flat-square&logo=github&logoColor=white)](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/10-Retriever/06-MultiQueryRetriever.ipynb)\n", "\n", "\n", "## Overview\n", "\n", - "`MultiQueryRetriever` offers a thoughtful approach to improving distance-based vector database searches by generating diverse queries with the help of a Language Learning Model (LLM). 
This method simplifies the search process, minimizes the need for manual prompt adjustments, and aims to provide more nuanced and comprehensive results.\n", + "`MultiQueryRetriever` offers a thoughtful approach to improving distance-based vector database retrieval by generating diverse queries with the help of an LLM. \n", + "\n", + "This method simplifies the retrieval process, minimizes the need for manual prompt adjustments, and aims to provide more nuanced and comprehensive results.\n", "\n", "- **Understanding Distance-Based Vector Search** \n", - " Distance-based vector search is a technique that identifies documents with embeddings similar to a query embedding based on their \"distance\" in high-dimensional space. However, subtle variations in query details or embedding representations can occasionally make it challenging to fully capture the intended meaning, which might affect the search results.\n", + " Distance-based vector search is a technique that identifies documents with embeddings similar to a query embedding based on their 'distance' in a high-dimensional space. However, subtle variations in query details or embedding representations can occasionally make it challenging to fully capture the intended meaning, which might affect the search results.\n", "\n", "- **Streamlined Prompt Tuning** \n", - " MultiQueryRetriever reduces the complexity of prompt tuning by utilizing an LLM to automatically generate multiple queries from different perspectives for a single input. This helps minimize the effort required for manual adjustments or prompt engineering.\n", + " `MultiQueryRetriever` reduces the complexity of prompt tuning by utilizing an LLM to automatically generate multiple queries from different perspectives for a single input. 
This helps minimize the effort required for manual adjustments or prompt engineering.\n", "\n", "- **Broader Document Retrieval** \n", " Each generated query is used to perform a search, and the unique documents retrieved from all queries are combined. This approach helps uncover a wider range of potentially relevant documents, increasing the chances of retrieving valuable information.\n", "\n", "- **Improved Search Robustness** \n", - " By exploring a question from multiple perspectives through diverse queries, MultiQueryRetriever addresses some of the limitations of distance-based searches. This approach can better account for nuanced differences and deeper meanings in the data, leading to more contextually relevant and well-rounded results.\n", + " By exploring a question from multiple perspectives through diverse queries, `MultiQueryRetriever` addresses some of the limitations of distance-based searches. This approach can better account for nuanced differences and deeper meanings in the data, leading to more contextually relevant and well-rounded results.\n", "\n", "### Table of Contents\n", "\n", "- [Overview](#overview)\n", "- [Environment Setup](#environment-setup)\n", - "- [Building a Vector Database](#Building-a-Vector-Database)\n", + "- [Building a Vector Database](#building-a-vector-database)\n", "- [Usage](#usage)\n", - "- [How to use the LCEL Chain](#how-to-use-the-LCEL-Chain)\n", + "- [How to Use the LCEL Chain](#how-to-use-the-lcel-chain)\n", "\n", "### References\n", "\n", @@ -117,31 +119,6 @@ { "cell_type": "code", "execution_count": 3, - "id": "8ee62e07", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "True" - ] - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Configuration file to manage API keys as environment variables\n", - "from dotenv import load_dotenv\n", - "\n", - "# Load API key information\n", - "load_dotenv()" - ] - }, - { - "cell_type": "code", - 
"execution_count": 4, "id": "0a6a2728", "metadata": {}, "outputs": [ @@ -166,6 +143,43 @@ ")" ] }, + { + "cell_type": "markdown", + "id": "8a8221e6", + "metadata": {}, + "source": [ + "Alternatively, environment variables can also be set using a `.env` file.\n", + "\n", + "**[Note]**\n", + "\n", + "- This is not necessary if you've already set the environment variables in the previous step." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "8ee62e07", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Configuration file to manage API keys as environment variables\n", + "from dotenv import load_dotenv\n", + "\n", + "# Load API key information\n", + "load_dotenv()" + ] + }, { "cell_type": "markdown", "id": "d6c14b5b", @@ -173,12 +187,14 @@ "source": [ "## Building a Vector Database\n", "\n", - "Vector databases enable efficient retrieval of relevant documents by embedding textual data into a high-dimensional vector space. This example demonstrates creating a simple vector database using LangChain, which involves loading and splitting a document, generating embeddings with OpenAI, and performing a search query to retrieve contextually relevant information." + "Vector databases enable efficient retrieval of relevant documents by embedding text data into a high-dimensional vector space. \n", + "\n", + "This example demonstrates creating a simple vector database using LangChain, which involves loading and splitting a document, generating embeddings with OpenAI, and performing a search query to retrieve contextually relevant information." 
] }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 13, "id": "dae75cb3", "metadata": {}, "outputs": [ @@ -261,7 +277,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 12, "id": "1637815b", "metadata": {}, "outputs": [], @@ -289,12 +305,12 @@ "\n", "First, we retrieve the `\"langchain.retrievers.multi_query\"` logger.\n", "\n", - "This is done using the `logging.getLogger()` function. Then, we set the logger's log level to `INFO`, so that only log messages at the `INFO` level or above are printed.\n" + "This is done using the `logging.getLogger` method. Then, we set the logger's log level to `INFO`, so that only log messages at the `INFO` level or above are printed.\n" ] }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 7, "id": "901d1749", "metadata": {}, "outputs": [], @@ -313,12 +329,14 @@ "source": [ "This code uses the `invoke` method of the `retriever_from_llm` object to search for documents relevant to the given `question`.\n", "\n", - "The retrieved documents are stored in the variable `relevant_docs`, and checking the length of this variable lets you see how many relevant documents were found. 
Through this process, you can effectively locate information related to the user's question and assess how much of it is available.\n" + "The retrieved documents are stored in the variable `relevant_docs`, and checking the length of this variable lets you see how many relevant documents were found.\n", + "\n", + "Through this process, you can effectively locate information related to the user's question and assess how much of it is available.\n" ] }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 8, "id": "e2e305f8", "metadata": {}, "outputs": [ @@ -326,7 +344,7 @@ "name": "stderr", "output_type": "stream", "text": [ - "INFO:langchain.retrievers.multi_query:Generated queries: ['What are the main components and structural design of the LangChain framework?', 'Can you describe the essential characteristics and architectural elements of the LangChain framework?', 'What are the fundamental features and the architecture behind the LangChain framework?']\n" + "INFO:langchain.retrievers.multi_query:Generated queries: ['What are the main components and architectural design of the LangChain framework?', 'Can you describe the essential characteristics and structure of the LangChain framework?', 'What are the significant features and the underlying architecture of the LangChain framework?']\n" ] }, { @@ -334,7 +352,7 @@ "output_type": "stream", "text": [ "===============\n", - "Number of retrieved documents: 5\n", + "Number of retrieved documents: 6\n", "===============\n", "noteThese docs focus on the Python LangChain library. 
Head here for docs on the JavaScript LangChain library.\n", "Architecture​\n", @@ -364,15 +382,15 @@ "id": "81695892", "metadata": {}, "source": [ - "## How to use the LCEL Chain\n", + "## How to Use the LCEL Chain\n", "\n", - "- Define a custom prompt, then create a Chain with that prompt.\n", - "- When the Chain receives a user question (in the following example), it generates 5 questions, and returns the 5 generated questions separated by \"\\n\".\n" + "- Define a custom prompt, then create a `Chain` with that prompt.\n", + "- When the `Chain` receives a user question (in the following example), it generates 5 questions, and returns the 5 generated questions separated by '\\n'.\n" ] }, { "cell_type": "code", - "execution_count": 19, + "execution_count": 9, "id": "0ab98687", "metadata": {}, "outputs": [ @@ -381,10 +399,10 @@ "output_type": "stream", "text": [ "What are the main components and structure of the LangChain framework? \n", - "Can you describe the architecture and essential features of LangChain? \n", - "What are the significant characteristics and design of the LangChain framework? \n", - "Could you provide an overview of the LangChain framework's architecture and its key features? \n", - "What should I know about the LangChain framework's architecture and its primary functionalities? \n" + "Can you describe the architecture and essential characteristics of LangChain? \n", + "What are the significant features and design elements of the LangChain framework? \n", + "How is the LangChain framework structured, and what are its key functionalities? \n", + "Could you provide an overview of the LangChain framework's architecture and its primary features? \n" ] } ], @@ -429,12 +447,12 @@ "id": "0c6403eb", "metadata": {}, "source": [ - "You can pass the previously created Chain to `MultiQueryRetriever` to perform retrieval." + "You can pass the previously created `Chain` to the `MultiQueryRetriever` to perform retrieval." 
] }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 10, "id": "5f3cac81", "metadata": {}, "outputs": [], @@ -449,12 +467,12 @@ "id": "086076bb", "metadata": {}, "source": [ - "Use `MultiQueryRetriever` to search documents and check the results." + "Use the `MultiQueryRetriever` to search documents and check the results." ] }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 11, "id": "6eaffe30", "metadata": {}, "outputs": [ @@ -462,7 +480,7 @@ "name": "stderr", "output_type": "stream", "text": [ - "INFO:langchain.retrievers.multi_query:Generated queries: ['What are the main characteristics and structure of the LangChain framework?', 'Can you describe the essential features and design of the LangChain framework?', 'Could you provide an overview of the key components and architecture of the LangChain framework?', 'What are the fundamental aspects and architectural elements of the LangChain framework?', 'Please outline the primary features and framework architecture of LangChain.']\n" + "INFO:langchain.retrievers.multi_query:Generated queries: ['What are the main characteristics and structure of the LangChain framework? ', 'Can you describe the essential features and design of the LangChain framework? ', 'Could you provide an overview of the key components and architecture of the LangChain framework? ', 'What are the fundamental aspects and architectural elements of the LangChain framework? 
', 'Please outline the primary features and framework architecture of LangChain.']\n" ] }, { From e7cf98a3351d3e5a6f755da6b834377732747fa8 Mon Sep 17 00:00:00 2001 From: solon Date: Sun, 19 Jan 2025 14:19:55 +0900 Subject: [PATCH 2/2] Update 06-MultiQueryRetriever.ipynb --- 10-Retriever/06-MultiQueryRetriever.ipynb | 26 ++--------------------- 1 file changed, 2 insertions(+), 24 deletions(-) diff --git a/10-Retriever/06-MultiQueryRetriever.ipynb b/10-Retriever/06-MultiQueryRetriever.ipynb index 540f2fe10..b91b3a79b 100644 --- a/10-Retriever/06-MultiQueryRetriever.ipynb +++ b/10-Retriever/06-MultiQueryRetriever.ipynb @@ -64,32 +64,10 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "id": "330d1c0a", "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "WARNING: Ignoring invalid distribution -angchain-community (c:\\users\\user\\appdata\\local\\programs\\python\\python310\\lib\\site-packages)\n", - "WARNING: Ignoring invalid distribution -orch (c:\\users\\user\\appdata\\local\\programs\\python\\python310\\lib\\site-packages)\n", - "WARNING: Ignoring invalid distribution -rotobuf (c:\\users\\user\\appdata\\local\\programs\\python\\python310\\lib\\site-packages)\n", - "WARNING: Ignoring invalid distribution -treamlit (c:\\users\\user\\appdata\\local\\programs\\python\\python310\\lib\\site-packages)\n", - "WARNING: Error parsing dependencies of torchsde: .* suffix can only be used with `==` or `!=` operators\n", - " numpy (>=1.19.*) ; python_version >= \"3.7\"\n", - " ~~~~~~~^\n", - "WARNING: Ignoring invalid distribution -angchain-community (c:\\users\\user\\appdata\\local\\programs\\python\\python310\\lib\\site-packages)\n", - "WARNING: Ignoring invalid distribution -orch (c:\\users\\user\\appdata\\local\\programs\\python\\python310\\lib\\site-packages)\n", - "WARNING: Ignoring invalid distribution -rotobuf (c:\\users\\user\\appdata\\local\\programs\\python\\python310\\lib\\site-packages)\n", - 
"WARNING: Ignoring invalid distribution -treamlit (c:\\users\\user\\appdata\\local\\programs\\python\\python310\\lib\\site-packages)\n", - "WARNING: Ignoring invalid distribution -angchain-community (c:\\users\\user\\appdata\\local\\programs\\python\\python310\\lib\\site-packages)\n", - "WARNING: Ignoring invalid distribution -orch (c:\\users\\user\\appdata\\local\\programs\\python\\python310\\lib\\site-packages)\n", - "WARNING: Ignoring invalid distribution -rotobuf (c:\\users\\user\\appdata\\local\\programs\\python\\python310\\lib\\site-packages)\n", - "WARNING: Ignoring invalid distribution -treamlit (c:\\users\\user\\appdata\\local\\programs\\python\\python310\\lib\\site-packages)\n" - ] - } - ], + "outputs": [], "source": [ "%%capture --no-stderr\n", "%pip install langchain-opentutorial"