Merged
152 changes: 74 additions & 78 deletions 10-Retriever/06-MultiQueryRetriever.ipynb
@@ -12,32 +12,34 @@
"- Peer Review: \n",
"- This is a part of [LangChain OpenTutorial](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial)\n",
"\n",
"[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain-academy/blob/main/module-4/sub-graph.ipynb) [![Open in LangChain Academy](https://cdn.prod.website-files.com/65b8cd72835ceeacd4449a53/66e9eba12c7b7688aa3dbb5e_LCA-badge-green.svg)](https://academy.langchain.com/courses/take/intro-to-langgraph/lessons/58239937-lesson-2-sub-graphs)\n",
"[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/10-Retriever/06-MultiQueryRetriever.ipynb) [![Open in GitHub](https://img.shields.io/badge/Open%20in%20GitHub-181717?style=flat-square&logo=github&logoColor=white)](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/10-Retriever/06-MultiQueryRetriever.ipynb)\n",
"\n",
"\n",
"## Overview\n",
"\n",
"`MultiQueryRetriever` offers a thoughtful approach to improving distance-based vector database searches by generating diverse queries with the help of a Language Learning Model (LLM). This method simplifies the search process, minimizes the need for manual prompt adjustments, and aims to provide more nuanced and comprehensive results.\n",
"`MultiQueryRetriever` offers a thoughtful approach to improving distance-based vector database retrieval by generating diverse queries with the help of an LLM. \n",
"\n",
"This method simplifies the retrieval process, minimizes the need for manual prompt adjustments, and aims to provide more nuanced and comprehensive results.\n",
"\n",
"- **Understanding Distance-Based Vector Search** \n",
" Distance-based vector search is a technique that identifies documents with embeddings similar to a query embedding based on their \"distance\" in high-dimensional space. However, subtle variations in query details or embedding representations can occasionally make it challenging to fully capture the intended meaning, which might affect the search results.\n",
" Distance-based vector search is a technique that identifies documents with embeddings similar to a query embedding based on their 'distance' in a high-dimensional space. However, subtle variations in query details or embedding representations can occasionally make it challenging to fully capture the intended meaning, which might affect the search results.\n",
"\n",
"- **Streamlined Prompt Tuning** \n",
" MultiQueryRetriever reduces the complexity of prompt tuning by utilizing an LLM to automatically generate multiple queries from different perspectives for a single input. This helps minimize the effort required for manual adjustments or prompt engineering.\n",
" `MultiQueryRetriever` reduces the complexity of prompt tuning by utilizing an LLM to automatically generate multiple queries from different perspectives for a single input. This helps minimize the effort required for manual adjustments or prompt engineering.\n",
"\n",
"- **Broader Document Retrieval** \n",
" Each generated query is used to perform a search, and the unique documents retrieved from all queries are combined. This approach helps uncover a wider range of potentially relevant documents, increasing the chances of retrieving valuable information.\n",
"\n",
"- **Improved Search Robustness** \n",
" By exploring a question from multiple perspectives through diverse queries, MultiQueryRetriever addresses some of the limitations of distance-based searches. This approach can better account for nuanced differences and deeper meanings in the data, leading to more contextually relevant and well-rounded results.\n",
" By exploring a question from multiple perspectives through diverse queries, `MultiQueryRetriever` addresses some of the limitations of distance-based searches. This approach can better account for nuanced differences and deeper meanings in the data, leading to more contextually relevant and well-rounded results.\n",
"\n",
"### Table of Contents\n",
"\n",
"- [Overview](#overview)\n",
"- [Environment Setup](#environment-setup)\n",
"- [Building a Vector Database](#Building-a-Vector-Database)\n",
"- [Building a Vector Database](#building-a-vector-database)\n",
"- [Usage](#usage)\n",
"- [How to use the LCEL Chain](#how-to-use-the-LCEL-Chain)\n",
"- [How to Use the LCEL Chain](#how-to-use-the-lcel-chain)\n",
"\n",
"### References\n",
"\n",
@@ -62,32 +64,10 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"id": "330d1c0a",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING: Ignoring invalid distribution -angchain-community (c:\\users\\user\\appdata\\local\\programs\\python\\python310\\lib\\site-packages)\n",
"WARNING: Ignoring invalid distribution -orch (c:\\users\\user\\appdata\\local\\programs\\python\\python310\\lib\\site-packages)\n",
"WARNING: Ignoring invalid distribution -rotobuf (c:\\users\\user\\appdata\\local\\programs\\python\\python310\\lib\\site-packages)\n",
"WARNING: Ignoring invalid distribution -treamlit (c:\\users\\user\\appdata\\local\\programs\\python\\python310\\lib\\site-packages)\n",
"WARNING: Error parsing dependencies of torchsde: .* suffix can only be used with `==` or `!=` operators\n",
" numpy (>=1.19.*) ; python_version >= \"3.7\"\n",
" ~~~~~~~^\n",
"WARNING: Ignoring invalid distribution -angchain-community (c:\\users\\user\\appdata\\local\\programs\\python\\python310\\lib\\site-packages)\n",
"WARNING: Ignoring invalid distribution -orch (c:\\users\\user\\appdata\\local\\programs\\python\\python310\\lib\\site-packages)\n",
"WARNING: Ignoring invalid distribution -rotobuf (c:\\users\\user\\appdata\\local\\programs\\python\\python310\\lib\\site-packages)\n",
"WARNING: Ignoring invalid distribution -treamlit (c:\\users\\user\\appdata\\local\\programs\\python\\python310\\lib\\site-packages)\n",
"WARNING: Ignoring invalid distribution -angchain-community (c:\\users\\user\\appdata\\local\\programs\\python\\python310\\lib\\site-packages)\n",
"WARNING: Ignoring invalid distribution -orch (c:\\users\\user\\appdata\\local\\programs\\python\\python310\\lib\\site-packages)\n",
"WARNING: Ignoring invalid distribution -rotobuf (c:\\users\\user\\appdata\\local\\programs\\python\\python310\\lib\\site-packages)\n",
"WARNING: Ignoring invalid distribution -treamlit (c:\\users\\user\\appdata\\local\\programs\\python\\python310\\lib\\site-packages)\n"
]
}
],
"outputs": [],
"source": [
"%%capture --no-stderr\n",
"%pip install langchain-opentutorial"
@@ -117,31 +97,6 @@
{
"cell_type": "code",
"execution_count": 3,
"id": "8ee62e07",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Configuration file to manage API keys as environment variables\n",
"from dotenv import load_dotenv\n",
"\n",
"# Load API key information\n",
"load_dotenv()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "0a6a2728",
"metadata": {},
"outputs": [
@@ -166,19 +121,58 @@
")"
]
},
{
"cell_type": "markdown",
"id": "8a8221e6",
"metadata": {},
"source": [
"Alternatively, environment variables can also be set using a `.env` file.\n",
"\n",
"**[Note]**\n",
"\n",
"- This is not necessary if you've already set the environment variables in the previous step."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "8ee62e07",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Configuration file to manage API keys as environment variables\n",
"from dotenv import load_dotenv\n",
"\n",
"# Load API key information\n",
"load_dotenv()"
]
},
{
"cell_type": "markdown",
"id": "d6c14b5b",
"metadata": {},
"source": [
"## Building a Vector Database\n",
"\n",
"Vector databases enable efficient retrieval of relevant documents by embedding textual data into a high-dimensional vector space. This example demonstrates creating a simple vector database using LangChain, which involves loading and splitting a document, generating embeddings with OpenAI, and performing a search query to retrieve contextually relevant information."
"Vector databases enable efficient retrieval of relevant documents by embedding text data into a high-dimensional vector space. \n",
"\n",
"This example demonstrates creating a simple vector database using LangChain, which involves loading and splitting a document, generating embeddings with OpenAI, and performing a search query to retrieve contextually relevant information."
]
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 13,
"id": "dae75cb3",
"metadata": {},
"outputs": [
@@ -261,7 +255,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 12,
"id": "1637815b",
"metadata": {},
"outputs": [],
@@ -289,12 +283,12 @@
"\n",
"First, we retrieve the `\"langchain.retrievers.multi_query\"` logger.\n",
"\n",
"This is done using the `logging.getLogger()` function. Then, we set the logger's log level to `INFO`, so that only log messages at the `INFO` level or above are printed.\n"
"This is done using the `logging.getLogger` function. Then, we set the logger's log level to `INFO`, so that only log messages at the `INFO` level or above are printed.\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 7,
"id": "901d1749",
"metadata": {},
"outputs": [],
@@ -313,28 +307,30 @@
"source": [
"This code uses the `invoke` method of the `retriever_from_llm` object to search for documents relevant to the given `question`.\n",
"\n",
"The retrieved documents are stored in the variable `relevant_docs`, and checking the length of this variable lets you see how many relevant documents were found. Through this process, you can effectively locate information related to the user's question and assess how much of it is available.\n"
"The retrieved documents are stored in the variable `relevant_docs`, and checking the length of this variable lets you see how many relevant documents were found.\n",
"\n",
"Through this process, you can effectively locate information related to the user's question and assess how much of it is available.\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 8,
"id": "e2e305f8",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:langchain.retrievers.multi_query:Generated queries: ['What are the main components and structural design of the LangChain framework?', 'Can you describe the essential characteristics and architectural elements of the LangChain framework?', 'What are the fundamental features and the architecture behind the LangChain framework?']\n"
"INFO:langchain.retrievers.multi_query:Generated queries: ['What are the main components and architectural design of the LangChain framework?', 'Can you describe the essential characteristics and structure of the LangChain framework?', 'What are the significant features and the underlying architecture of the LangChain framework?']\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"===============\n",
"Number of retrieved documents: 5\n",
"Number of retrieved documents: 6\n",
"===============\n",
"noteThese docs focus on the Python LangChain library. Head here for docs on the JavaScript LangChain library.\n",
"Architecture​\n",
@@ -364,15 +360,15 @@
"id": "81695892",
"metadata": {},
"source": [
"## How to use the LCEL Chain\n",
"## How to Use the LCEL Chain\n",
"\n",
"- Define a custom prompt, then create a Chain with that prompt.\n",
"- When the Chain receives a user question (in the following example), it generates 5 questions, and returns the 5 generated questions separated by \"\\n\".\n"
"- Define a custom prompt, then create a `Chain` with that prompt.\n",
"- When the `Chain` receives a user question (in the following example), it generates 5 questions, and returns the 5 generated questions separated by '\\n'.\n"
]
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 9,
"id": "0ab98687",
"metadata": {},
"outputs": [
@@ -381,10 +377,10 @@
"output_type": "stream",
"text": [
"What are the main components and structure of the LangChain framework? \n",
"Can you describe the architecture and essential features of LangChain? \n",
"What are the significant characteristics and design of the LangChain framework? \n",
"Could you provide an overview of the LangChain framework's architecture and its key features? \n",
"What should I know about the LangChain framework's architecture and its primary functionalities? \n"
"Can you describe the architecture and essential characteristics of LangChain? \n",
"What are the significant features and design elements of the LangChain framework? \n",
"How is the LangChain framework structured, and what are its key functionalities? \n",
"Could you provide an overview of the LangChain framework's architecture and its primary features? \n"
]
}
],
@@ -429,12 +425,12 @@
"id": "0c6403eb",
"metadata": {},
"source": [
"You can pass the previously created Chain to `MultiQueryRetriever` to perform retrieval."
"You can pass the previously created `Chain` to the `MultiQueryRetriever` to perform retrieval."
]
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 10,
"id": "5f3cac81",
"metadata": {},
"outputs": [],
@@ -449,20 +445,20 @@
"id": "086076bb",
"metadata": {},
"source": [
"Use `MultiQueryRetriever` to search documents and check the results."
"Use the `MultiQueryRetriever` to search documents and check the results."
]
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 11,
"id": "6eaffe30",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:langchain.retrievers.multi_query:Generated queries: ['What are the main characteristics and structure of the LangChain framework?', 'Can you describe the essential features and design of the LangChain framework?', 'Could you provide an overview of the key components and architecture of the LangChain framework?', 'What are the fundamental aspects and architectural elements of the LangChain framework?', 'Please outline the primary features and framework architecture of LangChain.']\n"
"INFO:langchain.retrievers.multi_query:Generated queries: ['What are the main characteristics and structure of the LangChain framework? ', 'Can you describe the essential features and design of the LangChain framework? ', 'Could you provide an overview of the key components and architecture of the LangChain framework? ', 'What are the fundamental aspects and architectural elements of the LangChain framework? ', 'Please outline the primary features and framework architecture of LangChain.']\n"
]
},
{
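
For readers of this diff, the retrieval flow described in the notebook's overview (generate several queries from one question, search with each, keep the unique union of the results) can be sketched without any library. This is only an illustration under stated assumptions: `multi_query_retrieve`, `unique_union`, `toy_generate_queries`, `toy_search`, and `TOY_INDEX` are hypothetical names made up here, standing in for the LLM call and the vector store. It is not LangChain's implementation or the notebook's code.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical toy index: each rephrased query maps to the document IDs a
# vector search might return for it. Real results would come from a vector store.
TOY_INDEX: Dict[str, List[str]] = {
    "What is the LangChain architecture?": ["doc-architecture", "doc-overview"],
    "How is LangChain structured?": ["doc-architecture", "doc-components"],
    "What are LangChain's key features?": ["doc-features", "doc-overview"],
}


def toy_generate_queries(question: str) -> List[str]:
    # Stand-in for the LLM step (assumption): returns fixed rephrasings
    # regardless of the input question.
    return list(TOY_INDEX)


def toy_search(query: str) -> List[str]:
    # Stand-in for distance-based search (assumption): a plain dictionary lookup.
    return TOY_INDEX.get(query, [])


def unique_union(result_lists: Iterable[List[str]]) -> List[str]:
    # Merge per-query results, keeping only the first occurrence of each document.
    seen = set()
    merged: List[str] = []
    for results in result_lists:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


def multi_query_retrieve(
    question: str,
    generate_queries: Callable[[str], List[str]],
    search: Callable[[str], List[str]],
) -> List[str]:
    # 1) expand the question into several queries, 2) search with each query,
    # 3) combine the results and drop duplicates.
    queries = generate_queries(question)
    return unique_union(search(q) for q in queries)


if __name__ == "__main__":
    docs = multi_query_retrieve(
        "What are the key features and architecture of LangChain?",
        toy_generate_queries,
        toy_search,
    )
    print(f"Number of retrieved documents: {len(docs)}")
    print(docs)
```

Because duplicates are dropped after merging, the number of retrieved documents depends on how much the per-query results overlap, which is consistent with the count in the recorded output changing when the generated queries change between runs.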