From cf7775bd2b3ccb5303f08c04fa45578401bb3d9d Mon Sep 17 00:00:00 2001
From: Cha Hwa Young <chahwayoung214@gmail.com>
Date: Wed, 8 Jan 2025 23:58:36 +0900
Subject: [PATCH 1/8] [E-2]11-Reranker/04-FlashRank-Reranker

---
 11-Reranker/04-FlashRank-Reranker.ipynb | 387 ++++++++++++++++++++++++
 11-Reranker/data/appendix-keywords.txt  | 153 ++++++++++
 2 files changed, 540 insertions(+)
 create mode 100644 11-Reranker/04-FlashRank-Reranker.ipynb
 create mode 100644 11-Reranker/data/appendix-keywords.txt
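
The notebook's Overview describes FlashRank as a reranker built on cross-encoders. Each (query, document) pair is scored jointly, and the first-stage candidates are reordered by that score. The sketch below shows only this reorder-and-truncate step, in plain Python and under stated assumptions:

- `lexical_overlap` is a hypothetical stand-in for the cross-encoder. It only measures shared query terms.
- `top_n` and `score_threshold` merely mirror field names we believe `FlashrankRerank` exposes. Check them against your installed `langchain_community` version rather than relying on this sketch.

```python
from typing import Callable, List, Optional, Tuple


def lexical_overlap(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder (not FlashRank's model):
    the fraction of query terms that also appear in the document."""
    query_terms = set(query.lower().split())
    doc_terms = set(doc.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def rerank(
    query: str,
    docs: List[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 3,
    score_threshold: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Score every candidate against the query, sort by score (highest first),
    optionally drop low scores, and keep the best top_n."""
    scored = [(doc, score_fn(query, doc)) for doc in docs]
    # list.sort is stable even with reverse=True, so ties keep the retriever's original order
    scored.sort(key=lambda pair: pair[1], reverse=True)
    if score_threshold is not None:
        scored = [(doc, score) for doc, score in scored if score >= score_threshold]
    return scored[:top_n]


candidates = [
    "Embedding converts words into vectors",
    "Word2Vec maps words to a vector space",
    "TF-IDF measures word importance",
]
for doc, score in rerank("word2vec vector space", candidates, lexical_overlap, top_n=2):
    print(f"{score:.2f}  {doc}")
```

In a real pipeline, the pieces would be different:

- `score_fn` would be the cross-encoder's relevance score.
- The candidates would be the 10 documents the notebook's retriever returns (`k=10`).

Sorting the whole list is fine at that size. For much larger candidate sets, `heapq.nlargest(top_n, ...)` avoids a full sort.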

diff --git a/11-Reranker/04-FlashRank-Reranker.ipynb b/11-Reranker/04-FlashRank-Reranker.ipynb
new file mode 100644
index 000000000..9d2922df4
--- /dev/null
+++ b/11-Reranker/04-FlashRank-Reranker.ipynb
@@ -0,0 +1,387 @@
+{
+ "cells": [
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "# FlashRank Reranker\n",
+    "\n",
+    "- Author: [Hwayoung Cha](https://github.com/forwardyoung)\n",
+    "- Design: []()\n",
+    "- Peer Review: []()\n",
+    "\n",
+    "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain-academy/blob/main/module-4/sub-graph.ipynb) [![Open in LangChain Academy](https://cdn.prod.website-files.com/65b8cd72835ceeacd4449a53/66e9eba12c7b7688aa3dbb5e_LCA-badge-green.svg)](https://academy.langchain.com/courses/take/intro-to-langgraph/lessons/58239937-lesson-2-sub-graphs)\n",
+    "\n",
+    "## Overview\n",
+    "\n",
+    "> [FlashRank](https://github.com/PrithivirajDamodaran/FlashRank) is an ultra-lightweight, ultra-fast Python library for adding reranking to existing search and retrieval pipelines. It is built on state-of-the-art (SoTA) cross-encoders.\n",
+    "\n",
+    "This notebook shows how to use `FlashrankRerank` in LangChain to rerank retrieval results. It builds a FAISS retriever over a small keyword glossary, wraps it in a `ContextualCompressionRetriever` with `FlashrankRerank` as the compressor, and compares the documents returned before and after reranking.\n",
+    "\n",
+    "### Table of Contents\n",
+    "\n",
+    "- [Overview](#overview)\n",
+    "- [Environment Setup](#environment-setup)\n- [FlashrankRerank](#flashrankrerank)"
+   ],
+   "id": "c69d1f48d21cd2b4"
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "## Environment Setup\n",
+    "\n",
+    "Set up the environment. You may refer to [Environment Setup](https://wikidocs.net/257836) for more details.\n",
+    "\n",
+    "**[Note]**\n",
+    "- `langchain-opentutorial` is a package that provides easy-to-use environment setup tools, along with useful functions and utilities for tutorials.\n",
+    "- You can check out the [`langchain-opentutorial`](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) for more details."
+   ],
+   "id": "c7431102d93a694f"
+  },
+  {
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2025-01-08T14:53:36.107286Z",
+     "start_time": "2025-01-08T14:53:35.957725Z"
+    }
+   },
+   "cell_type": "code",
+   "source": [
+    "# Set environment variables\n",
+    "from langchain_opentutorial import set_env\n",
+    "\n",
+    "set_env(\n",
+    "    {\n",
+    "        \"OPENAI_API_KEY\": \"\",\n",
+    "    }\n",
+    ")"
+   ],
+   "id": "501e9dfa010f326a",
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Environment variables have been set successfully.\n"
+     ]
+    }
+   ],
+   "execution_count": 1
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "Alternatively, you can set `OPENAI_API_KEY` in a `.env` file and load it.\n",
+    "\n",
+    "[Note] This is not necessary if you've already set `OPENAI_API_KEY` in previous steps."
+   ],
+   "id": "7d83ee066d91fb4f"
+  },
+  {
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2025-01-08T14:53:36.123222Z",
+     "start_time": "2025-01-08T14:53:36.108289Z"
+    }
+   },
+   "cell_type": "code",
+   "source": [
+    "# Configuration file to manage API keys as environment variables\n",
+    "from dotenv import load_dotenv\n",
+    "\n",
+    "# Load API key information\n",
+    "load_dotenv(override=True)"
+   ],
+   "id": "abed94e9253ec29e",
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "True"
+      ]
+     },
+     "execution_count": 2,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "execution_count": 2
+  },
+  {
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2025-01-08T14:53:36.127689Z",
+     "start_time": "2025-01-08T14:53:36.124407Z"
+    }
+   },
+   "cell_type": "code",
+   "source": [
+    "# install\n",
+    "# !pip install -qU flashrank"
+   ],
+   "id": "e31774e423dd76fb",
+   "outputs": [],
+   "execution_count": 3
+  },
+  {
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2025-01-08T14:53:36.133428Z",
+     "start_time": "2025-01-08T14:53:36.128691Z"
+    }
+   },
+   "cell_type": "code",
+   "source": [
+    "def pretty_print_docs(docs):\n",
+    "    print(\n",
+    "        f\"\\n{'-' * 100}\\n\".join(\n",
+    "            [\n",
+    "                f\"Document {i+1}:\\n\\n{d.page_content}\\nMetadata: {d.metadata}\"\n",
+    "                for i, d in enumerate(docs)\n",
+    "            ]\n",
+    "        )\n",
+    "    )"
+   ],
+   "id": "43856bcf1e8f0c63",
+   "outputs": [],
+   "execution_count": 4
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "## FlashrankRerank\n",
+    "\n",
+    "Load a small sample dataset, split it into chunks, and build a FAISS retriever that returns the top 10 documents for a query."
+   ],
+   "id": "1c7d03faa97bf809"
+  },
+  {
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2025-01-08T14:53:41.320899Z",
+     "start_time": "2025-01-08T14:53:36.134653Z"
+    }
+   },
+   "cell_type": "code",
+   "source": [
+    "from langchain_community.document_loaders import TextLoader\n",
+    "from langchain_community.vectorstores import FAISS\n",
+    "from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
+    "from langchain_openai import OpenAIEmbeddings\n",
+    "\n",
+    "# Load the documents\n",
+    "documents = TextLoader(\"./data/appendix-keywords.txt\").load()\n",
+    "\n",
+    "# Initialize the text splitter\n",
+    "text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)\n",
+    "\n",
+    "# Split the documents\n",
+    "texts = text_splitter.split_documents(documents)\n",
+    "\n",
+    "# Add a unique ID to each text\n",
+    "for idx, text in enumerate(texts):\n",
+    "    text.metadata[\"id\"] = idx\n",
+    "    \n",
+    "# Initialize the retriever\n",
+    "retriever = FAISS.from_documents(\n",
+    "    texts, OpenAIEmbeddings()\n",
+    ").as_retriever(search_kwargs={\"k\": 10})\n",
+    "\n",
+    "# query\n",
+    "query = \"Tell me about Word2Vec\"\n",
+    "\n",
+    "# Search for documents\n",
+    "docs = retriever.invoke(query)\n",
+    "\n",
+    "# Print the retrieved documents\n",
+    "pretty_print_docs(docs)"
+   ],
+   "id": "79d934121fd476be",
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Document 1:\n",
+      "\n",
+      "Word2Vec\n",
+      "Definition: Word2Vec is a technique in NLP that maps words to a vector space, representing their semantic relationships based on context.\n",
+      "Example: In a Word2Vec model, \"king\" and \"queen\" are represented by vectors located close to each other.\n",
+      "Related Keywords: Natural Language Processing (NLP), Embedding, Semantic Similarity\n",
+      "Metadata: {'source': './data/appendix-keywords.txt', 'id': 12}\n",
+      "----------------------------------------------------------------------------------------------------\n",
+      "Document 2:\n",
+      "\n",
+      "Embedding\n",
+      "Definition: Embedding is the process of converting textual data, such as words or sentences, into low-dimensional continuous vectors that computers can process and understand.\n",
+      "Example: The word \"apple\" can be represented as a vector like [0.65, -0.23, 0.17].\n",
+      "Related Keywords: Natural Language Processing (NLP), Vectorization, Deep Learning\n",
+      "Metadata: {'source': './data/appendix-keywords.txt', 'id': 1}\n",
+      "----------------------------------------------------------------------------------------------------\n",
+      "Document 3:\n",
+      "\n",
+      "VectorStore\n",
+      "Definition: A VectorStore is a system designed to store data in vector format, enabling efficient retrieval, classification, and analysis tasks.\n",
+      "Example: Storing word embedding vectors in a database for quick access during semantic search.\n",
+      "Related Keywords: Embedding, Database, Vectorization\n",
+      "Metadata: {'source': './data/appendix-keywords.txt', 'id': 4}\n",
+      "----------------------------------------------------------------------------------------------------\n",
+      "Document 4:\n",
+      "\n",
+      "TF-IDF (Term Frequency-Inverse Document Frequency)\n",
+      "Definition: TF-IDF is a statistical measure used to evaluate the importance of a word within a document by considering its frequency and rarity across a corpus.\n",
+      "Example: Words with high TF-IDF values are often unique and critical for understanding the document.\n",
+      "Related Keywords: Natural Language Processing (NLP), Information Retrieval, Data Mining\n",
+      "Metadata: {'source': './data/appendix-keywords.txt', 'id': 18}\n",
+      "----------------------------------------------------------------------------------------------------\n",
+      "Document 5:\n",
+      "\n",
+      "GPT (Generative Pretrained Transformer)\n",
+      "Definition: GPT is a generative language model pre-trained on vast datasets, capable of performing various text-based tasks. It generates natural and coherent text based on input.\n",
+      "Example: A chatbot generating detailed answers to user queries is powered by GPT models.\n",
+      "Related Keywords: Natural Language Processing (NLP), Text Generation, Deep Learning\n",
+      "Metadata: {'source': './data/appendix-keywords.txt', 'id': 24}\n",
+      "----------------------------------------------------------------------------------------------------\n",
+      "Document 6:\n",
+      "\n",
+      "Transformer\n",
+      "Definition: A Transformer is a type of deep learning model widely used in natural language processing tasks like translation, summarization, and text generation. It is based on the Attention mechanism.\n",
+      "Example: Google Translate utilizes a Transformer model for multilingual translation.\n",
+      "Related Keywords: Deep Learning, Natural Language Processing (NLP), Attention mechanism\n",
+      "Metadata: {'source': './data/appendix-keywords.txt', 'id': 8}\n",
+      "----------------------------------------------------------------------------------------------------\n",
+      "Document 7:\n",
+      "\n",
+      "LLM (Large Language Model)\n",
+      "Definition: LLMs are massive language models trained on large-scale text data, used for various natural language understanding and generation tasks.\n",
+      "Example: OpenAI's GPT series is a prominent example of LLMs.\n",
+      "Related Keywords: Natural Language Processing (NLP), Deep Learning, Text Generation\n",
+      "Metadata: {'source': './data/appendix-keywords.txt', 'id': 13}\n",
+      "----------------------------------------------------------------------------------------------------\n",
+      "Document 8:\n",
+      "\n",
+      "HuggingFace\n",
+      "Definition: HuggingFace is a library offering pre-trained models and tools for natural language processing, making NLP tasks accessible to researchers and developers.\n",
+      "Example: HuggingFace's Transformers library can be used for sentiment analysis and text generation.\n",
+      "Related Keywords: Natural Language Processing (NLP), Deep Learning, Library.\n",
+      "Metadata: {'source': './data/appendix-keywords.txt', 'id': 9}\n",
+      "----------------------------------------------------------------------------------------------------\n",
+      "Document 9:\n",
+      "\n",
+      "Tokenizer\n",
+      "Definition: A tokenizer is a tool that splits text data into tokens, often used for preprocessing in natural language processing tasks.\n",
+      "Example: The sentence \"I love programming.\" is tokenized into [\"I\", \"love\", \"programming\", \".\"].\n",
+      "Related Keywords: Tokenization, Natural Language Processing (NLP), Syntax Analysis.\n",
+      "Metadata: {'source': './data/appendix-keywords.txt', 'id': 3}\n",
+      "----------------------------------------------------------------------------------------------------\n",
+      "Document 10:\n",
+      "\n",
+      "Semantic Search\n",
+      "Definition: Semantic search is a search technique that understands the meaning of a user's query beyond simple keyword matching, returning results that are contextually relevant.\n",
+      "Example: If a user searches for \"planets in the solar system,\" the system provides information about planets like Jupiter and Mars.\n",
+      "Related Keywords: Natural Language Processing (NLP), Search Algorithms, Data Mining\n",
+      "Metadata: {'source': './data/appendix-keywords.txt', 'id': 0}\n"
+     ]
+    }
+   ],
+   "execution_count": 5
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "Now, let's wrap the base `retriever` with a `ContextualCompressionRetriever` and use `FlashrankRerank` as the compressor.",
+   "id": "ea07e244c9171d26"
+  },
+  {
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2025-01-08T14:53:42.781060Z",
+     "start_time": "2025-01-08T14:53:41.323926Z"
+    }
+   },
+   "cell_type": "code",
+   "source": [
+    "from langchain.retrievers import ContextualCompressionRetriever\n",
+    "from langchain.retrievers.document_compressors import FlashrankRerank\n",
+    "from flashrank import Ranker  # imported so FlashrankRerank.model_rebuild() can resolve this type\n",
+    "\n",
+    "# Rebuild the model so Pydantic can resolve its `Ranker` annotation (avoids PydanticUserError)\n",
+    "FlashrankRerank.model_rebuild()\n",
+    "\n",
+    "# Initialize the FlashrankRerank compressor\n",
+    "compressor = FlashrankRerank(model=\"ms-marco-MultiBERT-L-12\")\n",
+    "\n",
+    "# Initialize the ContextualCompressionRetriever\n",
+    "compression_retriever = ContextualCompressionRetriever(\n",
+    "    base_compressor=compressor, base_retriever=retriever\n",
+    ")\n",
+    "\n",
+    "# Search for compressed documents\n",
+    "compressed_docs = compression_retriever.invoke(\n",
+    "    \"Tell me about Word2Vec.\"\n",
+    ")\n",
+    "\n",
+    "# Print the document ID\n",
+    "print([doc.metadata[\"id\"] for doc in compressed_docs])"
+   ],
+   "id": "23a21f9f025132c5",
+   "outputs": [
+    {
+     "ename": "PydanticUserError",
+     "evalue": "`FlashrankRerank` is not fully defined; you should define `Ranker`, then call `FlashrankRerank.model_rebuild()`.\n\nFor further information visit https://errors.pydantic.dev/2.9/u/class-not-fully-defined",
+     "output_type": "error",
+     "traceback": [
+      "\u001B[1;31m---------------------------------------------------------------------------\u001B[0m",
+      "\u001B[1;31mPydanticUserError\u001B[0m                         Traceback (most recent call last)",
+      "Cell \u001B[1;32mIn[6], line 9\u001B[0m\n\u001B[0;32m      6\u001B[0m llm \u001B[38;5;241m=\u001B[39m ChatOpenAI(temperature\u001B[38;5;241m=\u001B[39m\u001B[38;5;241m0\u001B[39m)\n\u001B[0;32m      8\u001B[0m \u001B[38;5;66;03m# Initialize the FlshrankRerank\u001B[39;00m\n\u001B[1;32m----> 9\u001B[0m compressor \u001B[38;5;241m=\u001B[39m \u001B[43mFlashrankRerank\u001B[49m\u001B[43m(\u001B[49m\u001B[43mmodel\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[38;5;124;43mms-marco-MultiBERT-L-12\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[43m)\u001B[49m\n\u001B[0;32m     11\u001B[0m \u001B[38;5;66;03m# Initialize the ContextualCompressioinRetriever\u001B[39;00m\n\u001B[0;32m     12\u001B[0m compression_retriever \u001B[38;5;241m=\u001B[39m ContextualCompressionRetriever(\n\u001B[0;32m     13\u001B[0m     base_compressor\u001B[38;5;241m=\u001B[39mcompressor, base_retriever\u001B[38;5;241m=\u001B[39mretriever\n\u001B[0;32m     14\u001B[0m )\n",
+      "    \u001B[1;31m[... skipping hidden 1 frame]\u001B[0m\n",
+      "File \u001B[1;32m~\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\langchain-opentutorial-GHgbjDj7-py3.11\\Lib\\site-packages\\pydantic\\_internal\\_mock_val_ser.py:99\u001B[0m, in \u001B[0;36mMockValSer.__getattr__\u001B[1;34m(self, item)\u001B[0m\n\u001B[0;32m     97\u001B[0m \u001B[38;5;66;03m# raise an AttributeError if `item` doesn't exist\u001B[39;00m\n\u001B[0;32m     98\u001B[0m \u001B[38;5;28mgetattr\u001B[39m(\u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39m_val_or_ser, item)\n\u001B[1;32m---> 99\u001B[0m \u001B[38;5;28;01mraise\u001B[39;00m PydanticUserError(\u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39m_error_message, code\u001B[38;5;241m=\u001B[39m\u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39m_code)\n",
+      "\u001B[1;31mPydanticUserError\u001B[0m: `FlashrankRerank` is not fully defined; you should define `Ranker`, then call `FlashrankRerank.model_rebuild()`.\n\nFor further information visit https://errors.pydantic.dev/2.9/u/class-not-fully-defined"
+     ]
+    }
+   ],
+   "execution_count": 6
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "Compare the results after the reranker is applied with the original retriever output above.",
+   "id": "4a147fc787860bac"
+  },
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": [
+    "# Print the reranked (compressed) documents\n",
+    "pretty_print_docs(compressed_docs)"
+   ],
+   "id": "732f27a4e8b3d4cd",
+   "outputs": [],
+   "execution_count": null
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/11-Reranker/data/appendix-keywords.txt b/11-Reranker/data/appendix-keywords.txt
new file mode 100644
index 000000000..940a19186
--- /dev/null
+++ b/11-Reranker/data/appendix-keywords.txt
@@ -0,0 +1,153 @@
+Semantic Search
+Definition: Semantic search is a search technique that understands the meaning of a user's query beyond simple keyword matching, returning results that are contextually relevant.
+Example: If a user searches for "planets in the solar system," the system provides information about planets like Jupiter and Mars.
+Related Keywords: Natural Language Processing (NLP), Search Algorithms, Data Mining
+
+Embedding
+Definition: Embedding is the process of converting textual data, such as words or sentences, into low-dimensional continuous vectors that computers can process and understand.
+Example: The word "apple" can be represented as a vector like [0.65, -0.23, 0.17].
+Related Keywords: Natural Language Processing (NLP), Vectorization, Deep Learning
+
+Token
+Definition: A token refers to a smaller unit of text obtained by splitting a larger piece of text. It can be a word, phrase, or sentence.
+Example: The sentence "I go to school" can be tokenized into "I," "go," "to," and "school."
+Related Keywords: Tokenization, Natural Language Processing (NLP), Syntax Analysis
+
+Tokenizer
+Definition: A tokenizer is a tool that splits text data into tokens, often used for preprocessing in natural language processing tasks.
+Example: The sentence "I love programming." is tokenized into ["I", "love", "programming", "."].
+Related Keywords: Tokenization, Natural Language Processing (NLP), Syntax Analysis.
+
+VectorStore
+Definition: A VectorStore is a system designed to store data in vector format, enabling efficient retrieval, classification, and analysis tasks.
+Example: Storing word embedding vectors in a database for quick access during semantic search.
+Related Keywords: Embedding, Database, Vectorization
+
+SQL
+Definition: SQL (Structured Query Language) is a programming language for managing data in databases. 
+It allows you to perform various operations such as querying, updating, inserting, and deleting data.
+Example: SELECT * FROM users WHERE age > 18; retrieves information about users aged above 18.
+Related Keywords: Database, Query, Data Management
+
+CSV
+Definition: CSV (Comma-Separated Values) is a file format used for storing tabular data, where each value is separated by a comma.
+Example: A CSV file with headers "Name, Age, Occupation" may contain data like "John, 30, Developer."
+Related Keywords: Data Format, File Processing, Data Exchange
+
+JSON
+Definition: JSON (JavaScript Object Notation) is a lightweight data-interchange format that represents data objects using readable text for both humans and machines.
+Example: {"Name": "John", "Age": 30, "Occupation": "Developer"} is a JSON object.
+Related Keywords: Data Exchange, Web Development, API
+
+Transformer
+Definition: A Transformer is a type of deep learning model widely used in natural language processing tasks like translation, summarization, and text generation. It is based on the Attention mechanism.
+Example: Google Translate utilizes a Transformer model for multilingual translation.
+Related Keywords: Deep Learning, Natural Language Processing (NLP), Attention mechanism
+
+HuggingFace
+Definition: HuggingFace is a library offering pre-trained models and tools for natural language processing, making NLP tasks accessible to researchers and developers.
+Example: HuggingFace's Transformers library can be used for sentiment analysis and text generation.
+Related Keywords: Natural Language Processing (NLP), Deep Learning, Library.
+
+Digital Transformation
+Definition: Digital transformation refers to the integration of technology to innovate services, culture, and operations within a company, enhancing competitiveness and business models.
+Example: A company adopting cloud computing to revolutionize data storage and processing demonstrates digital transformation.
+Related Keywords: Innovation, Technology, Business Model
+
+Crawling
+Definition: Crawling is the automated process of visiting web pages to gather data, commonly used for search engine optimization and data analysis.
+Example: Google Search Engine crawls websites to collect and index content.
+Related Keywords: Data Collection, Web Scraping, Search Engine
+
+Word2Vec
+Definition: Word2Vec is a technique in NLP that maps words to a vector space, representing their semantic relationships based on context.
+Example: In a Word2Vec model, "king" and "queen" are represented by vectors located close to each other.
+Related Keywords: Natural Language Processing (NLP), Embedding, Semantic Similarity
+
+LLM (Large Language Model)
+Definition: LLMs are massive language models trained on large-scale text data, used for various natural language understanding and generation tasks.
+Example: OpenAI's GPT series is a prominent example of LLMs.
+Related Keywords: Natural Language Processing (NLP), Deep Learning, Text Generation
+
+FAISS (Facebook AI Similarity Search)
+Definition: FAISS is a high-speed similarity search library developed by Facebook, optimized for searching large sets of vectors efficiently.
+Example: FAISS can quickly find similar images among millions of image vectors.
+Related Keywords: Vector Search, Machine Learning, Database Optimization
+
+Open Source
+Definition: Open source software allows its source code to be freely used, modified, and distributed, fostering collaboration and innovation.
+Example: The Linux operating system is a well-known open source project.
+Related Keywords: Software Development, Community, Technical Collaboration
+Structured Data
+Definition: Structured data is organized according to a specific format or schema, making it easy to search and analyze.
+Example: A customer information table in a relational database is an example of structured data.
+Related Keywords: Database, Data Analysis, Data Modeling
+
+Parser
+Definition: A parser analyzes input data (text, files, etc.) and converts it into a structured format, often used in programming language syntax analysis or file processing.
+Example: Parsing an HTML document to generate its DOM structure is an instance of parsing.
+Related Keywords: Syntax Analysis, Compiler, Data Processing
+
+TF-IDF (Term Frequency-Inverse Document Frequency)
+Definition: TF-IDF is a statistical measure used to evaluate the importance of a word within a document by considering its frequency and rarity across a corpus.
+Example: Words with high TF-IDF values are often unique and critical for understanding the document.
+Related Keywords: Natural Language Processing (NLP), Information Retrieval, Data Mining
+
+Deep Learning
+Definition: Deep learning is a subset of machine learning that uses neural networks to solve complex problems, focusing on learning high-level representations from data.
+Example: Deep learning models are used for tasks like image recognition, speech recognition, and NLP.
+Related Keywords: Artificial Neural Networks, Machine Learning, Data Analysis
+
+Schema
+Definition: A schema defines the structure of a database or file, detailing how data is organized and stored.
+Example: A relational database schema specifies column names, data types, and key constraints.
+Related Keywords: Database, Data Modeling, Data Management
+
+DataFrame
+Definition: A DataFrame is a tabular data structure with rows and columns, commonly used for data manipulation and analysis.
+Example: In Python's Pandas library, a DataFrame can contain diverse data types and support various data operations.
+Related Keywords: Data Analysis, Pandas, Data Processing
+
+Attention Mechanism
+Definition: 
+The Attention mechanism is a technique in deep learning that allows models to focus more on important information. It is primarily used in processing sequential data such as text and time series.
+Example: 
+In translation models, the Attention mechanism helps the model focus on relevant parts of the input sentence to generate accurate translations.
+Related Keywords: Deep Learning, Natural Language Processing, Sequence Modeling
+
+Pandas
+Definition: Pandas is a Python library offering tools for efficient data manipulation and analysis. It simplifies complex data operations.
+Example: Pandas can be used to load, clean, and analyze CSV files.
+Related Keywords: Data Analysis, Python, Data Processing
+
+GPT (Generative Pretrained Transformer)
+Definition: GPT is a generative language model pre-trained on vast datasets, capable of performing various text-based tasks. It generates natural and coherent text based on input.
+Example: A chatbot generating detailed answers to user queries is powered by GPT models.
+Related Keywords: Natural Language Processing (NLP), Text Generation, Deep Learning
+
+InstructGPT
+Definition: 
+InstructGPT is an optimized GPT model designed to perform specific tasks based on user instructions. It is built to generate more accurate and relevant results in response to given commands.
+Example: When a user provides a specific instruction like "Draft an email," InstructGPT generates an email based on the provided content.
+Related Keywords: Artificial Intelligence, Natural Language Understanding, Command-Based Processing
+
+Keyword Search
+Definition: Keyword search involves finding information based on user-inputted keywords, commonly used in search engines and database systems.
+Example:
+When a user searches for "coffee shops in Seoul," the system returns a list of relevant coffee shops.
+Related Keywords: Search Engine, Data Search, Information Retrieval
+
+Page Rank
+Definition: Page Rank is an algorithm for evaluating the importance of web pages, primarily used to rank search engine results. It analyzes the link structure of websites.
+Example: Google uses Page Rank to determine the order of search results.
+Related Keywords: Search Engine Optimization, Web Analytics, Link Analysis
+
+Data Mining
+Definition: Data mining is the process of extracting useful information from large datasets using techniques like statistics, machine learning, and pattern recognition.
+Example: Retailers analyzing customer purchase data to devise sales strategies is an application of data mining.
+Related Keywords: Big Data, Pattern Recognition, Predictive Analytics
+
+Multimodal
+Definition: Multimodal refers to combining and processing multiple types of data (e.g., text, images, and sound) to extract richer insights and predictions.
+Example: A system analyzing both images and captions to perform accurate image classification demonstrates multimodal technology.
+Related Keywords: Data Fusion, Artificial Intelligence, Deep Learning
\ No newline at end of file
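
The notebook splits `appendix-keywords.txt` (added above) with `RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)`. To make those two parameters concrete, here is a simplified, hypothetical `sliding_window_chunks` helper. It cuts fixed windows of at most `chunk_size` characters, and consecutive windows share `chunk_overlap` characters.

This is an assumption-labeled illustration, not how `RecursiveCharacterTextSplitter` works internally. That splitter first splits on separators (by default paragraph breaks, then newlines, then spaces) and merges the pieces back up to `chunk_size`. Its chunk boundaries therefore follow the text structure rather than fixed offsets.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> List[str]:
    """Hypothetical helper: split text into fixed-size character windows,
    where consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window starts after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text, so no chunk is a pure overlap tail
        if start + chunk_size >= len(text):
            break
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
```

With `chunk_size=4` and `chunk_overlap=2`, each window starts 2 characters after the previous one. Every adjacent pair of chunks therefore repeats 2 characters. This is the same trade-off the notebook's `chunk_overlap=100` makes: a little duplication keeps context that would otherwise be cut at a chunk boundary.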

From 45fe3d2bc94b075fbe3e001077cddbd041db5dfc Mon Sep 17 00:00:00 2001
From: Cha Hwa Young <chahwayoung214@gmail.com>
Date: Thu, 9 Jan 2025 00:13:49 +0900
Subject: [PATCH 2/8] [E-2]11-Reranker/04-FlashRank-Reranker

---
 11-Reranker/04-FlashRank-Reranker.ipynb | 182 +++---------------------
 1 file changed, 17 insertions(+), 165 deletions(-)

diff --git a/11-Reranker/04-FlashRank-Reranker.ipynb b/11-Reranker/04-FlashRank-Reranker.ipynb
index 9d2922df4..f3c9d818c 100644
--- a/11-Reranker/04-FlashRank-Reranker.ipynb
+++ b/11-Reranker/04-FlashRank-Reranker.ipynb
@@ -4,6 +4,7 @@
    "metadata": {},
    "cell_type": "markdown",
    "source": [
+    "\n",
     "# FlashRank Reranker\n",
     "\n",
     "- Author: [Hwayoung Cha](https://github.com/forwardyoung)\n",
@@ -40,12 +41,7 @@
    "id": "c7431102d93a694f"
   },
   {
-   "metadata": {
-    "ExecuteTime": {
-     "end_time": "2025-01-08T14:53:36.107286Z",
-     "start_time": "2025-01-08T14:53:35.957725Z"
-    }
-   },
+   "metadata": {},
    "cell_type": "code",
    "source": [
     "# Set environment variables\n",
@@ -58,16 +54,8 @@
     ")"
    ],
    "id": "501e9dfa010f326a",
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Environment variables have been set successfully.\n"
-     ]
-    }
-   ],
-   "execution_count": 1
+   "outputs": [],
+   "execution_count": null
   },
   {
    "metadata": {},
@@ -80,12 +68,7 @@
    "id": "7d83ee066d91fb4f"
   },
   {
-   "metadata": {
-    "ExecuteTime": {
-     "end_time": "2025-01-08T14:53:36.123222Z",
-     "start_time": "2025-01-08T14:53:36.108289Z"
-    }
-   },
+   "metadata": {},
    "cell_type": "code",
    "source": [
     "# Configuration file to manage API keys as environment variables\n",
@@ -95,27 +78,11 @@
     "load_dotenv(override=True)"
    ],
    "id": "abed94e9253ec29e",
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "True"
-      ]
-     },
-     "execution_count": 2,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "execution_count": 2
+   "outputs": [],
+   "execution_count": null
   },
   {
-   "metadata": {
-    "ExecuteTime": {
-     "end_time": "2025-01-08T14:53:36.127689Z",
-     "start_time": "2025-01-08T14:53:36.124407Z"
-    }
-   },
+   "metadata": {},
    "cell_type": "code",
    "source": [
     "# install\n",
@@ -123,15 +90,10 @@
    ],
    "id": "e31774e423dd76fb",
    "outputs": [],
-   "execution_count": 3
+   "execution_count": null
   },
   {
-   "metadata": {
-    "ExecuteTime": {
-     "end_time": "2025-01-08T14:53:36.133428Z",
-     "start_time": "2025-01-08T14:53:36.128691Z"
-    }
-   },
+   "metadata": {},
    "cell_type": "code",
    "source": [
     "def pretty_print_docs(docs):\n",
@@ -146,7 +108,7 @@
    ],
    "id": "43856bcf1e8f0c63",
    "outputs": [],
-   "execution_count": 4
+   "execution_count": null
   },
   {
    "metadata": {},
@@ -159,12 +121,7 @@
    "id": "1c7d03faa97bf809"
   },
   {
-   "metadata": {
-    "ExecuteTime": {
-     "end_time": "2025-01-08T14:53:41.320899Z",
-     "start_time": "2025-01-08T14:53:36.134653Z"
-    }
-   },
+   "metadata": {},
    "cell_type": "code",
    "source": [
     "from langchain_community.document_loaders import TextLoader\n",
@@ -200,94 +157,8 @@
     "pretty_print_docs(docs)"
    ],
    "id": "79d934121fd476be",
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Document 1:\n",
-      "\n",
-      "Word2Vec\n",
-      "Definition: Word2Vec is a technique in NLP that maps words to a vector space, representing their semantic relationships based on context.\n",
-      "Example: In a Word2Vec model, \"king\" and \"queen\" are represented by vectors located close to each other.\n",
-      "Related Keywords: Natural Language Processing (NLP), Embedding, Semantic Similarity\n",
-      "Metadata: {'source': './data/appendix-keywords.txt', 'id': 12}\n",
-      "----------------------------------------------------------------------------------------------------\n",
-      "Document 2:\n",
-      "\n",
-      "Embedding\n",
-      "Definition: Embedding is the process of converting textual data, such as words or sentences, into low-dimensional continuous vectors that computers can process and understand.\n",
-      "Example: The word \"apple\" can be represented as a vector like [0.65, -0.23, 0.17].\n",
-      "Related Keywords: Natural Language Processing (NLP), Vectorization, Deep Learning\n",
-      "Metadata: {'source': './data/appendix-keywords.txt', 'id': 1}\n",
-      "----------------------------------------------------------------------------------------------------\n",
-      "Document 3:\n",
-      "\n",
-      "VectorStore\n",
-      "Definition: A VectorStore is a system designed to store data in vector format, enabling efficient retrieval, classification, and analysis tasks.\n",
-      "Example: Storing word embedding vectors in a database for quick access during semantic search.\n",
-      "Related Keywords: Embedding, Database, Vectorization\n",
-      "Metadata: {'source': './data/appendix-keywords.txt', 'id': 4}\n",
-      "----------------------------------------------------------------------------------------------------\n",
-      "Document 4:\n",
-      "\n",
-      "TF-IDF (Term Frequency-Inverse Document Frequency)\n",
-      "Definition: TF-IDF is a statistical measure used to evaluate the importance of a word within a document by considering its frequency and rarity across a corpus.\n",
-      "Example: Words with high TF-IDF values are often unique and critical for understanding the document.\n",
-      "Related Keywords: Natural Language Processing (NLP), Information Retrieval, Data Mining\n",
-      "Metadata: {'source': './data/appendix-keywords.txt', 'id': 18}\n",
-      "----------------------------------------------------------------------------------------------------\n",
-      "Document 5:\n",
-      "\n",
-      "GPT (Generative Pretrained Transformer)\n",
-      "Definition: GPT is a generative language model pre-trained on vast datasets, capable of performing various text-based tasks. It generates natural and coherent text based on input.\n",
-      "Example: A chatbot generating detailed answers to user queries is powered by GPT models.\n",
-      "Related Keywords: Natural Language Processing (NLP), Text Generation, Deep Learning\n",
-      "Metadata: {'source': './data/appendix-keywords.txt', 'id': 24}\n",
-      "----------------------------------------------------------------------------------------------------\n",
-      "Document 6:\n",
-      "\n",
-      "Transformer\n",
-      "Definition: A Transformer is a type of deep learning model widely used in natural language processing tasks like translation, summarization, and text generation. It is based on the Attention mechanism.\n",
-      "Example: Google Translate utilizes a Transformer model for multilingual translation.\n",
-      "Related Keywords: Deep Learning, Natural Language Processing (NLP), Attention mechanism\n",
-      "Metadata: {'source': './data/appendix-keywords.txt', 'id': 8}\n",
-      "----------------------------------------------------------------------------------------------------\n",
-      "Document 7:\n",
-      "\n",
-      "LLM (Large Language Model)\n",
-      "Definition: LLMs are massive language models trained on large-scale text data, used for various natural language understanding and generation tasks.\n",
-      "Example: OpenAI's GPT series is a prominent example of LLMs.\n",
-      "Related Keywords: Natural Language Processing (NLP), Deep Learning, Text Generation\n",
-      "Metadata: {'source': './data/appendix-keywords.txt', 'id': 13}\n",
-      "----------------------------------------------------------------------------------------------------\n",
-      "Document 8:\n",
-      "\n",
-      "HuggingFace\n",
-      "Definition: HuggingFace is a library offering pre-trained models and tools for natural language processing, making NLP tasks accessible to researchers and developers.\n",
-      "Example: HuggingFace's Transformers library can be used for sentiment analysis and text generation.\n",
-      "Related Keywords: Natural Language Processing (NLP), Deep Learning, Library.\n",
-      "Metadata: {'source': './data/appendix-keywords.txt', 'id': 9}\n",
-      "----------------------------------------------------------------------------------------------------\n",
-      "Document 9:\n",
-      "\n",
-      "Tokenizer\n",
-      "Definition: A tokenizer is a tool that splits text data into tokens, often used for preprocessing in natural language processing tasks.\n",
-      "Example: The sentence \"I love programming.\" is tokenized into [\"I\", \"love\", \"programming\", \".\"].\n",
-      "Related Keywords: Tokenization, Natural Language Processing (NLP), Syntax Analysis.\n",
-      "Metadata: {'source': './data/appendix-keywords.txt', 'id': 3}\n",
-      "----------------------------------------------------------------------------------------------------\n",
-      "Document 10:\n",
-      "\n",
-      "Semantic Search\n",
-      "Definition: Semantic search is a search technique that understands the meaning of a user's query beyond simple keyword matching, returning results that are contextually relevant.\n",
-      "Example: If a user searches for \"planets in the solar system,\" the system provides information about planets like Jupiter and Mars.\n",
-      "Related Keywords: Natural Language Processing (NLP), Search Algorithms, Data Mining\n",
-      "Metadata: {'source': './data/appendix-keywords.txt', 'id': 0}\n"
-     ]
-    }
-   ],
-   "execution_count": 5
+   "outputs": [],
+   "execution_count": null
   },
   {
    "metadata": {},
@@ -296,12 +167,7 @@
    "id": "ea07e244c9171d26"
   },
   {
-   "metadata": {
-    "ExecuteTime": {
-     "end_time": "2025-01-08T14:53:42.781060Z",
-     "start_time": "2025-01-08T14:53:41.323926Z"
-    }
-   },
+   "metadata": {},
    "cell_type": "code",
    "source": [
     "from langchain.retrievers import ContextualCompressionRetriever\n",
@@ -328,22 +194,8 @@
     "print([doc.metadata[\"id\"] for doc in compressed_docs])"
    ],
    "id": "23a21f9f025132c5",
-   "outputs": [
-    {
-     "ename": "PydanticUserError",
-     "evalue": "`FlashrankRerank` is not fully defined; you should define `Ranker`, then call `FlashrankRerank.model_rebuild()`.\n\nFor further information visit https://errors.pydantic.dev/2.9/u/class-not-fully-defined",
-     "output_type": "error",
-     "traceback": [
-      "\u001B[1;31m---------------------------------------------------------------------------\u001B[0m",
-      "\u001B[1;31mPydanticUserError\u001B[0m                         Traceback (most recent call last)",
-      "Cell \u001B[1;32mIn[6], line 9\u001B[0m\n\u001B[0;32m      6\u001B[0m llm \u001B[38;5;241m=\u001B[39m ChatOpenAI(temperature\u001B[38;5;241m=\u001B[39m\u001B[38;5;241m0\u001B[39m)\n\u001B[0;32m      8\u001B[0m \u001B[38;5;66;03m# Initialize the FlshrankRerank\u001B[39;00m\n\u001B[1;32m----> 9\u001B[0m compressor \u001B[38;5;241m=\u001B[39m \u001B[43mFlashrankRerank\u001B[49m\u001B[43m(\u001B[49m\u001B[43mmodel\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[38;5;124;43mms-marco-MultiBERT-L-12\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[43m)\u001B[49m\n\u001B[0;32m     11\u001B[0m \u001B[38;5;66;03m# Initialize the ContextualCompressioinRetriever\u001B[39;00m\n\u001B[0;32m     12\u001B[0m compression_retriever \u001B[38;5;241m=\u001B[39m ContextualCompressionRetriever(\n\u001B[0;32m     13\u001B[0m     base_compressor\u001B[38;5;241m=\u001B[39mcompressor, base_retriever\u001B[38;5;241m=\u001B[39mretriever\n\u001B[0;32m     14\u001B[0m )\n",
-      "    \u001B[1;31m[... skipping hidden 1 frame]\u001B[0m\n",
-      "File \u001B[1;32m~\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\langchain-opentutorial-GHgbjDj7-py3.11\\Lib\\site-packages\\pydantic\\_internal\\_mock_val_ser.py:99\u001B[0m, in \u001B[0;36mMockValSer.__getattr__\u001B[1;34m(self, item)\u001B[0m\n\u001B[0;32m     97\u001B[0m \u001B[38;5;66;03m# raise an AttributeError if `item` doesn't exist\u001B[39;00m\n\u001B[0;32m     98\u001B[0m \u001B[38;5;28mgetattr\u001B[39m(\u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39m_val_or_ser, item)\n\u001B[1;32m---> 99\u001B[0m \u001B[38;5;28;01mraise\u001B[39;00m PydanticUserError(\u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39m_error_message, code\u001B[38;5;241m=\u001B[39m\u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39m_code)\n",
-      "\u001B[1;31mPydanticUserError\u001B[0m: `FlashrankRerank` is not fully defined; you should define `Ranker`, then call `FlashrankRerank.model_rebuild()`.\n\nFor further information visit https://errors.pydantic.dev/2.9/u/class-not-fully-defined"
-     ]
-    }
-   ],
-   "execution_count": 6
+   "outputs": [],
+   "execution_count": null
   },
   {
    "metadata": {},

From ebdd2ae48dc8b8f26d74765a6837d809ae0033c0 Mon Sep 17 00:00:00 2001
From: Cha Hwa Young <chahwayoung214@gmail.com>
Date: Thu, 9 Jan 2025 00:17:51 +0900
Subject: [PATCH 3/8] [E-2]11-Reranker/04-FlashRank-Reranker

---
 11-Reranker/04-FlashRank-Reranker.ipynb | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/11-Reranker/04-FlashRank-Reranker.ipynb b/11-Reranker/04-FlashRank-Reranker.ipynb
index f3c9d818c..6fd3b4bd2 100644
--- a/11-Reranker/04-FlashRank-Reranker.ipynb
+++ b/11-Reranker/04-FlashRank-Reranker.ipynb
@@ -22,7 +22,8 @@
     "### Table of Contents\n",
     "\n",
     "- [Overview](#overview)\n",
-    "- [Environement Setup](#environment-setup)"
+    "- [Environment Setup](#environment-setup)\n",
+    "- [FlashRankRerank](#flashrankrerank)"
   ],
    "id": "c69d1f48d21cd2b4"
   },

From 1b8d6bffeb6c2dc1e98c95b67367731ae5338f3d Mon Sep 17 00:00:00 2001
From: Cha Hwa Young <chahwayoung214@gmail.com>
Date: Thu, 9 Jan 2025 00:29:18 +0900
Subject: [PATCH 4/8] [E-2]11-Reranker/04-FlashRank-Reranker

---
 11-Reranker/04-FlashRank-Reranker.ipynb | 139 ++++++++++++++----------
 1 file changed, 81 insertions(+), 58 deletions(-)

diff --git a/11-Reranker/04-FlashRank-Reranker.ipynb b/11-Reranker/04-FlashRank-Reranker.ipynb
index 6fd3b4bd2..61cd90c19 100644
--- a/11-Reranker/04-FlashRank-Reranker.ipynb
+++ b/11-Reranker/04-FlashRank-Reranker.ipynb
@@ -1,8 +1,9 @@
 {
  "cells": [
   {
-   "metadata": {},
    "cell_type": "markdown",
+   "id": "c69d1f48d21cd2b4",
+   "metadata": {},
    "source": [
     "\n",
     "# FlashRank Reranker\n",
@@ -24,12 +25,12 @@
     "- [Overview](#overview)\n",
     "- [Environment Setup](#environment-setup)\n",
     "- [FlashRankRerank](#flashrankrerank)"
-   ],
-   "id": "c69d1f48d21cd2b4"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "markdown",
+   "id": "c7431102d93a694f",
+   "metadata": {},
    "source": [
     "## Environment Setup\n",
     "\n",
@@ -38,12 +39,14 @@
     "**[Note]**\n",
     "- `langchain-opentutorial` is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials. \n",
     "- You can checkout the [`langchain-opentutorial`](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) for more details."
-   ],
-   "id": "c7431102d93a694f"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "code",
+   "execution_count": null,
+   "id": "501e9dfa010f326a",
+   "metadata": {},
+   "outputs": [],
    "source": [
     "# Set environment variables\n",
     "from langchain_opentutorial import set_env\n",
@@ -53,49 +56,68 @@
     "        \"OPENAI_API_KEY\": \"\",\n",
     "    }\n",
     ")"
-   ],
-   "id": "501e9dfa010f326a",
-   "outputs": [],
-   "execution_count": null
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "markdown",
+   "id": "7d83ee066d91fb4f",
+   "metadata": {},
    "source": [
     "You can alternatively set OPENAI_API_KEY in .env file and load it.\n",
     "\n",
     "[Note] This is not necessary if you've already set OPENAI_API_KEY in previous steps."
-   ],
-   "id": "7d83ee066d91fb4f"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "code",
+   "execution_count": null,
+   "id": "abed94e9253ec29e",
+   "metadata": {},
+   "outputs": [],
    "source": [
     "# Configuration file to manage API keys as environment variables\n",
     "from dotenv import load_dotenv\n",
     "\n",
     "# Load API key information\n",
     "load_dotenv(override=True)"
-   ],
-   "id": "abed94e9253ec29e",
-   "outputs": [],
-   "execution_count": null
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "code",
-   "source": [
-    "# install\n",
-    "# !pip install -qU flashrank"
-   ],
-   "id": "e31774e423dd76fb",
+   "execution_count": 7,
+   "id": "687b4939",
+   "metadata": {},
    "outputs": [],
-   "execution_count": null
+   "source": [
+    "%%capture --no-stderr\n",
+    "%pip install langchain-opentutorial"
+   ]
   },
   {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "af16502c",
    "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Install required packages\n",
+    "from langchain_opentutorial import package\n",
+    "\n",
+    "package.install(\n",
+    "    [\n",
+    "        \"flashrank\"\n",
+    "    ],\n",
+    "    verbose=False,\n",
+    "    upgrade=False,\n",
+    ")"
+   ]
+  },
+  {
    "cell_type": "code",
+   "execution_count": 5,
+   "id": "43856bcf1e8f0c63",
+   "metadata": {},
+   "outputs": [],
    "source": [
     "def pretty_print_docs(docs):\n",
     "    print(\n",
@@ -106,24 +128,24 @@
     "            ]\n",
     "        )\n",
     "    )"
-   ],
-   "id": "43856bcf1e8f0c63",
-   "outputs": [],
-   "execution_count": null
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "markdown",
+   "id": "1c7d03faa97bf809",
+   "metadata": {},
    "source": [
     "## FlashrankRerank\n",
     "\n",
     "Load data for a simple example and create a retriever."
-   ],
-   "id": "1c7d03faa97bf809"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "code",
+   "execution_count": null,
+   "id": "79d934121fd476be",
+   "metadata": {},
+   "outputs": [],
    "source": [
     "from langchain_community.document_loaders import TextLoader\n",
     "from langchain_community.vectorstores import FAISS\n",
@@ -156,20 +178,22 @@
     "\n",
     "# Print the document\n",
     "pretty_print_docs(docs)"
-   ],
-   "id": "79d934121fd476be",
-   "outputs": [],
-   "execution_count": null
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "markdown",
-   "source": "Now, let's wrap the base `retriever` with a `ContextualCompressionRetriever` and use `FlashrankRerank` as the compressor.",
-   "id": "ea07e244c9171d26"
+   "id": "ea07e244c9171d26",
+   "metadata": {},
+   "source": [
+    "Now, let's wrap the base `retriever` with a `ContextualCompressionRetriever` and use `FlashrankRerank` as the compressor."
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "code",
+   "execution_count": null,
+   "id": "23a21f9f025132c5",
+   "metadata": {},
+   "outputs": [],
    "source": [
     "from langchain.retrievers import ContextualCompressionRetriever\n",
     "from langchain.retrievers.document_compressors import FlashrankRerank\n",
@@ -193,46 +217,45 @@
     "\n",
     "# Print the document ID\n",
     "print([doc.metadata[\"id\"] for doc in compressed_docs])"
-   ],
-   "id": "23a21f9f025132c5",
-   "outputs": [],
-   "execution_count": null
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "markdown",
-   "source": "Compare the results after reanker is applied.",
-   "id": "4a147fc787860bac"
+   "id": "4a147fc787860bac",
+   "metadata": {},
+   "source": [
+    "Compare the results after the reranker is applied."
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "code",
+   "execution_count": null,
+   "id": "732f27a4e8b3d4cd",
+   "metadata": {},
+   "outputs": [],
    "source": [
     "# Print the results of document compressions\n",
     "pretty_print_docs(compressed_docs)"
-   ],
-   "id": "732f27a4e8b3d4cd",
-   "outputs": [],
-   "execution_count": null
+   ]
   }
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": "langchain-opentutorial-GHgbjDj7-py3.11",
    "language": "python",
    "name": "python3"
   },
   "language_info": {
    "codemirror_mode": {
     "name": "ipython",
-    "version": 2
+    "version": 3
    },
    "file_extension": ".py",
    "mimetype": "text/x-python",
    "name": "python",
    "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython2",
-   "version": "2.7.6"
+   "pygments_lexer": "ipython3",
+   "version": "3.11.3"
   }
  },
  "nbformat": 4,

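For readers reviewing the `ContextualCompressionRetriever` + `FlashrankRerank` cells in this patch, here is a minimal, framework-free sketch of the rerank-then-truncate idea: the base retriever's candidates are rescored against the query, sorted by score, and only the top `top_n` are kept. This is not LangChain's or FlashRank's implementation. `Doc`, `overlap_score`, and `toy_rerank` are hypothetical names, and the word-overlap scorer is an assumed stand-in for FlashRank's cross-encoder, which scores each (query, document) pair with a neural model.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def overlap_score(query: str, text: str) -> float:
    """Hypothetical relevance score: fraction of query words present in the text.

    A real cross-encoder scores the (query, text) pair jointly with a model;
    this word-overlap function only stands in for that step.
    """
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


def toy_rerank(query: str, docs: list[Doc], top_n: int = 3) -> list[Doc]:
    """Rescore retrieved docs, sort by score (descending), keep the top_n."""
    scored = []
    for doc in docs:
        score = overlap_score(query, doc.page_content)
        # Copy the metadata so the caller's documents are not mutated,
        # and record the score so the new order can be inspected
        scored.append(Doc(doc.page_content, {**doc.metadata, "relevance_score": score}))
    # list.sort() is stable, so ties keep the base retriever's original order
    scored.sort(key=lambda d: d.metadata["relevance_score"], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    # Toy inputs in the base retriever's (vector-similarity) order
    retrieved = [
        Doc("Word2Vec maps words to a vector space", {"id": 12}),
        Doc("Embedding converts text into vectors", {"id": 1}),
        Doc("A tokenizer splits text into tokens", {"id": 3}),
        Doc("Semantic search understands the meaning of a query", {"id": 0}),
    ]
    reranked = toy_rerank("what is semantic search", retrieved, top_n=2)
    print([d.metadata["id"] for d in reranked])  # prints [0, 12]
```

The last-retrieved document moves to the front because it is the only one sharing words with the query; the remaining slot goes to the highest-ranked tie in the original order.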
From 761f5056564908009758cf2274e1e52bd1317404 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=EB=B0=B0=EA=B8=B0=EB=AF=BC?=
 <53887180+BAEM1N@users.noreply.github.com>
Date: Thu, 9 Jan 2025 19:39:22 +0900
Subject: [PATCH 5/8] [I] requirements.txt / update requirements.txt

Restrict python-magic-bin installation to non-macOS platforms in requirements.txt
---
 requirements.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/requirements.txt b/requirements.txt
index cf2daac52..ec38759e8 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -30,7 +30,8 @@ langchain-neo4j
 langchain-mongodb
 fastembed
 certifi
-python-magic-bin
 pymongo
 langchain_qdrant
 
+# Restrict python-magic-bin installation (exclude macOS)
+python-magic-bin; sys_platform != "darwin"

From 6a5ffe1f3a15a37fff5169074c0a49bee071764e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=EB=B0=B0=EA=B8=B0=EB=AF=BC?=
 <53887180+BAEM1N@users.noreply.github.com>
Date: Thu, 9 Jan 2025 19:52:01 +0900
Subject: [PATCH 6/8] [I] requirements.txt / revert requirements.txt

Revert commit due to incorrect branch application
---
 requirements.txt | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/requirements.txt b/requirements.txt
index ec38759e8..e000b026d 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -30,8 +30,6 @@ langchain-neo4j
 langchain-mongodb
 fastembed
 certifi
+python-magic-bin
 pymongo
 langchain_qdrant
-
-# Restrict python-magic-bin installation (exclude macOS)
-python-magic-bin; sys_platform != "darwin"

From 1646a7505aeb3f740d7ce2147bd4c43829252aff Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=EB=B0=B0=EA=B8=B0=EB=AF=BC?=
 <53887180+BAEM1N@users.noreply.github.com>
Date: Thu, 9 Jan 2025 19:53:53 +0900
Subject: [PATCH 7/8] [I] requirements.txt / revert requirements.txt


From b4604023f8d1d32009f56059d241df55af44acb5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=EB=B0=B0=EA=B8=B0=EB=AF=BC?=
 <53887180+BAEM1N@users.noreply.github.com>
Date: Thu, 9 Jan 2025 19:54:21 +0900
Subject: [PATCH 8/8] [I] requirements.txt / revert requirements.txt