From cf7775bd2b3ccb5303f08c04fa45578401bb3d9d Mon Sep 17 00:00:00 2001 From: Cha Hwa Young Date: Wed, 8 Jan 2025 23:58:36 +0900 Subject: [PATCH 1/8] [E-2]11-Reranker/04-FlashRank-Reranker --- 11-Reranker/04-FlashRank-Reranker.ipynb | 387 ++++++++++++++++++++++++ 11-Reranker/data/appendix-keywords.txt | 153 ++++++++++ 2 files changed, 540 insertions(+) create mode 100644 11-Reranker/04-FlashRank-Reranker.ipynb create mode 100644 11-Reranker/data/appendix-keywords.txt diff --git a/11-Reranker/04-FlashRank-Reranker.ipynb b/11-Reranker/04-FlashRank-Reranker.ipynb new file mode 100644 index 000000000..9d2922df4 --- /dev/null +++ b/11-Reranker/04-FlashRank-Reranker.ipynb @@ -0,0 +1,387 @@ +{ + "cells": [ + { + "metadata": {}, + "cell_type": "markdown", + "source": [ + "# FlashRank Reranker\n", + "\n", + "- Author: [Hwayoung Cha](https://github.com/forwardyoung)\n", + "- Design: []()\n", + "- Peer Review: []()\n", + "\n", + "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain-academy/blob/main/module-4/sub-graph.ipynb) [![Open in LangChain Academy](https://cdn.prod.website-files.com/65b8cd72835ceeacd4449a53/66e9eba12c7b7688aa3dbb5e_LCA-badge-green.svg)](https://academy.langchain.com/courses/take/intro-to-langgraph/lessons/58239937-lesson-2-sub-graphs)\n", + "\n", + "## Overview\n", + "\n", + "> [FlashRank](https://github.com/PrithivirajDamodaran/FlashRank) is an ultra-lightweight and ultra-fast Python library designed to add reranking to existing search and `retrieval` pipelines. It is based on state-of-the-art (`SoTA`) `cross-encoders`.\n", + "\n", + "This notebook introduces the use of `FlashRank-Reranker` within the LangChain framework, showcasing how to apply reranking techniques to improve the quality of search or `retrieval` results. 
It provides practical code examples and explanations for integrating `FlashRank` into a LangChain pipeline, highlighting its efficiency and effectiveness. The focus is on leveraging `FlashRank`'s capabilities to enhance the ranking of outputs in a streamlined and scalable way.\n", + "\n", + "### Table of Contents\n", + "\n", + "- [Overview](#overview)\n", + "- [Environment Setup](#environment-setup)" + ], + "id": "c69d1f48d21cd2b4" + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": [ + "## Environment Setup\n", + "\n", + "Set up the environment. You may refer to [Environment Setup](https://wikidocs.net/257836) for more details.\n", + "\n", + "**[Note]**\n", + "- `langchain-opentutorial` is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials. \n", + "- You can check out the [`langchain-opentutorial`](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) for more details." + ], + "id": "c7431102d93a694f" + }, + { + "metadata": { + "ExecuteTime": { + "end_time": "2025-01-08T14:53:36.107286Z", + "start_time": "2025-01-08T14:53:35.957725Z" + } + }, + "cell_type": "code", + "source": [ + "# Set environment variables\n", + "from langchain_opentutorial import set_env\n", + "\n", + "set_env(\n", + " {\n", + " \"OPENAI_API_KEY\": \"\",\n", + " }\n", + ")" + ], + "id": "501e9dfa010f326a", + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Environment variables have been set successfully.\n" + ] + } + ], + "execution_count": 1 + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": [ + "You can alternatively set OPENAI_API_KEY in a .env file and load it.\n", + "\n", + "[Note] This is not necessary if you've already set OPENAI_API_KEY in previous steps."
+ ], + "id": "7d83ee066d91fb4f" + }, + { + "metadata": { + "ExecuteTime": { + "end_time": "2025-01-08T14:53:36.123222Z", + "start_time": "2025-01-08T14:53:36.108289Z" + } + }, + "cell_type": "code", + "source": [ + "# Configuration file to manage API keys as environment variables\n", + "from dotenv import load_dotenv\n", + "\n", + "# Load API key information\n", + "load_dotenv(override=True)" + ], + "id": "abed94e9253ec29e", + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "execution_count": 2 + }, + { + "metadata": { + "ExecuteTime": { + "end_time": "2025-01-08T14:53:36.127689Z", + "start_time": "2025-01-08T14:53:36.124407Z" + } + }, + "cell_type": "code", + "source": [ + "# install\n", + "# !pip install -qU flashrank" + ], + "id": "e31774e423dd76fb", + "outputs": [], + "execution_count": 3 + }, + { + "metadata": { + "ExecuteTime": { + "end_time": "2025-01-08T14:53:36.133428Z", + "start_time": "2025-01-08T14:53:36.128691Z" + } + }, + "cell_type": "code", + "source": [ + "def pretty_print_docs(docs):\n", + " print(\n", + " f\"\\n{'-' * 100}\\n\".join(\n", + " [\n", + " f\"Document {i+1}:\\n\\n{d.page_content}\\nMetadata: {d.metadata}\"\n", + " for i, d in enumerate(docs)\n", + " ]\n", + " )\n", + " )" + ], + "id": "43856bcf1e8f0c63", + "outputs": [], + "execution_count": 4 + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": [ + "## FlashrankRerank\n", + "\n", + "Load data for a simple example and create a retriever." 
+ ], + "id": "1c7d03faa97bf809" + }, + { + "metadata": { + "ExecuteTime": { + "end_time": "2025-01-08T14:53:41.320899Z", + "start_time": "2025-01-08T14:53:36.134653Z" + } + }, + "cell_type": "code", + "source": [ + "from langchain_community.document_loaders import TextLoader\n", + "from langchain_community.vectorstores import FAISS\n", + "from langchain_text_splitters import RecursiveCharacterTextSplitter\n", + "from langchain_openai import OpenAIEmbeddings\n", + "\n", + "# Load the documents\n", + "documents = TextLoader(\"./data/appendix-keywords.txt\").load()\n", + "\n", + "# Initialize the text splitter\n", + "text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)\n", + "\n", + "# Split the documents\n", + "texts = text_splitter.split_documents(documents)\n", + "\n", + "# Add a unique ID to each text\n", + "for idx, text in enumerate(texts):\n", + " text.metadata[\"id\"] = idx\n", + " \n", + "# Initialize the retriever\n", + "retriever = FAISS.from_documents(\n", + " texts, OpenAIEmbeddings()\n", + ").as_retriever(search_kwargs={\"k\": 10})\n", + "\n", + "# query\n", + "query = \"Tell me about Word2Vec\"\n", + "\n", + "# Search for documents\n", + "docs = retriever.invoke(query)\n", + "\n", + "# Print the documents\n", + "pretty_print_docs(docs)" + ], + "id": "79d934121fd476be", + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Document 1:\n", + "\n", + "Word2Vec\n", + "Definition: Word2Vec is a technique in NLP that maps words to a vector space, representing their semantic relationships based on context.\n", + "Example: In a Word2Vec model, \"king\" and \"queen\" are represented by vectors located close to each other.\n", + "Related Keywords: Natural Language Processing (NLP), Embedding, Semantic Similarity\n", + "Metadata: {'source': './data/appendix-keywords.txt', 'id': 12}\n", + "----------------------------------------------------------------------------------------------------\n", + "Document
2:\n", + "\n", + "Embedding\n", + "Definition: Embedding is the process of converting textual data, such as words or sentences, into low-dimensional continuous vectors that computers can process and understand.\n", + "Example: The word \"apple\" can be represented as a vector like [0.65, -0.23, 0.17].\n", + "Related Keywords: Natural Language Processing (NLP), Vectorization, Deep Learning\n", + "Metadata: {'source': './data/appendix-keywords.txt', 'id': 1}\n", + "----------------------------------------------------------------------------------------------------\n", + "Document 3:\n", + "\n", + "VectorStore\n", + "Definition: A VectorStore is a system designed to store data in vector format, enabling efficient retrieval, classification, and analysis tasks.\n", + "Example: Storing word embedding vectors in a database for quick access during semantic search.\n", + "Related Keywords: Embedding, Database, Vectorization\n", + "Metadata: {'source': './data/appendix-keywords.txt', 'id': 4}\n", + "----------------------------------------------------------------------------------------------------\n", + "Document 4:\n", + "\n", + "TF-IDF (Term Frequency-Inverse Document Frequency)\n", + "Definition: TF-IDF is a statistical measure used to evaluate the importance of a word within a document by considering its frequency and rarity across a corpus.\n", + "Example: Words with high TF-IDF values are often unique and critical for understanding the document.\n", + "Related Keywords: Natural Language Processing (NLP), Information Retrieval, Data Mining\n", + "Metadata: {'source': './data/appendix-keywords.txt', 'id': 18}\n", + "----------------------------------------------------------------------------------------------------\n", + "Document 5:\n", + "\n", + "GPT (Generative Pretrained Transformer)\n", + "Definition: GPT is a generative language model pre-trained on vast datasets, capable of performing various text-based tasks. 
It generates natural and coherent text based on input.\n", + "Example: A chatbot generating detailed answers to user queries is powered by GPT models.\n", + "Related Keywords: Natural Language Processing (NLP), Text Generation, Deep Learning\n", + "Metadata: {'source': './data/appendix-keywords.txt', 'id': 24}\n", + "----------------------------------------------------------------------------------------------------\n", + "Document 6:\n", + "\n", + "Transformer\n", + "Definition: A Transformer is a type of deep learning model widely used in natural language processing tasks like translation, summarization, and text generation. It is based on the Attention mechanism.\n", + "Example: Google Translate utilizes a Transformer model for multilingual translation.\n", + "Related Keywords: Deep Learning, Natural Language Processing (NLP), Attention mechanism\n", + "Metadata: {'source': './data/appendix-keywords.txt', 'id': 8}\n", + "----------------------------------------------------------------------------------------------------\n", + "Document 7:\n", + "\n", + "LLM (Large Language Model)\n", + "Definition: LLMs are massive language models trained on large-scale text data, used for various natural language understanding and generation tasks.\n", + "Example: OpenAI's GPT series is a prominent example of LLMs.\n", + "Related Keywords: Natural Language Processing (NLP), Deep Learning, Text Generation\n", + "Metadata: {'source': './data/appendix-keywords.txt', 'id': 13}\n", + "----------------------------------------------------------------------------------------------------\n", + "Document 8:\n", + "\n", + "HuggingFace\n", + "Definition: HuggingFace is a library offering pre-trained models and tools for natural language processing, making NLP tasks accessible to researchers and developers.\n", + "Example: HuggingFace's Transformers library can be used for sentiment analysis and text generation.\n", + "Related Keywords: Natural Language Processing (NLP), Deep Learning, 
Library.\n", + "Metadata: {'source': './data/appendix-keywords.txt', 'id': 9}\n", + "----------------------------------------------------------------------------------------------------\n", + "Document 9:\n", + "\n", + "Tokenizer\n", + "Definition: A tokenizer is a tool that splits text data into tokens, often used for preprocessing in natural language processing tasks.\n", + "Example: The sentence \"I love programming.\" is tokenized into [\"I\", \"love\", \"programming\", \".\"].\n", + "Related Keywords: Tokenization, Natural Language Processing (NLP), Syntax Analysis.\n", + "Metadata: {'source': './data/appendix-keywords.txt', 'id': 3}\n", + "----------------------------------------------------------------------------------------------------\n", + "Document 10:\n", + "\n", + "Semantic Search\n", + "Definition: Semantic search is a search technique that understands the meaning of a user's query beyond simple keyword matching, returning results that are contextually relevant.\n", + "Example: If a user searches for \"planets in the solar system,\" the system provides information about planets like Jupiter and Mars.\n", + "Related Keywords: Natural Language Processing (NLP), Search Algorithms, Data Mining\n", + "Metadata: {'source': './data/appendix-keywords.txt', 'id': 0}\n" + ] + } + ], + "execution_count": 5 + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "Now, let's wrap the base `retriever` with a `ContextualCompressionRetriever` and use `FlashrankRerank` as the compressor.", + "id": "ea07e244c9171d26" + }, + { + "metadata": { + "ExecuteTime": { + "end_time": "2025-01-08T14:53:42.781060Z", + "start_time": "2025-01-08T14:53:41.323926Z" + } + }, + "cell_type": "code", + "source": [ + "from langchain.retrievers import ContextualCompressionRetriever\n", + "from langchain.retrievers.document_compressors import FlashrankRerank\n", + "from langchain_openai import ChatOpenAI\n", + "\n", + "# Initialize the LLM\n", + "llm = 
ChatOpenAI(temperature=0)\n", + "\n", + "# Initialize the FlshrankRerank\n", + "compressor = FlashrankRerank(model=\"ms-marco-MultiBERT-L-12\")\n", + "\n", + "# Initialize the ContextualCompressioinRetriever\n", + "compression_retriever = ContextualCompressionRetriever(\n", + " base_compressor=compressor, base_retriever=retriever\n", + ")\n", + "\n", + "# Search for compressed documents\n", + "compressed_docs = compression_retriever.invoke(\n", + " \"Tell me about Word2Vec.\"\n", + ")\n", + "\n", + "# Print the document ID\n", + "print([doc.metadata[\"id\"] for doc in compressed_docs])" + ], + "id": "23a21f9f025132c5", + "outputs": [ + { + "ename": "PydanticUserError", + "evalue": "`FlashrankRerank` is not fully defined; you should define `Ranker`, then call `FlashrankRerank.model_rebuild()`.\n\nFor further information visit https://errors.pydantic.dev/2.9/u/class-not-fully-defined", + "output_type": "error", + "traceback": [ + "\u001B[1;31m---------------------------------------------------------------------------\u001B[0m", + "\u001B[1;31mPydanticUserError\u001B[0m Traceback (most recent call last)", + "Cell \u001B[1;32mIn[6], line 9\u001B[0m\n\u001B[0;32m 6\u001B[0m llm \u001B[38;5;241m=\u001B[39m ChatOpenAI(temperature\u001B[38;5;241m=\u001B[39m\u001B[38;5;241m0\u001B[39m)\n\u001B[0;32m 8\u001B[0m \u001B[38;5;66;03m# Initialize the FlshrankRerank\u001B[39;00m\n\u001B[1;32m----> 9\u001B[0m compressor \u001B[38;5;241m=\u001B[39m \u001B[43mFlashrankRerank\u001B[49m\u001B[43m(\u001B[49m\u001B[43mmodel\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[38;5;124;43mms-marco-MultiBERT-L-12\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[43m)\u001B[49m\n\u001B[0;32m 11\u001B[0m \u001B[38;5;66;03m# Initialize the ContextualCompressioinRetriever\u001B[39;00m\n\u001B[0;32m 12\u001B[0m compression_retriever \u001B[38;5;241m=\u001B[39m ContextualCompressionRetriever(\n\u001B[0;32m 13\u001B[0m 
base_compressor\u001B[38;5;241m=\u001B[39mcompressor, base_retriever\u001B[38;5;241m=\u001B[39mretriever\n\u001B[0;32m 14\u001B[0m )\n", + " \u001B[1;31m[... skipping hidden 1 frame]\u001B[0m\n", + "File \u001B[1;32m~\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\langchain-opentutorial-GHgbjDj7-py3.11\\Lib\\site-packages\\pydantic\\_internal\\_mock_val_ser.py:99\u001B[0m, in \u001B[0;36mMockValSer.__getattr__\u001B[1;34m(self, item)\u001B[0m\n\u001B[0;32m 97\u001B[0m \u001B[38;5;66;03m# raise an AttributeError if `item` doesn't exist\u001B[39;00m\n\u001B[0;32m 98\u001B[0m \u001B[38;5;28mgetattr\u001B[39m(\u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39m_val_or_ser, item)\n\u001B[1;32m---> 99\u001B[0m \u001B[38;5;28;01mraise\u001B[39;00m PydanticUserError(\u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39m_error_message, code\u001B[38;5;241m=\u001B[39m\u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39m_code)\n", + "\u001B[1;31mPydanticUserError\u001B[0m: `FlashrankRerank` is not fully defined; you should define `Ranker`, then call `FlashrankRerank.model_rebuild()`.\n\nFor further information visit https://errors.pydantic.dev/2.9/u/class-not-fully-defined" + ] + } + ], + "execution_count": 6 + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "Compare the results after the reranker is applied.", + "id": "4a147fc787860bac" + }, + { + "metadata": {}, + "cell_type": "code", + "source": [ + "# Print the results of document compressions\n", + "pretty_print_docs(compressed_docs)" + ], + "id": "732f27a4e8b3d4cd", + "outputs": [], + "execution_count": null + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 2 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython2", + "version": "2.7.6" + } + }, + "nbformat": 4, +
"nbformat_minor": 5 +} diff --git a/11-Reranker/data/appendix-keywords.txt b/11-Reranker/data/appendix-keywords.txt new file mode 100644 index 000000000..940a19186 --- /dev/null +++ b/11-Reranker/data/appendix-keywords.txt @@ -0,0 +1,153 @@ +Semantic Search +Definition: Semantic search is a search technique that understands the meaning of a user's query beyond simple keyword matching, returning results that are contextually relevant. +Example: If a user searches for "planets in the solar system," the system provides information about planets like Jupiter and Mars. +Related Keywords: Natural Language Processing (NLP), Search Algorithms, Data Mining + +Embedding +Definition: Embedding is the process of converting textual data, such as words or sentences, into low-dimensional continuous vectors that computers can process and understand. +Example: The word "apple" can be represented as a vector like [0.65, -0.23, 0.17]. +Related Keywords: Natural Language Processing (NLP), Vectorization, Deep Learning + +Token +Definition: A token refers to a smaller unit of text obtained by splitting a larger piece of text. It can be a word, phrase, or sentence. +Example: The sentence "I go to school" can be tokenized into "I," "go," "to," and "school." +Related Keywords: Tokenization, Natural Language Processing (NLP), Syntax Analysis + +Tokenizer +Definition: A tokenizer is a tool that splits text data into tokens, often used for preprocessing in natural language processing tasks. +Example: The sentence "I love programming." is tokenized into ["I", "love", "programming", "."]. +Related Keywords: Tokenization, Natural Language Processing (NLP), Syntax Analysis. + +VectorStore +Definition: A VectorStore is a system designed to store data in vector format, enabling efficient retrieval, classification, and analysis tasks. +Example: Storing word embedding vectors in a database for quick access during semantic search. 
+Related Keywords: Embedding, Database, Vectorization + +SQL +Definition: SQL (Structured Query Language) is a programming language for managing data in databases. +It allows you to perform various operations such as querying, updating, inserting, and deleting data. +Example: SELECT * FROM users WHERE age > 18; retrieves information about users aged above 18. +Related Keywords: Database, Query, Data Management + +CSV +Definition: CSV (Comma-Separated Values) is a file format used for storing tabular data, where each value is separated by a comma. +Example: A CSV file with headers "Name, Age, Occupation" may contain data like "John, 30, Developer." +Related Keywords: Data Format, File Processing, Data Exchange + +JSON +Definition: JSON (JavaScript Object Notation) is a lightweight data-interchange format that represents data objects using readable text for both humans and machines. +Example: {"Name": "John", "Age": 30, "Occupation": "Developer"} is a JSON object. +Related Keywords: Data Exchange, Web Development, API + +Transformer +Definition: A Transformer is a type of deep learning model widely used in natural language processing tasks like translation, summarization, and text generation. It is based on the Attention mechanism. +Example: Google Translate utilizes a Transformer model for multilingual translation. +Related Keywords: Deep Learning, Natural Language Processing (NLP), Attention mechanism + +HuggingFace +Definition: HuggingFace is a library offering pre-trained models and tools for natural language processing, making NLP tasks accessible to researchers and developers. +Example: HuggingFace's Transformers library can be used for sentiment analysis and text generation. +Related Keywords: Natural Language Processing (NLP), Deep Learning, Library.
+ +Digital Transformation +Definition: Digital transformation refers to the integration of technology to innovate services, culture, and operations within a company, enhancing competitiveness and business models. +Example: A company adopting cloud computing to revolutionize data storage and processing demonstrates digital transformation. +Related Keywords: Innovation, Technology, Business Model + +Crawling +Definition: Crawling is the automated process of visiting web pages to gather data, commonly used for search engine optimization and data analysis. +Example: Google Search Engine crawls websites to collect and index content. +Related Keywords: Data Collection, Web Scraping, Search Engine + +Word2Vec +Definition: Word2Vec is a technique in NLP that maps words to a vector space, representing their semantic relationships based on context. +Example: In a Word2Vec model, "king" and "queen" are represented by vectors located close to each other. +Related Keywords: Natural Language Processing (NLP), Embedding, Semantic Similarity + +LLM (Large Language Model) +Definition: LLMs are massive language models trained on large-scale text data, used for various natural language understanding and generation tasks. +Example: OpenAI's GPT series is a prominent example of LLMs. +Related Keywords: Natural Language Processing (NLP), Deep Learning, Text Generation + +FAISS (Facebook AI Similarity Search) +Definition: FAISS is a high-speed similarity search library developed by Facebook, optimized for searching large sets of vectors efficiently. +Example: FAISS can quickly find similar images among millions of image vectors. +Related Keywords: Vector Search, Machine Learning, Database Optimization + +Open Source +Definition: Open source software allows its source code to be freely used, modified, and distributed, fostering collaboration and innovation. +Example: The Linux operating system is a well-known open source project. 
+Related Keywords: Software Development, Community, Technical Collaboration +Structured Data +Definition: Structured data is organized according to a specific format or schema, making it easy to search and analyze. +Example: A customer information table in a relational database is an example of structured data. +Related Keywords: Database, Data Analysis, Data Modeling + +Parser +Definition: A parser analyzes input data (text, files, etc.) and converts it into a structured format, often used in programming language syntax analysis or file processing. +Example: Parsing an HTML document to generate its DOM structure is an instance of parsing. +Related Keywords: Syntax Analysis, Compiler, Data Processing + +TF-IDF (Term Frequency-Inverse Document Frequency) +Definition: TF-IDF is a statistical measure used to evaluate the importance of a word within a document by considering its frequency and rarity across a corpus. +Example: Words with high TF-IDF values are often unique and critical for understanding the document. +Related Keywords: Natural Language Processing (NLP), Information Retrieval, Data Mining + +Deep Learning +Definition: Deep learning is a subset of machine learning that uses neural networks to solve complex problems, focusing on learning high-level representations from data. +Example: Deep learning models are used for tasks like image recognition, speech recognition, and NLP. +Related Keywords: Artificial Neural Networks, Machine Learning, Data Analysis + +Schema +Definition: A schema defines the structure of a database or file, detailing how data is organized and stored. +Example: A relational database schema specifies column names, data types, and key constraints. +Related Keywords: Database, Data Modeling, Data Management + +DataFrame +Definition: A DataFrame is a tabular data structure with rows and columns, commonly used for data manipulation and analysis. 
+Example: In Python's Pandas library, a DataFrame can contain diverse data types and support various data operations. +Related Keywords: Data Analysis, Pandas, Data Processing + +Attention Mechanism +Definition: +The Attention mechanism is a technique in deep learning that allows models to focus more on important information. It is primarily used in processing sequential data such as text and time series. +Example: +In translation models, the Attention mechanism helps the model focus on relevant parts of the input sentence to generate accurate translations. +Related Keywords: Deep Learning, Natural Language Processing, Sequence Modeling + +Pandas +Definition: Pandas is a Python library offering tools for efficient data manipulation and analysis. It simplifies complex data operations. +Example: Pandas can be used to load, clean, and analyze CSV files. +Related Keywords: Data Analysis, Python, Data Processing + +GPT (Generative Pretrained Transformer) +Definition: GPT is a generative language model pre-trained on vast datasets, capable of performing various text-based tasks. It generates natural and coherent text based on input. +Example: A chatbot generating detailed answers to user queries is powered by GPT models. +Related Keywords: Natural Language Processing (NLP), Text Generation, Deep Learning + +InstructGPT +Definition: +InstructGPT is an optimized GPT model designed to perform specific tasks based on user instructions. It is built to generate more accurate and relevant results in response to given commands. +Example: When a user provides a specific instruction like "Draft an email," InstructGPT generates an email based on the provided content. +Related Keywords: Artificial Intelligence, Natural Language Understanding, Command-Based Processing + +Keyword Search +Definition: Keyword search involves finding information based on user-inputted keywords, commonly used in search engines and database systems. 
+Example: +When a user searches for "coffee shops in Seoul," the system returns a list of relevant coffee shops. +Related Keywords: Search Engine, Data Search, Information Retrieval + +PageRank +Definition: PageRank is an algorithm for evaluating the importance of web pages, primarily used to rank search engine results. It analyzes the link structure of websites. +Example: Google uses PageRank to determine the order of search results. +Related Keywords: Search Engine Optimization, Web Analytics, Link Analysis + +Data Mining +Definition: Data mining is the process of extracting useful information from large datasets using techniques like statistics, machine learning, and pattern recognition. +Example: Retailers analyzing customer purchase data to devise sales strategies is an application of data mining. +Related Keywords: Big Data, Pattern Recognition, Predictive Analytics + +Multimodal +Definition: Multimodal refers to combining and processing multiple types of data (e.g., text, images, and sound) to extract richer insights and predictions. +Example: A system analyzing both images and captions to perform accurate image classification demonstrates multimodal technology.
+Related Keywords: Data Fusion, Artificial Intelligence, Deep Learning \ No newline at end of file From 45fe3d2bc94b075fbe3e001077cddbd041db5dfc Mon Sep 17 00:00:00 2001 From: Cha Hwa Young Date: Thu, 9 Jan 2025 00:13:49 +0900 Subject: [PATCH 2/8] [E-2]11-Reranker/04-FlashRank-Reranker --- 11-Reranker/04-FlashRank-Reranker.ipynb | 182 +++--------------------- 1 file changed, 17 insertions(+), 165 deletions(-) diff --git a/11-Reranker/04-FlashRank-Reranker.ipynb b/11-Reranker/04-FlashRank-Reranker.ipynb index 9d2922df4..f3c9d818c 100644 --- a/11-Reranker/04-FlashRank-Reranker.ipynb +++ b/11-Reranker/04-FlashRank-Reranker.ipynb @@ -4,6 +4,7 @@ "metadata": {}, "cell_type": "markdown", "source": [ + "\n", "# FlashRank Reranker\n", "\n", "- Author: [Hwayoung Cha](https://github.com/forwardyoung)\n", @@ -40,12 +41,7 @@ "id": "c7431102d93a694f" }, { - "metadata": { - "ExecuteTime": { - "end_time": "2025-01-08T14:53:36.107286Z", - "start_time": "2025-01-08T14:53:35.957725Z" - } - }, + "metadata": {}, "cell_type": "code", "source": [ "# Set environment variables\n", @@ -58,16 +54,8 @@ ")" ], "id": "501e9dfa010f326a", - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Environment variables have been set successfully.\n" - ] - } - ], - "execution_count": 1 + "outputs": [], + "execution_count": null }, { "metadata": {}, @@ -80,12 +68,7 @@ "id": "7d83ee066d91fb4f" }, { - "metadata": { - "ExecuteTime": { - "end_time": "2025-01-08T14:53:36.123222Z", - "start_time": "2025-01-08T14:53:36.108289Z" - } - }, + "metadata": {}, "cell_type": "code", "source": [ "# Configuration file to manage API keys as environment variables\n", @@ -95,27 +78,11 @@ "load_dotenv(override=True)" ], "id": "abed94e9253ec29e", - "outputs": [ - { - "data": { - "text/plain": [ - "True" - ] - }, - "execution_count": 2, - "metadata": {}, - "output_type": "execute_result" - } - ], - "execution_count": 2 + "outputs": [], + "execution_count": null }, { - "metadata": { - "ExecuteTime": 
{ - "end_time": "2025-01-08T14:53:36.127689Z", - "start_time": "2025-01-08T14:53:36.124407Z" - } - }, + "metadata": {}, "cell_type": "code", "source": [ "# install\n", @@ -123,15 +90,10 @@ ], "id": "e31774e423dd76fb", "outputs": [], - "execution_count": 3 + "execution_count": null }, { - "metadata": { - "ExecuteTime": { - "end_time": "2025-01-08T14:53:36.133428Z", - "start_time": "2025-01-08T14:53:36.128691Z" - } - }, + "metadata": {}, "cell_type": "code", "source": [ "def pretty_print_docs(docs):\n", @@ -146,7 +108,7 @@ ], "id": "43856bcf1e8f0c63", "outputs": [], - "execution_count": 4 + "execution_count": null }, { "metadata": {}, @@ -159,12 +121,7 @@ "id": "1c7d03faa97bf809" }, { - "metadata": { - "ExecuteTime": { - "end_time": "2025-01-08T14:53:41.320899Z", - "start_time": "2025-01-08T14:53:36.134653Z" - } - }, + "metadata": {}, "cell_type": "code", "source": [ "from langchain_community.document_loaders import TextLoader\n", @@ -200,94 +157,8 @@ "pretty_print_docs(docs)" ], "id": "79d934121fd476be", - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Document 1:\n", - "\n", - "Word2Vec\n", - "Definition: Word2Vec is a technique in NLP that maps words to a vector space, representing their semantic relationships based on context.\n", - "Example: In a Word2Vec model, \"king\" and \"queen\" are represented by vectors located close to each other.\n", - "Related Keywords: Natural Language Processing (NLP), Embedding, Semantic Similarity\n", - "Metadata: {'source': './data/appendix-keywords.txt', 'id': 12}\n", - "----------------------------------------------------------------------------------------------------\n", - "Document 2:\n", - "\n", - "Embedding\n", - "Definition: Embedding is the process of converting textual data, such as words or sentences, into low-dimensional continuous vectors that computers can process and understand.\n", - "Example: The word \"apple\" can be represented as a vector like [0.65, -0.23, 0.17].\n", - "Related 
Keywords: Natural Language Processing (NLP), Vectorization, Deep Learning\n", - "Metadata: {'source': './data/appendix-keywords.txt', 'id': 1}\n", - "----------------------------------------------------------------------------------------------------\n", - "Document 3:\n", - "\n", - "VectorStore\n", - "Definition: A VectorStore is a system designed to store data in vector format, enabling efficient retrieval, classification, and analysis tasks.\n", - "Example: Storing word embedding vectors in a database for quick access during semantic search.\n", - "Related Keywords: Embedding, Database, Vectorization\n", - "Metadata: {'source': './data/appendix-keywords.txt', 'id': 4}\n", - "----------------------------------------------------------------------------------------------------\n", - "Document 4:\n", - "\n", - "TF-IDF (Term Frequency-Inverse Document Frequency)\n", - "Definition: TF-IDF is a statistical measure used to evaluate the importance of a word within a document by considering its frequency and rarity across a corpus.\n", - "Example: Words with high TF-IDF values are often unique and critical for understanding the document.\n", - "Related Keywords: Natural Language Processing (NLP), Information Retrieval, Data Mining\n", - "Metadata: {'source': './data/appendix-keywords.txt', 'id': 18}\n", - "----------------------------------------------------------------------------------------------------\n", - "Document 5:\n", - "\n", - "GPT (Generative Pretrained Transformer)\n", - "Definition: GPT is a generative language model pre-trained on vast datasets, capable of performing various text-based tasks. 
It generates natural and coherent text based on input.\n", - "Example: A chatbot generating detailed answers to user queries is powered by GPT models.\n", - "Related Keywords: Natural Language Processing (NLP), Text Generation, Deep Learning\n", - "Metadata: {'source': './data/appendix-keywords.txt', 'id': 24}\n", - "----------------------------------------------------------------------------------------------------\n", - "Document 6:\n", - "\n", - "Transformer\n", - "Definition: A Transformer is a type of deep learning model widely used in natural language processing tasks like translation, summarization, and text generation. It is based on the Attention mechanism.\n", - "Example: Google Translate utilizes a Transformer model for multilingual translation.\n", - "Related Keywords: Deep Learning, Natural Language Processing (NLP), Attention mechanism\n", - "Metadata: {'source': './data/appendix-keywords.txt', 'id': 8}\n", - "----------------------------------------------------------------------------------------------------\n", - "Document 7:\n", - "\n", - "LLM (Large Language Model)\n", - "Definition: LLMs are massive language models trained on large-scale text data, used for various natural language understanding and generation tasks.\n", - "Example: OpenAI's GPT series is a prominent example of LLMs.\n", - "Related Keywords: Natural Language Processing (NLP), Deep Learning, Text Generation\n", - "Metadata: {'source': './data/appendix-keywords.txt', 'id': 13}\n", - "----------------------------------------------------------------------------------------------------\n", - "Document 8:\n", - "\n", - "HuggingFace\n", - "Definition: HuggingFace is a library offering pre-trained models and tools for natural language processing, making NLP tasks accessible to researchers and developers.\n", - "Example: HuggingFace's Transformers library can be used for sentiment analysis and text generation.\n", - "Related Keywords: Natural Language Processing (NLP), Deep Learning, 
Library.\n", - "Metadata: {'source': './data/appendix-keywords.txt', 'id': 9}\n", - "----------------------------------------------------------------------------------------------------\n", - "Document 9:\n", - "\n", - "Tokenizer\n", - "Definition: A tokenizer is a tool that splits text data into tokens, often used for preprocessing in natural language processing tasks.\n", - "Example: The sentence \"I love programming.\" is tokenized into [\"I\", \"love\", \"programming\", \".\"].\n", - "Related Keywords: Tokenization, Natural Language Processing (NLP), Syntax Analysis.\n", - "Metadata: {'source': './data/appendix-keywords.txt', 'id': 3}\n", - "----------------------------------------------------------------------------------------------------\n", - "Document 10:\n", - "\n", - "Semantic Search\n", - "Definition: Semantic search is a search technique that understands the meaning of a user's query beyond simple keyword matching, returning results that are contextually relevant.\n", - "Example: If a user searches for \"planets in the solar system,\" the system provides information about planets like Jupiter and Mars.\n", - "Related Keywords: Natural Language Processing (NLP), Search Algorithms, Data Mining\n", - "Metadata: {'source': './data/appendix-keywords.txt', 'id': 0}\n" - ] - } - ], - "execution_count": 5 + "outputs": [], + "execution_count": null }, { "metadata": {}, @@ -296,12 +167,7 @@ "id": "ea07e244c9171d26" }, { - "metadata": { - "ExecuteTime": { - "end_time": "2025-01-08T14:53:42.781060Z", - "start_time": "2025-01-08T14:53:41.323926Z" - } - }, + "metadata": {}, "cell_type": "code", "source": [ "from langchain.retrievers import ContextualCompressionRetriever\n", @@ -328,22 +194,8 @@ "print([doc.metadata[\"id\"] for doc in compressed_docs])" ], "id": "23a21f9f025132c5", - "outputs": [ - { - "ename": "PydanticUserError", - "evalue": "`FlashrankRerank` is not fully defined; you should define `Ranker`, then call `FlashrankRerank.model_rebuild()`.\n\nFor 
further information visit https://errors.pydantic.dev/2.9/u/class-not-fully-defined", - "output_type": "error", - "traceback": [ - "\u001B[1;31m---------------------------------------------------------------------------\u001B[0m", - "\u001B[1;31mPydanticUserError\u001B[0m Traceback (most recent call last)", - "Cell \u001B[1;32mIn[6], line 9\u001B[0m\n\u001B[0;32m 6\u001B[0m llm \u001B[38;5;241m=\u001B[39m ChatOpenAI(temperature\u001B[38;5;241m=\u001B[39m\u001B[38;5;241m0\u001B[39m)\n\u001B[0;32m 8\u001B[0m \u001B[38;5;66;03m# Initialize the FlshrankRerank\u001B[39;00m\n\u001B[1;32m----> 9\u001B[0m compressor \u001B[38;5;241m=\u001B[39m \u001B[43mFlashrankRerank\u001B[49m\u001B[43m(\u001B[49m\u001B[43mmodel\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[38;5;124;43mms-marco-MultiBERT-L-12\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[43m)\u001B[49m\n\u001B[0;32m 11\u001B[0m \u001B[38;5;66;03m# Initialize the ContextualCompressioinRetriever\u001B[39;00m\n\u001B[0;32m 12\u001B[0m compression_retriever \u001B[38;5;241m=\u001B[39m ContextualCompressionRetriever(\n\u001B[0;32m 13\u001B[0m base_compressor\u001B[38;5;241m=\u001B[39mcompressor, base_retriever\u001B[38;5;241m=\u001B[39mretriever\n\u001B[0;32m 14\u001B[0m )\n", - " \u001B[1;31m[... 
skipping hidden 1 frame]\u001B[0m\n", - "File \u001B[1;32m~\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\langchain-opentutorial-GHgbjDj7-py3.11\\Lib\\site-packages\\pydantic\\_internal\\_mock_val_ser.py:99\u001B[0m, in \u001B[0;36mMockValSer.__getattr__\u001B[1;34m(self, item)\u001B[0m\n\u001B[0;32m 97\u001B[0m \u001B[38;5;66;03m# raise an AttributeError if `item` doesn't exist\u001B[39;00m\n\u001B[0;32m 98\u001B[0m \u001B[38;5;28mgetattr\u001B[39m(\u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39m_val_or_ser, item)\n\u001B[1;32m---> 99\u001B[0m \u001B[38;5;28;01mraise\u001B[39;00m PydanticUserError(\u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39m_error_message, code\u001B[38;5;241m=\u001B[39m\u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39m_code)\n", - "\u001B[1;31mPydanticUserError\u001B[0m: `FlashrankRerank` is not fully defined; you should define `Ranker`, then call `FlashrankRerank.model_rebuild()`.\n\nFor further information visit https://errors.pydantic.dev/2.9/u/class-not-fully-defined" - ] - } - ], - "execution_count": 6 + "outputs": [], + "execution_count": null }, { "metadata": {}, From ebdd2ae48dc8b8f26d74765a6837d809ae0033c0 Mon Sep 17 00:00:00 2001 From: Cha Hwa Young Date: Thu, 9 Jan 2025 00:17:51 +0900 Subject: [PATCH 3/8] [E-2]11-Reranker/04-FlashRank-Reranker --- 11-Reranker/04-FlashRank-Reranker.ipynb | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/11-Reranker/04-FlashRank-Reranker.ipynb b/11-Reranker/04-FlashRank-Reranker.ipynb index f3c9d818c..6fd3b4bd2 100644 --- a/11-Reranker/04-FlashRank-Reranker.ipynb +++ b/11-Reranker/04-FlashRank-Reranker.ipynb @@ -22,7 +22,8 @@ "### Table of Contents\n", "\n", "- [Overview](#overview)\n", - "- [Environement Setup](#environment-setup)" + "- [Environement Setup](#environment-setup)\n", + "- [FlashRankRerank](#flashrankrerank)" ], "id": "c69d1f48d21cd2b4" }, From 1b8d6bffeb6c2dc1e98c95b67367731ae5338f3d Mon Sep 17 00:00:00 2001 From: Cha Hwa Young Date: Thu, 9 Jan 
2025 00:29:18 +0900 Subject: [PATCH 4/8] [E-2]11-Reranker/04-FlashRank-Reranker --- 11-Reranker/04-FlashRank-Reranker.ipynb | 139 ++++++++++++++---------- 1 file changed, 81 insertions(+), 58 deletions(-) diff --git a/11-Reranker/04-FlashRank-Reranker.ipynb b/11-Reranker/04-FlashRank-Reranker.ipynb index 6fd3b4bd2..61cd90c19 100644 --- a/11-Reranker/04-FlashRank-Reranker.ipynb +++ b/11-Reranker/04-FlashRank-Reranker.ipynb @@ -1,8 +1,9 @@ { "cells": [ { - "metadata": {}, "cell_type": "markdown", + "id": "c69d1f48d21cd2b4", + "metadata": {}, "source": [ "\n", "# FlashRank Reranker\n", @@ -24,12 +25,12 @@ "- [Overview](#overview)\n", "- [Environement Setup](#environment-setup)\n", "- [FlashRankRerank](#flashrankrerank)" - ], - "id": "c69d1f48d21cd2b4" + ] }, { - "metadata": {}, "cell_type": "markdown", + "id": "c7431102d93a694f", + "metadata": {}, "source": [ "## Environment Setup\n", "\n", @@ -38,12 +39,14 @@ "**[Note]**\n", "- `langchain-opentutorial` is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials. \n", "- You can checkout the [`langchain-opentutorial`](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) for more details." - ], - "id": "c7431102d93a694f" + ] }, { - "metadata": {}, "cell_type": "code", + "execution_count": null, + "id": "501e9dfa010f326a", + "metadata": {}, + "outputs": [], "source": [ "# Set environment variables\n", "from langchain_opentutorial import set_env\n", @@ -53,49 +56,68 @@ " \"OPENAI_API_KEY\": \"\",\n", " }\n", ")" - ], - "id": "501e9dfa010f326a", - "outputs": [], - "execution_count": null + ] }, { - "metadata": {}, "cell_type": "markdown", + "id": "7d83ee066d91fb4f", + "metadata": {}, "source": [ "You can alternatively set OPENAI_API_KEY in .env file and load it.\n", "\n", "[Note] This is not necessary if you've already set OPENAI_API_KEY in previous steps." 
- ], - "id": "7d83ee066d91fb4f" + ] }, { - "metadata": {}, "cell_type": "code", + "execution_count": null, + "id": "abed94e9253ec29e", + "metadata": {}, + "outputs": [], "source": [ "# Configuration file to manage API keys as environment variables\n", "from dotenv import load_dotenv\n", "\n", "# Load API key information\n", "load_dotenv(override=True)" - ], - "id": "abed94e9253ec29e", - "outputs": [], - "execution_count": null + ] }, { - "metadata": {}, "cell_type": "code", - "source": [ - "# install\n", - "# !pip install -qU flashrank" - ], - "id": "e31774e423dd76fb", + "execution_count": 7, + "id": "687b4939", + "metadata": {}, "outputs": [], - "execution_count": null + "source": [ + "%%capture --no-stderr\n", + "%pip install langchain-opentutorial" + ] }, { + "cell_type": "code", + "execution_count": null, + "id": "af16502c", "metadata": {}, + "outputs": [], + "source": [ + "# Install required packages\n", + "from langchain_opentutorial import package\n", + "\n", + "package.install(\n", + " [\n", + " \"flashrank\"\n", + " ],\n", + " verbose=False,\n", + " upgrade=False,\n", + ")" + ] + }, + { "cell_type": "code", + "execution_count": 5, + "id": "43856bcf1e8f0c63", + "metadata": {}, + "outputs": [], "source": [ "def pretty_print_docs(docs):\n", " print(\n", @@ -106,24 +128,24 @@ " ]\n", " )\n", " )" - ], - "id": "43856bcf1e8f0c63", - "outputs": [], - "execution_count": null + ] }, { - "metadata": {}, "cell_type": "markdown", + "id": "1c7d03faa97bf809", + "metadata": {}, "source": [ "## FlashrankRerank\n", "\n", "Load data for a simple example and create a retriever." 
- ], - "id": "1c7d03faa97bf809" + ] }, { - "metadata": {}, "cell_type": "code", + "execution_count": null, + "id": "79d934121fd476be", + "metadata": {}, + "outputs": [], "source": [ "from langchain_community.document_loaders import TextLoader\n", "from langchain_community.vectorstores import FAISS\n", @@ -156,20 +178,22 @@ "\n", "# Print the document\n", "pretty_print_docs(docs)" - ], - "id": "79d934121fd476be", - "outputs": [], - "execution_count": null + ] }, { - "metadata": {}, "cell_type": "markdown", - "source": "Now, let's wrap the base `retriever` with a `ContextualCompressionRetriever` and use `FlashrankRerank` as the compressor.", - "id": "ea07e244c9171d26" + "id": "ea07e244c9171d26", + "metadata": {}, + "source": [ + "Now, let's wrap the base `retriever` with a `ContextualCompressionRetriever` and use `FlashrankRerank` as the compressor." + ] }, { - "metadata": {}, "cell_type": "code", + "execution_count": null, + "id": "23a21f9f025132c5", + "metadata": {}, + "outputs": [], "source": [ "from langchain.retrievers import ContextualCompressionRetriever\n", "from langchain.retrievers.document_compressors import FlashrankRerank\n", @@ -193,46 +217,45 @@ "\n", "# Print the document ID\n", "print([doc.metadata[\"id\"] for doc in compressed_docs])" - ], - "id": "23a21f9f025132c5", - "outputs": [], - "execution_count": null + ] }, { - "metadata": {}, "cell_type": "markdown", - "source": "Compare the results after the reranker is applied.", - "id": "4a147fc787860bac" + "id": "4a147fc787860bac", + "metadata": {}, + "source": [ + "Compare the results after the reranker is applied."
+ ] }, { - "metadata": {}, "cell_type": "code", + "execution_count": null, + "id": "732f27a4e8b3d4cd", + "metadata": {}, + "outputs": [], "source": [ "# Print the results of document compressions\n", "pretty_print_docs(compressed_docs)" - ], - "id": "732f27a4e8b3d4cd", - "outputs": [], - "execution_count": null + ] } ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "langchain-opentutorial-GHgbjDj7-py3.11", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", - "version": 2 + "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": "ipython2", - "version": "2.7.6" + "pygments_lexer": "ipython3", + "version": "3.11.3" } }, "nbformat": 4,

From 761f5056564908009758cf2274e1e52bd1317404 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=EB=B0=B0=EA=B8=B0=EB=AF=BC?= <53887180+BAEM1N@users.noreply.github.com>
Date: Thu, 9 Jan 2025 19:39:22 +0900
Subject: [PATCH 5/8] [I] requirements.txt / update requirements.txt

Restrict python-magic-bin installation to non-macOS platforms in requirements.txt
---
 requirements.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/requirements.txt b/requirements.txt
index cf2daac52..ec38759e8 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -30,7 +30,8 @@ langchain-neo4j
 langchain-mongodb
 fastembed
 certifi
-python-magic-bin
 pymongo
 langchain_qdrant
 
+# Restrict python-magic-bin installation (excluded on macOS)
+python-magic-bin; sys_platform != "darwin"

From 6a5ffe1f3a15a37fff5169074c0a49bee071764e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=EB=B0=B0=EA=B8=B0=EB=AF=BC?= <53887180+BAEM1N@users.noreply.github.com>
Date: Thu, 9 Jan 2025 19:52:01 +0900
Subject: [PATCH 6/8] [I] requirements.txt / revert requirements.txt

Revert commit due to incorrect branch application
---
 requirements.txt | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/requirements.txt b/requirements.txt
index ec38759e8..e000b026d 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -30,8 +30,6 @@ langchain-neo4j
 langchain-mongodb
 fastembed
 certifi
+python-magic-bin
 pymongo
 langchain_qdrant
-
-# Restrict python-magic-bin installation (excluded on macOS)
-python-magic-bin; sys_platform != "darwin"

From 1646a7505aeb3f740d7ce2147bd4c43829252aff Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=EB=B0=B0=EA=B8=B0=EB=AF=BC?= <53887180+BAEM1N@users.noreply.github.com>
Date: Thu, 9 Jan 2025 19:53:53 +0900
Subject: [PATCH 7/8] [I] requirements.txt / revert requirements.txt

From b4604023f8d1d32009f56059d241df55af44acb5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=EB=B0=B0=EA=B8=B0=EB=AF=BC?= <53887180+BAEM1N@users.noreply.github.com>
Date: Thu, 9 Jan 2025 19:54:21 +0900
Subject: [PATCH 8/8] [I] requirements.txt / revert requirements.txt