Conversational Agent with LangChain, OpenAI API, and RAG Concept

Project Overview

This project is a conversational agent that leverages LangChain, OpenAI API, and the RAG (Retrieval-Augmented Generation) concept. The agent is designed to read lengthy PDF documents, extract various components such as text, images, and tables, and store them in a vector database for efficient retrieval during conversations with users.

Features

PDF Processing: The agent is capable of parsing and extracting information from long PDF documents.
Multi-Modal Extraction: Extracts text, images, and tables from PDFs for a comprehensive understanding.
Vector Database: Utilizes a vector database to store and retrieve information efficiently.
Conversational AI: Implements the RAG concept to enhance conversational interactions with users.

Multi-Modal RAG

We will use Unstructured to parse images, text, and tables from documents (PDFs).
We will use the multi-vector retriever with Chroma to store raw text and images along with their summaries for retrieval.
We will use GPT-4V both for image summarization (for retrieval) and for final answer synthesis from a joint review of images and text (or tables).
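The multi-vector idea above (search over summaries, return the raw element) can be illustrated with a toy, standard-library-only sketch. This is not LangChain's retriever or Chroma's API: the class `ToyMultiVectorRetriever`, the bag-of-words `embed` function, and the sample data are all assumptions made for illustration. A real setup would use an embedding model and a vector store.

```python
import math
import re
import uuid
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding'; a real setup would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyMultiVectorRetriever:
    """Indexes summaries for search but returns the raw elements they point to."""

    def __init__(self):
        self.vectors = []   # list of (doc_id, summary embedding)
        self.docstore = {}  # doc_id -> raw element (text, table, or image)

    def add(self, raw, summary):
        doc_id = str(uuid.uuid4())
        self.vectors.append((doc_id, embed(summary)))
        self.docstore[doc_id] = raw
        return doc_id

    def retrieve(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.vectors, key=lambda item: cosine(q, item[1]), reverse=True)
        return [self.docstore[doc_id] for doc_id, _ in ranked[:k]]


retriever = ToyMultiVectorRetriever()
retriever.add(
    raw={"type": "image", "data": "<base64 of an architecture figure>"},
    summary="Diagram of the transformer encoder and decoder stack",
)
retriever.add(
    raw={"type": "text", "data": "Full text chunk on the training schedule"},
    summary="Training used the Adam optimizer with learning rate warmup",
)

top = retriever.retrieve("What does the encoder decoder diagram show?")
print(top[0]["type"])
```

The point of the design is that the summary is only a search key: the model that writes the final answer receives the raw image or text, not the summary.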

Dependencies

LangChain: see the LangChain documentation for installation instructions.
OpenAI API: see the OpenAI documentation for setting up and using the API.
Chroma DB: see the Chroma documentation for setting up and using the vector database.

Usage

Provide the path to the source PDF.
Change prompt_text to suit your needs.
Put your question in the query line.
The agent will use the stored information to generate its responses.
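As a rough illustration of the three values the steps above ask you to edit, here is a hypothetical sketch. Only `prompt_text` is named in this README; the names `pdf_path` and `query`, the `{element}` placeholder, and every value shown are assumptions, so check the notebook for the actual names and wording.

```python
# Hypothetical variable name and path; point this at your own PDF.
pdf_path = "path/to/document.pdf"

# Example wording only; adapt the instruction to the summaries you want.
prompt_text = (
    "Summarize the following table or text chunk concisely so the summary "
    "can be used for retrieval.\n\n{element}"
)

# Hypothetical variable name; this is the question put to the agent.
query = "What is the main idea of the document?"

# Filling the template for one extracted element.
filled = prompt_text.format(element="Example chunk of extracted text.")
print(filled)
```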

Considerations

Retrieval

Retrieval is based on similarity to both image summaries and text chunks. This needs care, because image retrieval can fail when text chunks compete with the image summaries. To mitigate this, I produce larger (4k-token) text chunks and summarize them for retrieval.
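A minimal sketch of the chunking step, assuming whitespace-separated words as a stand-in for tokens. The function name `chunk_by_tokens` is hypothetical, and a real tokenizer would count tokens differently, so treat the 4000 limit here as approximate. Each resulting chunk would then be summarized and the summary indexed.

```python
def chunk_by_tokens(text, max_tokens=4000):
    """Split text into consecutive chunks of at most max_tokens whitespace tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + max_tokens])
        for start in range(0, len(words), max_tokens)
    ]


# A 9000-word sample splits into two full chunks and one remainder.
sample = " ".join(f"w{i}" for i in range(9000))
chunks = chunk_by_tokens(sample, max_tokens=4000)
print([len(c.split()) for c in chunks])  # [4000, 4000, 1000]
```

Larger chunks mean fewer text entries in the index, which leaves fewer candidates to crowd out the image summaries at query time.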

Image Size

The quality of answer synthesis appears to be sensitive to image size, as expected. I'll do evals soon to test this more carefully.
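One way to experiment with image size is to cap the longer side before encoding the image for the model. The sketch below only computes the target dimensions and wraps already-resized bytes as a base64 data URL in the `image_url` content-part shape used for vision inputs in OpenAI chat messages. The names `fit_within` and `image_part` and the 1024-pixel cap are assumptions; the resize itself would need an image library and is not shown.

```python
import base64


def fit_within(width, height, max_side=1024):
    """Return (width, height) scaled down so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def image_part(image_bytes, mime="image/jpeg"):
    """Wrap raw image bytes as a base64 data-URL content part for a chat message."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}}


print(fit_within(3000, 2000))  # (1024, 683)
```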

License

This project is licensed under the MIT License.

Acknowledgments

Thanks to LangChain for providing the language processing capabilities.
Thanks to OpenAI for the powerful language model.
Thanks to Chroma for efficient vector storage.


SunGajiwala/Multi-Modal-using-RAG
