This project is a fully modular Multi-Modal Multi-Agent Retrieval-Augmented Generation (RAG) system that processes PDFs, HTML files, images, and tables to answer questions through a pipeline of specialized agents:
- **Text Agent + Image Agent**: Generate insights from the contexts retrieved by the RAG system.
- **Generalize Agent**: Combines and generalizes the answers from the Text Agent and Image Agent for each question.
- **Planning Agent**: Receives the user's query and decomposes it into several sub-questions ("tasks") used to retrieve information from the RAG system.
- **Merge Agent**: Combines all the responses from the Generalize Agent into a single response that answers the user's original query.
- **Verifier Agent**: Scores the merged answer from the Merge Agent and decides whether more information is needed, generating follow-up questions to continue retrieval.
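The sketch below illustrates how these agents could be wired together; the class and method names are assumptions based on the descriptions above, not the repository's actual implementation.

```python
# Illustrative sketch of the agent flow described above -- method names and
# wiring are assumptions, not the repository's actual implementation.
def answer_query(query, planner, text_agent, image_agent,
                 generalizer, merger, verifier, max_rounds=3):
    answer = None
    for _ in range(max_rounds):
        # Planning Agent: break the query into retrieval sub-questions
        sub_questions = planner.plan(query)

        per_question_answers = []
        for q in sub_questions:
            text_ans = text_agent.answer(q)    # retrieve from the text index
            image_ans = image_agent.answer(q)  # retrieve from page images
            # Generalize Agent: fuse both modalities for this sub-question
            per_question_answers.append(generalizer.combine(q, text_ans, image_ans))

        # Merge Agent: produce one answer for the original query
        answer = merger.merge(query, per_question_answers)

        # Verifier Agent: score the merged answer; stop or ask a follow-up
        verdict = verifier.verify(query, answer)
        if verdict.is_sufficient:
            break
        query = verdict.follow_up_question
    return answer
```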
It supports local document extraction via Docling, embedding with SentenceTransformers, and multi-agent orchestration.
```
M3ARAG/
├── agents/              # Modular agent logic
├── pipeline/            # Pipeline and chat launcher interface
├── data/                # Downloaded and processed files
│   ├── store/           # Raw downloaded files (PDF, HTML, etc.)
│   ├── merge/           # Single processing location for RAG indexing
│   └── extract/         # Converted PDFs, extracted images/tables
├── RAG/                 # RAG system
├── config/              # Config files for RAG, agents, and prompts
│   ├── agent_config.py  # Agent configuration
│   ├── rag_config.py    # RAG configuration
│   └── prompt.py        # Prompt storage
├── rag_text/            # RAG text captioning
├── rag_image/           # RAG image captioning
├── utils/               # Helper utilities (e.g., process_documents)
├── test/                # Tests
├── main.py              # Main entry point
├── chat_streamlit.py    # Streamlit chat entry point
├── README.md            # Main information about the repository
└── timeline.md          # Completed and upcoming tasks
```
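The `config/` files centralize the settings for the RAG index and the agents. The snippet below is only a hypothetical illustration of what `rag_config.py` and `agent_config.py` might contain; the real keys and values live in the repository's `config/` directory.

```python
# Hypothetical illustration only -- see config/rag_config.py and
# config/agent_config.py in the repository for the real settings.
rag_config = {
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",  # assumed default
    "vector_store": "chroma",
    "persist_dir": "data/merge",
    "chunk_size": 512,
    "top_k": 5,
}

agent_config = {
    "llm_provider": "openai",        # API keys come from the .env file
    "max_planning_subquestions": 5,
    "max_verifier_rounds": 3,
}
```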
```bash
git clone https://github.com/pdz1804/M3ARAG.git
cd M3ARAG

python -m venv myenv
# Or: conda create -n m3arag python=3.10

# Activate
source myenv/bin/activate   # macOS/Linux
myenv\Scripts\activate      # Windows

pip install -r requirements.txt
```

💡 If `requirements.txt` is missing, install manually like below:

```bash
pip install sentence-transformers langchain openai chromadb docling python-dotenv
```
- Download Poppler for Windows:
  - Visit: https://github.com/oschwartz10612/poppler-windows/releases/
  - Download the latest `.zip` file under Assets (e.g., `poppler-xx_xx_xx.zip`).
- Extract the zip to a location like `C:\poppler`.
- Add Poppler to PATH:
  - Open Start > Environment Variables.
  - Under System Variables, find and select `Path`, then click Edit.
  - Click New and add: `C:\poppler\Library\bin`
  - Click OK and restart your terminal.
- Verify installation:

  ```bash
  where pdfinfo
  ```

  You should see:

  ```
  C:\poppler\Library\bin\pdfinfo.exe
  ```
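On any platform, you can also confirm from Python that the Poppler tools are visible on your PATH; this is just a quick sanity check, not part of the project's code.

```python
# Quick sanity check (any OS): confirm the Poppler CLI tools are on PATH.
import shutil

path = shutil.which("pdfinfo")
if path:
    print("Poppler found at:", path)
else:
    print("Poppler not found -- check your PATH or installation.")
```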
On macOS:

```bash
brew install poppler
```

To verify:

```bash
which pdfinfo
```

On Linux (Debian/Ubuntu):

```bash
sudo apt update
sudo apt install poppler-utils
```

To verify:

```bash
which pdfinfo
```

Copy `.env.example` and rename it to `.env`, then fill in your keys:
```
OPENAI_API_KEY=pdz-...
GOOGLE_API_KEY=pdz-...
```
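Since `python-dotenv` is among the dependencies, the keys are presumably read from `.env` at startup. If you want to check your setup manually, a minimal sketch (not the project's actual loading code) looks like this:

```python
# Minimal sketch of reading the .env keys with python-dotenv; the project
# presumably does something similar at startup, but this is not its code.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is missing from .env"
print("Keys loaded:", [k for k in ("OPENAI_API_KEY", "GOOGLE_API_KEY") if os.getenv(k)])
```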
To run the RAG flow individually, with or without agents:

```bash
# Download data only (for local testing)
python main.py --download

# Ingest data only
python main.py --ingest

# Chat
python main.py --chat

# Note: --download, --ingest, and --chat can be combined in one command

# Run the Streamlit app (upload documents or input URLs)
python main.py --app
```

This will:
- Download and store files from hardcoded URLs.
- Extract content using Docling.
- Index text via SentenceTransformers + Chroma (see the sketch after this list).
- Start an interactive agent-based chat loop.
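The extraction and indexing steps above can be pictured with a short sketch: Docling converts a document, the text is chunked and embedded with SentenceTransformers, and the vectors are stored in a Chroma collection. This is an illustrative sketch of that flow, not the repository's actual ingestion code; the model name, chunking strategy, paths, and collection name are assumptions.

```python
# Illustrative sketch of the extract -> embed -> index flow; not the
# repository's actual ingestion code (model, chunking, names are assumed).
import chromadb
from docling.document_converter import DocumentConverter
from sentence_transformers import SentenceTransformer

# 1. Extract: convert a source document with Docling and export plain text
converter = DocumentConverter()
doc = converter.convert("data/store/example.pdf").document
text = doc.export_to_markdown()

# 2. Chunk: naive fixed-size chunks, for illustration only
chunks = [text[i:i + 1000] for i in range(0, len(text), 1000)]

# 3. Embed: encode chunks with a SentenceTransformers model
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(chunks).tolist()

# 4. Index: store chunks + embeddings in a persistent Chroma collection
client = chromadb.PersistentClient(path="data/index")
collection = client.get_or_create_collection("m3arag_text")
collection.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=embeddings,
)

# Retrieval at question time: embed the query and fetch the nearest chunks
results = collection.query(
    query_embeddings=model.encode(["What is M3ARAG?"]).tolist(),
    n_results=5,
)
print(results["documents"][0])
```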
| Agent | Description |
|---|---|
| `TextAgent` | Answers questions by retrieving from embedded text chunks |
| `ImageAgent` | Answers questions by retrieving from embedded page images |
| `GeneralizeAgent` | Combines answers from multiple modalities (text, image) |
| `PlanningAgent` | Decomposes complex questions into structured sub-questions |
| `MergeAgent` | Fuses sub-agent responses into a coherent final answer |
| `VerifierAgent` | Evaluates the merged answer, determines quality, and suggests refinements |
- **Document Processing & Indexing** (diagram): shows how documents are split into chunks and images, indexed via ChromaDB, and stored on disk.
- **Multi-Modal Retrieval Pipeline** (diagram): illustrates text- and image-based retrieval using sub-queries derived from the user question.
- **Agent-Oriented Workflow** (diagram): overview of how the specialized agents interact to process, merge, verify, and answer complex queries.
- ✅ PDF documents (`.pdf`)
- ✅ HTML, MD, PPTX, CSV, DOCX, TXT (converted to PDF)
- ✅ Extracted images (captioning + indexing coming soon)
- 🧪 Support for audio, `.json`, and `.xml` being tested for a later release
This project is licensed under the MIT License. See LICENSE for details.
Built by Nguyen Quang Phu (pdz1804) and Tieu Tri Bang
Reach out or open an issue for support or ideas.