This project is a production-ready Retrieval-Augmented Generation (RAG) API that takes a publicly accessible PDF URL and a list of natural-language questions, performs contextual retrieval over the document, and returns accurate answers powered by Groq's ultra-fast LLaMA-3.1-70B Versatile model.
- 📄 Accepts any publicly accessible PDF document
- ❓ Accepts multiple user questions in one request
- 🧩 Uses sentence-aware chunking (NLTK) to split documents at sentence boundaries
- 📌 Uses hash-based chunk IDs to prevent duplicate embeddings of the same document
- 💾 Stores and retrieves embeddings via Pinecone vector database
- ⚡ Embeddings are generated locally using Huggingface Sentence-Transformers
- 🤖 Question answering powered by LLaMA-3.1-70B Versatile via Groq API
| Layer | Tool/Service |
|---|---|
| ⚙ Backend | FastAPI |
| 📄 Embeddings | sentence-transformers (local, Huggingface) |
| 🔍 Vector Store | Pinecone |
| 🧠 LLM (Answering) | LLaMA-3.1-70B Versatile via Groq API |
| 🧠 Sentence Chunking | NLTK for sentence splitting |
| 🔐 Auth | Bearer token-based authorization |
| 🔄 Deduplication | SHA256 hash of document + chunk IDs |
Request body:

```json
{
  "documents": "https://example.com/your-policy.pdf",
  "questions": [
    "What is the grace period for premium payment?",
    "What are the exclusions under maternity benefits?"
  ]
}
```

Required header:

```
Authorization: Bearer <your_api_key>
```
Response:

```json
{
  "answers": [
    "A 30-day grace period is provided for premium payments.",
    "Maternity benefits are covered after 24 months with sub-limits."
  ]
}
```
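For example, the endpoint can be exercised from Python like this (a minimal sketch; the route `/hackrx/run` is an assumption, check `main1.py` for the actual path):

```python
import requests

API_URL = "http://localhost:8000/hackrx/run"  # assumed route; verify in main1.py
API_KEY = "your_local_api_key"                # must match MY_API_KEY_FOR_AUTH in .env

payload = {
    "documents": "https://example.com/your-policy.pdf",
    "questions": ["What is the grace period for premium payment?"],
}

resp = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["answers"])
```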
```
├── main1.py            # FastAPI entrypoint
├── get_text_pdf.py     # PDF text extraction logic
├── get_embeddings.py   # Sentence splitting + local embedding generation
├── llm_answering.py    # Groq API answering using LLaMA-3
├── pinecone_db.py      # Pinecone upsert/query utilities
├── hash_text.py        # SHA256 hashing for deduplication
├── .env                # Store your API keys here
```
Create a .env file in your project root with the following content:
```
MY_API_KEY_FOR_AUTH=your_local_api_key
PINECONE_API_KEY=your_pinecone_key
PINECONE_INDEX_NAME=hackrx-index
GROQ_API_KEY=your_groq_api_key
```

Each uploaded document is hashed using SHA256. This ensures:
- ✅ You never embed the same document twice
- ✅ Pinecone stores each chunk under an ID like `{doc_hash}-chunk-{i}`
If the same PDF is uploaded again, embedding is skipped and only retrieval happens.
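A minimal sketch of how this dedup check can be implemented, assuming the v3+ `pinecone` client (function and variable names are illustrative, not the actual ones in `hash_text.py` / `pinecone_db.py`):

```python
import hashlib
import os

from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index(os.environ["PINECONE_INDEX_NAME"])


def sha256_of(text: str) -> str:
    """Identical documents always map to the same hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def already_embedded(doc_hash: str) -> bool:
    """If chunk 0 exists in Pinecone, this document was embedded before."""
    fetched = index.fetch(ids=[f"{doc_hash}-chunk-0"])
    return len(fetched.vectors) > 0
```

If the check returns `True`, embedding is skipped and the request proceeds straight to retrieval.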
This app generates embeddings locally with HuggingFace Sentence Transformers, which is fast and suitable for short- to mid-length documents.
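A minimal sketch of the local embedding step, assuming the `all-MiniLM-L6-v2` encoder (the acknowledgments credit MiniLM; the exact model name in `get_embeddings.py` may differ):

```python
from sentence_transformers import SentenceTransformer

# Loads once at startup and runs fully locally; no embedding API calls.
model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = ["First chunk of the policy ...", "Second chunk ..."]
embeddings = model.encode(chunks)  # ndarray of shape (n_chunks, 384)
print(embeddings.shape)
```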
Documents are first tokenized into sentences using NLTK's sent_tokenize().
Chunks are then built using sentence boundaries (not blindly splitting by character length), improving semantic consistency.
Overlap between chunks is handled at sentence level for better context flow.
- Chunk size: ~700 characters
- Overlap: 1–2 sentences from the previous chunk
- Chunk ID: `{doc_hash}-chunk-{i}` (see the sketch after this list)
This allows:
- 🔄 Consistent re-chunking
- 🧼 Idempotent upserts into Pinecone
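A minimal sketch of the sentence-aware chunking described above (illustrative, not the exact logic in `get_embeddings.py`):

```python
import nltk
from nltk.tokenize import sent_tokenize

nltk.download("punkt", quiet=True)  # one-time download of the sentence tokenizer


def chunk_sentences(text, doc_hash, max_chars=700, overlap=2):
    """Pack sentences into ~max_chars chunks, carrying `overlap` trailing
    sentences into the next chunk so context flows across boundaries."""
    chunks, current = [], []
    for sentence in sent_tokenize(text):
        if current and len(" ".join(current)) + len(sentence) > max_chars:
            chunks.append(" ".join(current))
            current = current[-overlap:]  # sentence-level overlap
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    # Deterministic IDs keep re-chunking consistent and upserts idempotent.
    return [(f"{doc_hash}-chunk-{i}", c) for i, c in enumerate(chunks)]
```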
All API endpoints are protected via Bearer Token. Provide this header in every request:
```
Authorization: Bearer your_local_api_key
```

Install dependencies and start the server:

```
pip install -r requirements.txt
uvicorn main1:app --reload
```

Then go to http://127.0.0.1:8000/docs to try the API with Swagger UI.
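For reference, a minimal sketch of how the bearer-token guard can be wired up with FastAPI's `HTTPBearer` (illustrative; the actual check lives in `main1.py`, and the route name is assumed):

```python
import os

from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

app = FastAPI()
bearer = HTTPBearer()


def verify_token(creds: HTTPAuthorizationCredentials = Depends(bearer)) -> None:
    # Compare the presented token to the key configured in .env.
    if creds.credentials != os.environ["MY_API_KEY_FOR_AUTH"]:
        raise HTTPException(status_code=401, detail="Invalid or missing API key")


@app.post("/hackrx/run", dependencies=[Depends(verify_token)])  # assumed route
async def run(payload: dict) -> dict:
    return {"answers": []}  # placeholder; the real handler does retrieval + answering
```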
To clear all document vectors (dev-only):
```
curl -X POST http://localhost:8000/hackrx/clear
```

Answering is powered by:
LLaMA-3.1-70B Versatile served via Groq API — ultra low-latency inference with OpenAI-compatible API interface.
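A minimal sketch of that answering call, assuming the official `groq` Python SDK (the prompt wording is illustrative, not the exact prompt in `llm_answering.py`):

```python
import os

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])


def answer(question: str, context: str) -> str:
    """Answer one question, grounded in the retrieved chunks."""
    completion = client.chat.completions.create(
        model="llama-3.1-70b-versatile",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
        temperature=0.0,
    )
    return completion.choices[0].message.content
```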
MIT License
- HuggingFace for MiniLM sentence encoders
- Groq for blazing-fast LLM inference
- Pinecone for scalable vector storage
- NLTK for sentence-aware chunking
- FastAPI for building production APIs fast