This project is a production-ready Retrieval-Augmented Generation (RAG) API that takes a publicly accessible PDF URL and a list of natural-language questions, performs contextual retrieval over the document, and returns accurate answers powered by Groq's ultra-fast LLaMA-3.1-70B Versatile model.
- 📄 Accepts any publicly accessible PDF document
- ❓ Accepts multiple user questions in one request
- 🧩 Uses sentence-aware chunking (NLTK) to split documents at sentence boundaries
- 📌 Uses hash-based chunk IDs to prevent duplicate embeddings of the same document
- 💾 Stores and retrieves embeddings via Pinecone vector database
- ⚡ Embeddings are generated locally using Huggingface Sentence-Transformers
- 🤖 Question answering powered by LLaMA-3.1-70B Versatile via Groq API
| Layer | Tool/Service |
|---|---|
| ⚙ Backend | FastAPI |
| 📄 Embeddings | sentence-transformers (local, Huggingface) |
| 🔍 Vector Store | Pinecone |
| 🧠 LLM (Answering) | LLaMA-3.1-70B Versatile via Groq API |
| 🧠 Sentence Chunking | NLTK for sentence splitting |
| 🔐 Auth | Bearer token-based authorization |
| 🔄 Deduplication | SHA256 hash of document + chunk IDs |
Request body:

```json
{
  "documents": "https://example.com/your-policy.pdf",
  "questions": [
    "What is the grace period for premium payment?",
    "What are the exclusions under maternity benefits?"
  ]
}
```

Required header:

```
Authorization: Bearer <your_api_key>
```
Response:

```json
{
  "answers": [
    "A 30-day grace period is provided for premium payments.",
    "Maternity benefits are covered after 24 months with sub-limits."
  ]
}
```
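For example, the endpoint can be exercised from Python like this (a minimal sketch; the route `/hackrx/run` is an assumption, check `main1.py` for the actual path):

```python
import requests

API_URL = "http://localhost:8000/hackrx/run"  # assumed route; verify in main1.py
API_KEY = "your_local_api_key"                # must match MY_API_KEY_FOR_AUTH in .env

payload = {
    "documents": "https://example.com/your-policy.pdf",
    "questions": ["What is the grace period for premium payment?"],
}

resp = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["answers"])
```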
```
├── main1.py            # FastAPI entrypoint
├── get_text_pdf.py     # PDF text extraction logic
├── get_embeddings.py   # Sentence splitting + local embedding generation
├── llm_answering.py    # Groq API answering using LLaMA-3
├── pinecone_db.py      # Pinecone upsert/query utilities
├── hash_text.py        # SHA256 hashing for deduplication
├── .env                # Store your API keys here
```
Create a .env file in your project root with the following content:
```
MY_API_KEY_FOR_AUTH=your_local_api_key
PINECONE_API_KEY=your_pinecone_key
PINECONE_INDEX_NAME=hackrx-index
GROQ_API_KEY=your_groq_api_key
```

Each uploaded document is hashed using SHA256. This ensures:
- ✅ You never embed the same document twice
- ✅ Pinecone stores each chunk under an ID like `{doc_hash}-chunk-{i}`
If the same PDF is uploaded again, embedding is skipped and only retrieval happens.
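A minimal sketch of how this dedup check can be implemented, assuming the v3+ `pinecone` client (function and variable names are illustrative, not the actual ones in `hash_text.py` / `pinecone_db.py`):

```python
import hashlib
import os

from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index(os.environ["PINECONE_INDEX_NAME"])


def sha256_of(text: str) -> str:
    """Identical documents always map to the same hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def already_embedded(doc_hash: str) -> bool:
    """If chunk 0 exists in Pinecone, this document was embedded before."""
    fetched = index.fetch(ids=[f"{doc_hash}-chunk-0"])
    return len(fetched.vectors) > 0
```

If the check returns `True`, embedding is skipped and the request proceeds straight to retrieval.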
This app generates embeddings locally with HuggingFace Sentence Transformers, which is fast and suitable for short- to mid-length documents.
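A minimal sketch of the local embedding step, assuming the `all-MiniLM-L6-v2` encoder (the acknowledgments credit MiniLM; the exact model name in `get_embeddings.py` may differ):

```python
from sentence_transformers import SentenceTransformer

# Loads once at startup and runs fully locally; no embedding API calls.
model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = ["First chunk of the policy ...", "Second chunk ..."]
embeddings = model.encode(chunks)  # ndarray of shape (n_chunks, 384)
print(embeddings.shape)
```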
Documents are first tokenized into sentences using NLTK's sent_tokenize().
Chunks are then built using sentence boundaries (not blindly splitting by character length), improving semantic consistency.
Overlap between chunks is handled at sentence level for better context flow.
- Chunk size: ~700 characters
- Overlap: 1–2 sentences from the previous chunk
- Chunk ID: `{doc_hash}-chunk-{i}` (see the sketch after this list)
This allows:
- 🔄 Consistent re-chunking
- 🧼 Idempotent upserts into Pinecone
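A minimal sketch of the sentence-aware chunking described above (illustrative, not the exact logic in `get_embeddings.py`):

```python
import nltk
from nltk.tokenize import sent_tokenize

nltk.download("punkt", quiet=True)  # one-time download of the sentence tokenizer


def chunk_sentences(text, doc_hash, max_chars=700, overlap=2):
    """Pack sentences into ~max_chars chunks, carrying `overlap` trailing
    sentences into the next chunk so context flows across boundaries."""
    chunks, current = [], []
    for sentence in sent_tokenize(text):
        if current and len(" ".join(current)) + len(sentence) > max_chars:
            chunks.append(" ".join(current))
            current = current[-overlap:]  # sentence-level overlap
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    # Deterministic IDs keep re-chunking consistent and upserts idempotent.
    return [(f"{doc_hash}-chunk-{i}", c) for i, c in enumerate(chunks)]
```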
All API endpoints are protected via Bearer Token. Provide this header in every request:
```
Authorization: Bearer your_local_api_key
```

Install dependencies and start the server:

```
pip install -r requirements.txt
uvicorn main1:app --reload
```

Then go to http://127.0.0.1:8000/docs to try the API with Swagger UI.
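For reference, a minimal sketch of how the bearer-token guard can be wired up with FastAPI's `HTTPBearer` (illustrative; the actual check lives in `main1.py`, and the route name is assumed):

```python
import os

from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

app = FastAPI()
bearer = HTTPBearer()


def verify_token(creds: HTTPAuthorizationCredentials = Depends(bearer)) -> None:
    # Compare the presented token to the key configured in .env.
    if creds.credentials != os.environ["MY_API_KEY_FOR_AUTH"]:
        raise HTTPException(status_code=401, detail="Invalid or missing API key")


@app.post("/hackrx/run", dependencies=[Depends(verify_token)])  # assumed route
async def run(payload: dict) -> dict:
    return {"answers": []}  # placeholder; the real handler does retrieval + answering
```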
To clear all document vectors (dev-only):
```
curl -X POST http://localhost:8000/hackrx/clear
```

Answering is powered by:
LLaMA-3.1-70B Versatile served via Groq API — ultra low-latency inference with OpenAI-compatible API interface.
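A minimal sketch of that answering call, assuming the official `groq` Python SDK (the prompt wording is illustrative, not the exact prompt in `llm_answering.py`):

```python
import os

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])


def answer(question: str, context: str) -> str:
    """Answer one question, grounded in the retrieved chunks."""
    completion = client.chat.completions.create(
        model="llama-3.1-70b-versatile",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
        temperature=0.0,
    )
    return completion.choices[0].message.content
```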
MIT License
- HuggingFace for MiniLM sentence encoders
- Groq for blazing-fast LLM inference
- Pinecone for scalable vector storage
- NLTK for sentence-aware chunking
- FastAPI for building production APIs fast