A lightweight, fast, and secure RAG (Retrieval-Augmented Generation) system built using FastAPI, Qdrant, and a local LLaMA 2 model. This project demonstrates how to augment LLM responses with document-based context for improved accuracy and relevance.
- Vector-based document storage and retrieval using Qdrant
- Local LLaMA 2 integration using `llama-cpp-python`
- Basic API key protection for secure access
- Sentence Transformers for text embeddings
- FastAPI for the web interface
- Simple and modular architecture
- CLI and cURL testing support
- Logging for observability
```text
[ User Query ]
       ↓
  [ /ask API ]
       ↓
[ Qdrant Retriever ] ← [ Embedded Document Chunks ]
       ↓
[ LLaMA 2 Generator ]
       ↓
[ Response to User ]
```
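The flow above maps onto a single endpoint. A minimal sketch, assuming helper names such as `retrieve` and `generate` that are illustrative rather than the exact functions in `app/`:

```python
# Illustrative sketch of the /ask flow, not the project's exact code
from fastapi import FastAPI

from app.retriever import retrieve   # Qdrant similarity search (assumed helper)
from app.llm import generate         # llama-cpp-python wrapper (assumed helper)

app = FastAPI()

@app.get("/ask")
def ask(question: str):
    # 1. Retrieve the most relevant document chunks for the question
    chunks = retrieve(question, k=3)
    # 2. Combine the retrieved context with the question into a prompt
    context = "\n".join(chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    # 3. Generate the answer with the local LLaMA 2 model
    return {"question": question, "answer": generate(prompt)}
```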
- Install basic requirements: `pip install fastapi==0.110.0 uvicorn==0.29.0`
- Install sentence-transformers: `pip install sentence-transformers==2.5.1`
- Install PyTorch (CPU version): `pip install torch==2.2.2 --index-url https://download.pytorch.org/whl/cpu`
- Install transformers: `pip install transformers==4.40.1`
- Install Qdrant client: `pip install qdrant-client==1.7.0`
- Install remaining dependencies: `pip install python-dotenv==1.0.1 requests==2.31.0`
- Install llama-cpp-python with Metal support: `CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python==0.2.24`
- Initialize the database: `python scripts/process_documents.py`
- Start the server: `uvicorn app.main:app --reload`
- Test the API:
  - Using curl:

    ```bash
    curl -X 'GET' \
      'http://localhost:8000/ask?question=What%20is%20the%20opportunity%20cost%3F' \
      -H 'accept: application/json' \
      -H 'x-api-key: a9oH2PDkGTFSX6aYSHvyswtDKz9HYYWsM2DmnW-8qGk'
    ```

    Response body:

    ```json
    {
      "question": "What is the opportunity cost?",
      "answer": "The opportunity cost is what you give up by choosing one option over another. In this case, it's the amount of money that could have been earned through compound interest if the triplets had left their allowances in a savings account instead of giving them to their grandma for safekeeping."
    }
    ```
  - Using the Swagger UI:
    - Open http://localhost:8000/docs in your browser
    - Click on "Authorize", enter the API key, and save
    - Click on the `/ask` endpoint
    - Click "Try it out"
    - Enter your question and API key
    - Click "Execute"
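  - Using Python: a minimal sketch with the `requests` library (the key below is the same placeholder shown in the curl example; replace it with your own):

    ```python
    # Minimal sketch for querying the /ask endpoint from Python
    import requests

    API_URL = "http://localhost:8000/ask"
    API_KEY = "a9oH2PDkGTFSX6aYSHvyswtDKz9HYYWsM2DmnW-8qGk"  # replace with your own key

    response = requests.get(
        API_URL,
        params={"question": "What is the opportunity cost?"},
        headers={"accept": "application/json", "x-api-key": API_KEY},
    )
    response.raise_for_status()
    print(response.json()["answer"])
    ```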
- API Key:
  - Set your API key in `api_key_middleware.py`
  - For production, use environment variables
- LLaMA Model:
  - Update the model path in `llm.py`
  - Ensure you have the correct model file
- Qdrant Storage:
  - Default: in-memory storage
  - For persistence, update the client initialization in `retriever.py` (see the sketch below)
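How these three settings might look in code. This is a hedged sketch: the environment variable name `RAG_API_KEY`, the model path, and the storage path are illustrative assumptions, not the project's actual values.

```python
import os
from llama_cpp import Llama
from qdrant_client import QdrantClient

# API key: read from the environment instead of hard-coding it in
# api_key_middleware.py ("RAG_API_KEY" is an illustrative variable name).
API_KEY = os.environ.get("RAG_API_KEY", "change-me")

# LLaMA model: point llm.py at your local .gguf file (path is illustrative).
llm = Llama(model_path="models/llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048)

# Qdrant storage: switch from the in-memory default to on-disk persistence
# when initializing the client in retriever.py.
client = QdrantClient(path="./qdrant_data")   # persistent local storage
# client = QdrantClient(":memory:")           # default: in-memory only
```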
- The system uses CPU-based inference by default
- Response times may vary based on:
  - Hardware specifications
  - Number of documents in the database
  - Complexity of queries
- For better performance (see the sketch below):
  - Use GPU if available (modify the llama-cpp-python installation)
  - Adjust the number of retrieved documents (the k parameter)
  - Consider using a smaller LLaMA model variant
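The first two knobs translate roughly into the following. This is a sketch under assumptions: the collection name, model path, and embedding model are illustrative, and GPU offload only helps if your llama-cpp-python build has GPU (e.g. Metal) support.

```python
from sentence_transformers import SentenceTransformer
from llama_cpp import Llama
from qdrant_client import QdrantClient

# GPU offload: n_gpu_layers=-1 moves every layer onto the GPU when the
# llama-cpp-python build supports it; 0 keeps inference on the CPU.
llm = Llama(model_path="models/llama-2-7b-chat.Q4_K_M.gguf", n_gpu_layers=-1)

# Retrieval depth: `limit` is the k parameter -- fewer chunks means a shorter
# prompt and faster generation, at the cost of less context for the model.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(path="./qdrant_data")
hits = client.search(
    collection_name="documents",                                  # illustrative name
    query_vector=embedder.encode("What is simple interest?").tolist(),
    limit=3,                                                      # k
)
```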
- Documents are processed in chunks for better context management
- Embeddings are generated once during document addition
- Supported document formats:
  - Text files (.txt)
  - More formats can be added by extending the document processor
- Document chunks are stored in Qdrant with their embeddings (see the sketch below)
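A condensed sketch of that processing pipeline. The chunk size, collection name, and embedding model are illustrative assumptions and may differ from what `scripts/process_documents.py` actually does.

```python
import uuid
from pathlib import Path

from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

CHUNK_SIZE = 500                      # illustrative chunk length in characters
embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(path="./qdrant_data")

# Create the collection sized to the embedding model's output dimension (384).
client.recreate_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

for doc in Path("documents").glob("*.txt"):
    text = doc.read_text()
    # Split into fixed-size chunks; the real processor may chunk differently.
    chunks = [text[i:i + CHUNK_SIZE] for i in range(0, len(text), CHUNK_SIZE)]
    points = [
        PointStruct(
            id=str(uuid.uuid4()),
            vector=embedder.encode(chunk).tolist(),
            payload={"text": chunk, "source": doc.name},
        )
        for chunk in chunks
    ]
    client.upsert(collection_name="documents", points=points)
```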
- API Key Issues:
  - Ensure API key is set in environment variables
  - Check API key format in requests
  - Verify API key middleware is properly configured
- Model Loading Issues:
  - Verify model file exists in the correct location
  - Check model file format (.gguf)
  - Ensure sufficient system memory
- Database Issues:
  - Check Qdrant data directory permissions
  - Verify document processing completed successfully
  - Check database connection settings
- Adding New Documents:

  ```bash
  # Add your document to the project
  # Then run the document processor
  python scripts/process_documents.py
  ```
- Modifying the System:
  - Document embeddings: Modify `app/retriever.py`
  - LLM settings: Update `app/llm.py`
  - API endpoints: Edit `app/main.py`
  - Security: Configure `app/api_key_middleware.py`
- Testing:
  - Use the provided test scripts in the `test/` directory
  - Run API tests: `python test/test_query.py`
  - Check logs for debugging information
Some example queries:
- Who receives the most money in interest?
- What is opportunity cost?
- What should people compare before they make a trade-off?
- What is simple interest?
- What is compound interest?
- What is the difference between simple and compound interest in the story?
- How much money will Diane have for the vacation?
- What is Brian's opportunity cost in the story?
- What lesson does the story teach about financial decisions?
- What is Python?
- What is Qdrant?