praria/ArcaneApp

Retrieval-Augmented Generation (RAG) API


A lightweight, fast, and secure RAG (Retrieval-Augmented Generation) system built using FastAPI, Qdrant, and a local LLaMA 2 model. This project demonstrates how to augment LLM responses with document-based context for improved accuracy and relevance.

Features


  • Vector-based document storage and retrieval using Qdrant
  • Local LLaMA 2 integration using llama-cpp-python
  • Basic API key protection for secure access
  • Sentence Transformers for text embeddings
  • FastAPI for the web interface
  • Simple and modular architecture
  • CLI and cURL testing support
  • Logging for observability

Architecture Overview


[ User Query ]
      ↓
 [ /ask API ] 
      ↓
[ Qdrant Retriever ] ← [ Embedded Document Chunks ]
      ↓
[ LLaMA 2 Generator ]
      ↓
[ Response to User ]
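The flow above can be sketched in a few lines of Python. This is a toy, self-contained version: a character-frequency "embedding" and a stub generator stand in for Sentence Transformers, Qdrant, and LLaMA 2, so only the retrieve-then-generate control flow matches the real pipeline.

```python
# Minimal sketch of the retrieve-then-generate flow shown above.
# The toy embedding and generator stubs are illustrative only; they
# are not the project's actual retriever.py / llm.py code.
from math import sqrt

def embed(text: str) -> list[float]:
    # Toy "embedding": normalized letter-frequency vector.
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    counts = [text.lower().count(ch) for ch in alphabet]
    norm = sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Qdrant performs this nearest-neighbour search in the real system.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def generate(question: str, context: list[str]) -> str:
    # Placeholder for the LLaMA 2 call; the real generator conditions
    # the model on the retrieved chunks.
    return f"Answer to {question!r} using context: {' | '.join(context)}"

chunks = [
    "interest is money earned on savings",
    "opportunity cost is the value of the option you give up",
    "a trade off compares benefits and costs",
]
top = retrieve("what is opportunity cost?", chunks, k=1)
print(generate("what is opportunity cost?", top))
```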

Installation


  1. Install basic requirements

    pip install fastapi==0.110.0 uvicorn==0.29.0
  2. Install sentence-transformers

    pip install sentence-transformers==2.5.1
  3. Install PyTorch (CPU version)

    pip install torch==2.2.2 --index-url https://download.pytorch.org/whl/cpu
  4. Install transformers

    pip install transformers==4.40.1
  5. Install Qdrant client

    pip install qdrant-client==1.7.0
  6. Install remaining dependencies

    pip install python-dotenv==1.0.1 requests==2.31.0
  7. Install llama-cpp-python with Metal support

    CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python==0.2.24
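For convenience, the pinned versions above can be collected into a requirements.txt (a suggested layout, not a file shipped with the repository). torch and llama-cpp-python are left out because they need the CPU index URL and the CMAKE_ARGS flag respectively, so they are installed separately as shown in steps 3 and 7:

```
fastapi==0.110.0
uvicorn==0.29.0
sentence-transformers==2.5.1
transformers==4.40.1
qdrant-client==1.7.0
python-dotenv==1.0.1
requests==2.31.0
```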

Usage


  1. Initialize the Database:

    python scripts/process_documents.py
  2. Start the Server:

    uvicorn app.main:app --reload
  3. Test the API:

    • Using curl:
       curl -X 'GET' \
          'http://localhost:8000/ask?question=What%20is%20the%20opportunity%20cost%3F' \
          -H 'accept: application/json' \
          -H 'x-api-key: a9oH2PDkGTFSX6aYSHvyswtDKz9HYYWsM2DmnW-8qGk'
      
       Response body
       {
          "question": "What is the opportunity cost?",
          "answer": "The opportunity cost is what you give up by choosing one option over another. In this case, it's the amount of money that could have been earned through compound interest if the triplets had left their allowances in a savings account instead of giving them to their grandma for safekeeping."
       }
    • Using the Swagger UI:
      • Open http://localhost:8000/docs in your browser
      • Click on Authorize, enter the api key and save
      • Click on the /ask endpoint
      • Click "Try it out"
      • Enter your question and API key
      • Click "Execute"
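The same request can also be issued from Python. This sketch uses only the standard library; the host, port, and API key mirror the curl example above and would differ in your own deployment:

```python
# Query the /ask endpoint from Python instead of curl.
# Host, port, and API key mirror the curl example above.
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "http://localhost:8000/ask"
API_KEY = "a9oH2PDkGTFSX6aYSHvyswtDKz9HYYWsM2DmnW-8qGk"

def build_request(question: str) -> Request:
    url = f"{API_URL}?{urlencode({'question': question})}"
    headers = {"accept": "application/json", "x-api-key": API_KEY}
    return Request(url, headers=headers)

def ask(question: str) -> dict:
    # Local LLaMA inference on CPU can be slow; allow a generous timeout.
    with urlopen(build_request(question), timeout=120) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    print(ask("What is the opportunity cost?")["answer"])
```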

Configuration

  1. API Key:

    • Set your API key in api_key_middleware.py
    • For production, use environment variables
  2. LLaMA Model:

    • Update the model path in llm.py
    • Ensure you have the correct model file
  3. Qdrant Storage:

    • Default: in-memory storage
    • For persistence, update the client initialization in retriever.py
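For the production setup suggested in item 1, the key can be read from an environment variable (or a .env file loaded via python-dotenv) instead of being hard-coded in api_key_middleware.py. A minimal sketch, assuming a variable named RAG_API_KEY (the project may use a different name):

```python
# Load the API key from the environment rather than hard-coding it.
# RAG_API_KEY is an assumed variable name used for illustration.
import hmac
import os

def get_expected_key() -> str:
    key = os.environ.get("RAG_API_KEY")
    if not key:
        raise RuntimeError("RAG_API_KEY is not set; refusing to start without a key")
    return key

def is_authorized(presented_key: str) -> bool:
    # compare_digest avoids leaking key length/content via timing.
    return hmac.compare_digest(presented_key, get_expected_key())
```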

Performance Considerations


  • The system uses CPU-based inference by default
  • Response times may vary based on:
    • Hardware specifications
    • Number of documents in the database
    • Complexity of queries
  • For better performance:
    • Use GPU if available (modify llama-cpp-python installation)
    • Adjust the number of retrieved documents (k parameter)
    • Consider using a smaller LLaMA model variant

Document Processing


  • Documents are processed in chunks for better context management
  • Embeddings are generated once during document addition
  • Supported document formats:
    • Text files (.txt)
    • More formats can be added by extending the document processor
  • Document chunks are stored in Qdrant with their embeddings
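The chunking step can be illustrated with a simple overlapping word-window splitter. The real scripts/process_documents.py may use different chunk sizes or token-based splitting; the parameters below are assumptions for illustration:

```python
# Split a document into overlapping word-window chunks, as a stand-in
# for the project's chunking step. chunk_size/overlap are illustrative
# defaults, not the values used by scripts/process_documents.py.
def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append(" ".join(window))
        if start + chunk_size >= len(words):
            break
    return chunks
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from either side, at the cost of storing some words twice.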

Troubleshooting


  1. API Key Issues:

    • Ensure API key is set in environment variables
    • Check API key format in requests
    • Verify API key middleware is properly configured
  2. Model Loading Issues:

    • Verify model file exists in the correct location
    • Check model file format (.gguf)
    • Ensure sufficient system memory
  3. Database Issues:

    • Check Qdrant data directory permissions
    • Verify document processing completed successfully
    • Check database connection settings

Development


  1. Adding New Documents:

    # Add your document to the project
    # Then run the document processor
    python scripts/process_documents.py
  2. Modifying the System:

    • Document embeddings: Modify app/retriever.py
    • LLM settings: Update app/llm.py
    • API endpoints: Edit app/main.py
    • Security: Configure app/api_key_middleware.py
  3. Testing:

    • Use the provided test scripts in the test/ directory
    • Run API tests: python test/test_query.py
    • Check logs for debugging information

Some example queries


  • who receives the most money in interest?

  • what is opportunity cost?

  • what should people compare before they make a trade off?

  • what is simple interest?

  • what is compound interest?

  • What is the difference between simple and compound interest in the story?

  • How much money will Diane have for the vacation?

  • What is Brian's opportunity cost in the story?

  • What lesson does the story teach about financial decisions?

  • What is Python?

  • What is Qdrant?

About

A working prototype of a basic Retrieval-Augmented Generation (RAG) API
