A Retrieval-Augmented Generation (RAG) web application built with Flask and Qdrant.
© 2024 Dennis Kruyt. All rights reserved.
This application implements a RAG system that combines the power of Large Language Models (LLMs) with a vector database (Qdrant) to provide context-enhanced responses to user queries. It features:
- User query interface for asking questions
- Admin interface for managing document collections
- Document processing and embedding
- Integration with multiple LLM providers (OpenAI, Claude, Ollama)
- User Interface: Clean, intuitive interface for submitting queries and receiving LLM responses
- Web Search: Search the web directly using SearXNG integration, with LLM interpretation of the results
- Agent Search: Break down complex questions into sub-queries for more comprehensive answers
- Mind Maps: Visualize response concepts with automatically generated mind maps
- Vector Search: Retrieve relevant document snippets based on semantic similarity
- Admin Interface: Securely manage collections and upload documents
- Document Processing: Automatically extract text, chunk, embed, and store documents
- Multiple LLM Support: Configure your preferred LLM provider (OpenAI, Claude, Ollama)
- Dynamic Embedding Models: Automatically detects and uses available embedding models from all configured providers
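The retrieval step behind these features follows the usual RAG shape: embed the query, rank stored chunks by cosine similarity, and hand the top matches to the LLM as context. A minimal, self-contained sketch of that idea, with a toy bag-of-words embedder standing in for a real embedding model (the app itself uses Qdrant and proper embedding providers):

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call a model
    # such as SentenceTransformers, OpenAI, or Ollama embeddings.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    # Rank every stored chunk against the query and keep the best matches.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]

chunks = [
    "Qdrant is a vector database for similarity search.",
    "Flask is a lightweight Python web framework.",
    "Embeddings map text to points in a vector space.",
]
context = retrieve("what is a vector database", chunks, top_k=1)
```

In the real application this ranking happens inside Qdrant, which indexes the embeddings so the search stays fast at scale.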
- Python 3.8+
- Qdrant running locally or remotely
- API keys for your chosen LLM provider
1. Clone the repository:

   ```shell
   git clone https://github.com/dkruyt/WebRAgent.git
   cd WebRAgent
   ```

2. Create and activate a virtual environment:

   ```shell
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```shell
   pip install -r requirements.txt
   ```

4. Copy the example environment file and configure it with your settings:

   ```shell
   cp .env.example .env
   ```

   Then edit the `.env` file with your preferred settings. The key settings to configure:

   ```shell
   # API Keys for LLM Providers (uncomment and add your keys for the providers you want to use)
   # At least one provider should be configured
   #OPENAI_API_KEY=your_openai_api_key_here
   #CLAUDE_API_KEY=your_claude_api_key_here

   # Ollama Configuration (uncomment to use Ollama)
   #OLLAMA_HOST=http://localhost:11434

   # Qdrant Configuration
   QDRANT_HOST=localhost
   QDRANT_PORT=6333

   # SearXNG Configuration
   SEARXNG_URL=http://searxng:8080

   # Flask Secret Key (generate a secure random key for production)
   FLASK_SECRET_KEY=change_me_in_production

   # Admin User Configuration
   ADMIN_USERNAME=admin
   ADMIN_PASSWORD=change_me_in_production
   ```

   Note: the system will automatically detect and use models from the providers you've configured:

   - If you set `OPENAI_API_KEY`, it will use OpenAI models for both LLM and embeddings
   - If you set `CLAUDE_API_KEY`, it will use Claude models for LLM
   - If you set `OLLAMA_HOST`, it will use Ollama models for both LLM and embeddings
   - Sentence Transformers will be used as the fallback embedding model

   There's no need to manually specify which models to use; the system dynamically detects the available models.
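In outline, that detection amounts to inspecting the environment and registering whichever providers have credentials. A hypothetical sketch of the idea (function and provider names are illustrative, not the app's actual API):

```python
import os

def detect_providers(env=os.environ):
    """Return (llm_providers, embedding_providers) based on configured env vars."""
    llms, embedders = [], []
    if env.get("OPENAI_API_KEY"):
        llms.append("openai")
        embedders.append("openai")
    if env.get("CLAUDE_API_KEY"):
        llms.append("claude")  # Claude is used for LLM only, not embeddings
    if env.get("OLLAMA_HOST"):
        llms.append("ollama")
        embedders.append("ollama")
    # Sentence Transformers is the always-available local fallback.
    embedders.append("sentence-transformers")
    return llms, embedders

llms, embedders = detect_providers({"OPENAI_API_KEY": "sk-example"})
```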
5. Make sure you have Qdrant running locally, or specify a remote instance in the `.env` file.

6. If using Ollama, make sure it's running locally, or specify a remote instance in the `.env` file.

7. Start the application:

   ```shell
   python run.py
   ```

8. Access the application at `http://localhost:5000`.
This project includes Docker and Docker Compose configurations for easy deployment.
1. Clone the repository:

   ```shell
   git clone https://github.com/dkruyt/WebRAgent.git
   cd WebRAgent
   ```

2. Start the application with Docker Compose:

   ```shell
   docker-compose up -d
   ```

3. The following services will be available:

   - RAG Web Application: http://localhost:5000
   - Qdrant Dashboard: http://localhost:6333/dashboard
   - SearXNG Search Engine: http://localhost:8080

4. To shut down the application:

   ```shell
   docker-compose down
   ```
If you want to pre-download the Ollama models before starting the application:

```shell
# For main LLM models
ollama pull llama2
ollama pull mistral
ollama pull gemma

# For embedding models
ollama pull nomic-embed-text
ollama pull all-minilm
```

The system will automatically detect these models if they're available in your Ollama installation.
1. Navigate to the home page
2. Choose your search method:
   - Document Search: Select a collection from the dropdown
   - Web Search: Toggle the "Web Search" option
3. Enter your query in the text box
4. Configure additional options (optional):
   - Generate Mind Map: Toggle to visualize concepts related to your query
   - Agent Search: Enable for complex questions that benefit from being broken down
   - Number of Results: Adjust how many results to retrieve
5. Submit your query and view the response
6. Explore the source documents or web sources that informed the answer
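The mind-map option can be thought of as asking the LLM for an indented outline of the response and parsing it into a tree for visualization. A toy parser illustrating the idea (the real service's outline format may differ):

```python
def parse_outline(outline: str) -> dict:
    """Parse a two-space-indented bullet outline into nested {node: children} dicts."""
    root: dict = {}
    stack = [(-1, root)]  # (indent level, children dict)
    for line in outline.strip().splitlines():
        indent = (len(line) - len(line.lstrip())) // 2
        label = line.strip("- ").strip()
        # Pop back up to this node's parent level.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        children: dict = {}
        stack[-1][1][label] = children
        stack.append((indent, children))
    return root

tree = parse_outline(
    "- RAG\n"
    "  - Retrieval\n"
    "    - Qdrant\n"
    "  - Generation\n"
)
```

The resulting nested dict maps directly onto the node/edge structure a mind-map renderer needs.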
1. Toggle the "Web Search" option on the main interface
2. Enter your query
3. The system will:
   - Search the web using SearXNG
   - Use an LLM to interpret and synthesize the search results
   - Present a comprehensive answer along with source links
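The "interpret and synthesize" step largely amounts to packing the search hits into a prompt for the LLM. A sketch of what such a prompt builder might look like (the field names roughly match SearXNG's JSON results; the actual prompt WebRAgent uses differs):

```python
def build_synthesis_prompt(query: str, results: list[dict]) -> str:
    # Each result is assumed to carry 'title', 'url', and 'snippet' keys.
    sources = "\n".join(
        f"[{i}] {r['title']} ({r['url']}): {r['snippet']}"
        for i, r in enumerate(results, start=1)
    )
    return (
        "Answer the question using only the sources below. "
        "Cite sources by their [number].\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {query}\nAnswer:"
    )

prompt = build_synthesis_prompt(
    "What is Qdrant?",
    [{"title": "Qdrant", "url": "https://qdrant.tech", "snippet": "A vector database."}],
)
```

Numbering the sources in the prompt is what lets the model cite them, so the UI can link each claim back to its origin.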
1. Enable the "Agent Search" checkbox
2. Choose a strategy:
   - Direct Decomposition: Breaks down your question into targeted sub-queries
   - Informed Decomposition: Gets initial results first, then creates follow-up queries
3. Submit your query to receive a comprehensive answer synthesized from multiple search operations
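In outline, direct decomposition asks the LLM for sub-queries up front, runs a search for each, and synthesizes the combined results. A sketch of that orchestration with the LLM and search steps stubbed out (all names here are hypothetical, not the app's actual interfaces):

```python
def agent_search(query, decompose, search, synthesize):
    # decompose:  question -> list of sub-queries (normally an LLM call)
    # search:     sub-query -> list of result snippets
    # synthesize: (question, results per sub-query) -> final answer (an LLM call)
    sub_queries = decompose(query)
    results = {sq: search(sq) for sq in sub_queries}
    return synthesize(query, results)

answer = agent_search(
    "Compare Flask and Qdrant",
    decompose=lambda q: ["What is Flask?", "What is Qdrant?"],
    search=lambda sq: [f"stub result for {sq!r}"],
    synthesize=lambda q, res: f"{q}: drew on {sum(len(v) for v in res.values())} results",
)
```

Informed decomposition follows the same skeleton but inserts an initial search before `decompose`, so the sub-queries can react to what was found.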
- Log in with the admin credentials (default: username `admin`, password `admin123`)
- Create new collections from the admin dashboard
- Upload documents to collections
- Documents are automatically processed and made available for retrieval
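The "chunk" stage of that processing can be as simple as a sliding window over words, with overlap so context isn't cut mid-thought. A minimal sketch (the app's real chunker and its sizes may differ):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks with some overlap between neighbours."""
    words = text.split()
    step = chunk_size - overlap
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]

# 500 words, 200-word chunks, 50-word overlap -> chunks start at words 0, 150, 300.
chunks = chunk_text(" ".join(str(n) for n in range(500)), chunk_size=200, overlap=50)
```

Each chunk is then embedded and stored in Qdrant alongside metadata pointing back to the source document.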
- Flask: Web framework for the application
- Qdrant: Vector database for storing and retrieving document embeddings
- SearXNG: Self-hosted search engine for web search capabilities
- Agent Framework: Custom implementation for query decomposition and synthesis
- Mind Map Generation: Visualization system for query responses
- Embedding Models:
  - SentenceTransformers: Local embedding models (always available as fallback)
  - OpenAI Embeddings: High-quality embeddings when an API key is configured
  - Ollama Embeddings: Local embedding models when Ollama is configured
- Model Management: Dynamic provider detection and configuration based on available environment variables
- Flask-Login: Admin authentication
- Python Libraries: Document processing (PyPDF2, BeautifulSoup, etc.)
- Docling: Advanced document processing for extracting text from various file formats
```
WebRAgent/
├── app/
│   ├── models/                  # Data models
│   │   ├── chat.py              # Chat session models
│   │   ├── collection.py        # Document collection models
│   │   ├── document.py          # Document models and metadata
│   │   └── user.py              # User authentication models
│   ├── routes/                  # Route handlers
│   │   ├── admin.py             # Admin interface routes
│   │   ├── auth.py              # Authentication routes
│   │   ├── chat.py              # Chat interface routes
│   │   └── main.py              # Main application routes
│   ├── services/                # Business logic
│   │   ├── agent_search_service.py      # Query decomposition and agent search
│   │   ├── chat_service.py              # Chat session management
│   │   ├── claude_service.py            # Anthropic Claude integration
│   │   ├── document_service.py          # Document processing
│   │   ├── llm_service.py               # LLM provider abstraction
│   │   ├── mindmap_service.py           # Mind map generation
│   │   ├── model_service.py             # Dynamic model management
│   │   ├── ollama_service.py            # Ollama integration
│   │   ├── openai_service.py            # OpenAI integration
│   │   ├── qdrant_service.py            # Vector database operations
│   │   ├── rag_service.py               # Core RAG functionality
│   │   ├── searxng_service.py           # Web search integration
│   │   └── web_search_agent_service.py  # Web search with agent capabilities
│   ├── static/                  # CSS, JS, and other static files
│   ├── templates/               # Jinja2 templates
│   └── __init__.py              # Flask application factory
├── data/                        # Created at runtime for data storage
│   ├── collections/             # Collection metadata storage
│   ├── documents/               # Document metadata storage
│   ├── models/                  # Model configuration storage
│   │   ├── config.json          # Dynamic model configuration
│   │   └── dimensions.json      # Embedding model dimensions
│   └── uploads/                 # Uploaded document files
├── searxng/                     # SearXNG configuration
├── .dockerignore                # Files to exclude from Docker build
├── .env                         # Environment variables
├── .env.example                 # Example environment file
├── .gitignore                   # Git ignore patterns
├── docker-compose.yml           # Docker Compose config
├── docker-compose.gpu.yml       # Docker Compose config with GPU support
├── Dockerfile                   # Docker build instructions
├── requirements.txt             # Project dependencies
├── README.md                    # Project documentation
└── run.py                       # Application entry point
```
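The application factory in `app/__init__.py` follows Flask's standard factory pattern. A stripped-down sketch of the idea (the placeholder route stands in for the project's real blueprints from `app/routes/`):

```python
from flask import Flask

def create_app(config=None):
    """Build and configure the Flask application."""
    app = Flask(__name__)
    app.config["SECRET_KEY"] = "change_me_in_production"
    if config:
        app.config.update(config)

    # In WebRAgent, the main/auth/admin/chat blueprints would be
    # registered here; a health-check route stands in for them.
    @app.route("/health")
    def health():
        return {"status": "ok"}

    return app

app = create_app({"TESTING": True})
```

The factory pattern is what lets `run.py` and the test suite each build their own app instance with different configuration.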
Warning: this application uses a simple in-memory user store for demo purposes.

- In a production environment, use a proper database with password hashing
- Configure HTTPS for secure communication
- Set a strong, unique `FLASK_SECRET_KEY`
- Do not expose admin routes to the public internet without proper security
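For the strong `FLASK_SECRET_KEY` recommended above, Python's standard `secrets` module is the usual tool:

```python
import secrets

# 32 random bytes rendered as 64 hex characters; paste the value into .env
key = secrets.token_hex(32)
print(key)
```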
MIT