# WebRAgent

A Retrieval-Augmented Generation (RAG) web application built with Flask and Qdrant.
© 2024 Dennis Kruyt. All rights reserved.
WebRAgent is a powerful Retrieval-Augmented Generation system that merges Large Language Models (LLMs) with a vector database (Qdrant) to provide contextually rich answers to user queries. By offering various search modes, including Collection Search for internal documents, Web Search via SearXNG, and a more comprehensive Deep Web Search, WebRAgent ensures you can find the information you need quickly and thoroughly. For more complex questions, WebRAgent's Agent Search functionality breaks down queries into sub-problems and compiles a holistic answer. You can also visualize the relationships between concepts using the built-in Mind Map generator.
If you prefer to keep your LLM-powered workflows completely private and self-contained, you can integrate Ollama into WebRAgent. Ollama runs entirely on your local machine.
This application implements a RAG system that combines the power of Large Language Models (LLMs) with a vector database (Qdrant) to provide context-enhanced responses to user queries. It features:
- 💬 User query interface for asking questions
- 📚 Admin interface for managing document collections
- 📄 Document processing and embedding
- 🤖 Integration with multiple LLM providers (OpenAI, Claude, Ollama)
## Search Modes

### Collection Search

Search within your document collections for relevant information. Simply select a specific collection from the dropdown menu to limit queries to that collection's contents.
### Web Search

Search the internet for information using SearXNG. This option fetches search results from various search engines and synthesizes them with LLMs for a comprehensive answer.
### Deep Web Search

An enhanced web search that scrapes and processes the full content of web pages to extract more detailed information (a sketch of the pipeline follows the list below). This option:
- Retrieves search results from the web
- Scrapes the full content of each page
- Analyzes the content to extract relevant information
- Takes longer to process but provides more comprehensive results
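
To make the pipeline concrete, here is a minimal sketch of such a pass. This is illustrative only, not WebRAgent's actual implementation; it assumes a SearXNG instance with its JSON API (`format=json`) enabled.

```python
# Illustrative deep web search pass (not the app's actual code).
# Assumes a SearXNG instance at SEARXNG_URL with format=json enabled.
import requests
from bs4 import BeautifulSoup

SEARXNG_URL = "http://localhost:8080"

def deep_web_search(query, max_results=3):
    # 1. Retrieve search results from SearXNG's JSON API
    resp = requests.get(f"{SEARXNG_URL}/search",
                        params={"q": query, "format": "json"}, timeout=10)
    hits = resp.json().get("results", [])[:max_results]

    pages = []
    for hit in hits:
        try:
            # 2. Scrape the full content of each result page
            html = requests.get(hit["url"], timeout=10).text
            # 3. Strip markup so only readable text remains for analysis
            text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
            pages.append({"url": hit["url"], "text": text[:5000]})
        except requests.RequestException:
            continue  # skip pages that fail to load
    # 4. The collected text is then handed to an LLM for synthesis
    return pages
```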
### Agent Search

Enhances the search process by breaking down complex questions into smaller, more focused sub-queries:
- Analyzes your question to identify key components
- Creates targeted sub-queries for each component
- Processes each sub-query separately
- Synthesizes a comprehensive answer from all results
- Particularly useful for multi-part questions
Agent Search offers two strategies (both sketched below):

- Direct Decomposition: Immediately breaks your query down into sub-queries before searching
- Informed Decomposition: First performs a preliminary search, then creates targeted follow-up queries based on initial findings
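
To illustrate the difference between the two strategies, here is a rough sketch. The `ask_llm()` and `search()` helpers are hypothetical stand-ins for the configured LLM provider and search backend, with `ask_llm()` assumed to return a list of strings when asked to decompose a question.

```python
# Illustrative only: ask_llm() and search() are hypothetical stand-ins
# for the configured LLM provider and the chosen search backend.

def direct_decomposition(query, ask_llm, search):
    # Break the query into sub-queries immediately, then search each one.
    sub_queries = ask_llm(f"Split into focused sub-questions: {query}")
    results = [search(sq) for sq in sub_queries]
    return ask_llm(f"Synthesize an answer to '{query}' from: {results}")

def informed_decomposition(query, ask_llm, search):
    # Run a preliminary search first; its findings shape the follow-ups.
    preliminary = search(query)
    follow_ups = ask_llm(
        f"Given these initial findings: {preliminary}, "
        f"write targeted follow-up queries for: {query}")
    results = [search(fq) for fq in follow_ups]
    return ask_llm(f"Synthesize an answer to '{query}' from: {results}")
```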
### Mind Map Generation

Automatically creates a visual mind map representing the answer, helping you understand the relationships between concepts at a glance.
### Number of Results

Controls how many source documents or web pages will be used to generate the answer. Increasing this number can provide a more thorough overview but may increase processing time.
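
In vector-search terms this setting is a top-k limit. A minimal sketch using the qdrant-client API; the host, collection, and function names are placeholders rather than WebRAgent's actual code:

```python
# Sketch: the "Number of Results" knob maps to a top-k limit on the
# vector search (names are illustrative).
from qdrant_client import QdrantClient

client = QdrantClient(host="localhost", port=6333)

def retrieve(query_vector, collection="my_docs", num_results=5):
    return client.search(
        collection_name=collection,
        query_vector=query_vector,
        limit=num_results,  # the user-facing "Number of Results"
    )
```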
## Features

- 🖥️ User Interface: A clean, intuitive interface to submit queries and receive LLM responses
- 🔍 Vector Search: Retrieve relevant document snippets based on semantic similarity
- 🔐 Admin Interface: Securely manage collections and upload documents
- 📄 Document Processing: Automatically extract text, chunk, embed, and store documents
- 🧠 Multiple LLM Support: Configure your preferred LLM provider (OpenAI, Claude, Ollama)
- 🔄 Dynamic Embedding Models: Automatically detects and uses available embedding models from all configured providers
## Installation

### Prerequisites

- 🐍 Python 3.8+
- 🗄️ Qdrant running locally or remotely
- 🔑 API keys for your chosen LLM provider
### Option 1: Local Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/dkruyt/WebRAgent.git
   cd WebRAgent
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Copy the example environment file and configure it:

   ```bash
   cp .env.example .env
   ```

   Then edit the `.env` file with your preferred settings. For example:

   ```bash
   # API Keys for LLM Providers (uncomment and add your keys for the providers you want to use)
   # At least one provider should be configured
   #OPENAI_API_KEY=your_openai_api_key_here
   #CLAUDE_API_KEY=your_claude_api_key_here

   # Ollama Configuration (uncomment to use Ollama)
   #OLLAMA_HOST=http://localhost:11434

   # Qdrant Configuration
   QDRANT_HOST=localhost
   QDRANT_PORT=6333

   # SearXNG Configuration
   SEARXNG_URL=http://searxng:8080

   # Flask Secret Key (generate a secure random key for production)
   FLASK_SECRET_KEY=change_me_in_production

   # Admin User Configuration
   ADMIN_USERNAME=admin
   ADMIN_PASSWORD=change_me_in_production
   ```

   The system will automatically detect and use models from the providers you've configured. For example:
   - If `OPENAI_API_KEY` is set, it will use OpenAI models for both LLM and embeddings.
   - If `CLAUDE_API_KEY` is set, it will use Claude models for LLM.
   - If `OLLAMA_HOST` is set, it will use Ollama models for both LLM and embeddings.
   - Sentence Transformers will be used as a fallback embedding model.

   (A sketch of this detection logic follows the installation steps.)
5. Ensure Qdrant is running locally or specify a remote instance in the `.env` file.

6. If using Ollama, make sure it's running locally or specify the remote instance in the `.env` file.

7. Start the application:

   ```bash
   python run.py
   ```

8. Access the application at `http://localhost:5000`.
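
The provider detection described in step 4 can be pictured with a short sketch. This is illustrative only; the actual `model_service` logic may differ:

```python
# Illustrative sketch of environment-driven provider detection.
import os

def detect_providers():
    providers = {"llm": [], "embeddings": []}
    if os.getenv("OPENAI_API_KEY"):
        providers["llm"].append("openai")
        providers["embeddings"].append("openai")
    if os.getenv("CLAUDE_API_KEY"):
        providers["llm"].append("claude")
    if os.getenv("OLLAMA_HOST"):
        providers["llm"].append("ollama")
        providers["embeddings"].append("ollama")
    # Sentence Transformers is always available as a local fallback
    providers["embeddings"].append("sentence-transformers")
    return providers
```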
### Option 2: Docker Deployment

1. Clone the repository:

   ```bash
   git clone https://github.com/dkruyt/WebRAgent.git
   cd WebRAgent
   ```

2. Start the application with Docker Compose:

   ```bash
   docker-compose up -d
   ```

3. The following services will be available:
   - 🌐 RAG Web Application: `http://localhost:5000`
   - 📊 Qdrant Dashboard: `http://localhost:6333/dashboard`
   - 🔎 SearXNG Search Engine: `http://localhost:8080`

4. To shut down the application:

   ```bash
   docker-compose down
   ```
If you want to pre-download Ollama models before starting the application:

```bash
# For main LLM models
ollama pull llama2
ollama pull mistral
ollama pull gemma

# For embedding models
ollama pull nomic-embed-text
ollama pull all-minilm
```

The system will automatically detect these models if they're available in your Ollama installation.
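
Availability can also be checked programmatically through Ollama's `/api/tags` endpoint; a small sketch (error handling elided):

```python
# List locally available Ollama models via the /api/tags endpoint.
import os
import requests

def list_ollama_models():
    host = os.getenv("OLLAMA_HOST", "http://localhost:11434")
    resp = requests.get(f"{host}/api/tags", timeout=5)
    return [m["name"] for m in resp.json().get("models", [])]

# e.g. ['llama2:latest', 'mistral:latest', 'nomic-embed-text:latest']
```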
## Usage

1. Navigate to the home page (`http://localhost:5000`).
2. Choose your search method:
   - Collection Search: Select a collection from the dropdown menu
   - Web Search: Toggle the "Web Search" option
   - Deep Web Search: Toggle "Deep Web Search" if you need to scrape and analyze full page contents
3. Enter your query in the text box.
4. Configure additional options (optional):
   - Generate Mind Map: Visualize concepts related to your query
   - Agent Search: Enable for complex queries; pick a strategy (Direct or Informed Decomposition)
   - Number of Results: Adjust how many results to retrieve
5. Submit your query and view the response.
6. Explore source documents or web sources that informed the answer.
### Web Search

1. Toggle the "Web Search" or "Deep Web Search" option on the main interface.
2. Enter your query.
3. The system will:
   - Search the web using SearXNG.
   - Optionally scrape and analyze page content (Deep Web Search).
   - Use an LLM to interpret and synthesize the findings.
   - Present a comprehensive answer along with source links.
### Agent Search

1. Enable the "Agent Search" checkbox.
2. Choose a strategy:
   - Direct Decomposition: Breaks down your question into sub-queries immediately.
   - Informed Decomposition: Performs a preliminary search, then refines sub-queries based on initial results.
3. Submit your query to receive a comprehensive answer assembled from multiple targeted searches.
### Admin Interface

1. Log in with admin credentials (specified in your `.env` file).
2. Create new collections from the admin dashboard.
3. Upload documents to collections.
4. Documents are automatically processed and made available for retrieval in user queries.
## Technologies

- 🌐 Flask: Web framework for the application
- 🗄️ Qdrant: Vector database for storing and retrieving document embeddings
- 🔎 SearXNG: Self-hosted search engine for web search capabilities
- 🤖 Agent Framework: Custom implementation for query decomposition and result synthesis
- 🧠 Mind Map Generation: Visualization of query responses and related concepts
- 📊 Embedding Models:
  - SentenceTransformers: Local embedding models (fallback)
  - OpenAI Embeddings: High-quality embeddings when API key is set
  - Ollama Embeddings: Local embeddings when Ollama is configured
- 🔄 Model Management: Dynamic provider detection and configuration based on environment variables
- 🔑 Flask-Login: For admin authentication
- 🐍 Python Libraries: For document processing (PyPDF2, BeautifulSoup, etc.)
- 📄 Docling: Advanced document processing for text extraction in various file formats
## Workflow Diagrams

Below are detailed flowcharts of WebRAgent's key workflows and components.
The following diagram shows the high-level architecture of WebRAgent, illustrating how all components interact with each other and external systems:
This workflow shows how documents are uploaded, processed, chunked, and stored in both MongoDB and the Qdrant vector database:
This diagram illustrates how text is extracted from various document formats and chunked for optimal retrieval:
This shows how document chunks are embedded and stored in the vector database:
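
As a rough illustration of this stage, here is a minimal sketch using the sentence-transformers and qdrant-client libraries; the collection name, model choice, and vector size are placeholders rather than WebRAgent's exact configuration:

```python
# Illustrative embed-and-store stage (names and sizes are placeholders).
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim fallback model
client = QdrantClient(host="localhost", port=6333)

def store_chunks(chunks, collection="my_docs"):
    if not client.collection_exists(collection):
        client.create_collection(
            collection_name=collection,
            vectors_config=VectorParams(size=384, distance=Distance.COSINE),
        )
    vectors = model.encode(chunks)  # one vector per text chunk
    client.upsert(
        collection_name=collection,
        points=[
            PointStruct(id=i, vector=vec.tolist(), payload={"text": txt})
            for i, (vec, txt) in enumerate(zip(vectors, chunks))
        ],
    )
```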
This diagram details how user queries are processed in the standard RAG workflow:
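
Continuing the sketch above, the standard query path embeds the query, retrieves the top-k chunks, and prompts the LLM with them as context; `ask_llm()` is again a hypothetical stand-in for whichever provider is configured:

```python
# Illustrative standard RAG query path (ask_llm() is hypothetical).
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(host="localhost", port=6333)

def answer_query(query, ask_llm, collection="my_docs", num_results=5):
    query_vector = model.encode(query).tolist()
    hits = client.search(collection_name=collection,
                         query_vector=query_vector, limit=num_results)
    context = "\n\n".join(hit.payload["text"] for hit in hits)
    prompt = ("Answer the question using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {query}")
    return ask_llm(prompt)
```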
This shows how complex queries are decomposed and processed by the agent search feature:
This diagram explains how web searches are processed:
This illustrates how chat sessions are managed and processed:
## Project Structure

```
WebRAgent/
├── app/
│   ├── models/                    # Data models
│   │   ├── chat.py                # Chat session models
│   │   ├── collection.py          # Document collection models
│   │   ├── document.py            # Document models and metadata
│   │   └── user.py                # User authentication models
│   ├── routes/                    # Route handlers
│   │   ├── admin.py               # Admin interface routes
│   │   ├── auth.py                # Authentication routes
│   │   ├── chat.py                # Chat interface routes
│   │   └── main.py                # Main application routes
│   ├── services/                  # Business logic
│   │   ├── agent_search_service.py      # Query decomposition and agent search
│   │   ├── chat_service.py              # Chat session management
│   │   ├── claude_service.py            # Anthropic Claude integration
│   │   ├── document_service.py          # Document processing
│   │   ├── llm_service.py               # LLM provider abstraction
│   │   ├── mindmap_service.py           # Mind map generation
│   │   ├── model_service.py             # Dynamic model management
│   │   ├── ollama_service.py            # Ollama integration
│   │   ├── openai_service.py            # OpenAI integration
│   │   ├── qdrant_service.py            # Vector database operations
│   │   ├── rag_service.py               # Core RAG functionality
│   │   ├── searxng_service.py           # Web search integration
│   │   └── web_search_agent_service.py  # Web search with agent capabilities
│   ├── static/                    # CSS, JS, and other static files
│   ├── templates/                 # Jinja2 templates
│   └── __init__.py                # Flask application factory
├── data/                          # Created at runtime for data storage
│   ├── collections/               # Collection metadata storage
│   ├── documents/                 # Document metadata storage
│   ├── models/                    # Model configuration storage
│   │   ├── config.json            # Dynamic model configuration
│   │   └── dimensions.json        # Embedding model dimensions
│   └── uploads/                   # Uploaded document files
├── searxng/                       # SearXNG configuration
├── .dockerignore                  # Files to exclude from Docker build
├── .env                           # Environment variables
├── .env.example                   # Example environment file
├── .gitignore                     # Git ignore patterns
├── docker-compose.yml             # Docker Compose config
├── docker-compose.gpu.yml         # Docker Compose config with GPU support
├── Dockerfile                     # Docker build instructions
├── requirements.txt               # Project dependencies
├── README.md                      # Project documentation
└── run.py                         # Application entry point
```
## Security Considerations

- 🛡️ In a production environment, use a proper database with password hashing
- 🔒 Configure HTTPS for secure communication
- 🔑 Set a strong, unique `FLASK_SECRET_KEY`
- 🚫 Do not expose admin routes to the public internet without proper security measures
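
A strong secret key can be generated with Python's standard library:

```python
# Generate a value for FLASK_SECRET_KEY (run once, paste into .env).
import secrets
print(secrets.token_hex(32))
```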
## License

MIT

© 2024 Dennis Kruyt. All rights reserved.