
WebRAgent is a retrieval-augmented generation (RAG) web application featuring agent-based query decomposition, vector search with Qdrant, and integration with leading LLM providers for context-rich, dynamic responses.


πŸ” WebRAgent

A Retrieval-Augmented Generation (RAG) web application built with Flask and Qdrant.

Β© 2024 Dennis Kruyt. All rights reserved.

Introduction

WebRAgent is a powerful Retrieval-Augmented Generation system that merges Large Language Models (LLMs) with a vector database (Qdrant) to provide contextually rich answers to user queries. By offering various search modesβ€”including Collection Search for internal documents, Web Search via SearXNG, and a more comprehensive Deep Web Searchβ€”WebRAgent ensures you can find the information you need quickly and thoroughly. For more complex questions, WebRAgent’s Agent Search functionality breaks down queries into sub-problems and compiles a holistic answer. You can also visualize the relationships between concepts using the built-in Mind Map generator.

If you prefer to keep your LLM-powered workflows completely private and self-contained, you can integrate Ollama into WebRAgent. Ollama runs entirely on your local machine.

πŸ“· Screenshots

Screenshots of the Search, Context, Collections, and Upload interfaces.

πŸ“‹ Overview

This application implements a RAG system that combines the power of Large Language Models (LLMs) with a vector database (Qdrant) to provide context-enhanced responses to user queries. It features:

  • πŸ’¬ User query interface for asking questions
  • πŸ” Admin interface for managing document collections
  • πŸ“„ Document processing and embedding
  • πŸ€– Integration with multiple LLM providers (OpenAI, Claude, Ollama)

✨ Features

Collection Search

Search within your document collections for relevant information. Simply select a specific collection from the dropdown menu to limit queries to that collection’s contents.

Web Search

Search the internet for information using SearXNG. This option fetches search results from various search engines and synthesizes them with LLMs for a comprehensive answer.

Deep Web Search

An enhanced web search that scrapes and processes the full content of web pages to extract more detailed information. This option:

  • Retrieves search results from the web
  • Scrapes the full content of each page
  • Analyzes the content to extract relevant information
  • Takes longer to process but provides more comprehensive results
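The scraping step above can be sketched with Python's standard-library HTML parser. This is an illustrative simplification, not WebRAgent's actual implementation (which relies on BeautifulSoup and related libraries, as noted in the Technical Implementation section):

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Strip tags, scripts, and styles, keeping only the visible text."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def page_text(html: str) -> str:
    """Return the visible text of a scraped page as one string."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)
```

The extracted text is what then gets passed to the LLM for analysis.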

Agent Search

Enhances the search process by breaking down complex questions into smaller, more focused sub-queries:

  • Analyzes your question to identify key components
  • Creates targeted sub-queries for each component
  • Processes each sub-query separately
  • Synthesizes a comprehensive answer from all results
  • Particularly useful for multi-part questions

Agent Strategies

  • Direct Decomposition: Immediately breaks your query down into sub-queries before searching
  • Informed Decomposition: First performs a preliminary search, then creates targeted follow-up queries based on initial findings
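The two strategies can be sketched as follows. The `llm(prompt) -> str` and `search(query) -> list` callables, and the prompt wording, are assumptions for illustration, not WebRAgent's real interfaces:

```python
def agent_search(query, llm, search, strategy="direct"):
    """Sketch of agent search: decompose, run sub-queries, synthesize.

    llm(prompt) -> str and search(q) -> list are pluggable callables;
    names and prompts here are illustrative only.
    """
    decompose = f"Break this question into focused sub-queries, one per line:\n{query}"
    if strategy == "informed":
        # Informed Decomposition: look at preliminary results first
        preliminary = search(query)
        decompose = (
            f"Initial findings:\n{preliminary}\n\n"
            f"Based on these, break the question into focused "
            f"sub-queries, one per line:\n{query}"
        )
    sub_queries = [line.strip() for line in llm(decompose).splitlines() if line.strip()]
    # Process each sub-query separately, then synthesize one answer
    gathered = {q: search(q) for q in sub_queries}
    return llm(f"Using these results:\n{gathered}\n\nAnswer the question: {query}")
```

With `strategy="direct"` the preliminary search is skipped and decomposition happens immediately.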

Generate Mind Map

Automatically creates a visual mind map representing the answer, helping you understand the relationships between concepts at a glance.

Number of Results

Controls how many source documents or web pages will be used to generate the answer. Increasing this number can provide a more thorough overview but may increase processing time.

Additional Highlights

  • πŸ–₯️ User Interface: A clean, intuitive interface to submit queries and receive LLM responses
  • πŸ”Ž Vector Search: Retrieve relevant document snippets based on semantic similarity
  • πŸ‘€ Admin Interface: Securely manage collections and upload documents
  • πŸ“ Document Processing: Automatically extract text, chunk, embed, and store documents
  • 🧠 Multiple LLM Support: Configure your preferred LLM provider (OpenAI, Claude, Ollama)
  • πŸ” Dynamic Embedding Models: Automatically detects and uses available embedding models from all configured providers

πŸ“‹ Prerequisites

  • 🐍 Python 3.8+
  • πŸ—„οΈ Qdrant running locally or remotely
  • πŸ”‘ API keys for your chosen LLM provider

πŸš€ Installation

πŸ’» Option 1: Local Installation

  1. Clone the repository:

    git clone https://github.com/dkruyt/WebRAgent.git
    cd WebRAgent
  2. Create and activate a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Copy the example environment file and configure it:

    cp .env.example .env

    Then edit the .env file with your preferred settings. For example:

    # API Keys for LLM Providers (uncomment and add your keys for the providers you want to use)
    # At least one provider should be configured
    #OPENAI_API_KEY=your_openai_api_key_here
    #CLAUDE_API_KEY=your_claude_api_key_here
    
    # Ollama Configuration (uncomment to use Ollama)
    #OLLAMA_HOST=http://localhost:11434
    
    # Qdrant Configuration
    QDRANT_HOST=localhost
    QDRANT_PORT=6333
    
    # SearXNG Configuration
    SEARXNG_URL=http://searxng:8080
    
    # Flask Secret Key (generate a secure random key for production)
    FLASK_SECRET_KEY=change_me_in_production
    
    # Admin User Configuration
    ADMIN_USERNAME=admin
    ADMIN_PASSWORD=change_me_in_production
    

    The system will automatically detect and use models from the providers you've configured. For example:

    • If OPENAI_API_KEY is set, it will use OpenAI models for both LLM and embeddings.
    • If CLAUDE_API_KEY is set, it will use Claude models for LLM.
    • If OLLAMA_HOST is set, it will use Ollama models for both LLM and embeddings.
    • Sentence Transformers will be used as a fallback embedding model.
  5. Ensure Qdrant is running locally or specify a remote instance in the .env file.

  6. If using Ollama, make sure it’s running locally or specify the remote instance in the .env file.

  7. Start the application:

    python run.py
  8. Access the application at http://localhost:5000.

🐳 Option 2: Docker Installation

  1. Clone the repository:

    git clone https://github.com/dkruyt/WebRAgent.git
    cd WebRAgent
  2. Start the application with Docker Compose:

    docker-compose up -d
  3. The following services will be available:

    • 🌐 RAG Web Application: http://localhost:5000
    • πŸ“Š Qdrant Dashboard: http://localhost:6333/dashboard
    • πŸ” SearXNG Search Engine: http://localhost:8080
  4. To shut down the application:

    docker-compose down

πŸ“₯ Pre-downloading Ollama Models

If you want to pre-download Ollama models before starting the application:

# For main LLM models
ollama pull llama2
ollama pull mistral
ollama pull gemma

# For embedding models
ollama pull nomic-embed-text
ollama pull all-minilm

The system will automatically detect these models if they're available in your Ollama installation.

πŸ“– Usage

πŸ” User Interface

  1. Navigate to the home page (http://localhost:5000).
  2. Choose your search method:
    • Collection Search: Select a collection from the dropdown menu
    • Web Search: Toggle the β€œWeb Search” option
    • Deep Web Search: Toggle β€œDeep Web Search” if you need to scrape and analyze full page contents
  3. Enter your query in the text box.
  4. Configure additional options (optional):
    • Generate Mind Map: Visualize concepts related to your query
    • Agent Search: Enable for complex queries; pick a strategy (Direct or Informed Decomposition)
    • Number of Results: Adjust how many results to retrieve
  5. Submit your query and view the response.
  6. Explore source documents or web sources that informed the answer.

🌐 Web Search

  1. Toggle the β€œWeb Search” or β€œDeep Web Search” option on the main interface.
  2. Enter your query.
  3. The system will:
    • Search the web using SearXNG.
    • Optionally scrape and analyze page content (Deep Web Search).
    • Use an LLM to interpret and synthesize the findings.
    • Present a comprehensive answer along with source links.

πŸ€– Agent Search

  1. Enable the β€œAgent Search” checkbox.
  2. Choose a strategy:
    • Direct Decomposition: Breaks down your question into sub-queries immediately.
    • Informed Decomposition: Performs a preliminary search, then refines sub-queries based on initial results.
  3. Submit your query to receive a comprehensive answer assembled from multiple targeted searches.

πŸ‘€ Admin Interface

  1. Login with admin credentials (specified in your .env file).
  2. Create new collections from the admin dashboard.
  3. Upload documents to collections.
  4. Documents are automatically processed and made available for retrieval in user queries.

πŸ› οΈ Technical Implementation

  • 🌐 Flask: Web framework for the application
  • πŸ—„οΈ Qdrant: Vector database for storing and retrieving document embeddings
  • πŸ” SearXNG: Self-hosted search engine for web search capabilities
  • πŸ€– Agent Framework: Custom implementation for query decomposition and result synthesis
  • 🧠 Mind Map Generation: Visualization of query responses and related concepts
  • πŸ”€ Embedding Models:
    • SentenceTransformers: Local embedding models (fallback)
    • OpenAI Embeddings: High-quality embeddings when API key is set
    • Ollama Embeddings: Local embeddings when Ollama is configured
  • πŸ”Œ Model Management: Dynamic provider detection and configuration based on environment variables
  • πŸ” Flask-Login: For admin authentication
  • πŸ“š Python Libraries: For document processing (PyPDF2, BeautifulSoup, etc.)
  • πŸ“„ Docling: Advanced document processing for text extraction in various file formats

πŸ“Š System Architecture

Below are detailed flowcharts of WebRAgent's key workflows and components.

System Overview

The following diagram shows the high-level architecture of WebRAgent, illustrating how all components interact with each other and external systems:

WebRAgent System Overview

Document Upload and Processing

This workflow shows how documents are uploaded, processed, chunked, and stored in both MongoDB and the Qdrant vector database:

Document Upload Workflow

Text Extraction and Chunking

This diagram illustrates how text is extracted from various document formats and chunked for optimal retrieval:

Text Extraction Workflow
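The chunking step can be illustrated with a simple fixed-size splitter with overlap, so that context spanning a chunk boundary is not lost. The sizes and strategy here are illustrative defaults; WebRAgent's document processing may chunk differently:

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping fixed-size chunks for embedding.
    chunk_size and overlap are illustrative, not WebRAgent's values."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```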

Vector Embedding and Storage

This shows how document chunks are embedded and stored in the vector database:

Vector Embedding Workflow
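A toy in-memory stand-in shows what this step provides; Qdrant does the same thing at scale with persistence and approximate nearest-neighbour indexing, and this class only loosely mirrors its upsert/search operations:

```python
import math


class ToyVectorStore:
    """Minimal in-memory stand-in for a vector database, for
    illustration only."""

    def __init__(self):
        self.points = {}  # id -> (vector, payload)

    def upsert(self, point_id, vector, payload):
        self.points[point_id] = (vector, payload)

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def search(self, query_vector, limit=3):
        """Return the ids and payloads of the most similar points."""
        scored = sorted(
            self.points.items(),
            key=lambda item: self._cosine(query_vector, item[1][0]),
            reverse=True,
        )
        return [(pid, payload) for pid, (vec, payload) in scored[:limit]]
```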

RAG Query Processing

This diagram details how user queries are processed in the standard RAG workflow:

RAG Query Workflow
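The heart of the standard RAG flow is assembling the retrieved snippets into a grounded prompt for the LLM. A hypothetical sketch of that step (the actual prompt template in WebRAgent will differ):

```python
def build_rag_prompt(query, snippets):
    """Combine retrieved snippets into a context-grounded prompt.
    The template wording is illustrative, not WebRAgent's actual prompt."""
    context = "\n\n".join(f"[{i}] {s}" for i, s in enumerate(snippets, start=1))
    return (
        "Answer the question using only the context below, citing "
        "sources by their [number].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

The numbered snippets let the answer cite the source documents that informed it.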

Agent Search Workflow

This shows how complex queries are decomposed and processed by the agent search feature:

Agent Search Workflow

Web Search Workflow

This diagram explains how web searches are processed:

Web Search Workflow
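SearXNG exposes a JSON search API on its `/search` endpoint; a small helper can build the query parameters for it. The parameter names follow SearXNG's search API, though the comma-separated `engines` format is an assumption here:

```python
def searxng_params(query, page=1, engines=None):
    """Build query parameters for SearXNG's JSON API
    (GET {SEARXNG_URL}/search). Illustrative sketch."""
    params = {"q": query, "format": "json", "pageno": page}
    if engines:
        # assumed comma-separated engine list, e.g. "duckduckgo,wikipedia"
        params["engines"] = ",".join(engines)
    return params
```

These parameters would then be passed to an HTTP GET against the `SEARXNG_URL` configured in `.env`.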

Chat Workflow

This illustrates how chat sessions are managed and processed:

Chat Workflow

πŸ“‚ Project Structure

WebRAgent/
β”œβ”€β”€ app/
β”‚   β”œβ”€β”€ models/             # Data models
β”‚   β”‚   β”œβ”€β”€ chat.py         # Chat session models
β”‚   β”‚   β”œβ”€β”€ collection.py   # Document collection models
β”‚   β”‚   β”œβ”€β”€ document.py     # Document models and metadata
β”‚   β”‚   └── user.py         # User authentication models
β”‚   β”œβ”€β”€ routes/             # Route handlers
β”‚   β”‚   β”œβ”€β”€ admin.py        # Admin interface routes
β”‚   β”‚   β”œβ”€β”€ auth.py         # Authentication routes
β”‚   β”‚   β”œβ”€β”€ chat.py         # Chat interface routes
β”‚   β”‚   └── main.py         # Main application routes
β”‚   β”œβ”€β”€ services/           # Business logic
β”‚   β”‚   β”œβ”€β”€ agent_search_service.py     # Query decomposition and agent search
β”‚   β”‚   β”œβ”€β”€ chat_service.py             # Chat session management
β”‚   β”‚   β”œβ”€β”€ claude_service.py           # Anthropic Claude integration
β”‚   β”‚   β”œβ”€β”€ document_service.py         # Document processing
β”‚   β”‚   β”œβ”€β”€ llm_service.py              # LLM provider abstraction
β”‚   β”‚   β”œβ”€β”€ mindmap_service.py          # Mind map generation
β”‚   β”‚   β”œβ”€β”€ model_service.py            # Dynamic model management
β”‚   β”‚   β”œβ”€β”€ ollama_service.py           # Ollama integration
β”‚   β”‚   β”œβ”€β”€ openai_service.py           # OpenAI integration
β”‚   β”‚   β”œβ”€β”€ qdrant_service.py           # Vector database operations
β”‚   β”‚   β”œβ”€β”€ rag_service.py              # Core RAG functionality
β”‚   β”‚   β”œβ”€β”€ searxng_service.py          # Web search integration
β”‚   β”‚   └── web_search_agent_service.py # Web search with agent capabilities
β”‚   β”œβ”€β”€ static/             # CSS, JS, and other static files
β”‚   β”œβ”€β”€ templates/          # Jinja2 templates
β”‚   └── __init__.py         # Flask application factory
β”œβ”€β”€ data/                   # Created at runtime for data storage
β”‚   β”œβ”€β”€ collections/        # Collection metadata storage
β”‚   β”œβ”€β”€ documents/          # Document metadata storage
β”‚   β”œβ”€β”€ models/             # Model configuration storage
β”‚   β”‚   β”œβ”€β”€ config.json     # Dynamic model configuration
β”‚   β”‚   └── dimensions.json # Embedding model dimensions
β”‚   └── uploads/            # Uploaded document files
β”œβ”€β”€ searxng/                # SearXNG configuration
β”œβ”€β”€ .dockerignore           # Files to exclude from Docker build
β”œβ”€β”€ .env                    # Environment variables
β”œβ”€β”€ .env.example            # Example environment file
β”œβ”€β”€ .gitignore              # Git ignore patterns
β”œβ”€β”€ docker-compose.yml      # Docker Compose config
β”œβ”€β”€ docker-compose.gpu.yml  # Docker Compose config with GPU support
β”œβ”€β”€ Dockerfile              # Docker build instructions
β”œβ”€β”€ requirements.txt        # Project dependencies
β”œβ”€β”€ README.md               # Project documentation
└── run.py                  # Application entry point

πŸ”’ Security Notes

  • πŸ›‘οΈ In a production environment, use a proper database with password hashing
  • πŸ” Configure HTTPS for secure communication
  • πŸ”‘ Set a strong, unique FLASK_SECRET_KEY
  • 🚫 Do not expose admin routes to the public internet without proper security measures
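For the first point, the standard library's PBKDF2 is enough to avoid storing plaintext passwords; Flask projects often use `werkzeug.security`'s equivalent helpers instead. A minimal sketch:

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=600_000):
    """Derive a salted hash; store (salt, digest, iterations),
    never the plaintext password."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest


def verify_password(password, salt, digest, iterations=600_000):
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    # constant-time comparison avoids timing side channels
    return hmac.compare_digest(candidate, digest)
```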

πŸ“œ License

MIT

πŸ“ Copyright

Β© 2024 Dennis Kruyt. All rights reserved.

πŸ™ Acknowledgements
