Document Chat & Summary Application

A RAG-based application that allows you to upload documents, chat with them using natural language, and generate intelligent summaries.

🚀 Quick Start

Prerequisites

  • Python 3.9+ (recommended: Python 3.11)

Option 1: One-Command Setup (Recommended)

  1. Clone the repository:

    git clone <your-repo-url>
    cd ChatWithDocs
  2. Make the run script executable:

    chmod +x run.sh
  3. Run the application:

    ./run.sh

That's it! The script will:

  • ✅ Create a virtual environment
  • ✅ Install all dependencies
  • ✅ Start the backend API server
  • ✅ Launch the Streamlit frontend
  • ✅ Open your browser automatically

Option 2: Manual Setup

If you prefer to set up manually or encounter issues with the script:

Step 1: Create Virtual Environment

# Create virtual environment
python -m venv venv

# Activate it
# On macOS/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate

Step 2: Install Dependencies

# Install backend dependencies
cd backend
pip install -r requirements.txt

# Install frontend dependencies
cd ../frontend
pip install -r requirements.txt

Step 3: Start Services

Terminal 1 - Backend:

cd backend
python main.py

Terminal 2 - Frontend:

cd frontend
streamlit run app.py
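
To confirm the backend came up before opening the UI, you can hit its health endpoint from Python (the same endpoint referenced under Troubleshooting; the default port 8000 is assumed):

# health_check.py - quick backend sanity check (assumes the default port 8000)
import requests

resp = requests.get("http://localhost:8000/health", timeout=5)
print(resp.status_code, resp.text)  # expect HTTP 200; the payload shape depends on the backend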

🔧 Configuration

LLM Provider Setup

Choose one of the following options:

Option A: OpenAI (Recommended for best results)

  1. Get an API key from OpenAI
  2. In the Streamlit interface:
    • Select "openai" as provider
    • Choose your model (gpt-4, gpt-3.5-turbo)
    • Enter your API key
    • Click "Configure LLM" (a standalone key check is sketched below)
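
If you want to verify the key outside the app first, here is a minimal check using the official openai Python client (v1.x interface); swap in whichever model you plan to select in the UI:

# key_check.py - verify an OpenAI API key independently of the app
from openai import OpenAI

client = OpenAI(api_key="your_api_key_here")  # or rely on the OPENAI_API_KEY env var
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=5,
)
print(resp.choices[0].message.content)  # any reply means the key and quota are working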

Option B: Local LLM with Ollama (Free, runs offline)

  1. Install Ollama:

    # macOS
    brew install ollama
    
    # Linux
    curl -fsSL https://ollama.ai/install.sh | sh
    
    # Windows: Download from https://ollama.ai
  2. Start Ollama and pull a model:

    # Start Ollama service
    ollama serve
    
    # In another terminal, pull a model
    ollama pull llama3     # or mistral, phi3, codellama
  3. Configure in Streamlit:

    • Select "ollama" as provider
    • Choose your model (llama3, mistral, etc.)
    • Click "Configure LLM" (a quick connectivity check is sketched below)
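
Before configuring it in the app, you can confirm the Ollama service answers by calling its HTTP API directly (it listens on port 11434 by default, matching the OLLAMA_BASE_URL value shown under Environment Variables):

# ollama_check.py - confirm Ollama responds before wiring it into the app
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Say hello", "stream": False},
    timeout=120,  # the first call can be slow while the model loads into memory
)
print(resp.json()["response"])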

📖 How to Use

1. Upload Documents

  • Go to the Upload tab
  • Drag & drop or select files (PDF, DOCX, TXT)
  • Click "Upload and Process"
  • Wait for processing to complete

2. Chat with Documents

  • Go to the Chat tab
  • Select a document from the sidebar
  • Ask questions like:
    • "What is this document about?"
    • "What are the main conclusions?"
    • "Explain the methodology used"
    • "Find information about X"
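
You can also chat with a document programmatically instead of through the UI. The route below is hypothetical; check the backend's interactive docs at http://localhost:8000/docs for the actual endpoint names and payloads:

# chat_example.py - programmatic chat; the /chat route and payload are hypothetical,
# consult http://localhost:8000/docs for the backend's real routes
import requests

payload = {"document_id": "your-document-id", "question": "What is this document about?"}
resp = requests.post("http://localhost:8000/chat", json=payload, timeout=60)
print(resp.json())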

3. Generate Summaries

  • Go to the Summary tab
  • Select document and summary type:
    • General: Main points for general audience
    • Executive: Business-focused insights
    • Technical: Detailed technical summary
    • Bullet Points: Easy-to-scan format
  • Choose length (100-1000 words)
  • Click "Generate Summary"

4. Monitor Analytics

  • Go to the Analytics tab
  • View document statistics
  • Check system health
  • Monitor chat history

🌐 Access Points

Once running, access the application at:

  • Streamlit frontend: http://localhost:8501
  • Backend API: http://localhost:8000
  • Health check: http://localhost:8000/health
  • Interactive API docs (FastAPI's default /docs page): http://localhost:8000/docs

🔍 Troubleshooting

Common Issues

"ModuleNotFoundError" or Import Errors

# Ensure you're in the virtual environment
source venv/bin/activate  # or venv\Scripts\activate on Windows

# Reinstall dependencies
pip install -r backend/requirements.txt
pip install -r frontend/requirements.txt

"Port already in use"

# Kill existing processes
lsof -ti:8000 | xargs kill -9  # Backend
lsof -ti:8501 | xargs kill -9  # Frontend

# Or use different ports
streamlit run app.py --server.port 8502

"Cannot connect to backend"

  1. Check if backend is running: http://localhost:8000/health
  2. Ensure both services are running
  3. Check firewall settings

ChromaDB/Vector Store Issues

# Clear vector database if corrupted
rm -rf backend/chroma_db/
rm backend/app_database.db

# Restart application
./run.sh
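
Before wiping the store, you can check whether it opens at all; if the snippet below raises an exception, clearing the directory as shown above is the right call (the persist path backend/chroma_db is assumed, matching the rm command):

# chroma_check.py - see whether the persisted vector store opens cleanly
import chromadb

client = chromadb.PersistentClient(path="backend/chroma_db")
print(client.heartbeat())         # returns a timestamp if the store opened without errors
print(client.list_collections())  # note: the return type varies across chromadb versions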

LLM Configuration Issues

  • OpenAI: Verify API key is valid and has credits
  • Ollama: Ensure Ollama service is running (ollama serve)
  • Check the configuration in Streamlit sidebar

Debug Mode

Enable debug information in the Streamlit sidebar:

  1. Check "Show Debug Info"
  2. Click "Debug Backend Storage" to see document status
  3. Click "Debug Vector Store" to check embeddings

Performance Tips

  • For large documents: Increase chunk size in settings
  • For better results: Use OpenAI models (gpt-4)
  • For privacy: Use local Ollama models
  • For speed: Use smaller models (gpt-3.5-turbo)

📁 Project Structure

ChatWithDocs/
├── run.sh                 # Main startup script
├── backend/
│   ├── main.py           # FastAPI server
│   ├── database.py       # SQLite database
│   ├── requirements.txt  # Python dependencies
│   ├── services/         # Business logic
│   └── models/           # Data models
├── frontend/
│   ├── app.py            # Streamlit interface
│   └── requirements.txt  # Frontend dependencies
├── chroma_db/            # Vector database (auto-created)
├── uploads/              # Temporary file storage
└── .gitignore           # Git ignore rules

Environment Variables

Create a .env file in the backend directory:

# OpenAI Configuration
OPENAI_API_KEY=your_api_key_here

# Ollama Configuration
OLLAMA_BASE_URL=http://localhost:11434

# Database Configuration
DATABASE_URL=sqlite:///app_database.db

# Vector Store Configuration
CHROMA_PERSIST_DIRECTORY=./chroma_db
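
The backend would typically read these values with python-dotenv; a minimal sketch of that loading pattern (the repo's actual code may differ):

# settings sketch - loading the .env values above with python-dotenv
import os
from dotenv import load_dotenv

load_dotenv()  # picks up backend/.env when run from the backend directory
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
CHROMA_PERSIST_DIRECTORY = os.getenv("CHROMA_PERSIST_DIRECTORY", "./chroma_db")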

Customizing Settings

Edit configuration in the service files:

  • Chunk size: services/document_processor.py (a chunking sketch follows this list)
  • Similarity threshold: services/chat_service.py
  • Model parameters: services/llm_service.py
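
For orientation, fixed-size chunking with overlap usually looks roughly like the sketch below; the actual logic and parameter names in services/document_processor.py may differ:

# chunking sketch - illustrative only, not the repo's actual implementation
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping fixed-size chunks for embedding."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

Larger chunks keep more context per retrieval hit, while the overlap prevents sentences that straddle a boundary from being lost to both chunks.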

🎯 Features

  • Multi-format support: PDF, Word, TXT files
  • Intelligent chat: RAG-based document interaction
  • Smart summaries: Multiple summary styles
  • Dual LLM support: OpenAI API + Local Ollama
  • Vector search: Semantic similarity matching (standalone demo after this list)
  • Conversation memory: Multi-turn chat history
  • Source attribution: See which parts of documents were used
  • Real-time processing: Instant document analysis
  • Privacy options: Local-only processing with Ollama
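
To see the vector-search behavior in isolation, here is a small self-contained demo using chromadb's in-memory client and its default embedding model (downloaded on first use); it does not touch the app's own store:

# vector_search_demo.py - semantic similarity matching in isolation
import chromadb

client = chromadb.EphemeralClient()  # in-memory client; separate from the app's data
col = client.get_or_create_collection("demo")
col.add(
    ids=["1", "2"],
    documents=["The cat sat on the mat.", "Quarterly revenue grew 12% year over year."],
)
# "financial results" matches the revenue sentence despite sharing no keywords with it
print(col.query(query_texts=["financial results"], n_results=1)["documents"])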
