
Cliven πŸ€–

Chat with your PDFs using local AI models!

Cliven is a command-line tool that allows you to process PDF documents and have interactive conversations with their content using local AI models. No data leaves your machine - everything runs locally using ChromaDB for vector storage and Ollama for AI inference.

Features ✨

  • πŸ“„ PDF Processing: Extract and chunk text from PDF documents
  • πŸ” Vector Search: Find relevant content using semantic similarity
  • πŸ€– Local AI Chat: Chat with your documents using Ollama models
  • 🐳 Docker Ready: Easy setup with Docker Compose
  • πŸ’Ύ Local Storage: All data stays on your machine
  • 🎯 Simple CLI: Easy-to-use command-line interface
  • πŸš€ Model Selection: Support for both lightweight (Gemma2) and high-performance (Gemma3) models
  • πŸ“Š Rich UI: Beautiful terminal interface with progress indicators

Quick Start πŸš€

1. Clone the Repository

git clone https://github.com/krey-yon/cliven.git
cd cliven

2. Install Dependencies

pip install -e .

Or install directly from PyPI:

pip install cliven

3. Start Services with Docker

# Start with the lightweight model (gemma2:2b, the default)
cliven docker start

# OR start with the high-performance model (gemma3:4b)
cliven docker start --BP
# or
cliven docker start --better-performance

This will:

  • Start ChromaDB on port 8000
  • Start Ollama on port 11434
  • Pull the gemma2:2b model (default) or the gemma3:4b model (with the --BP flag)

The first start may take several minutes depending on the model size and your connection speed. You can confirm that both services came up with the sketch below.
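A minimal Python health check, assuming the default ports above; the endpoint paths are standard ChromaDB and Ollama routes, not anything Cliven-specific:

import requests

def service_up(url: str) -> bool:
    # True if the endpoint answers with HTTP 200
    try:
        return requests.get(url, timeout=2).status_code == 200
    except requests.RequestException:
        return False

# Default ports used by `cliven docker start`
print("ChromaDB:", service_up("http://localhost:8000/api/v1/heartbeat"))
print("Ollama:  ", service_up("http://localhost:11434/api/tags"))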

4. Process Your First PDF

cliven ingest "path/to/your/document.pdf"

5. Start Chatting

# Chat with existing documents
cliven chat

# OR specify a model
cliven chat --model gemma3:4b

Usage πŸ“–

Available Commands

# Show welcome message and commands
cliven

# Process and store a PDF
cliven ingest <pdf_path> [--chunk-size SIZE] [--overlap SIZE]

# Start interactive chat with existing documents
cliven chat [--model MODEL_NAME] [--max-results COUNT]

# Process PDF and start chat immediately
cliven chat --repl <pdf_path> [--model MODEL_NAME]

# List all processed documents
cliven list

# Delete a specific document
cliven delete <doc_id>

# Clear all documents
cliven clear [--confirm]

# Check system status
cliven status

# Manage Docker services
cliven docker start [--BP | --better-performance]  # Start services
cliven docker stop                                 # Stop services
cliven docker logs                                 # View logs

Examples

# Process a manual with custom chunking
cliven ingest ./documents/user-manual.pdf --chunk-size 1500 --overlap 300

# Start chatting with all processed documents
cliven chat

# Chat with specific model
cliven chat --model gemma3:4b

# Process and chat with a specific PDF using high-performance model
cliven chat --repl ./research-paper.pdf --model gemma3:4b

# Check what documents are stored
cliven list

# Check if services are running
cliven status

# Clear all documents without confirmation
cliven clear --confirm

# Start services with better performance model
cliven docker start --BP

Model Options

Cliven supports multiple AI models:

  • gemma2:2b: Lightweight, fast responses (~1GB model)
  • gemma3:4b: High-performance, better quality responses (~4GB model)

The system automatically selects the best available model, or you can specify one:

# Auto-select best available model
cliven chat

# Use specific model
cliven chat --model gemma3:4b
cliven chat --model gemma2:2b
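Under the hood, auto-selection can be as simple as asking Ollama which models are installed and walking a preference list. A sketch of that idea; the preference order is an assumption, and Cliven's actual logic lives in main/chat.py:

import requests

PREFERRED = ["gemma3:4b", "gemma2:2b"]  # better model first (assumed order)

def pick_model(host: str = "http://localhost:11434") -> str:
    # Ollama's /api/tags endpoint lists every locally installed model
    tags = requests.get(f"{host}/api/tags", timeout=5).json()
    installed = {m["name"] for m in tags.get("models", [])}
    for name in PREFERRED:
        if name in installed:
            return name
    raise RuntimeError("No supported model found; run `cliven docker start` first")

print(pick_model())  # e.g. "gemma2:2b"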

Architecture πŸ—οΈ

Cliven uses a RAG (Retrieval-Augmented Generation) architecture (a condensed sketch of the full pipeline follows the list):

  1. PDF Parser: Extracts text from PDFs using pdfplumber
  2. Text Chunker: Splits documents into overlapping chunks using LangChain
  3. Embedder: Creates embeddings using BAAI/bge-small-en-v1.5
  4. Vector Database: Stores embeddings in ChromaDB
  5. Chat Engine: Handles queries and generates responses with Ollama
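The same five stages, condensed into a runnable sketch. It uses only the libraries from the Python Dependencies list below; the collection name and ID scheme are illustrative, not Cliven's actual internals:

import pdfplumber
import chromadb
from langchain.text_splitter import RecursiveCharacterTextSplitter
from sentence_transformers import SentenceTransformer

# 1. PDF Parser: extract raw text
with pdfplumber.open("document.pdf") as pdf:
    text = "\n".join(page.extract_text() or "" for page in pdf.pages)

# 2. Text Chunker: split into overlapping chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_text(text)

# 3. Embedder: one vector per chunk
model = SentenceTransformer("BAAI/bge-small-en-v1.5")
embeddings = model.encode(chunks).tolist()

# 4. Vector Database: store chunks and vectors in ChromaDB
client = chromadb.HttpClient(host="localhost", port=8000)
collection = client.get_or_create_collection("cliven_docs")
collection.add(
    ids=[f"document.pdf:{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=embeddings,
)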

Components πŸ”§

Core Services

  • ChromaDB: Vector database for storing document embeddings
  • Ollama: Local LLM inference server
  • Gemma2:2b: Lightweight chat model for fast responses
  • Gemma3:4b: High-performance model for better quality responses
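Together these services handle one chat turn roughly like this: embed the question, pull the nearest chunks from ChromaDB, and let Ollama answer from that context. A sketch with illustrative prompt wording and collection name:

import requests
import chromadb
from sentence_transformers import SentenceTransformer

question = "What does chapter 2 cover?"

# Retrieve the most similar chunks from ChromaDB
embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")
collection = chromadb.HttpClient(host="localhost", port=8000).get_or_create_collection("cliven_docs")
hits = collection.query(query_embeddings=[embedder.encode(question).tolist()], n_results=5)
context = "\n\n".join(hits["documents"][0])

# Generate a grounded answer with Ollama
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma2:2b",
        "prompt": f"Answer using only this context:\n{context}\n\nQuestion: {question}",
        "stream": False,
    },
    timeout=120,
)
print(resp.json()["response"])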

Key Files

  • main/cliven.py: Main CLI application with argument parsing
  • main/chat.py: Chat engine with RAG functionality and model management
  • utils/parser.py: PDF text extraction and chunking
  • utils/embedder.py: Text embedding generation using sentence transformers
  • utils/vectordb.py: ChromaDB operations and vector storage
  • utils/chunker.py: Text chunking utilities
  • docker-compose.yml: Service orchestration configuration

System Requirements πŸ“‹

Hardware & Software

  • Python 3.8+
  • Docker & Docker Compose
  • 2GB+ RAM (for Gemma2 model)
  • 8GB+ RAM (for Gemma3 4B model)
  • 4GB+ disk space

Python Dependencies

  • typer>=0.9.0 - CLI framework
  • rich>=13.0.0 - Beautiful terminal output
  • pdfplumber>=0.7.0 - PDF text extraction
  • sentence-transformers>=2.2.0 - Text embeddings
  • chromadb>=0.4.0 - Vector database
  • langchain>=0.0.300 - Text processing
  • requests>=2.28.0 - HTTP client

Installation Options πŸ› οΈ

Option 1: Local Development

# Clone repository
git clone https://github.com/krey-yon/cliven.git
cd cliven

# Create virtual environment
python -m venv .venv
.venv\Scripts\activate          # Windows
# source .venv/bin/activate     # Linux/macOS

# Install dependencies
pip install -e .

# Start services
cliven docker start

Option 2: Production Install

pip install git+https://github.com/krey-yon/cliven.git

Configuration βš™οΈ

Environment Variables

# ChromaDB settings
CHROMA_HOST=localhost
CHROMA_PORT=8000

# Ollama settings
OLLAMA_HOST=localhost
OLLAMA_PORT=11434
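A sketch of how a client might consume these variables, with defaults mirroring the values above; the os.getenv wiring is an assumption about Cliven's config handling, not confirmed from its source:

import os
import chromadb

CHROMA_HOST = os.getenv("CHROMA_HOST", "localhost")
CHROMA_PORT = int(os.getenv("CHROMA_PORT", "8000"))
OLLAMA_URL = f"http://{os.getenv('OLLAMA_HOST', 'localhost')}:{os.getenv('OLLAMA_PORT', '11434')}"

# Connect to ChromaDB using the configured host/port
client = chromadb.HttpClient(host=CHROMA_HOST, port=CHROMA_PORT)
print("Collections:", [c.name for c in client.list_collections()])
print("Ollama endpoint:", OLLAMA_URL)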

Customization

# Use different chunk sizes
cliven ingest document.pdf --chunk-size 1500 --overlap 300

# Use different model
cliven chat --model gemma3:4b

# Retrieve more context chunks per query
cliven chat --max-results 10

# Skip confirmation for clearing
cliven clear --confirm
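The --chunk-size and --overlap flags plausibly map straight onto the LangChain splitter from the Architecture section: each chunk is at most chunk-size characters, and consecutive chunks repeat up to overlap characters so a sentence cut at a boundary still appears whole in one chunk. A toy sketch of that mapping (an assumption consistent with the chunker described above):

from langchain.text_splitter import RecursiveCharacterTextSplitter

text = "word " * 2000  # stand-in for extracted PDF text

# Mirrors `cliven ingest document.pdf --chunk-size 1500 --overlap 300`
splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=300)
chunks = splitter.split_text(text)
print(len(chunks), "chunks; consecutive chunks share up to 300 characters")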

Model Management

# Check available models
cliven status

# Manually pull models
docker exec -it cliven_ollama ollama pull gemma3:4b
docker exec -it cliven_ollama ollama pull gemma2:2b

# List downloaded models
docker exec -it cliven_ollama ollama list

Troubleshooting πŸ”§

Common Issues

  1. Docker services not starting

    # Check Docker daemon
    docker info
    
    # View service logs
    cliven docker logs
    
    # Restart services
    cliven docker stop
    cliven docker start
  2. Model not found

    # Check available models
    cliven status
    
    # Manually pull model
    docker exec -it cliven_ollama ollama pull gemma3:4b
    docker exec -it cliven_ollama ollama pull gemma2:2b
  3. ChromaDB connection failed

    # Check service status
    cliven status
    
    # Restart services
    cliven docker stop
    cliven docker start
    
    # Check logs
    cliven docker logs
  4. PDF processing errors

    # Check file path and permissions
    dir path\to\file.pdf
    
    # Try with different chunk size
    cliven ingest file.pdf --chunk-size 500
    
    # Retry with different chunking parameters
    cliven ingest file.pdf --chunk-size 2000 --overlap 100
  5. Model performance issues

    # Switch to the lightweight model for faster responses
    cliven chat --model gemma2:2b
    
    # Or the high-performance model for better answers (needs more RAM)
    cliven chat --model gemma3:4b
    
    # Check system resources
    cliven status

Performance Tips

  • Use gemma2:2b for faster responses on limited hardware
  • Use gemma3:4b for better quality responses with sufficient RAM
  • Use smaller chunk sizes for better context precision
  • Increase overlap for better continuity
  • Monitor RAM usage with large PDFs
  • Use SSD storage for better ChromaDB performance

Contributing 🀝

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License πŸ“„

This project is licensed under the MIT License - see the LICENSE file for details.

Made with ❀️ by Kreyon

Chat with your PDFs locally and securely!
