This project is my attempt to learn fully AI-based 'vibe' coding and to document my use of AI coding assistants transparently. Expect breaking changes before v1.0.
Your private, intelligent document assistant that runs entirely on your computer: ragged is a local RAG (Retrieval-Augmented Generation) system that lets you ask questions about your documents and get accurate answers with citations - all while keeping your data completely private.
- Privacy First: 100% local by default. External services only with explicit user consent.
- User-Friendly: Simple for beginners, powerful for experts (progressive disclosure).
- Transparent: Open source, well-documented, educational.
- Quality-Focused: Built-in evaluation and testing from the start.
- Continuous Improvement: Each version adds value while maintaining stability.
- 📚 Multi-Format Support: Ingest PDF, TXT, Markdown, and HTML documents
- 🧠 Semantic Understanding: Uses embeddings to understand meaning, not just keywords (see the short example after this list)
- 🔍 Smart Retrieval: Finds relevant information across all your documents
- 💬 Accurate Answers: Generates natural language responses with source citations
- 🔒 100% Private: Everything runs locally - no data leaves your machine
- ⚡ Hardware Optimised: Supports CPU, Apple Silicon (MLX), and CUDA (planned)
- 🎨 Intuitive CLI: Command-line interface with progress bars and colors
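To illustrate what "meaning, not just keywords" looks like in practice, the snippet below uses sentence-transformers (one of ragged's dependencies) to compare a question against two sentences: the semantically related one scores higher even though it shares no keywords with the question. This is a minimal standalone sketch, not ragged's own code; the model name simply matches the default shown in the configuration section below.

```python
from sentence_transformers import SentenceTransformer, util

# Same default embedding model as in the configuration example below
model = SentenceTransformer("all-MiniLM-L6-v2")

query = "How do I keep my files confidential?"
candidates = [
    "All processing happens on your own machine, so documents stay private.",  # related, no shared keywords
    "The quarterly report lists revenue figures for each region.",             # unrelated
]

query_emb = model.encode(query, convert_to_tensor=True)
cand_embs = model.encode(candidates, convert_to_tensor=True)

# Cosine similarity: a higher score means closer meaning
scores = util.cos_sim(query_emb, cand_embs)[0]
for text, score in zip(candidates, scores):
    print(f"{float(score):.3f}  {text}")
```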
It's simple: Upload your documents (PDFs, text files, web pages), ask questions, and ragged finds the most relevant information to answer them, all running locally on your machine.
- Ingest: Add documents to the knowledge base ('library')
- Process: Documents are chunked and embedded for semantic search
- Store: Embeddings are stored in a local vector database
- Query: Ask questions in natural language
- Retrieve: ragged finds the most relevant document chunks
- Generate: A local LLM generates an answer with citations (planned)
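The sketch below shows the core of that pipeline using the same libraries ragged lists as dependencies (sentence-transformers for embeddings, ChromaDB for storage). It is an illustrative sketch rather than ragged's own code: it assumes a ChromaDB server on localhost:8001 (as configured below), a local file notes.txt, and character-based chunking with the 500/100 size and overlap defaults purely for illustration.

```python
from sentence_transformers import SentenceTransformer
import chromadb

# Process: naive fixed-size chunking with overlap (500 chars, 100 overlap)
def chunk(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.HttpClient(host="localhost", port=8001)  # assumes ChromaDB is already running
library = client.get_or_create_collection("library")

# Ingest + Store: embed each chunk and write it to the vector database
text = open("notes.txt", encoding="utf-8").read()
chunks = chunk(text)
library.add(
    ids=[f"notes-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=model.encode(chunks).tolist(),
)

# Query + Retrieve: embed the question and fetch the closest chunks
question = "What are the key findings?"
results = library.query(
    query_embeddings=model.encode([question]).tolist(),
    n_results=3,
)
for doc in results["documents"][0]:
    print(doc[:80], "...")
```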
- Python 3.12
- Ollama installed and running (optional for v0.2 web UI)
- ChromaDB (via Docker or pip)
# Clone the repository
git clone https://github.com/REPPL/ragged.git
cd ragged
# Create virtual environment
python3 -m venv .venv  # or python, depending on your system
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -e .
# Start services
docker compose up -d # Starts ChromaDB (if using Docker)
ollama serve  # Start Ollama (in a separate terminal)

# Add documents to your knowledge base
ragged add /path/to/document.pdf
ragged add /path/to/notes.txt
# Ask questions
ragged query "What are the key findings?"
ragged query "Explain the methodology"
# Manage your knowledge base
ragged list # Show collection statistics
ragged clear # Clear all documents
# Check configuration and health
ragged config show
ragged health

ragged uses environment variables for configuration. Create a .env file:
# Environment
RAGGED_ENVIRONMENT=development
RAGGED_LOG_LEVEL=INFO
# Embedding Model
RAGGED_EMBEDDING_MODEL=sentence-transformers # or: ollama
RAGGED_EMBEDDING_MODEL_NAME=all-MiniLM-L6-v2
# LLM
RAGGED_LLM_MODEL=llama3.2
# Chunking
RAGGED_CHUNK_SIZE=500
RAGGED_CHUNK_OVERLAP=100
# Services
RAGGED_CHROMA_URL=http://localhost:8001
RAGGED_OLLAMA_URL=http://localhost:11434
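How ragged consumes these variables internally is its own implementation detail, but as a hedged illustration, RAGGED_-prefixed variables like the ones above map naturally onto a typed settings object, for example with pydantic-settings (if installed). The class and field names below are assumptions that mirror the .env keys, not ragged's actual code.

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class RaggedSettings(BaseSettings):
    """Hypothetical settings object; field names mirror the .env keys above."""
    model_config = SettingsConfigDict(env_prefix="RAGGED_", env_file=".env")

    environment: str = "development"
    log_level: str = "INFO"
    embedding_model: str = "sentence-transformers"
    embedding_model_name: str = "all-MiniLM-L6-v2"
    llm_model: str = "llama3.2"
    chunk_size: int = 500
    chunk_overlap: int = 100
    chroma_url: str = "http://localhost:8001"
    ollama_url: str = "http://localhost:11434"

settings = RaggedSettings()  # reads .env; environment variables override the defaults
print(settings.chunk_size, settings.chroma_url)
```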
ChromaDB Connection Issues:

# Check ChromaDB is running
docker ps | grep chromadb
# Restart ChromaDB
docker compose restart chromadb
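If the container is up but ragged still cannot reach it, a quick connectivity check from Python can rule out a port mismatch (8001 here matches the RAGGED_CHROMA_URL above; adjust if yours differs). This is a small sketch using the chromadb client's heartbeat call:

```python
import chromadb

# Prints a heartbeat timestamp if the server at localhost:8001 is reachable
client = chromadb.HttpClient(host="localhost", port=8001)
print("ChromaDB heartbeat:", client.heartbeat())
print("Collections:", client.list_collections())
```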
Ollama Issues:

# Check Ollama is running
ollama list
# Pull required model
ollama pull llama3.2
ollama pull nomic-embed-text
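To confirm that Ollama can actually serve the pulled model (and not just that the daemon is running), a minimal non-streaming request against its REST API works. This sketch assumes the default endpoint http://localhost:11434 from the configuration above and uses only the Python standard library:

```python
import json
import urllib.request

# One-off, non-streaming generation request against Ollama's /api/generate endpoint
payload = json.dumps({
    "model": "llama3.2",
    "prompt": "Reply with the single word: ready",
    "stream": False,
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=120) as resp:
    print(json.load(resp)["response"])
```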
# Run all tests
pytest
# Run with coverage
pytest --cov=src --cov-report=html
# Run specific test suite
pytest tests/integration/
pytest tests/unit/

# Format code
black src/ tests/
# Lint
ruff check src/ tests/
# Type check
mypy src/

Built with:
- Ollama - Local LLM inference
- ChromaDB - Vector database
- sentence-transformers - Embeddings
- PyMuPDF4LLM - PDF processing
- Click & Rich - CLI
