KITTY: Your Workshop's AI Sidekick 🚀

Knowledgeable Intelligent Tool-using Tabletop Yoda

An offline-first, voice-enabled fabrication lab orchestrator running on Mac Studio M3 Ultra. Think "JARVIS for your workshop" - but it actually works, runs locally, and won't spy on you. 🔒

Python 3.11+ · TypeScript 5.x · macOS 14+ · MIT License · Building the future


🌟 Vision: A Maker Space for Technical AI

KITTY is a technical AI habitat - a maker space purpose-built for AI models like Claude, GPT-5, Llama, Qwen, and Mistral to come "live," run research, and directly control fabrication hardware. Built on the energy-efficient Mac Studio M3 Ultra, it provides a secure network interface to 3D printers, CNC machines, test rigs, and sensing equipment.

What makes KITTY different:

  • 🏠 AI Residency Model: Models can spin up for a single query or remain active for deep, after-hours projects
  • 🤖 Bounded Autonomy: One KITTY-owned project per week with controlled access to printers, inventory, and research
  • ♻️ Sustainable Manufacturing: Prioritizes ethically sourced materials with robotic procurement workflows
  • 🚀 Idea → Prototype Pipeline: Investigate materials, estimate costs, run simulations, then orchestrate fabrication
  • ⚡ Energy Efficient: Mac Studio runs indefinitely with minimal power draw

📖 Full Vision & Roadmap: See NorthStar/ProjectVision.md for the complete multi-phase implementation plan.


🛠️ Complete Tech Stack

🧠 AI/ML Infrastructure

| Component | Purpose | Technology |
|-----------|---------|------------|
| Q4 Tool Orchestrator | Fast tool calling, ReAct agent | llama.cpp (Athene V2 Agent Q4_K_M) @ port 8083 |
| Primary Reasoner | Deep reasoning with thinking mode | Ollama (GPT-OSS 120B) @ port 11434 |
| Fallback Reasoner (DEPRECATED) | Legacy fallback only | llama.cpp (Llama 3.3 70B F16) @ port 8082 |
| Vision Model | Image understanding, multimodal | llama.cpp (Gemma 3 27B Q4_K_M) @ port 8086 |
| Summary Model | Response compression | llama.cpp (Hermes 3 8B Q4_K_M) @ port 8084 |
| Coder Model | Code generation specialist | llama.cpp (Devstral 2 123B Q5_K_M) @ port 8087 |
| Cloud Providers | Shell/Collective model selection | OpenAI GPT-5.2, Claude Sonnet 4.5, Perplexity Sonar, Gemini 2.5 |

🐍 Backend Services (Python 3.11 + FastAPI)

| Service | Port | Purpose |
|---------|------|---------|
| Brain | 8000 | Core orchestrator, ReAct agent, intelligent routing |
| Gateway | 8080 | REST API (HAProxy load-balanced, 3 replicas) |
| CAD | 8200 | 3D model generation (Zoo, Tripo, local CadQuery) |
| Fabrication | 8300 | Printer control, queue management, mesh segmentation, Bambu Labs integration |
| Voice | 8400 | Real-time STT/TTS with local Whisper + Kokoro/Piper |
| Discovery | 8500 | Network device scanning (mDNS, SSDP, Bambu/Snapmaker UDP) |
| Broker | 8777 | Command execution with allow-list safety |
| Images | 8600 | Stable Diffusion generation with RQ workers |
| Mem0 MCP | 8765 | Semantic memory with vector embeddings |

🎨 Frontend (React 18 + TypeScript + Vite)

| Component | Purpose |
|-----------|---------|
| Menu | Landing page with navigation cards to all sections |
| Voice | Real-time voice assistant with Local/Cloud toggle |
| Shell | Text chat with function calling, streaming, and cloud model selection |
| Projects | CAD project management with artifact browser |
| Fabrication Console | Printer status, queue management, mesh segmentation, job tracking |
| Settings | Bambu Labs login, preferences, API configuration |
| I/O Control | Feature toggles and provider management |
| Research | Autonomous research pipeline with real-time streaming |
| Vision Gallery | Reference image search and storage |
| Image Generator | Stable Diffusion generation interface |
| Material Inventory | Filament catalog and stock management |
| Print Intelligence | Success prediction and recommendations dashboard |
| Collective | Multi-agent deliberation for better decisions |
| Wall Terminal | Full-screen display mode |

🐳 Infrastructure (Docker Compose)

| Service | Technology | Purpose |
|---------|------------|---------|
| Load Balancer | HAProxy | Gateway traffic distribution, health checks |
| Database | PostgreSQL 16 | Audit logs, state, projects (clustering optional) |
| Cache | Redis 7 | Semantic cache, routing state, feature flags |
| Vector DB | Qdrant 1.11 | Memory embeddings, semantic search |
| Object Storage | MinIO | CAD artifacts, images, snapshots (S3-compatible) |
| Message Queue | RabbitMQ 3.12 | Async events, job distribution |
| MQTT Broker | Eclipse Mosquitto 2.0 | Device communication, printer telemetry |
| Search Engine | SearXNG | Private, local web search |
| Smart Home | Home Assistant | Device control, automation |
| Metrics | Prometheus + Grafana | Observability dashboards |
| Logs | Loki | Log aggregation |
| Traces | Tempo | Distributed tracing |

🚀 Quick Start

📋 Prerequisites

  • Hardware: Mac Studio M3 Ultra recommended (256GB+ RAM for large models)
  • OS: macOS 14+ with Xcode command line tools
  • Software: Docker Desktop, Python 3.11, Node 20, Homebrew

🛠️ Installation (5 minutes)

# Clone the repository
git clone https://github.com/yourusername/KITT.git
cd KITT

# Install developer tools
pip install --upgrade pip pre-commit
pre-commit install

# Create environment file
cp .env.example .env
# Edit .env with your settings (see Configuration section below)

# Setup artifacts directory (for accessing 3MF/GLB files in Finder)
./ops/scripts/setup-artifacts-dir.sh

# Start everything
./ops/scripts/start-all.sh

🌐 Accessing KITTY

After startup, open your browser to:

| Interface | URL | Description |
|-----------|-----|-------------|
| Main UI | http://localhost:4173 | Menu landing page with all features |
| Voice | http://localhost:4173/?view=voice | Real-time voice assistant |
| API Docs | http://localhost:8080/docs | Swagger/OpenAPI documentation |
| Grafana | http://localhost:3000 | Metrics and dashboards |

🎤 Voice Service

KITTY includes a hybrid voice system with local-first processing, wake word detection, and cloud fallback:

🏗️ Architecture

┌──────────────────────────────────────────────────────────────────────────────┐
│                          Voice Service (:8400)                                │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                               │
│  ┌─────────────────────────────────────────────────────────────────────────┐ │
│  │                      Wake Word Detection (Optional)                      │ │
│  │               Porcupine ("Hey Kitty") → Activates Listening              │ │
│  └──────────────────────────────────┬──────────────────────────────────────┘ │
│                                     │                                         │
│                                     ▼                                         │
│  ┌───────────────────────┐         ┌───────────────────────────────────────┐ │
│  │    STT (Speech)       │         │         TTS (Synthesis)               │ │
│  │  ┌─────────────────┐  │         │  ┌─────────────────────────────────┐  │ │
│  │  │  Local Whisper  │  │         │  │  Kokoro ONNX (Apple Silicon)   │  │ │
│  │  │    (base.en)    │  │         │  │  am_michael (male), af (female) │  │ │
│  │  └───────┬─────────┘  │         │  └───────────────┬─────────────────┘  │ │
│  │          │ Fallback   │         │                  │ Fallback           │ │
│  │  ┌───────▼─────────┐  │         │  ┌───────────────▼─────────────────┐  │ │
│  │  │  OpenAI API     │  │         │  │      Piper TTS (Legacy)         │  │ │
│  │  │    Whisper      │  │         │  │    amy/ryan (22050 Hz)          │  │ │
│  │  └─────────────────┘  │         │  └───────────────┬─────────────────┘  │ │
│  └───────────────────────┘         │                  │ Fallback           │ │
│                                    │  ┌───────────────▼─────────────────┐  │ │
│                                    │  │       OpenAI TTS (tts-1)        │  │ │
│                                    │  └─────────────────────────────────┘  │ │
│                                    └───────────────────────────────────────┘ │
│                                                                               │
│  ┌─────────────────────────────────────────────────────────────────────────┐ │
│  │                        WebSocket Handler                                 │ │
│  │  Real-time streaming • PTT or Always-Listening • Adaptive chunking       │ │
│  └─────────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘

🎭 Local Voice Models

Wake Word Detection (Porcupine)

  • Engine: Picovoice Porcupine v4.0
  • Wake word: "Hey Kitty" (custom trained model)
  • Location: ~/.local/share/kitty/models/Hey-Kitty_en_mac_v4_0_0.ppn
  • Features: Low CPU usage, configurable sensitivity

Speech-to-Text (Whisper.cpp)

  • Model: base.en (English-optimized, ~150MB)
  • Location: ~/.cache/whisper/ggml-base.en.bin
  • Features: VAD, real-time transcription

Text-to-Speech (Kokoro ONNX) - Primary

  • Model: Kokoro v1.0 ONNX (~82MB, optimized for Apple Silicon)
  • Location: ~/.local/share/kitty/models/kokoro-v1.0.onnx
  • Voices: am_michael (male cowboy), af (female), plus 20+ additional voices
  • Sample rate: 24000 Hz
  • Features: Adaptive text chunking, CoreML acceleration, streaming

Text-to-Speech (Piper) - Fallback

  • Models: en_US-amy-medium (female), en_US-ryan-medium (male)
  • Location: /Users/Shared/Coding/models/Piper/
  • Sample rate: 22050 Hz

Voice Mapping (TTS):

  • alloy, nova, shimmer → af/amy (female)
  • echo, fable, onyx → am_michael/ryan (male)
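
In code this is just a lookup from OpenAI-style voice names to local voice IDs; a hypothetical resolver (voice names come from the sections above, the fallback choice is assumed):

# Hypothetical resolver for the voice mapping above; KITTY's actual field names may differ.
KOKORO = {"alloy": "af", "nova": "af", "shimmer": "af",
          "echo": "am_michael", "fable": "am_michael", "onyx": "am_michael"}
PIPER = {"alloy": "en_US-amy-medium", "nova": "en_US-amy-medium",
         "shimmer": "en_US-amy-medium", "echo": "en_US-ryan-medium",
         "fable": "en_US-ryan-medium", "onyx": "en_US-ryan-medium"}

def resolve_voice(requested: str, provider: str = "kokoro") -> str:
    table = KOKORO if provider == "kokoro" else PIPER
    return table.get(requested, table["alloy"])  # unknown names fall back to the default voice

print(resolve_voice("onyx"))           # am_michael
print(resolve_voice("onyx", "piper"))  # en_US-ryan-medium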

🎛️ Voice Configuration

# Voice service
VOICE_BASE_URL=http://localhost:8400
VOICE_PREFER_LOCAL=true              # Use local models first
VOICE_DEFAULT_VOICE=alloy            # Default TTS voice
VOICE_SAMPLE_RATE=16000              # Audio sample rate

# Local Whisper STT
WHISPER_MODEL=base.en                # Model size (tiny, base, small, medium, large)
WHISPER_MODEL_PATH=                  # Optional custom path

# Wake Word Detection (Porcupine)
PORCUPINE_ACCESS_KEY=your-key        # Get from console.picovoice.ai
WAKE_WORD_ENABLED=true               # Enable wake word detection
WAKE_WORD_MODEL_PATH=~/.local/share/kitty/models/Hey-Kitty_en_mac_v4_0_0.ppn
WAKE_WORD_SENSITIVITY=0.5            # Detection sensitivity (0.0-1.0)

# Local TTS Provider Selection
LOCAL_TTS_PROVIDER=kokoro            # Primary: kokoro, fallback: piper

# Kokoro TTS (Primary)
KOKORO_ENABLED=true
KOKORO_MODEL_PATH=~/.local/share/kitty/models/kokoro-v1.0.onnx
KOKORO_VOICES_PATH=~/.local/share/kitty/models/voices-v1.0.bin
KOKORO_VOICE=am_michael              # am_michael (male), af (female)

# Piper TTS (Fallback)
PIPER_MODEL_DIR=/Users/Shared/Coding/models/Piper

# Cloud TTS (Final Fallback)
OPENAI_TTS_MODEL=tts-1

🚀 Starting Voice Service

# Start voice service standalone
./ops/scripts/start-voice-service.sh

# Stop voice service
./ops/scripts/stop-voice-service.sh

# Check voice status
curl http://localhost:8080/api/voice/status | jq .

📡 Voice API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| /api/voice/status | GET | Provider status (local/cloud availability) |
| /api/voice/transcribe | POST | Transcribe audio to text |
| /api/voice/synthesize | POST | Convert text to speech |
| /api/voice/ws | WebSocket | Real-time bidirectional streaming |
| /api/voice/chat | POST | Full voice chat (STT → LLM → TTS) |
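
For example, the transcribe and synthesize endpoints can be exercised from Python (illustrative only; the request and response field names are assumptions, not a documented schema):

# Hypothetical client for the voice endpoints above (field names assumed).
import requests

BASE = "http://localhost:8080/api/voice"

# Text -> speech (assumes a JSON body and raw audio bytes in the response).
resp = requests.post(f"{BASE}/synthesize",
                     json={"text": "Printer one is online.", "voice": "alloy"})
resp.raise_for_status()
with open("reply.wav", "wb") as f:
    f.write(resp.content)

# Speech -> text (assumes a multipart file upload).
with open("question.wav", "rb") as f:
    resp = requests.post(f"{BASE}/transcribe", files={"file": f})
print(resp.json())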

🔌 WebSocket Events

| Event | Direction | Description |
|-------|-----------|-------------|
| config | Client → Server | Set session config (mode, voice, prefer_local) |
| audio.chunk | Client → Server | Base64 encoded audio chunk |
| audio.end | Client → Server | Signal end of speech |
| wake_word.toggle | Client → Server | Enable/disable wake word detection |
| wake_word.detected | Server → Client | Wake word triggered |
| transcript | Server → Client | STT result (partial or final) |
| response.text | Server → Client | Streaming text response |
| response.audio | Server → Client | TTS audio chunk (base64) |
| function.call | Server → Client | Tool invocation started |
| function.result | Server → Client | Tool execution result |
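
A minimal client for this protocol might look as follows (a sketch using the third-party websockets package; event payload shapes are assumed from the table above):

# Hypothetical WebSocket client for the event protocol above (payload shapes assumed).
import asyncio, base64, json
import websockets

async def stream(chunks: list[bytes]) -> None:
    async with websockets.connect("ws://localhost:8080/api/voice/ws") as ws:
        # Configure the session first.
        await ws.send(json.dumps({"type": "config", "mode": "ptt",
                                  "voice": "alloy", "prefer_local": True}))
        # Stream base64-encoded audio, then signal end of speech.
        for chunk in chunks:
            await ws.send(json.dumps({"type": "audio.chunk",
                                      "data": base64.b64encode(chunk).decode()}))
        await ws.send(json.dumps({"type": "audio.end"}))
        # Print transcript and response events as they arrive.
        async for message in ws:
            event = json.loads(message)
            print(event.get("type"), event)

# asyncio.run(stream([b"...raw PCM bytes..."]))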

🎨 Web UI

📱 Menu Landing Page

The UI starts with a Menu page showing all available sections:

| Section | Icon | Description |
|---------|------|-------------|
| Voice | 🎙️ | Real-time voice assistant with STT/TTS |
| Chat Shell | 💬 | Text chat with function calling |
| Fabrication Console | 🎨 | Text-to-3D model generation |
| Projects | 📁 | CAD project management |
| Dashboard | 🖨️ | Printers, cameras, and material inventory |
| Media Hub | 🖼️ | Vision gallery and image generation |
| Research Hub | 🔬 | Research, results, and scheduling |
| Collective | 👥 | Multi-agent deliberation for better decisions |
| Intelligence | 📈 | Analytics and insights dashboard |
| Wall Terminal | 🖥️ | Full-screen display mode |
| Settings | ⚙️ | Bambu Labs, preferences, API config |

🏃 Running the UI

cd services/ui

# Development mode (hot reload)
npm run dev -- --host 0.0.0.0 --port 4173

# Production build
npm run build
npm run preview

🎛️ UI Configuration

KITTY_UI_BASE=http://localhost:4173      # UI base URL
VITE_API_BASE=http://localhost:8080      # Gateway API URL

💻 AI Coding Assistant

KITTY includes a local-first AI coding assistant powered by Devstral 2 123B, Mistral's agentic coding model.

Components

| Component | Description |
|-----------|-------------|
| kitty-code | Textual TUI for interactive coding sessions |
| coder-agent | Backend service with Plan→Code→Test workflow |
| Web UI | Coding page at /coding with SSE streaming |

Quick Start

# Install kitty-code CLI
cd services/kitty-code
pip install -e .

# Run interactive session
kitty-code "Write a REST API with FastAPI"

# Or use the web UI at http://localhost:4173/coding

Model: Devstral 2 123B

  • Parameters: 123 billion
  • Quantization: Q5_K_M (~82GB sharded GGUF)
  • Context: 16,384 tokens
  • Backend: llama.cpp (natively handles sharded files)
  • Speed: ~5 tokens/second on Mac Studio M3 Ultra

Model Download

# Download via huggingface-cli (~82GB)
huggingface-cli download bartowski/mistralai_Devstral-2-123B-Instruct-2512-GGUF \
  --include "mistralai_Devstral-2-123B-Instruct-2512-Q5_K_M/*" \
  --local-dir ~/models/devstral2/Q5_K_M

Configuration

# Enable Devstral 2 via llama.cpp
LLAMACPP_CODER_ENABLED=true
LLAMACPP_CODER_MODEL=/path/to/devstral2/Q5_K_M/.../00001-of-00003.gguf
LLAMACPP_CODER_CTX=16384
LLAMACPP_CODER_PARALLEL=2
LLAMACPP_CODER_TEMPERATURE=0.2

MCP Integration

kitty-code auto-discovers KITTY services as MCP servers:

kitty_brain      → http://localhost:8000/mcp  (Query routing, research)
kitty_cad        → http://localhost:8200/mcp  (3D model generation)
kitty_fab        → http://localhost:8300/mcp  (Printer control)
kitty_discovery  → http://localhost:8500/mcp  (Device scanning)

See services/kitty-code/README.md for full documentation.


⚡ Command Reference

🚀 Start/Stop KITTY

# Start everything (llama.cpp + Docker + Voice)
./ops/scripts/start-all.sh

# Stop everything
./ops/scripts/stop-all.sh

# Start only voice service
./ops/scripts/start-voice-service.sh

# Stop only voice service
./ops/scripts/stop-voice-service.sh

# Check service status
docker compose -f infra/compose/docker-compose.yml ps

💻 CLI Interface

# Install CLI (one-time)
pip install -e services/cli/

# Launch interactive shell
kitty-cli shell

# Inside the shell:
> /help                              # Show available commands
> /voice                             # Toggle voice mode
> /research <query>                  # Autonomous research
> /cad Create a hex box              # Generate CAD model
> /split /path/to/model.stl         # Split oversized model for printing
> /remember Ordered more PLA         # Save long-term note
> /memories PLA                      # Recall saved notes
> /vision gandalf rubber duck        # Search reference images
> /generate futuristic drone         # Generate SD image
> /collective council k=3 Compare... # Multi-agent collaboration
> /exit                              # Exit shell

# Quick one-off queries
kitty-cli say "What printers are online?"
kitty-cli say "Turn on bench lights"

🚀 Unified Launcher TUI

# Install launcher (one-time)
pip install -e services/launcher/

# Launch unified control center
kitty

# TUI Shortcuts:
# k - Start KITTY stack
# x - Stop stack
# c - Launch CLI
# v - Launch Voice interface
# m - Launch Model Manager
# o - Open Web Console
# i - Launch I/O dashboard
# q - Quit

🎯 Core Features

🧭 Intelligent Routing

Query arrives → Local model (free, instant) → Confidence check
                                                    ↓
                          High confidence? ──→ Return answer
                                                    ↓
                          Low confidence? ──→ Escalate to cloud (budget gated)
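
A stripped-down sketch of that decision (the model calls are placeholders; the thresholds mirror CONFIDENCE_THRESHOLD and BUDGET_PER_TASK_USD from the Configuration section):

# Illustrative routing sketch: local first, cloud escalation behind a budget gate.
CONFIDENCE_THRESHOLD = 0.80   # mirrors CONFIDENCE_THRESHOLD in .env
BUDGET_PER_TASK_USD = 0.50    # mirrors BUDGET_PER_TASK_USD in .env

def ask_local_model(query: str) -> tuple[str, float]:
    # Stand-in for the local llama.cpp/Ollama call (free, instant).
    return f"(local answer to {query!r})", 0.9

def ask_cloud_model(query: str) -> str:
    # Stand-in for the budget-gated cloud escalation.
    return f"(cloud answer to {query!r})"

def route(query: str, spent_usd: float = 0.0) -> str:
    answer, confidence = ask_local_model(query)
    if confidence >= CONFIDENCE_THRESHOLD:
        return answer                     # high confidence: return the local answer
    if spent_usd >= BUDGET_PER_TASK_USD:
        return answer                     # budget exhausted: keep the best local effort
    return ask_cloud_model(query)         # low confidence: escalate to cloud

print(route("What's the melting point of PETG?"))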

🤖 ReAct Agent with Tool Use

KITTY uses a ReAct (Reasoning + Acting) agent that can:

  • Reason about complex multi-step tasks
  • Use tools via Model Context Protocol (MCP)
  • Observe results and adapt strategy
  • Iterate until task completion
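
The loop behind those bullets is compact. An illustrative Python sketch (the tool registry and model step are stand-ins, not KITTY's actual agent code):

# Minimal ReAct-style loop: Reason -> Act -> Observe, repeated until done.
from typing import Callable

# Hypothetical tool registry; KITTY's real tools arrive via MCP.
TOOLS: dict[str, Callable[[str], str]] = {"echo": lambda arg: arg}

def llm_step(history: list[str]) -> dict:
    # Stand-in for the model: returns a tool action, or a final answer
    # once it has observed at least one result.
    if history:
        return {"final": f"finished after {len(history)} step(s)"}
    return {"tool": "echo", "input": "hello workshop"}

def react(goal: str, max_iters: int = 5) -> str:
    history: list[str] = []
    for _ in range(max_iters):
        step = llm_step(history)                       # Reason
        if "final" in step:
            return step["final"]
        result = TOOLS[step["tool"]](step["input"])    # Act
        history.append(f"{step['tool']} -> {result}")  # Observe, then iterate
    return "stopped at iteration limit"

print(react("say hello"))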

🛡️ Safety-First Design

  • Hazard workflows: Two-step confirmation for dangerous operations
  • Command allow-lists: Only pre-approved system commands execute
  • Audit logging: Every tool use logged to PostgreSQL
  • Budget gates: Cloud API calls require password confirmation
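
Conceptually, the command allow-list is a strict membership test applied before anything reaches a shell. A simplified sketch in the spirit of the Broker service (the allowed set and function are hypothetical):

# Simplified allow-list check; the real Broker also writes an audit row to PostgreSQL.
import shlex
import subprocess

ALLOWED = {"ls", "df", "uptime"}  # hypothetical pre-approved commands

def run_safely(command_line: str) -> str:
    argv = shlex.split(command_line)
    if not argv or argv[0] not in ALLOWED:
        raise PermissionError(f"not on allow-list: {command_line!r}")
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout

print(run_safely("uptime"))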

🏗️ CAD Generation

Generate 3D models from natural language:

kitty-cli cad "Create a phone stand with 45° angle and cable management"

Providers (automatic fallback):

  1. Zoo API (parametric STEP)
  2. Tripo (mesh STL/OBJ)
  3. Local CadQuery (offline)
  4. Local FreeCAD (offline)

🖨️ Print Queue

Multi-printer coordination with intelligent scheduling:

# List queue
./scripts/queue-cli.sh list

# Submit job
./scripts/queue-cli.sh submit /path/to/model.stl "bracket_v2" pla_black_esun 3

# Watch queue
./scripts/queue-cli.sh watch

Supported Printers:

  • Bambu Labs H2D (MQTT)
  • Elegoo OrangeStorm Giga (Klipper)
  • Snapmaker Artisan (UDP)
  • Any OctoPrint/Moonraker instance

🔪 Mesh Segmentation

Split oversized 3D models into printable parts (supports 3MF and STL):

# Via CLI
kitty-cli shell
> /split /path/to/large_model.3mf

# Via API
curl -X POST http://localhost:8300/api/segmentation/segment \
  -H "Content-Type: application/json" \
  -d '{"mesh_path": "/path/to/model.3mf", "printer_id": "bamboo_h2d"}'

Features:

  • 3MF native: Prefers 3MF input/output for slicer compatibility (STL also supported)
  • Automatic splitting: Detects oversized models and splits into printer-fit parts
  • SDF hollowing: Reduce material usage with configurable wall thickness
  • Alignment joints: Dowel pin holes for accurate part assembly
  • 3MF assembly output: Single file with all parts, colors, and metadata
  • Configurable printers: Load build volumes from printer_config.yaml

Segmentation Options:

| Option | Description | Default |
|--------|-------------|---------|
| printer_id | Target printer for build volume | auto-detect |
| enable_hollowing | Enable SDF-based hollowing | true |
| wall_thickness_mm | Wall thickness for hollowing | 2.0 |
| joint_type | Joint type: dowel, dovetail, pyramid, none | dowel |
| max_parts | Maximum parts to generate | 10 |
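
The same request from Python, combining the endpoint with the options above (field names follow the curl example and the table; the values shown are the documented defaults):

# Hypothetical Python call mirroring the curl example, with options from the table.
import requests

payload = {
    "mesh_path": "/path/to/model.3mf",
    "printer_id": "bamboo_h2d",
    "enable_hollowing": True,      # SDF hollowing on by default
    "wall_thickness_mm": 2.0,
    "joint_type": "dowel",         # dowel | dovetail | pyramid | none
    "max_parts": 10,
}
resp = requests.post("http://localhost:8300/api/segmentation/segment", json=payload)
resp.raise_for_status()
print(resp.json())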

Interfaces:

  • CLI: /split command in kitty-cli shell
  • Voice: "Split this model for printing"
  • Web UI: MeshSegmenter component in Fabrication Console
  • API: POST /api/segmentation/segment on Fabrication service

🏗️ Architecture

┌──────────────────────────────────────────────────────────────────────────────┐
│                           Mac Studio M3 Ultra Host                            │
│                                                                               │
│  ┌─────────────────────────────────────────────────────────────────────────┐ │
│  │                         Local AI Inference Layer                         │ │
│  │                                                                          │ │
│  │  ┌────────────────────────────────────────────────────────────────────┐ │ │
│  │  │                  Ollama (Primary Reasoner) :11434                   │ │ │
│  │  │                      GPT-OSS 120B (Thinking Mode)                   │ │ │
│  │  └────────────────────────────────────────────────────────────────────┘ │ │
│  │                                                                          │ │
│  │  ┌────────────────────────────────────────────────────────────────────┐ │ │
│  │  │                  llama.cpp Servers (Metal GPU)                      │ │ │
│  │  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ │
│  │  │  │Q4 Tool   │ │F16 DEPR  │ │Vision    │ │Summary   │ │Coder     │ │ │ │
│  │  │  │:8083     │ │:8082     │ │:8086     │ │:8084     │ │:8087     │ │ │ │
│  │  │Athene V2 │ │(Fallback)│ │Gemma 27B │ │Hermes 8B │ │Devstral  │ │ │ │
│  │  │  └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │ │
│  │  └────────────────────────────────────────────────────────────────────┘ │ │
│  └─────────────────────────────────────────────────────────────────────────┘ │
│                                                                               │
│  ┌─────────────────────────────────────────────────────────────────────────┐ │
│  │                         Docker Compose Services                          │ │
│  │                                                                          │ │
│  │  ┌─────────────────────────────────────────────────────────────────────┐│ │
│  │  │                        HAProxy :8080 (Load Balancer)                ││ │
│  │  └───────────────────────────────┬─────────────────────────────────────┘│ │
│  │                                  │                                       │ │
│  │  ┌───────────────────────────────▼─────────────────────────────────────┐│ │
│  │  │                         Gateway (x3 replicas)                       ││ │
│  │  │                    REST API, Auth, Routing, Proxy                   ││ │
│  │  └───────────────────────────────┬─────────────────────────────────────┘│ │
│  │                                  │                                       │ │
│  │  ┌───────────────────────────────▼─────────────────────────────────────┐│ │
│  │  │                          Brain :8000                                ││ │
│  │  │          Orchestrator • ReAct Agent • Research Pipeline             ││ │
│  │  └──┬────────┬────────┬────────┬────────┬────────┬────────┬───────────┘│ │
│  │     │        │        │        │        │        │        │             │ │
│  │  ┌──▼──┐ ┌──▼──┐ ┌──▼──┐ ┌──▼──┐ ┌──▼──┐ ┌──▼──┐ ┌──▼──┐            │ │
│  │  │ CAD │ │ Fab │ │Voice│ │Disc │ │Brok │ │Imgs │ │Mem0 │            │ │
│  │  │:8200│ │:8300│ │:8400│ │:8500│ │:8777│ │:8600│ │:8765│            │ │
│  │  └─────┘ └─────┘ └─────┘ └─────┘ └─────┘ └─────┘ └─────┘            │ │
│  │                                                                       │ │
│  │  ┌───────────────────────────────────────────────────────────────────┐│ │
│  │  │                    Storage & Infrastructure                       ││ │
│  │  │  PostgreSQL │ Redis │ Qdrant │ MinIO │ RabbitMQ │ Mosquitto      ││ │
│  │  └───────────────────────────────────────────────────────────────────┘│ │
│  └─────────────────────────────────────────────────────────────────────────┘ │
│                                                                               │
│  ┌─────────────────────────────────────────────────────────────────────────┐ │
│  │                              Web UI :4173                                │ │
│  │  Menu │ Voice │ Shell │ Projects │ Fab │ Research │ Settings │ ...     │ │
│  └─────────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
                │                           │                    │
     ┌──────────▼──────────┐     ┌──────────▼──────────┐   ┌────▼────┐
     │    Home Assistant   │     │      3D Printers    │   │ Cameras │
     │    (MQTT + REST)    │     │  Bambu  │ Elegoo    │   │         │
     │  Lights, Climate,   │     │  Snapmaker │ Others │   │         │
     │  Sensors, Locks     │     │  (OctoPrint/Klipper)│   │         │
     └─────────────────────┘     └─────────────────────┘   └─────────┘

🎛️ Configuration

📝 Core Settings (.env)

# User & Safety
USER_NAME=YourName
KITTY_USER_NAME=YourName
HAZARD_CONFIRMATION_PHRASE="Confirm: proceed"
API_OVERRIDE_PASSWORD=omega

# Budget
BUDGET_PER_TASK_USD=0.50
CONFIDENCE_THRESHOLD=0.80

🧠 AI Models

# Ollama (Primary Reasoner)
LOCAL_REASONER_PROVIDER=ollama
OLLAMA_HOST=http://host.docker.internal:11434
OLLAMA_MODEL=gpt-oss:120b
OLLAMA_THINK=medium

# llama.cpp Q4 (Tool Orchestrator)
LLAMACPP_Q4_HOST=http://host.docker.internal:8083
LLAMACPP_Q4_MODEL=athene-v2-agent/Athene-V2-Agent-Q4_K_M.gguf
LLAMACPP_Q4_PORT=8083

# DEPRECATED: llama.cpp F16 (Legacy Fallback - only used when LOCAL_REASONER_PROVIDER=llamacpp)
# LLAMACPP_F16_HOST=http://host.docker.internal:8082
# LLAMACPP_F16_MODEL=llama-3-70b/Llama-3.3-70B-Instruct-F16/...gguf
# LLAMACPP_F16_PORT=8082

# Vision
LLAMACPP_VISION_MODEL=gemma-3-27b-it-GGUF/gemma-3-27b-it-q4_k_m.gguf
LLAMACPP_VISION_MMPROJ=gemma3_27b_mmproj/mmproj-model-f16.gguf
LLAMACPP_VISION_PORT=8086

🧭 Semantic Tool Selection

KITTY uses embedding-based semantic search to intelligently select relevant tools for each query, reducing context usage by ~90% when many tools are available.

How it works:

  1. Tool definitions are converted to text embeddings using all-MiniLM-L6-v2 (384 dimensions)
  2. Embeddings are cached in Redis for cluster-wide sharing
  3. For each query, cosine similarity finds the most relevant tools
  4. Only top-k matching tools are passed to the model (instead of all 50+)

Benefits:

  • Context savings: ~90% reduction (e.g., 600 tokens vs 7,500 for 50 tools)
  • Better tool selection: Semantic matching beats keyword heuristics
  • Cluster-ready: Redis caching shares embeddings across nodes
  • Fast: ~10-15ms per search after initial model load
Configuration:

# Semantic tool selection (default: enabled)
USE_SEMANTIC_TOOL_SELECTION=true
EMBEDDING_MODEL=all-MiniLM-L6-v2
TOOL_SEARCH_TOP_K=5
TOOL_SEARCH_THRESHOLD=0.3

Disabling: Set USE_SEMANTIC_TOOL_SELECTION=false to fall back to keyword-based selection.
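
For reference, the core of the technique fits in a few lines. A minimal sketch assuming the sentence-transformers package, with the Redis cache omitted and made-up tool descriptions:

# Sketch of embedding-based tool selection (Redis caching omitted for brevity).
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings, as above

# Hypothetical tool descriptions; KITTY embeds its real MCP tool definitions.
TOOLS = {
    "web_search": "Search the web for current information on a topic",
    "generate_cad_model": "Generate a 3D CAD model from a text description",
    "fabrication.submit_job": "Queue a mesh for 3D printing on a printer",
}
names = list(TOOLS)
vecs = model.encode(list(TOOLS.values()), normalize_embeddings=True)

def select_tools(query: str, top_k: int = 5, threshold: float = 0.3) -> list[str]:
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = vecs @ q                      # cosine similarity (unit vectors)
    order = np.argsort(scores)[::-1][:top_k]
    return [names[i] for i in order if scores[i] >= threshold]

print(select_tools("design a bracket I can print"))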

🤖 Parallel Agent Orchestration (Experimental)

KITTY supports parallel multi-agent orchestration for complex, multi-step goals. When enabled, complex queries are decomposed into parallelizable tasks executed concurrently across multiple specialized agents.

Architecture:

User Goal → Decompose (Q4) → Dependency Graph → Parallel Execute → Synthesize (GPTOSS)
                                    │
                    ┌───────────────┼───────────────┐
                    ▼               ▼               ▼
                [Task 1]       [Task 2]       [Task 3]
                researcher     cad_designer   fabricator
                    │               │               │
                    └───────────────┼───────────────┘
                                    ▼
                            Final Response

Specialized Agents:

| Agent | Primary Model | Purpose | Tool Allowlist |
|-------|---------------|---------|----------------|
| researcher | Q4 (Athene V2) | Web research, information gathering | web_search, fetch_webpage |
| reasoner | GPT-OSS 120B | Deep analysis, chain-of-thought | (none - pure reasoning) |
| cad_designer | Q4 (Athene V2) | 3D model generation | generate_cad_model, image_search |
| fabricator | Q4 (Athene V2) | Print preparation, segmentation | fabrication.* |
| coder | Devstral 2 123B | Code generation, analysis | (all code tools) |
| vision_analyst | Gemma 27B | Image understanding | vision.*, camera.* |
| analyst | Q4 (Athene V2) | Memory search, data analysis | memory.* |
| summarizer | Hermes 8B | Response compression | (none - compression only) |

Slot Allocation (20 concurrent slots):

| Endpoint | Port | Model | Slots | Context |
|----------|------|-------|-------|---------|
| Q4 | 8083 | Athene V2 Q4 | 6 | 128K |
| GPT-OSS | 11434 | GPT-OSS 120B | 2 | 65K |
| Vision | 8086 | Gemma 27B | 4 | 4K |
| Coder | 8087 | Devstral 2 123B | 2 | 16K |
| Summary | 8084 | Hermes 8B | 4 | 4K |

Configuration:

# Enable parallel agent orchestration
ENABLE_PARALLEL_AGENTS=false           # Master enable flag (disabled by default)
PARALLEL_AGENT_ROLLOUT_PERCENT=0       # Gradual rollout (0-100%)
PARALLEL_AGENT_MAX_TASKS=6             # Max tasks per execution
PARALLEL_AGENT_MAX_CONCURRENT=8        # Max concurrent slot usage
PARALLEL_AGENT_COMPLEXITY_THRESHOLD=0.6 # Query complexity threshold (0.0-1.0)

# Coder Server (Devstral 2 123B via llama.cpp)
LLAMACPP_CODER_ENABLED=true
LLAMACPP_CODER_MODEL=/path/to/devstral2/Q5_K_M/mistralai_Devstral-2-123B-Instruct-2512-Q5_K_M-00001-of-00003.gguf
LLAMACPP_CODER_HOST=http://localhost:8087
LLAMACPP_CODER_PORT=8087
LLAMACPP_CODER_CTX=16384
LLAMACPP_CODER_PARALLEL=2

Performance Benefits:

| Scenario | Sequential | Parallel | Improvement |
|----------|------------|----------|-------------|
| 3-task research | ~45s | ~15s | 3x faster |
| 5-task CAD+fab | ~90s | ~25s | 3.6x faster |
| GPU utilization | ~15% | ~60% | 4x better |

How it works:

  1. Complexity Detection: Queries are scored for complexity (keywords, length, multiple questions)
  2. Task Decomposition: Q4 model breaks goal into independent parallelizable tasks with dependencies
  3. Slot Acquisition: Tasks acquire slots with exponential backoff and fallback tiers
  4. Parallel Execution: Independent tasks run concurrently via asyncio.gather()
  5. Synthesis: GPTOSS 120B aggregates all task results into final response
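
Step 4 reduces to repeatedly fanning out over whichever tasks have all dependencies satisfied. An illustrative sketch (agents are stubbed; slot acquisition, backoff, and synthesis are omitted):

# Illustrative parallel execution over a task dependency graph.
import asyncio

async def run_task(name: str) -> str:
    # Stand-in for a specialized agent acquiring a slot and doing work.
    await asyncio.sleep(0.1)
    return f"{name}: done"

async def execute(graph: dict[str, set[str]]) -> dict[str, str]:
    results: dict[str, str] = {}
    remaining = dict(graph)
    while remaining:
        # Every task whose dependencies are satisfied runs concurrently.
        ready = [t for t, deps in remaining.items() if deps <= results.keys()]
        outputs = await asyncio.gather(*(run_task(t) for t in ready))
        results.update(zip(ready, outputs))
        for t in ready:
            del remaining[t]
    return results  # the synthesis step (GPT-OSS) would aggregate these

graph = {"research": set(), "cad": set(), "fabricate": {"research", "cad"}}
print(asyncio.run(execute(graph)))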

Enabling:

# In .env
ENABLE_PARALLEL_AGENTS=true
PARALLEL_AGENT_ROLLOUT_PERCENT=100

# Restart llama.cpp servers to pick up increased Q4 slots
./ops/scripts/llama/restart.sh

🎤 Voice Settings

# Voice service
VOICE_BASE_URL=http://localhost:8400
VOICE_PREFER_LOCAL=true
VOICE_DEFAULT_VOICE=alloy
VOICE_SAMPLE_RATE=16000

# Local STT (Whisper)
WHISPER_MODEL=base.en
WHISPER_MODEL_PATH=

# Wake Word Detection (Porcupine)
PORCUPINE_ACCESS_KEY=your-key        # Get free key from console.picovoice.ai
WAKE_WORD_ENABLED=true
WAKE_WORD_MODEL_PATH=~/.local/share/kitty/models/Hey-Kitty_en_mac_v4_0_0.ppn
WAKE_WORD_SENSITIVITY=0.5

# Local TTS Provider
LOCAL_TTS_PROVIDER=kokoro            # kokoro (primary) or piper (fallback)

# Kokoro TTS (Primary)
KOKORO_ENABLED=true
KOKORO_MODEL_PATH=~/.local/share/kitty/models/kokoro-v1.0.onnx
KOKORO_VOICES_PATH=~/.local/share/kitty/models/voices-v1.0.bin

# Piper TTS (Fallback)
PIPER_MODEL_DIR=/Users/Shared/Coding/models/Piper

# Cloud TTS (Final Fallback)
OPENAI_TTS_MODEL=tts-1

☁️ Cloud APIs (Optional)

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
PERPLEXITY_API_KEY=pplx-...
GOOGLE_API_KEY=AIza...               # For Gemini models
ZOO_API_KEY=your-zoo-key
TRIPO_API_KEY=your-tripo-key

☁️ Cloud Model Selection (Shell & Collective)

The Shell page and Collective Intelligence system support direct cloud model selection. When an API key is present, the corresponding cloud model becomes available in the model selector.

Supported Cloud Models (December 2025):

| UI ID | Provider | Model | Cost (per query) |
|-------|----------|-------|------------------|
| gpt5 | OpenAI | GPT-5.2 | ~$0.01-0.06 |
| claude | Anthropic | Claude Sonnet 4.5 | ~$0.01-0.08 |
| perplexity | Perplexity | Sonar (web-connected) | ~$0.001-0.005 |
| gemini | Google | Gemini 2.5 Flash | ~$0.0075-0.03 |

Features:

  • Models auto-enable when API key is detected
  • Fallback to local Q4 if cloud provider fails
  • Cost tracking in response metadata
  • Streaming support (non-blocking for cloud)
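
A sketch of the auto-enable and fallback behavior (environment variable names come from the Cloud APIs section above; the provider clients are placeholders):

# Sketch of key-based auto-enable with local fallback.
import os

CLOUD_MODELS = {"gpt5": "OPENAI_API_KEY", "claude": "ANTHROPIC_API_KEY",
                "perplexity": "PERPLEXITY_API_KEY", "gemini": "GOOGLE_API_KEY"}

def available_models() -> list[str]:
    # A cloud model appears in the selector only when its API key is set.
    return [m for m, key in CLOUD_MODELS.items() if os.getenv(key)]

def call_cloud(model_id: str, query: str) -> str:
    raise RuntimeError("placeholder: real provider client goes here")

def call_local_q4(query: str) -> str:
    return f"(local Q4 answer to {query!r})"

def ask(model_id: str, query: str) -> str:
    try:
        return call_cloud(model_id, query)
    except Exception:
        return call_local_q4(query)   # cloud failure falls back to local Q4

print(available_models())
print(ask("gpt5", "status report"))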

🔌 Integrations

# Home Assistant
HOME_ASSISTANT_TOKEN=your-long-lived-token

# Bambu Labs (via Settings page or .env)
# Configure through http://localhost:4173/?view=settings

# Database
DATABASE_URL=postgresql://kitty:changeme@postgres:5432/kitty
REDIS_URL=redis://127.0.0.1:6379/0

🐛 Troubleshooting

🎤 Voice Service Issues

Local STT not available:

# Check Whisper model exists
ls ~/.cache/whisper/ggml-base.en.bin

# Download if missing
pip install whispercpp
# Model downloads automatically on first use

Kokoro TTS not available:

# Check Kokoro model files exist
ls ~/.local/share/kitty/models/kokoro-v1.0.onnx
ls ~/.local/share/kitty/models/voices-v1.0.bin
ls ~/.local/share/kitty/models/voices/am_michael.bin

# Copy from HowdyTTS if missing
cp /Users/Shared/Coding/HowdyTTS/models/kokoro-v1.0.onnx ~/.local/share/kitty/models/
cp /Users/Shared/Coding/HowdyTTS/models/voices-v1.0.bin ~/.local/share/kitty/models/
mkdir -p ~/.local/share/kitty/models/voices
cp /Users/Shared/Coding/HowdyTTS/models/voices/*.bin ~/.local/share/kitty/models/voices/

# Install dependencies
cd services/voice && pip install -e ".[local]"

Piper TTS not available (fallback):

# Check Piper models exist
ls /Users/Shared/Coding/models/Piper/*.onnx

# Download from: https://github.com/rhasspy/piper/releases
# Copy en_US-amy-medium.onnx and en_US-ryan-medium.onnx

Wake word not working:

# Check Porcupine model exists
ls ~/.local/share/kitty/models/Hey-Kitty_en_mac_v4_0_0.ppn

# Verify PORCUPINE_ACCESS_KEY is set in .env
grep PORCUPINE_ACCESS_KEY .env

# Get free access key from: https://console.picovoice.ai
# Ensure pyaudio is installed
pip install pyaudio

Check voice status:

curl http://localhost:8080/api/voice/status | jq .
# Should show:
#   stt.local_available: true
#   tts.local_available: true
#   tts.active_provider: "kokoro" (or "piper" fallback)
#   capabilities.wake_word: true (if configured)

🐳 Services Not Starting

# Check Docker
docker ps

# View service logs
docker compose -f infra/compose/docker-compose.yml logs brain
tail -f .logs/llamacpp-q4.log

# Restart specific service
docker compose -f infra/compose/docker-compose.yml restart gateway

🎨 UI Not Loading

# Rebuild and restart UI
cd services/ui
npm run build
npm run preview

# Check gateway proxy
curl http://localhost:8080/api/voice/status

🗺️ Roadmap

📊 Current Status (December 2025)

| Phase | Status | Description |
|-------|--------|-------------|
| Phase 1: Core Foundation | ✅ Complete | Docker, llama.cpp, FastAPI, Home Assistant |
| Phase 2: Tool-Aware Agent | ✅ Complete | ReAct agent, MCP protocol, CAD generation |
| Phase 3: Autonomous Learning | ✅ Complete | Goal identification, research pipeline |
| Phase 3.5: Research Pipeline | ✅ Complete | 5-phase autonomous research, multi-model |
| Phase 4: Fabrication Intelligence | 🚧 90% | Voice service, dashboards, ML in progress |
| Phase 5: Safety & Access | 📋 Planned | UniFi Access, zone presence |

🆕 Recent Additions (Phase 4.5)

  • Voice Service: Local Whisper STT + Kokoro/Piper TTS with cloud fallback
  • Wake Word Detection: Hands-free activation with Porcupine ("Hey Kitty")
  • Kokoro TTS: High-quality Apple Silicon-optimized local speech synthesis
  • Menu Landing Page: Card-based navigation to all sections
  • Bambu Labs Integration: Login/status via Settings page
  • Gateway Voice Proxy: Full voice API proxying through gateway
  • Markdown Support: UI renders formatted responses, TTS speaks clean text
  • Collective Intelligence: Multi-agent deliberation for better decisions
  • Print Intelligence: Success prediction and recommendations dashboard

📊 Project Stats

| Metric | Value |
|--------|-------|
| Services | 12 FastAPI microservices |
| Docker Containers | 20+ (including infrastructure) |
| Local AI Models | 6 (GPT-OSS 120B, Q4, F16, Vision, Summary, Coder) |
| Voice Models | 5 (Whisper, Kokoro ONNX, Piper amy/ryan, Porcupine wake word) |
| Cloud Providers | 4 (OpenAI, Anthropic, Perplexity, Google) |
| UI Pages | 16 (Menu, Voice, Shell, Projects, etc.) |
| Supported Printers | Bambu Labs, Elegoo (Klipper), Snapmaker, OctoPrint |
| Lines of Python | 55,000+ |
| Lines of TypeScript | 18,000+ |

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

# Development setup
pip install -e ".[dev]"
pre-commit install

# Run tests
pytest tests/ -v

# Linting
ruff check services/ --fix
ruff format services/

📄 License

MIT License - see LICENSE for details.


🙏 Acknowledgments

KITTY stands on the shoulders of giants: llama.cpp, Ollama, Whisper.cpp, Kokoro, Piper, Porcupine, FastAPI, React, and the rest of the open-source stack described above.


Built with care for makers, by makers

KITTY: Because your workshop deserves an AI assistant that actually understands "turn that thing on over there"

🚀 Powered by Mac Studio M3 Ultra, llama.cpp, and a whole lot of caffeine
