Knowledgeable Intelligent Tool-using Tabletop Yoda
An offline-first, voice-enabled fabrication lab orchestrator running on Mac Studio M3 Ultra. Think "JARVIS for your workshop" - but it actually works, runs locally, and won't spy on you. 🔒
KITTY is a technical AI habitat - a maker space purpose-built for AI models like Claude, GPT-5, Llama, Qwen, and Mistral to come "live," run research, and directly control fabrication hardware. Built on the energy-efficient Mac Studio M3 Ultra, it provides a secure network interface to 3D printers, CNC machines, test rigs, and sensing equipment.
What makes KITTY different:
- 🏠 AI Residency Model: Models can spin up for a single query or remain active for deep, after-hours projects
- 🤖 Bounded Autonomy: One KITTY-owned project per week with controlled access to printers, inventory, and research
- ♻️ Sustainable Manufacturing: Prioritizes ethically sourced materials with robotic procurement workflows
- 🚀 Idea → Prototype Pipeline: Investigate materials, estimate costs, run simulations, then orchestrate fabrication
- ⚡ Energy Efficient: Mac Studio runs indefinitely with minimal power draw
📖 Full Vision & Roadmap: See NorthStar/ProjectVision.md for the complete multi-phase implementation plan.
| Component | Purpose | Technology |
|---|---|---|
| Q4 Tool Orchestrator | Fast tool calling, ReAct agent | llama.cpp (Athene V2 Agent Q4_K_M) @ port 8083 |
| Primary Reasoner | Deep reasoning with thinking mode | Ollama (GPT-OSS 120B) @ port 11434 |
| Fallback Reasoner | (DEPRECATED) Legacy fallback only | llama.cpp (Llama 3.3 70B F16) @ port 8082 |
| Vision Model | Image understanding, multimodal | llama.cpp (Gemma 3 27B Q4_K_M) @ port 8086 |
| Summary Model | Response compression | llama.cpp (Hermes 3 8B Q4_K_M) @ port 8084 |
| Coder Model | Code generation specialist | llama.cpp (Devstral 2 123B Q5_K_M) @ port 8087 |
| Cloud Providers | Shell/Collective model selection | OpenAI GPT-5.2, Claude Sonnet 4.5, Perplexity Sonar, Gemini 2.5 |
| Service | Port | Purpose |
|---|---|---|
| Brain | 8000 | Core orchestrator, ReAct agent, intelligent routing |
| Gateway | 8080 | REST API (HAProxy load-balanced, 3 replicas) |
| CAD | 8200 | 3D model generation (Zoo, Tripo, local CadQuery) |
| Fabrication | 8300 | Printer control, queue management, mesh segmentation, Bambu Labs integration |
| Voice | 8400 | Real-time STT/TTS with local Whisper + Kokoro/Piper |
| Discovery | 8500 | Network device scanning (mDNS, SSDP, Bambu/Snapmaker UDP) |
| Broker | 8777 | Command execution with allow-list safety |
| Images | 8600 | Stable Diffusion generation with RQ workers |
| Mem0 MCP | 8765 | Semantic memory with vector embeddings |
| Component | Purpose |
|---|---|
| Menu | Landing page with navigation cards to all sections |
| Voice | Real-time voice assistant with Local/Cloud toggle |
| Shell | Text chat with function calling, streaming, and cloud model selection |
| Projects | CAD project management with artifact browser |
| Fabrication Console | Printer status, queue management, mesh segmentation, job tracking |
| Settings | Bambu Labs login, preferences, API configuration |
| I/O Control | Feature toggles and provider management |
| Research | Autonomous research pipeline with real-time streaming |
| Vision Gallery | Reference image search and storage |
| Image Generator | Stable Diffusion generation interface |
| Material Inventory | Filament catalog and stock management |
| Print Intelligence | Success prediction and recommendations dashboard |
| Collective | Multi-agent deliberation for better decisions |
| Wall Terminal | Full-screen display mode |
| Service | Technology | Purpose |
|---|---|---|
| Load Balancer | HAProxy | Gateway traffic distribution, health checks |
| Database | PostgreSQL 16 | Audit logs, state, projects (clustering optional) |
| Cache | Redis 7 | Semantic cache, routing state, feature flags |
| Vector DB | Qdrant 1.11 | Memory embeddings, semantic search |
| Object Storage | MinIO | CAD artifacts, images, snapshots (S3-compatible) |
| Message Queue | RabbitMQ 3.12 | Async events, job distribution |
| MQTT Broker | Eclipse Mosquitto 2.0 | Device communication, printer telemetry |
| Search Engine | SearXNG | Private, local web search |
| Smart Home | Home Assistant | Device control, automation |
| Metrics | Prometheus + Grafana | Observability dashboards |
| Logs | Loki | Log aggregation |
| Traces | Tempo | Distributed tracing |
- Hardware: Mac Studio M3 Ultra recommended (256GB+ RAM for large models)
- OS: macOS 14+ with Xcode command line tools
- Software: Docker Desktop, Python 3.11, Node 20, Homebrew
# Clone the repository
git clone https://github.com/yourusername/KITT.git
cd KITT
# Install developer tools
pip install --upgrade pip pre-commit
pre-commit install
# Create environment file
cp .env.example .env
# Edit .env with your settings (see Configuration section below)
# Setup artifacts directory (for accessing 3MF/GLB files in Finder)
./ops/scripts/setup-artifacts-dir.sh
# Start everything
./ops/scripts/start-all.sh

After startup, open your browser to:
| Interface | URL | Description |
|---|---|---|
| Main UI | http://localhost:4173 | Menu landing page with all features |
| Voice | http://localhost:4173/?view=voice | Real-time voice assistant |
| API Docs | http://localhost:8080/docs | Swagger/OpenAPI documentation |
| Grafana | http://localhost:3000 | Metrics and dashboards |
KITTY includes a hybrid voice system with local-first processing, wake word detection, and cloud fallback:
┌──────────────────────────────────────────────────────────────────────────────┐
│ Voice Service (:8400) │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ Wake Word Detection (Optional) │ │
│ │ Porcupine ("Hey Kitty") → Activates Listening │ │
│ └──────────────────────────────────┬──────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────┐ ┌───────────────────────────────────────┐ │
│ │ STT (Speech) │ │ TTS (Synthesis) │ │
│ │ ┌─────────────────┐ │ │ ┌─────────────────────────────────┐ │ │
│ │ │ Local Whisper │ │ │ │ Kokoro ONNX (Apple Silicon) │ │ │
│ │ │ (base.en) │ │ │ │ am_michael (male), af (female) │ │ │
│ │ └───────┬─────────┘ │ │ └───────────────┬─────────────────┘ │ │
│ │ │ Fallback │ │ │ Fallback │ │
│ │ ┌───────▼─────────┐ │ │ ┌───────────────▼─────────────────┐ │ │
│ │ │ OpenAI API │ │ │ │ Piper TTS (Legacy) │ │ │
│ │ │ Whisper │ │ │ │ amy/ryan (22050 Hz) │ │ │
│ │ └─────────────────┘ │ │ └───────────────┬─────────────────┘ │ │
│ └───────────────────────┘ │ │ Fallback │
│ │ ┌───────────────▼─────────────────┐ │ │
│ │ │ OpenAI TTS (tts-1) │ │ │
│ │ └─────────────────────────────────┘ │ │
│ └───────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ WebSocket Handler │ │
│ │  Real-time streaming • PTT or Always-Listening • Adaptive chunking      │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
Wake Word Detection (Porcupine)
- Engine: Picovoice Porcupine v4.0
- Wake word: "Hey Kitty" (custom trained model)
- Location: `~/.local/share/kitty/models/Hey-Kitty_en_mac_v4_0_0.ppn`
- Features: Low CPU usage, configurable sensitivity
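A minimal detection sketch, assuming the `pvporcupine` package (the access key and model path come from the voice configuration shown later; the audio capture loop is omitted):

```python
# Sketch only: wake-word detection with Porcupine.
import os
import pvporcupine

porcupine = pvporcupine.create(
    access_key=os.environ["PORCUPINE_ACCESS_KEY"],
    keyword_paths=[os.path.expanduser(
        "~/.local/share/kitty/models/Hey-Kitty_en_mac_v4_0_0.ppn")],
    sensitivities=[0.5],
)
# Feed 16-bit mono PCM frames of porcupine.frame_length samples at
# porcupine.sample_rate; process() returns 0 when "Hey Kitty" is
# detected, -1 otherwise:
# keyword_index = porcupine.process(frame)
```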
Speech-to-Text (Whisper.cpp)
- Model: `base.en` (English-optimized, ~150MB)
- Location: `~/.cache/whisper/ggml-base.en.bin`
- Features: VAD, real-time transcription
Text-to-Speech (Kokoro ONNX) - Primary
- Model: Kokoro v1.0 ONNX (~82MB, optimized for Apple Silicon)
- Location: `~/.local/share/kitty/models/kokoro-v1.0.onnx`
- Voices: `am_michael` (male cowboy), `af` (female), plus 20+ additional voices
- Sample rate: 24000 Hz
- Features: Adaptive text chunking, CoreML acceleration, streaming
Text-to-Speech (Piper) - Fallback
- Models: `en_US-amy-medium` (female), `en_US-ryan-medium` (male)
- Location: `/Users/Shared/Coding/models/Piper/`
- Sample rate: 22050 Hz
Voice Mapping (TTS):
- alloy, nova, shimmer → af/amy (female)
- echo, fable, onyx → am_michael/ryan (male)
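As a lookup table, that mapping is roughly (a sketch; the Piper model names come from the Piper section above):

```python
# Sketch: OpenAI-style voice names mapped to (Kokoro voice, Piper model),
# per the bullets above.
VOICE_MAP: dict[str, tuple[str, str]] = {
    # female voices → Kokoro "af" / Piper "amy"
    "alloy":   ("af", "en_US-amy-medium"),
    "nova":    ("af", "en_US-amy-medium"),
    "shimmer": ("af", "en_US-amy-medium"),
    # male voices → Kokoro "am_michael" / Piper "ryan"
    "echo":  ("am_michael", "en_US-ryan-medium"),
    "fable": ("am_michael", "en_US-ryan-medium"),
    "onyx":  ("am_michael", "en_US-ryan-medium"),
}
```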
# Voice service
VOICE_BASE_URL=http://localhost:8400
VOICE_PREFER_LOCAL=true # Use local models first
VOICE_DEFAULT_VOICE=alloy # Default TTS voice
VOICE_SAMPLE_RATE=16000 # Audio sample rate
# Local Whisper STT
WHISPER_MODEL=base.en # Model size (tiny, base, small, medium, large)
WHISPER_MODEL_PATH= # Optional custom path
# Wake Word Detection (Porcupine)
PORCUPINE_ACCESS_KEY=your-key # Get from console.picovoice.ai
WAKE_WORD_ENABLED=true # Enable wake word detection
WAKE_WORD_MODEL_PATH=~/.local/share/kitty/models/Hey-Kitty_en_mac_v4_0_0.ppn
WAKE_WORD_SENSITIVITY=0.5 # Detection sensitivity (0.0-1.0)
# Local TTS Provider Selection
LOCAL_TTS_PROVIDER=kokoro # Primary: kokoro, fallback: piper
# Kokoro TTS (Primary)
KOKORO_ENABLED=true
KOKORO_MODEL_PATH=~/.local/share/kitty/models/kokoro-v1.0.onnx
KOKORO_VOICES_PATH=~/.local/share/kitty/models/voices-v1.0.bin
KOKORO_VOICE=am_michael # am_michael (male), af (female)
# Piper TTS (Fallback)
PIPER_MODEL_DIR=/Users/Shared/Coding/models/Piper
# Cloud TTS (Final Fallback)
OPENAI_TTS_MODEL=tts-1

# Start voice service standalone
./ops/scripts/start-voice-service.sh
# Stop voice service
./ops/scripts/stop-voice-service.sh
# Check voice status
curl http://localhost:8080/api/voice/status | jq .

| Endpoint | Method | Description |
|---|---|---|
| `/api/voice/status` | GET | Provider status (local/cloud availability) |
| `/api/voice/transcribe` | POST | Transcribe audio to text |
| `/api/voice/synthesize` | POST | Convert text to speech |
| `/api/voice/ws` | WebSocket | Real-time bidirectional streaming |
| `/api/voice/chat` | POST | Full voice chat (STT → LLM → TTS) |
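For example, a quick synthesis call through the gateway (a sketch; the JSON field names and raw-audio response are assumptions, so verify against the Swagger docs at `/docs`):

```python
# Sketch: text-to-speech via the gateway REST API.
import requests

resp = requests.post(
    "http://localhost:8080/api/voice/synthesize",
    json={"text": "Printer one is ready.", "voice": "am_michael"},
    timeout=60,
)
resp.raise_for_status()
with open("reply.wav", "wb") as f:
    f.write(resp.content)  # assumes the endpoint returns audio bytes
```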
| Event | Direction | Description |
|---|---|---|
| `config` | Client → Server | Set session config (mode, voice, prefer_local) |
| `audio.chunk` | Client → Server | Base64 encoded audio chunk |
| `audio.end` | Client → Server | Signal end of speech |
| `wake_word.toggle` | Client → Server | Enable/disable wake word detection |
| `wake_word.detected` | Server → Client | Wake word triggered |
| `transcript` | Server → Client | STT result (partial or final) |
| `response.text` | Server → Client | Streaming text response |
| `response.audio` | Server → Client | TTS audio chunk (base64) |
| `function.call` | Server → Client | Tool invocation started |
| `function.result` | Server → Client | Tool execution result |
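A minimal client sketch for this protocol, assuming the `websockets` package and JSON envelopes keyed by a `type` field (the exact payload shape is an assumption):

```python
# Sketch: drive the voice WebSocket in PTT mode. Event names come from
# the table above; the JSON envelope fields are assumptions.
import asyncio
import base64
import json

import websockets

async def main() -> None:
    async with websockets.connect("ws://localhost:8080/api/voice/ws") as ws:
        await ws.send(json.dumps(
            {"type": "config", "mode": "ptt", "voice": "alloy", "prefer_local": True}))
        with open("utterance.wav", "rb") as f:
            await ws.send(json.dumps(
                {"type": "audio.chunk", "data": base64.b64encode(f.read()).decode()}))
        await ws.send(json.dumps({"type": "audio.end"}))
        async for raw in ws:  # transcript, response.text, response.audio, ...
            event = json.loads(raw)
            print(event.get("type"))

asyncio.run(main())
```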
The UI starts with a Menu page showing all available sections:
| Section | Icon | Description |
|---|---|---|
| Voice | 🎙️ | Real-time voice assistant with STT/TTS |
| Chat Shell | 💬 | Text chat with function calling |
| Fabrication Console | 🎨 | Text-to-3D model generation |
| Projects | 📁 | CAD project management |
| Dashboard | 🖨️ | Printers, cameras, and material inventory |
| Media Hub | 🖼️ | Vision gallery and image generation |
| Research Hub | 🔬 | Research, results, and scheduling |
| Collective | 👥 | Multi-agent deliberation for better decisions |
| Intelligence | 📈 | Analytics and insights dashboard |
| Wall Terminal | 🖥️ | Full-screen display mode |
| Settings | ⚙️ | Bambu Labs, preferences, API config |
cd services/ui
# Development mode (hot reload)
npm run dev -- --host 0.0.0.0 --port 4173
# Production build
npm run build
npm run preview

KITTY_UI_BASE=http://localhost:4173 # UI base URL
VITE_API_BASE=http://localhost:8080 # Gateway API URL

KITTY includes a local-first AI coding assistant powered by Devstral 2 123B, Mistral's agentic coding model.
| Component | Description |
|---|---|
| kitty-code | Textual TUI for interactive coding sessions |
| coder-agent | Backend service with Plan→Code→Test workflow |
| Web UI | Coding page at /coding with SSE streaming |
# Install kitty-code CLI
cd services/kitty-code
pip install -e .
# Run interactive session
kitty-code "Write a REST API with FastAPI"
# Or use the web UI at http://localhost:4173/coding

- Parameters: 123 billion
- Quantization: Q5_K_M (~82GB sharded GGUF)
- Context: 16,384 tokens
- Backend: llama.cpp (natively handles sharded files)
- Speed: ~5 tokens/second on Mac Studio M3 Ultra
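Since the coder runs behind llama.cpp's OpenAI-compatible server, it can also be queried directly; a sketch (the model name is a placeholder, since llama.cpp serves whatever model it loaded):

```python
# Sketch: hit the coder server's OpenAI-compatible chat endpoint.
import requests

resp = requests.post(
    "http://localhost:8087/v1/chat/completions",
    json={
        "model": "devstral",  # placeholder name
        "messages": [{"role": "user", "content": "Write a FastAPI health endpoint."}],
        "temperature": 0.2,
        "max_tokens": 512,
    },
    timeout=300,  # ~5 tok/s, so long generations take a while
)
print(resp.json()["choices"][0]["message"]["content"])
```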
# Download via huggingface-cli (~82GB)
huggingface-cli download bartowski/mistralai_Devstral-2-123B-Instruct-2512-GGUF \
--include "mistralai_Devstral-2-123B-Instruct-2512-Q5_K_M/*" \
--local-dir ~/models/devstral2/Q5_K_M

# Enable Devstral 2 via llama.cpp
LLAMACPP_CODER_ENABLED=true
LLAMACPP_CODER_MODEL=/path/to/devstral2/Q5_K_M/.../00001-of-00003.gguf
LLAMACPP_CODER_CTX=16384
LLAMACPP_CODER_PARALLEL=2
LLAMACPP_CODER_TEMPERATURE=0.2

kitty-code auto-discovers KITTY services as MCP servers:
kitty_brain → http://localhost:8000/mcp (Query routing, research)
kitty_cad → http://localhost:8200/mcp (3D model generation)
kitty_fab → http://localhost:8300/mcp (Printer control)
kitty_discovery → http://localhost:8500/mcp (Device scanning)
See services/kitty-code/README.md for full documentation.
# Start everything (llama.cpp + Docker + Voice)
./ops/scripts/start-all.sh
# Stop everything
./ops/scripts/stop-all.sh
# Start only voice service
./ops/scripts/start-voice-service.sh
# Stop only voice service
./ops/scripts/stop-voice-service.sh
# Check service status
docker compose -f infra/compose/docker-compose.yml ps

# Install CLI (one-time)
pip install -e services/cli/
# Launch interactive shell
kitty-cli shell
# Inside the shell:
> /help # Show available commands
> /voice # Toggle voice mode
> /research <query> # Autonomous research
> /cad Create a hex box # Generate CAD model
> /split /path/to/model.stl # Split oversized model for printing
> /remember Ordered more PLA # Save long-term note
> /memories PLA # Recall saved notes
> /vision gandalf rubber duck # Search reference images
> /generate futuristic drone # Generate SD image
> /collective council k=3 Compare... # Multi-agent collaboration
> /exit # Exit shell
# Quick one-off queries
kitty-cli say "What printers are online?"
kitty-cli say "Turn on bench lights"# Install launcher (one-time)
pip install -e services/launcher/
# Launch unified control center
kitty
# TUI Shortcuts:
# k - Start KITTY stack
# x - Stop stack
# c - Launch CLI
# v - Launch Voice interface
# m - Launch Model Manager
# o - Open Web Console
# i - Launch I/O dashboard
# q - Quit

Query arrives → Local model (free, instant) → Confidence check
↓
High confidence? ──→ Return answer
↓
Low confidence? ──→ Escalate to cloud (budget gated)
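In code terms, the gate looks roughly like this (a sketch; `ask_local` and `ask_cloud` are hypothetical stand-ins for the real model clients, and the threshold comes from `CONFIDENCE_THRESHOLD` in `.env`):

```python
# Sketch of the confidence-gated escalation.
CONFIDENCE_THRESHOLD = 0.80

def ask_local(query: str) -> tuple[str, float]:
    return "local answer", 0.92          # stub: local model + confidence score

def ask_cloud(query: str) -> str:
    return "cloud answer"                # stub: budget-gated escalation

def answer(query: str) -> str:
    text, confidence = ask_local(query)  # free, instant
    if confidence >= CONFIDENCE_THRESHOLD:
        return text
    return ask_cloud(query)              # escalate only when unsure
```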
KITTY uses a ReAct (Reasoning + Acting) agent that can:
- Reason about complex multi-step tasks
- Use tools via Model Context Protocol (MCP)
- Observe results and adapt strategy
- Iterate until task completion
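The core loop, reduced to a sketch (both helpers are hypothetical stand-ins for the orchestrator model and the MCP tool dispatch):

```python
# Sketch of the ReAct loop: reason -> act -> observe -> repeat.
def plan_next_step(goal: str, scratchpad: list[str]) -> dict:
    # stub: the orchestrator model would emit the next thought/action here
    return {"action": "finish", "answer": f"done: {goal}"}

def call_tool(tool: str, args: dict) -> str:
    return "observation"  # stub: MCP tool invocation

def react(goal: str, max_steps: int = 8) -> str:
    scratchpad: list[str] = []
    for _ in range(max_steps):
        step = plan_next_step(goal, scratchpad)       # reason
        if step["action"] == "finish":
            return step["answer"]
        obs = call_tool(step["tool"], step["args"])   # act
        scratchpad.append(f"{step.get('thought', '')} -> {obs}")  # observe
    return "stopped: max steps reached"
```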
- Hazard workflows: Two-step confirmation for dangerous operations
- Command allow-lists: Only pre-approved system commands execute
- Audit logging: Every tool use logged to PostgreSQL
- Budget gates: Cloud API calls require password confirmation
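The allow-list check is conceptually simple; a sketch (the command set here is illustrative, not the Broker's real configuration):

```python
# Sketch of the Broker's allow-list gate.
import subprocess

ALLOWED_COMMANDS = {"ls", "df", "uptime"}

def run_command(argv: list[str]) -> str:
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not in allow-list: {argv!r}")
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    # the real service also writes an audit record to PostgreSQL here
    return result.stdout
```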
Generate 3D models from natural language:
kitty-cli cad "Create a phone stand with 45° angle and cable management"Providers (automatic fallback):
- Zoo API (parametric STEP)
- Tripo (mesh STL/OBJ)
- Local CadQuery (offline)
- Local FreeCAD (offline)
Multi-printer coordination with intelligent scheduling:
# List queue
./scripts/queue-cli.sh list
# Submit job
./scripts/queue-cli.sh submit /path/to/model.stl "bracket_v2" pla_black_esun 3
# Watch queue
./scripts/queue-cli.sh watch

Supported Printers:
- Bambu Labs H2D (MQTT)
- Elegoo OrangeStorm Giga (Klipper)
- Snapmaker Artisan (UDP)
- Any OctoPrint/Moonraker instance
Split oversized 3D models into printable parts (supports 3MF and STL):
# Via CLI
kitty-cli shell
> /split /path/to/large_model.3mf
# Via API
curl -X POST http://localhost:8300/api/segmentation/segment \
-H "Content-Type: application/json" \
-d '{"mesh_path": "/path/to/model.3mf", "printer_id": "bamboo_h2d"}'Features:
- 3MF native: Prefers 3MF input/output for slicer compatibility (STL also supported)
- Automatic splitting: Detects oversized models and splits into printer-fit parts
- SDF hollowing: Reduce material usage with configurable wall thickness
- Alignment joints: Dowel pin holes for accurate part assembly
- 3MF assembly output: Single file with all parts, colors, and metadata
- Configurable printers: Load build volumes from `printer_config.yaml`
Segmentation Options:
| Option | Description | Default |
|---|---|---|
| `printer_id` | Target printer for build volume | auto-detect |
| `enable_hollowing` | Enable SDF-based hollowing | true |
| `wall_thickness_mm` | Wall thickness for hollowing (mm) | 2.0 |
| `joint_type` | Joint type: dowel, dovetail, pyramid, none | dowel |
| `max_parts` | Maximum parts to generate | 10 |
Interfaces:
- CLI: `/split` command in kitty-cli shell
- Voice: "Split this model for printing"
- Web UI: MeshSegmenter component in Fabrication Console
- API: `POST /api/segmentation/segment` on the Fabrication service (fuller example below)
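A sketch of an API call with the segmentation options spelled out (field names follow the options table above; values are illustrative):

```python
# Sketch: segmentation request with explicit options.
import requests

resp = requests.post(
    "http://localhost:8300/api/segmentation/segment",
    json={
        "mesh_path": "/path/to/model.3mf",
        "printer_id": "bamboo_h2d",
        "enable_hollowing": True,
        "wall_thickness_mm": 2.0,
        "joint_type": "dowel",
        "max_parts": 10,
    },
    timeout=600,  # segmenting large meshes can take minutes
)
resp.raise_for_status()
print(resp.json())
```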
┌──────────────────────────────────────────────────────────────────────────────┐
│ Mac Studio M3 Ultra Host │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ Local AI Inference Layer │ │
│ │ │ │
│ │ ┌────────────────────────────────────────────────────────────────────┐ │ │
│ │ │ Ollama (Primary Reasoner) :11434 │ │ │
│ │ │ GPT-OSS 120B (Thinking Mode) │ │ │
│ │ └────────────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌────────────────────────────────────────────────────────────────────┐ │ │
│ │ │ llama.cpp Servers (Metal GPU) │ │ │
│ │ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ │
│ │ │ │Q4 Tool │ │F16 DEPR │ │Vision │ │Summary │ │Coder │ │ │ │
│ │ │ │:8083 │ │:8082 │ │:8086 │ │:8084 │ │:8087 │ │ │ │
│ │ │ │Athene V2 │ │(Fallback)│ │Gemma 27B │ │Hermes 8B │ │Devstral 2│ │ │ │
│ │ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │ │
│ │ └────────────────────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ Docker Compose Services │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────────────┐│ │
│ │ │ HAProxy :8080 (Load Balancer) ││ │
│ │ └───────────────────────────────┬─────────────────────────────────────┘│ │
│ │ │ │ │
│ │ ┌───────────────────────────────▼─────────────────────────────────────┐│ │
│ │ │ Gateway (x3 replicas) ││ │
│ │ │ REST API, Auth, Routing, Proxy ││ │
│ │ └───────────────────────────────┬─────────────────────────────────────┘│ │
│ │ │ │ │
│ │ ┌───────────────────────────────▼─────────────────────────────────────┐│ │
│ │ │ Brain :8000 ││ │
│ │ │ Orchestrator • ReAct Agent • Research Pipeline ││ │
│ │ └──┬────────┬────────┬────────┬────────┬────────┬────────┬───────────┘│ │
│ │ │ │ │ │ │ │ │ │ │
│ │ ┌──▼──┐ ┌──▼──┐ ┌──▼──┐ ┌──▼──┐ ┌──▼──┐ ┌──▼──┐ ┌──▼──┐ │ │
│ │ │ CAD │ │ Fab │ │Voice│ │Disc │ │Brok │ │Imgs │ │Mem0 │ │ │
│ │ │:8200│ │:8300│ │:8400│ │:8500│ │:8777│ │:8600│ │:8765│ │ │
│ │ └─────┘ └─────┘ └─────┘ └─────┘ └─────┘ └─────┘ └─────┘ │ │
│ │ │ │
│ │ ┌───────────────────────────────────────────────────────────────────┐│ │
│ │ │ Storage & Infrastructure ││ │
│ │ │ PostgreSQL │ Redis │ Qdrant │ MinIO │ RabbitMQ │ Mosquitto ││ │
│ │ └───────────────────────────────────────────────────────────────────┘│ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ Web UI :4173 │ │
│ │ Menu │ Voice │ Shell │ Projects │ Fab │ Research │ Settings │ ... │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
│ │ │
┌──────────▼──────────┐ ┌──────────▼──────────┐ ┌────▼────┐
│ Home Assistant │ │ 3D Printers │ │ Cameras │
│ (MQTT + REST) │ │ Bambu │ Elegoo │ │ (Pi) │
│ Lights, Climate, │ │ Snapmaker │ Others │ │ │
│ Sensors, Locks │ │ (OctoPrint/Klipper)│ │ │
└─────────────────────┘ └─────────────────────┘ └─────────┘
# User & Safety
USER_NAME=YourName
KITTY_USER_NAME=YourName
HAZARD_CONFIRMATION_PHRASE="Confirm: proceed"
API_OVERRIDE_PASSWORD=omega
# Budget
BUDGET_PER_TASK_USD=0.50
CONFIDENCE_THRESHOLD=0.80

# Ollama (Primary Reasoner)
LOCAL_REASONER_PROVIDER=ollama
OLLAMA_HOST=http://host.docker.internal:11434
OLLAMA_MODEL=gpt-oss:120b
OLLAMA_THINK=medium
# llama.cpp Q4 (Tool Orchestrator)
LLAMACPP_Q4_HOST=http://host.docker.internal:8083
LLAMACPP_Q4_MODEL=athene-v2-agent/Athene-V2-Agent-Q4_K_M.gguf
LLAMACPP_Q4_PORT=8083
# DEPRECATED: llama.cpp F16 (Legacy Fallback - only used when LOCAL_REASONER_PROVIDER=llamacpp)
# LLAMACPP_F16_HOST=http://host.docker.internal:8082
# LLAMACPP_F16_MODEL=llama-3-70b/Llama-3.3-70B-Instruct-F16/...gguf
# LLAMACPP_F16_PORT=8082
# Vision
LLAMACPP_VISION_MODEL=gemma-3-27b-it-GGUF/gemma-3-27b-it-q4_k_m.gguf
LLAMACPP_VISION_MMPROJ=gemma3_27b_mmproj/mmproj-model-f16.gguf
LLAMACPP_VISION_PORT=8086

KITTY uses embedding-based semantic search to intelligently select relevant tools for each query, reducing context usage by ~90% when many tools are available.
How it works:
- Tool definitions are converted to text embeddings using `all-MiniLM-L6-v2` (384 dimensions)
- Embeddings are cached in Redis for cluster-wide sharing
- For each query, cosine similarity finds the most relevant tools
- Only top-k matching tools are passed to the model (instead of all 50+)
Benefits:
- Context savings: ~90% reduction (e.g., 600 tokens vs 7,500 for 50 tools)
- Better tool selection: Semantic matching beats keyword heuristics
- Cluster-ready: Redis caching shares embeddings across nodes
- Fast: ~10-15ms per search after initial model load
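A sketch of the selection step described above (assumes the `sentence-transformers` package; the real implementation also caches embeddings in Redis):

```python
# Sketch: embedding-based tool selection via cosine similarity.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings

def select_tools(query: str, tools: dict[str, str],
                 top_k: int = 5, threshold: float = 0.3) -> list[str]:
    """Pick the tools whose descriptions best match the query."""
    names = list(tools)
    tool_vecs = model.encode([tools[n] for n in names], normalize_embeddings=True)
    query_vec = model.encode([query], normalize_embeddings=True)[0]
    scores = tool_vecs @ query_vec  # cosine similarity on unit vectors
    ranked = sorted(zip(names, scores), key=lambda pair: -pair[1])
    return [name for name, score in ranked[:top_k] if score >= threshold]
```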
# Semantic tool selection (default: enabled)
USE_SEMANTIC_TOOL_SELECTION=true
EMBEDDING_MODEL=all-MiniLM-L6-v2
TOOL_SEARCH_TOP_K=5
TOOL_SEARCH_THRESHOLD=0.3

Disabling: Set `USE_SEMANTIC_TOOL_SELECTION=false` to fall back to keyword-based selection.
KITTY supports parallel multi-agent orchestration for complex, multi-step goals. When enabled, complex queries are decomposed into parallelizable tasks executed concurrently across multiple specialized agents.
Architecture:
User Goal → Decompose (Q4) → Dependency Graph → Parallel Execute → Synthesize (GPTOSS)
│
┌───────────────┼───────────────┐
▼ ▼ ▼
[Task 1] [Task 2] [Task 3]
researcher cad_designer fabricator
│ │ │
└───────────────┼───────────────┘
▼
Final Response
Specialized Agents:
| Agent | Primary Model | Purpose | Tool Allowlist |
|---|---|---|---|
| researcher | Q4 (Athene V2) | Web research, information gathering | web_search, fetch_webpage |
| reasoner | GPTOSS 120B | Deep analysis, chain-of-thought | (none - pure reasoning) |
| cad_designer | Q4 (Athene V2) | 3D model generation | generate_cad_model, image_search |
| fabricator | Q4 (Athene V2) | Print preparation, segmentation | fabrication.* |
| coder | Devstral 2 123B | Code generation, analysis | (all code tools) |
| vision_analyst | Gemma 27B | Image understanding | vision.*, camera.* |
| analyst | Q4 (Athene V2) | Memory search, data analysis | memory.* |
| summarizer | Hermes 8B | Response compression | (none - compression only) |
Slot Allocation (20 concurrent slots):
| Endpoint | Port | Model | Slots | Context |
|---|---|---|---|---|
| Q4 | 8083 | Athene V2 Q4 | 6 | 128K |
| GPTOSS | 11434 | GPT-OSS 120B | 2 | 65K |
| Vision | 8086 | Gemma 27B | 4 | 4K |
| Coder | 8087 | Devstral 2 123B | 2 | 16K |
| Summary | 8084 | Hermes 8B | 4 | 4K |
Configuration:
# Enable parallel agent orchestration
ENABLE_PARALLEL_AGENTS=false # Master enable flag (disabled by default)
PARALLEL_AGENT_ROLLOUT_PERCENT=0 # Gradual rollout (0-100%)
PARALLEL_AGENT_MAX_TASKS=6 # Max tasks per execution
PARALLEL_AGENT_MAX_CONCURRENT=8 # Max concurrent slot usage
PARALLEL_AGENT_COMPLEXITY_THRESHOLD=0.6 # Query complexity threshold (0.0-1.0)
# Coder Server (Devstral 2 123B via llama.cpp)
LLAMACPP_CODER_ENABLED=true
LLAMACPP_CODER_MODEL=/path/to/devstral2/Q5_K_M/mistralai_Devstral-2-123B-Instruct-2512-Q5_K_M-00001-of-00003.gguf
LLAMACPP_CODER_HOST=http://localhost:8087
LLAMACPP_CODER_PORT=8087
LLAMACPP_CODER_CTX=16384
LLAMACPP_CODER_PARALLEL=2

Performance Benefits:
| Scenario | Sequential | Parallel | Improvement |
|---|---|---|---|
| 3-task research | ~45s | ~15s | 3x faster |
| 5-task CAD+fab | ~90s | ~25s | 3.6x faster |
| GPU utilization | ~15% | ~60% | 4x better |
How it works:
- Complexity Detection: Queries are scored for complexity (keywords, length, multiple questions)
- Task Decomposition: Q4 model breaks goal into independent parallelizable tasks with dependencies
- Slot Acquisition: Tasks acquire slots with exponential backoff and fallback tiers
- Parallel Execution: Independent tasks run concurrently via asyncio.gather()
- Synthesis: GPTOSS 120B aggregates all task results into final response
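A dependency-aware execution sketch (the task shape and `run_task` are hypothetical; decomposition and slot management live in the Brain):

```python
# Sketch: run independent tasks concurrently, respecting dependencies.
import asyncio

async def run_task(task: dict) -> str:
    await asyncio.sleep(0.1)  # stub: the agent/model call goes here
    return f"result of {task['id']}"

async def execute(tasks: list[dict]) -> dict[str, str]:
    results: dict[str, str] = {}
    remaining = {t["id"]: t for t in tasks}
    while remaining:
        # Tasks whose dependencies are all satisfied can run in parallel.
        ready = [t for t in remaining.values()
                 if all(dep in results for dep in t["deps"])]
        if not ready:
            raise RuntimeError("dependency cycle in task graph")
        outputs = await asyncio.gather(*(run_task(t) for t in ready))
        for task, out in zip(ready, outputs):
            results[task["id"]] = out
            del remaining[task["id"]]
    return results

# Tasks 1 and 2 run concurrently; task 3 (synthesis) waits for both.
tasks = [{"id": "t1", "deps": []}, {"id": "t2", "deps": []},
         {"id": "t3", "deps": ["t1", "t2"]}]
print(asyncio.run(execute(tasks)))
```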
Enabling:
# In .env
ENABLE_PARALLEL_AGENTS=true
PARALLEL_AGENT_ROLLOUT_PERCENT=100
# Restart llama.cpp servers to pick up increased Q4 slots
./ops/scripts/llama/restart.sh

# Voice service
VOICE_BASE_URL=http://localhost:8400
VOICE_PREFER_LOCAL=true
VOICE_DEFAULT_VOICE=alloy
VOICE_SAMPLE_RATE=16000
# Local STT (Whisper)
WHISPER_MODEL=base.en
WHISPER_MODEL_PATH=
# Wake Word Detection (Porcupine)
PORCUPINE_ACCESS_KEY=your-key # Get free key from console.picovoice.ai
WAKE_WORD_ENABLED=true
WAKE_WORD_MODEL_PATH=~/.local/share/kitty/models/Hey-Kitty_en_mac_v4_0_0.ppn
WAKE_WORD_SENSITIVITY=0.5
# Local TTS Provider
LOCAL_TTS_PROVIDER=kokoro # kokoro (primary) or piper (fallback)
# Kokoro TTS (Primary)
KOKORO_ENABLED=true
KOKORO_MODEL_PATH=~/.local/share/kitty/models/kokoro-v1.0.onnx
KOKORO_VOICES_PATH=~/.local/share/kitty/models/voices-v1.0.bin
# Piper TTS (Fallback)
PIPER_MODEL_DIR=/Users/Shared/Coding/models/Piper
# Cloud TTS (Final Fallback)
OPENAI_TTS_MODEL=tts-1

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
PERPLEXITY_API_KEY=pplx-...
GOOGLE_API_KEY=AIza... # For Gemini models
ZOO_API_KEY=your-zoo-key
TRIPO_API_KEY=your-tripo-key

The Shell page and Collective Intelligence system support direct cloud model selection. When an API key is present, the corresponding cloud model becomes available in the model selector.
Supported Cloud Models (December 2025):
| UI ID | Provider | Model | Cost (per query) |
|---|---|---|---|
| `gpt5` | OpenAI | GPT-5.2 | ~$0.01-0.06 |
| `claude` | Anthropic | Claude Sonnet 4.5 | ~$0.01-0.08 |
| `perplexity` | Perplexity | Sonar (web-connected) | ~$0.001-0.005 |
| `gemini` | Google | Gemini 2.5 Flash | ~$0.0075-0.03 |
Features:
- Models auto-enable when API key is detected
- Fallback to local Q4 if cloud provider fails
- Cost tracking in response metadata
- Streaming support (non-blocking for cloud)
# Home Assistant
HOME_ASSISTANT_TOKEN=your-long-lived-token
# Bambu Labs (via Settings page or .env)
# Configure through http://localhost:4173/?view=settings
# Database
DATABASE_URL=postgresql://kitty:changeme@postgres:5432/kitty
REDIS_URL=redis://127.0.0.1:6379/0

Local STT not available:
# Check Whisper model exists
ls ~/.cache/whisper/ggml-base.en.bin
# Download if missing
pip install whispercpp
# Model downloads automatically on first use

Kokoro TTS not available:
# Check Kokoro model files exist
ls ~/.local/share/kitty/models/kokoro-v1.0.onnx
ls ~/.local/share/kitty/models/voices-v1.0.bin
ls ~/.local/share/kitty/models/voices/am_michael.bin
# Copy from HowdyTTS if missing
cp /Users/Shared/Coding/HowdyTTS/models/kokoro-v1.0.onnx ~/.local/share/kitty/models/
cp /Users/Shared/Coding/HowdyTTS/models/voices-v1.0.bin ~/.local/share/kitty/models/
mkdir -p ~/.local/share/kitty/models/voices
cp /Users/Shared/Coding/HowdyTTS/models/voices/*.bin ~/.local/share/kitty/models/voices/
# Install dependencies
cd services/voice && pip install -e ".[local]"

Piper TTS not available (fallback):
# Check Piper models exist
ls /Users/Shared/Coding/models/Piper/*.onnx
# Download from: https://github.com/rhasspy/piper/releases
# Copy en_US-amy-medium.onnx and en_US-ryan-medium.onnx

Wake word not working:
# Check Porcupine model exists
ls ~/.local/share/kitty/models/Hey-Kitty_en_mac_v4_0_0.ppn
# Verify PORCUPINE_ACCESS_KEY is set in .env
grep PORCUPINE_ACCESS_KEY .env
# Get free access key from: https://console.picovoice.ai
# Ensure pyaudio is installed
pip install pyaudio

Check voice status:
curl http://localhost:8080/api/voice/status | jq .
# Should show:
# stt.local_available: true
# tts.local_available: true
# tts.active_provider: "kokoro" (or "piper" fallback)
# capabilities.wake_word: true (if configured)

# Check Docker
docker ps
# View service logs
docker compose -f infra/compose/docker-compose.yml logs brain
tail -f .logs/llamacpp-q4.log
# Restart specific service
docker compose -f infra/compose/docker-compose.yml restart gateway

# Rebuild and restart UI
cd services/ui
npm run build
npm run preview
# Check gateway proxy
curl http://localhost:8080/api/voice/status

| Phase | Status | Description |
|---|---|---|
| Phase 1: Core Foundation | ✅ Complete | Docker, llama.cpp, FastAPI, Home Assistant |
| Phase 2: Tool-Aware Agent | ✅ Complete | ReAct agent, MCP protocol, CAD generation |
| Phase 3: Autonomous Learning | ✅ Complete | Goal identification, research pipeline |
| Phase 3.5: Research Pipeline | ✅ Complete | 5-phase autonomous research, multi-model |
| Phase 4: Fabrication Intelligence | 🚧 90% | Voice service, dashboards, ML in progress |
| Phase 5: Safety & Access | 📋 Planned | UniFi Access, zone presence |
- Voice Service: Local Whisper STT + Kokoro/Piper TTS with cloud fallback
- Wake Word Detection: Hands-free activation with Porcupine ("Hey Kitty")
- Kokoro TTS: High-quality Apple Silicon-optimized local speech synthesis
- Menu Landing Page: Card-based navigation to all sections
- Bambu Labs Integration: Login/status via Settings page
- Gateway Voice Proxy: Full voice API proxying through gateway
- Markdown Support: UI renders formatted responses, TTS speaks clean text
- Collective Intelligence: Multi-agent deliberation for better decisions
- Print Intelligence: Success prediction and recommendations dashboard
| Metric | Value |
|---|---|
| Services | 12 FastAPI microservices |
| Docker Containers | 20+ (including infrastructure) |
| Local AI Models | 6 (GPT-OSS 120B, Q4, F16, Vision, Summary, Coder) |
| Voice Models | 5 (Whisper, Kokoro ONNX, Piper amy/ryan, Porcupine wake word) |
| Cloud Providers | 4 (OpenAI, Anthropic, Perplexity, Google) |
| UI Pages | 16 (Menu, Voice, Shell, Projects, etc.) |
| Supported Printers | Bambu Labs, Elegoo (Klipper), Snapmaker, OctoPrint |
| Lines of Python | 55,000+ |
| Lines of TypeScript | 18,000+ |
We welcome contributions! See CONTRIBUTING.md for guidelines.
# Development setup
pip install -e ".[dev]"
pre-commit install
# Run tests
pytest tests/ -v
# Linting
ruff check services/ --fix
ruff format services/

MIT License - see LICENSE for details.
KITTY stands on the shoulders of giants:
- llama.cpp: Local LLM inference
- Whisper.cpp: Local speech recognition
- Kokoro: High-quality local TTS (ONNX)
- Piper: Fast local text-to-speech
- Picovoice Porcupine: Wake word detection
- FastAPI: Python web framework
- Home Assistant: Smart home integration
- Zoo: Parametric CAD API
- Qdrant: Vector database
Built with care for makers, by makers
KITTY: Because your workshop deserves an AI assistant that actually understands "turn that thing on over there"
🚀 Powered by Mac Studio M3 Ultra, llama.cpp, and a whole lot of caffeine