🚧 Currently In Development: Research Release
Sanctum Cochlea is a fork of the letta-voice experiment: an audio ingest system for Sanctum and Letta installations.
This repository demonstrates how to use Letta and LiveKit to create low-latency voice agents with memory, tool execution, and persistence. This is a research implementation based on the original Letta Voice architecture, designed to be compatible with:
- Direct Letta Integration - Standard Letta Cloud or self-hosted instances
- Broca Middleware - Advanced routing and caching layer
- Thalamus Refinement - Transcript cleaning and structuring pipeline
See Audio Pipeline Architecture for detailed information about the multi-layer approach.
For immediate setup instructions, see our Quick Reference Card.
This project includes detailed documentation to help you get started:
- Documentation Index - Complete overview and navigation
- Basic Setup Guide - Initial installation and configuration
- API Endpoints Setup - LiveKit, Deepgram, Cartesia, and Sanctum setup
- VPS Connection Guide - Connect to self-hosted Sanctum instances
- Environment Configuration - All environment variables and options
- Troubleshooting Guide - Common issues and solutions
Creates low-latency voice agents using:
- Sanctum Instance - Your self-hosted AI agent platform (supports OpenAI, Anthropic, Ollama, and other LLM providers)
- LiveKit - Real-time voice communication
- Deepgram - Speech-to-text conversion
- Cartesia - Text-to-speech conversion
```
User Voice → LiveKit → Sanctum Cochlea Agent → Letta Instance → AI Models
                ↕
    Speech Processing (STT/TTS)
```
Sanctum Cochlea is designed to integrate with the broader SanctumOS modular architecture, functioning as a specialized sensory processing module within a larger agentic operating system.
SanctumOS is a modular, self-hosted agentic operating system designed to run fully autonomous, context-rich AI agents. It emphasizes:
- Modularity: Clean, plugin-driven integration of specialized components
- Privacy: Complete self-hosting with no vendor lock-in
- Autonomy: Fully autonomous agents with infinite memory management
- Real-time Processing: Sensory filtering and event processing capabilities
Within the SanctumOS ecosystem, Cochlea serves as the sensory input module, responsible for:
- Audio Capture: Real-time audio stream processing via LiveKit
- Speech-to-Text: Converting audio to text using Deepgram
- Sensory Filtering: Preprocessing and filtering audio data
- Data Preparation: Formatting sensory input for downstream processing
The target architecture implements a sophisticated multi-layer pipeline:
```
Cochlea → Thalamus → Broca → Prime Agent → Voicebox
```
Layer Breakdown:
- Cochlea (Audio Input): Raw audio capture and STT conversion
- Thalamus (Refinement): Clean, structure, and refine transcripts
- Broca (Routing): Route events with proper queue management
- Prime Agent (Processing): Natural language processing via Letta
- Voicebox (Audio Output): High-quality TTS conversion
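Conceptually, the five layers above compose like a chain of stage functions. The sketch below is purely illustrative: the function names mirror the layer names, but the real components are separate services, not in-process functions.

```python
# Illustrative sketch of the multi-layer pipeline. Each stage transforms
# its input and hands the result to the next layer.

def cochlea(raw_audio_transcript: str) -> str:
    """Audio input: stands in for Deepgram STT output."""
    return raw_audio_transcript.strip()

def thalamus(transcript: str) -> str:
    """Refinement: clean and structure the transcript."""
    return " ".join(transcript.split())  # collapse stray whitespace

def broca(message: str) -> dict:
    """Routing: wrap the message as a queued event."""
    return {"type": "user_message", "text": message}

def prime_agent(event: dict) -> str:
    """Processing: placeholder for the Letta agent's response."""
    return f"Agent received: {event['text']}"

def run_pipeline(raw: str) -> str:
    """Compose the layers in order; Voicebox (TTS) would speak the result."""
    return prime_agent(broca(thalamus(cochlea(raw))))

print(run_pipeline("  hello   world  "))  # → Agent received: hello world
```

In the real system each stage runs independently, which is what enables the fault isolation and independent scaling described later in this document.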
This architecture mirrors the human auditory system:
- Cochlea: Transduces sound waves into neural signals (audio → text)
- Thalamus: Relays and refines sensory information (transcript processing)
- Broca: Language production and routing (message management)
- Cortex: Higher-level processing (AI reasoning)
For detailed technical specifications, see Audio Pipeline Architecture
- Python 3.10+
- Working Sanctum Instance - Self-hosted AI agent platform (configured with your preferred LLM provider)
- API Accounts - LiveKit, Deepgram, and Cartesia
Detailed API setup instructions: See API Endpoints Setup Guide
```
git clone git@github.com:your-org/sanctum-cochlea.git
cd sanctum-cochlea
pip install -r requirements.txt
```

Create a `.env` file:
```
# Sanctum Instance Configuration
LETTA_API_KEY=your_sanctum_api_key
LETTA_BASE_URL=http://YOUR_SANCTUM_IP:8283/v1

# LiveKit Configuration
LIVEKIT_URL=wss://<YOUR-ROOM>.livekit.cloud
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret

# Speech Services
DEEPGRAM_API_KEY=your_deepgram_api_key
CARTESIA_API_KEY=your_cartesia_api_key
```

Then start the agent:

```
python main.py dev
```

Sanctum Cochlea requires a working Sanctum instance (self-hosted AI agent platform) configured with your preferred LLM provider:
- OpenAI - GPT-4, GPT-3.5, etc.
- Anthropic - Claude Sonnet, Claude Haiku, etc.
- Ollama - Local models like Llama, Mistral, etc.
- Other Providers - Any LLM provider supported by Sanctum
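Before launching, a quick sanity check that the required variables from the `.env` file above are actually set can save debugging time. A minimal sketch (the variable list matches the sample `.env`; `check_env` is a hypothetical helper, not part of the project):

```python
import os

# Required variables from the sample .env above.
REQUIRED_VARS = [
    "LETTA_API_KEY", "LETTA_BASE_URL",
    "LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET",
    "DEEPGRAM_API_KEY", "CARTESIA_API_KEY",
]

def check_env() -> list[str]:
    """Return the names of required variables that are missing or empty."""
    return [name for name in REQUIRED_VARS if not os.environ.get(name)]

if __name__ == "__main__":
    missing = check_env()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```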
Bootstrap Installer (Recommended): Use the sanctumos/installer repository for automated setup.
- Set `LETTA_BASE_URL=http://YOUR_SANCTUM_IP:8283/v1` in your `.env`
- See VPS Connection Guide for detailed setup
This implementation includes a custom VPSLettaLLM class that bypasses LiveKit's openai.LLM.with_letta() method, which is hardcoded to use the /voice-beta endpoint that only exists in Letta Cloud.
Why This Was Necessary:
- LiveKit's `openai.LLM.with_letta()` only works with Letta Cloud
- Self-hosted Letta instances use standard `/messages` endpoints
- The `/voice-beta` endpoint doesn't exist on VPS installations
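In practice, the fix boils down to targeting Letta's standard messages endpoint instead of the cloud-only one. A hedged sketch of the URL construction (the path follows the description above; `letta_messages_url` is an illustrative helper, not the project's actual function):

```python
def letta_messages_url(base_url: str, agent_id: str) -> str:
    """Build the standard self-hosted messages endpoint.

    The cloud-only /voice-beta path is deliberately avoided, since it
    does not exist on self-hosted Letta installations.
    """
    return f"{base_url.rstrip('/')}/agents/{agent_id}/messages"

print(letta_messages_url("http://YOUR_SANCTUM_IP:8283/v1", "agent-xxxxxxx"))
# → http://YOUR_SANCTUM_IP:8283/v1/agents/agent-xxxxxxx/messages
```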
How It Works:
```python
class VPSLettaLLM(LLM):
    def __init__(self, agent_id: str, base_url: str, api_key: str):
        # Direct integration with Letta's standard message API
        # Bypasses LiveKit's cloud-only voice-beta endpoint
        ...
```

Benefits:
- ✅ Works with any Letta instance (cloud or self-hosted)
- ✅ Uses standard Letta message API endpoints
- ✅ Maintains full compatibility with LiveKit's async context manager protocol
- ✅ Provides better error handling and debugging capabilities
Current Implementation:
- Direct `requests.post` calls to `/agents/{agent_id}/messages`
- Proper `ChatContext` to `letta_messages` conversion
- Async context manager compatibility for LiveKit integration
- Comprehensive error handling and retry logic
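The `ChatContext` to `letta_messages` conversion mentioned above can be illustrated with a self-contained sketch. The types and field names here (`ChatMessage`, `role`, `content`) are simplified stand-ins for the actual LiveKit and Letta structures, and the filtering logic assumes Letta keeps prior turns in the agent's own memory, so only new user input is forwarded:

```python
from dataclasses import dataclass

@dataclass
class ChatMessage:
    """Simplified stand-in for a message in a LiveKit ChatContext."""
    role: str      # "user", "assistant", or "system"
    content: str

def to_letta_messages(chat_messages: list[ChatMessage]) -> list[dict]:
    """Convert chat history into message dicts for /agents/{agent_id}/messages.

    Assumption: the stateful Letta agent already remembers its own prior
    turns, so only user messages need forwarding.
    """
    return [
        {"role": m.role, "content": m.content}
        for m in chat_messages
        if m.role == "user"
    ]

ctx = [ChatMessage("assistant", "Hi!"), ChatMessage("user", "What's the weather?")]
print(to_letta_messages(ctx))
# → [{'role': 'user', 'content': "What's the weather?"}]
```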
To run your own Letta instance:
```
docker run \
  -v ~/.letta/.persist/pgdata:/var/lib/postgresql/data \
  -p 8283:8283 \
  -e OPENAI_API_KEY=${OPENAI_API_KEY} \
  letta/letta:latest
```

If your Letta server isn't exposed to the public internet, you can use ngrok:
- Install ngrok
- Add your authtoken: `ngrok config add-authtoken <YOUR-AUTHTOKEN>`
- Ensure Letta is running at `http://localhost:8283`
- Set the base URL to your ngrok URL: `export LETTA_BASE_URL=https://xxxx.ngrok.app`
- Letta server running with its IP set in `LETTA_BASE_URL`
- Agent created in Letta (via ADE or REST API)
- Agent ID set in environment: `export LETTA_AGENT_ID=agent-xxxxxxx`
```
python main.py dev
```

- Go to LiveKit Agents Playground
- Connect to your room
- Chat with your agent
The project is now fully configurable through environment variables without code changes:
- Agent customization: Model, embedding, sleep-time settings
- Connection options: Cloud vs self-hosted
- Performance tuning: Buffer sizes, memory management
See Environment Configuration for all available options.
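One common pattern for this kind of no-code-change configuration is a small settings object populated from the environment with safe defaults. The variable names below (e.g. `COCHLEA_BUFFER_SIZE`, `COCHLEA_SLEEP_TIME`) are illustrative assumptions, not the project's actual option names; see the Environment Configuration guide for those.

```python
import os
from dataclasses import dataclass

@dataclass
class Settings:
    base_url: str
    buffer_size: int        # hypothetical performance-tuning option
    sleep_time_enabled: bool  # hypothetical agent-customization option

def load_settings(env=os.environ) -> Settings:
    """Read configuration from environment variables, with defaults."""
    return Settings(
        base_url=env.get("LETTA_BASE_URL", "http://localhost:8283/v1"),
        buffer_size=int(env.get("COCHLEA_BUFFER_SIZE", "4096")),
        sleep_time_enabled=env.get("COCHLEA_SLEEP_TIME", "true").lower() == "true",
    )

print(load_settings({}))
```

Because every option has a default, the agent still starts with an empty environment, and operators override only what they need.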
- Modular Architecture: Clean separation of concerns with pluggable components
- Privacy-First: Complete self-hosting with no external dependencies
- Scalability: Independent scaling of sensory, processing, and routing layers
- Fault Tolerance: Isolated failures don't cascade through the system
- Real-time Processing: Optimized for low-latency voice interactions
- End-to-End Latency: <2 seconds from speech to response
- Voice Detection: >95% sentence completion rate
- Error Handling: <1% message processing failures
- Audio Quality: Natural, interruption-free conversations
Detailed performance metrics and optimization strategies will be documented as the pipeline evolves.
Demo and monitoring information for agent interactions will be added here.
- Start with: Quick Reference Card
- Setup issues: Basic Setup Guide
- VPS connection: VPS Connection Guide
- Configuration: Environment Configuration
- Problems: Troubleshooting Guide
- Status: In Development
- Goal: Full integration with Broca middleware for advanced routing and caching
- Features:
- Message deduplication and queue management
- User context and core block management
- Platform-specific response routing
- Retry logic and error handling
- Status: Research Phase
- Goal: Extensive testing and integration of Thalamus refinement pipeline
- Features:
- Real-time transcript cleaning and structuring
- Speaker-aware segment grouping
- Sentence completion logic
- Noise filtering and TTS feedback prevention
- Session-based state management
- Status: Planning Phase
- Goal: Full integration with SanctumOS modular architecture
- Features:
- Complete Cochlea-Thalamus-Broca pipeline implementation
- SanctumOS plugin architecture compatibility
- Modular component deployment and management
- Cross-platform sensory processing capabilities
- Real-time event processing and memory management
- Status: Pending
- Goal: Complete technical documentation and research findings
- Deliverables:
- Comprehensive architecture documentation
- Performance benchmarks and analysis
- Integration guides for all supported platforms
- Best practices and implementation recommendations
- Letta Documentation: docs.letta.com
- LiveKit Documentation: docs.livekit.io
- Deepgram Documentation: developers.deepgram.com
- Cartesia Documentation: docs.cartesia.ai
- SanctumOS Overview: sanctumos.org - Modular agentic operating system
- Sanctum Installer: sanctumos/installer - Bootstrap installer for easy setup
- Thalamus Project: Real-time event processing and sensory refinement
- Broca Middleware: Message routing and queue management
- Audio Pipeline Architecture: docs/architecture/AUDIO_PIPELINE_ARCHITECTURE.md - Detailed technical specifications
Code: This project is licensed under the GNU Affero General Public License v3.0 (AGPLv3). See LICENSE for details.
Documentation: All non-code content (documentation, README files, etc.) is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License (CC-BY-SA 4.0). See LICENSE-DOCS for details.
Ready to get started? Begin with our Quick Reference Card or dive into the Complete Documentation.