Research Release: Audio input compatible with standard Thalamus, Broca, or Letta inputs within Sanctum.

Sanctum Cochlea

🚧 Currently In Development: Research Release

Sanctum Cochlea is a fork of the letta-voice experiment: an audio ingest system for Sanctum and Letta installations.

This repository demonstrates how to use Letta and LiveKit to create low-latency voice agents with memory, tool execution, and persistence. This is a research implementation based on the original Letta Voice architecture, designed to be compatible with:

  • Direct Letta Integration - Standard Letta Cloud or self-hosted instances
  • Broca Middleware - Advanced routing and caching layer
  • Thalamus Refinement - Transcript cleaning and structuring pipeline

See Audio Pipeline Architecture for detailed information about the multi-layer approach.

🚀 Quick Start

For immediate setup instructions, see our Quick Reference Card.

📚 Comprehensive Documentation

This project includes detailed documentation to help you get started; the guides are linked throughout this README.

🎯 What This Project Does

Creates low-latency voice agents using:

  • Sanctum Instance - Your self-hosted AI agent platform (supports OpenAI, Anthropic, Ollama, and other LLM providers)
  • LiveKit - Real-time voice communication
  • Deepgram - Speech-to-text conversion
  • Cartesia - Text-to-speech conversion

๐Ÿ—๏ธ Architecture

Current Implementation

User Voice → LiveKit → Sanctum Cochlea Agent → Letta Instance → AI Models
                ↓
            Speech Processing (STT/TTS)

Target Architecture: SanctumOS Integration

Sanctum Cochlea is designed to integrate with the broader SanctumOS modular architecture, functioning as a specialized sensory processing module within a larger agentic operating system.

SanctumOS Modular Framework

SanctumOS is a modular, self-hosted agentic operating system designed to run fully autonomous, context-rich AI agents. It emphasizes:

  • Modularity: Clean, plugin-driven integration of specialized components
  • Privacy: Complete self-hosting with no vendor lock-in
  • Autonomy: Fully autonomous agents with infinite memory management
  • Real-time Processing: Sensory filtering and event processing capabilities

Cochlea's Role in SanctumOS

Within the SanctumOS ecosystem, Cochlea serves as the sensory input module, responsible for:

  • Audio Capture: Real-time audio stream processing via LiveKit
  • Speech-to-Text: Converting audio to text using Deepgram
  • Sensory Filtering: Preprocessing and filtering audio data
  • Data Preparation: Formatting sensory input for downstream processing

Multi-Layer Audio Pipeline

The target architecture implements a sophisticated multi-layer pipeline:

Cochlea → Thalamus → Broca → Prime Agent → Voicebox

Layer Breakdown:

  1. Cochlea (Audio Input): Raw audio capture and STT conversion
  2. Thalamus (Refinement): Clean, structure, and refine transcripts
  3. Broca (Routing): Route events with proper queue management
  4. Prime Agent (Processing): Natural language processing via Letta
  5. Voicebox (Audio Output): High-quality TTS conversion
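
The five layers above amount to function composition: each stage consumes the previous stage's output. A toy sketch of that chaining (the stage functions here are placeholders, not the real module interfaces):

```python
from typing import Callable

Stage = Callable[[str], str]

def pipeline(*stages: Stage) -> Stage:
    """Compose stages left to right: each stage's output feeds the next."""
    def run(payload: str) -> str:
        for stage in stages:
            payload = stage(payload)
        return payload
    return run

def cochlea(audio: str) -> str:
    # STT stand-in: audio → raw transcript
    return f"transcript:{audio}"

def thalamus(transcript: str) -> str:
    # Refinement stand-in: clean up whitespace
    return transcript.strip()

def broca(event: str) -> str:
    # Routing stand-in: tag the event for the prime agent
    return f"route[prime-agent]:{event}"

run = pipeline(cochlea, thalamus, broca)
```

In the real system each stage is a separate process communicating over queues rather than an in-process function, but the data flow is the same.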

Biological Inspiration

This architecture mirrors the human auditory system:

  • Cochlea: Transduces sound waves into neural signals (audio → text)
  • Thalamus: Relays and refines sensory information (transcript processing)
  • Broca: Language production and routing (message management)
  • Cortex: Higher-level processing (AI reasoning)

For detailed technical specifications, see Audio Pipeline Architecture.

⚡ Quick Setup (Minimal)

Prerequisites

  • Python 3.10+
  • Working Sanctum Instance - Self-hosted AI agent platform (configured with your preferred LLM provider)
  • API Accounts - LiveKit, Deepgram, and Cartesia

📋 Detailed API setup instructions: see API Endpoints Setup Guide

Installation

git clone git@github.com:your-org/sanctum-cochlea.git
cd sanctum-cochlea
pip install -r requirements.txt

Basic Configuration

Create a .env file:

# Sanctum Instance Configuration
LETTA_API_KEY=your_sanctum_api_key
LETTA_BASE_URL=http://YOUR_SANCTUM_IP:8283/v1

# LiveKit Configuration
LIVEKIT_URL=wss://<YOUR-ROOM>.livekit.cloud
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret

# Speech Services
DEEPGRAM_API_KEY=your_deepgram_api_key
CARTESIA_API_KEY=your_cartesia_api_key
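
A missing key typically surfaces only as a confusing runtime error, so a small preflight check can fail fast instead. A sketch that mirrors the variable names in the .env above:

```python
import os

# Required settings, matching the .env template above.
REQUIRED_VARS = [
    "LETTA_API_KEY", "LETTA_BASE_URL",
    "LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET",
    "DEEPGRAM_API_KEY", "CARTESIA_API_KEY",
]

def missing_vars(env=os.environ) -> list:
    """Return the names of required settings that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]

if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        raise SystemExit(f"Missing required settings: {', '.join(missing)}")
    print("All required settings present.")
```

Run it once after editing .env (e.g. with `python -m dotenv run` or after sourcing the file) before starting the agent.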

Run

python main.py dev

🔗 Sanctum Instance Requirements

Required: Working Sanctum Instance

Sanctum Cochlea requires a working Sanctum instance (self-hosted AI agent platform) configured with your preferred LLM provider:

  • OpenAI - GPT-4, GPT-3.5, etc.
  • Anthropic - Claude Sonnet, Claude Haiku, etc.
  • Ollama - Local models like Llama, Mistral, etc.
  • Other Providers - Any LLM provider supported by Sanctum

Easy Installation

🚀 Bootstrap Installer (Recommended): Use the sanctumos/installer repository for automated setup.

Configuration

  • Set LETTA_BASE_URL=http://YOUR_SANCTUM_IP:8283/v1 in your .env
  • See VPS Connection Guide for detailed setup

🔧 Technical Implementation Details

Custom LLM Wrapper for VPS Integration

This implementation includes a custom VPSLettaLLM class that bypasses LiveKit's openai.LLM.with_letta() method, which is hardcoded to use the /voice-beta endpoint that only exists in Letta Cloud.

Why This Was Necessary:

  • LiveKit's openai.LLM.with_letta() only works with Letta Cloud
  • Self-hosted Letta instances use standard /messages endpoints
  • The /voice-beta endpoint doesn't exist on VPS installations

How It Works:

class VPSLettaLLM(LLM):
    def __init__(self, agent_id: str, base_url: str, api_key: str):
        super().__init__()
        # Direct integration with Letta's standard message API;
        # bypasses LiveKit's cloud-only /voice-beta endpoint.
        self.agent_id = agent_id
        self.base_url = base_url
        self.api_key = api_key

Benefits:

  • ✅ Works with any Letta instance (cloud or self-hosted)
  • ✅ Uses standard Letta message API endpoints
  • ✅ Maintains full compatibility with LiveKit's async context manager protocol
  • ✅ Provides better error handling and debugging capabilities

Current Implementation:

  • Direct requests.post calls to /agents/{agent_id}/messages
  • Proper ChatContext to letta_messages conversion
  • Async context manager compatibility for LiveKit integration
  • Comprehensive error handling and retry logic
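
A minimal sketch of the first two bullets, assuming the endpoint path described above (`/agents/{agent_id}/messages` relative to `LETTA_BASE_URL`); the exact JSON schema and auth header are assumptions here and may differ between Letta versions:

```python
def build_message_request(base_url: str, agent_id: str, api_key: str, text: str):
    """Assemble URL, headers, and JSON body for Letta's standard message API."""
    url = f"{base_url.rstrip('/')}/agents/{agent_id}/messages"
    headers = {"Authorization": f"Bearer {api_key}"}
    # Assumed schema: a list of chat-style messages.
    body = {"messages": [{"role": "user", "content": text}]}
    return url, headers, body

def send_message(base_url: str, agent_id: str, api_key: str, text: str) -> dict:
    """POST one user message to the agent and return the parsed response."""
    import requests  # third-party; imported here so the pure builder above stays dependency-free
    url, headers, body = build_message_request(base_url, agent_id, api_key, text)
    resp = requests.post(url, headers=headers, json=body, timeout=30)
    resp.raise_for_status()
    return resp.json()
```

Keeping request assembly separate from I/O, as above, also makes the retry logic easier to test without a live Letta instance.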

๐Ÿณ Self-Hosting Letta

To run your own Letta instance:

docker run \
  -v ~/.letta/.persist/pgdata:/var/lib/postgresql/data \
  -p 8283:8283 \
  -e OPENAI_API_KEY=${OPENAI_API_KEY} \
  letta/letta:latest

Using ngrok for Local Development

If your Letta server isn't exposed to the public internet, you can use ngrok:

  1. Install ngrok
  2. Add your authtoken: ngrok config add-authtoken <YOUR-AUTHTOKEN>
  3. Ensure Letta is running at http://localhost:8283
  4. Set the base URL to your ngrok URL:
    export LETTA_BASE_URL=https://xxxx.ngrok.app

🎮 Running a Voice Agent

Prerequisites

  1. Letta server running with IP set in LETTA_BASE_URL
  2. Agent created in Letta (via ADE or REST API)
  3. Agent ID set in environment:
    export LETTA_AGENT_ID=agent-xxxxxxx

Start the Agent

python main.py dev

Test Your Agent

  1. Go to LiveKit Agents Playground
  2. Connect to your room
  3. Chat with your agent

๐Ÿ”ง Advanced Configuration

The project is now fully configurable through environment variables without code changes:

  • Agent customization: Model, embedding, sleep-time settings
  • Connection options: Cloud vs self-hosted
  • Performance tuning: Buffer sizes, memory management

See Environment Configuration for all available options.

⚡ Performance & Benefits

SanctumOS Integration Benefits

  • Modular Architecture: Clean separation of concerns with pluggable components
  • Privacy-First: Complete self-hosting with no external dependencies
  • Scalability: Independent scaling of sensory, processing, and routing layers
  • Fault Tolerance: Isolated failures don't cascade through the system
  • Real-time Processing: Optimized for low-latency voice interactions

Current Performance Targets

  • End-to-End Latency: <2 seconds from speech to response
  • Voice Detection: >95% sentence completion rate
  • Error Handling: <1% message processing failures
  • Audio Quality: Natural, interruption-free conversations

Detailed performance metrics and optimization strategies will be documented as the pipeline evolves.

๐Ÿ‘๏ธ Viewing Agent Interactions

Demo and monitoring information for agent interactions will be added here.

🆘 Need Help?

  1. Start with: Quick Reference Card
  2. Setup issues: Basic Setup Guide
  3. VPS connection: VPS Connection Guide
  4. Configuration: Environment Configuration
  5. Problems: Troubleshooting Guide

๐Ÿ—บ๏ธ Development Roadmap

Phase 1: Broca Plugin Compatibility

  • Status: In Development
  • Goal: Full integration with Broca middleware for advanced routing and caching
  • Features:
    • Message deduplication and queue management
    • User context and core block management
    • Platform-specific response routing
    • Retry logic and error handling

Phase 2: Thalamus Protocol Compatibility

  • Status: Research Phase
  • Goal: Extensive testing and integration of Thalamus refinement pipeline
  • Features:
    • Real-time transcript cleaning and structuring
    • Speaker-aware segment grouping
    • Sentence completion logic
    • Noise filtering and TTS feedback prevention
    • Session-based state management

Phase 3: SanctumOS Integration

  • Status: Planning Phase
  • Goal: Full integration with SanctumOS modular architecture
  • Features:
    • Complete Cochlea-Thalamus-Broca pipeline implementation
    • SanctumOS plugin architecture compatibility
    • Modular component deployment and management
    • Cross-platform sensory processing capabilities
    • Real-time event processing and memory management

Phase 4: Full Whitepaper

  • Status: Pending
  • Goal: Complete technical documentation and research findings
  • Deliverables:
    • Comprehensive architecture documentation
    • Performance benchmarks and analysis
    • Integration guides for all supported platforms
    • Best practices and implementation recommendations

📖 Additional Resources

Core Technologies

SanctumOS Ecosystem

📄 License

Code: This project is licensed under the GNU Affero General Public License v3.0 (AGPLv3). See LICENSE for details.

Documentation: All non-code content (documentation, README files, etc.) is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License (CC-BY-SA 4.0). See LICENSE-DOCS for details.


Ready to get started? Begin with our Quick Reference Card or dive into the Complete Documentation.
