Portable, JSON-defined voice "kernels" that drive multi-engine TTS voice cloning.
Fine-tuning is heavy and non-portable. Style transfer is fast but coarse. Voice kernels are lightweight, tunable, and portable.
- Kernel spec (timbre/prosody/motifs/emotion)
- Kernel loader (`libs/kernel_loader.py`)
- Coqui XTTS v2 demo with reference speaker WAV
- Record/validate scripts
- Interactive say-anything tool
Listen to `demo_snowcrash_metaverse.wav` - a voice clone speaking about the Metaverse from Snow Crash.
- Python 3.11 (required for Coqui TTS)
- Visual Studio Build Tools (Windows)
- 30-second clean voice sample (22,050 Hz WAV)
```
# Clone the repository
git clone https://github.com/BlackOrchardLabs/re-speak.git
cd re-speak

# Install dependencies (use Python 3.11)
C:\Python311\python.exe -m pip install TTS soundfile sounddevice transformers==4.33.0
```

Record your voice sample:

```
C:\Python311\python.exe record_voice.py
```

This will:
- Prompt you to record 30 seconds
- Automatically normalize volume
- Save to `samples/eric_voice_clean.wav`
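Under the hood, the recording step is roughly the following (a simplified sketch assuming peak normalization; see `record_voice.py` for the actual implementation):

```python
import numpy as np
import sounddevice as sd
import soundfile as sf

SAMPLE_RATE = 22050  # the project's required sample rate
DURATION_S = 30      # minimum recommended sample length

input("Press Enter, then speak for 30 seconds...")
audio = sd.rec(int(DURATION_S * SAMPLE_RATE),
               samplerate=SAMPLE_RATE, channels=1, dtype="float32")
sd.wait()  # block until the recording finishes

# Peak-normalize so quiet takes still clone well
peak = np.abs(audio).max()
if peak > 0:
    audio = audio / peak * 0.9

sf.write("samples/eric_voice_clean.wav", audio, SAMPLE_RATE)
```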
Next, create `kernels/your_voice_kernel.json`:
```json
{
  "kernel_name": "your_voice_kernel_v0.1",
  "version": "text_proxy_v2",
  "source": "recorded_sample",
  "timbre": {
    "warmth": 0.85,
    "depth": 0.8,
    "nasality": 0.1,
    "breathiness": 0.3
  },
  "prosody": {
    "avg_pitch_hz": 142,
    "pitch_variance": 0.6,
    "speech_rate_wpm": 130,
    "pause_density": 0.28,
    "rhythmic_flow": "deliberate, measured"
  },
  "motifs": [
    "technical clarity",
    "quiet intensity",
    "builder's cadence"
  ],
  "emotional_range": {
    "calm": 0.92,
    "curiosity": 0.88,
    "excitement": 0.75
  }
}
```

Run the interactive tool:

```
C:\Python311\python.exe say_anything.py
```

Type any text and hear it in your voice!
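A minimal version of that interactive loop might look like this (illustrative only; the real `say_anything.py` may differ):

```python
import sounddevice as sd
import soundfile as sf
from TTS.api import TTS
from libs.kernel_loader import load_kernel

kernel = load_kernel("kernels/your_voice_kernel.json",
                     audio_path="samples/eric_voice_clean.wav")
tts = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v2")

while True:
    text = input("say> ").strip()
    if not text:
        break
    # Synthesize with the kernel's reference speaker WAV, then play back
    tts.tts_to_file(text=text, file_path="out.wav",
                    speaker_wav=kernel.get_speaker_wav(), language="en")
    audio, sr = sf.read("out.wav", dtype="float32")
    sd.play(audio, sr)
    sd.wait()
```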
Run the batch test:

```
C:\Python311\python.exe test_voice_clone.py
```

Generates `eric_voice_clone_test.wav` with predefined text.
The voice kernel format captures vocal identity as portable JSON:
```json
{
  "kernel_name": "eric_voice_kernel_v0.2",
  "timbre": {
    "warmth": 0.85,
    "depth": 0.8,
    "nasality": 0.1,
    "breathiness": 0.3
  },
  "prosody": {
    "avg_pitch_hz": 142,
    "speech_rate_wpm": 130,
    "pause_density": 0.28,
    "rhythmic_flow": "deliberate, measured"
  },
  "motifs": [
    "technical clarity",
    "quiet intensity",
    "builder's cadence",
    "subtle humor",
    "warm pauses"
  ],
  "emotional_range": {
    "calm": 0.92,
    "curiosity": 0.88,
    "excitement": 0.75,
    "frustration": 0.10
  }
}
```

Load a kernel with the loader API:

```python
from libs.kernel_loader import load_kernel

# Load kernel with audio reference
kernel = load_kernel(
    "kernels/eric_voice_kernel.json",
    audio_path="samples/eric_voice_clean.wav"
)

# Access parameters
print(kernel.timbre)            # {"warmth": 0.85, ...}
print(kernel.prosody)           # {"avg_pitch_hz": 142, ...}
print(kernel.motifs)            # ["technical clarity", ...]
print(kernel.get_speaker_wav()) # "samples/eric_voice_clean.wav"
```

Generate speech with a voice kernel:

```python
from TTS.api import TTS

# Initialize TTS
tts = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v2")

# Generate speech with voice kernel
tts.tts_to_file(
    text="Your text here",
    file_path="output.wav",
    speaker_wav=kernel.get_speaker_wav(),
    language="en"
)
```

Voice sample requirements:

- Sample rate: 22,050 Hz
- Channels: Mono (automatically converted if stereo)
- Duration: 30 seconds minimum recommended
- Format: WAV (MP3 conversion available via `convert_audio.py`)
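A quick way to check a sample against these requirements (a hypothetical helper, not part of the repo):

```python
import soundfile as sf

def validate_sample(path: str) -> None:
    audio, sr = sf.read(path, dtype="float32", always_2d=True)
    duration = audio.shape[0] / sr
    problems = []
    if sr != 22050:
        problems.append(f"expected 22050 Hz, got {sr} Hz")
    if audio.shape[1] > 1:
        problems.append("stereo (will be down-mixed to mono)")
    if duration < 30:
        problems.append(f"only {duration:.1f} s, 30 s minimum recommended")
    status = "; ".join(problems) if problems else f"{sr} Hz, {duration:.1f} s, OK"
    print(f"{path}: {status}")

validate_sample("samples/eric_voice_clean.wav")
```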
Dependencies:

- `TTS` (Coqui XTTS v2)
- `soundfile` (audio I/O; used to patch torchaudio issues)
- `sounddevice` (recording)
- `transformers==4.33.0` (pinned for compatibility)
- `torch` (PyTorch 2.6+)
Windows Audio Backend:
- Torchaudio's torchcodec dependency has compatibility issues on Windows
- We patch this by using soundfile directly (see `test_voice_clone.py`)
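Conceptually, the patch just swaps torchaudio's loader for soundfile (a simplified sketch; `test_voice_clone.py` contains the real version):

```python
import soundfile as sf
import torch
import torchaudio

def _load_with_soundfile(path, *args, **kwargs):
    # Mirror torchaudio.load's contract - a (channels, samples) float32
    # tensor plus the sample rate - but read the file via soundfile.
    data, sr = sf.read(path, dtype="float32", always_2d=True)
    return torch.from_numpy(data.T), sr

torchaudio.load = _load_with_soundfile  # sidestep the torchcodec backend
```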
PyTorch 2.6+ Security:
- `torch.load` now defaults to `weights_only=True`, so classes inside the XTTS checkpoint must be allowlisted via `safe_globals`
- Automatically handled in our scripts
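Concretely, the fix looks like this (the exact allowlist may need additional config classes depending on the checkpoint):

```python
import torch
from TTS.tts.configs.xtts_config import XttsConfig

# PyTorch 2.6 changed torch.load's default to weights_only=True, which
# rejects arbitrary pickled classes; allowlist the XTTS config so the
# checkpoint can deserialize. More classes may need the same treatment.
torch.serialization.add_safe_globals([XttsConfig])
```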
Done:

- Working voice clone with Coqui XTTS v2
- Kernel loader system
- Interactive say-anything tool

Next:

- Map kernel params → synthesis controls (rate/pitch/emotion)
- Multi-engine parity (Coqui, Piper)
- Automated "voice → kernel" extractor (TUSK-style)
- UI for record → extract → test workflow
- Real-time parameter tuning
- Kernel blending (70% voice A + 30% voice B; see the blending sketch below)

Future:

- re-speak app
- Mobile deployment
- Cross-platform persona system
- Soul Forge integration (visual + voice + text + memory)
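Kernel blending, for instance, could be a weighted interpolation over the numeric fields. A hypothetical sketch (`blend_kernels` and the kernel filenames are illustrative, not part of the repo):

```python
import json

def blend_kernels(a: dict, b: dict, weight: float = 0.7) -> dict:
    """Blend two kernel dicts: weight * a + (1 - weight) * b for numbers,
    recursing into nested sections; other fields come from the dominant kernel."""
    out = {}
    for key, va in a.items():
        vb = b.get(key)
        if isinstance(va, dict) and isinstance(vb, dict):
            out[key] = blend_kernels(va, vb, weight)
        elif isinstance(va, (int, float)) and isinstance(vb, (int, float)):
            out[key] = round(weight * va + (1 - weight) * vb, 3)
        else:
            out[key] = va if weight >= 0.5 or vb is None else vb
    return out

with open("kernels/voice_a.json") as fa, open("kernels/voice_b.json") as fb:
    kernel_a, kernel_b = json.load(fa), json.load(fb)

# 70% voice A + 30% voice B
blended = blend_kernels(kernel_a, kernel_b, weight=0.7)
```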
Voice kernels are part of Soul Forge, a system for portable digital persona preservation. Just as TUSK extracts visual aesthetic DNA, voice kernels capture vocal identity as JSON.
The vision:
- Visual kernels (TUSK) - aesthetic DNA
- Voice kernels (this project) - vocal DNA
- Text kernels (planned) - conversational DNA
- Memory kernels (Hermes) - experiential DNA
Result: Complete persona portability across any platform.
This is an early-stage research project. Contributions welcome, especially:
- Multi-engine integration (Piper, other Coqui models)
- Automated kernel extraction from audio
- Parameter → synthesis mapping
- Documentation improvements
MIT License (see LICENSE file)
- Built on Coqui TTS
- Forked from wachawo/text-to-speech
- Part of the Black Orchard Labs ecosystem
Mission: Data Autonomy
"Measuring the unmeasurable where your data is your own"
Building tools for data autonomy. Your conversations, your voice, your identity - under YOUR control. Because if corporations get IP protection for their data, you should own yours.
re-speak - Your voice, captured and portable. Because your voice is yours.
Black Orchard Labs
https://github.com/BlackOrchardLabs
"Code is just a form of speech—the form that computers understand." — Neal Stephenson, Snow Crash