Echo TTS - Browser Client

A modern React + TypeScript application for OpenAI-compatible Text-to-Speech with dynamic voice creation, user authentication, and audio history management.

Echo TTS provides a comprehensive platform for converting text to speech using any OpenAI-compatible TTS service. Features include dynamic voice creation with admin approval, user authentication via Supabase, persistent audio history storage, and seamless Docker deployment for production environments.

✨ Features

Core TTS Functionality

🎵 Stream & Play: Accumulates audio chunks and plays automatically upon completion
📚 Persistent History: Keeps the last 5 generated audio clips in IndexedDB for quick replay
💾 Download Support: Export generated audio files as .ogg (Opus) format
🔄 Auto-play: Generated audio plays automatically with fallback handling
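
The "accumulate, then play" behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `useTTS` code; the function names `concatChunks` and `readAudioStream` are hypothetical.

```typescript
// Concatenate received byte chunks into one contiguous buffer.
function concatChunks(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}

// Read a streamed response body to completion and return all audio bytes.
// Playback would only start after this promise resolves.
async function readAudioStream(
  body: ReadableStream<Uint8Array>,
): Promise<Uint8Array> {
  const reader = body.getReader();
  const chunks: Uint8Array[] = [];
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    if (value) chunks.push(value);
  }
  return concatChunks(chunks);
}
```

In a browser, the returned bytes could then be wrapped in a `Blob` with an `audio/ogg` type and handed to an audio element.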

Dynamic Voice Creation

🎤 Voice Recording: Record audio directly from microphone or upload audio files (.wav, .ogg, .m4a, .mp3)
✏️ Custom Voices: Create personal reference voices (30-60 seconds) with admin approval workflow
👥 User Roles: Tiered access system (user → voice_creator_pending → voice_creator → admin) via Supabase authentication
📊 Voice Management: Dynamic voice listing replacing static configuration, real-time updates
🔒 Quota System: Fair usage with a 20-voice limit per user to prevent abuse

Platform Features

🔐 Supabase Auth: Secure user authentication with role-based access control
🗃️ Database-Backed: PostgreSQL-backed voice metadata and request tracking
☁️ Cloud Storage: S3-compatible storage for raw uploads and processed audio files
🔧 Runtime Configuration: Change API endpoints and models via environment variables without rebuilding
🎨 Modern UI: Clean Material-UI interface with light/dark theme toggle
🪝 Custom Hooks Architecture: Modular, reusable React hooks for clean separation of concerns
♻️ Optimized Performance: Built with React best practices and modern ES2022 features
🐳 Docker Ready: Containerized for internal shared_net usage with Nginx Proxy Manager

🏗️ Architecture

Echo TTS employs a comprehensive multi-service architecture with authentication, database storage, and dynamic voice management:

Core Services

Frontend (React/TypeScript): User interface with authentication and voice management
Express Server (port 4173): Serves static files and injects runtime environment variables
TTS Bridge Service: OpenAI-compatible API endpoint with voice management and Supabase integration
RunPod Serverless: Audio processing and inference with shared volume access
Supabase: Authentication, PostgreSQL database, and real-time subscriptions
S3 Storage: Raw uploads and processed audio files with lifecycle management

Deployment Architecture

Docker Container: Runs on shared_net network without host port exposure
Nginx Proxy Manager: Routes external traffic to the container
Production Mode: Express server serves static files with runtime env injection
Development Mode: Vite dev server (port 5173) with hot module replacement

Authentication & Authorization Flow

User authenticates via Supabase (JWT tokens)
Roles: user (default) → voice_creator_pending → voice_creator → admin
Bridge service validates JWTs and enforces role-based access
Admin approval required for voice creation permissions
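
A role check along these lines could back the role-based access described above. This is a sketch under the assumption that the four roles form a strict ordering; the helper name `hasAtLeastRole` and the numeric ranks are illustrative and not taken from the bridge service.

```typescript
type Role = "user" | "voice_creator_pending" | "voice_creator" | "admin";

// Assumed ordering: a higher rank includes every permission of the lower ranks.
const ROLE_RANK: Record<Role, number> = {
  user: 0,
  voice_creator_pending: 1,
  voice_creator: 2,
  admin: 3,
};

// True when `actual` is at least as privileged as `required`.
function hasAtLeastRole(actual: Role, required: Role): boolean {
  return ROLE_RANK[actual] >= ROLE_RANK[required];
}
```

With this ordering, a `voice_creator_pending` user keeps ordinary TTS access but cannot yet reach endpoints that require `voice_creator`.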

Voice Creation Pipeline

Upload: Users upload/record 30-60s audio to S3 (uploads/ prefix)
Registration: Bridge registers voice in database with pending status
Approval: Admin reviews and approves requests via UI
Processing: Bridge normalizes audio to .ogg Opus format
Deployment: Processed files stored in S3 (processed/) and RunPod shared volume
Availability: Voice becomes available in TTS service
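
The pipeline above can be read as a small state machine. The sketch below assumes status names beyond `pending` (the only one the pipeline names explicitly); `approved`, `processing`, `available`, and `rejected` are hypothetical labels chosen for illustration.

```typescript
type VoiceStatus =
  | "pending"     // registered by the bridge, awaiting admin review
  | "approved"    // admin accepted the request (assumed name)
  | "processing"  // audio being normalized to Opus (assumed name)
  | "available"   // usable in the TTS service (assumed name)
  | "rejected";   // admin declined the request (assumed name)

// Allowed next states for each status.
const NEXT_STATUS: Record<VoiceStatus, VoiceStatus[]> = {
  pending: ["approved", "rejected"],
  approved: ["processing"],
  processing: ["available"],
  available: [],
  rejected: [],
};

// True when moving from `from` to `to` follows the pipeline order.
function canTransition(from: VoiceStatus, to: VoiceStatus): boolean {
  return NEXT_STATUS[from].some((status) => status === to);
}
```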

Data Flow

User input flows through React components with authentication context
TTS requests go through bridge service with role validation
Audio responses stored as blobs in IndexedDB with automatic cleanup
Voice metadata managed in PostgreSQL with real-time updates
Audio files processed and stored in S3 with local caching
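
The "automatic cleanup" of stored audio can be illustrated with a pure helper that keeps only the newest entries. This is a sketch, assuming each history entry carries a `createdAt` timestamp; `pushHistory` and `MAX_HISTORY` are hypothetical names, with the limit of 5 taken from the feature list.

```typescript
interface Timestamped {
  createdAt: number; // e.g. Date.now() at generation time
}

const MAX_HISTORY = 5;

// Add an entry and split the result into entries to keep and entries to evict.
// The caller would delete the evicted blobs from IndexedDB.
function pushHistory<T extends Timestamped>(
  history: readonly T[],
  entry: T,
  max: number = MAX_HISTORY,
): { kept: T[]; evicted: T[] } {
  const newestFirst = [entry, ...history].sort(
    (a, b) => b.createdAt - a.createdAt,
  );
  return { kept: newestFirst.slice(0, max), evicted: newestFirst.slice(max) };
}
```

Returning the evicted entries separately lets the storage layer remove their blobs and revoke any object URLs in the same step.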

Frontend Architecture

Authentication Context: Supabase auth state management
Custom Hooks: Modular logic for TTS, audio playback, history, URL lifecycle
Theme Context: Dynamic light/dark mode switching with MUI theming
TypeScript: Full type safety with ES2022 target
State Management: React hooks with memoization and real-time updates

🚀 Quick Start

Prerequisites

Node.js 18+
Docker and Docker Compose
OpenAI-compatible TTS service
Supabase project (for authentication and database)
S3-compatible storage service (for voice file storage)

Docker Deployment (Recommended)

Set up Supabase:
- Create a new Supabase project
- Run supabase/sql/001_schema.sql in the Supabase SQL Editor
- Create an admin user in the user_roles table
- Get your Supabase URL and keys
Create network (if not exists):
```
docker network create shared_net
```

Configure environment: Create a .env file with your configuration:

```
# TTS Configuration
TTS_ENDPOINT=http://your-tts-service:8000/v1/audio/speech
TTS_MODEL=gpt-4o-mini-tts

# Supabase Configuration
VITE_SUPABASE_URL=https://your-project.supabase.co
VITE_SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_KEY=your-service-key

# S3 Configuration (for voice storage)
S3_BUCKET=your-bucket
S3_REGION=us-east-1
S3_ACCESS_KEY=your-access-key
S3_SECRET=your-secret-key

# Bridge API Configuration
BRIDGE_API_URL=http://your-bridge-service:3000
```

Deploy:
```
docker-compose up -d --build
```
Configure Nginx Proxy Manager: Forward traffic to echo-tts-ui container on port 4173
Access the application:
- Open your browser to the configured domain
- Sign in with Supabase auth
- Admin users can approve voice creation requests

Local Development

Clone and install:

```
git clone https://github.com/your-org/echo-tts-app.git
cd echo-tts-app
npm install
```

Set up environment: Create .env.local with your configuration:

```
# TTS Configuration
VITE_OPEN_AI_TTS_ENDPOINT=http://localhost:8000/v1/audio/speech
VITE_OPEN_AI_TTS_MODEL=gpt-4o-mini-tts

# Supabase Configuration
VITE_SUPABASE_URL=https://your-project.supabase.co
VITE_SUPABASE_ANON_KEY=your-anon-key

# Bridge API Configuration
VITE_BRIDGE_API_URL=http://localhost:3000
```

Start development server:
```
npm run dev
```
Access the application: Open http://localhost:5173 in your browser
Test voice creation:
- Sign in with Supabase auth
- Request voice creation access (requires admin approval)
- Upload or record a voice sample (30-60 seconds)

📖 Configuration

Environment Variables

Core TTS Configuration

| Variable | Description | Required | Default |
| --- | --- | --- | --- |
| `VITE_OPEN_AI_TTS_ENDPOINT` | Full URL to the TTS POST endpoint | ✅ | - |
| `VITE_OPEN_AI_TTS_MODEL` | Model ID for TTS requests | ❌ | `gpt-4o-mini-tts` |
| `VITE_OPEN_AI_TTS_VOICES` | JSON array of default voices (deprecated) | ❌ | `[{"id":"alloy","label":"Alloy"},...]` |

Supabase Authentication

| Variable | Description | Required | Default |
| --- | --- | --- | --- |
| `VITE_SUPABASE_URL` | Supabase project URL | ✅ | - |
| `VITE_SUPABASE_ANON_KEY` | Supabase anonymous key | ✅ | - |
| `SUPABASE_SERVICE_KEY` | Supabase service key (server-side) | ✅ | - |

Bridge API

| Variable | Description | Required | Default |
| --- | --- | --- | --- |
| `VITE_BRIDGE_API_URL` | Bridge service base URL | ✅ | - |

S3 Storage

| Variable | Description | Required | Default |
| --- | --- | --- | --- |
| `S3_BUCKET` | S3 bucket name | ✅ | - |
| `S3_REGION` | S3 bucket region | ✅ | - |
| `S3_ACCESS_KEY` | S3 access key | ✅ | - |
| `S3_SECRET` | S3 secret key | ✅ | - |
| `S3_REFERENCE_PREFIX` | Prefix for voice files | ❌ | `reference-voices/` |

Migration Note

The VITE_OPEN_AI_TTS_VOICES variable is deprecated. Voice configuration is now managed dynamically through the Supabase database and bridge API.

Voice Creation Guidelines

When creating custom voices:

Audio Requirements:
- Duration: 30-60 seconds
- Formats: .wav, .ogg, .m4a, .mp3
- Quality: Clear, consistent speech with minimal background noise
- Content: Read a pangram for diverse phoneme coverage

Recommended Pangram:

"The quick brown fox jumps over the lazy dog. Pack my box with five dozen liquor jugs. How vexingly quick daft zebras jump!"

User Roles:
- user: Can use existing voices and TTS functionality
- voice_creator_pending: Requested voice creation access
- voice_creator: Can create and manage own voices
- admin: Can approve requests and manage all voices

API Request Format

The application sends requests in this format:

```
{
  model: "gpt-4o-mini-tts",
  input: "Your text here",
  voice: "alloy",
  format: "opus"
}
```
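
A typed helper for building and sending this payload might look like the sketch below. The function names are hypothetical, and the `Authorization: Bearer` header is an assumption about how the bridge service expects the Supabase JWT to be passed.

```typescript
interface SpeechRequest {
  model: string;
  input: string;
  voice: string;
  format: "opus";
}

// Build the request body, rejecting empty input before any network call.
function buildSpeechRequest(
  input: string,
  voice: string,
  model: string = "gpt-4o-mini-tts",
): SpeechRequest {
  const trimmed = input.trim();
  if (trimmed.length === 0) {
    throw new Error("input text must not be empty");
  }
  return { model, input: trimmed, voice, format: "opus" };
}

// POST the request and return the audio as a Blob.
// `accessToken` is optional; when given it is sent as a Bearer token (assumed scheme).
async function requestSpeech(
  endpoint: string,
  request: SpeechRequest,
  accessToken?: string,
): Promise<Blob> {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
  };
  if (accessToken) {
    headers["Authorization"] = "Bearer " + accessToken;
  }
  const response = await fetch(endpoint, {
    method: "POST",
    headers,
    body: JSON.stringify(request),
  });
  if (!response.ok) {
    throw new Error("TTS request failed with status " + response.status);
  }
  return response.blob();
}
```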

🛠️ Development Commands

```
# Install dependencies
npm install

# Start development server with hot reload
npm run dev

# Build for production
npm run build

# Preview production build
npm run preview

# Start production server
npm start

# Type checking
npx tsc --noEmit
```

🧪 Testing the Integration

Health Check

```
curl http://localhost:4173/health
```

Direct TTS API Test

```
curl -X POST http://your-tts-service:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini-tts",
    "input": "Hello, world!",
    "voice": "alloy",
    "format": "opus"
  }' \
  --output test.ogg
```

📁 Project Structure

echoTTS-app/
├── src/                    # React application source
│   ├── contexts/          # React contexts
│   │   ├── ThemeContext.tsx     # Theme management (light/dark mode)
│   │   └── AuthContext.tsx      # Supabase authentication state
│   ├── hooks/             # Custom React hooks
│   │   ├── index.ts            # Hook exports
│   │   ├── useAudioPlayer.ts   # Audio playback logic
│   │   ├── useHistory.ts       # History + IndexedDB management
│   │   ├── useObjectUrls.ts    # Blob URL lifecycle management
│   │   ├── useTTS.ts           # TTS API integration
│   │   ├── useAuth.ts          # Supabase auth integration
│   │   ├── useVoices.ts        # Dynamic voice management
│   │   └── useVoiceCreation.ts # Voice upload/creation flow
│   ├── components/        # Reusable UI components
│   │   ├── VoiceRecorder.tsx   # Microphone recording component
│   │   ├── VoiceUploader.tsx   # File upload component
│   │   ├── VoiceApproval.tsx   # Admin approval interface
│   │   └── VoiceManager.tsx    # Voice list and management
│   ├── App.tsx            # Main application component
│   ├── config.ts          # Configuration management
│   ├── supabaseClient.ts  # Supabase client initialization
│   ├── main.tsx           # React app initialization
│   └── vite-env.d.ts      # Vite type definitions
├── supabase/              # Database schema and migrations
│   └── sql/
│       └── 001_schema.sql  # Database schema for voices and auth
├── docs/                  # Documentation
│   └── diagrams/          # Architecture diagrams
├── ADD_VOICE.md           # Voice creation feature specification
├── server.js              # Express server for production
├── index.html             # HTML template with env injection
├── docker-compose.yml     # Docker deployment configuration
├── Dockerfile             # Container build instructions
├── vite.config.ts         # Vite configuration
├── tsconfig.json          # TypeScript configuration (ES2022)
└── .env.example           # Example environment configuration

🔧 Technical Details

State Management

Custom Hooks Architecture: Modular, composable hooks following React best practices
- useTTS: TTS API integration with loading and error states
- useAudioPlayer: Audio playback management with cleanup
- useHistory: IndexedDB persistence with atomic operations
- useObjectUrls: Automatic blob URL lifecycle management
- useAuth: Supabase authentication state management
- useVoices: Dynamic voice listing with real-time updates
- useVoiceCreation: Voice upload, recording, and submission workflow
React Contexts:
- ThemeContext: Light/dark mode theming with MUI
- AuthContext: Supabase auth state and user role management
IndexedDB: Persistent storage of audio history via idb-keyval v6+
Object URLs: Efficient audio playback without base64 encoding
Real-time Updates: Supabase real-time subscriptions for voice approval status

Audio Handling

Format: Opus codec in OGG container
Storage: Binary blobs in IndexedDB with atomic operations
Playback: HTML5 Audio API with fallback error handling
Download: Dynamic anchor element creation
URL Management: Automatic creation and cleanup to prevent memory leaks
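
The URL lifecycle management mentioned above could be organised around a small registry like the one sketched here. It is an illustration only, not the project's `useObjectUrls` hook; the class and interface names are hypothetical, and the create/revoke functions are injected so the logic stays independent of the browser.

```typescript
// The two operations the registry needs; in a browser these would wrap
// URL.createObjectURL and URL.revokeObjectURL.
interface UrlApi<T> {
  create(source: T): string;
  revoke(url: string): void;
}

class ObjectUrlRegistry<T> {
  private readonly api: UrlApi<T>;
  private readonly urls = new Map<string, string>();

  constructor(api: UrlApi<T>) {
    this.api = api;
  }

  // Return the URL for a key, creating it only on first use.
  get(key: string, source: T): string {
    const existing = this.urls.get(key);
    if (existing !== undefined) return existing;
    const url = this.api.create(source);
    this.urls.set(key, url);
    return url;
  }

  // Revoke and forget the URL for one key; unknown keys are ignored.
  release(key: string): void {
    const url = this.urls.get(key);
    if (url === undefined) return;
    this.api.revoke(url);
    this.urls.delete(key);
  }

  // Revoke every tracked URL, e.g. when the component unmounts.
  releaseAll(): void {
    this.urls.forEach((url) => this.api.revoke(url));
    this.urls.clear();
  }
}
```

In the browser this would be constructed as `new ObjectUrlRegistry<Blob>({ create: (b) => URL.createObjectURL(b), revoke: (u) => URL.revokeObjectURL(u) })`.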

Voice Processing Pipeline

Upload Support: .wav, .ogg, .m4a, .mp3 formats accepted
Duration Validation: Client and server-side enforcement (30-60 seconds)
Audio Normalization: FFmpeg conversion to standardized Opus format
File Storage: S3 with prefixes (uploads/ for raw, processed/ for final)
Quality Assurance: Admin approval workflow before voice activation
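
The format and duration rules above lend themselves to a check that can run identically on client and server. The sketch below uses the limits stated in this README; the function name `validateVoiceSample` and the error wording are hypothetical.

```typescript
const ALLOWED_EXTENSIONS = [".wav", ".ogg", ".m4a", ".mp3"];
const MIN_SECONDS = 30;
const MAX_SECONDS = 60;

// Return a list of human-readable problems; an empty list means the sample is acceptable.
function validateVoiceSample(
  fileName: string,
  durationSeconds: number,
): string[] {
  const errors: string[] = [];

  // Compare the extension case-insensitively.
  const dot = fileName.lastIndexOf(".");
  const extension = dot >= 0 ? fileName.slice(dot).toLowerCase() : "";
  if (ALLOWED_EXTENSIONS.indexOf(extension) === -1) {
    errors.push("unsupported format: expected .wav, .ogg, .m4a, or .mp3");
  }

  if (
    !isFinite(durationSeconds) ||
    durationSeconds < MIN_SECONDS ||
    durationSeconds > MAX_SECONDS
  ) {
    errors.push("duration must be between 30 and 60 seconds");
  }

  return errors;
}
```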

Authentication & Security

JWT Validation: Supabase JWT tokens validated on bridge service
Role-Based Access: Database-enforced permissions per endpoint
Authenticated Requests: All API calls require a valid Supabase JWT
File Security: Presigned URLs with expiration for uploads
Input Validation: Duration, format, and size validation on both client and server

Modern React Patterns

Custom Hooks: Separation of concerns with reusable logic
TypeScript: Full type safety with ES2022 target
Memoization: Optimized performance with useMemo and useCallback
Error Boundaries: Proper error handling throughout the application
Clean Code: Reduced component complexity (App.tsx: 286 → 198 lines)

Environment Injection

The Express server injects runtime environment variables into index.html:

```
window.__ENV__ = {
  VITE_OPEN_AI_TTS_ENDPOINT: "...",
  VITE_OPEN_AI_TTS_MODEL: "...",
  VITE_OPEN_AI_TTS_VOICES: "..."
};
```
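
On the client, a lookup that prefers the injected runtime values over build-time values could be sketched as follows. This is an assumption about how `config.ts` might resolve settings, not a copy of it; `resolveSetting` and `EnvMap` are hypothetical names.

```typescript
type EnvMap = Record<string, string | undefined>;

// Resolve one setting: runtime value first, then build-time value, then fallback.
// Empty strings are treated as "not set" so a blank injected value does not win.
function resolveSetting(
  key: string,
  runtimeEnv: EnvMap,
  buildEnv: EnvMap,
  fallback?: string,
): string {
  const value = runtimeEnv[key] || buildEnv[key] || fallback;
  if (!value) {
    throw new Error("Missing required setting: " + key);
  }
  return value;
}

// In the browser this might be called as (not executed here):
//   resolveSetting("VITE_OPEN_AI_TTS_MODEL", window.__ENV__ ?? {}, import.meta.env, "gpt-4o-mini-tts")
```

Keeping the two sources as plain parameters means the same function works in development, where only build-time values exist, and in the container, where the Express server injects values at runtime.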

🐳 Docker Configuration

Build Context

Multi-stage build for optimized production image
Node.js 18 Alpine base image
Nginx Proxy Manager compatible

Network Configuration

Uses external shared_net network
No host ports exposed (internal-only)
Health check endpoint at /health

Environment Injection

Runtime environment variables are passed through Docker environment:

```
environment:
  - VITE_OPEN_AI_TTS_ENDPOINT=${TTS_ENDPOINT}
  - VITE_OPEN_AI_TTS_MODEL=${TTS_MODEL}
```

🚧 Development Workflow

Voice Creation Feature Development

The voice creation feature is documented in ADD_VOICE.md. This comprehensive specification covers:

Database Schema: Supabase tables, triggers, and RLS policies
API Endpoints: Bridge service endpoints for voice management
Frontend Components: React components for recording, uploading, and approval
Security: Authentication, authorization, and input validation
Migration: Steps to upgrade from static to dynamic voices

Setting Up Development Environment

Database Setup:

```
# Run the schema in Supabase SQL Editor
pbcopy < supabase/sql/001_schema.sql  # macOS: copy the schema to the clipboard
# Paste and execute in Supabase dashboard
```

Environment Variables:

```
cp .env.example .env.local
# Fill in your Supabase and S3 credentials
```

Test Workflow:
- Start the bridge service separately (see ADD_VOICE.md)
- Run npm run dev for the frontend
- Test auth flow with different user roles
- Verify voice creation and approval pipeline

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Voice Creation Feature Contributions

When contributing to the voice creation feature:

Follow the specification in ADD_VOICE.md
Test all user roles and permissions
Verify audio processing and storage
Ensure proper error handling and validation
Update documentation as needed

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

OpenAI for the TTS API specification
Material-UI for the excellent React component library
Vite for the fast development tooling
IndexedDB for client-side persistence

sruckh/echoTTS-app
