A modern React + TypeScript application for OpenAI-compatible Text-to-Speech with dynamic voice creation, user authentication, and audio history management.
Echo TTS converts text to speech using any OpenAI-compatible TTS service. Features include dynamic voice creation with admin approval, user authentication via Supabase, persistent audio history storage, and Docker deployment for production environments.
- Stream & Play: Accumulates audio chunks and plays automatically upon completion
- Persistent History: Keeps the last 5 generated audio clips in IndexedDB for quick replay
- Download Support: Export generated audio files in `.ogg` (Opus) format
- Auto-play: Generated audio plays automatically with fallback handling
- Voice Recording: Record audio directly from microphone or upload audio files (.wav, .ogg, .m4a, .mp3)
- Custom Voices: Create personal reference voices (30-60 seconds) with admin approval workflow
- User Roles: Tiered access system (user → voice_creator → admin) via Supabase authentication
- Voice Management: Dynamic voice listing replacing static configuration, real-time updates
- Quota System: Fair usage with 20 voice limit per user to prevent abuse
- Supabase Auth: Secure user authentication with role-based access control
- Database-Backed: PostgreSQL-backed voice metadata and request tracking
- Cloud Storage: S3-compatible storage for raw uploads and processed audio files
- Runtime Configuration: Change API endpoints and models via environment variables without rebuilding
- Modern UI: Clean Material-UI interface with light/dark theme toggle
- Custom Hooks Architecture: Modular, reusable React hooks for clean separation of concerns
- Optimized Performance: Built with React best practices and modern ES2022 features
- Docker Ready: Containerized for internal `shared_net` usage with Nginx Proxy Manager
Echo TTS uses a multi-service architecture with authentication, database storage, and dynamic voice management:
- Frontend (React/TypeScript): User interface with authentication and voice management
- Express Server (port 4173): Serves static files and injects runtime environment variables
- TTS Bridge Service: OpenAI-compatible API endpoint with voice management and Supabase integration
- RunPod Serverless: Audio processing and inference with shared volume access
- Supabase: Authentication, PostgreSQL database, and real-time subscriptions
- S3 Storage: Raw uploads and processed audio files with lifecycle management
- Docker Container: Runs on the `shared_net` network without host port exposure
- Nginx Proxy Manager: Routes external traffic to the container
- Production Mode: Express server serves static files with runtime env injection
- Development Mode: Vite dev server (port 5173) with hot module replacement
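The runtime env injection mentioned for production mode could look roughly like the sketch below. It is not taken from `server.js`: the helper names (`pickPublicEnv`, `renderEnvScript`, `injectEnv`), the choice to expose only `VITE_`-prefixed variables, and the assumption that the template contains a `</head>` tag are all illustrative.

```typescript
// Hypothetical helpers; the project's server.js may inject values differently.
type RuntimeEnv = Record<string, string | undefined>;

// Assumption: only VITE_-prefixed variables are meant for the browser,
// so server-only secrets such as SUPABASE_SERVICE_KEY are filtered out.
function pickPublicEnv(env: RuntimeEnv): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    if (key.startsWith("VITE_") && value !== undefined) {
      out[key] = value;
    }
  }
  return out;
}

// Serialize the env as an inline script. "<" is escaped so that a value
// containing "</script>" cannot close the tag early.
function renderEnvScript(env: RuntimeEnv): string {
  const json = JSON.stringify(pickPublicEnv(env)).replace(/</g, "\\u003c");
  return `<script>window.__ENV__ = ${json};</script>`;
}

// Insert the script just before </head>. A replacer function is used so
// that "$" sequences inside values are not interpreted by String.replace.
function injectEnv(html: string, env: RuntimeEnv): string {
  return html.replace("</head>", () => `${renderEnvScript(env)}</head>`);
}
```

An Express handler could call `injectEnv(template, process.env)` once at startup and serve the result for `index.html`, which is what lets the same image run against different endpoints without a rebuild.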
- User authenticates via Supabase (JWT tokens)
- Roles: `user` (default) → `voice_creator_pending` → `voice_creator` → `admin`
- Bridge service validates JWTs and enforces role-based access
- Admin approval required for voice creation permissions
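As an illustration of the role progression above, a role check might be sketched as follows. The role names come from this README; treating them as a strict linear hierarchy, and the helper names `hasAtLeast`, `canCreateVoices` and `canApproveRequests`, are assumptions rather than the bridge service's actual logic.

```typescript
// Roles in the order given above, lowest privilege first.
const ROLE_ORDER = ["user", "voice_creator_pending", "voice_creator", "admin"] as const;
type Role = (typeof ROLE_ORDER)[number];

// Assumption: each role includes the permissions of every role before it.
function hasAtLeast(current: Role, required: Role): boolean {
  return ROLE_ORDER.indexOf(current) >= ROLE_ORDER.indexOf(required);
}

// A pending request does not yet grant voice creation.
function canCreateVoices(role: Role): boolean {
  return hasAtLeast(role, "voice_creator");
}

// Only admins approve voice creation requests.
function canApproveRequests(role: Role): boolean {
  return role === "admin";
}
```

Keeping the order in one array means the UI and the server can share a single source of truth for comparisons, although the server-side check remains the one that matters for security.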
- Upload: Users upload/record 30-60s audio to S3 (uploads/ prefix)
- Registration: Bridge registers voice in database with `pending` status
- Approval: Admin reviews and approves requests via UI
- Processing: Bridge normalizes audio to .ogg Opus format
- Deployment: Processed files stored in S3 (processed/) and RunPod shared volume
- Availability: Voice becomes available in TTS service
- User input flows through React components with authentication context
- TTS requests go through bridge service with role validation
- Audio responses stored as blobs in IndexedDB with automatic cleanup
- Voice metadata managed in PostgreSQL with real-time updates
- Audio files processed and stored in S3 with local caching
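The automatic cleanup of stored audio can be pictured as a small pure function that keeps the newest entries and reports which ones fell off the end. This is a sketch only: the `HistoryEntry` shape and the `addToHistory` name are hypothetical, and the limit of 5 is taken from the feature list.

```typescript
// Hypothetical entry shape; B stands in for the stored audio (a Blob in the browser).
interface HistoryEntry<B> {
  id: string;
  createdAt: number;
  audio: B;
}

const MAX_HISTORY = 5;

// Put the new entry first, drop any older entry with the same id, and split
// the result into the entries to keep and the entries to clean up.
function addToHistory<B>(
  history: readonly HistoryEntry<B>[],
  entry: HistoryEntry<B>,
  max: number = MAX_HISTORY,
): { kept: HistoryEntry<B>[]; evicted: HistoryEntry<B>[] } {
  const next = [entry, ...history.filter((e) => e.id !== entry.id)];
  return { kept: next.slice(0, max), evicted: next.slice(max) };
}
```

The caller would persist `kept` (for example through `idb-keyval`) and use `evicted` to release any object URLs that still point at the discarded clips.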
- Authentication Context: Supabase auth state management
- Custom Hooks: Modular logic for TTS, audio playback, history, URL lifecycle
- Theme Context: Dynamic light/dark mode switching with MUI theming
- TypeScript: Full type safety with ES2022 target
- State Management: React hooks with memoization and real-time updates
- Node.js 18+
- Docker and Docker Compose
- OpenAI-compatible TTS service
- Supabase project (for authentication and database)
- S3-compatible storage service (for voice file storage)
- Set up Supabase:
  - Create a new Supabase project
  - Run `supabase/sql/001_schema.sql` in the Supabase SQL Editor
  - Create an admin user in the `user_roles` table
  - Get your Supabase URL and keys
- Create the network (if it does not exist):

      docker network create shared_net

- Configure the environment: create a `.env` file with your configuration:

      # TTS Configuration
      TTS_ENDPOINT=http://your-tts-service:8000/v1/audio/speech
      TTS_MODEL=gpt-4o-mini-tts

      # Supabase Configuration
      VITE_SUPABASE_URL=https://your-project.supabase.co
      VITE_SUPABASE_ANON_KEY=your-anon-key
      SUPABASE_SERVICE_KEY=your-service-key

      # S3 Configuration (for voice storage)
      S3_BUCKET=your-bucket
      S3_REGION=us-east-1
      S3_ACCESS_KEY=your-access-key
      S3_SECRET=your-secret-key

      # Bridge API Configuration
      BRIDGE_API_URL=http://your-bridge-service:3000

- Deploy:

      docker-compose up -d --build

- Configure Nginx Proxy Manager: forward traffic to the `echo-tts-ui` container on port `4173`
- Access the application:
  - Open your browser to the configured domain
  - Sign in with Supabase auth
  - Admin users can approve voice creation requests
- Clone and install:

      git clone https://github.com/your-org/echo-tts-app.git
      cd echo-tts-app
      npm install

- Set up the environment: create `.env.local` with your configuration:

      # TTS Configuration
      VITE_OPEN_AI_TTS_ENDPOINT=http://localhost:8000/v1/audio/speech
      VITE_OPEN_AI_TTS_MODEL=gpt-4o-mini-tts

      # Supabase Configuration
      VITE_SUPABASE_URL=https://your-project.supabase.co
      VITE_SUPABASE_ANON_KEY=your-anon-key

      # Bridge API Configuration
      VITE_BRIDGE_API_URL=http://localhost:3000

- Start the development server:

      npm run dev

- Access the application: open http://localhost:5173 in your browser
- Test voice creation:
  - Sign in with Supabase auth
  - Request voice creation access (requires admin approval)
  - Upload or record a voice sample (30-60 seconds)
| Variable | Description | Required | Default |
|---|---|---|---|
| `VITE_OPEN_AI_TTS_ENDPOINT` | Full URL to the TTS POST endpoint | Yes | - |
| `VITE_OPEN_AI_TTS_MODEL` | Model ID for TTS requests | No | `gpt-4o-mini-tts` |
| `VITE_OPEN_AI_TTS_VOICES` | JSON array of default voices (deprecated) | No | `[{"id":"alloy","label":"Alloy"},...]` |

| Variable | Description | Required | Default |
|---|---|---|---|
| `VITE_SUPABASE_URL` | Supabase project URL | Yes | - |
| `VITE_SUPABASE_ANON_KEY` | Supabase anonymous key | Yes | - |
| `SUPABASE_SERVICE_KEY` | Supabase service key (server-side) | Yes | - |

| Variable | Description | Required | Default |
|---|---|---|---|
| `VITE_BRIDGE_API_URL` | Bridge service base URL | Yes | - |

| Variable | Description | Required | Default |
|---|---|---|---|
| `S3_BUCKET` | S3 bucket name | Yes | - |
| `S3_REGION` | S3 bucket region | Yes | - |
| `S3_ACCESS_KEY` | S3 access key | Yes | - |
| `S3_SECRET` | S3 secret key | Yes | - |
| `S3_REFERENCE_PREFIX` | Prefix for voice files | No | `reference-voices/` |
The VITE_OPEN_AI_TTS_VOICES variable is deprecated. Voice configuration is now managed dynamically through the Supabase database and bridge API.
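One way the client could resolve the TTS settings is to prefer values injected at runtime and fall back to build-time values, then to the documented default. The sketch below is an assumption about how `config.ts` might behave, not a copy of it: `resolveConfig`, `TtsConfig`, the runtime-over-build-time precedence, and failing fast on a missing endpoint are all illustrative choices.

```typescript
type EnvSource = Record<string, string | undefined>;

interface TtsConfig {
  endpoint: string;
  model: string;
}

// Default model taken from the configuration table above.
const DEFAULT_MODEL = "gpt-4o-mini-tts";

// Treat unset and blank values the same way.
function nonEmpty(value: string | undefined): string | undefined {
  return value !== undefined && value.trim() !== "" ? value : undefined;
}

// Assumption: runtime values (window.__ENV__) win over build-time values
// (import.meta.env), which is what allows changes without a rebuild.
function resolveConfig(runtime: EnvSource, buildTime: EnvSource): TtsConfig {
  const read = (key: string): string | undefined =>
    nonEmpty(runtime[key]) ?? nonEmpty(buildTime[key]);

  const endpoint = read("VITE_OPEN_AI_TTS_ENDPOINT");
  if (endpoint === undefined) {
    // The endpoint has no documented default, so this sketch fails fast.
    throw new Error("VITE_OPEN_AI_TTS_ENDPOINT must be set");
  }
  return { endpoint, model: read("VITE_OPEN_AI_TTS_MODEL") ?? DEFAULT_MODEL };
}
```

In the browser this might be called as `resolveConfig(window.__ENV__ ?? {}, import.meta.env)`; passing both sources in as plain objects keeps the function easy to test outside Vite.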
When creating custom voices:
- Audio Requirements:
  - Duration: 30-60 seconds
  - Formats: .wav, .ogg, .m4a, .mp3
  - Quality: Clear, consistent speech with minimal background noise
  - Content: Read a pangram for diverse phoneme coverage
- Recommended Pangram: "The quick brown fox jumps over the lazy dog. Pack my box with five dozen liquor jugs. How vexingly quick daft zebras jump!"
- User Roles:
  - `user`: Can use existing voices and TTS functionality
  - `voice_creator_pending`: Requested voice creation access
  - `voice_creator`: Can create and manage own voices
  - `admin`: Can approve requests and manage all voices
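The audio requirements above lend themselves to a simple client-side check before upload. The following is a minimal sketch under stated assumptions: `validateVoiceSample` and the `VoiceSample` shape are hypothetical, the 30 and 60 second bounds are treated as inclusive, and the format check looks only at the file extension.

```typescript
const ALLOWED_EXTENSIONS = [".wav", ".ogg", ".m4a", ".mp3"] as const;
const MIN_SECONDS = 30;
const MAX_SECONDS = 60;

// Hypothetical input shape: the duration would be measured by the caller,
// for example from a decoded audio buffer.
interface VoiceSample {
  fileName: string;
  durationSeconds: number;
}

// Returns a list of human-readable problems; an empty list means the
// sample passes these basic checks.
function validateVoiceSample(sample: VoiceSample): string[] {
  const errors: string[] = [];
  const name = sample.fileName.toLowerCase();

  if (!ALLOWED_EXTENSIONS.some((ext) => name.endsWith(ext))) {
    errors.push(`Unsupported format; expected one of ${ALLOWED_EXTENSIONS.join(", ")}`);
  }

  const d = sample.durationSeconds;
  if (!Number.isFinite(d) || d < MIN_SECONDS || d > MAX_SECONDS) {
    errors.push(`Duration must be between ${MIN_SECONDS} and ${MAX_SECONDS} seconds`);
  }
  return errors;
}
```

A check like this only improves feedback in the UI; the same limits still need to be enforced on the server, since client-side validation can be bypassed.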
The application sends requests in this format:
    {
      model: "gpt-4o-mini-tts",
      input: "Your text here",
      voice: "alloy",
      format: "opus"
    }

    # Install dependencies
    npm install
    # Start development server with hot reload
    npm run dev
    # Build for production
    npm run build
    # Preview production build
    npm run preview
    # Start production server
    npm start
    # Type checking
    npx tsc --noEmit

    # Health check
    curl http://localhost:4173/health

    # Test the TTS endpoint directly
    curl -X POST http://your-tts-service:8000/v1/audio/speech \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gpt-4o-mini-tts",
        "input": "Hello, world!",
        "voice": "alloy",
        "format": "opus"
      }' \
      --output test.ogg

    echoTTS-app/
    ├── src/                          # React application source
    │   ├── contexts/                 # React contexts
    │   │   ├── ThemeContext.tsx      # Theme management (light/dark mode)
    │   │   └── AuthContext.tsx       # Supabase authentication state
    │   ├── hooks/                    # Custom React hooks
    │   │   ├── index.ts              # Hook exports
    │   │   ├── useAudioPlayer.ts     # Audio playback logic
    │   │   ├── useHistory.ts         # History + IndexedDB management
    │   │   ├── useObjectUrls.ts      # Blob URL lifecycle management
    │   │   ├── useTTS.ts             # TTS API integration
    │   │   ├── useAuth.ts            # Supabase auth integration
    │   │   ├── useVoices.ts          # Dynamic voice management
    │   │   └── useVoiceCreation.ts   # Voice upload/creation flow
    │   ├── components/               # Reusable UI components
    │   │   ├── VoiceRecorder.tsx     # Microphone recording component
    │   │   ├── VoiceUploader.tsx     # File upload component
    │   │   ├── VoiceApproval.tsx     # Admin approval interface
    │   │   └── VoiceManager.tsx      # Voice list and management
    │   ├── App.tsx                   # Main application component
    │   ├── config.ts                 # Configuration management
    │   ├── supabaseClient.ts         # Supabase client initialization
    │   ├── main.tsx                  # React app initialization
    │   └── vite-env.d.ts             # Vite type definitions
    ├── supabase/                     # Database schema and migrations
    │   └── sql/
    │       └── 001_schema.sql        # Database schema for voices and auth
    ├── docs/                         # Documentation
    │   ├── diagrams/                 # Architecture diagrams
    │   └── ADD_VOICE.md              # Voice creation feature specification
    ├── server.js                     # Express server for production
    ├── index.html                    # HTML template with env injection
    ├── docker-compose.yml            # Docker deployment configuration
    ├── Dockerfile                    # Container build instructions
    ├── vite.config.ts                # Vite configuration
    ├── tsconfig.json                 # TypeScript configuration (ES2022)
    └── .env.example                  # Example environment configuration

- Custom Hooks Architecture: Modular, composable hooks following React best practices
  - `useTTS`: TTS API integration with loading and error states
  - `useAudioPlayer`: Audio playback management with cleanup
  - `useHistory`: IndexedDB persistence with atomic operations
  - `useObjectUrls`: Automatic blob URL lifecycle management
  - `useAuth`: Supabase authentication state management
  - `useVoices`: Dynamic voice listing with real-time updates
  - `useVoiceCreation`: Voice upload, recording, and submission workflow
- React Contexts:
  - `ThemeContext`: Light/dark mode theming with MUI
  - `AuthContext`: Supabase auth state and user role management
- IndexedDB: Persistent storage of audio history via `idb-keyval` v6+
- Object URLs: Efficient audio playback without base64 encoding
- Real-time Updates: Supabase real-time subscriptions for voice approval status
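To make the TTS integration more concrete, the request that a hook like `useTTS` sends could be assembled along these lines. The body fields match the request format documented in this README; the `buildSpeechRequest` helper, the trimming of the input, and the `Authorization: Bearer` header carrying the Supabase access token are assumptions about how the bridge expects to be called.

```typescript
// Body fields as documented in this README's request format.
interface SpeechRequestBody {
  model: string;
  input: string;
  voice: string;
  format: "opus";
}

// A plain object that can be passed as the second argument to fetch().
interface SpeechRequestInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildSpeechRequest(params: {
  model: string;
  input: string;
  voice: string;
  accessToken?: string; // Assumed to be the Supabase session JWT.
}): SpeechRequestInit {
  const input = params.input.trim();
  if (input === "") {
    throw new Error("Input text is empty");
  }

  const body: SpeechRequestBody = {
    model: params.model,
    input,
    voice: params.voice,
    format: "opus",
  };

  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (params.accessToken) {
    // Assumption: the bridge reads the JWT from a standard Bearer header.
    headers["Authorization"] = `Bearer ${params.accessToken}`;
  }
  return { method: "POST", headers, body: JSON.stringify(body) };
}
```

A caller would then do something like `const res = await fetch(endpoint, buildSpeechRequest({...}))` and read `await res.blob()` to obtain the Opus audio for playback and history.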
- Format: Opus codec in OGG container
- Storage: Binary blobs in IndexedDB with atomic operations
- Playback: HTML5 Audio API with fallback error handling
- Download: Dynamic anchor element creation
- URL Management: Automatic creation and cleanup to prevent memory leaks
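The URL management point can be illustrated with a small registry that creates one object URL per clip and revokes it when the clip is no longer needed. This is a sketch of the idea rather than the `useObjectUrls` implementation: the `ObjectUrlRegistry` class is hypothetical, and the create and revoke functions are injected so the logic does not depend on browser globals.

```typescript
// B stands in for the audio payload (a Blob in the browser).
class ObjectUrlRegistry<B> {
  private readonly urls = new Map<string, string>();
  private readonly create: (blob: B) => string;
  private readonly revoke: (url: string) => void;

  constructor(create: (blob: B) => string, revoke: (url: string) => void) {
    this.create = create;
    this.revoke = revoke;
  }

  // Return the existing URL for this id, or create and remember a new one.
  get(id: string, blob: B): string {
    const existing = this.urls.get(id);
    if (existing !== undefined) return existing;
    const url = this.create(blob);
    this.urls.set(id, url);
    return url;
  }

  // Revoke and forget the URL for one id; unknown ids are ignored.
  release(id: string): void {
    const url = this.urls.get(id);
    if (url === undefined) return;
    this.revoke(url);
    this.urls.delete(id);
  }

  // Revoke everything, for example when the component unmounts.
  releaseAll(): void {
    for (const id of Array.from(this.urls.keys())) {
      this.release(id);
    }
  }
}
```

In the browser the registry would be constructed with `(b) => URL.createObjectURL(b)` and `(u) => URL.revokeObjectURL(u)`. Revoking on eviction and on unmount is what prevents the blobs from staying alive after their history entries are gone.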
- Upload Support: .wav, .ogg, .m4a, .mp3 formats accepted
- Duration Validation: Client and server-side enforcement (30-60 seconds)
- Audio Normalization: FFmpeg conversion to standardized Opus format
- File Storage: S3 with prefixes (uploads/ for raw, processed/ for final)
- Quality Assurance: Admin approval workflow before voice activation
- JWT Validation: Supabase JWT tokens validated on bridge service
- Role-Based Access: Database-enforced permissions per endpoint
- Request Signing: All API calls require valid authentication
- File Security: Presigned URLs with expiration for uploads
- Input Validation: Duration, format, and size validation on both client and server
- Custom Hooks: Separation of concerns with reusable logic
- TypeScript: Full type safety with ES2022 target
- Memoization: Optimized performance with `useMemo` and `useCallback`
- Error Boundaries: Proper error handling throughout the application
- Clean Code: Reduced component complexity (App.tsx: 286 → 198 lines)
The Express server injects runtime environment variables into index.html:

    window.__ENV__ = {
      VITE_OPEN_AI_TTS_ENDPOINT: "...",
      VITE_OPEN_AI_TTS_MODEL: "...",
      VITE_OPEN_AI_TTS_VOICES: "..."
    };

- Multi-stage build for optimized production image
- Node.js 18 Alpine base image
- Nginx Proxy Manager compatible
- Uses external `shared_net` network
- No host ports exposed (internal-only)
- Health check endpoint at `/health`
Runtime environment variables are passed through Docker environment:

    environment:
      - VITE_OPEN_AI_TTS_ENDPOINT=${TTS_ENDPOINT}
      - VITE_OPEN_AI_TTS_MODEL=${TTS_MODEL}

The voice creation feature is documented in ADD_VOICE.md. This specification covers:
- Database Schema: Supabase tables, triggers, and RLS policies
- API Endpoints: Bridge service endpoints for voice management
- Frontend Components: React components for recording, uploading, and approval
- Security: Authentication, authorization, and input validation
- Migration: Steps to upgrade from static to dynamic voices
- Database Setup:

      # Run the schema in Supabase SQL Editor
      cat supabase/sql/001_schema.sql | pbcopy  # Copy to clipboard (macOS)
      # Paste and execute in Supabase dashboard

- Environment Variables:

      cp .env.example .env.local
      # Fill in your Supabase and S3 credentials

- Test Workflow:
  - Start the bridge service separately (see ADD_VOICE.md)
  - Run `npm run dev` for the frontend
  - Test auth flow with different user roles
  - Verify voice creation and approval pipeline
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
When contributing to the voice creation feature:
- Follow the specification in `ADD_VOICE.md`
- Test all user roles and permissions
- Verify audio processing and storage
- Ensure proper error handling and validation
- Update documentation as needed
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI for the TTS API specification
- Material-UI for the excellent React component library
- Vite for the fast development tooling
- IndexedDB for client-side persistence