Open Source AI Database for Voice Agent Transcripts
A comprehensive AI-powered database and analytics platform for storing, analyzing, and extracting insights from voice agent call transcripts. Built with FastAPI, React/Next.js, and PostgreSQL, featuring advanced AI analysis, transcript enhancement, and intelligent data extraction.
Voice Summary is an open-source AI database specifically designed for voice agent transcripts and call analytics. It provides:
- AI-Powered Transcript Analysis - Advanced machine learning for call outcome analysis
- Intelligent Data Extraction - Automatic extraction of customer information and business insights
- Smart Classification & Labeling - AI-driven call categorization and sentiment analysis
- Advanced Audio Processing - Voice analysis with pause detection and conversation health scoring
- Cloud-Ready Architecture - Built with FastAPI, React, PostgreSQL, and AWS S3 integration
Perfect for call centers, voice bot developers, customer service teams, and AI researchers who need comprehensive voice analytics and transcript management.
- Features
- What you will get
- Quick Start
- Data Ingestion
- AI-Powered Data Extraction Pipeline
- API Endpoints
- Use Cases
- Project Structure
- Development
- Deployment
- Contributing
- Documentation
- Troubleshooting
- License
- AI-Powered Transcript Analysis: Advanced AI models for call outcome analysis, quality assessment, and performance evaluation
- Intelligent Data Extraction: Automatic extraction of customer information, call reasons, and business insights from transcripts
- Smart Classification & Labeling: AI-driven call categorization, sentiment analysis, and business action labeling
- Enhanced Transcript Processing: Automatic timestamp alignment, turn-by-turn conversation analysis, and transcript normalization
- Advanced Audio Analysis: AI-powered voice analysis with pause detection, speech segmentation, and conversation health scoring
- S3 Integration: Secure audio file storage with automatic format detection
- Modern Web UI: React/Next.js frontend with real-time timeline visualization
- Flexible Data Ingestion: Support for both direct API calls and Bolna platform integration
- FastAPI Backend: High-performance async API with automatic documentation
- PostgreSQL Database: Robust data storage with Alembic migrations
- Asynchronous Processing: Real-time API responses with background AI processing
- Python 3.9+
- Node.js 18+
- PostgreSQL 12+
- AWS S3 bucket (for audio storage)
- OpenAI API key (for AI-powered analysis)
# Clone the repository
git clone https://github.com/DrDroidLab/voicesummary.git
cd voicesummary
# Run the complete setup script
./setup.sh
The setup script will:
- Check all prerequisites
- Create a Python virtual environment
- Install Python dependencies
- Install Node.js dependencies
- Set up the database and run migrations
- Create convenient start scripts
If you prefer manual setup:
# 1. Clone and navigate
git clone https://github.com/DrDroidLab/voicesummary.git
cd voicesummary
# 2. Setup Python backend
uv sync
# 3. Setup frontend
cd frontend
npm install
cd ..
# 4. Configure environment
cp env.example .env
# Edit .env with your credentials
# 5. Setup database
alembic upgrade head
# Option 1: Use the generated script
./start_backend.sh
# Option 2: Manual start
uv run uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
# Option 1: Use the generated script (in new terminal)
./start_frontend.sh
# Option 2: Manual start
cd frontend
npm run dev
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- Interactive API: http://localhost:8000/redoc
Create a `.env` file in the project root:
# Database Configuration
DATABASE_URL=postgresql://username:password@localhost:5432/voicesummary
# AWS S3 Configuration
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1
S3_BUCKET_NAME=your-audio-bucket
# OpenAI API (required for AI-powered analysis)
OPENAI_API_KEY=your_openai_api_key
# Optional: Bolna API (if using Bolna platform)
BOLNA_API_KEY=your_bolna_api_key
# Create PostgreSQL database
createdb voicesummary
# Run migrations
alembic upgrade head
Voice Summary supports two main data ingestion methods for voice agent transcripts:
For full AI functionality, you need to add your OpenAI API key to the environment variables:
OPENAI_API_KEY=your_openai_api_key
What happens with OpenAI API key:
- AI Transcript Analysis: Intelligent call outcome analysis, quality assessment, and improvement areas
- Agent Performance Evaluation: AI-powered goal achievement analysis and script adherence evaluation
- Executive Summaries: Intelligent call summaries with key insights and recommendations
- Data Extraction Pipeline: Automatic extraction, classification, and labeling of call data using AI
What happens without OpenAI API key:
- Audio Analysis: Pause detection, speech segmentation, and conversation health scoring still work
- Basic Processing: Audio file processing and S3 storage still work
- No AI Transcript Analysis: Call outcome, quality metrics, and improvement areas won't be generated
- No Agent Evaluation: Performance analysis and script adherence won't be available
- No Data Extraction: Structured data extraction, classification, and labeling won't be available
Use the REST API to directly ingest voice agent call data with your own S3 storage:
# Create a new call record
curl -X POST "http://localhost:8000/api/calls/" \
-H "Content-Type: application/json" \
-d '{
"call_id": "call_123",
"transcript": {
"turns": [
{
"role": "AGENT",
"content": "Hello, how can I help you?",
"timestamp": "2025-01-01T10:00:00Z"
},
{
"role": "USER",
"content": "I need help with my order",
"timestamp": "2025-01-01T10:00:01Z"
}
]
},
"audio_file_url": "https://your-s3-bucket.s3.amazonaws.com/audio/call_123.mp3",
"timestamp": "2025-01-01T10:00:00Z"
}'
Benefits:
- Full control over S3 storage
- Custom audio processing pipelines
- Integration with any voice agent platform
- Real-time data ingestion
- AI-powered analysis and insights
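The same ingestion request can be made from Python. This is a minimal sketch using only the standard library; it mirrors the payload from the curl example above and assumes the local dev server from the Quick Start (the `build_call_record` and `ingest_call` helpers are illustrative, not part of the project).

```python
import json
import urllib.request

def build_call_record(call_id: str, turns: list, audio_url: str, timestamp: str) -> dict:
    """Assemble a call record matching the POST /api/calls/ payload shown above."""
    return {
        "call_id": call_id,
        "transcript": {"turns": turns},
        "audio_file_url": audio_url,
        "timestamp": timestamp,
    }

def ingest_call(record: dict, base_url: str = "http://localhost:8000") -> int:
    """POST the record to the calls endpoint and return the HTTP status code."""
    req = urllib.request.Request(
        f"{base_url}/api/calls/",
        data=json.dumps(record).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

record = build_call_record(
    call_id="call_123",
    turns=[
        {"role": "AGENT", "content": "Hello, how can I help you?",
         "timestamp": "2025-01-01T10:00:00Z"},
        {"role": "USER", "content": "I need help with my order",
         "timestamp": "2025-01-01T10:00:01Z"},
    ],
    audio_url="https://your-s3-bucket.s3.amazonaws.com/audio/call_123.mp3",
    timestamp="2025-01-01T10:00:00Z",
)
# ingest_call(record)  # requires the backend from the Quick Start to be running
```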
Use the built-in Bolna integration for automatic voice agent call processing:
# Run the Bolna fetcher
python app/integrations/fetch_bolna_calls_simple.py
Benefits:
- Automatic call discovery and processing
- Built-in audio analysis and enhancement
- Transcript normalization and timestamp alignment
- Seamless S3 upload and storage
- AI-powered insights and analysis
Voice Summary includes a sophisticated AI-driven data extraction pipeline that automatically processes voice agent call transcripts to extract structured information, classify calls, and apply relevant business labels.
- Customer Information: Name, email, phone, account number, customer ID
- Product Mentions: Products and services discussed during the call
- Call Reasons: Primary and secondary reasons for the call
- Resolution Info: How the call was resolved and follow-up requirements
- Call Category: customer_support, technical_support, billing_inquiry, sales_inquiry, etc.
- Sentiment Analysis: positive, neutral, negative, mixed
- Complexity Level: simple, moderate, complex, very_complex
- Urgency: urgent, escalation_required, follow_up_needed
- Business Actions: satisfaction_survey, training_opportunity, upsell_opportunity
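The classification vocabulary above can be mirrored as enums when consuming the API from Python, so raw model output is validated against the taxonomy. A sketch (the class and function names are hypothetical, and the category list is non-exhaustive):

```python
from enum import Enum

class CallCategory(str, Enum):
    """Call categories listed above (non-exhaustive)."""
    CUSTOMER_SUPPORT = "customer_support"
    TECHNICAL_SUPPORT = "technical_support"
    BILLING_INQUIRY = "billing_inquiry"
    SALES_INQUIRY = "sales_inquiry"

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"
    MIXED = "mixed"

class Complexity(str, Enum):
    SIMPLE = "simple"
    MODERATE = "moderate"
    COMPLEX = "complex"
    VERY_COMPLEX = "very_complex"

def parse_sentiment(raw: str) -> Sentiment:
    """Normalize a model's raw sentiment string and validate it against the enum."""
    return Sentiment(raw.strip().lower())
```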
The AI pipeline is configured through `config/agent_prompts.yaml`. You can easily add new extraction, classification, or labeling prompts:
# Add new extraction prompt
extraction:
call_duration:
prompt: |
Extract call duration and timing information from this transcript: {transcript}
Return only a JSON object with: total_duration_seconds, agent_speaking_time, customer_speaking_time
# Add new classification prompt
classification:
language_detected:
prompt: |
Detect the primary language used in this call.
Categories: english, spanish, french, other
Return only the category name.
Transcript: {transcript}
# Add new labeling prompt
labeling:
- label: "high_value_customer"
prompt: |
Determine if this call involves a high-value customer.
High-value indicators: premium account, large transactions, VIP status
Return only "yes" or "no".
Transcript: {transcript}
The AI pipeline runs automatically when calls are created via the API.
# Process a specific call through AI pipeline
curl -X POST "http://localhost:8000/api/calls/{call_id}/process-data-pipeline" \
-H "Content-Type: application/json" \
-d '{"call_id": "call_123", "force_reprocess": false}'
# Get AI-extracted data
curl "http://localhost:8000/api/calls/{call_id}/extracted-data"
# Check AI processing status
curl "http://localhost:8000/api/calls/{call_id}/extracted-data/status"
- Navigate to any call detail page
- Click on the "Extracted Data" tab
- View AI processing status, extracted information, classifications, and labels
- Reprocess data if needed
# Process all existing calls through AI pipeline
python test_all_calls_pipeline.py
# Force reprocess all calls (including already processed ones)
python test_all_calls_pipeline.py --force
# Show help
python test_all_calls_pipeline.py --help
{
"extraction_data": {
"customer_info": {
"customer_name": "John Smith",
"account_number": "123456789",
"customer_email": null
},
"call_reason": {
"primary_reason": "billing inquiry",
"urgency_level": "medium",
"call_type": "inquiry"
}
},
"classification_data": {
"call_category": "billing_inquiry",
"sentiment_analysis": "positive",
"complexity_level": "simple"
},
"labeling_data": {
"urgent": false,
"satisfaction_survey": true,
"follow_up_needed": false
}
}
- Add New AI Prompts: Simply add new sections to `config/agent_prompts.yaml`
- No Code Changes: New AI prompts are automatically picked up by the pipeline
- Error Handling: Missing prompts are gracefully skipped
- Parallel Processing: All AI prompts run simultaneously for maximum efficiency
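The parallel-prompt idea can be sketched with `concurrent.futures`. This is not the project's actual implementation: the `PROMPTS` dict stands in for the parsed `agent_prompts.yaml`, and `run_llm` is a stub for the real OpenAI call.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the extraction section of config/agent_prompts.yaml.
PROMPTS = {
    "customer_info": "Extract customer information from this transcript: {transcript}",
    "call_reason": "Extract the primary reason for the call: {transcript}",
}

def run_llm(prompt: str) -> str:
    """Stub for the real OpenAI call; returns a fake result for demonstration."""
    return f"result for {len(prompt)} chars"

def run_pipeline(transcript: str) -> dict:
    """Fill each prompt template, then run all prompts concurrently."""
    filled = {name: tpl.format(transcript=transcript) for name, tpl in PROMPTS.items()}
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(run_llm, p) for name, p in filled.items()}
        return {name: fut.result() for name, fut in futures.items()}

results = run_pipeline("AGENT: Hello. USER: I need help with my order.")
```

Because LLM calls are I/O-bound, a thread pool (or asyncio) is enough to get real parallelism here.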
```
voicesummary/
├── app/                                 # Backend application
│   ├── api/                             # API endpoints
│   │   └── calls.py                     # Call management API
│   ├── integrations/                    # External platform integrations
│   │   └── fetch_bolna_calls_simple.py  # Bolna integration
│   ├── utils/                           # Utility modules
│   │   ├── audio_processor.py           # Audio analysis & processing
│   │   ├── improved_voice_analyzer.py   # AI voice analysis
│   │   ├── call_data_pipeline.py        # AI data extraction pipeline
│   │   └── s3.py                        # S3 operations
│   ├── models.py                        # Database models
│   ├── schemas.py                       # API schemas
│   └── main.py                          # FastAPI application
├── frontend/                            # React/Next.js frontend
│   ├── app/                             # Next.js app directory
│   ├── components/                      # React components
│   │   ├── AudioPlayer.tsx              # Audio playback
│   │   ├── EnhancedTimeline.tsx         # Timeline visualization
│   │   ├── ExtractedData.tsx            # AI data extraction display
│   │   └── TranscriptViewer.tsx         # Transcript display
│   └── types/                           # TypeScript type definitions
├── config/                              # Configuration files
│   └── agent_prompts.yaml               # AI data extraction pipeline prompts
├── alembic/                             # Database migrations
├── setup.sh                             # Complete setup script
├── start_backend.sh                     # Backend start script
├── start_frontend.sh                    # Frontend start script
└── requirements.txt                     # Python dependencies
```
| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/calls/` | Create new call record |
| GET | `/api/calls/` | List all calls (paginated) |
| GET | `/api/calls/{call_id}` | Get call details |
| GET | `/api/calls/{call_id}/audio` | Download audio file |
| GET | `/api/calls/{call_id}/transcript` | Get transcript JSON |
| PUT | `/api/calls/{call_id}` | Update call record |
| DELETE | `/api/calls/{call_id}` | Delete call record |
| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/calls/{call_id}/process-audio` | Process audio file |
| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/calls/{call_id}/process-data-pipeline` | Process call through AI data extraction pipeline |
| GET | `/api/calls/{call_id}/extracted-data` | Get AI-extracted data for a call |
| GET | `/api/calls/{call_id}/extracted-data/status` | Get AI processing status of extracted data |
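Since the pipeline runs in the background, a client typically polls the status endpoint before fetching results. A stdlib-only sketch of that pattern against the endpoints in the table above (the `status` field and its `completed` value are assumptions; check the live schema at `/docs`):

```python
import json
import time
import urllib.request

BASE = "http://localhost:8000"

def status_path(call_id: str) -> str:
    return f"/api/calls/{call_id}/extracted-data/status"

def data_path(call_id: str) -> str:
    return f"/api/calls/{call_id}/extracted-data"

def get_json(path: str) -> dict:
    with urllib.request.urlopen(f"{BASE}{path}") as resp:
        return json.loads(resp.read())

def wait_for_extraction(call_id: str, timeout_s: float = 60.0, poll_s: float = 2.0) -> dict:
    """Poll the status endpoint, then fetch extracted data once processing finishes.

    Assumes the status payload carries a 'status' field that becomes 'completed';
    the actual schema may differ.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_json(status_path(call_id)).get("status") == "completed":
            return get_json(data_path(call_id))
        time.sleep(poll_s)
    raise TimeoutError(f"extraction for {call_id} did not finish in {timeout_s}s")

# wait_for_extraction("call_123")  # requires the backend from the Quick Start
```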
- AI-Powered Call Quality Assessment: Intelligent analysis of conversation health, pause patterns, and interruption rates
- Agent Performance Monitoring: AI-driven tracking of agent response times and conversation flow
- Issue Detection: AI identification of call termination problems and audio quality issues
- Intelligent Call Classification: Automatic categorization of calls by type, sentiment, and complexity
- Business Intelligence Extraction: AI-powered extraction of customer information and call reasons
- AI Speech Analysis: Advanced conversation pattern analysis and speech characteristics
- Machine Learning Training: Use AI-enhanced transcripts for training ML models
- Quality Assurance: AI-powered monitoring and improvement of voice agent performance
- AI Call Analytics: Generate intelligent reports on call volumes, durations, and outcomes
- Customer Experience Analysis: AI-powered conversation sentiment and satisfaction metrics
- Operational Insights: AI identification of bottlenecks and optimization opportunities
- Data Extraction: Automatically extract structured data from call transcripts using AI
- Smart Labeling: Apply AI-driven business-relevant labels for follow-up actions
# Run with auto-reload
uv run uvicorn app.main:app --reload
# Run tests
pytest
# Format code
black .
isort .
cd frontend
# Install dependencies
npm install
# Start development server
npm run dev
# Build for production
npm run build
# Create new migration
alembic revision --autogenerate -m "Description of changes"
# Apply migrations
alembic upgrade head
# Rollback migration
alembic downgrade -1
# Install production dependencies
uv sync --frozen
# Set production environment variables
export ENVIRONMENT=production
# Run production server
gunicorn app.main:app -w 4 -k uvicorn.workers.UvicornWorker
# Build frontend
cd frontend && npm run build
# Required
DATABASE_URL=postgresql://user:pass@host:5432/db
AWS_ACCESS_KEY_ID=your_key
AWS_SECRET_ACCESS_KEY=your_secret
S3_BUCKET_NAME=your_bucket
# OpenAI API (required for AI-powered analysis)
OPENAI_API_KEY=your_openai_api_key
# Optional
ENVIRONMENT=production
DEBUG=false
LOG_LEVEL=info
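A deployment can fail fast when required variables are missing. This is a hedged sketch, not the project's actual startup check; the variable lists mirror the "Required" and OpenAI entries above, and `missing_vars` is a hypothetical helper.

```python
import os

# Variables marked required above; OPENAI_API_KEY is needed only for AI analysis.
REQUIRED = ["DATABASE_URL", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "S3_BUCKET_NAME"]
AI_REQUIRED = ["OPENAI_API_KEY"]

def missing_vars(env: dict, include_ai: bool = True) -> list:
    """Return the required variables that are absent or empty in the given environment."""
    names = REQUIRED + (AI_REQUIRED if include_ai else [])
    return [name for name in names if not env.get(name)]

# At startup, report missing core configuration before serving traffic:
problems = missing_vars(dict(os.environ), include_ai=False)
if problems:
    print(f"Missing required environment variables: {', '.join(problems)}")
```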
We welcome contributions! Here's how to get started:
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Make your changes and add tests
- Commit your changes: `git commit -m 'Add amazing feature'`
- Push to the branch: `git push origin feature/amazing-feature`
- Open a Pull Request
- Follow PEP 8 for Python code
- Use TypeScript for frontend code
- Add tests for new features
- Update documentation as needed
- Ensure all tests pass before submitting
- API Reference: http://localhost:8000/docs (when running)
- AI Audio Processing: See `app/utils/audio_processor.py`
- AI Data Extraction: See `app/utils/call_data_pipeline.py`
- Frontend Components: See `frontend/components/`
- Database Models: See `app/models.py`
# Check Python version
python3 --version # Should be 3.9+
# Verify uv installation
uv --version # Should show uv version
# Check dependencies
pip list | grep fastapi
# Check Node.js version
node --version # Should be 18+
# Clear npm cache
npm cache clean --force
# Reinstall dependencies
rm -rf node_modules package-lock.json
npm install
# Test PostgreSQL connection
psql $DATABASE_URL -c "SELECT 1"
# Check if database exists
psql -l | grep voicesummary
# Run migrations
alembic upgrade head
# Test AWS credentials
aws sts get-caller-identity
# Verify bucket permissions
aws s3 ls s3://your-bucket-name
# Test OpenAI API key
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
https://api.openai.com/v1/models
# Check API key in environment
echo $OPENAI_API_KEY
- Issues: Create a GitHub issue with detailed error information
- Discussions: Use GitHub Discussions for questions and ideas
- Contact: Reach out to the author at [email protected]
This project is licensed under the MIT License - see the LICENSE file for details.
- FastAPI for the excellent web framework
- Next.js for the powerful React framework
- PostgreSQL for reliable data storage
- AWS S3 for scalable file storage
- OpenAI for AI-powered analysis capabilities
- Librosa for audio processing capabilities
If you find this project useful, please give it a ⭐ on GitHub!
Voice Analytics: AI-powered call analysis, transcript processing, voice agent analytics
Call Center Solutions: Customer service analytics, call quality assessment, performance monitoring
AI Database: Machine learning, natural language processing, intelligent data extraction
Voice Bot Analytics: Conversation intelligence, speech analytics, chatbot performance
Open Source: Free, community-driven, extensible voice analytics platform
FastAPI: High-performance Python web framework for voice data APIs
React/Next.js: Modern frontend for voice analytics dashboard
PostgreSQL: Robust database for voice transcript storage and analysis
OpenAI Integration: Advanced AI models for transcript analysis and insights
Built with ❤️ for the voice agent community
Voice Summary - The Open Source AI Database for Voice Agent Transcripts