AI-powered concert audio analysis system that uses the Audio Spectrogram Transformer (AST) in PyTorch for automatic classification and segmentation of philharmonic concert recordings.
🎼 Audio Classification: Automatically classifies audio into 5 categories:
- 🎵 MUSIC - orchestral music
- 👏 APPLAUSE - audience applause
- 🗣️ SPEECH - announcements, speeches
- 👥 PUBLIC - audience noise, intermission
- 🎻 TUNING - instrument tuning

🎨 Visual Waveform Editor: DAW-style interface for reviewing and correcting predictions

🤖 Self-Improving ML Loop: Export corrected segments → retrain model → improved accuracy

📊 Model Management: Train, compare, and switch between models with measured accuracy

🔍 Uncertainty Review: Filter low-confidence predictions for manual review

⚡ GPU Accelerated: CUDA support for fast training and inference
Main application views:
- Overview of available models, training data statistics, and recent analyses
- Browse unsorted recordings and organize them by date using ID3 tags
- Automatic sorting of concert recordings by date
- Visual waveform editor with color-coded segments for each class
- Edit predictions, adjust boundaries, and export corrected segments
- Train new models, compare accuracy, and activate the best-performing models
- Review low-confidence predictions for manual verification and export
```
filharmonia-ai/
├── backend/                  # FastAPI + PyTorch backend
│   ├── app/
│   │   ├── api/v1/           # REST API endpoints
│   │   ├── services/         # Core business logic
│   │   │   ├── ast_training.py    # Model training service
│   │   │   ├── ast_inference.py   # Model inference service
│   │   │   └── analyze.py         # Audio analysis pipeline
│   │   └── config.py         # Settings and paths
│   ├── pytorch_dataset.py    # Custom PyTorch dataset
│   └── requirements.txt
│
├── frontend/                 # React + TypeScript + Vite
│   ├── src/
│   │   ├── components/       # UI components
│   │   ├── pages/            # Page views
│   │   └── api/              # API client
│   └── package.json
│
└── docs/                     # Screenshots and documentation
```
Run the automated setup script to install all dependencies:
Windows:

```bat
setup.bat
```

Linux/Mac:

```bash
chmod +x setup.sh
./setup.sh
```

The setup script will automatically:
- ✅ Check Python and Node.js installation
- ✅ Create Python virtual environment
- ✅ Install PyTorch with CUDA support
- ✅ Install all backend and frontend dependencies
- ✅ Verify installation is complete
- ✅ Create configuration file from template
Note: Don't run `pip install -r requirements.txt` directly; PyTorch with CUDA support requires special handling, which the setup script performs automatically.
After installation, start both servers:
Windows:

```bat
start.bat
```

Linux/Mac:

```bash
./start.sh

# To stop servers:
./stop.sh
```

The application will be available at:
- Frontend: http://localhost:5173
- Backend API: http://localhost:8000

Requirements:
- Python 3.11+
- Node.js 18+
- NVIDIA GPU (optional but recommended for training)
- CUDA 12.x (if using GPU)
After installation, configure your data directory (optional):
- Edit the `.env` file (created by the setup script)
- Set `FILHARMONIA_BASE_DIR` to your desired location
- If not set, it defaults to `project_root/FILHARMONIA_DATA/`
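For example, a minimal `.env` might look like this (the path below is purely illustrative):

```
# .env - point FILHARMONIA_BASE_DIR at your storage location
FILHARMONIA_BASE_DIR=/data/filharmonia
```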
The system uses the Audio Spectrogram Transformer (AST) from MIT:
- Pre-trained on AudioSet-10M
- Fine-tuned on concert recordings
- ~86M parameters
- Training time: ~4h on RTX 3080 Ti
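For reference, loading an AudioSet-pretrained AST and re-heading it for the five concert classes can be sketched with the Transformers library. The checkpoint ID and label order below are illustrative assumptions, not necessarily what this repo's training service uses:

```python
# Sketch: load an AudioSet-pretrained AST and swap its 527-class head
# for a 5-class head covering the concert categories.
from transformers import ASTFeatureExtractor, ASTForAudioClassification

MODEL_ID = "MIT/ast-finetuned-audioset-10-10-0.4593"  # public AST checkpoint
LABELS = ["MUSIC", "APPLAUSE", "SPEECH", "PUBLIC", "TUNING"]  # assumed order

extractor = ASTFeatureExtractor.from_pretrained(MODEL_ID)
model = ASTForAudioClassification.from_pretrained(
    MODEL_ID,
    num_labels=len(LABELS),
    id2label=dict(enumerate(LABELS)),
    label2id={label: i for i, label in enumerate(LABELS)},
    ignore_mismatched_sizes=True,  # discard the original AudioSet head
)
```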
Training a new model:
- Prepare training data in the `TRAINING DATA/DATA/` folder (5 class subfolders; see the layout sketch after this list)
- Open the web UI → "Training" tab
- Click "Start Training"
- Monitor progress in real time
- Click "📏 Measure" to evaluate accuracy
- Click "Activate" to deploy the new model
Current best model (`ast_20251009_222204.pth`):
- Test Accuracy: 97.75%
- Per-class accuracy:
- APPLAUSE: 100%
- MUSIC: 100%
- PUBLIC: 96.2%
- SPEECH: 100%
- TUNING: 85.7%
Optional: Download pre-trained model (trained on classical concert recordings):
🤗 Hugging Face Model Hub (recommended)
Model specs:
- Architecture: Audio Spectrogram Transformer (MIT/PSLA)
- Test accuracy: 97.75%
- Training data: ~1200 min of classical concert recordings
- Size: 1.03 GB
Installation:
- Download `ast_20251009_222204.pth` from Hugging Face
- Place it at `RECOGNITION_MODELS/ast_active.pth`
- Start the backend and run an analysis
Important: This model is trained on classical philharmonic concerts. For other music genres (rock, jazz, pop), you'll need to retrain with your own data using the web UI.
Edit `backend/app/config.py` to configure:
- Training data paths
- Model save location
- Sample rate & duration
- GPU/CPU device selection
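An illustrative sketch of such a config (every setting name and default below is an assumption, not the repo's actual code):

```python
# Illustrative config sketch - setting names and defaults are assumptions.
import os
from pathlib import Path

import torch

BASE_DIR = Path(os.getenv("FILHARMONIA_BASE_DIR", "FILHARMONIA_DATA"))
TRAINING_DATA_DIR = BASE_DIR / "TRAINING DATA" / "DATA"    # 5 class subfolders
MODELS_DIR = BASE_DIR / "RECOGNITION_MODELS"               # ast_active.pth lives here
SAMPLE_RATE = 16_000                                       # AST expects 16 kHz input
SEGMENT_SECONDS = 10                                       # analysis window (assumed)
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"    # GPU/CPU selection
```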
- Sort Recordings: Organize MP3 files by date using ID3 tags
- Analyze: Process concerts through the AST model (~5 min per 1h concert; see the inference sketch after this list)
- Review: Visual waveform editor for corrections
- Export: Generate tracklists for clients
- Train: Export corrected segments → retrain model → improved accuracy
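A minimal sketch of the per-window analysis step, assuming windows are classified independently and that `ast_active.pth` holds a plain state dict; the repo's `ast_inference.py` may differ:

```python
# Sketch: classify one 10 s window with the fine-tuned AST weights.
# The 10 s window length and state-dict format are assumptions.
import torch
import torchaudio
from transformers import ASTFeatureExtractor, ASTForAudioClassification

LABELS = ["MUSIC", "APPLAUSE", "SPEECH", "PUBLIC", "TUNING"]
CKPT = "MIT/ast-finetuned-audioset-10-10-0.4593"

extractor = ASTFeatureExtractor.from_pretrained(CKPT)
model = ASTForAudioClassification.from_pretrained(
    CKPT, num_labels=len(LABELS), ignore_mismatched_sizes=True
)
model.load_state_dict(torch.load("RECOGNITION_MODELS/ast_active.pth", map_location="cpu"))
model.eval()

waveform, sr = torchaudio.load("concert.mp3")
mono = torchaudio.functional.resample(waveform.mean(dim=0), sr, 16_000)

window = mono[: 10 * 16_000]  # first 10 s window
inputs = extractor(window.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)[0]

confidence, class_id = probs.max(dim=0)
# Low-confidence windows feed the Uncertainty Review queue
print(LABELS[class_id.item()], f"{confidence.item():.2%}")
```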
Backend:
- FastAPI (REST API)
- PyTorch + torchaudio (ML)
- HuggingFace Transformers (AST model)
- scikit-learn (dataset splitting)
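The scikit-learn dependency suggests a stratified train/test split along these lines (the folder layout, file extension, and 80/20 ratio are assumptions):

```python
# Sketch of a stratified split over the class folders, so each of the
# 5 classes is represented in the test set.
from pathlib import Path
from sklearn.model_selection import train_test_split

DATA_DIR = Path("TRAINING DATA/DATA")
files = [p for d in DATA_DIR.iterdir() if d.is_dir() for p in d.glob("*.wav")]
labels = [p.parent.name for p in files]  # class name = parent folder name

train_files, test_files, y_train, y_test = train_test_split(
    files, labels, test_size=0.2, stratify=labels, random_state=42
)
```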
Frontend:
- React 18 + TypeScript
- Vite (build tool)
- TanStack Query (data fetching)
- Recharts (visualizations)
- Tailwind CSS (styling)
- Development Guide - Setup, architecture, and development workflow
- API Reference - Complete REST API documentation
Contributions are welcome! Feel free to open issues or submit pull requests.
MIT License - see LICENSE file for details.
- ✅ MVP completed (Oct 2025)
- ✅ Migrated from Keras CNN to PyTorch AST
- ✅ Achieved 97.75% test accuracy
- ✅ Reduced monthly processing time from 4-6h to ~30 min
- ✅ Implemented self-improving ML loop
Last Updated: December 2025
Status: 🚀 Production Ready (MVP)