🎡 Filharmonia AI

AI-powered concert audio analysis system that uses a PyTorch Audio Spectrogram Transformer (AST) for automatic classification and segmentation of philharmonic concert recordings.

πŸ“‹ Features

  • 🎼 Audio Classification: Automatically classifies audio into 5 categories:

    • 🎡 MUSIC - orchestral music
    • πŸ‘ APPLAUSE - audience applause
    • πŸ—£οΈ SPEECH - announcements, speeches
    • πŸ‘₯ PUBLIC - audience noise, intermission
    • 🎻 TUNING - instrument tuning
  • 🎨 Visual Waveform Editor: DAW-style interface for reviewing and correcting predictions

  • πŸ€– Self-Improving ML Loop: Export corrected segments β†’ retrain model β†’ improved accuracy

  • πŸ“Š Model Management: Train, compare, and switch between models with measured accuracy

  • πŸ“ˆ Uncertainty Review: Filter low-confidence predictions for manual review

  • ⚑ GPU Accelerated: CUDA support for fast training and inference
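The waveform editor and the uncertainty-review view both operate on lists of predicted segments. As a hypothetical illustration (the real segment schema and confidence threshold live in the backend and are not documented here), filtering predictions for manual review might look like:

```python
# Hypothetical sketch of the uncertainty-review filter. The dict layout and
# the 0.8 threshold are illustrative assumptions, not taken from the codebase.
def needs_review(segments, threshold=0.8):
    """Return the segments whose top-class confidence falls below threshold."""
    return [s for s in segments if s["confidence"] < threshold]

segments = [
    {"start": 0.0, "end": 42.5, "label": "TUNING", "confidence": 0.62},
    {"start": 42.5, "end": 1310.0, "label": "MUSIC", "confidence": 0.99},
]
print(needs_review(segments))  # only the low-confidence TUNING segment remains
```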

πŸ“Έ Screenshots

Main Dashboard

Overview of available models, training data statistics, and recent analyses.

File Browser & Sorting

File Browser: browse unsorted recordings and organize them by date using ID3 tags.

Sort Recordings: automatic sorting of concert recordings by date.

Waveform Editor

CSV Waveform Editor: visual waveform editor with color-coded segments for each class.

Detailed View: edit predictions, adjust boundaries, and export corrected segments.

Model Management

Model Versioning: train new models, compare accuracy, and activate the best-performing model.

Uncertainty Review (Active Learning)

Review low-confidence predictions for manual verification and export.

πŸ—οΈ Architecture

```
filharmonia-ai/
β”œβ”€β”€ backend/              # FastAPI + PyTorch backend
β”‚   β”œβ”€β”€ app/
β”‚   β”‚   β”œβ”€β”€ api/v1/      # REST API endpoints
β”‚   β”‚   β”œβ”€β”€ services/    # Core business logic
β”‚   β”‚   β”‚   β”œβ”€β”€ ast_training.py    # Model training service
β”‚   β”‚   β”‚   β”œβ”€β”€ ast_inference.py   # Model inference service
β”‚   β”‚   β”‚   └── analyze.py         # Audio analysis pipeline
β”‚   β”‚   └── config.py    # Settings and paths
β”‚   β”œβ”€β”€ pytorch_dataset.py         # Custom PyTorch dataset
β”‚   └── requirements.txt
β”‚
β”œβ”€β”€ frontend/            # React + TypeScript + Vite
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ components/  # UI components
β”‚   β”‚   β”œβ”€β”€ pages/       # Page views
β”‚   β”‚   └── api/         # API client
β”‚   └── package.json
β”‚
└── docs/                # Screenshots and documentation
```

πŸš€ Quick Start

First-Time Installation

Run the automated setup script to install all dependencies:

Windows:

```
setup.bat
```

Linux/Mac:

```bash
chmod +x setup.sh
./setup.sh
```

The setup script will automatically:

  • βœ… Check Python and Node.js installation
  • βœ… Create Python virtual environment
  • βœ… Install PyTorch with CUDA support
  • βœ… Install all backend and frontend dependencies
  • βœ… Verify installation is complete
  • βœ… Create configuration file from template

Note: don't run `pip install -r requirements.txt` directly; installing PyTorch with CUDA support requires a special package index, which the setup script handles automatically.


Running the Application

After installation, start both servers:

Windows:

```
start.bat
```

Linux/Mac:

```bash
./start.sh

# To stop servers:
./stop.sh
```

Once both servers are running, the application is available locally in your browser.


Prerequisites

  • Python 3.11+
  • Node.js 18+
  • NVIDIA GPU (optional but recommended for training)
  • CUDA 12.x (if using GPU)

Configuration

After installation, configure your data directory (optional):

  1. Edit .env file (created by setup script)
  2. Set FILHARMONIA_BASE_DIR to your desired location
  3. If not set, defaults to project_root/FILHARMONIA_DATA/

πŸ“Š Model Training

The system uses Audio Spectrogram Transformer (AST) from MIT:

  • Pre-trained on AudioSet-10M
  • Fine-tuned on concert recordings
  • ~86M parameters
  • Training time: ~4h on RTX 3080 Ti

Training new model:

  1. Prepare training data in TRAINING DATA/DATA/ folder (5 class subfolders)
  2. Open web UI β†’ "Training" tab
  3. Click "Start Training"
  4. Monitor progress in real-time
  5. Click "πŸ“Š Measure" to evaluate accuracy
  6. Click "Activate" to deploy new model

🎯 Performance

Current best model (ast_20251009_222204.pth):

  • Test Accuracy: 97.75%
  • Per-class accuracy:
    • APPLAUSE: 100%
    • MUSIC: 100%
    • PUBLIC: 96.2%
    • SPEECH: 100%
    • TUNING: 85.7%
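For reference, per-class accuracy of this kind can be recomputed from any labeled test split with a few lines of stdlib Python:

```python
# Generic per-class accuracy computation (not the project's evaluation code).
from collections import defaultdict

def per_class_accuracy(y_true, y_pred):
    """Fraction of correct predictions within each true class."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return {cls: correct[cls] / total[cls] for cls in total}

print(per_class_accuracy(
    ["MUSIC", "MUSIC", "TUNING", "TUNING", "TUNING"],
    ["MUSIC", "MUSIC", "TUNING", "TUNING", "MUSIC"],
))
```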

πŸ€– Pre-trained Model

Optional: Download pre-trained model (trained on classical concert recordings):

πŸ€— Hugging Face Model Hub (recommended)

Model specs:

  • Architecture: Audio Spectrogram Transformer (MIT/PSLA)
  • Test accuracy: 97.75%
  • Training data: ~1200 min of classical concert recordings
  • Size: 1.03 GB

Installation:

  1. Download ast_20251009_222204.pth from Hugging Face
  2. Place it at RECOGNITION_MODELS/ast_active.pth (renaming the downloaded file as needed)
  3. Start backend and run analysis

Important: This model is trained on classical philharmonic concerts. For other music genres (rock, jazz, pop), you'll need to retrain with your own data using the web UI.

πŸ”§ Configuration

Edit backend/app/config.py to configure:

  • Training data paths
  • Model save location
  • Sample rate & duration
  • GPU/CPU device selection
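As a hypothetical illustration of what such a settings object might contain (all values below are placeholders, not the project's actual defaults in `backend/app/config.py`):

```python
# Illustrative settings sketch; every value here is a placeholder assumption.
from dataclasses import dataclass

@dataclass
class Settings:
    training_data_dir: str = "TRAINING DATA/DATA"
    model_dir: str = "RECOGNITION_MODELS"
    sample_rate: int = 16_000   # AST models typically expect 16 kHz input
    clip_seconds: float = 10.0  # hypothetical analysis window length
    device: str = "cuda"        # fall back to "cpu" without a GPU

settings = Settings()
print(settings)
```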

πŸ“ Workflow

  1. Sort Recordings: Organize MP3 files by date using ID3 tags
  2. Analyze: Process concerts through AST model (~5 min per 1h concert)
  3. Review: Visual waveform editor for corrections
  4. Export: Generate tracklists for clients
  5. Train: Export corrected segments β†’ retrain model β†’ improved accuracy
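Step 2 produces per-window class predictions, which the editor then displays as contiguous segments. A sketch of that merging, assuming fixed 10-second windows (the actual window length is not documented here):

```python
# Collapse consecutive equal window labels into (start, end, label) segments.
# The 10-second window is an assumption for illustration only.
def merge_windows(labels, window_s=10.0):
    segments = []
    for i, lab in enumerate(labels):
        start = i * window_s
        if segments and segments[-1][2] == lab:
            # Extend the previous segment instead of starting a new one.
            segments[-1] = (segments[-1][0], start + window_s, lab)
        else:
            segments.append((start, start + window_s, lab))
    return segments

print(merge_windows(["TUNING", "MUSIC", "MUSIC", "APPLAUSE"]))
```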

πŸ› οΈ Tech Stack

Backend:

  • FastAPI (REST API)
  • PyTorch + torchaudio (ML)
  • HuggingFace Transformers (AST model)
  • scikit-learn (dataset splitting)

Frontend:

  • React 18 + TypeScript
  • Vite (build tool)
  • TanStack Query (data fetching)
  • Recharts (visualizations)
  • Tailwind CSS (styling)

🀝 Contributing

Contributions are welcome! Feel free to open issues or submit pull requests.

πŸ“„ License

MIT License - see LICENSE file for details.

πŸŽ‰ Achievements

  • βœ… MVP completed (Oct 2025)
  • βœ… Migrated from Keras CNN to PyTorch AST
  • βœ… Achieved 97.75% test accuracy
  • βœ… Reduced monthly processing time from 4-6h to ~30 min
  • βœ… Implemented self-improving ML loop

Last Updated: December 2025

Status: πŸš€ Production Ready (MVP)
