louisaberdeen/signal-MAE

This is my first truly vibe-coded project, so use at your own risk! I have to say Opus 4.5 has the sauce.

FiftyOne Visualisation

FiftyOne is a very neat tool for visualising all sorts of datasets. Here, for example, we have the ESC-50 dataset of environmental sounds (helicopters, dogs, etc.). Besides being able to easily view the spectrograms of our audio files, we can incorporate embeddings and similarity search directly into our filtering workflows. To demonstrate this, I quickly trained an MAE model based on AudioMAE++ (https://arxiv.org/abs/2507.10464) to generate embeddings for this project.

[Image: Standard view]

Simply select a sound file of interest and, like magic, instantly find files with similar acoustic structure and texture. WOW!

[Image: Similarity view]

I simulated the sound files being collected across Dartmoor, a national park near where I grew up, to show how geospatial analysis could be used to find trends.

[Image: Map view]

We can use the embeddings to justify intuitions we have about relationships between sounds. For example, in this plot we can see that clapping and a helicopter are more similar to each other than either is to a rooster or a chicken.

[Image: Embedding view]

AudioMAE++ Framework

A modular audio/signal machine learning framework with a plugin-based architecture for training masked autoencoders. It supports ESC-50 environmental sound classification and can be extended to RF signals (RadioML).

Features

  • Plugin Registry System: Decorator-based registration for models, data loaders, and transforms
  • Self-Contained Notebooks: Generate portable notebooks for Google Colab
  • Automatic Test Generation: Verify plugin interface compliance
  • FiftyOne Integration: Visualize embeddings with similarity search and UMAP/t-SNE

Installation

Using uv (Recommended)

uv is a fast Python package installer.

# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create virtual environment and install dependencies
uv venv .venv
source .venv/bin/activate
uv pip install -e ".[all]"

Using pip

python -m venv .venv
source .venv/bin/activate
pip install -e ".[all]"

Dependency Groups

Group          Command                              Includes
Base           pip install -e .                     torch, numpy, pandas, einops, tqdm, pillow
Audio          pip install -e ".[audio]"            + librosa, scipy
Visualization  pip install -e ".[visualization]"    + fiftyone, umap-learn, matplotlib
Training       pip install -e ".[training]"         + mlflow
Dev            pip install -e ".[dev]"              + pytest, pytest-cov
All            pip install -e ".[all]"              Everything

Quick Start

# Activate environment
source .venv/bin/activate

# Verify plugins are registered
python -c "from src import model_registry; print(model_registry.list())"
# Output: ['audiomae++', 'baseline']

Project Structure

.
├── src/                        # Core framework
│   ├── registry.py            # PluginRegistry class
│   ├── config.py              # Config dataclass
│   ├── models/                # Model plugins
│   │   ├── audiomae.py       # AudioMAE++ implementation
│   │   ├── baseline.py       # Baseline MAE
│   │   └── classifier.py     # Classification wrapper
│   ├── data/                  # Data loader plugins
│   │   ├── esc50.py          # ESC-50 loader
│   │   └── custom.py         # Generic + RF loaders
│   ├── transforms/            # Transform plugins
│   │   ├── audio.py          # Audio spectrograms
│   │   └── rf.py             # RF spectrograms
│   ├── training/              # Loss functions
│   └── embeddings/            # Embedding utilities
├── tests/                     # Test suite
│   ├── generate_tests.py     # Auto test generation
│   └── generated/            # Generated tests (gitignored)
├── notebooks/                 # Notebook generation
│   ├── generate.py           # NotebookGenerator
│   └── generated/            # Generated notebooks (gitignored)
├── data/                      # Datasets
│   └── ESC-50-master/        # ESC-50 dataset
└── checkpoints/               # Model checkpoints

Usage

Using the Plugin Registry

from pathlib import Path

from src import model_registry, data_loader_registry, transform_registry
from src.config import Config

# Create a model
config = Config(img_size=224, patch_size=16, embed_dim=768)
model = model_registry.create("audiomae++", config)

# Create a data loader
loader = data_loader_registry.create("esc50", Path("data/ESC-50-master"))
metadata = loader.load_metadata()

# Create a transform
transform = transform_registry.create("audio_spectrogram", img_size=224)
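
To make the registry calls above less magical, here is a minimal sketch of how a decorator-based registry with `register`, `create` and `list` could work. It is not the code in src/registry.py: the class body, the `plugin_version` attribute, `demo_registry` and `ToyModel` are all assumptions made for illustration.

```python
class PluginRegistry:
    """Toy registry: maps string keys to plugin classes (illustrative only)."""

    def __init__(self, kind):
        self.kind = kind
        self._plugins = {}

    def register(self, key, version="1.0"):
        """Return a class decorator that stores the class under `key`."""
        def decorator(cls):
            if key in self._plugins:
                raise KeyError(f"{self.kind} plugin {key!r} is already registered")
            cls.plugin_version = version  # hypothetical attribute
            self._plugins[key] = cls
            return cls  # the class itself is returned unchanged
        return decorator

    def create(self, key, *args, **kwargs):
        """Instantiate the plugin registered under `key`."""
        if key not in self._plugins:
            raise KeyError(
                f"unknown {self.kind} plugin {key!r}; available: {self.list()}"
            )
        return self._plugins[key](*args, **kwargs)

    def list(self):
        """Return the registered keys in sorted order."""
        return sorted(self._plugins)


demo_registry = PluginRegistry("model")


@demo_registry.register("toy-model", version="0.1")
class ToyModel:
    def __init__(self, embed_dim=768):
        self.embed_dim = embed_dim


model = demo_registry.create("toy-model", embed_dim=128)
print(demo_registry.list(), model.embed_dim)
```

In a design like this, registration happens as a side effect of the decorator running, so a plugin only appears in the registry once the module that defines it has been imported.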

Extract Embeddings

import torch
from src import model_registry
from src.config import Config

config = Config()
model = model_registry.create("audiomae++", config)
model.eval()

# Input: batch of spectrograms [B, 3, 224, 224]
x = torch.randn(4, 3, 224, 224)

# Get embeddings
embedding = model.get_embedding(x, pooling_mode="mean")  # [4, 768]
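
As a rough picture of what `pooling_mode="mean"` suggests, the sketch below averages a sequence of patch-token vectors into a single embedding. It assumes the encoder emits one vector per patch, which the README does not spell out. `mean_pool` is a hypothetical helper written in plain Python rather than torch, so it is not the repository's implementation.

```python
from statistics import fmean


def mean_pool(tokens):
    """Average a [num_tokens, dim] nested list into one [dim] vector."""
    if not tokens:
        raise ValueError("need at least one token to pool")
    dim = len(tokens[0])
    return [fmean(tok[d] for tok in tokens) for d in range(dim)]


# Three toy "patch tokens" with two features each
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled = mean_pool(tokens)
print(pooled)
```

With a real batch the same reduction would run over the token axis of a [B, N, 768] tensor, giving the [B, 768] shape noted above.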

Fine-tune for Classification

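import torch
from src.models.classifier import AudioMAEClassifier

# Wrap pretrained model for classification
# (`model` is the AudioMAE++ model created in "Extract Embeddings" above)
classifier = AudioMAEClassifier(model, num_classes=50, freeze_encoder=True)

spectrograms = torch.randn(4, 3, 224, 224)  # placeholder batch [B, 3, 224, 224]
logits = classifier(spectrograms)  # [B, 50]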

Commands

Generate Tests

After adding new plugins, generate tests to verify interface compliance:

python tests/generate_tests.py

This creates test files in tests/generated/:

  • test_models_interface.py - Model ABC compliance
  • test_data_loaders_interface.py - Data loader ABC compliance
  • test_transforms_interface.py - Transform ABC compliance
  • test_model_architectures.py - Various input size compatibility
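
The idea behind an "ABC compliance" check can be sketched with the standard library alone. This is an assumption about the kind of check the generated tests perform, not a copy of them. The `BaseTransform` below is a hypothetical stand-in for the one in src/transforms/base.py, and `missing_members` is an illustrative helper.

```python
from abc import ABC, abstractmethod


class BaseTransform(ABC):
    """Hypothetical stand-in for the framework's transform base class."""

    @abstractmethod
    def __call__(self, signal, sample_rate):
        ...

    @property
    @abstractmethod
    def output_channels(self):
        ...


class GoodTransform(BaseTransform):
    def __call__(self, signal, sample_rate):
        return signal

    @property
    def output_channels(self):
        return 3


class IncompleteTransform(BaseTransform):
    # Forgets to implement `output_channels`
    def __call__(self, signal, sample_rate):
        return signal


def missing_members(plugin_cls):
    """Names of abstract members the plugin class has not implemented."""
    return sorted(getattr(plugin_cls, "__abstractmethods__", frozenset()))


print(missing_members(GoodTransform))
print(missing_members(IncompleteTransform))
```

Python also refuses to instantiate a class that still has abstract members, so an incomplete plugin fails with a `TypeError` as soon as something tries to create it.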

Run Tests

# Run all generated tests
python -m pytest tests/generated/ -v

# Run specific test file
python -m pytest tests/generated/test_model_architectures.py -v

# Run with coverage
python -m pytest tests/generated/ --cov=src

Generate Training Notebooks

Create self-contained notebooks for Google Colab:

# Generate notebook for AudioMAE++ on ESC-50
python notebooks/generate.py --model audiomae++ --dataset esc50

# List available modules
python notebooks/generate.py --list-modules

Generated notebooks are saved to notebooks/generated/ and contain all code inline (no external imports required).

Adding New Plugins

New Model

# src/models/my_model.py
from src.registry import model_registry
from src.models.base import BaseAutoencoder

@model_registry.register("my-model", version="1.0")
class MyModel(BaseAutoencoder):
    def __init__(self, config):
        super().__init__()
        self.config = config
        # ... build model

    def forward_encoder(self, x, mask_ratio=0.75):
        # Return: latent, mask, ids_restore
        ...

    def get_embedding(self, x, pooling_mode="mean"):
        # Return: embedding [B, embed_dim]
        ...

    @property
    def embed_dim(self): return self.config.embed_dim

    @property
    def num_patches(self): return self.config.num_patches
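
The `latent, mask, ids_restore` return values match the convention of the original MAE reference implementation. Assuming this framework follows it, the sketch below shows how random masking could produce `mask` and `ids_restore` for a single sample. It uses plain Python lists instead of tensors, and `random_masking` is an illustrative helper rather than the repository's code.

```python
import random


def random_masking(num_patches, mask_ratio=0.75, seed=None):
    """Pick a random subset of patches to keep; mark the rest as masked.

    Returns (ids_keep, mask, ids_restore) where mask[i] is 0 for a kept
    patch and 1 for a masked one, and ids_restore[i] is the rank of
    patch i in the shuffled order.
    """
    rng = random.Random(seed)
    len_keep = int(num_patches * (1 - mask_ratio))

    # One random score per patch; sorting by score shuffles the patches
    noise = [rng.random() for _ in range(num_patches)]
    ids_shuffle = sorted(range(num_patches), key=noise.__getitem__)

    # Inverse permutation: where each original patch ended up
    ids_restore = [0] * num_patches
    for rank, patch_idx in enumerate(ids_shuffle):
        ids_restore[patch_idx] = rank

    ids_keep = ids_shuffle[:len_keep]
    mask = [0 if ids_restore[i] < len_keep else 1 for i in range(num_patches)]
    return ids_keep, mask, ids_restore


ids_keep, mask, ids_restore = random_masking(196, mask_ratio=0.75, seed=0)
print(len(ids_keep), sum(mask))
```

The encoder would then run only on the patches in `ids_keep`, and a decoder can use `ids_restore` to put mask tokens back in their original positions.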

New Data Loader

# src/data/my_dataset.py
from src.registry import data_loader_registry
from src.data.base import BaseDataLoader

@data_loader_registry.register("my-dataset")
class MyDataLoader(BaseDataLoader):
    def __init__(self, data_root):
        self.data_root = data_root

    def load_metadata(self):
        # Return DataFrame with: filepath, label, lat, lon
        ...

    def get_sample_paths(self):
        # Return list of Path objects
        ...

    def validate(self):
        # Return True if dataset is valid
        ...

New Transform

# src/transforms/my_transform.py
from src.registry import transform_registry
from src.transforms.base import BaseTransform

@transform_registry.register("my-transform")
class MyTransform(BaseTransform):
    def __init__(self, img_size=224):
        self.img_size = img_size

    def __call__(self, signal, sample_rate):
        # Return tensor [3, H, W]
        ...

    @property
    def output_channels(self): return 3

    @property
    def output_size(self): return (self.img_size, self.img_size)

After adding a plugin, import it in the corresponding __init__.py to trigger registration.

Available Plugins

Models

Key         Description
audiomae++  AudioMAE++ with Macaron blocks, SwiGLU, RoPE
baseline    Standard ViT-MAE for comparison

Data Loaders

Key     Description
esc50   ESC-50 environmental sounds (2000 clips, 50 classes)
custom  Generic audio dataset loader
rf      RF/IQ signal dataset loader

Transforms

Key                    Description
audio_spectrogram      Audio to mel spectrogram (3-channel RGB)
audio_spectrogram_raw  Audio to mel spectrogram (1-channel)
iq_spectrogram         IQ signal to spectrogram
iq_constellation       IQ signal to constellation diagram

Configuration

Key configuration options in src/config.py:

from src.config import Config

config = Config(
    # Audio processing
    sample_rate=22050,
    n_mels=128,
    audio_duration=5,

    # Model architecture
    img_size=224,
    patch_size=16,
    embed_dim=768,
    encoder_depth=12,
    decoder_depth=8,

    # Training
    mask_ratio=0.75,
    use_contrastive_loss=True,

    # Architecture variants
    use_macaron=True,    # Macaron-style blocks
    use_swiglu=True,     # SwiGLU activation
    use_rope=True,       # Rotary position embeddings
)
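
A few of these values imply others, and the arithmetic is easy to check. The sketch below derives the patch count, the number of visible patches and the samples per clip from the settings above. `DerivedSizes` is a hypothetical helper, not the `Config` dataclass in src/config.py, and it assumes square inputs split into non-overlapping square patches.

```python
from dataclasses import dataclass


@dataclass
class DerivedSizes:
    """Illustrative helper for quantities implied by the config values."""

    sample_rate: int = 22050
    audio_duration: int = 5
    img_size: int = 224
    patch_size: int = 16
    mask_ratio: float = 0.75

    @property
    def samples_per_clip(self):
        # Raw audio samples in one clip
        return self.sample_rate * self.audio_duration

    @property
    def num_patches(self):
        if self.img_size % self.patch_size:
            raise ValueError("img_size must be divisible by patch_size")
        per_side = self.img_size // self.patch_size
        return per_side * per_side

    @property
    def num_visible_patches(self):
        # Patches the encoder sees after masking
        return int(self.num_patches * (1 - self.mask_ratio))


sizes = DerivedSizes()
print(sizes.samples_per_clip, sizes.num_patches, sizes.num_visible_patches)
```

With the defaults shown, a 224 x 224 input gives 14 x 14 = 196 patches, of which 49 stay visible at a mask ratio of 0.75.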

FiftyOne Visualization

After generating embeddings, visualize with FiftyOne:

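import fiftyone as fo

# Load dataset
dataset = fo.load_dataset("esc50_audiomae")

# Launch app
session = fo.launch_app(dataset)

# Similarity search: the 10 samples closest to a query sample.
# Requires a similarity index on the dataset (e.g. from fiftyone.brain.compute_similarity).
sample_id = dataset.first().id  # any sample ID can serve as the query
similar = dataset.sort_by_similarity(sample_id, k=10)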
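
Conceptually, similarity search ranks samples by how close their embeddings are to the query's. The sketch below shows that idea with cosine similarity over a small dictionary. It is not FiftyOne's implementation: the function names are illustrative, and the 2-D vectors are made-up toy values, not embeddings produced by this project.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(query_id, embeddings, k=10):
    """IDs of the k samples most similar to the query, best first."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vec), sample_id)
        for sample_id, vec in embeddings.items()
        if sample_id != query_id
    ]
    scored.sort(reverse=True)
    return [sample_id for _, sample_id in scored[:k]]


# Made-up toy embeddings keyed by sample ID
embeddings = {
    "clip_a": [1.0, 0.1],
    "clip_b": [0.9, 0.3],
    "clip_c": [-0.2, 1.0],
}
print(top_k_similar("clip_a", embeddings, k=2))
```

A real index would use the 768-dimensional embeddings from the model and an optimised nearest-neighbour backend rather than a Python loop.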

License

MIT
