midi-velocity-colorizer

PyTorch implementation for filling MIDI velocities from given MIDI notes. The model is a U-Net image colorizer trained on expert performances from the Piano-e-Competition (the MAESTRO dataset). It works on MIDI for any instrument, but is most expressive on piano (training on other instruments is planned).

[Figure: U-Net architecture diagram]

This repo provides supplementary materials for our paper "Filling MIDI Velocity using U-Net Image Colorizer" at CMMR 2025.
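As a rough illustration of the colorization framing: the note-only piano roll plays the role of a grayscale input image, and the velocities play the role of the color to be predicted. The sketch below is illustrative only; pretty_midi and the 100 Hz frame rate are assumptions, not this repo's exact preprocessing.

import numpy as np
import pretty_midi

# Illustrative only: frame "velocity filling" as image colorization.
pm = pretty_midi.PrettyMIDI("example.mid")   # placeholder path
roll = pm.get_piano_roll(fs=100)             # (128 pitches, T frames); values are velocities
notes = (roll > 0).astype(np.float32)        # "grayscale" input: where notes sound
velocity = np.clip(roll, 0, 127) / 127.0     # "color" target in [0, 1] for the U-Net to fill
print(notes.shape, float(velocity.max()))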

📝 Additional Materials

  • WandB workspace (publicly available, but you need to log in to WandB first)
  • WandB report
    • The report includes the quantitative results; refer to Tables 2 & 3 in the paper.
    • The report also includes the hyperparameter search omitted from the paper to save space.

📁 Code Contents

  • train.py — model training entry point
  • evaluation.ipynb — reproduce the quantitative results in Tables 2 & 3 of our paper
  • interface.ipynb — colorize and visualize MIDI; reproduce Figures 1–4 in our paper
  • results/ — directory for model checkpoints, outputs, and audio demos
    • checkpoints.zip — download and extract here to run our trained model
    • subjective_test_audio.rar — listening-test samples generated from our colorized MIDI using Pianoteq 8 (see Section 5.2 of the paper)
  • compare/ — the Flat model and Kim2023's Seq2Seq model.

0. Hyperparameter Setup

All training settings are defined in conf/config.yaml (a quick way to inspect them is sketched after the notes below).

  • Some data filtering operations were implemented but not used in our experiments.
  • To reproduce our results, please refer to the training logs and parameter tracking in our WandB workspace.
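For a quick programmatic look at the config, here is a minimal sketch assuming the Hydra/OmegaConf layout implied by the CLI overrides in Step 4 (the key names are taken from those commands):

from omegaconf import OmegaConf

# Load and print the full training config.
cfg = OmegaConf.load("conf/config.yaml")
print(OmegaConf.to_yaml(cfg))
# Programmatic override, mirroring e.g. `matrix.seg_time=10` on the CLI.
cfg = OmegaConf.merge(cfg, OmegaConf.from_dotlist(["matrix.seg_time=10"]))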

1. Dataset Setup

Please download the required datasets (e.g., the MAESTRO dataset) and place them under the dataset/ folder.
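After extraction, a quick sanity check that MIDI files are visible (hypothetical; the exact subfolder layout, e.g. dataset/maestro-v3.0.0/, is an assumption):

from pathlib import Path

# Count the MIDI files visible under dataset/ after extraction.
root = Path("dataset")
midis = sorted(root.rglob("*.mid*"))
print(f"Found {len(midis)} MIDI files under {root}/")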

2. Environment Setup

Tested on:

  • Ubuntu 20.04 (CUDA 12.0)
  • Ubuntu 22.04 (CUDA 12.2)

Using Conda with the pre-built environment (recommended):

conda env create -f environment.yaml

Or build the environment manually:

conda create --name velocity_pred python=3.11
conda install pytorch=2.2.2 torchvision=0.17.2 torchaudio=2.2.2 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install lightning -c conda-forge
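Either way, a quick sanity check that the install succeeded and CUDA is visible (a sketch, assuming the pinned versions above):

import torch
import lightning

# Verify the pinned versions installed and the GPU is reachable.
print("torch:", torch.__version__, "| lightning:", lightning.__version__)
print("CUDA available:", torch.cuda.is_available())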

3. WandB Integration (optional)

We use Weights & Biases for experiment tracking.

wandb login

If preferred, you can switch to TensorBoard by modifying train.py.
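If train.py builds a standard Lightning Trainer (an assumption; names below are illustrative), the swap is typically a one-line logger change:

from lightning.pytorch.loggers import TensorBoardLogger

# Hypothetical: pass a TensorBoard logger to the Trainer instead of WandbLogger.
logger = TensorBoardLogger(save_dir="logs/", name="velocity_pred")
# trainer = lightning.pytorch.Trainer(logger=logger, ...)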

4. Train the Model (optional)

Edit training options in conf/config.yaml. Then run:

# Re-implemented ConvAE baseline
python train.py exp.test_dataset="MAESTRO" matrix.seg_time=10 ae.model="ConvAE" loss.type="BCELoss" loss.mask='element_wise' loss.weight='u_shape' loss.cosim=0.2 exp.save_k_ckpt=10

# Proposed U-Net with 2x2 attention window
python train.py exp.test_dataset="MAESTRO" matrix.seg_time=10 ae.model="UNet" ae.ablation="attn" ae.attn_window=2 loss.type="BCELoss" loss.mask='element_wise' loss.weight='u_shape' loss.cosim=0.2 exp.save_k_ckpt=10

5. Evaluate the Model (Tables 2 & 3)

Use evaluation.ipynb to reproduce our results. You can skip training by downloading checkpoints.zip and extracting our pretrained model into the results/checkpoints folder.

For the other models' results in Tables 2 & 3:

  • Flat model: compare/Flat_model/flat_evaluation.ipynb
  • Kim2023 Seq2Seq model: compare/Kim2023_model/seq2seq_evalaution.ipynb
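The exact metrics are computed inside the notebooks; as a generic, illustrative stand-in, note-wise velocity error between a reference performance and a colorized output can be computed like this (assumes both files contain the same notes in the same order; paths are placeholders):

import numpy as np
import pretty_midi

# Compare per-note velocities between the reference and the model output.
ref = pretty_midi.PrettyMIDI("reference.mid")
pred = pretty_midi.PrettyMIDI("colorized.mid")
v_ref = np.array([n.velocity for inst in ref.instruments for n in inst.notes])
v_pred = np.array([n.velocity for inst in pred.instruments for n in inst.notes])
assert v_ref.shape == v_pred.shape, "note counts differ; align notes first"
print("velocity MAE:", np.abs(v_ref - v_pred).mean())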

6. Interface & Demo (Figure 4)

Use interface.ipynb to fill a MIDI file that lacks velocities; it visualizes and saves the velocity-filled MIDI. As in Step 5, you can skip training and use our pretrained checkpoints.zip directly. Below is a MIDI example comparing [Human] with the [ConvAE, UNet] results; find more examples in interface.ipynb and audio demos in subjective_test_audio.rar.

[Figure: Example of model results]
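If you want to prepare your own notes-only input for the notebook (an assumption about its expected input; paths are placeholders), one simple way is to flatten an existing performance's velocities with pretty_midi:

import pretty_midi

# Strip performance dynamics so only note on/off information remains.
pm = pretty_midi.PrettyMIDI("input_performance.mid")
for inst in pm.instruments:
    for note in inst.notes:
        note.velocity = 64  # flat placeholder; the model predicts the real values
pm.write("input_flat.mid")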

License

Copyright 2025 Dolby. The code and checkpoints are released for academic and non-commercial use only, under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
