LinkSeg: Using Pairwise Link Prediction and Graph Attention Networks for Music Structure Analysis

Morgan Buisson1, Brian McFee2,3, Slim Essid1
1 LTCI, Télécom Paris, Institut Polytechnique de Paris, France
2 Music and Audio Research Laboratory, New York University, USA
3 Center for Data Science, New York University, USA

This repository contains the official PyTorch implementation of the paper Using Pairwise Link Prediction and Graph Attention Networks for Music Structure Analysis presented at ISMIR 2024.

We introduce LinkSeg, a novel approach to music structure analysis based on pairwise link prediction. The method predicts whether any pair of time instants within a track belongs to the same structural element (segment or section). This comes down to classifying each individual component (i.e., link) of the track's self-similarity matrix into one of three categories: "same-segment", "same-section", or "different-section". The link features computed from this task are then combined with frame-wise features through a graph attention module to predict segment boundaries and musical section labels.
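As a toy illustration of this three-way link classification (a minimal sketch, not the repository's code; the frame-level segment_ids and section_labels arrays and the label encoding are hypothetical):

import numpy as np

def link_labels(segment_ids, section_labels):
    # 0 = same segment, 1 = same section (different segment), 2 = different section
    same_seg = segment_ids[:, None] == segment_ids[None, :]
    same_sec = section_labels[:, None] == section_labels[None, :]
    labels = np.full((len(segment_ids), len(segment_ids)), 2, dtype=int)
    labels[same_sec] = 1
    labels[same_seg] = 0  # same segment implies same section, so this overwrites
    return labels

# Two "verse" segments (ids 0 and 2) separated by a "chorus" segment (id 1):
segment_ids = np.array([0, 0, 1, 1, 2, 2])
section_labels = np.array([0, 0, 1, 1, 0, 0])
print(link_labels(segment_ids, section_labels))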

This project focuses on the segmentation of popular music genres; predicted section labels therefore follow a 7-class taxonomy: "Intro", "Verse", "Chorus", "Bridge", "Instrumental", "Outro" and "Silence".
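For reference, such a taxonomy can be encoded as a simple label-to-index mapping (the actual indices used by the repository may differ):

SECTION_LABELS_7 = ["Intro", "Verse", "Chorus", "Bridge",
                    "Instrumental", "Outro", "Silence"]
LABEL_TO_INDEX = {name: i for i, name in enumerate(SECTION_LABELS_7)}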

This repository provides code for training the system from scratch along with some checkpoints for predicting the structure of new tracks.

Table of Contents

  1. Requirements
  2. Dataset
  3. Training
  4. Inference
  5. Citing
  6. Contact

Requirements

Install FFmpeg. On Ubuntu:

sudo apt install ffmpeg

On macOS:

brew install ffmpeg

Create a new environment and install the dependencies:

conda create -n YOUR_ENV_NAME python=3.9
conda activate YOUR_ENV_NAME
pip install git+https://github.com/CPJKU/madmom # install madmom from GitHub
pip install -r requirements.txt
cd src/

Dataset

The dataset structure follows that of the MSAF package:

dataset/
├── audio                   # audio files (.mp3, .wav, .aiff)
├── audio_npy               # audio files (.npy)
├── features                # feature files (.json)
└── references              # reference files (.jams) (if available; required for training)

To preprocess a dataset, run:

python preprocess_data.py --data_path <dataset_path>

This creates the dataset/audio_npy/ files (audio converted to numpy arrays) and the dataset/features/ files (beat estimates).
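Concretely, these two steps boil down to audio decoding and beat tracking. Below is a minimal sketch of what they might look like (this is not the repository's code: the file names, sample rate and JSON layout are assumptions; madmom's RNN/DBN beat tracker is used here, though preprocess_data.py may configure it differently):

import json
import numpy as np
import librosa
from madmom.features.beats import RNNBeatProcessor, DBNBeatTrackingProcessor

audio_path = "dataset/audio/track.mp3"        # hypothetical file name

# 1) Decode the audio and cache it as a numpy array.
y, sr = librosa.load(audio_path, sr=22050, mono=True)
np.save("dataset/audio_npy/track.npy", y)

# 2) Estimate beat positions with madmom's RNN activation + DBN tracker.
activations = RNNBeatProcessor()(audio_path)
beat_times = DBNBeatTrackingProcessor(fps=100)(activations)  # beat times in seconds

with open("dataset/features/track.json", "w") as f:
    json.dump({"beats": beat_times.tolist()}, f)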

Training

To train a new LinkSeg model, run:

python train.py --data_path <dataset_path>

The default label taxonomy contains 7 section labels: "Intro", "Verse", "Chorus", "Bridge", "Instrumental", "Outro" and "Silence". A second, 9-class taxonomy that adds "Pre-chorus" and "Post-chorus" labels can be used:

python train.py --data_path <dataset_path> --nb_section_labels 9

Inference

To make predictions using a trained model, first make sure that the test dataset has been preprocessed:

python preprocess_data.py --data_path <test_dataset_path>

Then run:

python predict.py --test_data_path <test_dataset_path> --model_name <path_to_model>

To use the 7-class pre-trained model (trained on a 75% split of the Harmonix dataset), run:

python predict.py --test_data_path <dataset_path> --model_name ../data/model_7_classes.pt

To use the 9-class pre-trained model, run:

python predict.py --test_data_path <dataset_path> --model_name ../data/model_9_classes.pt

By default, segmentation predictions will be saved in JAMS format under the dataset/predictions/ directory.
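The jams package can be used to inspect these files. A minimal sketch (the file name is hypothetical, and it is assumed here that each prediction file contains a single segment annotation):

import jams

jam = jams.load("dataset/predictions/track.jams")  # hypothetical file name
ann = jam.annotations[0]                           # assumed single annotation
for obs in ann.data:
    print(f"{obs.time:7.2f}s  (+{obs.duration:5.2f}s)  {obs.value}")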

Keep in mind that boundary predictions are computed from the features of two consecutive time frames $x''_{i}$, $x''_{i+1}$ and the features $e'_{i,i+1}$ of the link connecting them. Therefore, boundary predictions fall between consecutive estimated beat locations.
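For intuition only (this is not the repository's post-processing; the threshold and placement rule are assumptions), a boundary score attached to the link between beats i and i+1 can be mapped back to a time on the beat grid like so:

import numpy as np

beat_times = np.array([0.0, 0.5, 1.0, 1.5, 2.0])  # hypothetical beat grid (s)
boundary_prob = np.array([0.1, 0.9, 0.2, 0.8])    # one score per beat pair

# Place each detected boundary at the second beat of its pair.
detected = beat_times[1:][boundary_prob > 0.5]
print(detected)  # -> [1. 2.]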

Segmentation Example

Segmentation results for the track M83 - Midnight City from the Harmonix dataset. The top two rows display the class activation and boundary curves over time. The bottom rows show the final predictions and the reference annotations. Black and red dotted lines indicate predicted and annotated segment boundaries, respectively.

The self-similarity matrix of the track M83 - Midnight City obtained from the output embeddings of the graph attention module. Red dashed lines indicate annotated (left) and predicted (right) segment boundaries.

Citing

@inproceedings{buisson:hal-04665063,
  title={Using Pairwise Link Prediction and Graph Attention Networks for Music Structure Analysis},
  author={Buisson, Morgan and McFee, Brian and Essid, Slim},
  booktitle={Proceedings of the 25th International Society for Music Information Retrieval Conference (ISMIR)},
  year={2024}
}

Contact

[email protected]
