Morgan Buisson1,
Brian McFee2,3,
Slim Essid1
1 LTCI, Télécom Paris, Institut Polytechnique de Paris, France
2 Music and Audio Research Laboratory, New York University, USA
3 Center for Data Science, New York University, USA
This repository contains the official PyTorch implementation of the paper Using Pairwise Link Prediction and Graph Attention Networks for Music Structure Analysis presented at ISMIR 2024.
We introduce LinkSeg, a novel approach to music structure analysis based on pairwise link prediction. The method predicts whether any pair of time instants within a track belongs to the same structural element (segment or section). This comes down to classifying each individual component (i.e., link) of the track's self-similarity matrix into one of three categories: "same-segment", "same-section" or "different-section". The link features computed from this task are then combined with frame-wise features through a graph attention module to predict segment boundaries and musical section labels.
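To make the link-classification idea concrete, here is a minimal sketch that builds the three-class pairwise target matrix from hypothetical frame-level segment and section indices. The variable names and label encoding are illustrative assumptions, not the repository's actual implementation.

```python
import numpy as np

# Hypothetical frame-level annotations: for each time frame, the index of the
# segment it belongs to and the index of its section label (e.g. 0 = "Intro").
segment_ids = np.array([0, 0, 0, 1, 1, 2, 2, 2])
section_ids = np.array([0, 0, 0, 1, 1, 1, 1, 1])

# Pairwise link targets over the self-similarity matrix:
#   2 = "same-segment", 1 = "same-section", 0 = "different-section".
same_segment = segment_ids[:, None] == segment_ids[None, :]
same_section = section_ids[:, None] == section_ids[None, :]
link_targets = np.where(same_segment, 2, np.where(same_section, 1, 0))

print(link_targets)
```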
This project focuses on the segmentation of popular music; predicted section labels therefore follow a 7-class taxonomy: "Intro", "Verse", "Chorus", "Bridge", "Instrumental", "Outro" and "Silence".
This repository provides code for training the system from scratch, along with pre-trained checkpoints for predicting the structure of new tracks.
Install FFmpeg. For Ubuntu:
sudo apt install ffmpeg
For macOS:
brew install ffmpeg
Create a new environment and install the dependencies:
conda create -n YOUR_ENV_NAME python=3.9
conda activate YOUR_ENV_NAME
pip install git+https://github.com/CPJKU/madmom # install madmom from github
pip install -r requirements.txt
cd src/
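Before preprocessing or training, an optional sanity check that madmom (installed from GitHub above) imports correctly can save time:

```python
# Optional sanity check: confirm that madmom was installed correctly
# before running the preprocessing or training scripts.
import madmom
print(madmom.__version__)
```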
The dataset structure follows that of the MSAF package:
dataset/
├── audio # audio files (.mp3, .wav, .aiff)
├── audio_npy # audio files (.npy)
├── features # feature files (.json)
└── references # reference files (.jams) (if available; required for training)
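As an illustration, a small helper like the following (not part of the repository) can verify that a dataset folder matches this MSAF-style layout before running the scripts:

```python
from pathlib import Path

# Illustrative helper (an assumption, not repository code) that checks whether
# a dataset folder follows the expected MSAF-style layout.
def check_dataset_layout(dataset_path):
    root = Path(dataset_path)
    required = ["audio"]                      # raw audio is always needed
    optional = ["audio_npy", "features",      # created by preprocess_data.py
                "references"]                 # .jams annotations, needed for training
    for name in required:
        if not (root / name).is_dir():
            raise FileNotFoundError(f"Missing required folder: {root / name}")
    for name in optional:
        if not (root / name).is_dir():
            print(f"Note: {root / name} not found (it may be created later).")

check_dataset_layout("dataset/")
```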
To preprocess a dataset, run:
python preprocess_data.py --data_path <dataset_path>
This creates the dataset/audio_npy/ files (audio converted to numpy arrays) and the dataset/features/ files (beat estimation).
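For reference, the sketch below shows the kind of beat estimation madmom performs; the file names are hypothetical and the actual preprocess_data.py may use different processors, parameters, or feature formats.

```python
import json
import numpy as np
from madmom.features.beats import RNNBeatProcessor, DBNBeatTrackingProcessor

# Rough sketch of beat estimation with madmom (assumed setup, not the exact
# preprocessing pipeline of this repository).
audio_file = "dataset/audio/example.wav"          # hypothetical input file
activations = RNNBeatProcessor()(audio_file)      # frame-wise beat activations
beats = DBNBeatTrackingProcessor(fps=100)(activations)  # beat times in seconds

# Store beat times alongside other features in a JSON file.
with open("dataset/features/example.json", "w") as f:
    json.dump({"beats": np.asarray(beats).tolist()}, f)
```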
To train a new LinkSeg model, run:
python train.py --data_path <dataset_path>
The default label taxonomy contains 7 section labels: "Intro", "Verse", "Chorus", "Bridge", "Instrumental", "Outro" and "Silence". A second, 9-class taxonomy that adds "Pre-chorus" and "Post-chorus" labels can be used instead:
python train.py --data_path <dataset_path> --nb_section_labels 9
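For clarity, the two taxonomies can be thought of as label-to-index mappings like the sketch below; the ordering and exact encoding used inside the training code are assumptions.

```python
# Illustrative label sets for the two taxonomies (ordering is an assumption).
LABELS_7 = ["Intro", "Verse", "Chorus", "Bridge",
            "Instrumental", "Outro", "Silence"]
LABELS_9 = ["Intro", "Verse", "Pre-chorus", "Chorus", "Post-chorus",
            "Bridge", "Instrumental", "Outro", "Silence"]

label_to_index = {label: i for i, label in enumerate(LABELS_9)}
print(label_to_index["Pre-chorus"])  # e.g. 2 under this ordering
```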
To make predictions using a trained model, first make sure that the test dataset has been preprocessed:
python preprocess_data.py --data_path <test_dataset_path>
Then run:
python predict.py --test_data_path <test_dataset_path> --model_name <path_to_model>
To use the 7-class pre-trained model (trained on a 75% split of the Harmonix dataset), run:
python predict.py --test_data_path <dataset_path> --model_name ../data/model_7_classes.pt
To use the 9-class pre-trained model, run:
python predict.py --test_data_path <dataset_path> --model_name ../data/model_9_classes.pt
By default, segmentation predictions will be saved in JAMS format under the dataset/predictions/ directory.
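The predicted .jams files can be inspected with the jams library, as sketched below; the file name and the assumption that the first annotation holds the predicted segments are illustrative.

```python
import jams

# Example of reading a predicted .jams file (file name is hypothetical).
jam = jams.load("dataset/predictions/example.jams")
ann = jam.annotations[0]   # assumed: first annotation holds the predicted segments
for obs in ann:
    print(f"{obs.time:7.2f}s  dur={obs.duration:6.2f}s  label={obs.value}")
```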
Keep in mind that boundary predictions are calculated from the features of two consecutive time frames.
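The following is a conceptual sketch of that idea, assuming consecutive frame embeddings are concatenated and scored by a small classifier; it is not the repository's actual boundary head.

```python
import torch

# Conceptual sketch (assumed setup): a boundary score for frame t is obtained
# from the features of frames t and t+1, concatenated and passed through a
# small classifier.
num_frames, dim = 100, 64
frame_features = torch.randn(num_frames, dim)                    # hypothetical embeddings
pairs = torch.cat([frame_features[:-1], frame_features[1:]], dim=-1)
boundary_head = torch.nn.Linear(2 * dim, 1)
boundary_scores = torch.sigmoid(boundary_head(pairs)).squeeze(-1)
print(boundary_scores.shape)  # one score per pair of consecutive frames
```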
Segmentation results for the track M83 - Midnight City from the Harmonix dataset. The top two rows display the class activation and boundary curves over time. The bottom rows show the final predictions and the reference annotations. Black and red dotted lines indicate predicted and annotated segment boundaries, respectively.
The self-similarity matrix of the track M83 - Midnight City obtained from the output embeddings of the graph attention module. Red dashed lines indicate annotated (left) and predicted (right) segment boundaries.
@inproceedings{buisson:hal-04665063,
title={Using Pairwise Link Prediction and Graph Attention Networks for Music Structure Analysis},
author={Buisson, Morgan and McFee, Brian and Essid, Slim},
booktitle={Proceedings of the 25th International Society for Music Information Retrieval Conference (ISMIR)},
year={2024}
}

