Key-CNN is a simple CNN-based framework for estimating harmonic properties of music tracks.
First and foremost, Key-CNN is a key estimator. To determine the global key of an audio file, simply run the script
```bash
key -i my_audio.wav
```

You may specify other models and output formats (mirex, JAMS) via command line parameters.
For example, to create JAMS as the output format and use a deepsquare model from the SMC paper [1], please run
```bash
key -m deepsquare --jams -i my_audio.wav
```

To use one of the DeepSpec models from [1] (see also the repo directional_cnns), run
```bash
key -m deepspec --jams -i my_audio.wav
```

or, if you want to use a higher-capacity model (some k-values are supported),

```bash
key -m deepspec_k24 --jams -i my_audio.wav
```
The deepsquare and shallowspec models may also be used.
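Following the pattern above, a shallowspec model could, for example, be selected like this (the flag combination simply mirrors the examples above):

```bash
key -m shallowspec --jams -i my_audio.wav
```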
For estimation using models trained for [2], you may run one of the following:
```bash
key -m winterreise -i my_audio.wav
key -m winterreise_v -i my_audio.wav
key -m winterreise_s -i my_audio.wav
key -m winterreise_v_fold0 -i my_audio.wav
key -m winterreise_s_fold1 -i my_audio.wav
```

For more model names and training split definitions, please see the models directory in the GitHub repo (just remove the .h5 extension from the file name to use it as a model name).
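For illustration, a hypothetical model file named some_model.h5 (a placeholder, not an actual file in the repo) would be used as follows:

```bash
key -m some_model -i my_audio.wav
```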
The ground truth annotations for the Winterreise models may be found here.
For batch processing, you may want to run key like this:
```bash
find /your_audio_dir/ -name '*.wav' -print0 | xargs -0 key -d /output_dir/ -i
```

This will recursively search for all .wav files in /your_audio_dir/, analyze them, and write the results to individual files in /output_dir/. Because the model is only loaded once, this method of processing is much faster than starting the program once per file.
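Assuming the flags shown above can be combined, the same approach also works with other output formats, e.g. JAMS:

```bash
find /your_audio_dir/ -name '*.wav' -print0 | xargs -0 key --jams -d /output_dir/ -i
```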
Instead of estimating a global key, Key-CNN can also estimate local keys in the form of a keygram. This can be useful for identifying modulations. To create such a keygram, run
```bash
keygram -p my_audio.wav
```

As output, keygram will create a .png file. Additional options to select different models and output formats are available.
You may use the --csv option to export local key estimates in a parseable format and the
--hop-length option to change temporal resolution.
The parameters --sharpen and --norm-frame let you post-process the image.
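For example, a minimal invocation that exports local key estimates as CSV (a sketch using only the --csv option described above) might look like this:

```bash
keygram --csv -p my_audio.wav
```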
Clone this repo and run setup.py install using Python 3.6:
```bash
git clone https://github.com/hendriks73/key-cnn.git
cd key-cnn
python setup.py install
```

You may need to install TensorFlow using pip from the command line.
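For example, for a plain CPU setup (the exact TensorFlow version required by Key-CNN is not pinned here, so you may need to choose a compatible one):

```bash
pip install tensorflow
```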
Source code and models can be licensed under the GNU AFFERO GENERAL PUBLIC LICENSE v3. For details, please see the LICENSE file.
If you use Key-CNN in your work, please consider citing it.

ShallowSpec, DeepSpec, and DeepSquare models:
```bibtex
@inproceedings{SchreiberM19_CNNKeyTempo_SMC,
  Title = {Musical Tempo and Key Estimation using Convolutional Neural Networks with Directional Filters},
  Author = {Hendrik Schreiber and Meinard M{\"u}ller},
  Booktitle = {Proceedings of the Sound and Music Computing Conference ({SMC})},
  Pages = {47--54},
  Year = {2019},
  Address = {M{\'a}laga, Spain}
}
```

All Winterreise [2] models and annotations:
```bibtex
@inproceedings{SchreiberWM20_HMMCNNLocalKey_ICASSP,
  Title = {Local Key Estimation in Classical Music Recordings: A Cross-Version Study on {Schubert's} {Winterreise}},
  Author = {Hendrik Schreiber and Christof Wei{\ss} and Meinard M{\"u}ller},
  Booktitle = {Proceedings of the {IEEE} International Conference on Acoustics, Speech, and Signal Processing ({ICASSP})},
  Year = {2020},
  Address = {Barcelona, Spain}
}
```

| [1] | Hendrik Schreiber, Meinard Müller, Musical Tempo and Key Estimation using Convolutional Neural Networks with Directional Filters. Proceedings of the Sound and Music Computing Conference (SMC), Málaga, Spain, 2019. |
| [2] | Hendrik Schreiber, Christof Weiß, Meinard Müller, Local Key Estimation in Classical Music Recordings: A Cross-Version Study on Schubert's Winterreise. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain, 2020. |