Acoustic data provide scientific and engineering insights in fields ranging from biology and communications to ocean and Earth science. We survey recent advances in, and the transformative potential of, machine learning (ML), including deep learning, in acoustics. ML is a broad family of techniques, often based on statistics, for automatically detecting and exploiting patterns in data. Our examples span ocean acoustics, room acoustics, and personalized spatial audio: for room acoustics, we use room impulse response (RIR) generation as an example application; for personalized spatial audio, we use head-related transfer function (HRTF) upsampling. This tutorial covers supervised, unsupervised, and deep learning approaches for acoustic applications. Although these notebooks do not cover every topic, they provide an initial look at applying machine learning to acoustics research.
To follow the examples, you will need to install Anaconda (which includes Python). We have provided a brief outline on installing Anaconda in the Installation Guide. Once Anaconda has been installed, create a new environment from the provided .yml file. This can be done in the conda terminal as:
conda env create -f environment.yml
or you can manually create an environment and install the packages yourself. For example:
conda create -n audioenv
conda activate audioenv
conda install -c conda-forge librosa scikit-learn pandas jupyterlab seaborn matplotlib opendatasets pywavelets pyts optuna widgetsnbextension ipywidgets shap lime requests natsort pathlib pip
In addition to the packages described in the .yml file, you can install PyTorch, Torchvision, and TorchAudio with GPU support. To do so, go to This Website to find the correct package for your system.
For a few of the notebooks, you will also need to install the pyroomacoustics package (see Here). It can be installed with the following lines:
pip install pyroomacoustics
pip install python-sofa
A few of the datasets used in the notebooks require downloading data from the platform Kaggle. If you do not have an account, please register for one. Once logged in, go to your profile icon in the top right, select Settings, and scroll down to API. Select "Create a new token" and a file named "kaggle.json" will download. Place this file in the downloaded directory that contains the Jupyter notebooks above. This API key grants the opendatasets package access to download the data (as shown in the Jupyter notebooks). Data downloaded through opendatasets is fetched once into your directory and will not be downloaded again unless forced.
The chapters for this repository are ordered as follows: 1) an introduction to signal processing for acoustics; 2) an initial look into feature extraction and selecting features for machine learning models; 3) unsupervised machine learning approaches; 4) supervised machine learning approaches; 5) deep learning model examples; 6) explainable AI and feature importance.
A brief overview of signal processing and techniques useful for processing acoustic data.
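As a minimal illustration of this kind of processing (not taken from the notebook), the sketch below computes a spectrogram of a synthetic chirp with SciPy; the sampling rate and window parameters are arbitrary choices:

```python
import numpy as np
from scipy import signal

# Synthetic 1 s linear chirp sampled at 16 kHz (illustrative values).
fs = 16000
t = np.arange(fs) / fs
x = signal.chirp(t, f0=100.0, f1=4000.0, t1=1.0)

# Short-time Fourier transform: 512-sample windows, 50% overlap.
f, tau, Zxx = signal.stft(x, fs=fs, nperseg=512, noverlap=256)
spectrogram = np.abs(Zxx) ** 2

print(spectrogram.shape)  # (frequency bins, time frames)
```

The spectrogram's rows are frequency bins (nperseg // 2 + 1 of them) and its columns are time frames, a common starting representation for the ML methods in later chapters.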
Descriptions of features and an introduction to feature extraction approaches for machine learning.
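For instance, a few classic hand-crafted features can be computed with NumPy alone; the feature set and test tone below are illustrative, not the notebook's exact pipeline:

```python
import numpy as np

def extract_features(x, fs):
    """Three classic audio features, computed over the whole clip."""
    rms = np.sqrt(np.mean(x ** 2))                  # overall energy
    zcr = np.mean(np.abs(np.diff(np.sign(x)))) / 2  # zero-crossing rate per sample
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    centroid = np.sum(freqs * mag) / np.sum(mag)    # spectral centroid in Hz
    return rms, zcr, centroid

fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440.0 * t)  # a 440 Hz test tone
rms, zcr, centroid = extract_features(tone, fs)
print(f"RMS={rms:.3f}, ZCR={zcr:.4f}, centroid={centroid:.1f} Hz")
```

For a pure tone, the spectral centroid lands on the tone's frequency and the RMS of a unit sine is about 0.707, which makes this a quick sanity check for a feature pipeline.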
Feature selection aims to reduce model complexity, shorten training time, and improve the performance of machine learning models. This notebook shows how to perform feature selection through an example of distinguishing major from minor chords.
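A minimal sketch of filter-based feature selection with scikit-learn's `SelectKBest`; the synthetic features standing in for chord descriptors are invented for illustration:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)  # e.g. major (0) vs. minor (1) chord labels

# Two features genuinely tied to the label, plus eight pure-noise features.
informative = y[:, None] + 0.5 * rng.normal(size=(n, 2))
noise = rng.normal(size=(n, 8))
X = np.hstack([informative, noise])

# Rank features by an ANOVA F-test and keep the two best-scoring ones.
selector = SelectKBest(f_classif, k=2).fit(X, y)
print(selector.get_support(indices=True))  # indices of the selected features
```

With this construction the selector recovers the two informative columns, discarding the noise features before any model is trained.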
Given a long time series, how can we quickly segment it into frames and find acoustically similar segments?
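One simple approach, sketched here on synthetic data rather than the notebook's, frames the signal and compares magnitude spectra by cosine similarity:

```python
import numpy as np

fs, frame = 8000, 1024
t = np.arange(fs) / fs
# Toy recording: a 300 Hz tone followed by a 1200 Hz tone (two acoustic "events").
x = np.concatenate([np.sin(2 * np.pi * 300 * t), np.sin(2 * np.pi * 1200 * t)])

# Split into non-overlapping frames; describe each by its magnitude spectrum.
frames = x[: len(x) // frame * frame].reshape(-1, frame)
spectra = np.abs(np.fft.rfft(frames, axis=1))
spectra /= np.linalg.norm(spectra, axis=1, keepdims=True)

# Cosine similarity between every pair of frames:
# high within each tone, low across the two tones.
S = spectra @ spectra.T
print(S[0, 1], S[0, -1])
```

Thresholding or clustering the rows of this similarity matrix then groups the frames into the two underlying events.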
Principal component analysis is discussed and used to construct new guitar sounds in the frequency domain.
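A hedged sketch of the idea using scikit-learn's `PCA` on synthetic harmonic spectra; the "guitar" tones here are toy signals, not the notebook's data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
fs, dur = 8000, 0.5
t = np.arange(int(fs * dur)) / fs

# Toy "guitar" dataset: harmonic tones with random fundamentals
# and geometrically decaying partial amplitudes.
spectra = []
for f0 in rng.uniform(80, 400, size=40):
    x = sum((0.5 ** k) * np.sin(2 * np.pi * k * f0 * t) for k in range(1, 5))
    spectra.append(np.abs(np.fft.rfft(x)))
X = np.array(spectra)

# Keep a handful of principal components; synthesizing from this low-dimensional
# representation yields a new spectrum built from the learned components.
pca = PCA(n_components=5).fit(X)
new_spectrum = pca.inverse_transform(pca.transform(X[:1]))
print(new_spectrum.shape)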
Dictionary Learning for the sparse representation of room impulse responses and bandwidth extension.
Autoencoders and variational autoencoders (VAEs) for machine-sound anomaly detection.
Reduce the dimensionality of audio data and cluster the results.
Linear regression for predicting room reverberation time.
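As a toy version of this task, one can generate reverberation times from Sabine's formula, T60 = 0.161 V / A, and recover the constant by regression; all values below are synthetic, not the notebook's data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
V = rng.uniform(50, 500, size=100)  # room volumes [m^3]
A = rng.uniform(10, 100, size=100)  # total absorption areas [m^2 sabins]

# Ground truth from Sabine's formula plus small measurement noise.
t60 = 0.161 * V / A + 0.01 * rng.normal(size=100)

# A single physically motivated feature makes the relationship linear.
X = (V / A).reshape(-1, 1)
model = LinearRegression().fit(X, t60)
print(model.coef_[0])  # should land near Sabine's constant 0.161
```

Choosing a feature in which the physics is linear (V/A rather than V and A separately) is what lets a plain linear model fit this problem well.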
Classify the AudioMNIST dataset with decision trees and random forests to distinguish the spoken digits 0 through 9.
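Since AudioMNIST cannot be bundled here, the sketch below substitutes synthetic clustered features for the real audio features; the classifier usage is the same:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for AudioMNIST features: 10 digit classes, each a noisy
# cluster in a 20-dimensional feature space (e.g. MFCC-like summaries).
rng = np.random.default_rng(3)
centers = rng.normal(scale=3.0, size=(10, 20))
y = np.repeat(np.arange(10), 50)
X = centers[y] + rng.normal(size=(500, 20))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))  # held-out classification accuracy
```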
Targeting an audio classification problem, we introduce the classical logistic regression approach and the basics of deep learning with a simple neural network and a convolutional neural network.
Generative Adversarial Network for generating room impulse responses (RIRs).
Implicit Neural Representation for representing personalized Head-Related Transfer Function (HRTF).
Solve the wave equation using Physics Informed Neural Networks (forward problem).
Estimate the wave speed using Physics Informed Neural Networks (inverse problem).
Identify the number of clusters using a variety of techniques and interpret the clustering of the feature space.
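One common technique is to scan candidate cluster counts and pick the one maximizing the silhouette score, sketched here on synthetic 2-D data rather than audio features:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Three well-separated synthetic clusters in a 2-D feature space.
rng = np.random.default_rng(4)
centers = np.array([[0, 0], [8, 0], [0, 8]])
X = np.vstack([c + rng.normal(size=(60, 2)) for c in centers])

# Fit k-means for a range of k and score each clustering.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)  # silhouette should peak at the true k = 3
```

The elbow method on k-means inertia is a common alternative; the silhouette score has the advantage of a built-in optimum rather than a visually judged bend.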
Determine feature importance for supervised models and evaluate performance to identify where pitfalls may occur.
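A small sketch of one such technique, permutation importance with scikit-learn, on synthetic data where only the first two features actually matter:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic regression task: the target depends only on features 0 and 1.
rng = np.random.default_rng(5)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * rng.normal(size=300)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Shuffle each feature in turn and measure the drop in model score;
# large drops indicate features the model genuinely relies on.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
print(np.argsort(result.importances_mean)[::-1][:2])  # top-2 features
```

Unlike a tree ensemble's built-in impurity importances, permutation importance is measured on predictions, which avoids the known bias toward high-cardinality features.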
6.3 Deep Learning
Visualize key distinctions in model activations for given inputs and evaluate how deep learning models observe and interpret data.
To learn more, please find our paper in npj Acoustics here, on arXiv here, or on GitHub here. To cite this work, please see the citation below:
McCarthy, R.A., Zhang, Y., Verburg, S.A. et al. Machine Learning in Acoustics: A Review and Open-source Repository. npj Acoust. 1, 18 (2025). https://doi.org/10.1038/s44384-025-00021-w
@Article{McCarthy2025,
author={McCarthy, Ryan A. and Zhang, You and Verburg, Samuel A. and Jenkins, William F. and Gerstoft, Peter},
title={Machine Learning in Acoustics: A Review and Open-source Repository},
journal={npj Acoustics},
year={2025},
month={Sep},
day={09},
volume={1},
number={1},
pages={18},
doi={10.1038/s44384-025-00021-w},
url={https://doi.org/10.1038/s44384-025-00021-w}
}