Acoustic data provide scientific and engineering insights in fields ranging from biology and communications to ocean and Earth science. We survey recent advances in, and the transformative potential of, machine learning (ML), including deep learning, in acoustics. ML is a broad family of techniques, often based on statistics, for automatically detecting and exploiting patterns in data. Our examples span ocean acoustics, room acoustics, and personalized spatial audio: for room acoustics, we use room impulse response (RIR) generation as an example application; for personalized spatial audio, we use head-related transfer function (HRTF) upsampling. This tutorial covers supervised, unsupervised, and deep learning approaches for acoustic applications. Although these notebooks do not cover every topic, they provide an initial look at applying machine learning to acoustics research.
To follow the examples, you will need to install Anaconda (which includes Python). We have provided a brief outline on installing Anaconda in the Installation Guide. Once Anaconda has been installed, create a new environment from the provided .yml file. This can be done in the conda terminal as:
conda env create -f environment.yml
or you can manually create an environment and install the packages yourself. For example:
conda create -n audioenv
conda activate audioenv
conda install -c conda-forge librosa scikit-learn pandas jupyterlab seaborn matplotlib opendatasets pywavelets pyts optuna widgetsnbextension ipywidgets shap lime requests natsort pathlib pip
In addition to the packages described in the .yml file, you can install PyTorch, Torchvision, and TorchAudio with GPU support. To do so, go to This Website to find the correct package for your system.
For a few of the notebooks, you will also need to install the pyroomacoustics package (see Here). It can be installed with the following lines:
pip install pyroomacoustics
pip install python-sofa
A few of the datasets used in the notebooks require downloading data from the platform Kaggle. If you do not have an account, please register for one. Once logged in, go to your profile icon in the top right, select Settings, and scroll down to API. Select "Create a new token" and a file named "kaggle.json" will download. Place this file in the downloaded directory that contains the Jupyter notebooks above. This API key grants the opendatasets package access to download the data (as shown in the Jupyter notebooks). Data downloaded through opendatasets is fetched once into your directory and will not be downloaded again unless forced.
The chapters for this repository are ordered as follows: 1) an introduction to signal processing for acoustics; 2) an initial look into feature extraction and selecting features for machine learning models; 3) unsupervised machine learning approaches; 4) supervised machine learning approaches; 5) deep learning model examples; 6) explainable AI and feature importance.
A brief overview of signal processing and techniques useful for processing acoustic data.
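As a minimal illustration of this kind of processing (not taken from the notebook), the sketch below computes a spectrogram of a synthetic chirp with SciPy; the sampling rate and window parameters are arbitrary choices:

```python
import numpy as np
from scipy import signal

# Synthetic 1 s linear chirp sampled at 16 kHz (illustrative values).
fs = 16000
t = np.arange(fs) / fs
x = signal.chirp(t, f0=100.0, f1=4000.0, t1=1.0)

# Short-time Fourier transform: 512-sample windows, 50% overlap.
f, tau, Zxx = signal.stft(x, fs=fs, nperseg=512, noverlap=256)
spectrogram = np.abs(Zxx) ** 2

print(spectrogram.shape)  # (frequency bins, time frames)
```

The spectrogram's rows are frequency bins (nperseg // 2 + 1 of them) and its columns are time frames, a common starting representation for the ML methods in later chapters.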
Descriptions of features and an introduction to feature extraction approaches for machine learning.
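For instance, a few classic hand-crafted features can be computed with NumPy alone; the feature set and test tone below are illustrative, not the notebook's exact pipeline:

```python
import numpy as np

def extract_features(x, fs):
    """Three classic audio features, computed over the whole clip."""
    rms = np.sqrt(np.mean(x ** 2))                  # overall energy
    zcr = np.mean(np.abs(np.diff(np.sign(x)))) / 2  # zero-crossing rate per sample
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    centroid = np.sum(freqs * mag) / np.sum(mag)    # spectral centroid in Hz
    return rms, zcr, centroid

fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440.0 * t)  # a 440 Hz test tone
rms, zcr, centroid = extract_features(tone, fs)
print(f"RMS={rms:.3f}, ZCR={zcr:.4f}, centroid={centroid:.1f} Hz")
```

For a pure tone, the spectral centroid lands on the tone's frequency and the RMS of a unit sine is about 0.707, which makes this a quick sanity check for a feature pipeline.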
Feature selection aims to reduce model complexity, shorten training time, and improve the performance of machine learning models. This notebook shows how to perform feature selection through an example of distinguishing major from minor chords.
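A minimal sketch of filter-based feature selection with scikit-learn's `SelectKBest`; the synthetic features standing in for chord descriptors are invented for illustration:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)  # e.g. major (0) vs. minor (1) chord labels

# Two features genuinely tied to the label, plus eight pure-noise features.
informative = y[:, None] + 0.5 * rng.normal(size=(n, 2))
noise = rng.normal(size=(n, 8))
X = np.hstack([informative, noise])

# Rank features by an ANOVA F-test and keep the two best-scoring ones.
selector = SelectKBest(f_classif, k=2).fit(X, y)
print(selector.get_support(indices=True))  # indices of the selected features
```

With this construction the selector recovers the two informative columns, discarding the noise features before any model is trained.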
Given a long time series, how can we quickly segment it into frames and find acoustically similar segments?
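One simple approach, sketched here on synthetic data rather than the notebook's, frames the signal and compares magnitude spectra by cosine similarity:

```python
import numpy as np

fs, frame = 8000, 1024
t = np.arange(fs) / fs
# Toy recording: a 300 Hz tone followed by a 1200 Hz tone (two acoustic "events").
x = np.concatenate([np.sin(2 * np.pi * 300 * t), np.sin(2 * np.pi * 1200 * t)])

# Split into non-overlapping frames; describe each by its magnitude spectrum.
frames = x[: len(x) // frame * frame].reshape(-1, frame)
spectra = np.abs(np.fft.rfft(frames, axis=1))
spectra /= np.linalg.norm(spectra, axis=1, keepdims=True)

# Cosine similarity between every pair of frames:
# high within each tone, low across the two tones.
S = spectra @ spectra.T
print(S[0, 1], S[0, -1])
```

Thresholding or clustering the rows of this similarity matrix then groups the frames into the two underlying events.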
Principal component analysis is discussed and used to construct new guitar sounds in the frequency domain.
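A hedged sketch of the idea using scikit-learn's `PCA` on synthetic harmonic spectra; the "guitar" tones here are toy signals, not the notebook's data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
fs, dur = 8000, 0.5
t = np.arange(int(fs * dur)) / fs

# Toy "guitar" dataset: harmonic tones with random fundamentals
# and geometrically decaying partial amplitudes.
spectra = []
for f0 in rng.uniform(80, 400, size=40):
    x = sum((0.5 ** k) * np.sin(2 * np.pi * k * f0 * t) for k in range(1, 5))
    spectra.append(np.abs(np.fft.rfft(x)))
X = np.array(spectra)

# Keep a handful of principal components; synthesizing from this low-dimensional
# representation yields a new spectrum built from the learned components.
pca = PCA(n_components=5).fit(X)
new_spectrum = pca.inverse_transform(pca.transform(X[:1]))
print(new_spectrum.shape)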
Dictionary Learning for the sparse representation of room impulse responses and bandwidth extension.
Autoencoders and variational autoencoders (VAEs) for machine-sound anomaly detection.
Reduce the dimensionality of audio data and cluster the results.
Linear regression for predicting room reverberation time.
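As a toy version of this task, one can generate reverberation times from Sabine's formula, T60 = 0.161 V / A, and recover the constant by regression; all values below are synthetic, not the notebook's data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
V = rng.uniform(50, 500, size=100)  # room volumes [m^3]
A = rng.uniform(10, 100, size=100)  # total absorption areas [m^2 sabins]

# Ground truth from Sabine's formula plus small measurement noise.
t60 = 0.161 * V / A + 0.01 * rng.normal(size=100)

# A single physically motivated feature makes the relationship linear.
X = (V / A).reshape(-1, 1)
model = LinearRegression().fit(X, t60)
print(model.coef_[0])  # should land near Sabine's constant 0.161
```

Choosing a feature in which the physics is linear (V/A rather than V and A separately) is what lets a plain linear model fit this problem well.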
Classify the AudioMNIST dataset with decision trees and random forests to distinguish the spoken digits 0 through 9.
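Since AudioMNIST cannot be bundled here, the sketch below substitutes synthetic clustered features for the real audio features; the classifier usage is the same:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for AudioMNIST features: 10 digit classes, each a noisy
# cluster in a 20-dimensional feature space (e.g. MFCC-like summaries).
rng = np.random.default_rng(3)
centers = rng.normal(scale=3.0, size=(10, 20))
y = np.repeat(np.arange(10), 50)
X = centers[y] + rng.normal(size=(500, 20))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))  # held-out classification accuracy
```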
Targeting an audio classification problem, we introduce the classical logistic regression approach and the basics of deep learning with a simple neural network and a convolutional neural network.
Generative Adversarial Network for generating room impulse responses (RIRs).
Implicit Neural Representation for representing personalized Head-Related Transfer Function (HRTF).
Solve the wave equation using Physics Informed Neural Networks (forward problem).
Estimate the wave speed using Physics Informed Neural Networks (inverse problem).
Identify the number of clusters using a variety of techniques and interpret the clustering of the feature space.
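One common technique is to scan candidate cluster counts and pick the one maximizing the silhouette score, sketched here on synthetic 2-D data rather than audio features:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Three well-separated synthetic clusters in a 2-D feature space.
rng = np.random.default_rng(4)
centers = np.array([[0, 0], [8, 0], [0, 8]])
X = np.vstack([c + rng.normal(size=(60, 2)) for c in centers])

# Fit k-means for a range of k and score each clustering.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)  # silhouette should peak at the true k = 3
```

The elbow method on k-means inertia is a common alternative; the silhouette score has the advantage of a built-in optimum rather than a visually judged bend.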
Determine feature importance for supervised models and evaluate performance to identify where pitfalls may occur.
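A small sketch of one such technique, permutation importance with scikit-learn, on synthetic data where only the first two features actually matter:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic regression task: the target depends only on features 0 and 1.
rng = np.random.default_rng(5)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * rng.normal(size=300)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Shuffle each feature in turn and measure the drop in model score;
# large drops indicate features the model genuinely relies on.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
print(np.argsort(result.importances_mean)[::-1][:2])  # top-2 features
```

Unlike a tree ensemble's built-in impurity importances, permutation importance is measured on predictions, which avoids the known bias toward high-cardinality features.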
6.3 Deep Learning
Visualize key distinctions in model activations for given inputs and evaluate how deep learning models observe and interpret data.
To learn more, please find our paper in npj Acoustics here, on arXiv here, or on GitHub here. To cite this work, please see the citation below:
McCarthy, R.A., Zhang, Y., Verburg, S.A. et al. Machine Learning in Acoustics: A Review and Open-source Repository. npj Acoust. 1, 18 (2025). https://doi.org/10.1038/s44384-025-00021-w
@Article{McCarthy2025,
author={McCarthy, Ryan A. and Zhang, You and Verburg, Samuel A. and Jenkins, William F. and Gerstoft, Peter},
title={Machine Learning in Acoustics: A Review and Open-source Repository},
journal={npj Acoustics},
year={2025},
month={Sep},
day={09},
volume={1},
number={1},
pages={18},
doi={10.1038/s44384-025-00021-w},
url={https://doi.org/10.1038/s44384-025-00021-w}
}