This repository provides the implementation of the Blind Estimator of Room Parameters (BERP), a PyTorch-based framework for predicting room acoustic and physical parameters all in one. The project is built on PyTorch Lightning and Hydra. The implementation includes the data preprocessing pipelines, model architectures, training and inference strategies, and experimental configurations.
```bash
# install miniconda
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
# add conda to PATH
echo "export PATH=~/miniconda3/bin:$PATH" >> ~/.zshrc
source ~/.zshrc
# initialize conda
conda init zsh
# create the conda environment
conda create -n acoustic-toolkit python=3.11.12
conda activate acoustic-toolkit
```

For better dependency management, we use pdm as the package manager instead of pip. You can install pdm with the following command:
```bash
pip install pdm
```

```bash
# clone the project
git clone https://github.com/Alizeded/BERP
cd BERP
# create the conda environment and install dependencies
pdm config venv.backend conda # use conda as the virtual environment backend; torch==2.7.0 is currently supported
pdm sync -G cu126 -G toolbox -G logging -G integration # CUDA 12.6 by default; change it to match yours, e.g. -G cu121
# if you use an older CUDA version, update pdm.lock first
pdm lock -G cu118 -G toolbox -G logging -G integration && pdm sync -G cu118 -G toolbox -G logging -G integration
```

The dataset is also available; you can download it from the cloud storage:
https://jstorage.app.box.com/v/berp-datasets

Then unzip the data and put it in the `data` directory.
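As a minimal sketch of this step (the archive name `berp-datasets.zip` is an assumption; use whatever file name the cloud storage actually serves):

```bash
# unpack the downloaded archive into the data directory at the project root
# (the archive name below is an assumption; adjust it to the actual download)
unzip berp-datasets.zip -d data/
```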
The Jupyter notebook `data_preprocessing.ipynb` and the script `mix_real_record_preprocess.py` in the `notebook` folder, together with `synthesize_rir_speech` and `synthesize_speech_noise` in the `script` folder, detail the data preprocessing pipeline.
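A hedged sketch of running the synthesis scripts is below; the exact file names, extensions, and arguments are assumptions, so check the `script` folder before running:

```bash
# hypothetical invocations; verify the actual script names and options in the script folder
python script/synthesize_rir_speech.py
python script/synthesize_speech_noise.py
```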
Train the models with the default configurations in the `configs` folder.
```bash
# train on a single GPU (H100 as an example)
# for unified module
python src/train_jointRegressor.py trainer=gpu logger=wandb_jointRegressor callbacks=default_jointRegressor
# for occupancy module
python src/train_numEstimator.py trainer=gpu trainer.precision=bf16-mixed logger=wandb_numEstimator callbacks=default_numEstimator
```

```bash
# train on one node with multiple GPUs (2 GPUs as an example)
# for unified module
python src/train_jointRegressor.py trainer=ddp logger=wandb_jointRegressor callbacks=default_jointRegressor
# for occupancy module
python src/train_numEstimator.py trainer=ddp trainer.precision=bf16-mixed logger=wandb_numEstimator callbacks=default_numEstimator
```

Please refer to the `model`, `callbacks`, and `logger` folders and `train.yaml` in the `configs` directory for more details.
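Since the project is configured with Hydra, any field in `train.yaml` can also be overridden from the command line. A minimal sketch, assuming the usual Lightning-Hydra config keys such as `trainer.max_epochs` (verify the actual keys in `configs/train.yaml`):

```bash
# hypothetical overrides; the exact keys depend on configs/train.yaml
python src/train_jointRegressor.py trainer=gpu trainer.max_epochs=50 \
    logger=wandb_jointRegressor callbacks=default_jointRegressor
```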
```bash
# default inference with MFCC featurization
# for unified module
python src/inference_jointRegressor.py
# for occupancy module
python src/inference_numEstimator.py
```

More details about inference can be found in `inference.yaml` in the `configs` directory.
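As with training, inference options can be overridden on the command line through Hydra. A sketch assuming a `ckpt_path` field in `inference.yaml` (the key name and checkpoint file name are assumptions):

```bash
# hypothetical override; check configs/inference.yaml for the actual field names
python src/inference_jointRegressor.py ckpt_path=weights/jointRegressor_mfcc.ckpt
```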
# you can copy & paste the following cloud storage link to your browser
# unified module with four types of featurizers
https://jstorage.box.com/s/3164ikshkfml1apsb1diva4h3s7bhmww
# occupancy module with four types of featurizers
https://jstorage.box.com/s/x6ac1z6n982jftb6jsnrqqazxn3iscxAfter obtaining the weights, please check the eval.yaml or inference.yaml in the configs directory to put the weights in the correct path for the evaluation or inference.
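For example, one possible layout is to collect the downloaded checkpoints in a local folder and point the config files at it; the folder name and download location below are assumptions:

```bash
# hypothetical layout; match whatever paths eval.yaml / inference.yaml expect
mkdir -p weights
mv ~/Downloads/*.ckpt weights/
```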
PS2: We have checked the validity of the download links, so there should be no problem downloading. We are working on migrating the dataset to the Hugging Face dataset hub. Please stay tuned.
This project is licensed under the GPL-3.0 License - see the LICENSE file for details.
Special thanks to our good friend Jianan Chen for his great help with this project.
If you find this repository useful in your research, or if you want to refer to the methodology and code, please cite the following paper:
```bibtex
@misc{wang2024berp,
      title={BERP: A Blind Estimator of Room Parameters for Single-Channel Noisy Speech Signals},
      author={Lijun Wang and Yixian Lu and Ziyan Gao and Kai Li and Jianqiang Huang and Yuntao Kong and Shogo Okada},
      year={2025},
      eprint={2405.04476},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}
```