
FullSubNet+

This repository contains the official PyTorch implementation of "FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement", accepted by ICASSP 2022.

📜[Full Paper] ▶[Demo] 💿[Checkpoint]

Kapwing:

  1. First run make build to build the Docker image
  2. Then run docker images to list the local images and note the ID of the image you just built
  3. Then run docker run --rm -it {IMAGE_ID} /bin/bash to start a container from that image with an interactive bash shell (see the volume-mount sketch below for getting audio files in and out of the container)
  4. From that shell you can run the audio-cleaning (speech enhancement) command. For example: conda run --no-capture-output -n speech_enhance python -m speech_enhance.tools.inference -C "./config/inference.toml" -M "./best_model.tar" -I "./input_files" -O "./output_files"
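
To read noisy audio from the host and write enhanced audio back, you can mount host directories into the container when starting it. A minimal sketch, assuming the image's working directory is the repository root; the container-side paths below are assumptions, so adjust them to match the WORKDIR in the Dockerfile:

# Mount host input/output directories so they appear as ./input_files and ./output_files inside the container
docker run --rm -it \
  -v "$(pwd)/input_files:/workspace/input_files" \
  -v "$(pwd)/output_files:/workspace/output_files" \
  {IMAGE_ID} /bin/bash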

Requirements

  • Linux or macOS

  • python>=3.6

  • Anaconda or Miniconda

  • NVIDIA GPU + CUDA cuDNN (CPU is also supported)

Environment && Installation

Install Anaconda or Miniconda, then install the required conda and pip packages:

# Create conda environment
conda create --name speech_enhance python=3.6
conda activate speech_enhance

# Install conda packages
# Check python>=3.6, cudatoolkit=10.2, pytorch=1.7.1, torchaudio=0.7
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
conda install tensorboard joblib matplotlib

# Install pip packages
# Check librosa=0.8
pip install Cython
pip install librosa pesq pypesq pystoi tqdm toml colorful mir_eval torch_complex

# (Optional) If you want to load "mp3" format audio in your dataset
conda install -c conda-forge ffmpeg
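
(Optional) Sanity-check the environment: the one-liner below just prints the installed PyTorch version and whether a CUDA device is visible. Run it inside the activated speech_enhance environment.

# Verify that PyTorch imports and can see the GPU
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"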

Quick Usage

Clone the repository:

git clone https://github.com/hit-thusz-RookieCJ/FullSubNet-plus.git
cd FullSubNet-plus

Download the pre-trained checkpoint, then run:

source activate speech_enhance
python -m speech_enhance.tools.inference \
  -C config/inference.toml \
  -M $MODEL_DIR \
  -I $INPUT_DIR \
  -O $OUTPUT_DIR
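
For example, with illustrative paths (assuming the checkpoint was downloaded to ./best_model.tar and the noisy wav files are placed in ./noisy_wavs):

# Enhance every audio file in ./noisy_wavs and write the results to ./enhanced_wavs
python -m speech_enhance.tools.inference \
  -C config/inference.toml \
  -M ./best_model.tar \
  -I ./noisy_wavs \
  -O ./enhanced_wavs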

Start Up

Clone

git clone https://github.com/hit-thusz-RookieCJ/FullSubNet-plus.git
cd FullSubNet-plus

Data preparation

Train data

Please prepare your data under the data directory as follows:

  • data/DNS-Challenge/DNS-Challenge-interspeech2020-master/
  • data/DNS-Challenge/DNS-Challenge-master/

and set the training data directory in the run.sh script.
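
For example, the expected directories can be created as below; obtaining and unpacking the DNS-Challenge corpus itself is up to you (this is only a layout sketch):

# Create the expected layout, then place the corresponding DNS-Challenge data inside each directory
mkdir -p data/DNS-Challenge/DNS-Challenge-interspeech2020-master
mkdir -p data/DNS-Challenge/DNS-Challenge-master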

Then:

source activate speech_enhance
bash run.sh 0   # prepare training list or meta file

Test data

Please prepare your test-case directory as data/test_cases_<name>, and set the test directory in the run.sh script.
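
For example (the directory name and source path are illustrative):

# Collect the noisy test wav files under a directory matching the data/test_cases_<name> pattern
mkdir -p data/test_cases_demo
cp /path/to/your/noisy/*.wav data/test_cases_demo/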

Training

First, modify the training configuration in config/train.toml as needed.

Then you can run training:

source activate speech_enhance
bash run.sh 1   
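
If TensorBoard logging is enabled in config/train.toml, training can be monitored in a browser (the log directory below is illustrative; point --logdir at the save/log directory set in your config):

tensorboard --logdir ./exp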

Inference

After training, you can enhance noisy speech. Before inference, you first need to modify the configuration in config/inference.toml.

Then run inference:

source activate speech_enhance
bash run.sh 2

Or you can just use inference.sh:

source activate speech_enhance
bash inference.sh

Eval

Calculate objective metrics (SI_SDR, STOI, WB_PESQ, NB_PESQ, etc.):

bash metrics.sh

Obtain subjective scores (DNS_MOS):

python ./speech_enhance/tools/dns_mos.py --testset_dir $YOUR_TESTSET_DIR --score_file $YOUR_SAVE_DIR
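
For example, with illustrative values (point --testset_dir at the directory of enhanced wav files and --score_file at the path where the scores should be saved):

python ./speech_enhance/tools/dns_mos.py \
  --testset_dir ./enhanced_wavs \
  --score_file ./dns_mos_scores.csv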

Citation

If you find our work useful in your research, please consider citing:

@inproceedings{chen2022fullsubnet+,
  title={FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement},
  author={Chen, Jun and Wang, Zilin and Tuo, Deyi and Wu, Zhiyong and Kang, Shiyin and Meng, Helen},
  booktitle={ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={7857--7861},
  year={2022},
  organization={IEEE}
}
