[Paper] Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition
Modify the variables of each dataset in `configs/dataset_config.py`.
- Move your audio files to the `wavdir` directory.
- Create a `meta_csv_file` with columns `name` (file names) and `label` (emotion labels) for each dataset. The pretraining datasets do not need the `label` column.
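The `meta_csv_file` described above can be generated with a few lines of Python. A minimal sketch, assuming a flat directory of `.wav` files; the paths and the label mapping are placeholders, not part of the repository:

```python
import csv
from pathlib import Path

# Hypothetical layout; point wavdir and meta_csv at the paths you set
# in configs/dataset_config.py.
wavdir = Path("wav")
wavdir.mkdir(exist_ok=True)
meta_csv = Path("meta.csv")

# Placeholder mapping from file name to emotion label; in practice this
# comes from the dataset's own annotations.
labels = {"Ses01F_impro01_F000.wav": "neutral"}

with open(meta_csv, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "label"])  # pretraining data only needs "name"
    for wav in sorted(wavdir.glob("*.wav")):
        writer.writerow([wav.name, labels.get(wav.name, "")])
```

For pretraining data, the `label` column can simply be left empty or omitted.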
Extracting WavLM features in advance can greatly accelerate pretraining. Please use `extract_feature/WavLM/extract_wavlm.py` to extract the features of the pretraining data in advance.
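The caching pattern here is one saved feature array per utterance, so pretraining can skip the WavLM forward pass. A minimal, hedged sketch of that loop; a zeros array stands in for the real WavLM forward pass, and all paths and names are illustrative rather than taken from `extract_wavlm.py`:

```python
import numpy as np
from pathlib import Path

# Illustrative directories; one .npy feature file is written per .wav file.
wavdir = Path("wav")
featdir = Path("wavlm_feat")
wavdir.mkdir(exist_ok=True)
featdir.mkdir(exist_ok=True)
(wavdir / "demo_utterance.wav").touch()  # placeholder audio file

def wavlm_features_stub(wav_path: Path) -> np.ndarray:
    # Stand-in for the real WavLM forward pass; WavLM-Large hidden states
    # have dimension 1024, and the frame count depends on audio length.
    return np.zeros((200, 1024), dtype=np.float32)

for wav in sorted(wavdir.glob("*.wav")):
    np.save(featdir / (wav.stem + ".npy"), wavlm_features_stub(wav))
```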
Specify training hyperparameters on the command line or modify them in `configs/train_config.py`.
Please also specify `path_to_wavlm` on the command line or in `configs/model_config.py`.
Please refer to the `get_args` function in `configs/__init__.py` if you want to use the command-line method.
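A hedged sketch of how a `get_args`-style parser could map the flags used in the commands below; the actual option names and defaults live in `configs/__init__.py` and may differ:

```python
import argparse

# Hypothetical parser mirroring the flags shown in the example commands.
parser = argparse.ArgumentParser(description="Vesper pretraining / fine-tuning")
parser.add_argument("-M", "--model", default="Vesper-12",
                    help="model variant, e.g. Vesper-4 or Vesper-12")
parser.add_argument("-d", "--dataset", default="iemocap",
                    help="fine-tuning dataset name")
parser.add_argument("-b", "--batch_size", type=int, default=32)
parser.add_argument("-g", "--gpus", default="0",
                    help="comma-separated GPU ids, e.g. 0,1")
parser.add_argument("-l", "--lr", type=float, default=0.0005)
parser.add_argument("--model_path_to_wavlm",
                    help="path to the WavLM-Large.pt checkpoint")

# Parsing an argument list equivalent to one of the commands below.
args = parser.parse_args(["-M", "Vesper-12", "-b", "32", "-g", "0,1"])
```

Options not given on the command line fall back to the values in `configs/train_config.py` and `configs/model_config.py`.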
```
python pretrain.py -M Vesper-4
python pretrain.py -M Vesper-12
python pretrain.py -M Vesper-12 -b 32 -g 0,1 -l 0.0005 --model_path_to_wavlm PATH_to_WavLM/WavLM-Large.pt
```

Specify fine-tuning hyperparameters on the command line or modify them in `configs/train_config.py`.
Please also specify `path_to_vesper` on the command line or in `configs/model_config.py`.
```
python finetune.py -M Vesper-12 -d iemocap
python finetune.py -M Vesper-12 -d iemocap -g 0 -b 32 -l 0.0007 --model_path_to_vesper PATH_to_EXP_DIRECTORY/checkpoint/model_best.pt
```