[RecurrentNN × Regression × Regularized] based Mouth Opening Estimation via SSL
- Install PyTorch from official instructions: https://pytorch.org/get-started/locally/
- Install dependencies:
    pip install -r requirements.txt

- Collect data using LipsSync. Directory structure:

    2025-02-04_22-01-52/
        audio.wav
        mouth_data.csv
    2025-02-04_22-43-56/
        audio.wav
        mouth_data.csv
    valid.txt

- Prepare a seen validation set (in-distribution speakers) and an unseen validation set (out-of-distribution speakers)
- Add audio paths to valid.txt
- For SSL: prepare unlabeled vocal-only audio (intact spectrum below 16 kHz)
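A validation list such as valid.txt can be generated with a few lines of Python. The sketch below assumes the file holds one audio path per line, written relative to the data root. That format, the helper name `write_valid_list`, and the use of relative paths are all assumptions, so check the preprocessing script for what it actually expects.

```python
from pathlib import Path


def write_valid_list(data_root, session_names, list_name="valid.txt"):
    """Write one audio path per line for the chosen validation sessions.

    Assumption: paths are stored relative to data_root, one per line.
    """
    root = Path(data_root)
    lines = []
    for name in sorted(session_names):
        audio = root / name / "audio.wav"
        if not audio.is_file():
            raise FileNotFoundError(f"missing recording: {audio}")
        lines.append(audio.relative_to(root).as_posix())
    # The list file is written next to the session folders.
    (root / list_name).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

Calling `write_valid_list("<DATA_DIR>", ["2025-02-04_22-43-56"])` would hold out that session; the same helper with `list_name="valid2.txt"` could produce the unseen-speaker list.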
- Run preprocessing:

    # Labeled data
    python recipes/mouth_opening/preprocess.py <SOURCE_DIR> <TARGET_DIR>

    # Unlabeled data (SSL)
    python recipes/mouth_opening/preprocess_unlabel.py <SOURCE_DIR> <TARGET_DIR>

Run training:

    python train.py --exp_name <EXP_NAME> --dataset <DATA_PATH> --gpu <GPU_ID>

View all options with python train.py --help. Variants:
- train_r_drop.py (R-Drop regularization)
- train_mse.py (MSE loss)
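For readers unfamiliar with R-Drop, the idea is to run the same batch through the model twice with dropout active, apply the task loss to both outputs, and add a penalty on the disagreement between them. The sketch below is a framework-free illustration, not the code in train_r_drop.py. It assumes a squared difference as the consistency term (in place of the KL divergence used for classification in the paper) and a mean absolute error as a placeholder task loss. The toy model and all names are hypothetical.

```python
import random


def forward_with_dropout(weights, features, drop_prob, rng):
    """Toy linear model: inverted dropout on the inputs, then a dot product."""
    keep = 1.0 - drop_prob
    total = 0.0
    for w, x in zip(weights, features):
        if rng.random() < keep:  # keep this input and rescale it
            total += w * x / keep
    return total


def mean_abs_error(pred, target):
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mean_sq_diff(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def r_drop_loss(pred_a, pred_b, target, alpha=1.0, task_loss=mean_abs_error):
    """Task loss on both passes plus alpha times their disagreement."""
    supervised = task_loss(pred_a, target) + task_loss(pred_b, target)
    consistency = mean_sq_diff(pred_a, pred_b)  # assumed stand-in for KL
    return supervised + alpha * consistency


rng = random.Random(0)
weights = [0.5, -0.2, 0.1]
batch = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
target = [0.4, 0.2]
# Two passes over the same batch; the dropout masks differ between them.
pred_a = [forward_with_dropout(weights, x, 0.3, rng) for x in batch]
pred_b = [forward_with_dropout(weights, x, 0.3, rng) for x in batch]
loss = r_drop_loss(pred_a, pred_b, target, alpha=1.0)
```

With dropout disabled the two passes coincide and the consistency term vanishes, so the regularizer only has an effect when the dropout rate is non-zero.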
SSL training command:

    python train_ssl.py --exp_name <EXP_NAME> --dataset <DATA_PATH> --unlabel_dataset <UNLABEL_PATH> --gpu <GPU_ID>

Prerequisites:
- Create valid2.txt with unseen validation paths
- --conv_dropout must be non-zero
- Use 10+ hours of seen data
- Prepare 50+ hours of unlabeled data
- Tested datasets: PopBuTFy from NeuralSVB, PopCS from DiffSinger, M4Singer, Jingju a Cappella Recordings Collection, tiny-singing-voice-database, OpenSinger, GTSinger
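
Mean Teacher, listed among the core references, keeps a teacher model whose weights are an exponential moving average (EMA) of the student's weights and uses the teacher's predictions on unlabeled data as consistency targets. Whether train_ssl.py follows this scheme exactly is not stated here. The sketch below only illustrates the weight-averaging step from the paper, with plain dicts of floats standing in for parameter tensors. The names `ema_update` and `decay` are illustrative.

```python
def ema_update(teacher, student, decay=0.999):
    """Update teacher in place: teacher = decay * teacher + (1 - decay) * student."""
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    for name, student_value in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * student_value
    return teacher


teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 1.0}
for _ in range(3):
    ema_update(teacher, student, decay=0.5)
# teacher["w"] is now 0.125 and teacher["b"] is 0.875
```

A decay close to 1 makes the teacher change slowly, which smooths out noise in individual student updates at the cost of lagging behind the student early in training.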
Run eval.py on a single audio file:

    python eval.py --model <model_path> --wav <wav_path>

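
To evaluate many recordings, the command above can be generated once per file. The helper below is a hypothetical convenience, not part of the repository. It uses only the `--model` and `--wav` flags shown above and assumes eval.py sits in the current working directory.

```python
import sys
from pathlib import Path


def build_eval_commands(model_path, wav_dir, script="eval.py"):
    """Return one argument list per .wav file found under wav_dir."""
    wavs = sorted(Path(wav_dir).rglob("*.wav"))
    return [
        [sys.executable, script, "--model", str(model_path), "--wav", str(wav)]
        for wav in wavs
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` so that a failing file stops the batch instead of being silently skipped.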
- Framework cloned from GeneralCurveEstimator
- Training code adapted from vocal-remover
- Early model reference: FCPE
- SSL inspiration: SOFA
- Core references:
  - R-Drop: Regularized Dropout for Neural Networks
  - Temporal Ensembling for Semi-Supervised Learning
  - Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results
- Data collection tool: LipsSync
- Visualization tool: lips-sync-visualizer
- .ass mask tools: mask_fix_tools
- Data expansion initiative: DiffSinger Discussion