Multilingual Librispeech ASR2 + ASR1 baselines#5441
Conversation
for more information, see https://pre-commit.ci
…to asr_mls_exp
Is it ready for review?
ftshijt left a comment:
Thanks for the contribution! Some minor comments regarding configurations and formatting.
egs2/mls/asr2/run.sh (outdated)
```shell
# --src_bpe_train_text "data/${train_set}/text.${src_case}.${src_lang}" \
# --tgt_bpe_train_text "data/${train_set}/text.${tgt_case}.${tgt_lang}" \
# --lm_train_text "data/${train_set}/text.${tgt_case}.${tgt_lang} data/local/other_text/text" \
```
Suggested change: remove these commented-out lines.
egs2/mls/asr2/run.sh (outdated)
```shell
--tgt_nbpe $tgt_nbpe \
--src_case ${src_case} \
--tgt_case ${tgt_case} \
--speed_perturb_factors "" \
```
Suggested change: remove the empty `--speed_perturb_factors ""` option.
egs2/mls/asr2/RESULTS.md (outdated)
```
- Git hash: `a8bc43b1bfc9518da7dd8be4cad0ef346ef222fc`
- Commit date: `Sun Aug 20 16:17:23 2023 -0400`

## exp/asr_smallerbatch_wamrup10k_lr0.0001_e200_raw_wavlm_large_21_full_km1000_bpe_rm3000_bpe_ts150
```
You may consider putting the model on Hugging Face as well.
Reply: Uploaded to Hugging Face (5584524), thanks for the tip!
(new file)
```
@@ -0,0 +1,18 @@
# Default configuration
```
Please set it as the default (our CI check would ask for that).
egs2/mls/asr2/run.sh (outdated)
```shell
nclusters=1000

src_lang=$(echo "${kmeans_feature}_full_km${nclusters}" | tr "/" "_")
tgt_lang=en
```
Since the target is no longer a single language, you may consider changing it to `multi` or `multilingual` (the tag mostly serves for token naming etc., so please feel free to change it).
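The naming suggestion can be sketched as follows. This is a hypothetical illustration: the `multi` tag is the reviewer's proposal, and the echoed tags are only shown to make the naming effect visible — they are not the recipe's actual directory layout.

```shell
# Hypothetical sketch: the language tags mostly drive token-list / dump naming.
kmeans_feature="wavlm_large/21"  # SSL feature spec used in this recipe
nclusters=1000

# Derive the source "language" tag from the discrete-unit setup
src_lang=$(echo "${kmeans_feature}_full_km${nclusters}" | tr "/" "_")
tgt_lang=multi  # "multi" instead of "en", since targets cover 8 languages

echo "src tag: ${src_lang}"
echo "tgt tag: ${tgt_lang}"
```

Since the tag only feeds naming, changing it is cheap; it just has to stay consistent across token lists and experiment directories.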
```shell
src_nbpe=3000
tgt_nbpe=150
```
Maybe we should also consider using a different BPE size depending on the amount of data. I feel the current setting would be good for 1h, 10h, or single languages, but definitely not for all the data.
Reply: Great point! I set the default to 10h in 91c2b06.
I don't think we want to do the full dataset for now.
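One way to realize the "different BPE per data size" idea is a simple switch on the split. Only the 10h values below come from the PR; the 1h and full vocabulary sizes are made-up placeholders, and the split names are assumptions.

```shell
# Hypothetical sketch: scale BPE vocabulary with the amount of training data.
data_split="10h"  # assumed splits: 1h / 10h / full

case "${data_split}" in
    1h)   src_nbpe=1000; tgt_nbpe=100 ;;   # placeholder values
    10h)  src_nbpe=3000; tgt_nbpe=150 ;;   # defaults from this PR
    full) src_nbpe=6000; tgt_nbpe=500 ;;   # placeholder values
    *)    echo "Unknown data_split: ${data_split}" >&2; exit 1 ;;
esac

echo "src_nbpe=${src_nbpe} tgt_nbpe=${tgt_nbpe}"
```

The intent is just that a small target vocabulary trained on 1h of text will not cover the full 8-language set well, so the vocabulary should grow with the data.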
egs2/mls/asr2/run.sh (outdated)
```shell
./asr2.sh \
    --local_data_opts "--lang ${lang} --data_split ${data_split}" \
    --portion 1.0 \
```
Ditto, `--portion 1.0` is too large for the whole set.
Reply: Similar to the above, set the default to 10h!
Do you think it would be a good idea to add a comment explaining `--portion 1.0`?
Reply: Good idea, added more comments in 5672114.
simpleoier left a comment:
Thanks for your efforts! I left some comments.
```
htmlcov
coverage.xml*
bats-core/
test_utils/bats-core/
```
Not sure if we are supposed to touch this file in general PRs.
egs2/TEMPLATE/asr2/asr2.sh (outdated)
```shell
if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ] && ! [[ " ${skip_stages} " =~ [[:space:]]4[[:space:]] ]]; then
    log "Stage 4a: Perform Kmeans using ${kmeans_feature_type} features"

    if [ ${ngpu} -gt 0 ]; then
```
Sorry, I don't want to mix ngpu (usually used for training LM / ASR models) with use_gpu here.
Reply: Added a gpu_kmeans parameter (following the gpu_inference-style variable naming) and set it to the default in 91c2b06. Thanks for pointing it out!
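A `gpu_kmeans` switch in the `gpu_inference` style might look like the sketch below. Apart from `gpu_kmeans` itself, the variable names and the `run.pl` placeholders are illustrative assumptions, not the actual asr2.sh code.

```shell
# Hypothetical sketch: select the job backend for the k-means stage via a
# dedicated flag instead of reusing ngpu (which belongs to LM/ASR training).
train_cmd="run.pl"   # CPU job dispatcher (placeholder)
cuda_cmd="run.pl"    # GPU job dispatcher (placeholder)
gpu_kmeans=true      # default per the PR discussion

if "${gpu_kmeans}"; then
    _cmd="${cuda_cmd}"
    _kmeans_ngpu=1
else
    _cmd="${train_cmd}"
    _kmeans_ngpu=0
fi

echo "k-means runs with ngpu=${_kmeans_ngpu}"
```

Keeping the flag separate means a user can train on GPU while still running k-means on CPU nodes, or vice versa.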
egs2/mls/asr2/cmd.sh (outdated)
```shell
export train_cmd="slurm.pl"
export cuda_cmd="slurm.pl"
export decode_cmd="slurm.pl --num_threads 4 --mem 2000M"
```
You may want to clean your customized settings from this file.
How about moving this file into scripts?
Reply: Applied in 29d7cb8, thanks!
(Related discussion: #5323 (comment))
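For reference, a cmd.sh with no machine-specific customization just points everything at local execution; this is a minimal sketch, and the exact defaults shipped in ESPnet recipes may differ.

```shell
# Minimal sketch of a cmd.sh without cluster-specific settings:
# run.pl executes jobs locally; slurm.pl / queue.pl belong in per-site edits.
export train_cmd="run.pl"
export cuda_cmd="run.pl"
export decode_cmd="run.pl"

echo "train_cmd=${train_cmd}"
```

This is why the reviewer asks to strip the slurm.pl lines: committed recipes should run anywhere, and cluster backends are a per-site choice.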
Co-authored-by: Jiatong <[email protected]>
Codecov Report
```
@@ Coverage Diff @@
## master #5441 +/- ##
=======================================
  Coverage   75.37%   75.37%
=======================================
  Files         709      709
  Lines       65291    65291
=======================================
  Hits        49212    49212
  Misses      16079    16079
```
Flags with carried forward coverage won't be shown.
Just found out that
I tried to apply all the great reviews to the PR, and it seems that it is now close to merging! Evaluation + model upload to Hugging Face is also done.
I observed degradations for some languages compared to fbank. Can you summarize them here? Do you plan to tune it more?
CER comparison
| Model | ASR1 FBANK | ASR1 SSL | ASR2 |
|---|---|---|---|
| EN | 22.1 | 15.3 | 12.4 |
| ES | 7.0 | 7.0 | 7.9 |
| DE | 10.3 | 10.4 | 11.9 |
| FR | 13.5 | 13.6 | 17.0 |
| IT | 7.1 | 6.9 | 7.9 |
| NL | 11.4 | 10.8 | 14.6 |
| PL | 8.2 | 7.2 | 11.3 |
| PT | 12.7 | 11.8 | 13.2 |
Currently, none of the experiments (asr1 ssl, asr1 fbank, asr2) use a language model, so the numbers can be somewhat worse than other results reported on MLS 10h (10h per language, 8 languages in total).
Also, for asr1, I tried tuning the learning rate (1e-3, 1e-4, 1e-5) and chose the best of the three. In comparison, for asr2, I tuned the learning rate, batch size, k-means k, source BPE, and target BPE. I think that to improve asr2 performance, we need some modifications to the algorithm itself.
Interestingly, even though asr2 is generally a bit worse, it shows much better performance on English while showing limited performance on French. I suspect the underlying SSL model is English-friendly, so asr1 ssl and asr2 are somewhat better than asr1 fbank on English, but I'm not so sure.
For now, I wasn't planning to tune it more (the ES performance is similar to the original ES-only model's performance, CER = 7.1), but if you think it's necessary, I'm open to more tuning.
I think this PR is almost there.
@simpleoier and/or @ftshijt, if it is OK with you, please merge this PR.
LGTM! Please also consider: