Multilingual Librispeech ASR2 + ASR1 baselines #5441

Merged
ftshijt merged 29 commits into espnet:master from juice500ml:asr_mls_exp
Oct 23, 2023

Conversation

@juice500ml
Contributor

@juice500ml juice500ml commented Sep 21, 2023

What?

  • ASR2 recipe for the Multilingual Librispeech
  • Training from scratch, 10h split (10h per language, 8 languages in total)
  • ASR2, ASR1 Fbank baseline, ASR1 SSL baseline

Why?

  • MLS experiments for the ASR2 paper

See also

@mergify mergify bot added ESPnet2 CI Travis, Circle CI, etc labels Sep 21, 2023
@sw005320
Contributor

Is it ready for review?
If so, please change the status from draft to ready for review.

@sw005320 sw005320 added Recipe ASR Automatic speech recognition labels Sep 27, 2023
@sw005320 sw005320 added this to the v.202312 milestone Sep 27, 2023
@juice500ml
Contributor Author

@sw005320 #5323 is a precursor of this PR, so it'd be better to fix that one first. Let me fix the previous PR's tests first; I'll try to get to it this week πŸ˜„

Collaborator

@ftshijt ftshijt left a comment

Thanks for the contribution! Some minor comments regarding configurations and formatting.

Comment on lines +62 to +64
# --src_bpe_train_text "data/${train_set}/text.${src_case}.${src_lang}" \
# --tgt_bpe_train_text "data/${train_set}/text.${tgt_case}.${tgt_lang}" \
# --lm_train_text "data/${train_set}/text.${tgt_case}.${tgt_lang} data/local/other_text/text" \
Collaborator

Suggested change
# --src_bpe_train_text "data/${train_set}/text.${src_case}.${src_lang}" \
# --tgt_bpe_train_text "data/${train_set}/text.${tgt_case}.${tgt_lang}" \
# --lm_train_text "data/${train_set}/text.${tgt_case}.${tgt_lang} data/local/other_text/text" \

Contributor Author

Applied in 91c2b06 Thanks πŸ‘

--tgt_nbpe $tgt_nbpe \
--src_case ${src_case} \
--tgt_case ${tgt_case} \
--speed_perturb_factors "" \
Collaborator

Suggested change
--speed_perturb_factors "" \

Contributor Author

Applied in 91c2b06 Thanks πŸ‘

- Git hash: `a8bc43b1bfc9518da7dd8be4cad0ef346ef222fc`
- Commit date: `Sun Aug 20 16:17:23 2023 -0400`

## exp/asr_smallerbatch_wamrup10k_lr0.0001_e200_raw_wavlm_large_21_full_km1000_bpe_rm3000_bpe_ts150
Collaborator

You may consider putting the model on Hugging Face as well.

Contributor Author

Uploaded to huggingface (5584524), thx for the tip!

@@ -0,0 +1,18 @@
# Default configuration
Collaborator

Please set it as the default (our CI check will ask for that).

Contributor Author

Applied in 91c2b06 Thanks πŸ‘

nclusters=1000

src_lang=$(echo "${kmeans_feature}_full_km${nclusters}" | tr "/" "_")
tgt_lang=en
Collaborator

Since the target is not a single language anymore, you may consider changing it to multi or multilingual (the tag mostly serves for token naming etc., so please feel free to change it).

Contributor Author

Applied in 91c2b06 Thanks πŸ‘
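For reference, the tag construction quoted above expands as follows (a minimal sketch; the kmeans_feature value is illustrative, mirroring the experiment directory name earlier in the thread):

```shell
# Illustrative value; "wavlm_large/21" mirrors the exp directory name above.
kmeans_feature="wavlm_large/21"
nclusters=1000
# Slashes are replaced so the tag is safe to use in file and token names.
src_lang=$(echo "${kmeans_feature}_full_km${nclusters}" | tr "/" "_")
echo "${src_lang}"   # wavlm_large_21_full_km1000
```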

Comment on lines +32 to +33
src_nbpe=3000
tgt_nbpe=150
Collaborator

Maybe we should also consider using different BPE sizes depending on the amount of data. I feel the current setting would be good for 1h, 10h, or single languages, but definitely not for the full data.

Contributor Author

Great point! I set the default to 10h in 91c2b06.
I don't think we want to run the full dataset for now.
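One way to act on this suggestion would be to key the BPE sizes off the data split. A sketch, under loud assumptions: only src_nbpe=3000 / tgt_nbpe=150 for 10h come from the recipe; the 1h and full values are purely illustrative.

```shell
# Hypothetical sketch: pick BPE sizes per data split.
# Only the 10h values (3000/150) appear in the recipe; 1h and full are made up.
data_split="10h"
case "${data_split}" in
    1h)   src_nbpe=1000; tgt_nbpe=100  ;;
    10h)  src_nbpe=3000; tgt_nbpe=150  ;;
    full) src_nbpe=6000; tgt_nbpe=1000 ;;
esac
echo "src_nbpe=${src_nbpe} tgt_nbpe=${tgt_nbpe}"
```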


./asr2.sh \
--local_data_opts "--lang ${lang} --data_split ${data_split}" \
--portion 1.0 \
Collaborator

ditto, portion 1.0 is too large for the whole set

Contributor Author

Similar to the above, set the default to 10h!

Collaborator

Do you think it would be a good idea to have a comment explaining the portion 1.0?

Contributor Author

Good idea, added more comments in 5672114

Collaborator

@simpleoier simpleoier left a comment

Thanks for your efforts! I left some comments.

htmlcov
coverage.xml*
bats-core/
test_utils/bats-core/
Collaborator

Not sure if we are supposed to touch this file in general PRs.

if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ] && ! [[ " ${skip_stages} " =~ [[:space:]]4[[:space:]] ]]; then
log "Stage 4a: Perform Kmeans using ${kmeans_feature_type} features"

if [ ${ngpu} -gt 0 ]; then
Collaborator

Sorry, I don't want to mix ngpu (usually used in training LM / ASR models) with use_gpu here.

Contributor Author

Added a gpu_kmeans parameter (following the gpu_inference variable-naming style) and set it as the default in 91c2b06. Thanks for pointing this out!
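A minimal sketch of the separation described above, assuming the gpu_kmeans flag adopted in this thread; everything else (the device variable, the default values) is illustrative, not from the recipe:

```shell
# gpu_kmeans controls only the k-means stage; ngpu stays reserved for
# LM/ASR training. The variable values here are illustrative defaults.
ngpu=0
gpu_kmeans=false

# Run the flag as a command (bash idiom for true/false string flags).
if "${gpu_kmeans}"; then
    kmeans_device=gpu
else
    kmeans_device=cpu
fi
echo "kmeans on ${kmeans_device}, training ngpu=${ngpu}"
```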


export train_cmd="slurm.pl"
export cuda_cmd="slurm.pl"
export decode_cmd="slurm.pl --num_threads 4 --mem 2000M"
Collaborator

You may clean your customized settings from this file.

Contributor Author

Applied in 91c2b06 Thanks πŸ‘

Collaborator

how about moving this file into scripts?

Contributor Author

Applied in 29d7cb8, thx!
(Related discussion: #5323 (comment))

@Emrys365 Emrys365 marked this pull request as ready for review October 12, 2023 14:39
@codecov

codecov bot commented Oct 16, 2023

Codecov Report

Merging #5441 (63940e1) into master (01037ca) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #5441   +/-   ##
=======================================
  Coverage   75.37%   75.37%           
=======================================
  Files         709      709           
  Lines       65291    65291           
=======================================
  Hits        49212    49212           
  Misses      16079    16079           
| Flag | Coverage Ξ” |
|------|------------|
| test_configuration_espnet2 | βˆ… <ΓΈ> (βˆ…) |
| test_integration_espnet1 | 65.67% <ΓΈ> (ΓΈ) |
| test_integration_espnet2 | 48.71% <ΓΈ> (ΓΈ) |
| test_python_espnet1 | 19.16% <ΓΈ> (ΓΈ) |
| test_python_espnet2 | 51.40% <ΓΈ> (ΓΈ) |
| test_utils | 23.10% <ΓΈ> (ΓΈ) |

Flags with carried forward coverage won't be shown.


@juice500ml
Contributor Author

Just found out that the fr evaluation was killed mid-run... I need to rerun the evaluation and update the README.md. I'll come back to this after the eval is done πŸ₯²

@juice500ml juice500ml changed the title [WIP] Multilingual Librispeech ASR2 + ASR1 baselines Multilingual Librispeech ASR2 + ASR1 baselines Oct 21, 2023
@juice500ml
Contributor Author

juice500ml commented Oct 21, 2023

I've applied all the great review suggestions, and the PR now seems close to merging! Evaluation and model upload to Hugging Face are also done.

Contributor

I observed degradations for some languages compared to fbank.
Can you summarize them here?
Do you plan to tune it more?

Contributor Author

CER comparison

| Model | ASR1 FBANK | ASR1 SSL | ASR2 |
|-------|------------|----------|------|
| EN | 22.1 | 15.3 | 12.4 |
| ES | 7.0 | 7.0 | 7.9 |
| DE | 10.3 | 10.4 | 11.9 |
| FR | 13.5 | 13.6 | 17.0 |
| IT | 7.1 | 6.9 | 7.9 |
| NL | 11.4 | 10.8 | 14.6 |
| PL | 8.2 | 7.2 | 11.3 |
| PT | 12.7 | 11.8 | 13.2 |

Currently, none of the experiments (asr1 ssl, asr1 fbank, asr2) use a language model, so the numbers may be somewhat worse than other results reported on MLS 10h (10h per language, 8 languages in total).
Also, for asr1 I tuned the learning rate (1e-3, 1e-4, 1e-5) and chose the best of the three. In comparison, for asr2 I tuned the learning rate, batch size, k-means k, source BPE, and target BPE. I think that to improve asr2 performance, we need some modifications to the algorithm.
Interestingly, even though asr2 is generally a bit worse, it performs much better on English while showing limited performance on French. I suspect the underlying SSL model is English-friendly, so asr1 ssl and asr2 are somewhat better than asr1 fbank on English, but I'm not so sure.
For now, I wasn't planning to tune it more (because the ES performance is similar to the original ES-only model's CER of 7.1), but if you think it's necessary, I'm open to more tuning.
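For a rough single-number summary, the macro-averaged CERs over the eight languages can be computed from the table above. The per-language numbers are copied verbatim; the averaging itself is my addition, not something reported in the PR.

```shell
# Macro-average the per-language CERs from the table above (one decimal place).
awk 'BEGIN {
    n = split("22.1 7.0 10.3 13.5 7.1 11.4 8.2 12.7", fbank, " ")
    split("15.3 7.0 10.4 13.6 6.9 10.8 7.2 11.8", ssl, " ")
    split("12.4 7.9 11.9 17.0 7.9 14.6 11.3 13.2", asr2, " ")
    for (i = 1; i <= n; i++) { f += fbank[i]; s += ssl[i]; a += asr2[i] }
    printf "fbank=%.1f ssl=%.1f asr2=%.1f\n", f/n, s/n, a/n
}'
```

This matches the qualitative picture in the comment: SSL features help on average, while asr2 trails slightly overall despite its large gain on English.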

@sw005320
Contributor

I think this PR is almost there.

@simpleoier simpleoier mentioned this pull request Oct 22, 2023
@sw005320
Contributor

@simpleoier and/or @ftshijt, if it is OK for you, please merge this PR.

@ftshijt
Collaborator

ftshijt commented Oct 23, 2023

LGTM! Please also consider:

  • uploading the pre-trained models
  • adding additional results for the full set

@ftshijt ftshijt merged commit d95b221 into espnet:master Oct 23, 2023
@juice500ml juice500ml deleted the asr_mls_exp branch October 23, 2023 16:42