Support arbitrary language finetune for Whisper models. #5344

sw005320 merged 9 commits into espnet:master
Conversation
sw005320 left a comment:

LGTM.
Please also add some tests.
egs2/aishell/asr1/README.md (outdated)

    - ASR config: [conf/tuning/train_asr_whisper_medium_finetune.yaml](conf/tuning/train_asr_whisper_medium_finetune.yaml)
    - #Params: 762.32 M
    - Model link:
As we discussed, please upload a model.
simpleoier left a comment:

Thanks!
I only have one concern about unseen lang_id used in whisper.
    ## Results

    - ASR config: [conf/tuning/train_asr_whisper_medium_finetune.yaml](conf/tuning/train_asr_whisper_medium_finetune.yaml)
Is a decode config needed here?
Thanks, it should be included.
    fi

    _opts=""
    if [ "${token_type}" = "whisper_multilingual" ]; then
Would the default `lang=noinfo` work here?
I added a `LANGUAGES_CODE_MAPPING` to map the language codes of ESPnet to the language IDs of Whisper, and to make sure the input language code is supported by the Whisper models.
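A minimal sketch of what such a mapping and check could look like. The dictionary entries and the helper's name are illustrative assumptions, not the actual ESPnet implementation:

```python
# Hypothetical sketch: map ESPnet-style language codes to Whisper language IDs
# and reject codes the Whisper models do not support. Entries are examples only.
LANGUAGES_CODE_MAPPING = {
    "noinfo": "en",  # assumed fallback when no language info is given
    "zho": "zh",
    "jpn": "ja",
    "eng": "en",
}


def to_whisper_language(espnet_lang: str) -> str:
    """Return the Whisper language ID for an ESPnet language code."""
    whisper_lang = LANGUAGES_CODE_MAPPING.get(espnet_lang)
    if whisper_lang is None:
        # Fail loudly for unseen language IDs, as discussed in this thread.
        raise ValueError(
            f"language code {espnet_lang!r} is not supported by the Whisper models"
        )
    return whisper_lang
```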
    else:
-       converter = OpenAIWhisperTokenIDConverter(model_type=bpemodel)
+       converter = OpenAIWhisperTokenIDConverter(
+           model_type=bpemodel, language=tokenizer_language
Can we specify any language ID here, or only the languages supported by the Whisper model?
This pull request is now in conflict :(
I have made several updates as discussed:
Currently, for unseen language IDs that the Whisper models do not support, we will raise an error.
Codecov Report

@@           Coverage Diff            @@
##           master    #5344    +/-   ##
==========================================
- Coverage   76.11%   69.79%    -6.33%
==========================================
  Files         672      671        -1
  Lines       59864    59793       -71
==========================================
- Hits        45567    41733     -3834
- Misses      14297    18060     +3763

Flags with carried forward coverage won't be shown.
... and 113 files with indirect coverage changes.
    ${python} -m espnet2.bin.whisper_export_vocabulary \
        --whisper_model "${token_type}" \
-       --output "${token_list}"
+       --output "${token_list}" ${_opts}
I think it would be useful if the script exited when ${lang} is not recognized. Is this already satisfied? Or could it be done just by adding `|| exit 1` after this line? I'm not sure whether the Python script satisfies the condition of returning a non-zero status.
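On the exit-status question: an uncaught Python exception makes the interpreter exit with a non-zero status, so a shell guard like `|| exit 1` (or `set -e`) would catch it. A standalone sketch that simulates this, not the actual export script:

```python
import subprocess
import sys

# Simulate a script that raises an uncaught exception (e.g. a ValueError for
# an unrecognized ${lang}) and check the exit status the shell would see.
proc = subprocess.run(
    [sys.executable, "-c", "raise ValueError('unrecognized language')"],
    capture_output=True,
)
print(proc.returncode)  # non-zero, so `|| exit 1` would trigger
```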
Thanks, @pengchengguo!

- In the `asr.sh` script, directly use `--lang` as the language id to export the Whisper vocabulary.
- Set `tokenizer_language` for the preprocessor in config files, like