Add decode_options and hyp_cleaner in evaluate_whisper_inference#5272
sw005320 merged 7 commits into espnet:master
# 3. Build data-iterator
info_list = []
wavscp = open(data_path_and_name_and_type, "r", encoding="utf-8")
wavscp = open(key_file, "r", encoding="utf-8")
`key_file` may contain only a subset of the utterances when multiple jobs are used, whereas `data_path_and_name_and_type` can contain all the data.
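To illustrate why the two files differ, here is a minimal Python sketch, assuming a Kaldi-style scp list (`utt_id path` per line) that is split into one key file per job. The helper names `split_scp` and `read_utt_ids`, the toy paths, and the `keys.N.scp` file names are hypothetical and not taken from ESPnet; the contiguous split is a simplification.

```python
import tempfile
from pathlib import Path


def split_scp(lines, num_jobs):
    """Split scp lines into num_jobs contiguous chunks (hypothetical helper)."""
    chunk, rem = divmod(len(lines), num_jobs)
    parts, start = [], 0
    for job in range(num_jobs):
        # The first `rem` jobs take one extra line each.
        end = start + chunk + (1 if job < rem else 0)
        parts.append(lines[start:end])
        start = end
    return parts


def read_utt_ids(path):
    """Return the utterance IDs (first column) listed in an scp-style file."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.split(maxsplit=1)[0] for line in f if line.strip()]


# Toy data: four utterances split across two jobs.
all_lines = [f"utt{i} /path/to/utt{i}.wav\n" for i in range(1, 5)]
workdir = Path(tempfile.mkdtemp())
full_scp = workdir / "wav.scp"
full_scp.write_text("".join(all_lines), encoding="utf-8")

key_files = []
for job, part in enumerate(split_scp(all_lines, 2), start=1):
    key_file = workdir / f"keys.{job}.scp"
    key_file.write_text("".join(part), encoding="utf-8")
    key_files.append(key_file)

# Each job sees only its own subset; the full list covers everything.
job1_ids = read_utt_ids(key_files[0])
all_ids = read_utt_ids(full_scp)
```

In this sketch, a job that reads its own key file processes two utterances, while a job that reads the full list would process all four.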
for i in $(seq "${_nj}"); do
    cat "${logdir}/output.${i}/1best_recog/${f}"
done | LC_ALL=C sort -k1 >"${outdir}/${f}"
if [ -f "${logdir}/output.1/1best_recog/${f}" ]; then
Whisper outputs do not contain all the files.
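The guard-then-merge idea can be sketched in Python as follows. This is an illustration only, not the script's implementation: `merge_job_outputs` is a hypothetical helper, it skips any job whose file is missing (the shell snippet above only tests the first job's output), and the `token` file name stands in for an output that was not written.

```python
import tempfile
from pathlib import Path


def merge_job_outputs(logdir, num_jobs, filename):
    """Concatenate per-job result files, skipping jobs that did not write one.

    Returns the merged lines in sorted order, or an empty list if no job
    produced the file.
    """
    merged = []
    for job in range(1, num_jobs + 1):
        path = Path(logdir) / f"output.{job}" / "1best_recog" / filename
        if not path.is_file():
            continue  # this output was not produced; nothing to merge
        with open(path, "r", encoding="utf-8") as f:
            merged.extend(line for line in f if line.strip())
    # Plain string sort, comparable to `LC_ALL=C sort` for ASCII keys.
    return sorted(merged)


# Toy layout: two jobs that each wrote `text` but no `token` file.
logdir = Path(tempfile.mkdtemp())
for job, text in [(1, "utt2 world\n"), (2, "utt1 hello\n")]:
    out = logdir / f"output.{job}" / "1best_recog"
    out.mkdir(parents=True)
    (out / "text").write_text(text, encoding="utf-8")

text_lines = merge_job_outputs(logdir, 2, "text")
missing_lines = merge_job_outputs(logdir, 2, "token")
```

Checking for the file before merging avoids a failure on a missing path and lets the caller decide whether to write an output file at all.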
Many thanks for the update! The examples are very good and worth showing somewhere; have you considered where to put them? (One candidate might be the tts/svs templates' README, since it has a section specifically on ASR evaluation, but that might not be general enough.)
Codecov Report
@@ Coverage Diff @@
## master #5272 +/- ##
=======================================
Coverage 76.10% 76.10%
=======================================
Files 658 658
Lines 59156 59156
=======================================
Hits 45022 45022
Misses 14134 14134
Thanks, @pyf98!
This PR adds `decode_options` and `hyp_cleaner` in `evaluate_whisper_inference`. The `decode_options` can be used to control the decoding hyperparameters in the Whisper model's `transcribe` method.

Here is an example script to evaluate a test set using Whisper: