Add decode_options and hyp_cleaner in evaluate_whisper_inference #5272

Merged
sw005320 merged 7 commits into espnet:master from pyf98:eval-whisper on Jul 21, 2023

pyf98 (Collaborator) commented Jul 3, 2023

This PR adds the `decode_options` and `hyp_cleaner` options to `evaluate_whisper_inference`. `decode_options` controls the decoding hyperparameters passed to the Whisper model's `transcribe` method.
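In the example script, `decode_options` is a flat `{key: value, ...}` string. As a rough, standard-library-only sketch of how such a string could become keyword arguments, here is a hypothetical `parse_decode_options` helper. It is an assumption for illustration, not the parser ESPnet uses (a YAML loader such as PyYAML's `yaml.safe_load` would also accept this particular string). The commented `model.transcribe(...)` call assumes the openai-whisper package and is not executed.

```python
from typing import Any, Dict


def _coerce(value: str) -> Any:
    """Convert a scalar token to bool, int, or float, else keep it as a string."""
    if value in ("True", "true"):
        return True
    if value in ("False", "false"):
        return False
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_decode_options(spec: str) -> Dict[str, Any]:
    """Parse a flat '{key: value, ...}' string into a dict of keyword arguments.

    Hypothetical helper: it only handles flat mappings, with no nesting and
    no commas or colons inside values.
    """
    body = spec.strip()
    if body.startswith("{") and body.endswith("}"):
        body = body[1:-1]
    options: Dict[str, Any] = {}
    for item in body.split(","):
        if not item.strip():
            continue  # tolerate empty input such as "{}"
        key, _, value = item.partition(":")
        options[key.strip()] = _coerce(value.strip())
    return options


opts = parse_decode_options(
    "{language: en, task: transcribe, temperature: 0, beam_size: 10, fp16: False}"
)
print(opts)
# -> {'language': 'en', 'task': 'transcribe', 'temperature': 0, 'beam_size': 10, 'fp16': False}

# Assumption: with openai-whisper installed, the dict could be forwarded as
#   model = whisper.load_model("medium")
#   result = model.transcribe("utt.wav", **opts)
```

Converting `False` to a real boolean matters here. Passed through as the string `"False"`, the value would be truthy, so the intended setting would not take effect.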

Here is an example script to evaluate a test set using Whisper:

```bash
#!/usr/bin/env bash
# Set bash to 'strict' mode; it will exit on
# -e 'error', -u 'undefined variable', -o pipefail 'error in pipeline'.
set -e
set -u
set -o pipefail

whisper_tag=medium
cleaner=whisper_en
hyp_cleaner=whisper_en
nj=1
# Space-separated list; left unquoted in the loop so that it splits into words.
test_sets="test/WSJ/test_eval92"
decode_options="{language: en, task: transcribe, temperature: 0, beam_size: 10, fp16: False}"

for x in ${test_sets}; do
    wavscp=dump/raw/${x}/wav.scp
    outdir=whisper-${whisper_tag}_outputs/${x}
    gt_text=dump/raw/${x}/text

    scripts/utils/evaluate_asr.sh \
        --whisper_tag "${whisper_tag}" \
        --nj "${nj}" \
        --gpu_inference true \
        --stage 2 \
        --stop_stage 3 \
        --cleaner "${cleaner}" \
        --hyp_cleaner "${hyp_cleaner}" \
        --decode_options "${decode_options}" \
        --gt_text "${gt_text}" \
        "${wavscp}" \
        "${outdir}"
done
```
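The script sets both `cleaner` and `hyp_cleaner` to `whisper_en`, which suggests the references and the hypotheses are normalized the same way before scoring. The sketch below illustrates why that matters for word error rate. It is not the scoring code in `scripts/utils/evaluate_asr.sh`. `normalize` is a toy stand-in (lowercasing and punctuation removal), not Whisper's English normalizer, and `edit_distance` and `corpus_wer` are hypothetical helpers that operate on texts keyed by utterance ID.

```python
import re
from typing import Dict, List


def normalize(text: str) -> str:
    """Toy cleaner: lowercase, replace punctuation with spaces, squeeze whitespace."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(text.split())


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,  # deletion of a reference word
                cur[j - 1] + 1,  # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = cur
    return prev[-1]


def corpus_wer(refs: Dict[str, str], hyps: Dict[str, str]) -> float:
    """Total word errors divided by total reference words over all utterances."""
    errors = 0
    words = 0
    for utt_id, ref_text in refs.items():
        ref = normalize(ref_text).split()
        hyp = normalize(hyps.get(utt_id, "")).split()  # a missing hyp counts as all deletions
        errors += edit_distance(ref, hyp)
        words += len(ref)
    return errors / max(words, 1)


refs = {"utt1": "The cat sat.", "utt2": "Hello, world!"}
hyps = {"utt1": "the cat sat", "utt2": "hello word"}
print(f"WER = {corpus_wer(refs, hyps):.2%}")  # -> WER = 20.00%
```

Without the shared normalization, `utt1` alone would contribute two spurious errors ("The" vs. "the" and "sat." vs. "sat"), even though the transcription is correct.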

mergify bot added the ESPnet2 label Jul 3, 2023
```diff
 # 3. Build data-iterator
 info_list = []
-wavscp = open(data_path_and_name_and_type, "r", encoding="utf-8")
+wavscp = open(key_file, "r", encoding="utf-8")
```
pyf98 (Collaborator, Author):

`key_file` may contain only a subset of the utterances, because the data are split across multiple jobs, whereas `data_path_and_name_and_type` can contain all of the data.
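A small sketch of the problem the comment describes, assuming the full `wav.scp` is split into one contiguous chunk per job (the chunking rule in `split_into_jobs` and the `utt*` entries are hypothetical). If every job opens the full list instead of its own key file, merging the per-job outputs duplicates every utterance.

```python
from typing import List


def split_into_jobs(lines: List[str], nj: int) -> List[List[str]]:
    """Split scp lines into nj contiguous chunks whose sizes differ by at most one."""
    size, rem = divmod(len(lines), nj)
    chunks, start = [], 0
    for j in range(nj):
        end = start + size + (1 if j < rem else 0)
        chunks.append(lines[start:end])
        start = end
    return chunks


# Hypothetical Kaldi-style "<utt_id> <path>" entries.
wav_scp = [f"utt{i} /path/to/utt{i}.wav" for i in range(1, 6)]
key_files = split_into_jobs(wav_scp, nj=2)

# Each job reads only its own key file: merged outputs cover every utterance once.
merged_ok = [line.split()[0] for chunk in key_files for line in chunk]

# Each job reads the full list: merging repeats every utterance once per job.
merged_bad = [line.split()[0] for _ in key_files for line in wav_scp]

print(len(merged_ok), len(merged_bad))  # -> 5 10
```

With `nj=1`, as in the example script, both variants produce the same result, which is why this kind of bug only shows up once decoding is parallelized.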

```bash
for i in $(seq "${_nj}"); do
    cat "${logdir}/output.${i}/1best_recog/${f}"
done | LC_ALL=C sort -k1 >"${outdir}/${f}"
if [ -f "${logdir}/output.1/1best_recog/${f}" ]; then
```
pyf98 (Collaborator, Author):

Whisper outputs do not include all of these files, hence the existence check before merging.
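The guard can be mirrored in Python as a rough sketch. `merge_job_outputs` is a hypothetical helper, not ESPnet code. It assumes the `output.<i>/1best_recog/<name>` layout from the snippet and assumes that a file present for job 1 is present for every job. Files missing from job 1 are skipped instead of being passed to `cat`.

```python
import os
import tempfile
from typing import List


def merge_job_outputs(logdir: str, outdir: str, nj: int, names: List[str]) -> List[str]:
    """Concatenate output.<i>/1best_recog/<name> over all jobs and sort the lines.

    A file is merged only if it exists for job 1, like the `if [ -f ... ]`
    guard. Returns the names that were actually written to outdir.
    """
    written = []
    for name in names:
        first = os.path.join(logdir, "output.1", "1best_recog", name)
        if not os.path.isfile(first):
            continue  # e.g. a file this decoder never produces
        lines: List[str] = []
        for i in range(1, nj + 1):
            path = os.path.join(logdir, f"output.{i}", "1best_recog", name)
            with open(path, encoding="utf-8") as f:
                lines.extend(f.read().splitlines())
        # sorted() on str compares code points, which should match the byte
        # order used by `LC_ALL=C sort` for UTF-8 text.
        with open(os.path.join(outdir, name), "w", encoding="utf-8") as f:
            f.writelines(line + "\n" for line in sorted(lines))
        written.append(name)
    return written


with tempfile.TemporaryDirectory() as root:
    logdir, outdir = os.path.join(root, "logdir"), os.path.join(root, "out")
    os.makedirs(outdir)
    # Two hypothetical jobs that each produced only a "text" file.
    for i, entry in ((1, "utt2 world"), (2, "utt1 hello")):
        job_dir = os.path.join(logdir, f"output.{i}", "1best_recog")
        os.makedirs(job_dir)
        with open(os.path.join(job_dir, "text"), "w", encoding="utf-8") as f:
            f.write(entry + "\n")
    written = merge_job_outputs(logdir, outdir, nj=2, names=["text", "token"])
    with open(os.path.join(outdir, "text"), encoding="utf-8") as f:
        merged = f.read()

print(written)  # -> ['text']
print(merged, end="")
```

Checking only job 1 keeps the check cheap. It relies on every job producing the same set of files, which holds when all jobs run the same decoder.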

ftshijt (Collaborator) commented Jul 3, 2023

Many thanks for the update!

These examples would be good to show somewhere. Have you considered where to put them? (One candidate is the README of the TTS/SVS templates, since it has a section specifically on ASR evaluation, but that location might not be general enough.)

sw005320 added the Enhancement and ASR (Automatic speech recognition) labels Jul 3, 2023
sw005320 added this to the v.202307 milestone Jul 3, 2023
mergify bot added the Documentation label Jul 3, 2023
codecov bot commented Jul 21, 2023

Codecov Report

Merging #5272 (2b42646) into master (f122c22) will not change coverage.
The diff coverage is n/a.

```diff
@@           Coverage Diff           @@
##           master    #5272   +/-   ##
=======================================
  Coverage   76.10%   76.10%           
=======================================
  Files         658      658           
  Lines       59156    59156           
=======================================
  Hits        45022    45022           
  Misses      14134    14134           
```
| Flag | Coverage Δ |
|---|---|
| test_integration_espnet1 | 65.96% <ø> (ø) |
| test_integration_espnet2 | 47.51% <ø> (-0.01%) ⬇️ |
| test_python | 66.49% <ø> (ø) |
| test_utils | 23.17% <ø> (ø) |

Flags with carried forward coverage won't be shown.


sw005320 merged commit a5ad6ff into espnet:master Jul 21, 2023
sw005320 (Contributor):

Thanks, @pyf98!


Labels: ASR (Automatic speech recognition), Documentation, Enhancement, ESPnet2


3 participants