Modify asr.sh #5020

Merged
mergify[bot] merged 1 commit into espnet:master from kamo-naoyuki:refactor
Mar 17, 2023

@kamo-naoyuki (Collaborator) commented Mar 16, 2023

This PR includes my self-answer to #4944.

  • I changed asr.sh to raise an error if --train_set is equal to --valid_set, or if --train_set is also included in --test_sets.
  • If --valid_set is included in --test_sets, the --eval_valid_set option is enabled.
  • If --eval_valid_set is enabled, dump/org/${valid_set} is evaluated at the decoding stage instead of dump/${valid_set}.
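
The checks in the list above can be sketched roughly as follows. This is a minimal illustration, not the code merged into asr.sh: the function name check_sets, its argument order, and the error messages are assumptions made for this example. It takes the three option values, fails on the two conflicting cases, and reports whether --eval_valid_set would be enabled.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the actual asr.sh code): validates the data-set options.
# Usage: check_sets <train_set> <valid_set> "<space-separated test_sets>"
# Prints "eval_valid_set=true" or "eval_valid_set=false"; returns 1 on a conflict.
check_sets() {
    local train_set="$1" valid_set="$2" test_sets="$3"
    local eval_valid_set=false dset

    if [ "${train_set}" = "${valid_set}" ]; then
        echo "Error: --train_set and --valid_set must be different" >&2
        return 1
    fi

    # test_sets is a space-separated list, so it is left unquoted on purpose.
    for dset in ${test_sets}; do
        if [ "${dset}" = "${train_set}" ]; then
            echo "Error: --train_set must not be included in --test_sets" >&2
            return 1
        fi
        if [ "${dset}" = "${valid_set}" ]; then
            # The validation set is also evaluated, so enable the flag.
            eval_valid_set=true
        fi
    done

    echo "eval_valid_set=${eval_valid_set}"
}
```

For example, calling check_sets train dev "dev test" would report that the flag is enabled, while check_sets train dev "train test" would fail.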

Some may still think we should filter the long/short utterances in the Python training process, but I ultimately concluded that it's a bad idea. Please let me go in this direction.

I also changed the behavior of the --skip_train option:

  • Before this PR: The data preparation for ${train_set} and ${valid_set} still runs. This is inconvenient when using a pre-trained model and only evaluation is required.
  • After this PR: The data preparation for ${train_set} and ${valid_set} can also be skipped.
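
The new behavior can be pictured as choosing which sets go through data preparation. The sketch below is only an illustration under assumptions: sets_to_prepare is a hypothetical function name, and the real script may select the sets differently (for instance, it may treat the validation set specially when --eval_valid_set is enabled).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the actual asr.sh code): lists the sets to prepare.
# Usage: sets_to_prepare <skip_train: true|false> <train_set> <valid_set> "<test_sets>"
sets_to_prepare() {
    local skip_train="$1" train_set="$2" valid_set="$3" test_sets="$4"

    if [ "${skip_train}" = true ]; then
        # Evaluation only: the training and validation sets are not needed.
        echo "${test_sets}"
    else
        echo "${train_set} ${valid_set} ${test_sets}"
    fi
}
```

With a pre-trained model, sets_to_prepare true train dev "test1 test2" lists only the test sets, so no training data has to be present on disk.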

Finally, I added a new feats type: --feats_type raw_copy. This is almost the same as --feats_type raw, but it can skip format_wav_scp.py.

This type is useful if the file format is already correct. Sometimes we create a new evaluation set from an existing dataset, e.g. by applying a speech enhancement method. In this case, the data may already be in the correct format, so we might want to skip format_wav_scp.py.

Please note that if a user specifies --feats_type raw_copy, the user is responsible for guaranteeing that the data format correctly follows our requirements.
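
The difference between the two types can be summarized as a single decision: whether format_wav_scp.py has to run. The following sketch assumes a hypothetical predicate named needs_format_wav_scp. It only covers the two types discussed here and says nothing about what asr.sh does with the data afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical predicate (not the actual asr.sh code).
# Returns 0 if format_wav_scp.py should run for the given feats_type,
# 1 if it can be skipped, and 2 for types this sketch does not cover.
needs_format_wav_scp() {
    case "$1" in
        raw)
            return 0 ;;   # audio is reformatted by format_wav_scp.py
        raw_copy)
            return 1 ;;   # the user guarantees the format is already correct
        *)
            echo "feats_type not covered by this sketch: $1" >&2
            return 2 ;;
    esac
}
```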

@mergify bot added the ESPnet2 and CI labels Mar 16, 2023

codecov bot commented Mar 16, 2023

Codecov Report

Merging #5020 (09ce640) into master (7964a2a) will increase coverage by 3.55%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #5020      +/-   ##
==========================================
+ Coverage   73.47%   77.02%   +3.55%     
==========================================
  Files         606      606              
  Lines       53748    53748              
==========================================
+ Hits        39491    41400    +1909     
+ Misses      14257    12348    -1909     
Flag                       Coverage Δ
test_integration_espnet1   66.29% <ø> (+<0.01%) ⬆️
test_integration_espnet2   47.96% <ø> (?)
test_python                66.85% <ø> (+0.02%) ⬆️
test_utils                 23.28% <ø> (ø)

Flags with carried forward coverage won't be shown.

see 61 files with indirect coverage changes

@kamo-naoyuki force-pushed the refactor branch 4 times, most recently from 0a92365 to 9830e79, March 17, 2023 02:28
@kamo-naoyuki added the auto-merge, ASR, and Refactoring labels Mar 17, 2023
@kamo-naoyuki force-pushed the refactor branch 4 times, most recently from 0fd3f61 to 08b1676, March 17, 2023 06:02
@kamo-naoyuki changed the title from "Refactoring asr.sh" to "Modify asr.sh" Mar 17, 2023
@kamo-naoyuki added the New Features label and removed the Refactoring label Mar 17, 2023
@mergify bot merged commit 496365d into espnet:master Mar 17, 2023
@kamo-naoyuki (Collaborator, Author)

@popcornell

Now we can use the same valid_set and test_set without modification. If ${valid_set} is included in ${test_sets}, it is simply replaced with org/${valid} in the evaluation stages.
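
As a rough sketch of that replacement, the directory to decode could be chosen as below. The function name resolve_eval_dir and its arguments are hypothetical and only illustrate the rule; the dump/org/${valid_set} and dump/${valid_set} paths come from the PR description.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the actual asr.sh code): picks the directory to decode.
# Usage: resolve_eval_dir <dumpdir> <dset> <valid_set> <eval_valid_set: true|false>
resolve_eval_dir() {
    local dumpdir="$1" dset="$2" valid_set="$3" eval_valid_set="$4"

    if [ "${eval_valid_set}" = true ] && [ "${dset}" = "${valid_set}" ]; then
        # Evaluate the copy of the validation set kept under org/.
        echo "${dumpdir}/org/${dset}"
    else
        echo "${dumpdir}/${dset}"
    fi
}
```

For instance, resolve_eval_dir dump dev dev true points at dump/org/dev, while any other test set keeps its usual dump/<name> location.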

As mentioned in the description above, I also modified the behaviour of --skip_train. If --skip_train true is specified, the data preparation for the train_set can be skipped. This is useful for the CHiME-7 challenge.

@kamo-naoyuki deleted the refactor branch March 17, 2023 09:40
@popcornell (Contributor)

This is great!
