Codecov Report
@@ Coverage Diff @@
## master #5020 +/- ##
==========================================
+ Coverage 73.47% 77.02% +3.55%
==========================================
Files 606 606
Lines 53748 53748
==========================================
+ Hits 39491 41400 +1909
+ Misses 14257 12348 -1909
Flags with carried forward coverage won't be shown. See 61 files with indirect coverage changes.
0a92365 to 9830e79
0fd3f61 to 08b1676
Now we can use the same … As mentioned in the comment above, I also modified the behaviour of …

This is great!
This PR includes my self-answer to #4944:

- I changed `asr.sh` to raise an error if `--train_set` is equal to `--valid_set`, or if `--train_set` is also included in `--test_sets`.
- If `--valid_set` is included in `--test_sets`, the `--eval_valid_set` option is enabled.
- If `--eval_valid_set` is enabled, `dump/org/${valid_set}` is evaluated at the decoding stage instead of `dump/${valid_set}`. The copy under `dump/org/` keeps the long/short utterances that are removed from `dump/${valid_set}`.

Someone might still think we should instead filter the long/short utterances in the Python training process, but in the end I concluded that it's a bad idea. Please let me go in this direction.
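The checks listed above can be sketched in shell, the language of `asr.sh`. This is a hypothetical sketch, not the code in this PR: `check_sets` and `decode_data_dir` are made-up helper names, `test_sets` is assumed to be a space-separated string, and the `dump/` prefix is taken literally from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the set checks described above; not the actual asr.sh code.
set -eu

# check_sets TRAIN_SET VALID_SET "TEST_SET1 TEST_SET2 ..."
# Fails if the training set collides with the validation or test sets;
# otherwise prints "true" when the validation set is also a test set
# (i.e. --eval_valid_set should be enabled) and "false" otherwise.
check_sets() {
    local train_set="$1" valid_set="$2" test_sets="$3" eval_valid_set=false dset

    if [ "${train_set}" = "${valid_set}" ]; then
        echo "Error: --train_set must differ from --valid_set" >&2
        return 1
    fi
    # test_sets is intentionally unquoted so that it splits on whitespace.
    for dset in ${test_sets}; do
        if [ "${dset}" = "${train_set}" ]; then
            echo "Error: --train_set must not be included in --test_sets" >&2
            return 1
        fi
        if [ "${dset}" = "${valid_set}" ]; then
            eval_valid_set=true
        fi
    done
    echo "${eval_valid_set}"
}

# decode_data_dir DSET VALID_SET EVAL_VALID_SET
# Prints the directory a set is decoded from. Paths follow the PR text
# (dump/org/...); the real recipe may use a different prefix.
decode_data_dir() {
    local dset="$1" valid_set="$2" eval_valid_set="$3"
    if "${eval_valid_set}" && [ "${dset}" = "${valid_set}" ]; then
        echo "dump/org/${dset}"   # copy before long/short filtering
    else
        echo "dump/${dset}"
    fi
}
```

For example, `check_sets train dev "dev test"` prints `true`, so `decode_data_dir dev dev true` returns `dump/org/dev`, while `test` is still decoded from `dump/test`.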
I also changed the behavior of the `--skip_train` option:

- Previously, even when `--skip_train true` was given, `${train_set}` and `${valid_set}` were still processed. This is inconvenient when using a pre-trained model and only evaluation is required.
- Now, `${train_set}` and `${valid_set}` can also be skipped.

Finally, I added a new feats type:
`--feats_type raw_copy`. This is almost the same as `--feats_type raw`, but it skips `format_wav_scp.py`. This type is useful if the file format is already correct. Sometimes we create a new evaluation set from an existing dataset, e.g. by applying a speech enhancement method. In this case, the data set may already be in the correct format, so we might want to skip `format_wav_scp.py`.

Please note that if a user specifies `--feats_type raw_copy`, the user is responsible for guaranteeing that the data format correctly follows our requirements.
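The two data-preparation changes (skipping `${train_set}` and `${valid_set}` under `--skip_train true`, and copying data as-is under `--feats_type raw_copy`) can be sketched together as below. This is a hypothetical POSIX-shell sketch, not the actual `asr.sh` code: `datasets_to_prepare` and `prepare_dataset` are made-up names, and the `raw` branch only prints a placeholder instead of calling `format_wav_scp.py`, whose arguments are not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the data-preparation loop; not the actual asr.sh code.
set -eu

# datasets_to_prepare SKIP_TRAIN TRAIN_SET VALID_SET "TEST_SET1 ..."
# Prints one set per line; train/valid sets are dropped when SKIP_TRAIN is true.
datasets_to_prepare() {
    local skip_train="$1" train_set="$2" valid_set="$3" test_sets="$4" dset
    if ! "${skip_train}"; then
        echo "${train_set}"
        echo "${valid_set}"
    fi
    for dset in ${test_sets}; do
        echo "${dset}"
    done
}

# prepare_dataset FEATS_TYPE SRC_DIR DST_DIR
prepare_dataset() {
    local feats_type="$1" src="$2" dst="$3"
    case "${feats_type}" in
        raw)
            # Placeholder: the real recipe re-writes the audio with format_wav_scp.py here.
            echo "format ${src} -> ${dst}"
            ;;
        raw_copy)
            # No format check: the data directory is copied as-is, so the user
            # must guarantee that it already meets the recipe's requirements.
            mkdir -p "${dst}"
            cp -R "${src}/." "${dst}/"
            echo "copy ${src} -> ${dst}"
            ;;
        *)
            echo "Error: unsupported --feats_type ${feats_type}" >&2
            return 1
            ;;
    esac
}
```

With `skip_train=true`, only the test sets are listed. With `raw_copy`, a directory such as the hypothetical `data/eval_enh` (e.g. an enhanced evaluation set) is copied byte for byte, which is why responsibility for the format moves to the user.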