Issue if train_set or valid_set are included in test sets#4944

Closed
kamo-naoyuki wants to merge 1 commit into espnet:master from kamo-naoyuki:filtering

Conversation

@kamo-naoyuki (Collaborator) commented Feb 17, 2023

Issue:

If a test_set is also used as the train_set or valid_set in asr.sh, the test set is modified by stage 4 (Remove long/short utt).

Modify:

- Current behaviour: stage 3 writes ${data_feats}/org/${dset}; stage 4 writes the filtered copy to ${data_feats}/${dset}
- In this PR: stage 3 writes ${data_feats}/${dset}; stage 4 writes the filtered copy to ${data_feats}/${dset}_flt

I only modified asr.sh in this PR, but all templates have the same problem (due to my original template script...)
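The directory flow described above can be sketched roughly as follows. This is a minimal illustration with made-up paths, and a toy keep-list stands in for the real long/short-utterance check; it is not the actual asr.sh code:

```shell
# Toy illustration of the stage 3 -> stage 4 flow in this PR:
# stage 3 writes ${data_feats}/${dset}, and stage 4 writes the filtered
# copy to ${data_feats}/${dset}_flt, leaving the original set untouched.
data_feats=dump/raw
dset=test_clean

# Stage 3 (simplified): the unfiltered set.
mkdir -p "${data_feats}/${dset}"
printf 'IDa a.wav\nIDb b.wav\n' > "${data_feats}/${dset}/wav.scp"

# Stage 4 (simplified): a keep-list stands in for the long/short-utt filter.
mkdir -p "${data_feats}/${dset}_flt"
printf 'IDa\n' > keep_ids.txt
awk 'NR==FNR{keep[$1]=1; next} $1 in keep' keep_ids.txt \
    "${data_feats}/${dset}/wav.scp" > "${data_feats}/${dset}_flt/wav.scp"
```

Because stage 4 writes to a separate _flt directory, a set that doubles as a test set is no longer overwritten by the filtering.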

@sw005320

@mergify mergify bot added the ESPnet2 label Feb 17, 2023
@codecov codecov bot commented Feb 17, 2023

Codecov Report

Merging #4944 (1835a92) into master (9c7bde4) will increase coverage by 0.01%.
The diff coverage is 86.88%.

@@            Coverage Diff             @@
##           master    #4944      +/-   ##
==========================================
+ Coverage   76.63%   76.65%   +0.01%     
==========================================
  Files         604      604              
  Lines       53934    53992      +58     
==========================================
+ Hits        41334    41385      +51     
- Misses      12600    12607       +7     
Flag                       Coverage          Δ
test_integration_espnet1   66.33% <ø>        (ø)
test_integration_espnet2   47.42% <49.18%>   (+<0.01%) ⬆️
test_python                66.57% <81.96%>   (+0.01%) ⬆️
test_utils                 23.35% <ø>        (ø)

Flags with carried forward coverage won't be shown.

Impacted Files                                    Coverage           Δ
espnet2/samplers/build_batch_sampler.py           92.85% <ø>         (ø)
espnet2/train/iterable_dataset.py                 84.67% <75.00%>    (-0.80%) ⬇️
espnet2/tasks/abs_task.py                         75.90% <85.71%>    (+0.22%) ⬆️
espnet2/samplers/sorted_batch_sampler.py          87.50% <87.50%>    (ø)
espnet2/samplers/unsorted_batch_sampler.py        83.33% <87.50%>    (+0.83%) ⬆️
espnet2/samplers/folded_batch_sampler.py          85.55% <88.88%>    (+0.37%) ⬆️
espnet2/samplers/length_batch_sampler.py          87.80% <88.88%>    (+0.13%) ⬆️
espnet2/samplers/num_elements_batch_sampler.py    87.64% <88.88%>    (+0.14%) ⬆️
espnet2/main_funcs/collect_stats.py               90.90% <100.00%>   (+0.28%) ⬆️


@mergify mergify bot added the CI Travis, Circle CI, etc label Feb 17, 2023
@kamo-naoyuki kamo-naoyuki force-pushed the filtering branch 5 times, most recently from 03fd1f1 to 6c5d459 Compare February 19, 2023 05:37
@kamo-naoyuki (Collaborator, Author) commented

I changed my mind.

I implemented --filtered_train_key_text and --filtered_valid_key_text for espnet2/bin/*_train.py.

Given a text file containing the IDs to be filtered, the samples specified by this option are excluded from training.

e.g.

  • wav.scp:
    IDa a.wav
    IDb b.wav
    IDc b.wav
  • filtered_key.txt:
    IDb

In this case, IDb is excluded, and IDa and IDc remain for the training.

I also changed asr.sh to create filtered_key.txt at stage 4 instead of creating a new dataset.
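The exclusion described above can be sketched with a small awk one-liner over the example files from this comment. This is only an illustration of the semantics; it is not the actual espnet2 implementation behind --filtered_train_key_text:

```shell
# Recreate the example files from the comment above.
printf 'IDa a.wav\nIDb b.wav\nIDc b.wav\n' > wav.scp
printf 'IDb\n' > filtered_key.txt

# Read the IDs to exclude from filtered_key.txt (first file), then print
# only the wav.scp entries whose utterance ID is NOT in that set.
awk 'NR==FNR{skip[$1]=1; next} !($1 in skip)' filtered_key.txt wav.scp
# -> IDa a.wav
#    IDc b.wav
```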

@kamo-naoyuki (Collaborator, Author) commented

I changed my mind again.

Filtering short/long utterances via an option of the Python tool is the cleaner way from a recipe standpoint, but it could add some overhead at startup.

Creating another dataset is a dirtier approach, but it is actually more efficient for training speed.

I'll think about it.

@kamo-naoyuki kamo-naoyuki mentioned this pull request Mar 16, 2023

Labels: CI Travis, Circle CI, etc; ESPnet2
