
simpleoier · 2023-07-06T21:26:43Z

Update

Update hubert.sh to support downloading pretrained checkpoints.
Update the data loading part of the SSL feature readers, related to asr2 and hubert.
- The ESPnet data iterator is now supported for the batch loader and the audio reader.
- Using the batch loader can reduce the time needed for feature dumping and pseudo-label generation.

codecov · 2023-07-06T22:05:00Z

Codecov Report

Merging #5285 (33dd0f1) into master (0971f17) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #5285   +/-   ##
=======================================
  Coverage   76.10%   76.10%           
=======================================
  Files         658      658           
  Lines       59156    59156           
=======================================
  Hits        45022    45022           
  Misses      14134    14134

Flag	Coverage Δ
test_integration_espnet1	`65.96% <ø> (ø)`
test_integration_espnet2	`47.52% <ø> (ø)`
test_python	`66.49% <ø> (ø)`
test_utils	`23.17% <ø> (ø)`

ftshijt · 2023-07-08T20:23:05Z

egs2/TEMPLATE/asr1/pyscripts/feats/ssl_feature_utils.py

+    sampler = NumElementsBatchSampler(
+        batch_bins=batch_bins,
+        shape_files=[utt2num_samples],
+    )
+    batches = list(sampler)
+    iterator = SequenceIterFactory(
+        dataset=dataset,
+        batches=batches,
+        collate_fn=CommonCollateFn(float_pad_value=0.0, int_pad_value=-1),
+        num_workers=2,
+    ).build_iter(0)


Is there a specific reason for using the numel sampler and sequence iterator? I thought the streaming iterator would be enough (it is also easier to trace, as it does not go through the sampler).

simpleoier · 2023-07-09

Not really. I just think it would be too complicated to support multiple choices of iterators and samplers, so I picked one somewhat arbitrarily. I am not sure which one is good enough, but in the end we may find one that covers most of the use cases, which may be enough, right?

simpleoier · 2023-07-10

The reason for using the numel sampler and sequence iterator instead of the streaming iterator is that the utterances are sorted, so less computation is wasted on padding. Another reason for using the numel sampler is that it makes OOM errors easier to avoid than other samplers do, e.g. the sorted sampler.

ftshijt · 2023-07-10

That's fair, thanks for the clarification.
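To make the padding argument above concrete, here is a minimal pure-Python sketch of length-sorted batching under an element budget. It is a simplified stand-in, not ESPnet's actual `NumElementsBatchSampler`: the function names `numel_batches` and `padded_size`, the utterance IDs, and the sample counts are all hypothetical, and the budget rule (batch size times longest utterance must stay within `batch_bins`) is an assumption chosen for illustration.

```python
from typing import Dict, List


def numel_batches(lengths: Dict[str, int], batch_bins: int) -> List[List[str]]:
    """Group utterance IDs so that batch_size * max_length <= batch_bins.

    Hypothetical helper for illustration only. An utterance that is longer
    than batch_bins on its own still gets a batch of size one.
    """
    # Sort by length so that utterances in the same batch have similar lengths.
    keys = sorted(lengths, key=lambda k: lengths[k])
    batches: List[List[str]] = []
    current: List[str] = []
    for key in keys:
        # Keys are ascending, so `key` would be the longest item in the batch.
        if current and (len(current) + 1) * lengths[key] > batch_bins:
            batches.append(current)
            current = []
        current.append(key)
    if current:
        batches.append(current)
    return batches


def padded_size(batch: List[str], lengths: Dict[str, int]) -> int:
    """Number of elements after padding every item to the longest one."""
    return len(batch) * max(lengths[k] for k in batch)


# Made-up utterance lengths in samples (what a utt2num_samples file would list).
lengths = {"utt1": 16000, "utt2": 48000, "utt3": 17000, "utt4": 50000}
batch_bins = 100000

sorted_batches = numel_batches(lengths, batch_bins)
# Baseline: fixed batch size of 2 in file order, with no sorting.
file_order = list(lengths)
unsorted_batches = [file_order[i:i + 2] for i in range(0, len(file_order), 2)]

real = sum(lengths.values())
sorted_waste = sum(padded_size(b, lengths) for b in sorted_batches) - real
unsorted_waste = sum(padded_size(b, lengths) for b in unsorted_batches) - real
print(sorted_batches, sorted_waste, unsorted_waste)
```

In this toy setup the sorted, budgeted batches pad far fewer samples than the file-order batches, and no batch exceeds `batch_bins`, which is the property that makes memory use easier to bound.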

sw005320 · 2023-07-20T16:03:54Z

Do these new parts go through the test?

simpleoier · 2023-07-20T16:05:17Z

Yeah, it will be tested in test_integration_espnet2.sh

sw005320 · 2023-07-20T16:06:14Z

OK, sounds good.
After we fix the CI issue (sorry about it), I'll merge it.

mergify bot added the ESPnet2 label Jul 6, 2023

simpleoier mentioned this pull request Jul 6, 2023

A few minor fixes for SSL #5265

Merged

simpleoier requested a review from ftshijt July 7, 2023 01:21

ftshijt reviewed Jul 8, 2023

1. update data iterator for SSL feature kmeans. 2. update download pretrained hubert checkpoints. Related to hubert and asr2 recipes

5f53930

simpleoier force-pushed the hubert branch from 7f27ce9 to 5f53930 Compare July 12, 2023 19:39

[pre-commit.ci] auto fixes from pre-commit.com hooks

8b33a5a

for more information, see https://pre-commit.ci

sw005320 added the Enhancement, auto-merge, and SSL (self-supervised learning) labels Jul 20, 2023

sw005320 added this to the v.202307 milestone Jul 20, 2023

simpleoier mentioned this pull request Jul 21, 2023

CoVoST2 ASR2 recipe and new ST2 recipe #5318

Closed

12 tasks

Merge branch 'master' into hubert

33dd0f1

mergify bot merged commit 0140a5f into espnet:master Jul 21, 2023

A few updates for asr2 and hubert #5285

mergify[bot] merged 3 commits into espnet:master from simpleoier:hubert
