A few updates for asr2 and hubert #5285

Merged
mergify[bot] merged 3 commits into espnet:master from simpleoier:hubert
Jul 21, 2023

Conversation

@simpleoier
Collaborator

@simpleoier simpleoier commented Jul 6, 2023

Update

  1. Update hubert.sh to support downloading pretrained checkpoints.
  2. Update the data loading part of the SSL feature readers used by asr2 and hubert.
    • The ESPnet data iterator is now supported for both the batch loader and the audio reader.
    • Using the batch loader can reduce the time it takes for feature dumping and pseudo-label generation.
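
As a rough illustration of why a batch loader can speed up feature dumping: several variable-length waveforms are padded to a common length so that one forward pass of the SSL model can cover all of them, and the true lengths are kept so the padded part can be discarded afterwards. The sketch below is not the code from this PR; `pad_batch` and the toy waveforms are hypothetical, and only the pad value 0.0 mirrors the `float_pad_value=0.0` seen in the diff.

```python
from typing import List, Tuple


def pad_batch(
    waveforms: List[List[float]], pad_value: float = 0.0
) -> Tuple[List[List[float]], List[int]]:
    """Right-pad every waveform to the longest one and return the true lengths."""
    lengths = [len(w) for w in waveforms]
    max_len = max(lengths)
    # Each row gets (max_len - len(w)) copies of pad_value appended.
    padded = [w + [pad_value] * (max_len - len(w)) for w in waveforms]
    return padded, lengths


# Hypothetical toy "waveforms" of different lengths.
padded, lengths = pad_batch([[0.1, 0.2, 0.3], [0.5], [0.7, 0.8]])
```

The returned `lengths` are what a downstream reader would need to trim features computed over padded samples.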

@mergify mergify bot added the ESPnet2 label Jul 6, 2023
@codecov

codecov bot commented Jul 6, 2023

Codecov Report

Merging #5285 (33dd0f1) into master (0971f17) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #5285   +/-   ##
=======================================
  Coverage   76.10%   76.10%           
=======================================
  Files         658      658           
  Lines       59156    59156           
=======================================
  Hits        45022    45022           
  Misses      14134    14134           
Flag                       Coverage Δ
test_integration_espnet1   65.96% <ø> (ø)
test_integration_espnet2   47.52% <ø> (ø)
test_python                66.49% <ø> (ø)
test_utils                 23.17% <ø> (ø)

@simpleoier simpleoier requested a review from ftshijt July 7, 2023 01:21
Comment on lines +54 to +64
sampler = NumElementsBatchSampler(
    batch_bins=batch_bins,
    shape_files=[utt2num_samples],
)
batches = list(sampler)
iterator = SequenceIterFactory(
    dataset=dataset,
    batches=batches,
    collate_fn=CommonCollateFn(float_pad_value=0.0, int_pad_value=-1),
    num_workers=2,
).build_iter(0)
Collaborator

Is there a specific reason for using the numel sampler and sequence iterator? I thought a streaming iterator would be enough (it is also easier to trace, as it does not go through the sampler).

Collaborator Author

Not really. I just think it is too complicated to define multiple choices of iterators / samplers. So I randomly chose one. Not sure which one is good enough. But in the end, we may find one that covers most of the use cases, which may be enough, right?

Collaborator Author

The reason for using the numel sampler and sequence iterator instead of the streaming iterator is that the utterances are sorted, so less computation is wasted on padding. Another reason for using the numel sampler is that it makes OOM errors easier to avoid than other samplers do, e.g. the sorted sampler.

Collaborator

That's fair, thanks for the clarification.
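
The padding and memory argument in this thread can be sketched in plain Python. This is a simplified stand-in, not ESPnet's `NumElementsBatchSampler`: the function names, the ascending sort, and the utterance lengths below are all assumptions made for illustration. Utterances are sorted by length and grouped so that each padded batch (batch size times longest utterance) stays within `batch_bins` elements, which both caps per-batch memory and keeps similar lengths together.

```python
from typing import Dict, List


def numel_batches(utt2len: Dict[str, int], batch_bins: int) -> List[List[str]]:
    """Group length-sorted utterances so each padded batch fits in batch_bins."""
    batches: List[List[str]] = []
    current: List[str] = []
    # Ascending sort: the utterance being added is always the longest so far.
    for utt, length in sorted(utt2len.items(), key=lambda kv: kv[1]):
        # Padded size if we add this utterance: (batch size) x (its length).
        if current and (len(current) + 1) * length > batch_bins:
            batches.append(current)
            current = []
        current.append(utt)
    if current:
        batches.append(current)
    return batches


def padded_elements(batch: List[str], utt2len: Dict[str, int]) -> int:
    """Count padding elements when the batch is padded to its longest item."""
    longest = max(utt2len[u] for u in batch)
    return sum(longest - utt2len[u] for u in batch)


# Hypothetical utterance lengths in samples (illustrative values only).
utt2len = {"a": 100, "b": 900, "c": 120, "d": 880, "e": 110, "f": 910}

sorted_batches = numel_batches(utt2len, batch_bins=3000)
sorted_waste = sum(padded_elements(b, utt2len) for b in sorted_batches)

# Baseline: fixed-size batches of 3 in file order, without sorting.
keys = list(utt2len)
unsorted_batches = [keys[i:i + 3] for i in range(0, len(keys), 3)]
unsorted_waste = sum(padded_elements(b, utt2len) for b in unsorted_batches)
```

On this toy input the sorted, element-bounded batches pad 70 elements in total, while the unsorted fixed-size batches pad 2410, which is the kind of wasted computation the author refers to.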

…etrained hubert checkpoints. Related to hubert and asr2 recipes
@sw005320
Contributor

Do these new parts go through the test?

@simpleoier
Collaborator Author

Yeah, it will be tested in test_integration_espnet2.sh

@sw005320
Contributor

OK, sounds good.
After we fix the CI issue (sorry about it), I'll merge it.

@sw005320 sw005320 added the Enhancement, auto-merge, and SSL labels Jul 20, 2023
@sw005320 sw005320 added this to the v.202307 milestone Jul 20, 2023
@mergify mergify bot merged commit 0140a5f into espnet:master Jul 21, 2023

3 participants