
Emrys365 · 2023-05-03T23:12:04Z

This PR mainly updates the implementation of TD-SpeakerBeam for target speaker extraction:

It now also supports a speaker embedding as an auxiliary input.
A pre-mask activation is added by default to make training easier.

sw005320 · 2023-05-03T23:51:44Z

Can you add a result and model link to README.md?

Emrys365 · 2023-05-04T00:09:29Z

OK. After I finish the training, I will do that.

simpleoier

Thanks! I left some comments.

simpleoier · 2023-05-04T03:29:23Z

egs2/librimix/tse1/conf/tuning/train_enh_tse_td_speakerbeam_4gpu_max.yaml

 batch_size: 16
 iterator_type: chunk
-chunk_length: 24000
+chunk_length: 48000


Do you think it may be useful to mention the sampling rate used for this parameter?

Yes, this is true. I think I'd better mention the sample rate in the file name.
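To illustrate why the sampling rate matters for this parameter, here is a minimal sketch that converts `chunk_length` (in samples) into seconds. The thread does not state the sampling rate used by this config; the 8 kHz and 16 kHz values below are assumptions chosen only because LibriMix is available at both rates, and `chunk_seconds` is a hypothetical helper, not part of ESPnet.

```python
def chunk_seconds(chunk_length: int, sample_rate_hz: int) -> float:
    """Duration in seconds of a chunk of `chunk_length` samples."""
    return chunk_length / sample_rate_hz


# Assumed candidate sampling rates; the actual rate is not given in the thread.
for sample_rate_hz in (8000, 16000):
    for chunk_length in (24000, 48000):
        seconds = chunk_seconds(chunk_length, sample_rate_hz)
        print(f"chunk_length={chunk_length} @ {sample_rate_hz} Hz -> {seconds} s")
```

The same `chunk_length: 48000` means 6 s of audio at 8 kHz but only 3 s at 16 kHz, so the value is ambiguous unless the rate is stated somewhere.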

simpleoier · 2023-05-04T03:29:30Z

egs2/librimix/tse1/conf/tuning/train_enh_tse_td_speakerbeam_4gpu_max.yaml


 train_spk2enroll: data/train-100/spk2enroll.json
-enroll_segment: 24000
+enroll_segment: 48000


simpleoier · 2023-05-04T03:29:41Z

egs2/librimix/tse1/conf/tuning/train_enh_tse_td_speakerbeam_4gpu_max.yaml

    channel: 256
-    kernel_size: 16
-    stride: 8
+    kernel_size: 32


simpleoier · 2023-05-04T03:36:38Z

espnet2/enh/extractor/td_speakerbeam_extractor.py

+            elif input_aux.size(-2) == 1:
+                aux_feature = input_aux.moveaxis(-2, -1)
+        else:
+            aux_feature = aux_feature.transpose(1, 2)  # B, N, L'


Suggested change

-            aux_feature = aux_feature.transpose(1, 2)  # B, N, L'
+            assert aux_feature.dim() == 3
+            aux_feature = aux_feature.transpose(1, 2)  # B, N, L'

simpleoier · 2023-05-04T03:37:42Z

espnet2/enh/extractor/td_speakerbeam_extractor.py

-        aux_feature = aux_feature.transpose(1, 2)  # B, N, L'
+        if self.use_spk_emb:
+            # B, N, L'=1
+            if input_aux.dim() == 2:


Is this expected to use input_aux instead of aux_feature here and after?

Oh I think it should be aux_feature, although usually input_aux is equivalent here.

simpleoier · 2023-05-04T03:40:02Z

espnet2/enh/extractor/td_speakerbeam_extractor.py


        feature = feature.transpose(1, 2)  # B, N, L
-        aux_feature = aux_feature.transpose(1, 2)  # B, N, L'
+        if self.use_spk_emb:


It's a bit difficult to understand in which cases use_spk_emb=True without prior knowledge of SpeakerBeam. Can you add an introductory comment here or above?

I will add a comment above:

    # NOTE(wangyou): When `self.use_spk_emb` is True, `aux_feature` is assumed to be
    # a speaker embedding; otherwise, it is assumed to be an enrollment audio.
    if self.use_spk_emb:
        ...
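The two cases can be sketched at the level of shapes only, without a tensor library. This is an illustration under assumptions, not the code in `td_speakerbeam_extractor.py`: `normalize_aux_shape` is a hypothetical helper, and the handling of a 2-D embedding (appending a trailing axis) is inferred from the `# B, N, L'=1` comment in the diff rather than shown in it.

```python
def normalize_aux_shape(shape, use_spk_emb):
    """Return the shape of the auxiliary input after reordering to (B, N, L')."""
    shape = tuple(shape)
    if use_spk_emb:
        # Speaker embedding: the target layout is (B, N, 1).
        if len(shape) == 2:
            # (B, N) -> (B, N, 1); assumed behaviour, not visible in the diff.
            return shape + (1,)
        if len(shape) == 3 and shape[-2] == 1:
            # (B, 1, N) -> (B, N, 1), mirroring moveaxis(-2, -1).
            return (shape[0], shape[2], shape[1])
        # Otherwise assume the input is already (B, N, 1).
        return shape
    # Enrollment audio features: (B, L', N) -> (B, N, L'), mirroring transpose(1, 2).
    assert len(shape) == 3, "enrollment features are expected to be 3-D"
    return (shape[0], shape[2], shape[1])


print(normalize_aux_shape((4, 256), use_spk_emb=True))
print(normalize_aux_shape((4, 1, 256), use_spk_emb=True))
print(normalize_aux_shape((4, 100, 256), use_spk_emb=False))
```

In both branches the result has the feature dimension N on axis 1, which is the layout the `# B, N, L'` comments in the diff refer to.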

simpleoier · 2023-05-04T03:47:06Z

espnet2/enh/layers/tcn.py

-            layer_norm, bottleneck_conv1x1, temporal_conv_net, mask_conv1x1
-        )
+        if pre_mask_nonlinear == "linear":
+            self.network = nn.Sequential(


A minor question: if the sub-modules of self.network are used separately in forward() (e.g. the bottleneck, TCN, and mask net are called individually), what is the benefit of defining them in a Sequential()? I find the naming of the sub-networks in forward() a bit ambiguous.

This is just for backward compatibility with Conv-TasNet: the previous TCN implementation also used Sequential for speech separation, where it can be called as a whole. For TD-SpeakerBeam, however, we cannot call the Sequential module directly because of the input/output mismatch between sub-modules.
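A toy sketch of the trade-off described above, using plain Python callables instead of real layers. Everything here is hypothetical: the `Sequential` class is a stand-in that only mimics the call-as-a-whole and index-by-position behaviour of `torch.nn.Sequential`, and the scalar "layers" and the `speaker_scale` argument are made up to model a sub-module that needs an extra input.

```python
class Sequential:
    """Toy stand-in for torch.nn.Sequential: an ordered, indexable chain of callables."""

    def __init__(self, *layers):
        self.layers = list(layers)

    def __getitem__(self, index):
        return self.layers[index]

    def __call__(self, x):
        # Each layer receives exactly one input: the previous layer's output.
        for layer in self.layers:
            x = layer(x)
        return x


# Hypothetical scalar "layers"; the real sub-modules operate on tensors.
def layer_norm(x):
    return x


def bottleneck(x):
    return 2 * x


def tcn(x, speaker_scale=1.0):
    # The extra argument models conditioning on the enrollment speaker.
    return x * speaker_scale


def mask_conv(x):
    return x + 1


network = Sequential(layer_norm, bottleneck, tcn, mask_conv)

# Separation-style use: the chain is called as a whole.
whole = network(3.0)


# Extraction-style use: one sub-module needs a second input, so the
# sub-modules are fetched by index and called one by one.
def forward_with_speaker(x, speaker_scale):
    x = network[0](x)
    x = network[1](x)
    x = network[2](x, speaker_scale)
    return network[3](x)


split = forward_with_speaker(3.0, 0.5)
print(whole, split)
```

Keeping the container lets older separation code keep calling it as a whole, at the cost of positional indexing in the extraction path, which is the naming ambiguity raised in the question.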

simpleoier

LGTM!

sw005320 · 2023-05-04T13:48:58Z

Hmm
https://github.com/espnet/espnet/actions/runs/4879274957/jobs/8705692129?pr=5155#step:8:35
Not sure. Should we wait for the fairseq to fix this?

Emrys365 · 2023-05-04T14:57:27Z


It seems this issue was reported on Jan 17, but it has not been fixed yet.

We should probably pin numpy<=1.23.3 until fairseq is updated.

codecov · 2023-05-11T14:36:45Z

Codecov Report

Merging #5155 (a93775c) into master (84f3bde) will decrease coverage by 0.01%.
The diff coverage is 80.95%.

@@            Coverage Diff             @@
##           master    #5155      +/-   ##
==========================================
- Coverage   74.99%   74.99%   -0.01%     
==========================================
  Files         618      618              
  Lines       55588    55603      +15     
==========================================
+ Hits        41689    41700      +11     
- Misses      13899    13903       +4

Flag	Coverage Δ
test_integration_espnet1	`66.28% <ø> (ø)`
test_integration_espnet2	`47.60% <61.90%> (-0.01%)`	⬇️
test_python	`65.45% <80.95%> (+<0.01%)`	⬆️
test_utils	`23.28% <ø> (ø)`

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
espnet2/enh/extractor/td_speakerbeam_extractor.py	`90.24% <73.33%> (-9.76%)`	⬇️
espnet2/enh/layers/tcn.py	`95.63% <100.00%> (+0.06%)`	⬆️


Emrys365 · 2023-06-05T19:34:59Z

The model has also been uploaded to HuggingFace: https://huggingface.co/espnet/Wangyou_Zhang_librimix_train_enh_tse_td_speakerbeam_raw

Update TD-SpeakerBeam; fix minor formatting issues

02b9cc7

Emrys365 added Recipe ESPnet2 SE Speech enhancement labels May 3, 2023

Emrys365 mentioned this pull request May 3, 2023

when I run egs2/librimix/tse1/run.sh, the loss=0.000e+00 all the time #5065

Closed

sw005320 modified the milestones: v.202303, v.202307 May 3, 2023

sw005320 requested a review from simpleoier May 3, 2023 23:51

Remove unused code from data preparation in egs2/librimix/tse1

b2c77e5

simpleoier reviewed May 4, 2023


Reflect comments

06c5ded

simpleoier approved these changes May 4, 2023


Merge branch 'master' of github.com:espnet/espnet into tse

3c2940a

sw005320 added the auto-merge Enable auto-merge label May 15, 2023

Merge branch 'master' into tse

a93775c

mergify bot merged commit 6e35c14 into espnet:master May 15, 2023
