Add diffusion-base SE model to ESPnet-SE#5572
Conversation
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
| from espnet2.layers.inversible_interface import InversibleInterface | ||
| from espnet.nets.pytorch_backend.nets_utils import make_pad_mask | ||
|
|
||
| is_torch_1_10_plus = V(torch.__version__) >= V("1.10.0") |
There was a problem hiding this comment.
You don't need to add it because current ESPnet already dropped support for PyTorch versions before 1.12.1.
There was a problem hiding this comment.
See line 93 of espnet2/layers/stft.py. Add this check to use native support for FFT and STFT on all CPU targets including ARM.
Done. |
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
Co-authored-by: Wangyou Zhang <[email protected]>
Co-authored-by: Wangyou Zhang <[email protected]>
|
Hi, @sw005320 , I also asked @popcornell to help review this PR. |
sw005320
left a comment
There was a problem hiding this comment.
I have some high-level comments.
There was a problem hiding this comment.
It seems that a lot of lines are not tested, according to Codecov. Can you double-check it?
There was a problem hiding this comment.
It may be ok as this code is taken mostly as it is from sgmse
There was a problem hiding this comment.
The files in ncsnpp_utils are taken from the sgmse repo. Some of them are not used, and thus not tested. Should I add tests to those unused code lines or just remove them? @sw005320 .
There was a problem hiding this comment.
I see.
The discussions are whether we can keep the unused functions or not.
- If we will use them in the future, we can add tests
- If not, maybe, we can keep them as they are or we can remove them.
There was a problem hiding this comment.
I added unit tests for the unused NCSNpp functions as much as I could.
There was a problem hiding this comment.
It seems that a lot of lines are not tested, according to Codecov. Can you double-check it?
There was a problem hiding this comment.
It seems that a lot of lines are not tested, according to Codecov. Can you double-check it?
| spec_factor: float = 0.15, | ||
| spec_abs_exponent: float = 0.5, |
There was a problem hiding this comment.
Where do these values come from?
Can you explain them?
There was a problem hiding this comment.
I think they come largely from SGMSE https://arxiv.org/pdf/2203.17004.pdf but for spec factor they used 1/3 there
There was a problem hiding this comment.
if the waveform is standard dev normalized to 1, using 1/3 as spec factor bounds the STFT max value to +- 1.5.
I think using 0.15 bounds instead to -+ 0.75 which may actually be better.
Do you normalize the waveform std @LiChenda ?
There was a problem hiding this comment.
diffusion is super sensitive to input min max range
There was a problem hiding this comment.
@sw005320 , these numbers come from section V.D in [1]. I added comments to the code. @popcornell , in their journal paper [1], they use 0.15 and 0.5. I did not normalize the waveform std.
[1] J. Richter, S. Welker, J.-M. Lemercier, B. Lay, and T. Gerkmann, “Speech Enhancement and Dereverberation With Diffusion-Based Generative Models,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2351–2364, 2023.
There was a problem hiding this comment.
Do you normalize by the .abs().amax() to be always within -1 and 1 ?
https://github.com/sp-uhh/sgmse/blob/c6e3291ee56b07792c9d8c7d7d49487b3042e01b/sgmse/data_module.py#L72
There was a problem hiding this comment.
I added the normalize option but didn't use it in my model training by default. Because I feel it is not reasonable for causal models.
espnet/espnet2/enh/diffusion_enh.py
Line 113 in 12c0eb2
for more information, see https://pre-commit.ci
|
LGTM. |
Hi, @sw005320 , I added unit tests for the unused NCSNpp functions as much as I could. Please merge it when the CI test is passed. |
What?
Why?
Extend ESPnet-SE to support diffusion-based generative enhancement models.
Others