SE function updates: new models and support for handling various sampling frequencies #5800
mergify[bot] merged 14 commits into espnet:master
…mprove espnet2/bin/enh_inference.py and espnet2/bin/enh_scoring.py to support various sampling rates; improve ChunkIterator to support keeping short samples and truncating samples to a max length; Add bandwidth_limitation in espnet2/layers/augmentation.py; Update espnet2/enh/espnet_model.py to support handling different sampling rates in one model
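The chunking behaviour mentioned in this commit message (keeping short samples and truncating samples to a maximum length) can be pictured with a minimal pure-Python sketch. It is not the actual `ChunkIterator` code; the function name and the `keep_short` and `max_length` parameters are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def split_into_chunks(
    samples: Sequence[float],
    chunk_length: int,
    keep_short: bool = True,
    max_length: int = 0,
) -> List[List[float]]:
    """Split one utterance into fixed-length chunks (illustrative only)."""
    samples = list(samples)
    # Truncate over-long utterances first (0 is assumed to mean "no limit").
    if max_length > 0:
        samples = samples[:max_length]
    # A sample shorter than one chunk is either kept whole or dropped.
    if len(samples) < chunk_length:
        return [samples] if keep_short and samples else []
    # Otherwise emit non-overlapping chunks and drop the ragged tail.
    n_chunks = len(samples) // chunk_length
    return [
        samples[i * chunk_length:(i + 1) * chunk_length]
        for i in range(n_chunks)
    ]
```

With `keep_short=False` a 3-sample input and `chunk_length=4` yields no chunks, whereas the default keeps the short sample as a single item.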
for more information, see https://pre-commit.ci
Cool! @LiChenda and @kohei0209, can you review this PR?
Sure!
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5800      +/-   ##
==========================================
- Coverage   54.59%   52.59%   -2.01%
==========================================
  Files         771      775       +4
  Lines       70732    71155     +423
==========================================
- Hits        38616    37422    -1194
- Misses      32116    33733    +1617
Flags with carried forward coverage won't be shown.
ref_ = ref[i]
inf_ = inf[int(perm[i])]
elif sample_rate > 16000:
    mode = "wb"
We'd better add some log or warning here.
It has been added in the lines below.
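The branch under discussion can be sketched as follows, assuming the metric being configured is PESQ (the `"wb"` mode suggests this, but the excerpt does not state it) and that audio above 16 kHz is resampled to 16 kHz before scoring. The function name and the returned evaluation rate are hypothetical; this is not the code from `enh_scoring.py`.

```python
import warnings
from typing import Tuple


def select_pesq_mode(sample_rate: int) -> Tuple[str, int]:
    """Return (mode, evaluation_rate) for an input sampling rate."""
    if sample_rate == 8000:
        return "nb", 8000  # narrowband
    if sample_rate == 16000:
        return "wb", 16000  # wideband
    if sample_rate > 16000:
        # Assumption: higher rates are scored in wideband mode after
        # resampling, and the user is told about it via a warning.
        warnings.warn(
            f"Sampling rate {sample_rate} Hz is above 16 kHz; "
            "audio is assumed to be resampled to 16 kHz for scoring."
        )
        return "wb", 16000
    raise ValueError(f"Unsupported sampling rate: {sample_rate}")
```

Emitting the warning in the same branch keeps the silent change of evaluation rate visible in the logs, which is the concern raised in the review comment.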
Args:
    fs (int): new sampling rate
"""
assert fs % self.default_fs == 0 or self.default_fs % fs == 0
Why was this assertion removed?
Removing it allows processing speech whose sampling rate is an arbitrary fraction of the encoder's default sampling rate. The rate doesn't have to be exactly 1/n of the default or an integer multiple of it, so I removed this line.
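To make the effect of the removed assertion concrete, the sketch below replicates its condition and shows one way a window length could be rescaled for a rate that the old check rejected. The helper names, the 48 kHz default, and the rounding rule are assumptions for illustration, not the encoder's actual logic.

```python
def passes_old_check(fs: int, default_fs: int) -> bool:
    """Replicate the condition of the removed assertion."""
    return fs % default_fs == 0 or default_fs % fs == 0


def scaled_length(default_length: int, fs: int, default_fs: int) -> int:
    """Hypothetical helper: rescale a length in samples to a new rate
    so that its duration in seconds stays (approximately) constant."""
    return round(default_length * fs / default_fs)


if __name__ == "__main__":
    default_fs = 48000  # assumed default rate for this example
    for fs in (16000, 24000, 32000, 44100):
        print(fs, passes_old_check(fs, default_fs),
              scaled_length(1536, fs, default_fs))
```

With a 48 kHz default, 16 kHz and 24 kHz satisfy the old condition, while 32 kHz and 44.1 kHz do not, even though both still map to a usable window length.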
Args:
    fs (int): new sampling rate
"""  # noqa: H405
assert fs % self.default_fs == 0 or self.default_fs % fs == 0
Why was this assertion removed?
Removing it allows processing speech whose sampling rate is an arbitrary fraction of the encoder's default sampling rate. The rate doesn't have to be exactly 1/n of the default or an integer multiple of it, so I removed this line.
What was the reason for adding a new tcn_separator2.py instead of updating the functions in tcn_separator.py? Is it necessary?
I have merged them!
OK for me.
@Emrys365, we have an issue in the CI: https://github.com/espnet/espnet/actions/runs/9500003127/job/26182089849?pr=5800
OK! It should be fixed now.
@Emrys365 Sorry for the late response. The code looks good to me!
What?
This PR updates the speech enhancement functions in the following aspects:
- Improve espnet2/bin/enh_inference.py and espnet2/bin/enh_scoring.py to support various sampling rates
- Improve espnet2/iterators/chunk_iter_factory.py to support keeping short samples and truncating samples to a max length
- Add bandwidth_limitation in espnet2/layers/augmentation.py
- Add always_forward_in_48k in espnet2/enh/espnet_model.py to support handling different sampling rates in a single SE model
- Update l1_timedomain+magspec_loss in espnet2/enh/loss/criterions/time_domain.py to be more numerically stable.
Why?
These updates provide important functions for the forthcoming URGENT Challenge.
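Judging only by its name, `always_forward_in_48k` lets one SE model serve inputs at several sampling rates by running the model at 48 kHz. A minimal sketch of that idea follows, assuming the input is upsampled to 48 kHz, enhanced, and resampled back to its original rate. The functions here are hypothetical, and the naive linear-interpolation resampler (no anti-aliasing filter) stands in for whatever resampling ESPnet actually uses.

```python
from typing import Callable, List


def resample_linear(x: List[float], fs_in: int, fs_out: int) -> List[float]:
    """Naive linear-interpolation resampler (illustration only)."""
    if fs_in == fs_out or not x:
        return list(x)
    n_out = round(len(x) * fs_out / fs_in)
    out = []
    for j in range(n_out):
        pos = j * fs_in / fs_out  # fractional position in the input
        i = min(int(pos), len(x) - 1)
        nxt = min(i + 1, len(x) - 1)
        frac = pos - i
        out.append((1.0 - frac) * x[i] + frac * x[nxt])
    return out


def forward_in_48k(
    model: Callable[[List[float]], List[float]],
    x: List[float],
    fs: int,
) -> List[float]:
    """Run `model` at 48 kHz and return output at the original rate."""
    upsampled = resample_linear(x, fs, 48000)
    enhanced = model(upsampled)
    return resample_linear(enhanced, 48000, fs)
```

In this sketch the model only ever sees 48 kHz audio, so a single set of parameters covers every input rate; the cost is extra computation for low-rate inputs.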
See also