ESPnet-codec Training and Setup #5732
Conversation
Jungjee left a comment
Thanks @ftshijt for your cool work!
LGTM in general.
Two comments from me (maybe close to questions):
- Having a `gan_codec` task sounds like there would be other `*_codec` tasks upcoming. How many tasks do you expect? Is it GAN vs. GAN-irrelevant?
- Could we have a folder where we could see all sorts of modules used for the codec tasks (`gan_codec` and others)? I initially suggested `espnet2/codec/layer`, but after reviewing everything, maybe we could put all fundamental modules (those that could be re-usable) into `espnet2/codec/shared`? The idea comes from my assumption that some modules could be used across different `*_codec` tasks. But maybe this is not the optimal choice.
```yaml
##########################################################
#                 OTHER TRAINING SETTING                 #
##########################################################
```
[question] Is this commenting style a common thing in ESPnet? I just got curious.
It is applied to TTS-related configs, and I found it very clear, so I tried to follow it.
(I have always intended to establish a consistent style for docstrings and configs in ESPnet but have not found the time yet...)
```python
    return x[..., padding_left:end]


class NormConvTranspose1d(nn.Module):
```
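For context, the quoted `return x[..., padding_left:end]` line is the tail of an unpadding helper. A minimal self-contained sketch of such a helper, with the name `unpad1d` and the `(padding_left, padding_right)` convention assumed from Encodec-style SEANet code (the actual ESPnet function may differ):

```python
import torch


def unpad1d(x: torch.Tensor, paddings: tuple) -> torch.Tensor:
    """Remove (padding_left, padding_right) samples from the last axis.

    Hypothetical sketch of the helper the quoted slice belongs to.
    """
    padding_left, padding_right = paddings
    # Slice off the right padding first by computing the end index,
    # then drop the left padding with the start index.
    end = x.shape[-1] - padding_right
    return x[..., padding_left:end]
```

For example, `unpad1d(torch.arange(10), (2, 3))` keeps only the middle five samples.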
Maybe these layers could be gathered into a separate folder, such as `espnet2/codec/layers`?
Thanks for the suggestion! I generally agree with your point on better organizing the modules, but I'm not sure these layers should be separated out, mostly because they do not go beyond the original torch functions and are just wrappers for the seanet module itself.
Please let me know if you think `layers` would still be better in this case~
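To illustrate why these are described as thin wrappers: in Encodec-style codec code, `NormConvTranspose1d` typically just applies weight normalization to the stock torch layer. A minimal sketch under that assumption (the actual ESPnet module may add norm-type selection and causality handling):

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm


class NormConvTranspose1d(nn.Module):
    """ConvTranspose1d with weight normalization applied.

    Hypothetical sketch: a thin wrapper around the stock torch layer,
    which is why keeping it next to the seanet module is defensible.
    """

    def __init__(self, *args, **kwargs):
        super().__init__()
        self.convtr = weight_norm(nn.ConvTranspose1d(*args, **kwargs))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.convtr(x)
```

Because the wrapper only reparameterizes the weights, its input/output shapes match the underlying `nn.ConvTranspose1d`.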
```python
from einops import rearrange


class ModReLU(nn.Module):
```
Ditto. In my view, it would be easier for others to find this layer if it lived in a `layers` folder.
Similar to the above, I expect these existing modules to be tightly attached to their major components (here, the STFT discriminator). In that case, I'm leaning toward keeping them here instead of introducing a separate set of layers.
If we do make them separate, do you have suggestions on how the layers folder should be organized? It would be great if you could share a bit more of your thoughts on this, so I can better understand how it could be put in better shape. Again, I appreciate your advice!
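As background on the layer being discussed: ModReLU is the standard activation for complex-valued inputs, `ReLU(|z| + b) * z / |z|`, which thresholds the magnitude while preserving the phase, hence its use near an STFT discriminator. A minimal sketch with an assumed scalar bias (the PR's version may use a per-channel bias and `einops.rearrange`):

```python
import torch
import torch.nn as nn


class ModReLU(nn.Module):
    """ModReLU for complex tensors: ReLU(|z| + b) * z / |z|.

    Sketch only: a single learnable bias b, with a small epsilon
    guarding the division by the magnitude.
    """

    def __init__(self):
        super().__init__()
        self.b = nn.Parameter(torch.tensor(0.0))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        mag = z.abs()  # real-valued magnitude of the complex input
        # Threshold the magnitude, then rescale z so its phase is kept.
        return torch.relu(mag + self.b) * z / mag.clamp_min(1e-8)
```

For example, with `b = -1` and `z = 3 + 4j` (magnitude 5), the output magnitude shrinks to 4 while the phase is unchanged.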
For 1, I'm following the setting of tts and svs. Using the name is intended to state that the task is trained with the gan_trainer instead of the existing normal trainer used in many other tasks. I think we will only support the gan_codec task for codec training for now. For 2, yeah, I tried to put some shared modules to ...
Some comments after discussion with @ftshijt. Please let me know if I'm wrong.
My concern is that DAC actually modifies some loss and quantizer details from SoundStream. It might be clearer for DAC to have a separate ...
for more information, see https://pre-commit.ci
Considering the following PR (decoding/evaluation), I will merge the current one first. Let's keep discussing better code organization.
What?
New ESPnet codec project
The initial PR (to the local codec branch) supports:
References
- https://github.com/alibaba-damo-academy/FunCodec/tree/master
- https://github.com/facebookresearch/audiocraft
- https://github.com/facebookresearch/encodec
- https://github.com/yangdongchao/AcademiCodec
TODO in the following PRs: