ESPnet-codec Training and Setup #5732

Merged
ftshijt merged 42 commits into espnet:codec from ftshijt:codec
Apr 19, 2024

Conversation


@ftshijt (Collaborator) commented Apr 4, 2024

What?

New ESPnet codec project

The initial PR (to the local codec branch) supports:

  • The library set up of the codec task
  • The recipe setup for mini_an4 (for debugging and future CI support) and the LibriTTS recipe
  • A base training framework with a SoundStream model
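The core of a SoundStream-style codec is an encoder → residual vector quantizer (RVQ) → decoder pipeline. As a rough illustration of the RVQ idea only, here is a minimal scalar sketch: each stage quantizes the residual left over by the previous stage. Real implementations use learned codebooks of embedding vectors; the function names and codebooks below are hypothetical, not ESPnet's API.

```python
# Minimal scalar sketch of residual vector quantization (RVQ).
# Illustrative only: real codecs quantize embedding vectors with
# learned codebooks, not scalars with hand-picked ones.

def rvq_encode(x, codebooks):
    """Quantize x in stages; each codebook encodes the residual left
    over by the previous stage. Returns the chosen code indices."""
    indices = []
    residual = x
    for codebook in codebooks:
        # Pick the codeword closest to the current residual.
        idx = min(range(len(codebook)), key=lambda i: abs(codebook[i] - residual))
        indices.append(idx)
        residual -= codebook[idx]
    return indices

def rvq_decode(indices, codebooks):
    """Reconstruct by summing the selected codeword from each stage."""
    return sum(cb[i] for cb, i in zip(codebooks, indices))

codebooks = [
    [-1.0, 0.0, 1.0],                  # coarse stage
    [-0.5, -0.25, 0.0, 0.25, 0.5],     # finer stage
    [-0.2, 0.0, 0.2],                  # finest stage
]
codes = rvq_encode(0.8, codebooks)     # -> [2, 1, 1]
approx = rvq_decode(codes, codebooks)  # -> 0.75
```

Each extra stage refines the approximation, which is what lets such codecs trade bitrate (number of stages used) against reconstruction quality.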

References

https://github.com/alibaba-damo-academy/FunCodec/tree/master
https://github.com/facebookresearch/audiocraft
https://github.com/facebookresearch/encodec
https://github.com/yangdongchao/AcademiCodec

TODO in the following PRs:

  • Decoding
  • Evaluation
  • Deployment
  • Add additional models


ftshijt commented Apr 4, 2024

@jctian98 @Jungjee @wyh2000 Hi guys, may I request your review of this PR? Since it involves a few framework-level design choices, it would be difficult to change them at a later stage.

@Jungjee (Contributor) left a comment


Thanks @ftshijt for your cool work!
LGTM in general.

Two comments from me (maybe close to questions):

  1. Having a gan_codec task sounds like there will be other *_codec tasks upcoming. How many tasks do you expect? Is the split GAN vs. non-GAN?
  2. Could we have a folder where we could see all the modules used for the codec tasks (gan_codec and others)? I initially suggested espnet2/codec/layer, but after reviewing everything, maybe we could put all fundamental (reusable) modules into espnet2/codec/shared? The idea comes from my assumption that some modules could be used across different *_codec tasks. But maybe this is not the optimal choice.

Comment on lines +131 to +133

```yaml
##########################################################
#                 OTHER TRAINING SETTING                 #
##########################################################
```
Contributor:

[question] is this commenting style a common thing in ESPnet? I just got curious.

Collaborator Author (ftshijt):

It is applied to TTS-related config, and I found it very clear so I tried to follow it.

(I have always intended to make the style of docstrings and configs in ESPnet consistent but have not found the time yet...)

```python
    return x[..., padding_left:end]


class NormConvTranspose1d(nn.Module):
```
Contributor:

Maybe these layers could be gathered into a separate folder, such as espnet2/codec/layers?

Collaborator Author (ftshijt):

Thanks for the suggestion! I generally agree with your point about better organizing the modules, but I'm not sure these particular layers should be separated out, mostly because they do not go beyond the original torch functions; they are just wrappers for the seanet module itself.

Collaborator Author (ftshijt):

Please let me know if you still think layers would be better in this case~

```python
from einops import rearrange


class ModReLU(nn.Module):
```
Contributor:

Ditto. To me, this layer would be easier for others to find if it were in a layers folder.

Collaborator Author (ftshijt):

Similar to the above, I expect these modules to be tightly coupled to the major component (the STFT discriminator). In that case, I'm leaning toward keeping them here instead of creating another set of layers.

If we do separate them, do you have suggestions on how the layers folder should be organized? It would be great if you could share a bit more of your thoughts here so that I can understand how it could be put into better shape. Again, I appreciate your advice!


ftshijt commented Apr 12, 2024

> Thanks @ftshijt for your cool work! LGTM in general.
>
> Two comments from me (maybe close to questions):
>
>   1. Having a gan_codec task sounds like there will be other *_codec tasks upcoming. How many tasks do you expect? Is the split GAN vs. non-GAN?
>   2. Could we have a folder where we could see all the modules used for the codec tasks (gan_codec and others)? I initially suggested espnet2/codec/layer, but after reviewing everything, maybe we could put all fundamental (reusable) modules into espnet2/codec/shared? The idea comes from my assumption that some modules could be used across different *_codec tasks. But maybe this is not the optimal choice.

For 1, I'm following the convention from tts and svs. The name is meant to indicate that the task is trained with the gan_trainer instead of the existing normal trainer used in many other tasks. I think we will only support the gan_codec task for codec training for now.

For 2, yeah, I tried to put some shared modules into the shared folder.
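What distinguishes the gan_trainer from the normal trainer is the alternating two-optimizer loop. As a rough, purely numerical sketch (not ESPnet's actual trainer API; all names below are illustrative), each iteration first updates the discriminator against real data and then updates the generator against the discriminator:

```python
# Toy numeric sketch of alternating GAN updates. A "generator" scalar g
# tries to imitate real data (value 1.0), while a "discriminator" scalar d
# tracks its own estimate of what real data looks like. This only
# illustrates the two-optimizer alternation a GAN trainer runs.

def train_gan(steps, lr=0.5):
    g, d = 0.0, 0.0   # generator output level, discriminator's "real" estimate
    real = 1.0
    for _ in range(steps):
        # Discriminator step: move d toward the real-data statistic.
        d += lr * (real - d)
        # Generator step: move g toward what the discriminator deems real.
        g += lr * (d - g)
    return g, d

g, d = train_gan(20)   # both converge close to 1.0
```

In the real task both players are neural networks with their own optimizers and schedulers, which is exactly the extra machinery the gan_trainer provides over the normal trainer.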

@jctian98 (Collaborator)

Some comments after a discussion with @ftshijt. Please let me know if I'm wrong.
(1) Since DAC / SoundStream / EnCodec share a very similar structure (SEANet, quantizer, and the overall encoder-decoder-discriminator design), it might be better to have a unified file (the current soundstream.py) for all three rather than separate espnet2/gan_codec/{soundstream, encodec, dac} folders. Instead, we can use if/else to specify the distinctions among these models.
(2) We can also make the losses categorized and more configurable. Specifically, the losses can be categorized (my current thinking) into adversarial loss / reconstruction loss / a loss that measures the similarity between generator (decoder) outputs and real audio in the discriminator's feature space.
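The categorized-loss proposal in (2) could look something like the sketch below: named loss categories combined with configurable weights, with unknown names rejected so config typos fail loudly. The function name, category names, and weights are hypothetical, not ESPnet's actual config schema.

```python
# Sketch of configurable, categorized codec losses: adversarial,
# reconstruction, and a discriminator-feature similarity term, combined
# as a weighted sum. Names and weights are illustrative only.

def combine_codec_losses(losses, weights):
    """Weighted sum over named loss categories; reject unknown names."""
    total = 0.0
    for name, value in losses.items():
        if name not in weights:
            raise KeyError(f"unconfigured loss category: {name}")
        total += weights[name] * value
    return total

weights = {"adversarial": 1.0, "reconstruction": 45.0, "feature_matching": 2.0}
losses = {"adversarial": 0.3, "reconstruction": 0.02, "feature_matching": 0.1}
total = combine_codec_losses(losses, weights)   # -> 1.4
```

Keeping the weights in the training config, rather than hard-coded per model, would let the same categories cover SoundStream-like and DAC-like variants.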


wyh2000 commented Apr 17, 2024

> Some comments after a discussion with @ftshijt. Please let me know if I'm wrong. (1) Since DAC / SoundStream / EnCodec share a very similar structure (SEANet, quantizer, and the overall encoder-decoder-discriminator design), it might be better to have a unified file (the current soundstream.py) for all three rather than separate espnet2/gan_codec/{soundstream, encodec, dac} folders. Instead, we can use if/else to specify the distinctions among these models. (2) We can also make the losses categorized and more configurable. Specifically, the losses can be categorized (my current thinking) into adversarial loss / reconstruction loss / a loss that measures the similarity between generator (decoder) outputs and real audio in the discriminator's feature space.

My concern is that DAC actually modifies some loss and quantizer details from SoundStream. It might be clearer for DAC to have a separate dac.py file.


ftshijt commented Apr 19, 2024

Considering the following PRs (decoding/evaluation), I will merge the current one first. Let's keep discussing better code organization.

@ftshijt added the auto-merge (Enable auto-merge) label Apr 19, 2024
@ftshijt merged commit 755c4d7 into espnet:codec Apr 19, 2024
4 participants