[GAN SVS] Add VISinger2, UHifiGAN, Avocodo #5123

Merged
ftshijt merged 65 commits into espnet:master from jerryuhoo:uhifigan
May 23, 2023

@jerryuhoo
Contributor

jerryuhoo commented Apr 17, 2023

@ftshijt @A-Quarter-Mile
This is an update for VISinger that adds multiple modules.

  • Update VISinger 1 generator
    • Add an option to use phoneme predictor or not
    • Add an option to use flow or not
  • Add VISinger 2 generator
  • Add VISinger 2 vocoder generator (DDSP)
  • Add VISinger 2 vocoder discriminator (+MFD)
  • Add Avocodo
  • Add UHifiGAN
  • Add yin feature (For Pits)
    • Pits
  • Unit tests for all those changes
  • Fix F0 bug for SVS
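
For context on the yin feature in the list above: YIN-style features are built on the cumulative mean normalized difference function of an audio frame. The following is a minimal pure-Python sketch of that textbook computation, not the code added in espnet2/tts/feats_extract/yin.py. The function names, the threshold of 0.1, and the test signal are all illustrative assumptions.

```python
import math


def difference_function(frame, max_lag):
    """Squared difference between the frame and its lagged copy, per lag."""
    window = len(frame) - max_lag  # number of samples compared at every lag
    return [
        sum((frame[j] - frame[j + lag]) ** 2 for j in range(window))
        for lag in range(max_lag + 1)
    ]


def cmndf(diff):
    """Cumulative mean normalized difference; lag 0 is defined as 1."""
    out = [1.0]
    running_sum = 0.0
    for lag in range(1, len(diff)):
        running_sum += diff[lag]
        out.append(diff[lag] * lag / running_sum if running_sum > 0 else 1.0)
    return out


def estimate_period(frame, max_lag, threshold=0.1):
    """First lag below the threshold whose value does not drop at the next lag."""
    d = cmndf(difference_function(frame, max_lag))
    for lag in range(2, max_lag):
        if d[lag] < threshold and d[lag] <= d[lag + 1]:
            return lag
    # Fallback: no lag passed the threshold, so take the global minimum.
    return min(range(1, max_lag + 1), key=lambda lag: d[lag])


# Hypothetical test signal: a sine wave with a period of 20 samples.
frame = [math.sin(2 * math.pi * n / 20) for n in range(200)]
period = estimate_period(frame, max_lag=50)
```

Dividing a sample rate by `period` would give a pitch estimate in Hz; a real feature extractor would also need framing and interpolation, which this sketch leaves out.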

jerryuhoo and others added 30 commits March 18, 2023 17:30
Remove relu to avoid gradient vanishing.
Combine melody information in pitch predictor.
This is not a bug, but an improvement in VISinger2, I will add it later.
note that there's a bug when changing downsample parameters.
This change is for both gan_tts and gan_svs
@ftshijt
Collaborator

ftshijt commented May 11, 2023

Hi @jerryuhoo could you let me know when you have finished the development? Then I can also help to fix the CI issue in the import test for you.

@jerryuhoo
Contributor Author

> Hi @jerryuhoo could you let me know when you have finished the development? Then I can also help to fix the CI issue in the import test for you.

The listed models and functions are done, but I'm still investigating the performance gap. It is caused by either the posterior encoder or the vocoder, but I have not found the bug yet.

@ftshijt
Collaborator

ftshijt commented May 18, 2023

Sorry that I did not find time to fix the CI. Let's discuss the details in today's meeting.

@jerryuhoo
Contributor Author

Some code can be improved, such as the ddsp module (part of the ddsp code is unused) and the modules in MFD. For example, in visinger2_vocoder.py, we could use LogMelFbank instead of TorchSTFT. However, LogMelFbank does not support domain="double", which considers both linear and log fbanks.
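
To illustrate the gap described above, here is a minimal sketch of what a `domain` switch could look like on top of filterbank energies. It assumes that "double" simply concatenates the linear and log features, which is only one reading of "considers both linear and log fbanks". The helper `apply_domain` and its `eps` floor are hypothetical and are not part of LogMelFbank or TorchSTFT.

```python
import math


def apply_domain(fbank, domain="double", eps=1e-10):
    """Map linear filterbank energies to the requested feature domain.

    Assumption: "double" returns the linear values followed by their logs.
    """
    linear = [float(v) for v in fbank]
    # Clamp before the log so that zero energies do not produce -inf.
    log = [math.log(max(v, eps)) for v in linear]
    if domain == "linear":
        return linear
    if domain == "log":
        return log
    if domain == "double":
        return linear + log
    raise ValueError(f"unknown domain: {domain}")


features = apply_domain([1.0, 10.0, 100.0], domain="double")
```

Under this assumption the "double" output is twice as long as the input, so any downstream loss or discriminator would need to expect the doubled feature dimension.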

@ftshijt
Collaborator

ftshijt commented May 22, 2023

> Some code can be improved such as ddsp module (some part of the ddsp code is not used) and modules in MFD. For example, in visinger2_vocoder.py, maybe we can use LogMelFbank instead of TorchSTFT. But LogMelFbank doesn't have a feature of domain="double", which considers both linear and log fbanks.

You may add TODOs to the codebase. As this PR contains an important bug fix, I believe we can merge it quickly once the CI is fixed.

@ftshijt
Collaborator

ftshijt commented May 22, 2023

I've fixed the import test and a few comment errors for you.

@ftshijt
Collaborator

ftshijt commented May 22, 2023

Last request for this PR: can we fix the CI tests for the imported functions (https://github.com/espnet/espnet/actions/runs/5043122127/jobs/9044514638)? After that, I can merge it.

@codecov

codecov bot commented May 23, 2023

Codecov Report

Merging #5123 (912cfe9) into master (d1074ce) will increase coverage by 0.00%.
The diff coverage is 73.88%.

@@            Coverage Diff            @@
##           master    #5123     +/-   ##
=========================================
  Coverage   74.99%   75.00%             
=========================================
  Files         618      630     +12     
  Lines       55603    56816   +1213     
=========================================
+ Hits        41700    42614    +914     
- Misses      13903    14202    +299     
Flag                       Coverage Δ
test_integration_espnet1   66.28% <ø> (ø)
test_integration_espnet2   47.61% <29.41%> (+<0.01%) ⬆️
test_python                65.66% <73.88%> (+0.20%) ⬆️
test_utils                 23.28% <ø> (ø)

Flags with carried forward coverage won't be shown.

Impacted Files                              Coverage Δ
espnet2/bin/svs_inference.py                0.00% <0.00%> (ø)
espnet2/gan_svs/espnet_model.py             0.00% <0.00%> (ø)
espnet2/gan_svs/vits/phoneme_predictor.py   100.00% <ø> (ø)
espnet2/svs/espnet_model.py                 7.01% <0.00%> (-0.56%) ⬇️
espnet2/tasks/gan_svs.py                    0.00% <0.00%> (ø)
espnet2/train/preprocessor.py               29.16% <0.00%> (ø)
espnet2/tts/feats_extract/yin.py            0.00% <0.00%> (ø)
espnet2/tts/feats_extract/ying.py           0.00% <0.00%> (ø)
espnet2/gan_svs/visinger2/ddsp.py           28.44% <28.44%> (ø)
espnet2/gan_svs/uhifigan/uhifigan.py        68.53% <68.53%> (ø)
... and 18 more

... and 7 files with indirect coverage changes
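
The headline numbers in the coverage diff above can be re-derived from the hit and line counts. The sketch below assumes the percentage is truncated to two decimals rather than rounded, which is consistent with the 74.99% shown for master. The helper name is illustrative and is not part of Codecov.

```python
def coverage_percent(hits, lines):
    """Coverage as a percentage, truncated (not rounded) to two decimals."""
    return (hits * 10000 // lines) / 100


# Figures copied from the coverage diff: (hits, misses, lines).
master = (41700, 13903, 55603)
pr = (42614, 14202, 56816)

for hits, misses, lines in (master, pr):
    # Every tracked line is either hit or missed.
    assert hits + misses == lines

master_cov = coverage_percent(master[0], master[2])  # 74.99
pr_cov = coverage_percent(pr[0], pr[2])  # 75.0
added_lines = pr[2] - master[2]  # 1213
added_hits = pr[0] - master[0]  # 914
```

Rounding instead of truncating would give 75.00% for master (41700 / 55603 is roughly 74.996%), which would not match the report.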

ftshijt merged commit 09a7e49 into espnet:master on May 23, 2023

def _apply_spectral_norm(m: torch.nn.Module):
    if isinstance(m, torch.nn.Conv2d):
    if isinstance(m, torch.nn.Conv1d):
Collaborator

Hi @jerryuhoo, could you double-check issue #5215?
