Description
Hi, there, just read about your fantastic paper and tried your open source model. The results seem quite promising! Congrats!
I've seen a zero-shot VC speaker-similarity score of 0.78, which is close to or even better than fine-tuned versions of previous systems. Now I am trying to fine-tune on my own data. Based on some short experiments, fine-tuning only the DiT module does not seem sufficient to push the similarity score above 0.8.
I suspect this may be because I haven't fine-tuned the discriminator or the vocos decoder. However, it seems the discriminator weights are not public yet, is that true? Did I miss something? It would be awesome if you could share those weights.
BTW, what similarity threshold would you recommend for selecting a proper prompt utterance of the same speaker? Currently I am using 0.8. Any suggestions?
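For context, here is a minimal sketch of the prompt-selection step I have in mind: filter candidate prompt utterances by cosine similarity between speaker embeddings. The `embed()` step is hypothetical (any speaker-verification encoder would do); only the thresholding logic is shown.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_prompts(target_emb: np.ndarray,
                   candidate_embs: list,
                   threshold: float = 0.8) -> list:
    """Return indices of candidate prompts whose embedding similarity
    to the target speaker embedding meets the threshold."""
    return [i for i, e in enumerate(candidate_embs)
            if cosine_sim(target_emb, e) >= threshold]
```

So a candidate utterance would only be used as a prompt if its speaker embedding scores at least 0.8 against the target speaker's reference embedding.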
Many thanks!