The README says the spectrogram must be in [N,C,W] format:
spectrogram = # get your hands on a spectrogram in [N,C,W] format
Could you please explain what these three dimensions mean?
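For reference, here is a minimal sketch of how I currently read that layout. It assumes N is the batch size, C is the number of mel channels, and W is the number of frames; that reading is my assumption, not something the README states. The helper `to_ncw` is a hypothetical name of my own, and the zero-filled array is only a placeholder with the same shape as my spectrogram.

```python
import numpy as np


def to_ncw(spec: np.ndarray) -> np.ndarray:
    """Give a spectrogram an explicit leading batch axis.

    Assumed layout (not confirmed by the README):
      N = batch size, C = mel channels, W = time frames.
    """
    if spec.ndim == 2:
        # (C, W) -> (1, C, W): a batch holding a single spectrogram
        return spec[np.newaxis, ...]
    if spec.ndim == 3:
        # Already (N, C, W); leave it untouched
        return spec
    raise ValueError(f"expected a 2-D or 3-D array, got {spec.ndim} dimensions")


# Placeholder data with the same shape as the spectrogram in this issue
spec = np.zeros((80, 314), dtype=np.float32)
batched = to_ncw(spec)
print(batched.shape)  # (1, 80, 314)
```

If this reading is right, my (80, 314) array would correspond to C=80 and W=314, with N=1 once the batch axis is added.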
I use the code from this repo: https://github.com/CorentinJ/Real-Time-Voice-Cloning to produce the mel spectrogram, and DiffWave as the vocoder. However, the generated audio is nothing but noise. Here is what I run:
import torch
import torchaudio
from diffwave.inference import predict as diffwave_predict

# Generate the mel spectrogram (synthesizer, texts and embeds come from Real-Time-Voice-Cloning)
specs = synthesizer.synthesize_spectrograms(texts, embeds)  # len(specs) == 1
spec = specs[0]  # numpy.ndarray, float32, shape (80, 314)
spec = torch.tensor(spec)

# Generate the waveform
diffwave_dir = "/hdd/haoran_project/diffwave-master/pretrained_models/diffwave-ljspeech-22kHz-1000578.pt"
generated_wav, sample_rate = diffwave_predict(spec, diffwave_dir, fast_sampling=True)

# Save it to disk
filename = "results/diffwave_Elon.wav"
print(generated_wav.dtype, " ", generated_wav.shape)  # torch.float32 torch.Size([1, 87040])
torchaudio.save(filename, generated_wav.cpu(), sample_rate=sample_rate)