Open
Description
Fine-tuning works as expected, but regular training on a dataset other than LJSpeech eventually produces a NaN loss.
The culprit appears to be the following line, which divides by zero (0/0, yielding NaN) if wav happens to contain perfect silence:
Line 106 in 374a456
wav = flip * gain * wav / wav.abs().max()
I'm not sure what the best solution would be. As a quick fix, I simply clipped the divisor so it can't reach zero:
wav = flip * gain * wav / max([wav.abs().max(), 0.001])
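For reference, here is a minimal standalone sketch of the same idea, assuming wav is a PyTorch tensor. The function name normalize_peak and the eps parameter are hypothetical and not part of the repository; the 0.001 floor is just the value from the quick fix above, not a tuned choice.

```python
import torch


def normalize_peak(wav: torch.Tensor, gain: float = 1.0, flip: float = 1.0,
                   eps: float = 1e-3) -> torch.Tensor:
    """Scale wav by its peak magnitude, guarding against silent input.

    Hypothetical helper: `eps` is an assumed floor for the divisor.
    """
    # Clamp the peak from below so a perfectly silent clip divides by eps, not 0.
    peak = wav.abs().max().clamp(min=eps)
    return flip * gain * wav / peak


silent = torch.zeros(16000)

# Unguarded version: 0 / 0 produces NaN in every sample.
print(torch.isnan(silent / silent.abs().max()).any())

# Guarded version: silence stays silence, no NaN.
print(torch.isnan(normalize_peak(silent)).any())
```

One trade-off of this approach: clips whose peak is below eps are no longer scaled up to the full gain, so very quiet (but not silent) audio stays quiet. Skipping silent clips in the data loader might be an alternative.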