Hello,
I've encountered an issue where the loss becomes NaN after several hundred ticks when I enable mixed-precision training with the --fp16=True flag. I also noticed this line in the code:
loss_scaling=1, # Loss scaling factor for reducing FP16 under/overflows.
I'm wondering whether I should also adjust the --ls loss-scaling setting when using --fp16=True. Could you advise what value loss scaling should be set to under these conditions?
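To make sure I'm asking about the right thing, here is my rough mental model of static loss scaling. This is just a generic PyTorch-style sketch, not this repository's code, and LOSS_SCALE is only an illustrative value:

```python
import torch

# Hypothetical static loss scaling: scale the loss up before backward so that
# small FP16 gradients don't underflow, then unscale the gradients before the
# optimizer step. The "right" scale value depends on the model and data.
LOSS_SCALE = 2 ** 7  # illustrative value only

model = torch.nn.Linear(16, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

x, y = torch.randn(8, 16), torch.randn(8, 1)
loss = torch.nn.functional.mse_loss(model(x), y)

(loss * LOSS_SCALE).backward()      # gradients are now LOSS_SCALE times too large
for p in model.parameters():
    if p.grad is not None:
        p.grad.div_(LOSS_SCALE)     # undo the scaling before the optimizer step
opt.step()
```

Is that roughly what the loss_scaling parameter controls here, and if so, what value would you recommend instead of 1?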
Additionally, are there other settings that should be configured to make mixed-precision training stable? For example, should the learning rate be adjusted together with loss scaling?
If possible, could you share the commands or configuration that you typically use for mixed-precision training?