
Loss Turns to NaN After Several Hundred Ticks When Using Mixed-Precision Training #12

@aiihn

Description


Hello,
I've encountered an issue where the loss turns to NaN after several hundred ticks when I enable mixed-precision training using the --fp16=True flag. And I noticed this line in the code:

loss_scaling = 1, # Loss scaling factor for reducing FP16 under/overflows.

I'm wondering if I should also adjust the --ls setting for loss scaling in conjunction with the --fp16=True flag. Could you advise what value the loss scaling should be set to under these conditions?
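In case I'm misreading how the scaling is applied, here is a rough sketch of my understanding of static loss scaling (illustrative only, not this repo's actual code; `model`, `opt`, and the value of `loss_scale` are placeholders):

```python
import torch

loss_scale = 1024.0  # hypothetical value; the repo's default appears to be 1

def training_step(model, opt, images, labels):
    opt.zero_grad(set_to_none=True)
    loss = model(images, labels)          # assuming the model returns a scalar loss
    (loss * loss_scale).backward()        # scale up so small FP16 gradients don't flush to zero
    for p in model.parameters():          # unscale before the optimizer step so the
        if p.grad is not None:            # learning rate keeps its usual meaning
            p.grad.div_(loss_scale)
    opt.step()
    return loss.detach()
```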

Additionally, are there other settings that should be configured to make mixed-precision training stable? For example, should the learning rate be adjusted together with the loss scaling factor?
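For comparison, the standard PyTorch recipe I've seen uses dynamic loss scaling via torch.cuda.amp, so neither the scale nor the learning rate needs hand-tuning; I'm unsure whether something equivalent applies here. A minimal sketch with placeholder names:

```python
import torch

scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling

def training_step(model, opt, images, labels):
    opt.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        loss = model(images, labels)
    scaler.scale(loss).backward()   # scale the loss, backprop in FP16
    scaler.step(opt)                # unscales gradients, skips the step on inf/NaN
    scaler.update()                 # adjusts the scale factor automatically
    return loss.detach()
```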

If possible, could you share the commands or configuration that you typically use for mixed-precision training?

Thank you very much in advance!
