Implement Automatic Mixed Precision with GradScaler to Address NaN Loss Issues #13

aiihn · 2024-08-26T04:00:16Z

Description

This pull request addresses the issue of NaN losses occurring during mixed-precision training with --fp16 enabled (#12).

Key Changes

Integrated torch.cuda.amp.GradScaler to dynamically adjust loss scaling.
Replaced the manual loss scaling approach. Note: GradScaler will override the loss_scale set manually by --ls.
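For readers unfamiliar with dynamic loss scaling, here is a minimal pure-Python sketch of the idea. It is a toy model, not the PyTorch implementation: the class name `DynamicLossScaler` and its `update` method are hypothetical, and only the default values (initial scale 2**16, growth factor 2.0, backoff factor 0.5, growth interval 2000) follow the documented defaults of torch.cuda.amp.GradScaler.

```python
import math


class DynamicLossScaler:
    """Toy model of dynamic loss scaling (hypothetical, not PyTorch code)."""

    def __init__(self, init_scale=2.0 ** 16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0  # consecutive steps without inf/NaN gradients

    def update(self, grads):
        """Adjust the scale; return True if the optimizer step should run."""
        if any(not math.isfinite(g) for g in grads):
            # Overflow: shrink the scale and skip this optimizer step.
            self.scale *= self.backoff_factor
            self._good_steps = 0
            return False
        self._good_steps += 1
        if self._good_steps == self.growth_interval:
            # A long enough run of clean steps: try a larger scale again.
            self.scale *= self.growth_factor
            self._good_steps = 0
        return True


# Small values so the behaviour is visible within a few steps.
scaler = DynamicLossScaler(init_scale=8.0, growth_interval=2)
history = []
for grads in ([0.1, 0.2], [float("inf"), 0.3], [0.1, 0.1], [0.2, 0.2]):
    applied = scaler.update(grads)
    history.append((applied, scaler.scale))
print(history)
```

A fixed scale such as the one set by --ls cannot recover from an overflow on its own, whereas this scheme backs off and skips the offending step.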

Usage

Pass --fp16=True together with --enable_gradscaler=True. For example, the mixed-precision training command below is adapted from run_ecm_1hour.sh.

torchrun --nnodes=1 --nproc_per_node=1 --rdzv_backend=c10d --rdzv_endpoint=localhost:9901 ct_train.py  \
    --outdir=ct-runs --data=datasets/cifar10-32x32.zip  \
    --cond=0 --arch=ddpmpp --metrics=fid50k_full        \
    --transfer=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-cifar10-32x32-uncond-vp.pkl    \
    --duration=25.6 --tick=12.8 --double=250 --batch=128 --lr=0.0001 --optim=RAdam --dropout=0.2 --augment=0.0 \
    -q 256 --double 10000 --ema_beta 0.9993 --eval_every 80 --dump 80     \
    --desc bs128.200k \
    --fp16=True --enable_gradscaler=True

The FID records obtained using the above command are shown in the following images:

Gsunshine

Merge AMP via GradScaler into ECT.

Gsunshine · 2024-08-27T07:25:09Z

ct_train.py

 @click.option('--fp16',          help='Enable mixed-precision training', metavar='BOOL',            type=bool, default=False, show_default=True)
 @click.option('--tf32',          help='Enable tf32 for A100/H100 training speed', metavar='BOOL',   type=bool, default=False, show_default=True)
 @click.option('--ls',            help='Loss scaling', metavar='FLOAT',                              type=click.FloatRange(min=0, min_open=True), default=1, show_default=True)
+@click.option('--enable_gradscaler', help='Enable torch.cuda.amp.GradScaler, NOTE overwriting loss_scale set by --ls', metavar='BOOL', type=bool, default=False, show_default=True)


Hi Zixiang @aiihn ,

Thanks for your neat PR!

Would it be better to use a short abbreviation like amp as the option name? AMP already stands for Automatic Mixed Precision.

Gsunshine · 2024-08-27T07:28:05Z

training/ct_training_loop.py

+        if enable_gradscaler:
+            if 'gradscaler_state' in data:
+                dist.print0(f'Loading GradScaler state from "{resume_state_dump}"...')
+                # Although not loading the state_dict of the GradScaler works well, loading it can improve reproducibility.


Gotcha. Thanks for the comments!

Gsunshine · 2024-08-27T07:36:40Z

training/ct_training_loop.py

+            scaler.step(optimizer)
+            scaler.update()
+        else:
+            # Update weights.


The TODO is unclear to me as well. It still seems useful and compatible, per Claude.

It's fine to remove my commented code for lr rampup.
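For context, the scaler.step(optimizer) and scaler.update() calls in the hunk above are the tail of the usual GradScaler update. The sketch below shows one full step under stated assumptions: the toy Linear model, the MSE loss, and the `train_step` helper are hypothetical, and torch.autocast is used here for illustration only (this PR's training loop may rely on the network's own fp16 casting instead). AMP is disabled automatically when no CUDA device is present, so the sketch also runs on CPU.

```python
import torch


def train_step(model, optimizer, scaler, x, y, use_amp):
    """One optimization step with optional AMP (hypothetical helper)."""
    optimizer.zero_grad(set_to_none=True)
    # Run the forward pass in reduced precision only when AMP is enabled.
    with torch.autocast(device_type="cuda", dtype=torch.float16, enabled=use_amp):
        loss = torch.nn.functional.mse_loss(model(x), y)
    # Scale the loss so small fp16 gradients do not underflow.
    scaler.scale(loss).backward()
    # step() unscales gradients and skips the update if any are inf/NaN.
    scaler.step(optimizer)
    # update() shrinks the scale after an overflow, or grows it periodically.
    scaler.update()
    return loss.item()


use_amp = torch.cuda.is_available()  # assumption: AMP only on CUDA
device = "cuda" if use_amp else "cpu"
model = torch.nn.Linear(4, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

x = torch.randn(8, 4, device=device)
y = torch.randn(8, 1, device=device)
losses = [train_step(model, optimizer, scaler, x, y, use_amp) for _ in range(5)]
print(losses)
```

With `enabled=False` the scaler passes the loss through unchanged and `scaler.step` simply calls `optimizer.step()`, so the same code path serves both the fp16 and fp32 configurations.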

Gsunshine · 2024-09-23T22:43:52Z

Hi @aiihn ,

Thank you again for your PR! I had another AMP implementation that could also be helpful for ECT. I’ll check it out later and test torch.autocast, but feel free to take a look if you’re working with mixed precision!

Links for reference:
https://github.com/locuslab/torchdeq/blob/main/deq-zoo/deq-flow/main.py
https://github.com/locuslab/torchdeq/blob/main/deq-zoo/deq-flow/core/deq_flow.py

Cheers,
Zhengyang

Enable torch.cuda.amp.GradScaler to automatically adjust the loss scaling

2beb14a

Gsunshine self-requested a review September 6, 2024 06:03

Gsunshine approved these changes Sep 23, 2024

Gsunshine merged commit f8cdf75 into locuslab:main Sep 23, 2024
