minitu · 2023-09-13T16:09:40Z

This PR adds multi_tensor_unscale_l2norm_cuda, which is used to fuse gradient unscaling (with AMP) and L2 norm computation of the gradients.
To retain the original precision of the gradients (especially FP16), unscaling is only accounted for in the norm computation and is not applied to the gradients themselves.
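
To illustrate the idea described above, here is a minimal pure-Python sketch, not the CUDA kernel itself. The function name `unscaled_l2norm`, its parameters, and the sample values are hypothetical and chosen only for illustration. The inverse loss scale is applied to each element while the sum of squares is accumulated, and the stored gradients are left untouched.

```python
import math


def unscaled_l2norm(grad_tensors, inv_scale):
    """Return the L2 norm the gradients would have after unscaling.

    Hypothetical helper for illustration: `grad_tensors` is a list of
    lists of floats standing in for a list of gradient tensors, and
    `inv_scale` is 1 / loss_scale. The inputs are only read, never written.
    """
    total = 0.0
    for grads in grad_tensors:
        for g in grads:
            v = g * inv_scale  # unscale on the fly, inside the reduction
            total += v * v     # accumulate the sum of squares
    return math.sqrt(total)


# Assumed example values: gradients that were scaled by a loss scale of 1024.
loss_scale = 1024.0
scaled_grads = [[3.0 * loss_scale, 0.0], [4.0 * loss_scale]]

norm = unscaled_l2norm(scaled_grads, 1.0 / loss_scale)
print(norm)
```

The returned norm can then drive a decision such as gradient clipping, while the gradients keep their original scaled values until a later step unscales them.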

nWEIdia · 2023-09-19T04:07:10Z

We manually verified this PR and it works. Please go ahead and merge it.

minitu and others added 5 commits August 22, 2023 15:32

Add multi_tensor_scale_l2norm

52d2218

Merge branch 'NVIDIA:master' into unscale_l2norm_pr

dcbe615

Rename

c099b3e

Add unit test

5df02dd

Fix unit test

9252e93

ptrblck merged commit 741bdf5 into NVIDIA:master Sep 19, 2023

Add multi_tensor_unscale_l2norm_cuda#1727
ptrblck merged 5 commits into NVIDIA:master from minitu:unscale_l2norm_pr
