Commit ca35aa7

Updating gradient accumulation guidance
1 parent f29b3f8 commit ca35aa7

1 file changed

Lines changed: 5 additions & 0 deletions

docs/source/advanced.rst

@@ -145,6 +145,11 @@ Gradient accumulation across iterations
 The following should "just work," and properly accommodate multiple models/optimizers/losses, as well as
 gradient clipping via the `instructions above`_::
 
+    # If your intent is to simulate a larger batch size using gradient accumulation,
+    # you can divide the loss by the number of accumulation iterations (so that gradients
+    # will be averaged over that many iterations):
+    loss = loss/iters_to_accumulate
+
     if iter%iters_to_accumulate == 0:
         # Every iters_to_accumulate iterations, unscale and step
         with amp.scale_loss(loss, optimizer) as scaled_loss:
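
The comment added in this commit says that dividing the loss by iters_to_accumulate averages the gradients over that many iterations. The following is a minimal sketch of that arithmetic only, not the Amp API: it uses plain Python with hand-derived gradients for a one-parameter model, and the names grad_mean_loss, accumulated_grad, data, and micro_batches are hypothetical, introduced purely for illustration. It assumes equally sized micro-batches and a loss that is a mean over each micro-batch.

```python
# Pure-Python sketch (no Amp, no PyTorch): model y = w * x with squared-error loss.
# All function and variable names here are illustrative assumptions.

def grad_mean_loss(w, batch):
    """Gradient w.r.t. w of mean((w*x - y)**2) over the given batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_grad(w, micro_batches, divide_loss):
    """Sum per-micro-batch gradients, as repeated backward() calls would."""
    n = len(micro_batches)  # plays the role of iters_to_accumulate
    total = 0.0
    for mb in micro_batches:
        g = grad_mean_loss(w, mb)
        # Dividing the loss by n scales its gradient by 1/n as well.
        total += g / n if divide_loss else g
    return total

w = 0.5
data = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0), (4.0, 8.0)]
micro_batches = [data[:2], data[2:]]  # two accumulation iterations

full = grad_mean_loss(w, data)                                     # one large batch
averaged = accumulated_grad(w, micro_batches, divide_loss=True)    # loss / n
summed = accumulated_grad(w, micro_batches, divide_loss=False)     # loss left as-is

print(full, averaged, summed)
```

Under these assumptions, the accumulated gradient with the division matches the gradient of the single large batch, while without the division it is larger by a factor of the number of accumulation iterations. If the micro-batches differ in size, the match is only approximate.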
