1 parent f29b3f8 commit ca35aa7
1 file changed
docs/source/advanced.rst
@@ -145,6 +145,11 @@ Gradient accumulation across iterations
 The following should "just work," and properly accommodate multiple models/optimizers/losses, as well as
 gradient clipping via the `instructions above`_::

+    # If your intent is to simulate a larger batch size using gradient accumulation,
+    # you can divide the loss by the number of accumulation iterations (so that gradients
+    # will be averaged over that many iterations):
+    loss = loss/iters_to_accumulate
+
     if iter%iters_to_accumulate == 0:
         # Every iters_to_accumulate iterations, unscale and step
         with amp.scale_loss(loss, optimizer) as scaled_loss:
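
The effect of the added `loss = loss/iters_to_accumulate` line can be checked with plain arithmetic. The following is a minimal sketch, not code from this commit or from apex: it assumes a hypothetical one-parameter model `y = w * x`, a mean squared-error loss, hand-written gradients, and micro-batches of equal size. Under those assumptions, summing the gradients of the scaled micro-batch losses reproduces the gradient of the mean loss over the full batch, while skipping the division leaves the accumulated gradient `iters_to_accumulate` times too large.

```python
# Hypothetical toy setup (illustration only): model y = w * x, squared-error loss.
# Dividing a loss by a constant divides its gradient by the same constant,
# so scaling the gradient here is equivalent to scaling the loss.

def grad_mean_squared_error(w, batch):
    """Gradient with respect to w of the mean of (w*x - y)**2 over the batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

w = 0.5
data = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 4.0)]

iters_to_accumulate = 2
micro_batch_size = len(data) // iters_to_accumulate  # assumes an even split
micro_batches = [
    data[i:i + micro_batch_size]
    for i in range(0, len(data), micro_batch_size)
]

# Reference: gradient of the mean loss over the full batch.
full_batch_grad = grad_mean_squared_error(w, data)

accumulated_grad = 0.0   # with the division by iters_to_accumulate
unscaled_sum = 0.0       # without the division
for micro_batch in micro_batches:
    g = grad_mean_squared_error(w, micro_batch)
    accumulated_grad += g / iters_to_accumulate
    unscaled_sum += g

print(full_batch_grad, accumulated_grad, unscaled_sum)
```

If the micro-batches differ in size, a plain average over iterations no longer matches the full-batch mean exactly, so the equal-size assumption matters for this comparison.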