Fix the Division by Zero Bug of CosineAnnealingLR#19180

Closed
chandlerzuo wants to merge 15 commits into pytorch:master from chandlerzuo:master

@chandlerzuo
Contributor

@chandlerzuo chandlerzuo commented Apr 12, 2019

Added the formula for the corner case. Updated unit tests.

Fixes #17913

@chandlerzuo
Contributor Author

If this is approved, we can abandon #19132

@ezyang
Contributor

ezyang commented Apr 12, 2019

I didn't do an in-depth math check (yet), but the preliminary shape looks good to me. cc @ptrblck @Kindpire

@chandlerzuo
Contributor Author

chandlerzuo commented Apr 12, 2019

One thing I should mention is that there are multiple induction formulas we can use for cosine annealing LR while still maintaining backward compatibility (BC). Basically, the learning rate can be decomposed as a constant plus a cyclic function of t. We have different choices for the constant here: it can be \eta_min, \eta_max, (\eta_min+\eta_max)/2, or something else. Depending on the choice of constant, we can work out the corresponding induction formula from the remaining cyclic function.

The solution here treats \eta_min as the constant. I am not sure it is the best choice, but I think it is reasonable.
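
To make the decomposition concrete, here is a minimal sketch in plain Python (not the PyTorch implementation) that compares the closed-form cosine schedule with the induction formula that treats \eta_min as the constant. The function names and the sample values are hypothetical and chosen only for illustration; the comparison is restricted to the first T_max steps, where the denominator of the recursion stays positive.

```python
import math


def closed_form(t, eta_min, eta_max, T_max):
    """Closed-form cosine annealing value at step t."""
    return eta_min + (eta_max - eta_min) * (1 + math.cos(math.pi * t / T_max)) / 2


def recursive_step(eta_t, t, eta_min, T_max):
    """One induction step with eta_min as the constant part.

    Only valid when 1 + cos(pi * t / T_max) is nonzero.
    """
    numerator = 1 + math.cos(math.pi * (t + 1) / T_max)
    denominator = 1 + math.cos(math.pi * t / T_max)
    return eta_min + (eta_t - eta_min) * numerator / denominator


def recursive_schedule(eta_min, eta_max, T_max, steps):
    """Apply the induction step `steps` times, starting from eta_max at t = 0."""
    lrs = [eta_max]
    for t in range(steps):
        lrs.append(recursive_step(lrs[-1], t, eta_min, T_max))
    return lrs


# Illustrative values only (assumptions, not taken from the PR).
eta_min, eta_max, T_max = 0.001, 0.1, 10
for t, lr in enumerate(recursive_schedule(eta_min, eta_max, T_max, T_max)):
    print(t, lr, closed_form(t, eta_min, eta_max, T_max))
```

Choosing a different constant (\eta_max or the midpoint) would change which cyclic remainder appears in the ratio, and therefore where the recursion becomes singular.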

@ezyang
Contributor

ezyang commented Apr 15, 2019

@pytorchbot rebase this please

\eta_{t+1} = \eta_{min} + (\eta_t - \eta_{min})\frac{1 +
\cos(\frac{T_{cur+1}}{T_{max}}\pi)}{1 + \cos(\frac{T_{cur}}{T_{max}}\pi)},
T_{cur} \neq (2k+1)T_{max};\\
Contributor

Verified this condition. It's so confusing that we use a different cur+1 convention in the code and math lol.
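
As a rough illustration of why the condition T_{cur} \neq (2k+1)T_{max} is needed, the sketch below evaluates the denominator 1 + \cos(T_{cur}\pi / T_{max}) of the recursion at each step and flags the steps where it vanishes. The helper names are hypothetical, the indexing follows the math convention (T_cur) rather than whatever offset the scheduler code uses, and T_max = 5 is an arbitrary choice.

```python
import math


def recursion_denominator(t_cur, T_max):
    """Denominator 1 + cos(pi * T_cur / T_max) of the recursive update."""
    return 1 + math.cos(math.pi * t_cur / T_max)


def is_corner_case(t_cur, T_max):
    """True when T_cur = (2k+1) * T_max for some integer k."""
    return (t_cur - T_max) % (2 * T_max) == 0


T_max = 5  # arbitrary value for illustration
corner_cases = [t for t in range(4 * T_max) if is_corner_case(t, T_max)]
for t in corner_cases:
    # At these steps the denominator is numerically zero, so the ratio
    # in the recursion cannot be evaluated.
    print(t, recursion_denominator(t, T_max))
```

An integer test on the step index is used here instead of comparing the floating-point denominator with zero, since the latter depends on rounding.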

return [group['lr'] + (base_lr - self.eta_min) *
        (1 - math.cos(math.pi / self.T_max)) / 2
        for base_lr, group in
        zip(self.base_lrs, self.optimizer.param_groups)]
Contributor

I spent a while staring at this and couldn't figure out whether the equation was right. It must be right, since the tests pass. Would you mind saying a little more about the derivation here?

Contributor Author

@chandlerzuo chandlerzuo Apr 15, 2019

[Equation image: derivation of the corner-case formula]

Here, I further replace \eta_{min} in the first term by group['lr'], so that if another scheduler modifies lr simultaneously, its effect would be passed through group['lr'].
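
For readers without the rendered equation, one way to arrive at the corner-case expression is sketched below. This is a reconstruction from the closed-form cosine schedule and the code under review, not a copy of the original image; it assumes \eta_{max} corresponds to base_lr and \eta_t to group['lr'].

```latex
% Corner case: T_{cur} = (2k+1) T_{max}, where the closed form gives \eta_t = \eta_{min}.
\begin{align*}
\eta_{t+1}
  &= \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})
     \left(1 + \cos\!\left(\frac{(2k+1)T_{max} + 1}{T_{max}}\pi\right)\right) \\
  &= \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})
     \left(1 + \cos\!\left((2k+1)\pi + \frac{\pi}{T_{max}}\right)\right) \\
  &= \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})
     \left(1 - \cos\!\left(\frac{\pi}{T_{max}}\right)\right) \\
  &= \eta_t + \frac{1}{2}(\eta_{max} - \eta_{min})
     \left(1 - \cos\!\left(\frac{\pi}{T_{max}}\right)\right)
     \quad \text{since } \eta_t = \eta_{min} \text{ at this step.}
\end{align*}
```

The third line uses \cos((2k+1)\pi + x) = -\cos(x). The last line has the same shape as the returned expression, with the leading \eta_{min} swapped for the current learning rate.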

Contributor

Awesome, clears it up!

Contributor

@facebook-github-bot facebook-github-bot left a comment

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ezyang merged this pull request in e3f1504.

zhangguanheng66 pushed a commit to zhangguanheng66/pytorch that referenced this pull request May 6, 2019
Summary:
Added the formula for the corner case. Updated unit tests.

Fixes pytorch#17913
Pull Request resolved: pytorch#19180

Differential Revision: D14942023

Pulled By: ezyang

fbshipit-source-id: 167c109b97a7830d5b24541dc91e4788d531feec
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026

Development

Successfully merging this pull request may close these issues.

Bug in CosineAnnealingLR (division by zero)

3 participants