Add SGDR(Stochastic Gradient Descent with Warm Restarts) scheduler #17226

Closed
Kirayue wants to merge 18 commits into pytorch:master from Kirayue:new_sgdr

Conversation

Contributor

@Kirayue Kirayue commented Feb 18, 2019

Because of a merge error with master in #15042, I am opening a new PR for @ezyang.

Contributor Author

Kirayue commented Feb 18, 2019

Hi, @ezyang, thank you for helping me on this PR.

Is it possible for you to let me know where I went wrong with the merge?

I ran git fetch upstream (upstream is the pytorch/pytorch repo) and git merge upstream/master (on the sgdr branch).
There was only one conflicting file (actually only one line, the import line at the beginning), so I fixed it and ran git add . because there were lots of new files from upstream/master.

Thank you, I really appreciate your help.

Comment thread torch/optim/lr_scheduler.py Outdated
for base_lr in self.base_lrs]

def step(self, epoch=None):
"""Step could be called after every update, i.e. if one epoeh has 10 iterations(num_train / batch_size),

epoeh should be epoch

Contributor Author

Thanks for checking.

@ezyang ezyang requested a review from mrshenli March 11, 2019 20:28
Contributor

@mrshenli mrshenli left a comment

Please fix the lint error as well.

Comment thread test/test_optim.py Outdated
Contributor

There is quite a bit of duplicated code in the four test cases. Can you consolidate them by, say, creating helper methods or using a loop?

Contributor Author

I combined test_sgdr_lr1 and test_sgdr_lr2 into test_sgdr_lr1, and did the same for test_sgdr_lr3 and test_sgdr_lr4. The former tests integer epochs and the latter tests float epochs, so I kept two test functions. If you have any suggestions, please let me know.

Contributor

Combining 4 functions to 2 sounds good to me. Thanks for addressing this!

Comment thread torch/optim/lr_scheduler.py Outdated
Contributor

you mean "could called" -> "should call"?

Comment thread torch/optim/lr_scheduler.py Outdated
Contributor

Should it be T_i to match the description in the docs above?

Comment thread torch/optim/lr_scheduler.py Outdated
Contributor

Why is T_cur not reset to 0?

Contributor Author

yes, you are right
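
For reference, the restart logic under discussion can be sketched in a few lines. This is a minimal pure-Python illustration, not the code from the PR: it assumes the cosine formula from the SGDR paper, eta_t = eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * T_cur / T_i)), and the names sgdr_lr and sgdr_schedule are hypothetical.

```python
import math


def sgdr_lr(eta_max, eta_min, t_cur, t_i):
    """Cosine-annealed learning rate at position t_cur within a run of length t_i."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))


def sgdr_schedule(eta_max, eta_min, t_0, t_mult, num_epochs):
    """Learning rate for each of num_epochs epochs, stepping once per epoch."""
    lrs = []
    t_cur, t_i = 0, t_0
    for _ in range(num_epochs):
        lrs.append(sgdr_lr(eta_max, eta_min, t_cur, t_i))
        t_cur += 1
        if t_cur >= t_i:
            t_cur = 0      # warm restart: the position in the run goes back to 0
            t_i *= t_mult  # the next run is t_mult times longer
    return lrs


# Assumed example values: eta_max = 0.05, T_0 = 10, T_mult = 2.
lrs = sgdr_schedule(eta_max=0.05, eta_min=0.0, t_0=10, t_mult=2, num_epochs=31)
```

With these assumed values the runs start at epochs 0, 10 and 30, and the learning rate is back at eta_max at each of them. Without the reset, T_cur would keep growing past T_i and the learning rate would never jump back to eta_max at the start of a run.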

Comment thread torch/optim/lr_scheduler.py Outdated
Contributor

self.last_epoch is only used in the if branch. Is there any reason for putting it here? Is it because users can call step() with and without the epoch arg in an interleaved way, so you want to remember last_epoch when possible?

Contributor Author

Yes, I was considering the interleaved usage, although it is somewhat impractical. In your opinion, is it better to remove it?

Contributor

I am OK with this API, but we need to explain it explicitly in the docs. Could you please describe this behavior in detail and add it to the docstrings? Thanks.
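
To make the interleaved usage concrete, here is a small sketch of the bookkeeping it implies. It is not the PR's implementation: the class WarmRestartClock and the helper locate are hypothetical names, and the sketch assumes integer T_0 and T_mult. Calling step() advances by one epoch, while step(epoch) jumps to an absolute epoch and recomputes T_cur and T_i from scratch.

```python
def locate(epoch, t_0, t_mult):
    """Return (t_cur, t_i) for an absolute epoch by walking the runs in order."""
    start, t_i = 0, t_0
    while epoch >= start + t_i:
        start += t_i
        t_i *= t_mult
    return epoch - start, t_i


class WarmRestartClock:
    """Tracks only the position within the current run, not the learning rate."""

    def __init__(self, t_0, t_mult=1):
        self.t_0, self.t_mult = t_0, t_mult
        self.last_epoch = 0
        self.t_cur, self.t_i = 0, t_0

    def step(self, epoch=None):
        if epoch is None:
            # Incremental call: advance one epoch and restart if the run is over.
            self.last_epoch += 1
            self.t_cur += 1
            if self.t_cur >= self.t_i:
                self.t_cur -= self.t_i
                self.t_i *= self.t_mult
        else:
            # Absolute call: remember the epoch and recompute both values.
            self.last_epoch = epoch
            self.t_cur, self.t_i = locate(epoch, self.t_0, self.t_mult)


clock = WarmRestartClock(t_0=10, t_mult=2)
for _ in range(100):
    clock.step()
after_100_steps = (clock.last_epoch, clock.t_cur, clock.t_i)
clock.step(10)  # jump back to epoch 10; T_i must shrink again
after_jump = (clock.last_epoch, clock.t_cur, clock.t_i)
clock.step()    # continue incrementally from the remembered epoch
after_next = (clock.last_epoch, clock.t_cur, clock.t_i)
```

Storing last_epoch in both branches is what lets a plain step() continue from wherever the last step(epoch) call left off.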

Comment thread torch/optim/lr_scheduler.py Outdated
Contributor

Is T_i the number of epochs in every run (i.e., # of epochs between two warm restarts)? Please add docs to explain it.

Comment thread torch/optim/lr_scheduler.py Outdated
Contributor

cut -> cur

Contributor

@facebook-github-bot facebook-github-bot left a comment

@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Comment thread torch/optim/lr_scheduler.py Outdated
Contributor

Do you need to check the range of these args? For example, what if T_0 or T_mult is negative, or eta_min is larger than the initial lr (eta_max)?
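
One possible shape for such range checks is sketched below. The function name check_schedule_ranges is hypothetical, the lower bound of 1 for T_mult is the one this thread later settles on, and the eta_min comparison is only an illustration of the question raised here, not something the PR is known to enforce.

```python
def check_schedule_ranges(t_0, t_mult, eta_min, base_lrs):
    """Raise ValueError for argument values the schedule cannot sensibly use."""
    if t_0 <= 0:
        raise ValueError("Expected T_0 > 0, but got {}".format(t_0))
    if t_mult < 1:
        raise ValueError("Expected T_mult >= 1, but got {}".format(t_mult))
    for lr in base_lrs:
        # Hypothetical check: eta_min should not exceed any initial learning rate.
        if eta_min > lr:
            raise ValueError(
                "Expected eta_min <= initial lr {}, but got {}".format(lr, eta_min)
            )


check_schedule_ranges(t_0=10, t_mult=2, eta_min=1e-10, base_lrs=[0.05])  # passes silently
```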

Comment thread test/test_optim.py Outdated
Contributor

var i is not used, maybe replace it with _? (same for other for loops in this file)

@mrshenli
Contributor

@pytorchbot retest this please

Contributor

@facebook-github-bot facebook-github-bot left a comment

@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mrshenli
Contributor

@pytorchbot retest this please

Contributor

@mrshenli mrshenli left a comment

LGTM! Thanks for contributing! Below are some nitpicks.

Comment thread test/test_optim.py Outdated
Contributor

Let's test a floating-point T_mult :)

Contributor Author

Good, but it made me think about T_mult >= 1 (e.g. 2.5) and T_mult < 1 (e.g. 0.5). I think the latter is not practical, so I restricted the range of T_mult: if it is < 1, a ValueError is raised.
What do you think?

By the way, because T_mult can now be a floating-point number, the test cases were changed slightly to cover cases like T_i = 62.5.

Contributor

I agree with you on T_mult >= 1, as the original paper says:

we suggest an option to start with an initially small T_i and increase it by a factor of T_mult at every restart
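
A quick way to see what a floating-point T_mult does to the run lengths is sketched below. The helper names are hypothetical, and T_0 = 10 with T_mult = 2.5 are assumed values that happen to produce the T_i = 62.5 mentioned above; the thread does not state the parameters used in the tests.

```python
def period_lengths(t_0, t_mult, num_runs):
    """Length T_i of each run: T_0, T_0 * T_mult, T_0 * T_mult**2, ..."""
    return [t_0 * t_mult ** i for i in range(num_runs)]


def restart_epochs(t_0, t_mult, num_runs):
    """Epoch at which each run ends and the next warm restart happens."""
    epochs, total = [], 0
    for length in period_lengths(t_0, t_mult, num_runs):
        total += length
        epochs.append(total)
    return epochs


growing = period_lengths(10, 2.5, 3)    # T_mult >= 1: [10.0, 25.0, 62.5]
shrinking = period_lengths(10, 0.5, 3)  # T_mult < 1:  [10.0, 5.0, 2.5]
boundaries = restart_epochs(10, 2.5, 3)
```

With T_mult < 1 the runs shrink toward zero length, which is why rejecting that range is a reasonable choice.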

Comment thread test/test_optim.py Outdated
Contributor

Let's be consistent and use T_i

Comment thread torch/optim/lr_scheduler.py Outdated
Contributor

is this resuming or assuming?

Contributor Author

I copied line 23 from the base class _LRScheduler(object). In my opinion, it is resuming. But I think this code is redundant; the optimizer will be checked by calling super(SGDR, self).__init__(optimizer, last_epoch).

Comment thread torch/optim/lr_scheduler.py Outdated
Contributor

Shall we add a test for this?

@mrshenli
Contributor

@pytorchbot rebase this please

Contributor

@facebook-github-bot facebook-github-bot left a comment

@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mrshenli
Contributor

Some tests failed:

Mar 19 10:14:31 ======================================================================
Mar 19 10:14:31 FAIL: test_sgdr_lr1 (__main__.TestLRScheduler)
Mar 19 10:14:31 ----------------------------------------------------------------------
Mar 19 10:14:31 Traceback (most recent call last):
Mar 19 10:14:31   File "test_optim.py", line 817, in test_sgdr_lr1
Mar 19 10:14:31     self._test(scheduler, targets, iters)
Mar 19 10:14:31   File "test_optim.py", line 936, in _test
Mar 19 10:14:31     epoch, target[epoch], param_group['lr']), delta=1e-5)
Mar 19 10:14:31   File "/var/lib/jenkins/workspace/test/common_utils.py", line 469, in assertAlmostEqual
Mar 19 10:14:31     self.assertEqual(x, y, prec, msg, allow_inf)
Mar 19 10:14:31   File "/var/lib/jenkins/workspace/test/common_utils.py", line 461, in assertEqual
Mar 19 10:14:31     super(TestCase, self).assertLessEqual(abs(x - y), prec, message)
Mar 19 10:14:31 AssertionError: 0.0499999999 not less than or equal to 1e-05 : LR is wrong in epoch 35: expected 0.05, got 1e-10

Contributor

@mrshenli mrshenli left a comment

Please fix failures introduced by the new tests.

Contributor Author

Kirayue commented Mar 20, 2019

Hi, @mrshenli
The tests pass for me locally. How can I reproduce the failing cases?

....................................................
----------------------------------------------------------------------
Ran 52 tests in 13.512s

OK

Comment thread torch/optim/lr_scheduler.py Outdated
Contributor

@mrshenli mrshenli Mar 20, 2019

Could be caused by this line. If epoch % self.T_0 != 0 and they are both int, it will drop the residual.

Contributor

@mrshenli mrshenli Mar 20, 2019

Oh, maybe not. If that were the case, it would fail everywhere. Please ignore me, I got it wrong.

Contributor Author

I traced this part of the code again. I think n = 2, T_i = 22.5 and T_cur = 0. I have no idea why we got a different result.

Contributor

@mrshenli mrshenli left a comment

The error message "expected 0.05, got 1e-10" suggests the failure occurs on a restart boundary. See comments below.

Comment thread torch/optim/lr_scheduler.py Outdated
Contributor

is this line necessary?

Contributor Author

you are right, it is redundant.

Contributor Author

Sorry, I double-checked the results. It is necessary because of the interleaved usage: e.g. after calling step() 100 times and then step(10), we need to reset T_i. But for epoch < T_0, it is redundant.

Comment thread torch/optim/lr_scheduler.py Outdated
Contributor

It could be that my suggestion introduced numerical instability. It makes me rethink whether it makes sense to have a non-integer T_i. Any thoughts, @Kirayue?
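
The sketch below illustrates how such an instability could produce exactly the reported symptom. It is a hypothesis only; the thread never confirms the cause. It assumes a closed-form cycle index computed with a floating-point logarithm, and it assumes T_0 = 10 and T_mult = 2.5 (so that epoch 35 is a restart boundary); neither the formula nor these values are confirmed by the thread. All function names are hypothetical.

```python
import math


def cycle_index_log(epoch, t_0, t_mult):
    """Closed-form cycle index; relies on a floating-point logarithm."""
    return int(math.log(epoch / t_0 * (t_mult - 1) + 1, t_mult))


def cycle_index_loop(epoch, t_0, t_mult):
    """Cycle index found by walking the restart boundaries one by one."""
    n, start, t_i = 0, 0, t_0
    while epoch >= start + t_i:
        start += t_i
        t_i *= t_mult
        n += 1
    return n


def position_in_cycle(epoch, t_0, t_mult, n):
    """Return (t_cur, t_i) for a given cycle index n."""
    t_i = t_0 * t_mult ** n
    t_cur = epoch - t_0 * (t_mult ** n - 1) / (t_mult - 1)
    return t_cur, t_i


def cosine_lr(eta_max, eta_min, t_cur, t_i):
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))


t_0, t_mult, epoch = 10, 2.5, 35  # assumed values, see the note above
correct = position_in_cycle(epoch, t_0, t_mult, cycle_index_loop(epoch, t_0, t_mult))
off_by_one = position_in_cycle(epoch, t_0, t_mult, 1)  # index one too small

lr_correct = cosine_lr(0.05, 1e-10, *correct)        # start of a run: eta_max
lr_off_by_one = cosine_lr(0.05, 1e-10, *off_by_one)  # T_cur == T_i: eta_min
n_from_log = cycle_index_log(epoch, t_0, t_mult)     # may differ between platforms
```

If the logarithm lands a hair below the exact integer on some platform, int() truncates it to the previous cycle, T_cur becomes equal to T_i, and the schedule returns eta_min where eta_max was expected. Walking the boundaries avoids the logarithm entirely.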

Contributor Author

In my opinion, and according to the original paper, T_i is not restricted to integers. But if we can confirm that the failure is caused by numerical instability, I can modify the code.

By the way, how can I reproduce the failure? (I tested it in python3, and it works.)

Contributor

Hi @yf225 is there a way to reproduce CI test failures for a specific environment? It does not have to be local, as long as @Kirayue can iterate with that conveniently.

Contributor

@mrshenli Sorry for the unfortunate update: I just found out that currently we don't have a way to make the CI Docker images public, and we cannot share the credentials to access the Docker images in ECR either. I am working on a solution at #18244.

Contributor

@yf225 Thanks for letting us know.

Contributor

@Kirayue sorry for the delay on this - would you like to send us your email address so that we can share the credentials for read-only access to the Docker images in ECR? Thanks!

Contributor Author

Do not worry about it 😄
Sure, just use this email address [email protected]

Comment thread torch/optim/lr_scheduler.py Outdated
Contributor

This scheduler is not tied to SGD but can also be effectively used for other optimizers such as Adam, so although it's in the name of the paper, I find the name SGDR misleading. Why not call it CosineAnnealingWR or CosineAnnealingWarmRestarts?

Contributor Author

you are right, that's fine with me
What do you think, @mrshenli ?

Contributor

@mrshenli mrshenli Mar 29, 2019

Sounds good to me too.

Curious: there is a CosineAnnealingLR which implements SGDR without warm restarts. Do you know what LR means here? (Learning rate? Less restart?) I just want to make sure that if we go with CosineAnnealingWR, the naming does not confuse people.

Collaborator

CosineAnnealingWarmRestarts sounds better to me.

Contributor Author

@Kirayue Kirayue Mar 29, 2019

Hi, @mrshenli
It's a kind of learning rate policy according to Cyclical Learning Rates for Training Neural Networks. So, in my opinion, LR stands for learning rate.

This observation leads to the idea of letting the learning rate vary within a range of values rather than adopting a stepwise fixed or exponentially decreasing value. That is, one sets minimum and maximum boundaries and the learning rate cyclically varies between these bounds. Experiments with numerous functional forms, such as a triangular window (linear), a Welch window (parabolic) and a Hann window (sinusoidal) all produced equivalent results

Contributor Author

Hi, @mrshenli
How about I push a version with integer T_i, and then test the CI error after #18244 is done?

Contributor

That sounds good to me. Please edit the doc and arg check accordingly. I will approve and merge. Thanks!

Contributor Author

Hi, @mrshenli
There are two ways to convert T_i to an integer. Because T_i equals T_{i - 1} * T_mult, we can use int(T_mult) or int(T_{i - 1} * T_mult) to force T_i to be an integer. However, the latter would cause an inconsistency. For example, let T_0 = 10 and T_mult = 1.5: calling scheduler.step() 100 times and calling scheduler.step(100) would end up with different T_i.
Stepping one epoch at a time gives T_i values of [10, 15, 22, 33, 49], while the formula gives 50.625 (50 after int()). So for simplicity, I chose the former way.
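
The arithmetic in this example can be checked with a short sketch. The function names are hypothetical; the values T_0 = 10 and T_mult = 1.5 are the ones given above.

```python
def iterative_lengths(t_0, t_mult, num_runs):
    """T_i obtained by truncating at every restart: T_i = int(T_{i-1} * T_mult)."""
    lengths = [t_0]
    for _ in range(num_runs - 1):
        lengths.append(int(lengths[-1] * t_mult))
    return lengths


def closed_form_length(t_0, t_mult, i):
    """T_i obtained by truncating the formula result once: int(T_0 * T_mult**i)."""
    return int(t_0 * t_mult ** i)


stepwise = iterative_lengths(10, 1.5, 5)     # [10, 15, 22, 33, 49]
from_formula = closed_form_length(10, 1.5, 4)  # int(50.625) == 50
```

The fifth run is 49 epochs long when truncation is applied at each restart but 50 when it is applied to the closed-form value, so the two call patterns disagree. Truncating T_mult itself removes the fractional part before it can accumulate.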

Comment thread torch/optim/lr_scheduler.py Outdated
Contributor

:math:`\T_{i}` -> :math:`T_{i}`?

Comment thread torch/optim/lr_scheduler.py Outdated
Contributor

Shall we do an explicit type check, and raise an error if it is not an int? This also applies to T_0. We might want to avoid silently surprising people even if they did pay attention to the doc.

Contributor Author

Ok, thank you for the suggestions.
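
An explicit check along the lines suggested above might look like the following sketch. The function name validate_periods and the error messages are hypothetical, not taken from the PR.

```python
def validate_periods(t_0, t_mult):
    """Reject non-integer or out-of-range T_0 and T_mult instead of truncating silently."""
    if not isinstance(t_0, int) or t_0 <= 0:
        raise ValueError("Expected positive integer T_0, but got {!r}".format(t_0))
    if not isinstance(t_mult, int) or t_mult < 1:
        raise ValueError("Expected integer T_mult >= 1, but got {!r}".format(t_mult))


validate_periods(10, 2)  # passes silently

try:
    validate_periods(10, 1.5)  # a float T_mult is rejected, not truncated to 1
except ValueError as err:
    message = str(err)
```

Note that isinstance(True, int) is True in Python, so a stricter check would be needed to reject booleans as well.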

@mrshenli
Contributor

The errors look irrelevant to your changes. Could you please try rebase and test again?

Contributor Author

Kirayue commented Apr 24, 2019

@mrshenli
You mean fetch pytorch:master and rebase on it?

@mrshenli
Contributor

@Kirayue yes, rebase to the current master please.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor

@mrshenli mrshenli left a comment

Thanks for contributing!!

Contributor Author

Kirayue commented Apr 25, 2019

@mrshenli
Thank you for the discussions and suggestions on this PR 😄

@Kirayue Kirayue closed this Apr 25, 2019
@Kirayue Kirayue reopened this Apr 25, 2019
Contributor Author

Kirayue commented Apr 25, 2019

Sorry, should I close this PR?

@mrshenli
Contributor

@Kirayue I am landing this PR. It will be closed automatically in a moment. :)

@facebook-github-bot
Contributor

@mrshenli merged this pull request in af06d63.

zhangguanheng66 pushed a commit to zhangguanheng66/pytorch that referenced this pull request May 6, 2019
…ytorch#17226)

Summary:
Because of merge error with master in pytorch#15042, open a new PR for ezyang.
Pull Request resolved: pytorch#17226

Differential Revision: D14418145

Pulled By: mrshenli

fbshipit-source-id: 099ba225b28e6aba71760b81b2153ad1c40fbaae
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
…ytorch#17226)

Summary:
Because of merge error with master in pytorch#15042, open a new PR for ezyang.
Pull Request resolved: pytorch#17226

Differential Revision: D14418145

Pulled By: mrshenli

fbshipit-source-id: 099ba225b28e6aba71760b81b2153ad1c40fbaae

8 participants