Kirayue · 2019-02-18T02:13:29Z

Because of a merge error with master in #15042, I am opening a new PR for @ezyang.

Kirayue · 2019-02-18T02:24:09Z

Hi, @ezyang, thank you for helping me on this PR.

Could you let me know where I went wrong with the merge?

I ran git fetch upstream (upstream is the pytorch/pytorch repo) and then git merge upstream/master on the sgdr branch.
There was a conflict in only one file (actually only one line, an import line at the beginning), so I fixed it and ran git add . because there were lots of new files from upstream/master.

Thank you, I really appreciate your help.

Zhaoyi-Yan · 2019-02-19T02:03:10Z

+                for base_lr in self.base_lrs]
+
+    def step(self, epoch=None):
+        """Step could be called after every update, i.e. if one epoeh has 10 iterations(num_train / batch_size),


epoeh should be epoch

Thanks for checking.
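The docstring under review describes calling `step()` after every update with a fractional epoch (e.g. 10 iterations per epoch). A minimal pure-Python sketch of that idea, assuming the cosine annealing formula from the SGDR paper, eta_t = eta_min + (eta_max - eta_min) * (1 + cos(pi * T_cur / T_i)) / 2, and staying inside the first run (T_i = T_0). The helpers `cosine_lr` and `per_iteration_lrs` are hypothetical, not PyTorch APIs.

```python
import math


def cosine_lr(T_cur, T_i, eta_max, eta_min=0.0):
    """Cosine-annealed learning rate at position T_cur inside a run of length T_i."""
    return eta_min + (eta_max - eta_min) * (1 + math.cos(math.pi * T_cur / T_i)) / 2


def per_iteration_lrs(epoch, iters_per_epoch, T_0, eta_max, eta_min=0.0):
    """Learning rates for each update of one epoch, using fractional epochs.

    Assumes the whole epoch lies inside the first run, so T_cur == fractional epoch
    and T_i == T_0.
    """
    lrs = []
    for i in range(iters_per_epoch):
        frac_epoch = epoch + i / iters_per_epoch  # e.g. 0.0, 0.1, ..., 0.9
        lrs.append(cosine_lr(frac_epoch, T_0, eta_max, eta_min))
    return lrs
```

With the scheduler class from this PR, the analogous training loop would pass the same fractional value, e.g. `scheduler.step(epoch + i / iters)`, once per update.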

mrshenli

Please fix the lint error as well.

mrshenli · 2019-03-12T01:45:46Z

There is quite a lot of duplicated code in the four test cases. Can you consolidate them, say, by creating helper methods or using a loop?

I combined test_sgdr_lr1 and test_sgdr_lr2 into test_sgdr_lr1, and did the same for test_sgdr_lr3 and test_sgdr_lr4. The former tests integer epochs and the latter tests float epochs, so I kept two test functions. If you have any other suggestions, please let me know.

Combining 4 functions to 2 sounds good to me. Thanks for addressing this!

mrshenli · 2019-03-12T02:03:36Z

Do you mean "could called" -> "should call"?

mrshenli · 2019-03-12T02:27:03Z

Should it be T_i to match the description in the docs above?

mrshenli · 2019-03-12T02:36:06Z

Why is T_cur not reset to 0?

Yes, you are right.

mrshenli · 2019-03-12T02:51:33Z

The self.last_epoch is only used in the if branch. Is there any reason for putting it here? Is it because users can call step() with and without the epoch arg in an interleaved way, so you want to remember last_epoch when possible?

Yes, I considered the interleaved usage, although it is somewhat impractical. In your opinion, is it better to remove it?

I am OK with this API, but we need to explain it explicitly in the docs. Could you please describe this behavior in detail and add it to the docstrings? Thanks.

mrshenli · 2019-03-12T02:53:59Z

Is T_i the number of epochs in every run (i.e., # of epochs between two warm restarts)? Please add docs to explain it.
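Under the interpretation asked about here, T_i is the length (in epochs) of the i-th run between two warm restarts, with T_i = T_0 * T_mult**i. A small sketch, assuming integer T_0 and T_mult, that lists the run lengths and the epochs at which restarts happen; `restart_schedule` is a hypothetical helper used only for illustration.

```python
def restart_schedule(T_0, T_mult, num_cycles):
    """Return (cycle_lengths, restart_epochs) for the first num_cycles runs.

    cycle_lengths[i] is T_i = T_0 * T_mult**i, the number of epochs in run i;
    restart_epochs[i] is the epoch at which run i starts (0 for the first run).
    """
    cycle_lengths, restart_epochs = [], []
    start, T_i = 0, T_0
    for _ in range(num_cycles):
        cycle_lengths.append(T_i)
        restart_epochs.append(start)
        start += T_i    # the next run begins where this one ends
        T_i *= T_mult   # and is T_mult times longer
    return cycle_lengths, restart_epochs
```

For example, T_0 = 10 and T_mult = 2 give runs of 10, 20, 40, 80 epochs with restarts at epochs 0, 10, 30, 70; T_mult = 1 gives fixed-length runs.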

mrshenli · 2019-03-12T02:54:16Z

facebook-github-bot

@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

mrshenli · 2019-03-12T04:02:03Z

Do you need to check the range of these args? For example, what if T_0 or T_mult is negative, or eta_min is larger than the initial lr (eta_max)?
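One way to address this is to validate the arguments up front and raise ValueError. A sketch only: the function name `check_warm_restart_args` and the exact rules (T_0 > 0, T_mult >= 1 as later agreed in this thread, and eta_min not above any base learning rate) are assumptions for illustration.

```python
def check_warm_restart_args(T_0, T_mult, eta_min, base_lrs):
    """Raise ValueError for argument combinations that make no sense for warm restarts."""
    if T_0 <= 0:
        raise ValueError(f"Expected positive T_0, but got {T_0}")
    if T_mult < 1:
        # T_mult < 1 would shrink every successive run
        raise ValueError(f"Expected T_mult >= 1, but got {T_mult}")
    for lr in base_lrs:
        # eta_min is the floor of the cosine curve, so it should not exceed the peak lr
        if eta_min > lr:
            raise ValueError(f"eta_min ({eta_min}) is larger than initial lr ({lr})")
```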

mrshenli · 2019-03-12T20:27:01Z

Variable i is not used; maybe replace it with _? (Same for the other for loops in this file.)

mrshenli · 2019-03-14T18:08:27Z

@pytorchbot retest this please

facebook-github-bot

@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

mrshenli · 2019-03-17T19:39:25Z

@pytorchbot retest this please

mrshenli

LGTM! Thanks for contributing! Below are some nitpicks.

mrshenli · 2019-03-17T19:52:56Z

Let's test a floating-point T_mult :)

Good idea, but it made me think about T_mult >= 1 (e.g. 2.5) versus T_mult < 1 (e.g. 0.5). I think the latter is not practical, so I restricted the range of T_mult: if it is < 1, a ValueError is raised.
What do you think?

By the way, because T_mult can be a floating-point number, the test cases were slightly changed to cover cases like T_i = 62.5.

I agree with you on T_mult >= 1, as the original paper says:

we suggest an option to start with an initially small T_i and increase it by a factor of T_mult at every restart

mrshenli · 2019-03-17T19:54:08Z

Let's be consistent and use T_i

mrshenli · 2019-03-17T20:06:24Z

is this resuming or assuming?

I copied line 23 from the base class _LRScheduler(object). In my opinion, it is "resuming". But I think this code is redundant, since the optimizer will be checked by calling super(SGDR, self).__init__(optimizer, last_epoch).

mrshenli · 2019-03-17T20:11:03Z

Shall we add a test for this?

mrshenli · 2019-03-18T01:47:42Z

@pytorchbot rebase this please

facebook-github-bot

@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

mrshenli · 2019-03-19T14:50:39Z

Some tests failed:

Mar 19 10:14:31 ======================================================================
Mar 19 10:14:31 FAIL: test_sgdr_lr1 (__main__.TestLRScheduler)
Mar 19 10:14:31 ----------------------------------------------------------------------
Mar 19 10:14:31 Traceback (most recent call last):
Mar 19 10:14:31   File "test_optim.py", line 817, in test_sgdr_lr1
Mar 19 10:14:31     self._test(scheduler, targets, iters)
Mar 19 10:14:31   File "test_optim.py", line 936, in _test
Mar 19 10:14:31     epoch, target[epoch], param_group['lr']), delta=1e-5)
Mar 19 10:14:31   File "/var/lib/jenkins/workspace/test/common_utils.py", line 469, in assertAlmostEqual
Mar 19 10:14:31     self.assertEqual(x, y, prec, msg, allow_inf)
Mar 19 10:14:31   File "/var/lib/jenkins/workspace/test/common_utils.py", line 461, in assertEqual
Mar 19 10:14:31     super(TestCase, self).assertLessEqual(abs(x - y), prec, message)
Mar 19 10:14:31 AssertionError: 0.0499999999 not less than or equal to 1e-05 : LR is wrong in epoch 35: expected 0.05, got 1e-10

mrshenli

Please fix failures introduced by the new tests.

Kirayue · 2019-03-20T02:08:45Z

Hi @mrshenli,
The tests pass on my machine. How can I reproduce the failures?

....................................................
----------------------------------------------------------------------
Ran 52 tests in 13.512s

OK

mrshenli · 2019-03-20T03:19:57Z

~~Could be caused by this line. If epoch % self.T_0 != 0 and they are both int, it will drop the residual.~~

~~oh, maybe not, if that is the case, it should fail everywhere~~ ignore me please, I got it wrong

I traced this part of the code again; I think n = 2, T_i = 22.5 and T_cur = 0. I have no idea why we got a different result.

mrshenli

The error message "expected 0.05, got 1e-10" suggests the failure occurs on a restart boundary. See comments below.
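One plausible mechanism, offered as a hypothesis rather than a diagnosis: if the run index is computed in closed form by inverting the geometric series with `math.log` and `int()`, a result that lands a hair below an integer at an exact restart epoch is truncated to the previous run. T_cur then equals the full previous T_i, and the cosine formula returns eta_min instead of the peak lr, which would match "expected 0.05, got 1e-10". The sketch contrasts such a closed form with a loop that accumulates run lengths and never calls `log`; both function names are hypothetical, and the closed form assumes T_mult > 1.

```python
import math


def cycle_via_log(epoch, T_0, T_mult):
    """Closed form (assumes T_mult > 1): largest n with T_0 * (T_mult**n - 1) / (T_mult - 1) <= epoch.

    Returns (n, T_cur, T_i).
    """
    n = int(math.log(epoch / T_0 * (T_mult - 1) + 1, T_mult))  # int() truncates toward zero
    start = T_0 * (T_mult ** n - 1) / (T_mult - 1)
    return n, epoch - start, T_0 * T_mult ** n


def cycle_via_loop(epoch, T_0, T_mult):
    """Walk the runs one by one; exact for integer T_0, T_mult and epoch (no log involved).

    Returns (n, T_cur, T_i).
    """
    n, start, T_i = 0, 0, T_0
    while epoch >= start + T_i:
        start += T_i
        T_i *= T_mult
        n += 1
    return n, epoch - start, T_i
```

The loop costs one iteration per completed run, which stays small when T_mult > 1. With a floating-point T_mult the running sum is itself floating point, so the exactness argument only holds for integers.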

mrshenli · 2019-03-20T03:35:41Z

Is this line necessary?

You are right, it is redundant.

Sorry, I double-checked the results. It is necessary because of interleaved usage: e.g. if you call step() 100 times and then call step(10), T_i must be reset. But for epoch < T_0, it is redundant.
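The interleaved case described here (call `step()` many times, then `step(10)`) is why T_i has to be recomputed from scratch when an explicit epoch is passed, and why `last_epoch` is remembered so that a later bare `step()` continues from there. A minimal sketch of that state machine, assuming integer T_0 and T_mult; the class `WarmRestartState` is hypothetical and is not the scheduler added in this PR.

```python
class WarmRestartState:
    """Tracks T_cur / T_i for cosine annealing with warm restarts (integer T_0, T_mult)."""

    def __init__(self, T_0, T_mult=1):
        self.T_0, self.T_mult = T_0, T_mult
        self.T_i, self.T_cur = T_0, 0
        self.last_epoch = 0

    def step(self, epoch=None):
        if epoch is None:
            # Incremental path: advance one epoch and restart when the run is exhausted.
            self.last_epoch += 1
            self.T_cur += 1
            if self.T_cur >= self.T_i:
                self.T_cur -= self.T_i  # back to 0 at the restart boundary
                self.T_i *= self.T_mult
        else:
            # Explicit path: recompute from epoch, discarding the T_i reached incrementally.
            self.last_epoch = epoch
            start, T_i = 0, self.T_0
            while epoch >= start + T_i:
                start += T_i
                T_i *= self.T_mult
            self.T_i, self.T_cur = T_i, epoch - start
```

After 100 bare `step()` calls with T_0 = 10 and T_mult = 2, T_i has grown to 80; `step(10)` must bring it back to 20, and a following `step()` then continues from epoch 11.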

mrshenli · 2019-03-20T03:51:40Z

It could be that my suggestion introduced numerical instability. It makes me reconsider whether it makes sense to have a non-integer T_i. Any thoughts? @Kirayue

In my opinion, and according to the original paper, T_i is not restricted to integers. But if we can confirm that the failure is caused by numerical instability, I can modify the code.

By the way, how can I reproduce the failure? (I tested it with Python 3 and it works.)

Hi @yf225 is there a way to reproduce CI test failures for a specific environment? It does not have to be local, as long as @Kirayue can iterate with that conveniently.

@mrshenli Sorry for the unfortunate update: I just found out that currently we don't have a way to make the CI Docker images public, and we cannot share the credentials to access the Docker images in ECR either. I am working on a solution at #18244.

@yf225 Thanks for letting us know.

@Kirayue sorry for the delay on this - would you like to send us your email address so that we can share the credentials for read-only access to the Docker images in ECR? Thanks!

Do not worry about it 😄
Sure, just use this email address [email protected]

mdraw · 2019-03-28T16:37:52Z

This scheduler is not tied to SGD but can also be effectively used for other optimizers such as Adam, so although it's in the name of the paper, I find the name SGDR misleading. Why not call it CosineAnnealingWR or CosineAnnealingWarmRestarts?

You are right, that's fine with me.
What do you think, @mrshenli?

Sounds good to me too.

Just curious: there is a CosineAnnealingLR which implements SGDR without warm restarts. Do you know what LR means here (learning rate? less restart?)? I just want to make sure that, if we go with CosineAnnealingWR, the naming does not confuse people.

CosineAnnealingWarmRestarts sounds better to me.

Hi @mrshenli,
It is a kind of learning rate policy, according to Cyclical Learning Rates for Training Neural Networks. So, in my opinion, LR stands for learning rate.

This observation leads to the idea of letting the learning rate vary within a range of values rather than adopting a stepwise fixed or exponentially decreasing value. That is, one sets minimum and maximum boundaries and the learning rate cyclically varies between these bounds. Experiments with numerous functional forms, such as a triangular window (linear), a Welch window (parabolic) and a Hann window (sinusoidal) all produced equivalent results
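For comparison with cosine annealing, the triangular (linear) window mentioned in the quoted passage can be written as below. This follows the triangular policy as it is commonly stated for cyclical learning rates, with `step_size` being half a cycle measured in iterations; the function and parameter names are illustrative assumptions.

```python
import math


def triangular_lr(iteration, step_size, base_lr, max_lr):
    """Triangular cyclical LR: rises linearly from base_lr to max_lr over step_size
    iterations, then falls back to base_lr over the next step_size iterations."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)  # 1 at the cycle edges, 0 at the peak
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)
```

Unlike cosine annealing, which starts each run at the maximum and decays to eta_min before jumping back up, this window starts at the minimum and returns to it smoothly.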

Hi @mrshenli,
How about I push a version with integer T_i, and look into the CI error after #18244 is done?

That sounds good to me. Please edit the doc and arg check accordingly. I will approve and merge. Thanks!

Hi @mrshenli,
There are two ways to make T_i an integer. Because T_i = T_{i-1} * T_mult, we can either use int(T_mult) or int(T_{i-1} * T_mult). However, the latter would cause an inconsistency: for example, with T_0 = 10 and T_mult = 1.5, calling scheduler.step() 100 times and calling scheduler.step(100) would give different values of T_i.
Iterating gives T_i = [10, 15, 22, 33, 49], while the closed-form formula gives 10 * 1.5^4 = 50.625 (50 after int()). So, for simplicity, I chose the former.
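The inconsistency described above can be reproduced in plain Python: truncating at every restart compounds the rounding, while the closed form T_0 * T_mult**i truncates only once. A sketch with hypothetical helper names; with an integer T_mult the two agree, which is what restricting T_mult to integers buys.

```python
def iterated_T_i(T_0, T_mult, restarts):
    """T_i after each restart when int() is applied at every step (incremental path)."""
    values = [T_0]
    for _ in range(restarts):
        values.append(int(values[-1] * T_mult))
    return values


def closed_form_T_i(T_0, T_mult, restarts):
    """T_i computed directly from the restart count (explicit-epoch path), truncated once."""
    return int(T_0 * T_mult ** restarts)
```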

mrshenli · 2019-04-18T14:10:38Z

:math:`\T_{i}` -> :math:`T_{i}`?

mrshenli · 2019-04-18T14:18:52Z

Shall we do an explicit type check, and raise an error if it is not an int? This also applies to T_0. We might want to avoid silently surprising people even if they did pay attention to the doc.
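An explicit type check along the lines suggested could look like this. Note that `bool` is a subclass of `int` in Python, so `True` would slip through a bare `isinstance(x, int)`; rejecting it here is a judgment call. Sketch only; `check_int_arg` is a hypothetical helper.

```python
def check_int_arg(name, value, minimum):
    """Reject non-int values (including bool) and values below minimum,
    instead of silently truncating them."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"Expected integer {name}, but got {value!r}")
    if value < minimum:
        raise ValueError(f"Expected {name} >= {minimum}, but got {value}")
    return value
```

A constructor could then call, e.g., `check_int_arg("T_0", T_0, 1)` and `check_int_arg("T_mult", T_mult, 1)`.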

Ok, thank you for the suggestions.

mrshenli · 2019-04-24T15:34:53Z

The errors look irrelevant to your changes. Could you please try rebase and test again?

Kirayue · 2019-04-24T16:45:25Z

@mrshenli
Do you mean fetching pytorch:master and rebasing onto it?

mrshenli · 2019-04-24T18:05:01Z

@Kirayue yes, rebase to the current master please.

facebook-github-bot

@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

mrshenli

Thanks for contributing!!

Kirayue · 2019-04-25T15:57:15Z

@mrshenli
Thank you for the discussions and suggestions on this PR 😄

Kirayue · 2019-04-25T15:58:23Z

Sorry, should I close this PR?

mrshenli · 2019-04-25T16:01:35Z

@Kirayue I am landing this PR. It will be closed automatically in a moment. :)

facebook-github-bot · 2019-04-25T19:07:08Z

@mrshenli merged this pull request in af06d63.

…ytorch#17226) Summary: Because of merge error with master in pytorch#15042, open a new PR for ezyang. Pull Request resolved: pytorch#17226 Differential Revision: D14418145 Pulled By: mrshenli fbshipit-source-id: 099ba225b28e6aba71760b81b2153ad1c40fbaae

Zhaoyi-Yan reviewed Feb 19, 2019

ezyang requested a review from mrshenli March 11, 2019 20:28

mrshenli reviewed Mar 12, 2019

facebook-github-bot reviewed Mar 12, 2019

mrshenli reviewed Mar 12, 2019

facebook-github-bot reviewed Mar 14, 2019

mrshenli approved these changes Mar 17, 2019

facebook-github-bot reviewed Mar 18, 2019

mrshenli requested changes Mar 19, 2019

mrshenli reviewed Mar 20, 2019

yf225 mentioned this pull request Mar 20, 2019

Provide S3 download link for Docker images #18244

Closed

mdraw reviewed Mar 28, 2019

mrshenli reviewed Apr 18, 2019

Kirayue added 6 commits April 25, 2019 17:24

Add SGDR(Stochastic Gradient Descent with Warm Restarts) scheduler

0bf32c4

fix 428, 453 line too long

7d19071

fix trailing whitespace

60646a8

Refine SGDR to more general usage

185a27a

step method could be called after every update

Add a test for SGDR in test_optim.py

878c5a8

Fix lint error

f447cb8

Kirayue and others added 12 commits April 25, 2019 17:26

Fix lr_scheduler.py line too long

4bbff6c

Fix missing whitespace around operator

9992be7

Fix super() takes at least 1 argument in lr_scheduler.py

1ebfdcc

Remove numpy package from test/test_optim.py

c1aa1ac

Correct typo epoeh => epoch

adf13fe

Refine code and doctring

d8a06cc

Fix lint error mixed spaces and tabs

327fc97

Fix lint mistake: E302 in test_optim.py

dc357f0

Add tests for testing decimal T_mult and interleaved usage

0c80d9e

Rename SGDR to CosineAnnealingWarmRestarts

4db677a

Modify doc and tests for integer T_mult

27a9e3c

Add explicit int type check to T_0, T_i

8653839

facebook-github-bot reviewed Apr 25, 2019

mrshenli approved these changes Apr 25, 2019

Kirayue closed this Apr 25, 2019

Kirayue reopened this Apr 25, 2019

facebook-github-bot closed this in af06d63 Apr 25, 2019

facebook-github-bot added the merged label Apr 25, 2019

jcreinhold mentioned this pull request May 1, 2019

CosineAnnealingWarmRestarts documentation poor and not appearing #20028

Closed

ezyang added the open source label Jun 24, 2019
