Fix `lr_scheduler` unexpectedly calls `step()` when init argument last_epoch is larger than -1 #149312

zeshengzong · 2025-03-17T11:15:33Z

Changes

Use an `_is_initial` flag instead of the `self.last_epoch == 0` condition to decide whether the lr should keep its initial value
Add a test for the ExponentialLR checkpoint use case
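To make the change concrete, here is a toy, pure-Python model of the idea. It is a sketch, not PyTorch's `LRScheduler`/`ExponentialLR` code. The `Toy*` classes and the `resume` helper are hypothetical, the optimizer is just a list of param-group dicts, and the model assumes (as the PR title describes) that the constructor calls `step()` once. Resuming with `last_epoch=1` decays the restored lr an extra time when the initial step is detected via `last_epoch == 0`, but not when it is detected via an `_is_initial` flag.

```python
class ToyOptimizer:
    """Hypothetical stand-in for an optimizer: just a list of param-group dicts."""

    def __init__(self, lr):
        self.param_groups = [{"lr": lr}]


class ToyExponentialOld:
    """Toy scheduler that uses `last_epoch == 0` to detect the initial step."""

    def __init__(self, optimizer, gamma, last_epoch=-1):
        self.optimizer = optimizer
        self.gamma = gamma
        self.last_epoch = last_epoch
        self.step()  # mirrors the constructor-time step discussed in this PR

    def get_lr(self):
        if self.last_epoch == 0:
            # Only true for a fresh scheduler (last_epoch goes from -1 to 0).
            return [g["lr"] for g in self.optimizer.param_groups]
        return [g["lr"] * self.gamma for g in self.optimizer.param_groups]

    def step(self):
        self.last_epoch += 1
        for group, lr in zip(self.optimizer.param_groups, self.get_lr()):
            group["lr"] = lr


class ToyExponentialNew(ToyExponentialOld):
    """Same toy scheduler, but the initial step is detected with a flag."""

    def __init__(self, optimizer, gamma, last_epoch=-1):
        self._is_initial = True
        super().__init__(optimizer, gamma, last_epoch)

    def get_lr(self):
        if self._is_initial:
            # Constructor-time step: keep whatever lr the optimizer already has.
            return [g["lr"] for g in self.optimizer.param_groups]
        return [g["lr"] * self.gamma for g in self.optimizer.param_groups]

    def step(self):
        super().step()
        self._is_initial = False  # every later step() decays the lr


def resume(scheduler_cls, gamma=0.5):
    """Run one decay step, then 'resume' from the saved lr with last_epoch=1."""
    opt = ToyOptimizer(lr=1.0)
    sched = scheduler_cls(opt, gamma)
    sched.step()
    saved_lr = opt.param_groups[0]["lr"]

    opt2 = ToyOptimizer(lr=saved_lr)  # stands in for load_state_dict
    scheduler_cls(opt2, gamma, last_epoch=1)
    return saved_lr, opt2.param_groups[0]["lr"]


print(resume(ToyExponentialOld))  # (0.5, 0.25): lr decayed again on resume
print(resume(ToyExponentialNew))  # (0.5, 0.5): lr preserved on resume
```

In this toy, the flag ties "is this the constructor-time step?" to the call site rather than to the epoch counter, which is why it still behaves correctly when `last_epoch` starts above -1.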

Test Result

pytest -s test/optim/test_lrscheduler.py -vv

pytorch-bot · 2025-03-17T11:15:37Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/149312


❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please review it below:

CI workflows being skipped on PR

✅ No Failures

As of commit 538d5e0 with merge base 01f226b:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

zeshengzong · 2025-03-18T08:12:57Z

Hello @albanD @janeyx99, please check whether this fix is feasible. If it works, I would like to go on to fix other schedulers that have the same problem, such as MultiplicativeLR and LinearLR. Thanks!

zeshengzong · 2025-04-14T08:39:27Z

@pytorchbot rebase -b main

pytorchmergebot · 2025-04-14T08:41:00Z

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here


pytorchmergebot · 2025-04-14T08:41:04Z

Successfully rebased fix/optim/step onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout fix/optim/step && git pull --rebase)

janeyx99

This does not look like the right approach. If the discrepancy is for ExponentialLR between get_lr and _get_closed_form_lr, I'd expect the fix to be local there. Could you explain your approach a little bit?

janeyx99 · 2025-05-05T21:24:13Z

test/optim/test_lrscheduler.py

+        optim2 = torch.optim.AdamW(model.parameters())
+        optim2.load_state_dict(optim.state_dict())
+        sch2 = LRClass(optim2, last_epoch=1)
+        self.assertEqual(optim.param_groups[0]["lr"], optim2.param_groups[0]["lr"])


This is not the same comparison as the repro; shouldn't we be checking that the closed-form lr is the same as the param group's lr?
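The comparison suggested here can be sketched in plain Python. This is a toy model, not the PR's test: `closed_form_lr` and `chained_lr_after_resume` are hypothetical helpers, and the closed form is assumed to be the usual exponential-decay formula `base_lr * gamma ** epoch`. A resumed param-group lr that matches the closed form passes; one that was decayed again by the constructor-time step does not.

```python
import math


def closed_form_lr(base_lr, gamma, epoch):
    """Exponential-decay closed form: base_lr * gamma ** epoch."""
    return base_lr * gamma ** epoch


def chained_lr_after_resume(base_lr, gamma, epoch, decay_in_constructor):
    """Toy model of the chained (recursive) lr right after resuming at `epoch`.

    The checkpointed param-group lr already equals the chained value at
    `epoch`. If the constructor-time step() also decays it (the bug described
    in this PR), the resumed lr ends up one factor of gamma too small.
    """
    lr = base_lr
    for _ in range(epoch):  # replay the chained updates of the original run
        lr *= gamma
    if decay_in_constructor:
        lr *= gamma  # the unexpected extra step on resume
    return lr


expected = closed_form_lr(0.1, 0.9, 3)
fixed = chained_lr_after_resume(0.1, 0.9, 3, decay_in_constructor=False)
buggy = chained_lr_after_resume(0.1, 0.9, 3, decay_in_constructor=True)
print(math.isclose(fixed, expected))  # True
print(math.isclose(buggy, expected))  # False
```

Using `math.isclose` rather than `==` avoids spurious failures from floating-point rounding between the chained and closed-form computations.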

janeyx99 · 2025-05-06T17:37:26Z

torch/optim/lr_scheduler.py

@@ -724,7 +738,7 @@ def get_lr(self):
        """Compute the learning rate of each parameter group."""
        _warn_get_lr_called_within_step(self)

-        if self.last_epoch == 0:
+        if self._is_initial:


Suggested change

-        if self._is_initial:
+        # when loading from a checkpoint, we don't want _initial_step (called from the constructor) to update the lr
+        # one more step ahead of itself.
+        if self._is_initial:

janeyx99

Oh actually, I see what you're doing now. Sorry I was confused yesterday. I'm willing to accept this fix if you update the test case.

It would also be good to include a comment about why we prefer the _is_initial flag.


janeyx99 · 2025-05-06T17:48:39Z

torch/optim/lr_scheduler.py

@@ -134,7 +135,8 @@ def wrapper(*args, **kwargs):
    def _initial_step(self):
        """Initialize step counts and perform a step."""


As someone who has looked into LRScheduler more than I've been able to, have you seen a good reason why we need to call .step() from the constructor?

pytorch-bot added the release notes: optim label Mar 17, 2025

pytorchbot added the open source label Mar 17, 2025

zeshengzong force-pushed the fix/optim/step branch from 424ac56 to 7c5e79a March 18, 2025 07:59

zeshengzong marked this pull request as ready for review March 18, 2025 08:06

zeshengzong requested review from albanD and janeyx99 as code owners March 18, 2025 08:06

janeyx99 added the triaged label Mar 20, 2025

albanD removed their request for review April 9, 2025 19:37

zeshengzong added 2 commits April 14, 2025 08:41

Fix lr_scheduler unexpectedly calls step() when init argument last_epoch is larger than -1 (c8b0f1e)

Update (538d5e0)

pytorchmergebot force-pushed the fix/optim/step branch from 7c5e79a to 538d5e0 April 14, 2025 08:41

janeyx99 previously requested changes May 5, 2025


janeyx99 reviewed May 6, 2025


janeyx99 added the topic: bug fixes label May 6, 2025

janeyx99 reviewed May 6, 2025
