Force upsample to be float32 #121324

bhack · 2024-03-06T18:21:21Z

BC-breaking note:
This does not technically break any existing behavior, but it is expected to cause a significant performance change for amp + deterministic.
cc @zou3519: is there a way we can tell people how to register a fallthrough for that key to recover the old behavior if they want?

cc @mcarilli @ptrblck @leslie-fang-intel @jgong5 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler @amjames @desertfire @chauhang

pytorch-bot · 2024-03-06T18:21:26Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/121324

✅ You can merge normally! (2 Unrelated Failures)

As of commit 04a831d with merge base 47330ca:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

pull / linux-focal-py3.8-clang10 / test (dynamo, 1, 3, linux.2xlarge) (gh)

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / linux-focal-py3_8-clang9-xla / test (xla, 1, 1, linux.12xlarge) (gh) (trunk failure)
test_resnet18

This comment was automatically generated by Dr. CI and updates every 15 minutes.

bhack · 2024-03-06T18:22:44Z

/cc @ezyang @albanD

aten/src/ATen/autocast_mode.cpp

albanD

Let's see what CI thinks, but we should also add a test for this!

ezyang · 2024-03-07T22:32:37Z

@pytorchbot merge

pytorchmergebot · 2024-03-07T22:34:23Z

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team

Raised by workflow job

bhack · 2024-03-07T22:52:18Z

About the user-facing doc: I don't see upsample in the list, right?
https://github.com/pytorch/pytorch/blob/main/docs%2Fsource%2Famp.rst

Is this probably because that list is not auto-generated and is out of sync?

bhack · 2024-03-12T02:18:36Z

Green lights here.

ezyang · 2024-03-13T03:33:28Z

@pytorchbot merge

pytorchmergebot · 2024-03-13T03:36:05Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status
here

pytorchmergebot · 2024-03-13T04:02:56Z

Merge failed

Reason: 12 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team

Raised by workflow job

Failing merge rule: Core Maintainers

bhack · 2024-03-13T10:59:25Z

Are these errors related to this PR?

bhack · 2024-03-13T18:16:50Z

/cc @nWEIdia

nWEIdia · 2024-03-13T18:49:40Z

ImportError: /opt/conda/envs/py_3.8/lib/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
The error seems to be a build environment issue.

bhack · 2024-03-13T18:54:41Z

@nWEIdia Yes, it is not related, but I was just asking whether you at Nvidia have a specific opinion about mixed-precision float16 or bfloat16 in upsample kernels with respect to real model impact (see #121072).

bhack · 2024-04-22T23:29:12Z

Thanks @albanD, the question was general, not only about the refactoring, as it is not clear whether these ops are still affected by the amp float32-enforcing machinery when lowered/compiled.

bhack · 2024-04-23T16:51:41Z

@albanD It was adapted to the mentioned refactoring.

aten/src/ATen/autocast_mode.cpp

albanD

Thanks!

Atomic add changes were properly removed

albanD · 2024-04-23T22:49:35Z

@pytorchbot merge

pytorchmergebot · 2024-04-23T22:51:12Z

Merge started

pytorchmergebot · 2024-04-24T04:50:01Z

The merge job was canceled. If you believe this is a mistake, then you can re trigger it through pytorch-bot.

albanD · 2024-04-24T23:12:38Z

@pytorchbot merge

pytorchmergebot · 2024-04-24T23:14:28Z

Merge started

Fixes pytorch#121072
Pull Request resolved: pytorch#121324
Approved by: https://github.com/albanD

xuzhao9 · 2024-04-26T13:49:13Z

This diff regresses torchbench pytorch_unet by 10%: pytorch/benchmark#2247

albanD · 2024-04-26T13:52:46Z

Regressed in what sense? Speed or numerical?

xuzhao9 · 2024-04-26T14:36:46Z

> Regressed in what sense? Speed or numerical?

Latency regressed by ~10% and peak GPU memory usage by ~40%. In pytorch/benchmark CI we only run eager mode (PT1), so there is no numerical metric.
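To put the peak-memory figure in context, here is a back-of-the-envelope sketch of how storage scales with dtype. The tensor shape is hypothetical and not taken from pytorch_unet; the only fact relied on is that float16 and bfloat16 use 2 bytes per element while float32 uses 4.

```python
# Bytes per element for the dtypes discussed in this thread.
BYTES_PER_ELEMENT = {"float16": 2, "bfloat16": 2, "float32": 4}


def tensor_bytes(shape, dtype):
    """Return the storage size in bytes of a dense tensor of `shape`."""
    count = 1
    for dim in shape:
        count *= dim
    return count * BYTES_PER_ELEMENT[dtype]


# Hypothetical upsample output: batch 1, 64 channels, 512x512 (illustrative only).
shape = (1, 64, 512, 512)
half = tensor_bytes(shape, "float16")
full = tensor_bytes(shape, "float32")
print(f"float16: {half / 2**20:.0f} MiB, float32: {full / 2**20:.0f} MiB")
```

Each upcast tensor doubles in size. The model-wide peak would presumably grow by less than 2x, since only the tensors flowing through the upsample ops are affected.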

albanD · 2024-04-26T15:12:40Z

I guess this is expected behavior given that we now upcast this op. cc @ezyang: is this regression problematic?

ezyang · 2024-04-26T17:28:25Z

This is an explicit correctness/speed tradeoff. @bhack, is the memory usage regression what you were expecting to see in this case?
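As a generic illustration of the correctness side of that tradeoff, the sketch below shows how a float16 accumulator can stall when many small contributions are summed. This is an assumption made for illustration only: it does not reproduce what the upsample kernels do, and the thread does not state this as the root cause of #121072. It uses only the standard library, rounding to IEEE 754 binary16 through the `struct` module's `e` format.

```python
import struct


def to_float16(value):
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def accumulate(values, round_fn):
    """Sum `values`, applying `round_fn` after every addition."""
    total = 0.0
    for v in values:
        total = round_fn(total + v)
    return total


# Many small, equal contributions (each already representable in float16).
contributions = [to_float16(0.001)] * 4096
exact = sum(contributions)                        # double-precision reference
half_sum = accumulate(contributions, to_float16)  # simulated float16 accumulator
print(f"reference={exact:.4f}, float16 accumulator={half_sum:.4f}")
```

Once the running sum is large enough that the float16 spacing exceeds twice the addend, further additions round back to the same value and the accumulator stops growing, whereas a wider accumulator keeps tracking the reference.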

bhack · 2024-04-26T17:43:21Z

A regression is certainly expected, since with amp we now run the upsampling in float32 as well.
My points were:

Are we forcing this when compiled as well?
Will the lowering still let us use float16/bfloat16?

I ask because the gradient seems better at lower precision with the compiled code.
See #121324 (comment)

Fixes #121072
Pull Request resolved: #121324
Approved by: https://github.com/albanD

github-actions bot added the module: amp (automated mixed precision) autocast label Mar 6, 2024

bhack mentioned this pull request Mar 6, 2024

Correctly handle F.interpolate upsample with amp #121072

Closed

albanD reviewed Mar 6, 2024

aten/src/ATen/autocast_mode.cpp (outdated, resolved)

bhack requested a review from albanD March 6, 2024 18:27

albanD reviewed Mar 6, 2024

pytorchbot added the open source label Mar 6, 2024

bhack requested a review from albanD March 7, 2024 01:11

ezyang previously approved these changes Mar 7, 2024

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 7, 2024

pytorchmergebot added the merging label Mar 7, 2024

pytorchmergebot removed the merging label Mar 7, 2024

bhack requested a review from ezyang March 9, 2024 13:17

ezyang added release notes: python_frontend python frontend release notes category topic: bug fixes topic category labels Mar 13, 2024

pytorchmergebot added the merging label Mar 13, 2024

pytorchmergebot removed the merging label Mar 13, 2024

bhack mentioned this pull request Mar 14, 2024

Wrong libstdc++ version load in CI when pytorch is built in different kernel version (5.+) #121796

Closed

Merge branch 'main' into pr/bhack/121324

0b6f8ba

albanD reviewed Apr 23, 2024

aten/src/ATen/autocast_mode.cpp (outdated, resolved)

Make linter happy

04a831d

bhack requested a review from albanD April 23, 2024 22:47

albanD approved these changes Apr 23, 2024

pytorchmergebot added the merging label Apr 23, 2024

pytorchmergebot closed this in cb94845 Apr 24, 2024

pytorchmergebot removed the merging label Apr 24, 2024

alat-rights pushed a commit to alat-rights/pytorch that referenced this pull request Apr 26, 2024

Force upsample to be float32 (pytorch#121324)

1b45a6a

Fixes pytorch#121072
Pull Request resolved: pytorch#121324
Approved by: https://github.com/albanD

xuzhao9 mentioned this pull request Apr 26, 2024

V3 Performance Signal Detected by TorchBench Userbenchmark "torch-nightly" on '2.4.0.dev20240425+cu121' pytorch/benchmark#2247

Closed

albanD added topic: bc breaking topic category module: bc-breaking Related to a BC-breaking change labels May 1, 2024

pytorch-bot bot pushed a commit that referenced this pull request May 3, 2024

Force upsample to be float32 (#121324)

36755cd

Fixes #121072
Pull Request resolved: #121324
Approved by: https://github.com/albanD

bhack mentioned this pull request Jun 12, 2024

AMP guards recompilation dtype mismatch. expected Half, actual Float #128134

Closed
