
Conversation

bhack (Contributor) commented Mar 6, 2024

Fixes #121072

BC-breaking note:
This is not technically bc-breaking any behavior but will lead to an expected significant performance change for amp + deterministic.
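A minimal sketch of the visible effect (the device, shapes, and interpolation mode here are illustrative assumptions, not taken from this PR):

```python
import torch
import torch.nn.functional as F

# Illustrative repro of the dtype change: conv2d runs in float16 under
# autocast, while interpolate, which previously fell through and stayed in
# float16, is now upcast to float32 by the autocast policy this PR adds.
torch.use_deterministic_algorithms(True)  # the configuration the note above concerns
x = torch.randn(1, 3, 32, 32, device="cuda")
w = torch.randn(8, 3, 3, 3, device="cuda")
with torch.autocast("cuda", dtype=torch.float16):
    h = F.conv2d(x, w)
    y = F.interpolate(h, scale_factor=2, mode="bilinear", align_corners=False)
print(h.dtype, y.dtype)  # torch.float16 torch.float32
```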
cc @zou3519: is there any way we can tell people how to register a fallthrough for that key to recover the old behavior if they want?
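One possible shape of that escape hatch, sketched in Python via torch.library (the op name and the choice of the CUDA autocast key are assumptions, not code from this PR):

```python
import torch
from torch.library import Library, fallthrough_kernel

# Hypothetical opt-out: register a fallthrough for the CUDA autocast key on an
# affected op, so autocast stops intercepting it and the old pass-through,
# low-precision behavior is restored. This overrides the kernel the PR registers.
lib = Library("aten", "IMPL")  # keep this object alive so the registration persists
lib.impl("upsample_bilinear2d", fallthrough_kernel, "AutocastCUDA")
```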

cc @mcarilli @ptrblck @leslie-fang-intel @jgong5 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler @amjames @desertfire @chauhang

pytorch-bot bot commented Mar 6, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/121324

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 04a831d with merge base 47330ca:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

bhack (Contributor, Author) commented Mar 6, 2024

/cc @ezyang @albanD

bhack requested a review from albanD, March 6, 2024 18:27
albanD (Collaborator) left a comment

Let's see what CI thinks, but we should also add a test for this!

bhack requested a review from albanD, March 7, 2024 01:11
ezyang previously approved these changes Mar 7, 2024

ezyang (Contributor) commented Mar 7, 2024

@pytorchbot merge

pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label, Mar 7, 2024
pytorchmergebot (Collaborator) commented

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team: raised by workflow job.

bhack (Contributor, Author) commented Mar 7, 2024

About the user-facing docs: upsample wasn't in the list there, right?
https://github.com/pytorch/pytorch/blob/main/docs/source/amp.rst

Is that because the list is not auto-generated and has gone out of sync?

bhack requested a review from ezyang, March 9, 2024 13:17
bhack (Contributor, Author) commented Mar 12, 2024

CI is all green here.

ezyang added the release notes: python_frontend and topic: bug fixes labels, Mar 13, 2024
ezyang (Contributor) commented Mar 13, 2024

@pytorchbot merge

pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

bhack (Contributor, Author) commented Mar 13, 2024

Are these errors related to this PR?

bhack (Contributor, Author) commented Mar 13, 2024

/cc @nWEIdia

nWEIdia (Collaborator) commented Mar 13, 2024

ImportError: /opt/conda/envs/py_3.8/lib/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
The error seems to be a build environment issue: the conda env's libstdc++ does not provide GLIBCXX_3.4.30, which libtorch_python.so requires.
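A quick way to confirm that hypothesis (a diagnostic sketch; it assumes the binutils `strings` tool is on PATH and that the path below matches the failing env):

```python
import subprocess

# List the GLIBCXX symbol versions exported by the conda env's libstdc++ and
# check for the one the failing torch import complains about.
libstdcxx = "/opt/conda/envs/py_3.8/lib/libstdc++.so.6"
out = subprocess.check_output(["strings", libstdcxx], text=True)
versions = {line for line in out.splitlines() if line.startswith("GLIBCXX_")}
print("GLIBCXX_3.4.30" in versions)  # False reproduces the ImportError above
```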

bhack (Contributor, Author) commented Mar 13, 2024

@nWEIdia Yes, it's unrelated; I was just asking whether you at NVIDIA have a specific opinion on mixed-precision float16/bfloat16 in the upsample kernels and the impact on real models (see #121072).

bhack (Contributor, Author) commented Apr 22, 2024

Thanks @albanD. The question was general, not only about the refactoring: it is unclear whether these ops are still subject to the amp float32-enforcing machinery once they are lowered/compiled.
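One way to probe that question empirically (a sketch, not from this PR; it just compares eager and compiled output dtypes under autocast):

```python
import torch
import torch.nn.functional as F

def up(x):
    return F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)

compiled_up = torch.compile(up)
x = torch.randn(1, 3, 32, 32, device="cuda", dtype=torch.float16)
with torch.autocast("cuda", dtype=torch.float16):
    # If both print torch.float32, the float32 autocast policy for upsample
    # survives lowering/compilation; a float16 result from the compiled path
    # would mean compilation bypasses it.
    print(up(x).dtype, compiled_up(x).dtype)
```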

bhack (Contributor, Author) commented Apr 23, 2024

@albanD The PR has been adapted to the refactoring you mentioned.

bhack requested a review from albanD, April 23, 2024 22:47
albanD (Collaborator) left a comment

Thanks!

albanD dismissed lezcano's stale review, April 23, 2024 22:49

Atomic add changes were properly removed

albanD (Collaborator) commented Apr 23, 2024

@pytorchbot merge

pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot (Collaborator) commented

The merge job was canceled. If you believe this is a mistake, you can re-trigger it through pytorch-bot.

albanD (Collaborator) commented Apr 24, 2024

@pytorchbot merge

pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

xuzhao9 (Contributor) commented Apr 26, 2024

This diff regresses torchbench pytorch_unet by 10%: pytorch/benchmark#2247

albanD (Collaborator) commented Apr 26, 2024

Regressed in what sense? Speed or numerical?

xuzhao9 (Contributor) commented Apr 26, 2024

> Regressed in what sense? Speed or numerical?

It regressed ~10% in latency and ~40% in peak GPU memory usage. In the pytorch/benchmark CI we only run eager mode (PT1), so there is no numerical metric.
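For context, an illustrative microbenchmark of the float16-vs-float32 cost of this op (a sketch with made-up shapes, not the torchbench harness):

```python
import torch
import torch.nn.functional as F

def bench(dtype):
    # Time 10 forward+backward passes through a bilinear upsample and record
    # peak memory, to show the latency/memory gap between half and float.
    x = torch.randn(8, 64, 128, 128, device="cuda", dtype=dtype, requires_grad=True)
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(10):
        F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False).sum().backward()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end), torch.cuda.max_memory_allocated() / 2**20

for dt in (torch.float16, torch.float32):
    ms, mib = bench(dt)
    print(f"{dt}: {ms:.1f} ms, peak {mib:.0f} MiB")
```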

albanD (Collaborator) commented Apr 26, 2024

I guess this is expected behavior given that we now upcast this op. cc @ezyang: is that regression problematic?

ezyang (Contributor) commented Apr 26, 2024

This is an explicit correctness/speed tradeoff. @bhack, is the memory usage regression what you were expecting to see in this case?

bhack (Contributor, Author) commented Apr 26, 2024

A regression is certainly expected, since with amp we now run the upsampling in float32.
My points were:

  • Are we forcing this also when compiled?
  • Is the lowering still going to let us use float16/bfloat16?

Because the gradient seems better at lower precision with the compiled code.
See #121324 (comment)

albanD added the topic: bc breaking and module: bc-breaking labels, May 1, 2024
pytorch-bot bot pushed a commit that referenced this pull request May 3, 2024

Labels

  • ciflow/inductor
  • ciflow/trunk (Trigger trunk jobs on your pull request)
  • Merged
  • module: amp (automated mixed precision, autocast)
  • module: bc-breaking (Related to a BC-breaking change)
  • module: dynamo
  • module: inductor
  • open source
  • release notes: python_frontend (python frontend release notes category)
  • Reverted
  • topic: bc breaking (topic category)
  • topic: bug fixes (topic category)
  • triaged (This issue has been looked at by a team member, triaged, and prioritized into an appropriate module)


Development

Successfully merging this pull request may close these issues.

Correctly handle F.interpolate upsample with amp