Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Make device check error message more descriptive #150750

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

zeshengzong
Copy link
Contributor

Fixes #122757

Test Result

import torch

model_output = torch.randn(10, 5).cuda()
labels = torch.randint(0, 5, (10,)).cuda()
weights = torch.randn(5)

loss_fn = torch.nn.CrossEntropyLoss(weight=weights)
loss = loss_fn(input=model_output, target=labels)
print(loss)


Traceback (most recent call last):
  File "/home/zong/code/pytorch/../loss2.py", line 17, in <module>
    loss = loss_fn(input=model_output, target=labels)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zong/code/pytorch/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zong/code/pytorch/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zong/code/pytorch/torch/nn/modules/loss.py", line 1297, in forward
    return F.cross_entropy(
           ^^^^^^^^^^^^^^^^
  File "/home/zong/code/pytorch/torch/nn/functional.py", line 3494, in cross_entropy
    return torch._C._nn.cross_entropy_loss(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but got weight is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA_nll_loss_forward)

Copy link

pytorch-bot bot commented Apr 7, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150750

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6ab42b7 with merge base dfcfad2 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@zeshengzong
Copy link
Contributor Author

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Apr 7, 2025
@zeshengzong zeshengzong marked this pull request as ready for review April 7, 2025 10:57
@bdhirsh bdhirsh requested a review from mikaylagawarecki April 8, 2025 20:54
@bdhirsh bdhirsh added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Apr 8, 2025
@zeshengzong
Copy link
Contributor Author

Hello @mikaylagawarecki, please help review the change, thanks!

@zeshengzong
Copy link
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label May 2, 2025
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-py3-arm64-mps / test (mps, 1, 1, macos-m2-15)

Details for Dev Infra team Raised by workflow job

@zeshengzong
Copy link
Contributor Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@huydhn
Copy link
Contributor

huydhn commented May 6, 2025

@pytorchbot revert -m 'Sorry for reverting your change but it seems to cause a test to fail in trunk' -c nosignal

test_sparse.py::TestSparseOneOff::test_cuda_from_cpu GH job link HUD commit link

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request May 6, 2025
This reverts commit 8253970.

Reverted #150750 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to cause a test to fail in trunk ([comment](#150750 (comment)))
@pytorchmergebot
Copy link
Collaborator

@zeshengzong your PR has been successfully reverted.

@pytorchmergebot pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels May 6, 2025
@pytorch-bot pytorch-bot bot dismissed mikaylagawarecki’s stale review May 6, 2025 06:42

This PR was reopened (likely due to being reverted), so your approval was removed. Please request another review.

@pytorch-bot pytorch-bot bot removed the ciflow/trunk Trigger trunk jobs on your pull request label May 6, 2025
@zeshengzong
Copy link
Contributor Author

Fixed test, please review the change, thanks!

@malfet
Copy link
Contributor

malfet commented May 6, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label May 6, 2025
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 2 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

Copy link

pytorch-bot bot commented May 7, 2025

You don't have permissions to rebase this PR since you are a first time contributor. If you think this is a mistake, please contact PyTorch Dev Infra.

@zeshengzong
Copy link
Contributor Author

@pytorchbot rebase -b main

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased opt/aten/device_check onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout opt/aten/device_check && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the opt/aten/device_check branch from 6a49d61 to 6ab42b7 Compare May 7, 2025 02:58
@pytorch-bot pytorch-bot bot removed the ciflow/trunk Trigger trunk jobs on your pull request label May 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci-no-td Do not run TD on this PR Merged open source Reverted topic: not user facing topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Throwing more specific errors for CrossEntropyLoss weights being on a different device than the input/target
7 participants