@gau-nernst (Collaborator) commented May 10, 2025

Closes #1931

The problem is described in the following code comment from this PR:

        # when there is CPU offload, p.device is cpu, but device_mesh.device_type is cuda.
        # DTensor.from_local() will move local_tensor to device_mesh.device_type.
        # hence, we need to manually move it back to CPU.
        # https://github.com/pytorch/pytorch/blob/bc4cf1c1/torch/distributed/tensor/_api.py#L410-L415

Added a test to cover FSDP2 + CPU offload.

@pytorch-bot commented May 10, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2195

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the "CLA Signed" label May 10, 2025
@gau-nernst requested a review from msaroufim May 10, 2025 06:20
@gau-nernst added the "topic: bug fix" label May 10, 2025
@msaroufim merged commit 4ee2ee1 into pytorch:main May 10, 2025
3 checks passed
@gau-nernst deleted the fix/optim_fsdp_offload branch May 10, 2025 13:54
Labels: CLA Signed, topic: bug fix

Issue closed by this pull request: FSDP2 + CPU Offload + AdamW8bit issue

3 participants