
pytorch-bot bot commented Apr 24, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/124853

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 5 Unrelated Failures

As of commit 38189a3 with merge base a8aed4c (image):

NEW FAILURE - The following job has failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@xuhancn xuhancn changed the title update code [Don't merge] Dummy PR to builder trigger Apr 24, 2024
@xuhancn xuhancn added module: windows Windows support for PyTorch windows-triaged ciflow/binaries_wheel Trigger binary build and upload jobs for wheel on the PR and removed windows-triaged module: windows Windows support for PyTorch labels Apr 24, 2024
@xuhancn
Collaborator Author

xuhancn commented Apr 24, 2024

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

snarayan21 and others added 4 commits April 24, 2024 17:41
Fixes pytorch#124546

When setting `use_orig_params = False` and using activation checkpointing, the FQN mapping retrieved by the `_get_fqns` function is incorrect because the `_checkpoint_wrapped_module` prefix, which is added to the name of each activation-checkpointed module, can still be present. This appears to be an edge case in `_get_fqns` that was not addressed by the previous commit pytorch#118119.

Without the change, the list of object names for an activation checkpointed module with FSDP (and `use_orig_params=False`) can be something like:
```
['model', '_fsdp_wrapped_module', 'transformer', 'blocks', '0', '_fsdp_wrapped_module', '_checkpoint_wrapped_module', '_flat_param']
```
This incorrectly returns just one FQN, `{'model.transformer.blocks.0._flat_param'}`, when the FQNs of all the transformer block's parameters should be returned.

With the change, the list of object names will now have `_checkpoint_wrapped_module` removed:
```
['model', '_fsdp_wrapped_module', 'transformer', 'blocks', '0', '_fsdp_wrapped_module', '_flat_param']
```
The FQNs are then correctly retrieved and returned by `_get_fqns` when [this condition](https://github.com/pytorch/pytorch/blob/ea61c9cb299b6dfebc57dc9d8821c34321d568ab/torch/distributed/checkpoint/state_dict.py#L168) is satisfied. The correct FQNs are:
```
{'model.transformer.blocks.0.attn.Wqkv.bias', 'model.transformer.blocks.0.ffn.up_proj.bias',
'model.transformer.blocks.0.attn.out_proj.weight', 'model.transformer.blocks.0.norm_2.weight',
'model.transformer.blocks.0.ffn.down_proj.weight', 'model.transformer.blocks.0.attn.Wqkv.weight',
'model.transformer.blocks.0.norm_2.bias', 'model.transformer.blocks.0.ffn.up_proj.weight',
'model.transformer.blocks.0.ffn.down_proj.bias', 'model.transformer.blocks.0.norm_1.bias',
'model.transformer.blocks.0.norm_1.weight', 'model.transformer.blocks.0.attn.out_proj.bias'}
```
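The fix described above can be sketched as a filtering step over the object-name list. This is a minimal illustration, not the actual implementation in `torch.distributed.checkpoint.state_dict._get_fqns`; the helper name `clean_object_names` is hypothetical, though the wrapper-segment names match those used by FSDP and activation checkpointing.

```python
# Illustrative sketch (hypothetical helper): drop the wrapper segments that
# FSDP and activation checkpointing insert into a module path, so that the
# remaining segments join into a clean FQN.
FSDP_WRAPPED_MODULE = "_fsdp_wrapped_module"
CHECKPOINT_WRAPPED_MODULE = "_checkpoint_wrapped_module"

def clean_object_names(obj_names):
    """Remove FSDP and activation-checkpoint wrapper segments from a path."""
    return [
        name for name in obj_names
        if name not in (FSDP_WRAPPED_MODULE, CHECKPOINT_WRAPPED_MODULE)
    ]

names = ['model', '_fsdp_wrapped_module', 'transformer', 'blocks', '0',
         '_fsdp_wrapped_module', '_checkpoint_wrapped_module', '_flat_param']
print(".".join(clean_object_names(names)))
# -> model.transformer.blocks.0._flat_param
```

When the cleaned path ends in `_flat_param`, the real `_get_fqns` then expands that flat parameter into the FQNs of every original parameter it covers, producing the full set shown below.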

Pull Request resolved: pytorch#124698
Approved by: https://github.com/Skylion007
Update ROCm-triton to use the AMD backend from https://github.com/openai/triton

Note: `test__int_mm` can be enabled after pytorch#122431 is landed

Co-authored-by: Pruthvi Madugundu <[email protected]>
Co-authored-by: Nikita Shulga <[email protected]>
Pull Request resolved: pytorch#121801
Approved by: https://github.com/nmacchioni, https://github.com/malfet
Summary:
Original commit changeset: 1f155b3a0bfc

Original Phabricator Diff: D56273267

Test Plan: CI

Differential Revision: D56526505

Pull Request resolved: pytorch#124860
Approved by: https://github.com/angelayi
@pytorchmergebot
Collaborator

Successfully rebased xu_builder_test_trigger onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout xu_builder_test_trigger && git pull --rebase)

@xuhancn xuhancn removed oncall: distributed Add this issue/PR to distributed oncall triage queue module: inductor module: dynamo ciflow/inductor labels Apr 24, 2024
@xuhancn xuhancn added the ciflow/binaries_libtorch Trigger binary build and upload jobs for libtorch on the PR label Apr 25, 2024
@xuhancn xuhancn closed this Apr 26, 2024
@xuhancn xuhancn reopened this Jul 5, 2024
@xuhancn xuhancn closed this Jul 5, 2024