Tags: dmm-fb/pytorch

ciflow/trunk/83880

rebase bug

ciflow/trunk/83708

Update on "Enable -Wunused-local-typedefs"

I recently had a PR reverted because it triggered an unused-local-typedefs
warning, so disabling these warnings in the CMake build is counter-productive.

[ghstack-poisoned]

ciflow/trunk/83690

Merge branch 'master' into manual-test-selection

ciflow/trunk/83285

save

ciflow/trunk/82754

remove redundant torchgen checks; add NestedTensor back to math keyset

ciflow/periodic/83690

Merge branch 'master' into manual-test-selection

ciflow/nightly/83957

Fix wrong comparison logic

ciflow/binaries_libtorch/83959

Skip NCCL slimming for cxx11 libtorch builds

Fixes pytorch#83887

ciflow/trunk/83239

Update on "[NVFuser] Upstream push 0811"

Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Code changes include:

- codegen improvements:
  1. double support in expression evaluator
- bug fixes:
  1. dropout fix - rework RNG to support broadcasted dropout (Fixes pytorch#82784); see the sketch after this list
  2. expand fix - patch expand+reduction and expand+view, rework view analysis and guards
- scheduler:
  1. manual transpose schedule example
  2. WIP transpose scheduler
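
As an aside on the dropout item above, here is a minimal, illustrative sketch of the broadcast-then-dropout pattern that the RNG rework targets. It is not the actual repro from pytorch#82784; the shapes and code are placeholders.

```
# Illustrative only: an elementwise add that broadcasts `bias`, feeding dropout.
# This broadcast-then-dropout pattern is what the RNG rework in this PR is
# meant to handle correctly when nvfuser fuses the region.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"  # nvfuser requires CUDA
bias = torch.randn(128, device=device)    # broadcasts over the batch dimension
x = torch.randn(16, 128, device=device)
y = torch.nn.functional.dropout(x + bias, p=0.5, training=True)
```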

Commits in this PR from the devel branch:

```
b7435af Transpose scheduler, step 1 (pytorch#1854)
8a45dbf Add an example on how to manually schedule transpose (pytorch#1889)
83dbf56 Patch dropout fix (pytorch#1898)
69d3519 Expand+Reduction, Expand+View support, rework View analysis and guards (pytorch#1883)
15091c4 Rework RNG to correctly support broadcasted dropout (pytorch#1888)
aafe2d0 Make ExpressionEvaluator support Double (pytorch#1885)
```

RUN_TORCHBENCH: nvfuser

Differential Revision: [D38657074](https://our.internmc.facebook.com/intern/diff/D38657074)

[ghstack-poisoned]

ciflow/trunk/83195

Update on "Fix FSDP not all outputs used in loss"


There are a couple of issues / assumptions within FSDP today that this PR attempts to fix:

- In `wait_for_post_backward`, we assume that if a param required grad, its post-backward hook was called, but this is not true: if the param's output did not participate in grad computation, its post-backward hook would not have run. To fix this, we simply remove those assertions.
- There is a deeper issue where in `_finalize_params`, we could end up assigning a grad of the sharded shape to an unsharded parameter's gradient field, which would raise a shape error. This can happen, for example, if a parameter's usage transitions from used --> unused. When the parameter was used, it would have had a gradient; the user could then have called `zero_grad()`, so `p.grad` would not be `None`. Then in `_prep_grad_for_backward`, `_saved_grad_shard` would be set from this gradient field, so it would have the sharded shape. In `_finalize_params`, our parameter would be unsharded (since post_backward was not called), but we'd try the assignment anyway, raising the shape error. This issue is fixed by checking `_post_backward_called`: if it is False, we simply skip the assignment because there is no new gradient to update.
- A final issue, as mentioned above, is that if post_backward is not called, we never reshard the full param. This is fixed by checking whether we haven't resharded (basically whether post_backward_called == False) and, if so, performing a reshard (see the sketch below).
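
The sketch below is a minimal illustration of the finalize-time checks described in the last two bullets. It is not the actual FSDP code: `FlatParamHandle`, `_is_sharded`, and `_reshard` are placeholder names for this example, while `_post_backward_called` and `_saved_grad_shard` are the fields discussed above.

```
# Minimal, self-contained sketch of the "skip the grad assignment and reshard
# if needed" logic; names other than `_post_backward_called` and
# `_saved_grad_shard` are placeholders, not real FSDP internals.
import torch


class FlatParamHandle:
    def __init__(self, param: torch.Tensor) -> None:
        self.param = param                  # currently the unsharded full param
        self._post_backward_called = False  # set by the post-backward hook
        self._saved_grad_shard = None       # sharded grad saved around backward
        self._is_sharded = False

    def _reshard(self) -> None:
        # Placeholder: free the full parameter and keep only the local shard.
        self._is_sharded = True


def _finalize_param(handle: FlatParamHandle) -> None:
    if not handle._post_backward_called:
        # The param's outputs never fed the loss, so post-backward never fired:
        # there is no new gradient to write back, and assigning the stale,
        # sharded `_saved_grad_shard` to the still-unsharded param would raise
        # a shape mismatch. Skip the assignment, and reshard here because the
        # post-backward hook (which normally reshards) did not run.
        if not handle._is_sharded:
            handle._reshard()
        return
    # Normal path: the sharded gradient saved around backward becomes the
    # parameter's gradient.
    handle.param.grad = handle._saved_grad_shard
```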

A few things to note:
- This logic may have to be revisited when non-recursive wrapping lands, as there are multiple FlatParams per FSDP unit.
- This logic may not work when the post-backward hook fires but `p.grad` is None, i.e. the short-circuiting here: https://github.com/pytorch/pytorch/blob/f534b2c627da65bbee7ccc8f7e054da0ba48eb79/torch/distributed/fsdp/fully_sharded_data_parallel.py#L2884. As a quick fix, we could move the `_post_backward_called` flag change to after this, or just perform a reshard before returning early. I am not sure how to repro a case where `p.grad == None` but the post-backward hook is called; pytorch#83197 might be a possibility, but I think it is fine not to support this yet.



[ghstack-poisoned]