

for now, delete the float8-all-gather-only functionality from float8 training #1451

Merged
vkuzo merged 1 commit into main from 20241219_float8_allgather_only_delete on Dec 20, 2024

Conversation

@vkuzo (Contributor) commented Dec 19, 2024

for now, delete the float8-all-gather-only functionality from float8 training

Summary:

In #1093 we added a config option, off by default, to use float8 only for the all-gather during training and do the matrix multiply in high precision.

This seems generally useful for communication-bound workloads, but we can probably think of a cleaner way to add this functionality (such as a weight wrapper tensor subclass). The current implementation adds non-trivial complexity and doesn't jibe well with where we want to take this codebase.

Since no one is using this internally or externally yet and we haven't mentioned it in the release notes, I think we should do a BC-breaking delete as a one-off. However, if people have concerns, let me know and we can discuss less aggressive options.
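For readers unfamiliar with the pattern being removed: the idea is to cast the weight to a low-precision format only for the communication step (saving all-gather bandwidth) and upcast it back before the matmul, which runs in high precision. Below is a minimal pure-Python sketch of that dataflow; the int8-style quantizer, the `simulated_all_gather` helper, and all other names here are hypothetical stand-ins for illustration, not torchao's actual float8 code.

```python
# Hypothetical sketch of the "low-precision all-gather, high-precision matmul"
# pattern. An int8-style symmetric quantizer stands in for the float8 cast.

def quantize(vals, n_bits=8):
    # Scale values into the representable integer range and round.
    amax = max(abs(v) for v in vals) or 1.0
    scale = (2 ** (n_bits - 1) - 1) / amax
    return [round(v * scale) for v in vals], scale

def dequantize(q, scale):
    # Upcast back to full-precision floats after communication.
    return [v / scale for v in q]

def simulated_all_gather(shards):
    # Each "rank" contributes its quantized shard; the gather moves 8-bit
    # payloads instead of 32-bit floats, which is the bandwidth win.
    gathered = []
    for q, scale in shards:
        gathered.extend(dequantize(q, scale))
    return gathered

# Two "ranks" each hold half of a 4-element weight row.
shards = [quantize([0.5, -1.0]), quantize([0.25, 2.0])]
weight = simulated_all_gather(shards)

activation = [1.0, 1.0, 1.0, 1.0]
# The matmul (here a dot product) runs in full precision on the upcast weight.
out = sum(w * a for w, a in zip(weight, activation))
```

The point of the pattern is that numerics stay close to full precision because only the communicated copy of the weight is ever quantized; the matmul itself never runs in the low-precision format.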

Test Plan:

```
./test/float8/test_everything.sh
```

Reviewers:

Subscribers:

Tasks:

Tags:

@pytorch-bot

pytorch-bot bot commented Dec 19, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1451

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 910edb3 with merge base 7618c26:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 19, 2024
@vkuzo vkuzo requested a review from y-sq December 19, 2024 23:34
@vkuzo vkuzo added the module: bc-breaking Use this tag if this PR breaks backward compatibility label Dec 19, 2024
@vkuzo vkuzo force-pushed the 20241219_float8_allgather_only_delete branch from 488185d to 910edb3 Compare December 19, 2024 23:37
@vkuzo vkuzo requested review from drisspg and jerryzh168 December 19, 2024 23:40
@vkuzo vkuzo merged commit 84fb834 into main Dec 20, 2024
amdfaa pushed a commit that referenced this pull request Jan 10, 2025
for now, delete the float8-all-gather-only functionality from float8 training (#1451)