[vec128] Fix fmsub NEON defintion #152075
As reported in #149292.

According to the manual, `vfmsq_f32` implements `c - a * b` rather than `a * b - c`, so its call must be prefixed with `vnegq_f32` (negating the result).

Also, adjust the tests to use OpMath for the FMA computation, to avoid accumulating accuracy errors from non-fused multiply-and-add over lower-precision dtypes.

Run `./bin/vec_test_all_types_DEFAULT` during MacOS testing.

Fixes #149292
ghstack-source-id: 2b49ca2
Pull Request resolved: #152075
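For illustration, the sign issue described in the commit message can be modeled with plain scalar code. This is a minimal sketch, not the code from the PR: `Lanes`, `vfmsq_model`, `vnegq_model`, `fmsub_before` and `fmsub_after` are hypothetical names, the models only mimic the per-lane arithmetic of the NEON intrinsics, and it assumes the old definition called `vfmsq_f32` with the arguments in the order `(c, a, b)`.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Four float lanes standing in for a NEON float32x4_t register (hypothetical alias).
using Lanes = std::array<float, 4>;

// Scalar model of vfmsq_f32(acc, x, y): per lane, acc - x * y with one rounding.
inline Lanes vfmsq_model(const Lanes& acc, const Lanes& x, const Lanes& y) {
  Lanes out{};
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = std::fma(-x[i], y[i], acc[i]);
  }
  return out;
}

// Scalar model of vnegq_f32(v): per-lane negation.
inline Lanes vnegq_model(const Lanes& v) {
  Lanes out{};
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = -v[i];
  }
  return out;
}

// Assumed shape of the old definition: yields c - a * b, the wrong sign for fmsub.
inline Lanes fmsub_before(const Lanes& a, const Lanes& b, const Lanes& c) {
  return vfmsq_model(c, a, b);
}

// Shape of the fix: negate the result, so -(c - a * b) == a * b - c.
inline Lanes fmsub_after(const Lanes& a, const Lanes& b, const Lanes& c) {
  return vnegq_model(vfmsq_model(c, a, b));
}
```

With `a = {2, 3, 4, 5}`, `b = {10, 10, 10, 10}` and `c = {1, 1, 1, 1}`, `fmsub_after` gives `{19, 29, 39, 49}`, while `fmsub_before` gives the same values with the opposite sign.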
thanks
As reported in #149292.

According to the manual, `vfmsq_f32` implements `c - a * b` rather than `a * b - c`, so its call must be prefixed with `vnegq_f32` (negating the result).

Also, adjust the tests to use OpMath for the FMA computation, to avoid accumulating accuracy errors from non-fused multiply-and-add over lower-precision dtypes.

Run `./bin/vec_test_all_types_DEFAULT` during MacOS testing.

Fixes #149292
ghstack-source-id: 58b498b
Pull Request resolved: #152075
@pytorchbot merge -f "This seems fine"
Merge started
Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).
@pytorchbot cherry-pick --onto release/2.7 -c critical
As reported in #149292, according to the manual, `vfmsq_f32` implements `c - a * b` rather than `a * b - c`, so its call must be prefixed with `vnegq_f32` (negating the result).

Also, adjust the tests to use OpMath for the FMA computation, to avoid accumulating accuracy errors from non-fused multiply-and-add over lower-precision dtypes.

Note that `Vectorized::fmsub` is not currently instantiated anywhere, so it could safely remain broken.

TODO:
- Enable C++ testing on MacOS and/or aarch64 platforms (right now Mac tests are built without C++ tests)

Fixes #149292
Pull Request resolved: #152075
Approved by: https://github.com/swolchok
ghstack dependencies: #151955
(cherry picked from commit 2ea8653)
Cherry picking #152075
The cherry pick PR is at #153093, and it is recommended to link a critical cherry pick PR with an issue.
[vec128] Fix fmsub NEON defintion (#152075)

As reported in #149292, according to the manual, `vfmsq_f32` implements `c - a * b` rather than `a * b - c`, so its call must be prefixed with `vnegq_f32` (negating the result).

Also, adjust the tests to use OpMath for the FMA computation, to avoid accumulating accuracy errors from non-fused multiply-and-add over lower-precision dtypes.

Note that `Vectorized::fmsub` is not currently instantiated anywhere, so it could safely remain broken.

TODO:
- Enable C++ testing on MacOS and/or aarch64 platforms (right now Mac tests are built without C++ tests)

Fixes #149292
Pull Request resolved: #152075
Approved by: https://github.com/swolchok
ghstack dependencies: #151955
(cherry picked from commit 2ea8653)
Co-authored-by: Nikita Shulga <[email protected]>
Stack from ghstack (oldest at bottom):
As reported in #149292, according to the manual, `vfmsq_f32` implements `c - a * b` rather than `a * b - c`, so its call must be prefixed with `vnegq_f32` (negating the result).

Also, adjust the tests to use OpMath for the FMA computation, to avoid accumulating accuracy errors from non-fused multiply-and-add over lower-precision dtypes.

Note that `Vectorized::fmsub` is not currently instantiated anywhere, so it could safely remain broken.

TODO:
- Enable C++ testing on MacOS and/or aarch64 platforms (right now Mac tests are built without C++ tests)
Fixes #149292
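The testing point in the description, computing the expected FMA value in a wider type, can be sketched as follows. This is only an illustration under stated assumptions, not the test code from the PR: `float` stands in for the lower-precision dtype, `double` stands in for its OpMath type, and `fmsub_unfused`, `fmsub_fused` and `fmsub_reference_wide` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>

// Non-fused a * b - c: the product is rounded to float before the subtraction.
// The volatile store keeps the compiler from contracting this into an FMA.
inline float fmsub_unfused(float a, float b, float c) {
  volatile float product = a * b;
  return product - c;
}

// Fused a * b - c: a single rounding at the end.
inline float fmsub_fused(float a, float b, float c) {
  return std::fma(a, b, -c);
}

// Reference computed in the wider type (double here, standing in for OpMath),
// then rounded back to float once.
inline float fmsub_reference_wide(float a, float b, float c) {
  const double wide =
      static_cast<double>(a) * static_cast<double>(b) - static_cast<double>(c);
  return static_cast<float>(wide);
}
```

For example, with `a = b = 1 + 2^-12` and `c = 1 + 2^-11`, the exact product is `1 + 2^-11 + 2^-24`. Rounded to `float` it becomes `1 + 2^-11`, so the non-fused result is `0`, while the fused result and the wide reference are both `2^-24`. A reference computed the non-fused way would therefore disagree with a correct fused implementation.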
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168