
Conversation

@malfet (Contributor) commented Apr 24, 2025

Stack from ghstack (oldest at bottom):

As reported in #149292, according to the manual, `vfmsq_f32` implements `c - a * b` rather than `a * b - c`, so its call must be prefixed with `vnegq_f32`
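
A minimal sketch of the fix, written directly against the NEON intrinsics (not the exact vec128 source):

```cpp
#include <arm_neon.h>

// vfmsq_f32(c, a, b) computes c - a * b, so negating the result gives
// the a * b - c that fmsub is supposed to return.
inline float32x4_t fmsub_sketch(float32x4_t a, float32x4_t b, float32x4_t c) {
  return vnegq_f32(vfmsq_f32(c, a, b));
}
```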

Also, adjust the tests to use OpMath for the reference FMA computation, to avoid accuracy error accumulation due to non-fused multiply-and-add over lower-precision dtypes
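
A minimal sketch of that testing idea, using a hypothetical reference helper rather than the actual vec_test_all_types code: the reference FMA is computed in a wider opmath type and rounded back once, instead of multiplying and then adding in the low-precision dtype.

```cpp
// Hypothetical reference helper: promote to opmath_t (e.g. float for
// half/bfloat16 inputs), do the multiply-add there, then round back once.
template <typename scalar_t, typename opmath_t = float>
scalar_t reference_fmadd(scalar_t a, scalar_t b, scalar_t c) {
  return static_cast<scalar_t>(
      static_cast<opmath_t>(a) * static_cast<opmath_t>(b) +
      static_cast<opmath_t>(c));
}
```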

Note that `Vectorized::fmsub` is not currently instantiated anywhere, so it could safely remain broken

TODO:

  • Enable C++ testing on MacOS and/or aarch64 platforms (right now Mac tests are built without C++ tests)

Fixes #149292

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168

[ghstack-poisoned]
@malfet requested a review from a team as a code owner on April 24, 2025 00:50

pytorch-bot bot commented Apr 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152075

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 43 Pending

As of commit e61b271 with merge base 2f74cff:
💚 Looks good so far! There are no failures yet. 💚

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot added the module: cpu (CPU specific problem, e.g., perf, algorithm) label on Apr 24, 2025
malfet added a commit that referenced this pull request Apr 24, 2025
As reported in #149292
According to the manual, `vfmsq_f32` implements `c - a * b` rather than `a * b - c`, so its call must be prefixed with `vnegq_f32`

Also, adjust the tests to use OpMath for FMA computation to avoid accuracy error accumulation due to non-fused multiply-and-add over lower precision dtypes

Run `./bin/vec_test_all_types_DEFAULT` during MacOS testing

Fixes #149292

ghstack-source-id: 2b49ca2
Pull Request resolved: #152075
@malfet requested a review from swolchok on April 24, 2025 00:51
@malfet added the topic: bug fixes, release notes: cpu (aarch64), and ciflow/trunk labels on Apr 24, 2025
@swolchok (Contributor) left a comment

thanks

[ghstack-poisoned]
malfet added a commit that referenced this pull request Apr 24, 2025
As reported in #149292
According to the manual, `vfmsq_f32` implements `c - a * b` rather than `a * b - c`, so its call must be prefixed with `vnegq_f32`

Also, adjust the tests to use OpMath for FMA computation to avoid accuracy error accumulation due to non-fused multiply-and-add over lower precision dtypes

Run `./bin/vec_test_all_types_DEFAULT` during MacOS testing

Fixes #149292

ghstack-source-id: 58b498b
Pull Request resolved: #152075
@malfet (Contributor, Author) commented Apr 24, 2025

@pytorchbot merge -f "This seems fine"

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@atalman (Contributor) commented May 7, 2025

@pytorchbot cherry-pick --onto release/2.7 -c critical

pytorchbot pushed a commit that referenced this pull request May 7, 2025
As reported in #149292, according to the manual, `vfmsq_f32` implements `c - a * b` rather than `a * b - c`, so its call must be prefixed with `vnegq_f32`

Also, adjust the tests to use OpMath for FMA computation to avoid accuracy error accumulation due to non-fused multiply-and-add over lower precision dtypes

Note that `Vectorized::fmsub` is not currently instantiated anywhere, so it could safely remain broken

TODO:
 - Enable C++ testing on MacOS and/or aarch64 platforms (right now Mac tests are built without C++ tests)

Fixes #149292

Pull Request resolved: #152075
Approved by: https://github.com/swolchok
ghstack dependencies: #151955

(cherry picked from commit 2ea8653)
@pytorchbot (Collaborator) commented:

Cherry picking #152075

The cherry pick PR is at #153093 and it is recommended to link a critical cherry pick PR with an issue. The following tracker issues are updated:

atalman pushed a commit that referenced this pull request May 7, 2025
[vec128] Fix fmsub NEON defintion (#152075)

As reported in #149292, according to the manual, `vfmsq_f32` implements `c - a * b` rather than `a * b - c`, so its call must be prefixed with `vnegq_f32`

Also, adjust the tests to use OpMath for FMA computation to avoid accuracy error accumulation due to non-fused multiply-and-add over lower precision dtypes

Note that `Vectorized::fmsub` is not currently instantiated anywhere, so it could safely remain broken

TODO:
 - Enable C++ testing on MacOS and/or aarch64 platforms (right now Mac tests are built without C++ tests)

Fixes #149292

Pull Request resolved: #152075
Approved by: https://github.com/swolchok
ghstack dependencies: #151955

(cherry picked from commit 2ea8653)

Co-authored-by: Nikita Shulga <[email protected]>
@github-actions bot deleted the gh/malfet/300/head branch on June 12, 2025 02:21

Labels

  • ciflow/trunk: Trigger trunk jobs on your pull request
  • Merged
  • module: cpu: CPU specific problem (e.g., perf, algorithm)
  • release notes: cpu (aarch64): release notes category for aarch64, arm, etc.
  • topic: bug fixes: topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants