
Conversation

@malfet (Contributor) commented Apr 24, 2025

Stack from ghstack (oldest at bottom):

As reported in #149292, according to the manual, `vfmsq_f32` implements `c - a * b` rather than `a * b - c`, so its call must be prefixed with `vnegq_f32`
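
A minimal sketch of the fix, written directly against the NEON intrinsics (not the exact vec128 source):

```cpp
#include <arm_neon.h>

// vfmsq_f32(c, a, b) computes c - a * b, so negating the result gives
// the a * b - c that fmsub is supposed to return.
inline float32x4_t fmsub_sketch(float32x4_t a, float32x4_t b, float32x4_t c) {
  return vnegq_f32(vfmsq_f32(c, a, b));
}
```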

Also, adjust the tests to use OpMath for the reference FMA computation, to avoid accuracy error accumulation due to non-fused multiply-and-add over lower-precision dtypes
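
A minimal sketch of that testing idea, using a hypothetical reference helper rather than the actual vec_test_all_types code: the reference FMA is computed in a wider opmath type and rounded back once, instead of multiplying and then adding in the low-precision dtype.

```cpp
// Hypothetical reference helper: promote to opmath_t (e.g. float for
// half/bfloat16 inputs), do the multiply-add there, then round back once.
template <typename scalar_t, typename opmath_t = float>
scalar_t reference_fmadd(scalar_t a, scalar_t b, scalar_t c) {
  return static_cast<scalar_t>(
      static_cast<opmath_t>(a) * static_cast<opmath_t>(b) +
      static_cast<opmath_t>(c));
}
```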

Note that `Vectorized::fmsub` is not currently instantiated anywhere, so it could safely remain broken

TODO:

  • Enable C++ testing on MacOS and/or aarch64 platforms (right now Mac tests are built without C++ tests)

Fixes #149292

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168

[ghstack-poisoned]
@malfet requested a review from a team as a code owner on April 24, 2025 00:50

pytorch-bot bot commented Apr 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152075

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 43 Pending

As of commit e61b271 with merge base 2f74cff:
💚 Looks good so far! There are no failures yet. 💚

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot added the module: cpu (CPU specific problem, e.g., perf, algorithm) label on Apr 24, 2025
malfet added a commit that referenced this pull request Apr 24, 2025
As reported in #149292
According to the manual, `vfmsq_f32` implements `c - a * b` rather than `a * b - c`, so its call must be prefixed with `vnegq_f32`

Also, adjust the tests to use OpMath for FMA computation to avoid accuracy error accumulation due to non-fused multiply-and-add over lower precision dtypes

Run `./bin/vec_test_all_types_DEFAULT` during MacOS testing

Fixes #149292

ghstack-source-id: 2b49ca2
Pull Request resolved: #152075
@malfet requested a review from swolchok on April 24, 2025 00:51
@malfet added the topic: bug fixes, release notes: cpu (aarch64), and ciflow/trunk labels on Apr 24, 2025
@swolchok (Contributor) left a comment

thanks

[ghstack-poisoned]
malfet added a commit that referenced this pull request Apr 24, 2025
As reported in #149292
According to the manual, `vfmsq_f32` implements `c - a * b` rather than `a * b - c`, so its call must be prefixed with `vnegq_f32`

Also, adjust the tests to use OpMath for FMA computation to avoid accuracy error accumulation due to non-fused multiply-and-add over lower precision dtypes

Run `./bin/vec_test_all_types_DEFAULT` during MacOS testing

Fixes #149292

ghstack-source-id: 58b498b
Pull Request resolved: #152075
@malfet (Contributor, Author) commented Apr 24, 2025

@pytorchbot merge -f "This seems fine"

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@atalman (Contributor) commented May 7, 2025

@pytorchbot cherry-pick --onto release/2.7 -c critical

pytorchbot pushed a commit that referenced this pull request May 7, 2025
As reported in #149292, according to the manual, `vfmsq_f32` implements `c - a * b` rather than `a * b - c`, so its call must be prefixed with `vnegq_f32`

Also, adjust the tests to use OpMath for FMA computation to avoid accuracy error accumulation due to non-fused multiply-and-add over lower precision dtypes

Note that `Vectorized::fmsub` is not currently instantiated anywhere, so it could safely remain broken

TODO:
 - Enable C++ testing on MacOS and/or aarch64 platforms (right now Mac tests are built without C++ tests)

Fixes #149292

Pull Request resolved: #152075
Approved by: https://github.com/swolchok
ghstack dependencies: #151955

(cherry picked from commit 2ea8653)
@pytorchbot (Collaborator) commented:

Cherry picking #152075

The cherry pick PR is at #153093 and it is recommended to link a critical cherry pick PR with an issue. The following tracker issues are updated:

atalman pushed a commit that referenced this pull request May 7, 2025
[vec128] Fix fmsub NEON defintion (#152075)

As reported in #149292, according to the manual, `vfmsq_f32` implements `c - a * b` rather than `a * b - c`, so its call must be prefixed with `vnegq_f32`

Also, adjust the tests to use OpMath for FMA computation to avoid accuracy error accumulation due to non-fused multiply-and-add over lower precision dtypes

Note that `Vectorized::fmsub` is not currently instantiated anywhere, so it could safely remain broken

TODO:
 - Enable C++ testing on MacOS and/or aarch64 platforms (right now Mac tests are built without C++ tests)

Fixes #149292

Pull Request resolved: #152075
Approved by: https://github.com/swolchok
ghstack dependencies: #151955

(cherry picked from commit 2ea8653)

Co-authored-by: Nikita Shulga <[email protected]>
@github-actions bot deleted the gh/malfet/300/head branch on June 12, 2025 02:21

Labels

  • ciflow/trunk: Trigger trunk jobs on your pull request
  • Merged
  • module: cpu: CPU specific problem (e.g., perf, algorithm)
  • release notes: cpu (aarch64): release notes category for aarch64, arm, etc.
  • topic: bug fixes: topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants