BUG: Address interaction between SME and FPSR (#29223) #29235

Merged 1 commit on Jun 19, 2025

charris
Member

@charris charris commented Jun 19, 2025

Backport of #29223.

  • BUG: Address interaction between SME and FPSR

This is intended to resolve #28687

The root cause is an interaction between Arm Scalable Matrix Extension (SME) and the floating point status register (FPSR).

As noted in the Arm docs for FPSR, "On entry to or exit from Streaming SVE mode, FPSR.{IOC, DZC, OFC, UFC, IXC, IDC, QC} are set to 1 and the remaining bits are set to 0". This means that all floating-point status flags are raised whenever SME is used, regardless of the values or operations involved.
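To make the quoted rule concrete, here is a minimal Python sketch that models the FPSR as a plain integer. The bit positions are assumptions taken from the Arm register description as I recall it (IOC through IXC in bits 0 to 4, IDC in bit 7, QC in bit 27) and should be checked against the Arm docs; the function names are hypothetical and are not part of this PR.

```python
# Bit positions of the cumulative status flags in the AArch64 FPSR.
# Assumption: taken from the Arm register description; verify before relying on them.
FPSR_FLAG_BITS = {
    "IOC": 0,   # invalid operation
    "DZC": 1,   # divide by zero
    "OFC": 2,   # overflow
    "UFC": 3,   # underflow
    "IXC": 4,   # inexact
    "IDC": 7,   # input denormal
    "QC": 27,   # saturation
}


def fpsr_after_streaming_transition(fpsr_before: int) -> int:
    """Model the quoted rule: the listed flags become 1, every other bit becomes 0."""
    del fpsr_before  # the previous contents do not influence the result
    return sum(1 << bit for bit in FPSR_FLAG_BITS.values())


def raised_flags(fpsr: int) -> list[str]:
    """Return the names of the status flags that are set in an FPSR value."""
    return [name for name, bit in FPSR_FLAG_BITS.items() if (fpsr >> bit) & 1]
```

Under these assumptions, `raised_flags(fpsr_after_streaming_transition(0))` lists all seven flags even though no arithmetic took place, which is why a later status check reports errors that never happened.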

The spurious flags are surfacing now because Apple Silicon M4 supports SME and macOS 15.4 enables SME code paths in Accelerate BLAS / LAPACK. However, the SME / FPSR behavior is not specific to Apple Silicon M4 and will also occur on non-Apple chips that use SME.

The changes add compile-time and runtime checks to determine whether BLAS / LAPACK might use SME (macOS / Accelerate only at the moment). If so, floating-point error (FPE) status gets special handling, which includes:

  • clearing FPE after some BLAS calls
  • short-circuiting FPE read after some BLAS calls
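The two measures above can be illustrated with a toy model in Python. This is only a sketch of the control flow, not the C code in this PR: `ToyFPU`, `blas_with_sme`, `call_blas_then_clear`, and `read_fpe_after_blas` are hypothetical names, and the `blas_supports_fpe` parameter is assumed to mean "the status flags left behind by BLAS can be trusted".

```python
ALL_FLAGS = 0x0800009F  # assumed FPSR value with every status flag raised


class ToyFPU:
    """Stand-in for the floating-point status register."""

    def __init__(self) -> None:
        self.status = 0

    def clear(self) -> None:
        self.status = 0

    def read(self) -> int:
        return self.status


def blas_with_sme(fpu: ToyFPU) -> None:
    """Hypothetical BLAS routine that enters and leaves streaming mode."""
    fpu.status = ALL_FLAGS  # flags are raised no matter what was computed


def call_blas_then_clear(fpu: ToyFPU, blas, blas_supports_fpe: bool) -> None:
    """Measure 1: clear the status after the call when it cannot be trusted."""
    blas(fpu)
    if not blas_supports_fpe:
        fpu.clear()


def read_fpe_after_blas(fpu: ToyFPU, blas_supports_fpe: bool) -> int:
    """Measure 2: skip the read and report no error when it cannot be trusted."""
    if not blas_supports_fpe:
        return 0
    return fpu.read()
```

In this model either measure hides the spurious flags. The trade-off is that a genuine floating-point error raised inside such a BLAS call is hidden as well, which is presumably why the handling applies only when the checks indicate that SME might be in use.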

All tests pass, and performance is similar.

Another approach would have been to wrap every BLAS / LAPACK call in a save / restore of the FPE status. However, that added substantial overhead to the inner loops that use BLAS / LAPACK; some benchmarks were 8x slower.
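A rough Python sketch of that rejected save / restore approach shows where the overhead comes from. Everything here is hypothetical (`CountingFPU`, `preserved_fpe`, `inner_loop_wrapped`); it counts status-register accesses in a toy model and does not attempt to reproduce the 8x figure.

```python
from contextlib import contextmanager

ALL_FLAGS = 0x0800009F  # assumed FPSR value with every status flag raised


class CountingFPU:
    """Toy status register that counts explicit reads and writes."""

    def __init__(self) -> None:
        self.status = 0
        self.accesses = 0

    def read(self) -> int:
        self.accesses += 1
        return self.status

    def write(self, value: int) -> None:
        self.accesses += 1
        self.status = value


@contextmanager
def preserved_fpe(fpu: CountingFPU):
    """Save the status before the wrapped call and restore it afterwards."""
    saved = fpu.read()
    try:
        yield
    finally:
        fpu.write(saved)


def inner_loop_wrapped(fpu: CountingFPU, n_calls: int) -> None:
    """Wrap every simulated BLAS call, as the rejected approach would."""
    for _ in range(n_calls):
        with preserved_fpe(fpu):
            fpu.status = ALL_FLAGS  # stand-in for a BLAS call using SME
```

The wrapper keeps the status correct, including any flag that was already set before the loop, but it costs two extra register accesses per call. For inner loops that issue many small BLAS calls, that fixed cost can outweigh the work in the call itself.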

  • add blas_supports_fpe and ifdef check

Address the linker & linter failures

@charris charris added this to the 2.3.1 release milestone Jun 19, 2025
@charris charris added the 00 - Bug and 08 - Backport labels Jun 19, 2025
@charris charris merged commit 0481076 into numpy:maintenance/2.3.x Jun 19, 2025
73 checks passed
@charris charris deleted the backport-29223 branch June 19, 2025 14:34