You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
BUG: Address interaction between SME and FPSR (#29223)
* BUG: Address interaction between SME and FPSR
This is intended to resolve#28687
The root cause is an interaction between Arm Scalable Matrix Extension (SME) and the floating point status register (FPSR).
As noted in Arm docs for FPSR, "On entry to or exit from Streaming SVE mode, FPSR.{IOC, DZC, OFC, UFC, IXC, IDC, QC} are set to 1 and the remaining bits are set to 0". This means that floating point status flags are all raised when SME is used, regardless of values or operations performed.
These are manifesting now because Apple Silicon M4 supports SME and macOS 15.4 enables SME codepaths for Accelerate BLAS / LAPACK. However, SME / FPSR behavior is not specific to Apple Silicon M4 and will occur on non-Apple chips using SME as well.
Changes add compile and runtime checks to determine whether BLAS / LAPACK might use SME (macOS / Accelerate only at the moment). If so, special handling of floating-point error (FPE) is added, which includes:
- clearing FPE after some BLAS calls
- short-circuiting FPE read after some BLAS calls
All tests pass
Performance is similar
Another approach would have been to wrap all BLAS / LAPACK calls with save / restore FPE. However, it added a lot of overhead for the inner loops that utilize BLAS / LAPACK. Some benchmarks were 8x slower.
* add blas_supports_fpe and ifdef check
Address the linker & linter failures
0 commit comments