

@yazon (Owner) commented Aug 17, 2025

Fixes the neon64_ee and neon64_oe ARM64 NEON FFT kernels so they no longer write uninitialized data.

The original ARM32 kernels process 2 complex numbers (16 bytes) per iteration. The ARM64 port incorrectly mixed 2-lane (64-bit) arithmetic with 4-lane (128-bit) stores, so the upper 64 bits of each 128-bit register were never initialized but were still written to memory. This produced large L2 errors for N=32 FFTs. The fix makes all operations (loads, arithmetic, and stores) consistently 2-lane, matching the ARM32 behavior and preventing garbage data from being written.
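To illustrate the failure mode, here is a minimal portable C sketch. It is not the kernel code from this PR: `vec128`, `add_2lane`, `store_4lane`, `store_2lane`, and `count_clobbered` are hypothetical names, and the 128-bit register is modeled as a plain struct of four floats rather than with real NEON instructions. It only shows why a 2-lane computation followed by a 4-lane store writes stale data past the intended output.

```c
#include <string.h>

/* Hypothetical model of a 128-bit vector register as four float lanes. */
typedef struct { float lane[4]; } vec128;

/* 2-lane arithmetic: only lanes 0 and 1 are computed.
   Lanes 2 and 3 keep whatever stale contents the register already held. */
static void add_2lane(vec128 *dst, const float *a, const float *b) {
    dst->lane[0] = a[0] + b[0];
    dst->lane[1] = a[1] + b[1];
}

/* Mismatched store: writes all four lanes (16 bytes), including stale ones. */
static void store_4lane(float *out, const vec128 *src) {
    memcpy(out, src->lane, 4 * sizeof(float));
}

/* Matched store: writes only the two lanes that were computed (8 bytes). */
static void store_2lane(float *out, const vec128 *src) {
    memcpy(out, src->lane, 2 * sizeof(float));
}

/* Returns how many neighbouring output elements (out[2], out[3]) were
   overwritten with stale register contents. */
int count_clobbered(int use_4lane_store) {
    const float a[2] = {1.0f, 2.0f};
    const float b[2] = {10.0f, 20.0f};

    /* out[2] and out[3] stand for valid data belonging to another step. */
    float out[4] = {0.0f, 0.0f, 7.0f, 8.0f};

    /* Stand-in for a register whose upper lanes were never initialized. */
    vec128 r = {{-1.0f, -1.0f, -999.0f, -999.0f}};

    add_2lane(&r, a, b);
    if (use_4lane_store) {
        store_4lane(out, &r);
    } else {
        store_2lane(out, &r);
    }

    int clobbered = 0;
    if (out[2] != 7.0f) clobbered++;
    if (out[3] != 8.0f) clobbered++;
    return clobbered;
}
```

In this model, `count_clobbered(1)` reports both neighbouring elements overwritten, while `count_clobbered(0)` leaves them intact, which mirrors the consistently 2-lane approach described above.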




