Conversation

@gwoltman
Collaborator

I think it's ready for merging!

gwoltman added 30 commits April 7, 2025 21:55
…code.

Updated maxBpw tables.  Changed -tune to handle new FFT spec code.
Added TABMUL_CHAIN option (it did not deserve to be a WMH code because it has little impact on Z).
…at most 500 kernels, a marker, and 500 more kernels. When prpll tries to add another kernel to the queue, it loops: it checks whether the marker has been reached and, if not, performs a lengthy sleep (knowing there are 500 kernels to execute after the marker).
…ktodo files.

Autoprimenet needs this change.
…ce condition where a launched thread accesses uninitialized variables.
…mentations that are not faster (at least on TitanV).
…word / little word range. Carryutil sloppy routines may or may not use this feature in the future.
…r more sloppy carries which gives a tiny performance boost for M31+M61 NTTs.
…TitanV. Explored alternate weakMul and csq implementations.
…need to expose the MIDDLE_CHAINMUL option to the end user.
…e I'll figure out a way to use them profitably in the future.
…t. Now we just need to automatically detect the GPU's actual level of CUDA support.
… does not support the builtins required for variant zero.

As a poor workaround, this change lets the user specify NO_ASM to bypass tuning FP64 variant zero.
…od AMD memory layout. I have been unable to find a memory layout nearly as good as the INPLACE=0 layout.
…ng on use of fma function on floats).

Some minor changes to the wording of tune output.
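
The queue-throttling commit above (at most 500 kernels, a marker, and 500 more kernels) can be illustrated with a toy model. This is a sketch under assumptions, not PRPLL's code: the class `ThrottledQueue`, its fields, and the simulated device are hypothetical, and the "lengthy sleep" is replaced by letting the simulated device execute a few queue entries.

```python
from collections import deque


class ThrottledQueue:
    """Toy model of a queue holding at most: group kernels, a marker, group kernels."""

    def __init__(self, group=500):
        self.group = group
        self.pending = deque()   # entries not yet executed, oldest first
        self.since_marker = 0    # kernels queued after the most recent marker
        self.sleeps = 0          # how many times the host had to wait
        self.max_pending = 0     # high-water mark of the queue length

    def device_run(self, n):
        """Simulate the device executing up to n queue entries."""
        for _ in range(min(n, len(self.pending))):
            self.pending.popleft()

    def add_kernel(self):
        if self.since_marker == self.group:
            # The batch after the marker is full. Wait until the marker has
            # been reached, i.e. everything queued before it has executed.
            while "marker" in self.pending:
                self.sleeps += 1
                # Stand-in for a lengthy sleep: the device makes progress.
                self.device_run(max(1, self.group // 2))
            # The batch that followed the old marker now precedes a new one.
            self.pending.append("marker")
            self.since_marker = 0
        self.pending.append("kernel")
        self.since_marker += 1
        self.max_pending = max(self.max_pending, len(self.pending))
```

In this model the queue never grows beyond `2 * group + 1` entries, and the host only waits once a full batch is already queued behind the marker, so the simulated device always has work left while the host sleeps.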
@preda preda merged commit 4d0e759 into preda:master Dec 13, 2025
6 of 7 checks passed
@preda
Owner

preda commented Dec 13, 2025

Thank you for this great work!

2 participants