The autotuning turned up a missing range check in the radix sort kernel: SCATTER_SLICE * SCATTER_WORK_GROUP_SCALE must be strictly less than 256, as otherwise the reduced histogram can overflow, causing bizarre results. This was missed by the sorting tests, since they only use random data, which spreads the keys roughly evenly across all digit values. A good test would simply be to sort keys drawn from a reduced range, so that for the higher digits every key is certain to have the same digit.
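A rough sketch of what such a test and the missing check could look like, written in Python for illustration only. All function names here are hypothetical, the 4-bit radix is an assumption, and `sort_fn` stands in for whatever entry point the real test harness uses to invoke the sort.

```python
import random

LIMIT = 256  # bound from this ticket: the product must be strictly below it


def scatter_params_valid(scatter_slice, scatter_work_group_scale):
    """Range check described in the ticket (hypothetical helper name)."""
    return scatter_slice * scatter_work_group_scale < LIMIT


def reduced_range_keys(count, max_key, seed=0):
    """Generate keys in [0, max_key] so that all higher digits are zero."""
    rng = random.Random(seed)
    return [rng.randint(0, max_key) for _ in range(count)]


def digit(key, pass_index, radix_bits):
    """Extract the radix digit examined in the given pass (0 = lowest)."""
    return (key >> (pass_index * radix_bits)) & ((1 << radix_bits) - 1)


def check_sort(sort_fn, keys):
    """Compare the sort under test against Python's built-in sort."""
    assert sort_fn(list(keys)) == sorted(keys)


if __name__ == "__main__":
    radix_bits = 4  # assumed radix size, not taken from the ticket
    keys = reduced_range_keys(100000, (1 << radix_bits) - 1)
    # Every key shares digit 0 in all passes above the first, so a single
    # histogram bucket receives every element in those passes.
    assert all(digit(k, 1, radix_bits) == 0 for k in keys)
    check_sort(sorted, keys)  # replace `sorted` with the sort under test
```

With random full-range keys each bucket only receives a fraction of the elements, which is why the existing tests never pushed a single count to its maximum.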
Reported by: bmerry
Original Ticket: clogs/tickets/22