See discussion in NVIDIA/cub#294 and NVIDIA/cub#305. The same change should be applied to `cub::DeviceReduce::Reduce`.