Check if sufficient GPUs are available
The CUDA error message "Test CUDA failure util.cu:706 'invalid device ordinal'"
is not very helpful. Check for this explicitly and guide the user.
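A minimal sketch of such an explicit check, assuming the number of visible devices has already been queried (for example with `cudaGetDeviceCount`). The helper name `checkGpuCount` and the message wording are hypothetical and not taken from the actual change:

```cpp
#include <string>

// Hypothetical helper (not the code from this commit): returns an empty
// string when enough GPUs are visible, otherwise a message that tells the
// user what went wrong instead of a bare "invalid device ordinal".
// In a real program, visibleGpus would come from cudaGetDeviceCount().
std::string checkGpuCount(int visibleGpus, int requiredGpus) {
  if (requiredGpus <= visibleGpus) {
    return "";
  }
  return "Insufficient GPUs: " + std::to_string(requiredGpus) +
         " requested, but only " + std::to_string(visibleGpus) +
         " available. Request fewer GPUs or run on a node with more devices.";
}
```

Calling the helper before any device is selected lets the test exit early with a readable message rather than failing later inside a CUDA call.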
Fix compilation for old NCCL versions
Fix compilation failure on ctaPolicy with NCCL <= 2.26.
Fix compilation failure on local_register with NCCL <= 2.18.
Fix ctaPolicy behavior if the tests are compiled with NCCL <= 2.26
but run with NCCL >= 2.27.
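The mismatch between compile-time and run-time versions can be illustrated with a small sketch. It assumes the compile-time version comes from `NCCL_VERSION_CODE` and the run-time version from `ncclGetVersion`. The helper names `versionCode` and `ctaPolicyUsable` are hypothetical, and the rule shown is one plausible guard, not necessarily the behavior implemented by this fix:

```cpp
// Mirrors NCCL's version encoding for releases from 2.9 onward:
// major * 10000 + minor * 100 + patch (e.g. 2.27.3 -> 22703).
constexpr int versionCode(int major, int minor, int patch) {
  return major * 10000 + minor * 100 + patch;
}

// Hypothetical guard (an assumption, not the actual fix): treat the CTA
// policy as usable only when both the headers the tests were compiled
// against and the library loaded at run time are at least NCCL 2.27.
bool ctaPolicyUsable(int compiledVersion, int runtimeVersion) {
  constexpr int firstWithCtaPolicy = versionCode(2, 27, 0);
  return compiledVersion >= firstWithCtaPolicy &&
         runtimeVersion >= firstWithCtaPolicy;
}
```

With this guard, tests built against NCCL 2.26 but run with NCCL 2.27 would report the CTA policy as unusable instead of touching a field the build does not know about.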
Update to align with the NCCL 2.28 release
Added Device API infrastructure and example kernels
Two new command line arguments:
  -D <num> device kernel implementation to use <0/1/2/3/4>
  -V <num> number of CTAs to launch device kernels with
Added new CTA Policy command line option:
  -x <policy> set the CTA Policy <0/1/2>
Modified warmup to run for more message sizes
  Loops between minBytes and maxBytes doubling size each time
  Reduced default warmup iteration count to 1 (was 5)
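The warmup size progression described above can be sketched as follows. The function name `warmupSizes` is hypothetical, and the handling of a zero `minBytes` (emit it once and stop) is an assumption made so the loop terminates, not something stated in the commit:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: list the message sizes a warmup pass would visit,
// starting at minBytes and doubling until maxBytes is exceeded.
std::vector<std::size_t> warmupSizes(std::size_t minBytes, std::size_t maxBytes) {
  std::vector<std::size_t> sizes;
  std::size_t size = minBytes;
  while (size <= maxBytes) {
    sizes.push_back(size);
    // Stop when doubling is impossible (size == 0) or when the next size
    // would exceed maxBytes; the division form also avoids overflow.
    if (size == 0 || size > maxBytes / 2) {
      break;
    }
    size *= 2;
  }
  return sizes;
}
```

For example, `warmupSizes(8, 64)` yields 8, 16, 32 and 64, so with the default of one warmup iteration each of those sizes would be exercised once.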