chinz07/CUDA_permutations_large

CUDA_permutations_large

next_permutation for 13! and up

NOTE: Updated code! The implementation is now at least 30% faster on compute capability 3.5.

Two tables follow. The first shows the total GPU time for only generating all permutations of an n-element array in local memory. The second shows the total GPU time for generating each permutation, evaluating it, and performing a reduction/scan that saves the optimal answer and a permutation associated with that answer:

Generate All Permutations of Local Array timing table:

| Total elements | Number of permutations | Tesla K20c GPU time |
|---|---|---|
| 13 | 6,227,020,800 | 12.17s |
| 14 | 87,178,291,200 | 188.07s |
| 15 | 1,307,674,368,000 | 3115.0s |

Generate All Permutations of Local Array with full Evaluation of Permutation, Scan and Reduction table:

| Total elements | Num permutations x evaluation steps | Tesla K20c GPU time | Tesla K40c GPU time |
|---|---|---|---|
| 13 | 8,418,932,121,600 | 17.06s | 13.95s |
| 14 | 136,695,560,601,600 | 263.1s | 216.8s |
| 15 | 2,353,813,862,400,000 | 4332s | NA |
| 16 | 42,849,873,690,624,000 | NA | 62968s (17.49 hours) |

NOTE: The GPU is not overclocked; it runs at the stock 706 MHz.

No CPU times are shown because I do not have that much free time (a CPU run would take many hours, even in parallel).


This is an adjusted version of my CUDA implementation of the C++ standard library function std::next_permutation(). It generates all n! possible orderings of an array in local GPU memory. There are two versions. The first only generates the permutations of the array. The second evaluates each generated permutation, calculates the optimal answer AND a permutation responsible for that answer, caches them in GPU memory, reduces over all thread blocks, and returns the optimal answer and a corresponding optimal permutation to host memory.
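To make the "13! and up" range concrete, here is a minimal host-side C++ sketch. It is not taken from the repository: the names `factorial` and `kth_permutation` are hypothetical, and the factorial number system used here is only one common way to give each worker its own starting permutation. Note that 13! = 6,227,020,800 no longer fits in a 32-bit integer, so the counters are 64-bit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// n! as a 64-bit value. 20! is the largest factorial that fits in uint64_t.
std::uint64_t factorial(int n) {
    assert(n >= 0 && n <= 20);
    std::uint64_t f = 1;
    for (int i = 2; i <= n; ++i) f *= static_cast<std::uint64_t>(i);
    return f;
}

// Returns the permutation of {0, ..., n-1} whose 0-based lexicographic
// rank is `rank`, using the factorial number system.
std::vector<int> kth_permutation(int n, std::uint64_t rank) {
    assert(rank < factorial(n));
    std::vector<int> pool(static_cast<std::size_t>(n));
    std::iota(pool.begin(), pool.end(), 0);  // remaining unused elements
    std::vector<int> perm;
    perm.reserve(static_cast<std::size_t>(n));
    for (int i = n; i >= 1; --i) {
        const std::uint64_t block = factorial(i - 1);  // perms per leading element
        const std::size_t idx = static_cast<std::size_t>(rank / block);
        rank %= block;
        perm.push_back(pool[idx]);
        pool.erase(pool.begin() + static_cast<std::ptrdiff_t>(idx));
    }
    return perm;
}
```

A worker that owns ranks `[begin, end)` could call `kth_permutation(n, begin)` once and then advance with `std::next_permutation` for the rest of its range.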

I would be very interested in seeing a Python, Java, Ruby, C# or other 'higher level' language implementation of the same function, in particular any multithreaded CPU version.
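As a starting point for such a comparison, a multithreaded CPU version might look roughly like the C++ sketch below. It is an illustration under stated assumptions, not the repository's code: `Best`, `evaluate`, `kth_permutation`, and `search_all` are hypothetical names, and `evaluate` is a toy weighted-position sum standing in for a real evaluation. Each thread takes a contiguous range of lexicographic ranks, keeps its own best result, and the results are reduced after all threads join. On some toolchains it needs `-pthread` to link.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <numeric>
#include <thread>
#include <vector>

struct Best {
    long long value;
    std::vector<int> perm;
};

std::uint64_t factorial(int n) {
    std::uint64_t f = 1;
    for (int i = 2; i <= n; ++i) f *= static_cast<std::uint64_t>(i);
    return f;
}

// Permutation of {0..n-1} with the given 0-based lexicographic rank.
std::vector<int> kth_permutation(int n, std::uint64_t rank) {
    std::vector<int> pool(static_cast<std::size_t>(n)), perm;
    std::iota(pool.begin(), pool.end(), 0);
    for (int i = n; i >= 1; --i) {
        const std::uint64_t block = factorial(i - 1);
        const std::size_t idx = static_cast<std::size_t>(rank / block);
        rank %= block;
        perm.push_back(pool[idx]);
        pool.erase(pool.begin() + static_cast<std::ptrdiff_t>(idx));
    }
    return perm;
}

// Toy evaluation (assumption): sum of (position + 1) * weight of the element.
long long evaluate(const std::vector<int>& perm, const std::vector<long long>& w) {
    long long s = 0;
    for (std::size_t i = 0; i < perm.size(); ++i)
        s += static_cast<long long>(i + 1) * w[static_cast<std::size_t>(perm[i])];
    return s;
}

// Exhaustively searches all n! permutations for the maximum evaluation.
Best search_all(const std::vector<long long>& w, unsigned num_threads) {
    const int n = static_cast<int>(w.size());
    assert(n >= 1 && n <= 20 && num_threads >= 1);
    const std::uint64_t total = factorial(n);
    const std::uint64_t chunk = (total + num_threads - 1) / num_threads;
    const long long lowest = std::numeric_limits<long long>::min();
    std::vector<Best> partial(num_threads, Best{lowest, {}});
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::uint64_t begin = std::min<std::uint64_t>(total, t * chunk);
            const std::uint64_t end = std::min<std::uint64_t>(total, begin + chunk);
            if (begin >= end) return;  // more threads than permutations
            Best local{lowest, {}};
            std::vector<int> perm = kth_permutation(n, begin);
            for (std::uint64_t r = begin; r < end; ++r) {
                const long long v = evaluate(perm, w);
                if (v > local.value) local = Best{v, perm};
                std::next_permutation(perm.begin(), perm.end());
            }
            partial[t] = local;  // one write per thread, no sharing in the loop
        });
    }
    for (auto& th : workers) th.join();
    Best best = partial[0];  // thread 0 always has at least one permutation
    for (const Best& p : partial)
        if (p.value > best.value) best = p;
    return best;
}
```

For example, `search_all({3, 1, 2}, 4)` walks all 3! = 6 orderings across up to four threads. Exhaustive search like this only makes sense when the evaluation really does need every permutation.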

Note: the test evaluation uses a very simple max-DAG test, which can be solved faster than n! if one uses bitmasks for dependencies. This version is just for testing; there are other permutation problems that do need all permutations generated for evaluation. This code does that very quickly for a single GPU/CPU setup.

For a given value/cost data set associated with each index, more than one permutation may map to an optimal answer. In such a case the GPU version may return a different permutation than the CPU version, but the optimal value should be the same.
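If matching permutations between two implementations matters, one option (an assumption on my part here, not something this code does) is to break ties deterministically during the reduction, for example by preferring the lexicographically smaller permutation. The names `Candidate`, `better`, and `reduce_partials` in this C++ sketch are hypothetical.

```cpp
#include <cassert>
#include <vector>

struct Candidate {
    long long value;
    std::vector<int> perm;
};

// True if `a` should replace `b`: the higher value wins, and on equal
// values the lexicographically smaller permutation wins.
bool better(const Candidate& a, const Candidate& b) {
    if (a.value != b.value) return a.value > b.value;
    return a.perm < b.perm;
}

// Reduces per-thread (or per-block) partial results to a single winner.
// The result does not depend on the order of `partials`.
Candidate reduce_partials(const std::vector<Candidate>& partials) {
    assert(!partials.empty());
    Candidate best = partials.front();
    for (const Candidate& c : partials)
        if (better(c, best)) best = c;
    return best;
}
```

With this rule, two runs that visit permutations in different orders still agree on the reported permutation as well as the value, provided each partial result was selected with the same rule.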

For the earlier version, see my other CUDA_next_permutation project. The full evaluation version only works on GPUs of compute capability 3.0 or higher (GTX 660 or better). It performs better on the Tesla line (or Titan) due to the higher number of 64-bit double-precision units.
