@CNugteren
Owner

Version 0.8.0

  • Added support for half-precision floating-point (fp16) in the library
  • Made it possible to compile the performance tests (clients) separately from the correctness tests
  • Made a reference BLAS and head-to-head performance comparison optional in the clients
  • Increased the verbosity of the "-verbose" option in the correctness tests
  • Refactored the host code for better compilation times and fewer lines of code
  • Added Appveyor continuous integration and increased coverage of the Travis builds
  • Improved the API documentation
  • Various minor fixes and enhancements
  • Added tuned parameters for various devices (see README)
  • Added half-precision routines:
    • Level-1: HSWAP/HSCAL/HCOPY/HAXPY/HDOT/HNRM2/HASUM/HSUM/iHAMAX/iHMAX/iHMIN
    • Level-2: HGEMV/HGBMV/HHEMV/HHBMV/HHPMV/HSYMV/HSBMV/HSPMV/HTRMV/HTBMV/HTPMV/HGER/HSYR/HSPR/HSYR2/HSPR2
    • Level-3: HGEMM/HSYMM/HSYRK/HSYR2K/HTRMM
  • Added non-BLAS routines:
    • SOMATCOPY/DOMATCOPY/COMATCOPY/ZOMATCOPY/HOMATCOPY (matrix copy, scaling, and/or transpose)
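The OMATCOPY routines are described above only as "matrix copy, scaling, and/or transpose". As a rough illustration, here is a minimal host-side sketch of that operation, assuming the conventional out-of-place semantics B = alpha * op(A), where op is either the identity or a transpose. The function name `omatcopy_reference` and its signature are hypothetical and are not part of the library's API. The sketch also assumes a densely packed row-major layout and leaves out leading dimensions, offsets, and conjugation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not the library's API): computes
// B = alpha * op(A) on the host for a row-major rows x cols matrix A.
// With transpose == true, B is cols x rows; otherwise B is rows x cols.
std::vector<float> omatcopy_reference(std::size_t rows, std::size_t cols, float alpha,
                                      const std::vector<float>& a, bool transpose) {
  assert(a.size() == rows * cols);  // A is assumed to be densely packed
  std::vector<float> b(rows * cols);
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t j = 0; j < cols; ++j) {
      // Element (i, j) of A lands at (j, i) of B when transposing.
      const std::size_t dst = transpose ? j * rows + i : i * cols + j;
      b[dst] = alpha * a[i * cols + j];
    }
  }
  return b;
}
```

A helper of this kind could serve as a CPU reference when checking the output of a device-side matrix copy on small inputs.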

CNugteren added 30 commits May 12, 2016 19:56
…ossible to transfer half-precision values as well
CNugteren added 29 commits June 17, 2016 11:41
Refactoring of the Routine class and file-renaming
…tests, now printing when a library is called
…uccesfull error-code checks in the correctness tests
@CNugteren CNugteren merged commit 7c13bac into master Jun 28, 2016
