Releases: flame/blis
BLIS 2.0
This release provides major new functionality in the core BLIS framework, along with many other bugfixes and small changes.
Improvements present in 2.0 (June 25, 2025):
Known Issues:
- There is a performance regression in the `ztrmm` and `ztrsm` operations. On the Ampere Altra, performance is impacted by up to 30%; it is currently unknown if and how much this bug affects other architectures, but the effect should be much smaller in most cases.
Framework:
- BLIS now supports "plugins", which provide additional functionality through user-defined kernels, blocksizes, and kernel preferences. Users can use an installed copy of BLIS (even a binary-only distribution) to create a plugin outside of the BLIS source tree. User-written reference kernels can then be registered into BLIS, and are compiled by the BLIS build system for all configured architectures. This also means that user-provided kernels participate in run-time kernel selection based on the actual hardware used. Additionally, users can provide and register optimized kernels for specific architectures, which are automatically selected as appropriate. See `docs/PluginHowTo.md` for more information.
- A new API has been added which allows users to modify the default "control tree". This data structure defines the specific algorithmic steps used to implement a level-3 BLAS operation such as `gemm` or `syrk`. Users can start with a predefined control tree for one of the level-3 BLAS operations (except `trsm` currently) and then modify it to produce a custom operation. Users can change kernels for packing and computation, associated blocksizes, and provide additional information (such as external parameters or additional data) which is passed directly to the kernels. See `docs/PluginHowTo.md` for more information and a working example.
- All level-3 BLAS operations (except `trsm`) now support full mixed-precision mixed-domain computation. The A, B, and C matrices, as well as the alpha and beta scalars, may be provided in any of the supported data types (single/double precision and real/complex domain, currently), and an additionally-provided computational precision controls how the computation is actually performed internally. The computational precision can be set on the `obj_t` structure representing the C matrix.
- Added a `func2_t` struct for dealing with 2-type kernels (see below). A `func2_t` can be safely cast to `func_t` to refer to only kernels with equal type parameters. (Devin Matthews)
- The `bli_*_front` functions have been removed.
- Extensive other back-end changes and improvements.
- A new "level-0" macro back-end has been implemented. These macros form the basic language for implementing reference kernels and for enabling correct mixed-type computation. The new back-end specifically supports full data-type flexibility, including the "computational" data-type (e.g. input/output in double, compute in single), as well as fully correct mixed-domain computation and safe in-place usage of operations such as `scal2v`. A dedicated testsuite (C++17 required) has also been added for this layer. A number of legacy macros have been retained as wrappers so that current code (e.g. optimized kernels) is not affected.
- Fixed a lurking bug in `bli_obj_imag_part` which would have caused the base address to be computed incorrectly for sub-matrix objects.
- Users can now force the use of a particular configuration at runtime using `BLIS_ARCH_TYPE=<name>`, where `<name>` is one of the configured sub-configurations (check the output of `configure` for options). This functionality existed previously, but only using numeric configuration IDs, which are undocumented.
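The idea of a computational precision that differs from the storage precision (e.g. input/output in double, compute in single) can be illustrated without the BLIS API. The following is a minimal sketch, assuming column-major storage; `gemm_d_compute_s` is a hypothetical helper written for this illustration and is not a BLIS function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of BLIS): C := beta*C + alpha*A*B, where
   all operands are stored as double (column-major), but products are
   formed and accumulated in single precision. */
void gemm_d_compute_s(size_t m, size_t n, size_t k,
                      double alpha, const double *a, size_t lda,
                      const double *b, size_t ldb,
                      double beta, double *c, size_t ldc)
{
    assert(lda >= m && ldb >= k && ldc >= m);
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            float acc = 0.0f;                       /* computational precision */
            for (size_t p = 0; p < k; ++p) {
                float aip = (float)a[i + p * lda];  /* demote the inputs */
                float bpj = (float)b[p + j * ldb];
                acc += aip * bpj;
            }
            /* Promote the result back to the storage precision of C. */
            c[i + j * ldc] = beta * c[i + j * ldc] + alpha * (double)acc;
        }
    }
}
```

In BLIS itself the computational precision is a property set on the `obj_t` for C rather than a separate routine; the sketch only shows where the demotion and promotion would conceptually occur.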
Compatibility:
- Added a ScaLAPACK compatibility mode which disables some conflicting BLAS definitions. (Field Van Zee)
- Fixed issues with improperly escaped strings in python scripts for compatibility with python 3.12+. (@AngryLoki)
- Added a user-defined macro `BLIS_ENABLE_STD_COMPLEX` which uses `std::complex` typedefs in `blis.h` for C++ code. (Devin Matthews)
- Fixed a bug in the definition of some scalar level-0 macros affecting compatibility of `bli_creal` and `bli_zreal`, for example. (Devin Matthews)
- Fixed improperly-quoted strings in Python scripts which affected compatibility with Python 3.12+. (@AngryLoki)
- The static initializer macros (`BLIS_*_INITIALIZER`) have been fixed for compatibility with C++. (Devin Matthews)
- Install "helper" `blis.h` and `cblas.h` headers directly to `INCDIR` (in addition to the full files in `INCDIR/blis`). (Field Van Zee, Jed Brown, Mo Zhou)
- `gemmtr` aliases for the `gemmt` BLAS and CBLAS compatibility functions have been added to support recent versions of LAPACK. (Mo Zhou)
Kernels:
- Fixed an out-of-bounds read bug in the `haswell` `gemmsup` kernels. (John Mather)
- Fixed a bug in the complex-domain `gemm` kernels for `piledriver`. (@rmast)
- Kernel, blocksizes, and preference lookup functions now use `siz_t` rather than specific enums. (Devin Matthews)
- Fixed some issues with run-time kernel detection and added more ARM part numbers/manufacturer codes. (John Mather)
- Kernels can now be added which have two datatype parameters. Kernel IDs are assigned such that 1-type and 2-type kernels cannot be interchanged accidentally. (Devin Matthews)
- The packing microkernels and computational microkernels (`gemm` and `gemmtrsm`) now receive offsets into the global matrix. The latter are passed via the `auxinfo_t` struct. (Devin Matthews)
- The separate "MRxk" and "NRxk" packing kernels have been merged into one generic packing kernel. Packing kernels are now expected to pack any size micropanel, but may optimize for specific shapes. (Devin Matthews)
- Added explicit packing kernels for diagonal portions of matrices, and for certain mixed-domain/1m cases. (Devin Matthews)
- Improved support for duplication during packing ("broadcast-B") across all packing kernels.
- Some bugs with mixed-precision/mixed-domain operations on certain architectures (esp. AVX512) have been fixed.
- Fixed bug affecting reference kernels with clang 14.
- Fixed a problem affecting row/column strides of exactly -1 with `gemm1m`.
- Fixed an incompatibility between the `haswell` `gemmsup` kernels and gcc 15. (Dave Love, Christopher Hillenbrand)
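To illustrate what "pack any size micropanel" can mean in practice, here is a minimal sketch of a generic packing routine. The name `pack_panel`, its argument order, and the zero-padding of unused rows are assumptions made for this illustration; they do not reproduce the actual BLIS packing kernel signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic packing routine: copy an m x k block of A
   (row stride rs_a, column stride cs_a) into a contiguous micropanel
   holding panel_dim rows per column, zero-filling rows m..panel_dim-1. */
void pack_panel(size_t m, size_t k, size_t panel_dim,
                const double *a, ptrdiff_t rs_a, ptrdiff_t cs_a,
                double *p)
{
    assert(m <= panel_dim);
    for (size_t j = 0; j < k; ++j) {
        for (size_t i = 0; i < m; ++i)
            p[i + j * panel_dim] = a[(ptrdiff_t)i * rs_a + (ptrdiff_t)j * cs_a];
        for (size_t i = m; i < panel_dim; ++i)
            p[i + j * panel_dim] = 0.0;  /* pad the edge of the micropanel */
    }
}
```

A routine of this shape works for any `panel_dim`; an optimized kernel could still special-case particular shapes, as the note about optimizing for specific shapes suggests.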
Build system:
- The `cblas.h` file is now "flattened" immediately after `blis.h` is (if enabled), rather than later in the build process. (Jeff Diamond, Field Van Zee)
- Added script to help with preparing release candidate branches. (Field Van Zee)
- The configure script has been overhauled. In particular, using spaces in `CC`/`CXX` is now supported. (Devin Matthews)
- Improved support for C++ source files in BLIS or in plugins. (Devin Matthews)
- Disabled `armsve` on Windows due to build failures. (Hernan Martinez, Atsushi Tatsuma)
- Added integer `BLIS_VERSION_{MAJOR,MINOR,REVISION}` macros to `blis.h` so that users can check BLIS version compatibility through the C preprocessor.
- Moved `#include <omp.h>` from `blis.h` to the relevant source files. (Melven Roehrig-Zoellner)
- Disabled building KNL with gcc 15. (Dave Love)
- Improved support for the Intel and NVIDIA Fortran compilers (ifx and nvfortran), particularly in terms of selecting the correct method for returning complex numbers. (Jeff Hammond)
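A preprocessor version check using the integer version macros might look like the following sketch. The fallback definitions exist only so that the example compiles without `blis.h`; `MY_BLIS_AT_LEAST` and `have_blis_plugins` are hypothetical names introduced here, not part of BLIS.

```c
#include <assert.h>

/* In real code these macros come from blis.h. The fallback values below
   are placeholders so that this sketch compiles on its own. */
#ifndef BLIS_VERSION_MAJOR
#define BLIS_VERSION_MAJOR    2
#define BLIS_VERSION_MINOR    0
#define BLIS_VERSION_REVISION 0
#endif

/* The macros are integers, so they can be used in constant expressions. */
static_assert(BLIS_VERSION_MAJOR >= 0, "BLIS_VERSION_MAJOR must be an integer");

/* Hypothetical convenience macro: true if BLIS is at least maj.min. */
#define MY_BLIS_AT_LEAST(maj, min) \
    (BLIS_VERSION_MAJOR > (maj) || \
     (BLIS_VERSION_MAJOR == (maj) && BLIS_VERSION_MINOR >= (min)))

/* Returns 1 when features introduced in 2.0 (such as plugins) can be
   assumed to be available, and 0 otherwise. */
int have_blis_plugins(void)
{
#if MY_BLIS_AT_LEAST(2, 0)
    return 1;
#else
    return 0;
#endif
}
```

Because the comparison happens in the preprocessor, code paths that depend on newer functionality can be excluded entirely when building against an older BLIS.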
Testing:
- test/3 drivers now allow using the "default" induced method, rather than forcing native or 1m operation. (Field Van Zee, Leick Robinson)
- Fix some segfaults in the test/3 drivers. (Field Van Zee, Leick Robinson)
- The testsuite now tests all possible type combinations when requested. (Devin Matthews)
- Improved detection of problems in `make check-blis` and related targets. (Devin Matthews)
- CI testing infrastructure has moved to CircleCI.
Documentation:
- Added documentation for the new plugin system and for creating custom operations by modifying the BLIS control tree. (Devin Matthews)
- Updated documentation for downloading BLIS in `README.md` and instructions for maintainers in `RELEASING`. (Field Van Zee)
- Widened print format in code examples to avoid misinterpretation of results. (Minh Quan Ho, Mason McBride)
BLIS 1.2
This release contains several new features and optimizations related to threaded execution, as well as internal changes that improve maintainability and lay the groundwork for future refactoring. The build system and kernel sets saw lots of new code and tweaks to old code, and of course there were many bugfixes.
Improvements present in 1.2 (June 25, 2025):
Compatibility:
- `gemmtr` aliases for the `gemmt` BLAS and CBLAS compatibility functions have been added to support recent versions of LAPACK. (Mo Zhou)
Kernels:
- Fixed bug affecting reference kernels with clang 14.
- Fixed an incompatibility between the `haswell` `gemmsup` kernels and gcc 15. (Dave Love, Christopher Hillenbrand)
Build system:
- Disabled `armsve` on Windows due to build failures. (Hernan Martinez, Atsushi Tatsuma)
- Moved `#include <omp.h>` from `blis.h` to the relevant source files. (Melven Roehrig-Zoellner)
- Disabled building KNL with gcc 15. (Dave Love)
Testing:
- CI testing infrastructure has moved to CircleCI.
Documentation:
- Widened print format in code examples to avoid misinterpretation of results. (Minh Quan Ho, Mason McBride)
Improvements present in 1.1 (January 15, 2025):
Compatibility:
- Added a ScaLAPACK compatibility mode which disables some conflicting BLAS definitions.
- Fixed issues with improperly escaped strings in python scripts for compatibility with python 3.12+. (@AngryLoki)
Kernels:
- Fixed an out-of-bounds read bug in the `haswell` `gemmsup` kernels. (John Mather)
- Fixed a bug in the complex-domain `gemm` kernels for `piledriver`. (@rmast)
Improvements present in 1.0 (May 6, 2024):
Framework:
- Initialize/finalize BLIS via a new `bli_pthread_switch_t` API. (Field Van Zee, Devin Matthews)
- Revamped `bli_init()` to use TLS where feasible. (Field Van Zee, Edward Smyth, Minh Quan Ho)
- Implemented support for fat multithreading.
- Implemented tile-level load balancing (tlb), or tile-level partitioning, in jr/ir loops for `gemm`, `gemmt`, and `trmm` macrokernels. (Field Van Zee, Devin Matthews, Leick Robinson, Minh Quan Ho)
- Added padding to `thrcomm_t` fields to avoid false sharing of cache lines. (Leick Robinson)
- Rewrote/fixed broken tree barrier implementation. (Leick Robinson)
- Refactored some `rntm_t` management code. (Field Van Zee, Devin Matthews)
- Initialize `rntm_t` nt/ways fields with 1 (not -1). (Field Van Zee, Jeff Diamond, Leick Robinson, Devin Matthews)
- Defined `invscalv`, `invscalm`, `invscald` operations.
- Added consistent `NaN`/`Inf` handling in `sumsqv`. (Devin Matthews)
- Implemented support for HPX as a threading backend option. (Christopher Taylor, Srinivas Yadav)
- Relocated the pba, sba pool (from the `rntm_t`), and `mem_t` (from the `cntl_t`) to the `thrinfo_t` object.
- Modified which communicator is associated with a given node of the `thrinfo_t` tree. (Devin Matthews)
- Refactored level-3 thread decorator into two parts: a thread launcher and a function to pass operands. (Devin Matthews)
- Refactored structure awareness in `bli_packm_blk_var1.c`. (Devin Matthews)
- Reimplemented `bli_l3_determine_kc()`. (Devin Matthews)
- Implemented `cntx_t` pointer caching in gks. (Field Van Zee, Harihara Sudhan S)
- Added `const` keyword to pointers in kernel APIs. (Field Van Zee, Nisanth M P)
- Migrated all kernel APIs to use `void*` pointers.
- Defined new global scalar constants: `BLIS_ONE_I`, `BLIS_MINUS_ONE_I`, `BLIS_NAN`. (Devin Matthews)
- Disabled modification of KC in the `gemmsup` kernels. (Devin Matthews)
- Defined `lt`, `lte`, `gt`, `gte` operations and other miscellaneous updates.
- Consolidated `INSERT_` macro sets via variadic macros. (Devin Matthews)
- De-templatized macrokernels for `gemmt`, `trmm`, and `trsm` to match that of `gemm`. (Devin Matthews)
- De-templatized `bli_l3_sup_var1n2m.c` and unified `_sup_packm_a/b()`. (Devin Matthews)
- Fixed 1m enablement for `herk`/`her2k`/`syrk`/`syr2k`. (Devin Matthews)
- Fixed `trmm[3]`/`trsm` performance bug introduced in `cf7d616`. (Field Van Zee, Leick Robinson)
- Fixed a 1m optimization bug in right-sided `hemm`/`symm`. (Field Van Zee, Nisanth M P)
- Fixed a bug in sup threshold registration. (Devin Matthews, Field Van Zee)
- Fixed brokenness in the small block allocator (sba) when the sba is disabled. (Field Van Zee, John Mather)
- Fixed type bug in `bli_cntx_set_ukr_prefs()`. (Field Van Zee, Leick Robinson, Devin Matthews, Jeff Diamond)
- Fixed incorrect `sizeof(type)` in edge case macros. (@moon-chilled)
- Fixed bugs and added sanity check in `bli_pool.c`. (Devin Matthews)
- Fixed a typo in the macro definition for `VEXTRACTF64X2` in `bli_x86_asm_macros.h`. (Harsh Dave)
- Fixed a typo in `bli_type_defs.h` where `BLIS_BLAS_INT_TYPE_SIZE` was misspelled. (Devin Matthews)
- Typecast `printf()` args in `bli_thread_range_tlb.c` to avoid compiler warnings. (Lee Killough)
- Minor tweaks to `bli_l3_check.c`.
- Partial addition of `const` to all interfaces above the (micro)kernels. (Devin Matthews)
- Fixed a harmless misspelling of `xpbys` in gemm macrokernel.
- Various internal API renaming/reorganization.
- Various other fixes.
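The basic idea behind tile-level partitioning can be sketched as follows: instead of assigning whole rows or columns of microtiles to each thread, the full set of tiles is divided as evenly as possible. This is only an illustrative sketch; `tile_range` is a hypothetical helper, and the actual BLIS implementation (which must also account for the triangular structure in `gemmt` and `trmm`) is more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: split n_tiles microtiles among nt threads so that
   per-thread counts differ by at most one. Thread tid receives the
   half-open range [*start, *end). */
void tile_range(size_t n_tiles, size_t nt, size_t tid,
                size_t *start, size_t *end)
{
    assert(nt > 0 && tid < nt);
    size_t base  = n_tiles / nt;  /* tiles that every thread receives      */
    size_t extra = n_tiles % nt;  /* the first 'extra' threads get one more */
    *start = tid * base + (tid < extra ? tid : extra);
    *end   = *start + base + (tid < extra ? 1 : 0);
}
```

With 10 tiles and 4 threads, for example, the threads receive 3, 3, 2, and 2 tiles respectively, so no thread is left with more than one tile of extra work.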
Compatibility:
- Implemented `[cz]symv_()`, `[cz]syr_()`, `[cz]rot_()`. (Field Van Zee, James Foster)
- Fixed compilation errors when `BLIS_DISABLE_BLAS_DEFS` is defined. (Field Van Zee, Edward Smyth, Devin Matthews)
- Include `bli_config.h` before `bli_system.h` in `cblas.h` so that `BLIS_ENABLE_SYSTEM` is defined in time for proper OS detection. (Edward Smyth)
Kernels:
- Updated ARMv8a kernels to fix two prefetching issues and re-enable general stride IO. (Jeff Diamond)
- Restored general storage case to `armsve` kernels. (RuQing Xu)
- Added arm64 `dgemmsup` with extended MR and NR. (RuQing Xu)
- Reorganized the way `packm` kernels are stored within the `cntx_t` so that BLIS only stores two `packm` kernels per datatype: one for MRxk upanels and one for kxNR upanels. (Devin Matthews)
- Fixed bugs in `scal2v` reference kernel when alpha == 1.
- Fixed out-of-bounds read in `haswell` `gemmsup` kernels. (Daniël de Kok, Bhaskar Nallani, Madeesh Kannan)
- Fixed k = 0 edge case in `power10` microkernels. (Nisanth M P)
- Disabled `power10` kernels other than `sgemm`, `dgemm`. (Nisanth M P)
- Fixed `bli_gemm_small()` prototype mismatch. (Jeff Diamond)
Extras:
- Use the conventional level-3 sup thread decorator within the `gemmlike` sandbox.
- Fixed type-mismatch errors in `power10` sandbox. (Nisanth M P)
- Fixed `gemmlike` sandbox bug that stems from reuse of `bli_thrinfo_sup_grow()`.
Build system:
- Added two arm64 subconfigs: `altra` and `altramax`. (Jeff Diamond, Leick Robinson)
- Added support for RISC-V configuration targets. (Angelika Schwarz, Lee Killough)
- Auto-detect the RISC-V ABI of the compiler and use `-mabi=` during RISC-V builds. (Lee Killough)
- Added `sifive_x280` subconfig and kernel set. (Aaron Hutchinson, Lee Killough, Devin Matthews, and Angelika Schwarz)
- Added AddressSanitizer (`--enable-asan`) option to `configure`. (Devin Matthews)
- Added option to disable thread-local storage via `--disable-tls`. (Field Van Zee, Nick Knight)
- Exclude `-lrt` on Android with Bionic libraries. (Lee Killough)
- Omit `-fPIC` option when shared library build is disabled. (Field Van Zee, Nick Knight)
- Move `-fPIC` option insertion to subconfigs' `make_defs.mk` files. (Field Van Zee, Nick Knight)
- Install one-line helper headers to `INCDIR` prefix so that users can `#include "blis.h"` instead of `#include <blis/blis.h>` (and/or `"cblas.h"` instead of `<blis/cblas.h>` if CBLAS is enabled). (Field Van Zee, Jed Brown, Devin Matthews, Mo Zhou)
- Enhanced detection of Fortran compiler when checking the version string for the purposes of determining a default return convention for complex domain values. (Bart Oldeman)
- Added detection of the NVIDIA nvhpc compiler (`nvc`) in `configure`. (Ajay Panyala)
- Updated `zen3` subconfig to support NVHPC compilers. (Abhishek Bagusetty)
- Use kernel CFLAGS for `kernels` subdirs in addons. (AMD, Mithun Mohan)
- Created `power` umbrella configuration family (which currently includes `power9` and `power10` subconfigs). (Nisanth M P)
- Defined `BLIS_VERSION_STRING` in `blis.h` instead of via command line argument during compilation. (Field Van Zee, Mohsen Aznaveh, Tim Davis)
- Rewrote `regen-symbols.sh` as `gen-libblis-symbols.sh`. (Field Van Zee)
- Support `clang` targeting MinGW. (Isuru Fernando)
- Added autodetection (via `/proc/cpuinfo`) for POWER7, POWER9 and POWER10 microarchitectures. (Alexander Grund)
- Added `#line` directives to flattened `blis.h` to facilitate easier debugging. (Devin Matthews)
- Added `--nosup` and `--sup` shorthand options to `configure`.
- Use here-document syntax for `configure --help` output. (Lee Killough)
- Updated `configure` to pass all `shellcheck` checks. (Lee Killough)
- Tweaks to `.dir-locals.el` to enhance emacs formatting of C files. (Lee Killough)
- Removed buggy cruft from `power10` subconfig. (Field Van Zee, Nicholai Tukanov)
- Added missing `#include <io.h>` for Windows. (@h-vetinari)
- Fixed hardware auto-detection for `firestorm` (Apple M1) subconfig. (Devin Matthews)
- Fixed bug in detection of Fortran compiler vendor. (Devin Matthews)
- Fixed version check for `znver3`, which needs gcc >= 10.3. (Jed Brown)
- Fixed typo in `configure --help` text. (Lee Killough)
- Fixed warning about regular expressions with stray backslashes as the result of recent changes to `grep`.
- Added `output.testsuite` to `.gitignore`.
- Minor changes to .gitignore and LICENSE files. (Jeff Diamond)
- Minor decluttering of top-level directory.
- Very minor tweaks to common.mk.
Testing:
- Rewrote `test/3` drivers to take parameters via command line arguments. (Field Van Zee, Jeff Diamond, Leick Robinson)
- Added `arm64` entry to `.travis.yml` so that T...
BLIS 0.9.0
This release contains a slew of improvements, new kernels and APIs, bugfixes, and more (including lots of code reduction). It also contains foundational support for an exciting new class of expert functionality: creating new operations without the need to duplicate the middleware that sits between the API and kernels.
Improvements present in 0.9.0:
Framework:
- Added various fields to `obj_t` that relate to storing function pointers to custom `packm` kernels, microkernels, etc., as well as accessor functions to set and query those fields. (Devin Matthews)
- Enabled user-customized `packm` microkernels and variants via the aforementioned new `obj_t` fields. (Devin Matthews)
- Moved edge-case handling out of the macrokernel and into the `gemm` and `gemmtrsm` microkernels. This also required updating of APIs and definitions of all existing microkernels in the `kernels` directory. Edge-case handling functionality is now facilitated via new preprocessor macros found in `bli_edge_case_macro_defs.h`. (Devin Matthews)
- Avoid `gemmsup` thread barriers when not packing A or B. This boosts performance for many small multithreaded problems. (Field Van Zee, AMD)
- Allow the 1m method to operate normally when single and double real-domain microkernels mix row and column I/O preference. (Field Van Zee, Devin Matthews, RuQing Xu)
- Removed support for execution of complex-domain level-3 operations via the 3m and 4m methods.
- Refactored `herk`, `her2k`, `syrk`, `syr2k` in terms of `gemmt`. (Devin Matthews)
- Defined `setijv` and `getijv` to set/get vector elements.
- Defined `eqsc`, `eqv`, and `eqm` operations to test equality between two scalars, vectors, or matrices.
- Added new bounds checking to `setijm` and `getijm` to prevent use of negative indices.
- Renamed `membrk` files/variables/functions to `pba`.
- Store error-checking level as a thread-local variable. (Devin Matthews)
- Add `err_t*` "return" parameter to `bli_malloc_*()` and friends.
- Switched internal mutexes of the `sba` and `pba` to static initialization.
- Changed return value method of `bli_pack_get_pack_a()`, `bli_pack_get_pack_b()`.
- Fixed a bug so that `bli_init()` can be called more than once without segfaulting. (@lschork2, Minh Quan Ho, Devin Matthews)
- Removed a sanity check in `bli_pool_finalize()` that prevented BLIS from being re-initialized. (AMD)
- Fixed insufficient `pool_t`-growing logic in `bli_pool.c`, and always allocate at least one element in `.block_ptrs` array. (Minh Quan Ho)
- Cleanups related to the error message array in `bli_error.c`. (Minh Quan Ho)
- Moved language-related definitions from `bli_macro_defs.h` to a new header, `bli_lang_defs.h`.
- Renamed `BLIS_SIMD_NUM_REGISTERS` to `BLIS_SIMD_MAX_NUM_REGISTERS` and `BLIS_SIMD_SIZE` to `BLIS_SIMD_MAX_SIZE` for improved clarity. (Devin Matthews)
- Many minor bugfixes.
- Many cleanups, including removal of old and commented-out code.
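One way a microkernel can handle edge cases itself is to compute a full register-block-sized product into a temporary tile and then write back only the part that exists in C. The following is a minimal sketch of that idea under stated assumptions: the block sizes, the packed layouts, and the name `gemm_ukr_with_edge` are all hypothetical, and the actual macros in `bli_edge_case_macro_defs.h` may work differently.

```c
#include <assert.h>
#include <stddef.h>

enum { MR = 4, NR = 4 };  /* illustrative register block sizes */

/* Hypothetical reference microkernel: C(0:m,0:n) += A*B for one microtile,
   where m <= MR and n <= NR. A is packed MR x k (column by column) and B is
   packed k x NR (row by row). The full MR x NR product is computed into a
   temporary tile, and only the m x n portion present in C is written back. */
void gemm_ukr_with_edge(size_t m, size_t n, size_t k,
                        const double *a, const double *b,
                        double *c, size_t rs_c, size_t cs_c)
{
    assert(m <= MR && n <= NR);
    double ab[MR * NR] = {0.0};

    for (size_t p = 0; p < k; ++p)
        for (size_t j = 0; j < NR; ++j)
            for (size_t i = 0; i < MR; ++i)
                ab[i + j * MR] += a[i + p * MR] * b[j + p * NR];

    /* Edge-case handling: touch only the valid part of C. */
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i)
            c[i * rs_c + j * cs_c] += ab[i + j * MR];
}
```

With this arrangement the calling macrokernel no longer needs its own temporary buffer for partial tiles; it simply passes the actual `m` and `n` to the microkernel.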
Compatibility:
- Expanded BLAS layer to include support for `?axpby_()` and `?gemm_batch_()`. (Meghana Vankadari, AMD)
- Added `gemm3m` APIs to BLAS and CBLAS layers. (Bhaskar Nallani, AMD)
- Handle `?gemm_()` invocations where m or n is unit by calling `?gemv_()`. (Dipal M Zambare, AMD)
- Removed option to finalize BLIS after every BLAS call.
- Updated default definitions of `bli_slamch()` and `bli_dlamch()` to use constants from standard C library rather than values computed at runtime. (Devin Matthews)
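As a rough illustration of answering `lamch`-style queries from standard library constants instead of computing them at runtime, consider the sketch below. The function `my_slamch` is hypothetical, and the mapping of query characters to `<float.h>` constants is an assumption made for this example; it may differ from what `bli_slamch()` actually returns.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical lamch-style query for single precision, answered entirely
   from <float.h> constants. The character-to-constant mapping is an
   assumption for illustration only. */
float my_slamch(char cmach)
{
    switch (cmach) {
    case 'E': case 'e': return FLT_EPSILON * 0.5f;   /* relative machine epsilon */
    case 'S': case 's': return FLT_MIN;              /* smallest normal number   */
    case 'B': case 'b': return (float)FLT_RADIX;     /* base of the machine      */
    case 'N': case 'n': return (float)FLT_MANT_DIG;  /* digits in the mantissa   */
    case 'O': case 'o': return FLT_MAX;              /* overflow threshold       */
    default:
        assert(!"unsupported query character");
        return 0.0f;
    }
}
```

Since every value is a compile-time constant, there is no start-up cost and no dependence on how the compiler optimizes a runtime probing loop.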
Kernels:
- Added 512-bit SVE-based `a64fx` subconfiguration that uses empirically-tuned blocksizes. (Stepan Nassyr, RuQing Xu)
- Added a vector-length agnostic `armsve` subconfig that computes blocksizes via an analytical model. (Stepan Nassyr)
- Added vector-length agnostic d/s/sh `gemm` kernels for Arm SVE. (Stepan Nassyr)
- Added `gemmsup` kernels to the `armv8a` kernel set for use in new Apple Firestorm subconfiguration. (RuQing Xu)
- Added 512-bit SVE `dpackm` kernels (16xk and 10xk) with in-register transpose. (RuQing Xu)
- Extended 256-bit SVE `dpackm` kernels by Linaro Ltd. to 512-bit for size 12xk. (RuQing Xu)
- Reorganized register usage in `bli_gemm_armv8a_asm_d6x8.c` to accommodate clang. (RuQing Xu)
- Added `saxpyf`/`daxpyf`/`caxpyf` kernels to `zen` kernel set. (Dipal M Zambare, AMD)
- Added `vzeroupper` instruction to `haswell` microkernels. (Devin Matthews)
- Added explicit `beta == 0` handling in s/d `armsve` and `armv7a` `gemm` microkernels. (Devin Matthews)
- Added a unique tag to branch labels to accommodate clang. (Devin Matthews, Jeff Hammond)
- Fixed a copy-paste bug in the loading of `kappa_i` in the two assembly `cpackm` kernels in `haswell` kernel set. (Devin Matthews)
- Fixed a bug in Mx1 `gemmsup` `haswell` kernels whereby the `vhaddpd` instruction is used with uninitialized registers. (Devin Matthews)
- Fixed a bug in the `power10` microkernel I/O. (Nicholai Tukanov)
- Many other Arm kernel updates and fixes. (RuQing Xu)
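Explicit `beta == 0` handling matters because, by the usual BLAS convention, C should be overwritten without being read when beta is zero, so that uninitialized or NaN/Inf values in C do not leak into the result. A minimal sketch of the update step follows; `update_c` is a hypothetical helper, not code taken from any BLIS microkernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical update step of a microkernel: c[i] := beta*c[i] + ab[i].
   When beta is exactly zero, C is written but never read. */
void update_c(size_t n, double beta, double *c, const double *ab)
{
    assert(c != NULL && ab != NULL);
    if (beta == 0.0) {
        for (size_t i = 0; i < n; ++i)
            c[i] = ab[i];                    /* ignore the old contents of C */
    } else {
        for (size_t i = 0; i < n; ++i)
            c[i] = beta * c[i] + ab[i];
    }
}
```

Without the special case, `0.0 * NaN` would evaluate to NaN and corrupt the output even though the caller asked for C to be overwritten.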
Extras:
- Added support for addons, which are similar to sandboxes but do not require the user to implement any particular operation.
- Added a new `gemmlike` sandbox to allow rapid prototyping of `gemm`-like operations.
- Various updates and improvements to the `power10` sandbox, including a new testsuite. (Nicholai Tukanov)
Build system:
- Added explicit support for AMD's Zen3 microarchitecture. (Dipal M Zambare, AMD, Field Van Zee)
- Added runtime microarchitecture detection for Arm. (Dave Love, RuQing Xu, Devin Matthews)
- Added a new `configure` option `--[en|dis]able-amd-frame-tweaks` that allows BLIS to compile certain framework files (each with the `_amd` suffix) that have been customized by AMD for improved performance (provided that the targeted configuration is eligible). By default, the more portable counterparts to these files are compiled. (Field Van Zee, AMD)
- Added an explicit compiler predicate (`is_win`) for Windows in `configure`. (Devin Matthews)
- Use `-march=haswell` instead of `-march=skylake-avx512` on Windows. (Devin Matthews, @h-vetinari)
- Fixed `configure` breakage on MacOSX by accepting either `clang` or `LLVM` in vendor string. (Devin Matthews)
- Blacklist clang10/gcc9 and older for `armsve` subconfig.
- Added a `configure` option to control whether or not to use `@rpath`. (Devin Matthews)
- Added armclang detection to `configure`. (Devin Matthews)
- Use `@path`-based install name on MacOSX and use relocatable `RPATH` entries for testsuite binaries. (Devin Matthews)
- For environment variables `CC`, `CXX`, `FC`, `PYTHON`, `AR`, and `RANLIB`, `configure` will now print an error message and abort if a user specifies a specific tool and that tool is not found. (Field Van Zee, Devin Matthews)
- Added symlink to `blis.pc.in` for out-of-tree builds. (Andrew Wildman)
- Register optimized real-domain `copyv`, `setv`, and `swapv` kernels in `zen` subconfig. (Dipal M Zambare, AMD)
- Added Apple Firestorm (A14/M1) subconfiguration, `firestorm`. (RuQing Xu)
- Added `armsve` subconfig to `arm64` configuration family. (RuQing Xu)
- Allow using clang with the `thunderx2` subconfiguration. (Devin Matthews)
- Fixed a subtle substitution bug in `configure`. (Chengguo Sun)
- Updated top-level Makefile to reflect a dependency on the "flat" `blis.h` file for the BLIS and BLAS testsuite objects. (Devin Matthews)
- Mark `xerbla_()` as a "weak" symbol on MacOSX. (Devin Matthews)
- Fixed a long-standing bug in `common.mk` whereby the header path to `cblas.h` was omitted from the compiler flags when compiling CBLAS files within BLIS.
- Added a custom-made recursive `sed` script to `build` directory.
- Minor cleanups and fixes to `configure`, `common.mk`, and others.
Testing:
- Fixed a race condition in the testsuite when the SALT option (simulate application-level threading) is enabled. (Devin Matthews)
- Test 1m method execution during `make check`. (Devin Matthews)
- Test `make install` in Travis CI. (Devin Matthews)
- Test C++ in Travis CI to make sure `blis.h` is C++-compatible. (Devin Matthews)
- Disabled SDE testing of pre-Zen microarchitectures via Travis CI.
- Added Travis CI support for testing Arm SVE. (RuQing Xu)
- Updated SDE usage so that it is downloaded from a separate repository (ci-utils) in our GitHub organization. (Field Van Zee, Devin Matthews)
- Updated octave scripts in `test/3` to be robust against missing datasets as well as to fix a few minor issues.
- Added `test_axpbyv.c` and `test_gemm_batch.c` test driver files to `test` directory. (Meghana Vankadari, AMD)
- Support all four datatypes in `her`, `her2`, `herk`, and `her2k` drivers in `test` directory. (Madan mohan Manokar, AMD)
Documentation:
- Added documentation for: `setijv`, `getijv`, `eqsc`, `eqv`, `eqm`.
- Added `docs/Addons.md`.
- Added dedicated "Performance" and "Example Code" sections to `README.md`.
- Updated `README.md`.
- Updated `docs/Sandboxes.md`.
- Updated `docs/Multithreading.md`. (Devin Matthews)
- Updated `docs/KernelHowTo.md`.
- Updated `docs/Performance.md` to report Fujitsu A64fx (512-bit SVE) results. (RuQing Xu)
- Updated `docs/Performance.md` to report Graviton2 Neoverse N1 results. (Nicholai Tukanov)
- Updated `docs/FAQ.md` with new questions.
- Fixed typos in `docs/FAQ.md`. (Gaëtan Cassiers)
- Various other minor fixes.