Tags: NCAR/SPERR
AVX2 optimization (#247)

* write AVX version of std::any_of()
* clang-format
* simplify function signature
* improve memory locality in dwt3d and idwt3d cases
* get rid of a compiler warning
* minor change on function signatures
* Add AVX2 version of gather and scatter. It gives 2X-3X speedup on average
* use one implementation for gather/scatter
* change CDF function signatures to be using pointers
* remove two low level functions: m_dwt1d_one_level() and m_idwt1d_one_level()
* simplify m_dwt2d_one_level()
* simplify m_idwt2d_one_level()
* minor improvement
* simplify m_dwt3d_one_level() and m_idwt3d_one_level()
* remove the odd/even wavelet code; combine AVX2 and regular scatter/gather functions
* use aligned memory allocator and aligned store in gather/scatter functions
* use stream writes in Bitstream
* add memory fence before memory reallocations
* new implementation of Bitmask::wbit() that's 2X to 3X faster
* add the sign array in morton order
* Revert "add the sign array in morton order" because it might even have a slight performance loss. With the added complexity, it's just not worth to have.
  This reverts commit 4bc3c48.
* minor
* minor
* minor
* add a CMake option to enable/disable AVX2, also add AVX2 implementation of function any_ge()
* update README with the mention of AVX2
* minor
* minor
* minor
* add a test in SPECK2D_INT_ENC so that it only uses the SIMD sperr::any_ge() with 16 or more elements
* minor
* very minor
* Revert "very minor"
  This reverts commit bd6586b.
* remove the usage of [[unlikely]] attribute
* replace a / 64 to a >> 6
* clang-format

---------

Co-authored-by: Samuel Li <Sam@Nevada>
Co-authored-by: Samuel Li <sam@think>
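The commit message mentions an AVX2 implementation of `any_ge()` that is only used when there are 16 or more elements. The following is a minimal sketch of that general idea, not SPERR's actual code: the function name `any_ge_sketch`, its signature, and the choice of `uint32_t` elements are all assumptions made for illustration. It falls back to `std::any_of()` when AVX2 is not enabled at compile time or the input is short.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#ifdef __AVX2__
#include <immintrin.h>
#endif

// Hypothetical helper (name, signature, and element type are assumptions):
// returns true if any of the `len` values in `buf` is >= `thrd`.
inline bool any_ge_sketch(const uint32_t* buf, size_t len, uint32_t thrd)
{
  size_t i = 0;
#ifdef __AVX2__
  // Only take the SIMD path for 16 or more elements, mirroring the
  // threshold mentioned in the commit message.
  if (len >= 16) {
    const __m256i t = _mm256_set1_epi32(static_cast<int>(thrd));
    for (; i + 8 <= len; i += 8) {
      const __m256i v =
          _mm256_loadu_si256(reinterpret_cast<const __m256i*>(buf + i));
      // Unsigned v >= t holds exactly where max(v, t) == v.
      const __m256i ge = _mm256_cmpeq_epi32(_mm256_max_epu32(v, t), v);
      if (!_mm256_testz_si256(ge, ge))
        return true;  // at least one lane satisfied the comparison
    }
  }
#endif
  // Scalar path: handles the tail, short inputs, and non-AVX2 builds.
  return std::any_of(buf + i, buf + len,
                     [thrd](uint32_t v) { return v >= thrd; });
}
```

With GCC or Clang the SIMD path is compiled in only when the build passes a flag such as `-mavx2`; otherwise the same function silently uses the scalar loop, which is one way a build option could switch AVX2 on and off.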