ENH: AVX support for exp/log for strided float32 arrays #13581

r-devulap · 2019-05-18T01:03:45Z

(1) AVX2 and AVX512F support gather_ps instructions for loading strided data
into a register. Rather than falling back to scalar code, using these gives
a good speedup when the input array is strided.

(2) Added some tests to validate the AVX-based algorithms for exp and log.
The tests compare the float32 output against glibc's float64
implementation and verify the maximum ULP error.
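For readers unfamiliar with gathers, a rough NumPy emulation of what a masked gather does for a strided float32 input may help. This is a sketch under our own assumptions, not the PR's C code: `gather_strided` and `strided_exp` are hypothetical helpers, masked-off lanes are filled with 1.0, and `np.exp` stands in for the vectorized kernel. For float32, `num_lanes` would be 8 with AVX2 (256-bit registers) and 16 with AVX512F (512-bit registers).

```python
import numpy as np


def gather_strided(buf, start, stride, num_lanes, remaining):
    """Emulate a masked gather: lane i reads buf[start + i * stride].

    Lanes i >= remaining fall past the end of the logical array; they are
    masked off and filled with 1.0 (an assumption to keep the math finite).
    """
    lanes = np.arange(num_lanes)
    index = start + lanes * stride          # per-lane element offsets
    mask = lanes < remaining                # only the tail block has masked lanes
    out = np.ones(num_lanes, dtype=np.float32)
    out[mask] = buf[index[mask]]
    return out


def strided_exp(buf, stride, num_lanes=8):
    """Apply exp to buf[::stride] one register-sized block at a time."""
    n = (len(buf) + stride - 1) // stride   # number of elements in buf[::stride]
    result = np.empty(n, dtype=np.float32)
    for k in range(0, n, num_lanes):
        x = gather_strided(buf, k * stride, stride, num_lanes, n - k)
        y = np.exp(x)                       # stand-in for the SIMD exp kernel
        count = min(num_lanes, n - k)
        result[k:k + count] = y[:count]     # contiguous store of the valid lanes
    return result
```

With `buf = np.arange(1, 51, dtype=np.float32) / 10`, `strided_exp(buf, 3)` agrees with `np.exp(buf[::3])`. The per-block work outside the math is just building the index vector `start + lanes * stride`, which is the part a hardware gather takes over from a scalar loop.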

r-devulap · 2019-05-18T01:13:54Z

This patch addresses some of the performance concerns discussed in ENH: always use buffered iterator for vectorized math #13557.
I have added tests that measure the ULP error of the output of the AVX-based exp/log functions. This may also help with ENH: confirm accuracy of basic math functions #13515.
Some benchmark numbers (output array size = 10000 for all evaluations):

function	stride	SIMD AVX512	SIMD AVX2	Glibc scalar
exp	1	7.05±0.01us	10.9±0.02us	33.3±0.3us
exp	2	7.75±0.03us	12.7±0.03us	33.6±0.2us
exp	4	7.80±0.01us	12.7±0us	33.7±0.01us
exp	8	8.02±0.01us	13.1±0.03us	34.3±0.2us
exp	16	9.47±0.30us	14.2±0.2us	39.4±0.2us

function	stride	SIMD AVX512	SIMD AVX2	Glibc scalar
log	1	7.04±0.01us	16.6±0.06us	38.1±0.2us
log	2	8.09±0.05us	18.7±0.2us	38.1±0.09us
log	4	8.12±0.05us	19.0±0.1us	38.3±0.03us
log	8	8.38±0.01us	19.8±0.4us	39.1±0.04us
log	16	9.68±0.08us	21.2±0.1us	42.9±0.6us
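A minimal way to time exp/log on strided float32 inputs from Python, with the output fixed at 10000 elements as in the tables above. This is a sketch, not the harness that produced those numbers (the source does not say which one was used); `bench_strided` is a hypothetical helper, the input range is an arbitrary choice that stays inside exp's finite float32 range, and absolute timings will depend on the CPU, the NumPy build, and which SIMD paths it dispatches to.

```python
import timeit

import numpy as np


def bench_strided(func, n_out=10000, strides=(1, 2, 4, 8, 16), repeat=5, number=200):
    """Best-of-`repeat` time per call, in microseconds, of func on strided input."""
    rng = np.random.default_rng(0)
    results = {}
    for stride in strides:
        base = rng.uniform(0.01, 88.0, size=n_out * stride).astype(np.float32)
        view = base[::stride]                      # non-contiguous input, n_out elements
        out = np.empty(n_out, dtype=np.float32)    # contiguous output
        times = timeit.repeat(lambda v=view, o=out: func(v, out=o),
                              repeat=repeat, number=number)
        results[stride] = min(times) / number * 1e6
    return results


if __name__ == "__main__":
    for name, func in (("exp", np.exp), ("log", np.log)):
        for stride, us in bench_strided(func).items():
            print(f"{name}\tstride {stride}\t{us:.2f} us")
```

Writing into a preallocated `out` keeps allocation cost out of the measurement, so the numbers mostly reflect the load, compute, and store path for each stride.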

r-devulap · 2019-05-18T15:09:34Z

This looks like a random CI failure unrelated to the patch:
curl: (56) GnuTLS recv error (-54): Error in the pull function. Unable to download 3.5 archive. The archive may not exist. Please consider a different version.

juliantaylor · 2019-05-18T15:17:34Z

numpy/core/src/umath/simd.inc.src

-        @vtype@ x  = @isa@_masked_load(load_mask, ip);
+                                                    num_lanes);
+        @vtype@ x;
+        if (stride == 1)


Always put { } around loops and conditions, even if the body is only one line.

Fixed this.

Sorry, I should have been more specific; the NumPy style for conditionals and loops is:

if () { } else { }

Ah, sorry about that; it should be more consistent now.

juliantaylor · 2019-05-18T15:46:07Z

numpy/core/tests/test_umath.py

+        strides = np.random.randint(low=-100, high=100, size=100)
+        sizes = np.random.randint(low=1, high=2000, size=100)
+        for ii in sizes:
+            x_f32 = np.float32(np.random.uniform(low=0.01,high=88.1,size=ii))


Denormal floats are excluded here. How is the accuracy for these?

These tests are meant as a sanity check and are not comprehensive. Testing a large sample of float32 values would be slow. However, the maximum ULP errors of 2.6 and 3.9 hold even for denormals (I validated this separately by enumerating all float32 numbers).
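The comparison described here (float32 output against a float64 reference, with the error expressed in float32 ULPs) can be sketched in a few lines. This is an illustrative version, not the PR's test: `max_ulp_error` is a hypothetical helper, the float64 reference is NumPy's own float64 `exp`/`log` (which may or may not be glibc, depending on platform and build), and one ULP is taken as the float32 spacing at the reference value. The denormal inputs are built from raw bit patterns below 0x00800000, the bit pattern of the smallest normal float32.

```python
import numpy as np


def max_ulp_error(func, x32):
    """Largest error of func on float32 input, in float32 ULPs of a float64 reference."""
    x32 = np.asarray(x32, dtype=np.float32)
    y32 = func(x32).astype(np.float64)        # float32 result, widened exactly
    ref = func(x32.astype(np.float64))         # float64 result used as the reference
    ulp = np.spacing(np.abs(ref).astype(np.float32)).astype(np.float64)
    return float(np.max(np.abs(y32 - ref) / ulp))


# Positive float32 denormals: bit patterns below 0x00800000.
denormals = (np.arange(1, 1001, dtype=np.uint32) * 8000).view(np.float32)
# Inputs whose exp stays in the normal float32 range.
normal_range = np.linspace(-87.0, 88.0, 5001, dtype=np.float32)

print("exp, normal range:", max_ulp_error(np.exp, normal_range))
print("log, denormal inputs:", max_ulp_error(np.log, denormals))
```

Enumerating every float32 value, as described above, is the exhaustive version of the same computation; it only changes how the input array is generated.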

juliantaylor · 2019-05-18T15:57:17Z

Looks good.

I get very different performance numbers on my AMD Ryzen 7 1700X with glibc 2.27:

import numpy as np
d = np.ones(10000, dtype=np.float32) * 2.132

%timeit np.exp(d)
%timeit np.log(d)

function	avx2	glibc scalar
exp	31us	33us
log	33us	37us

On an older Intel i5-4310M with the same OS, the AVX2 version is significantly faster:

function	avx2	glibc scalar
exp	196us	369us
log	240us	440us
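Expressed as speedups (glibc scalar time divided by AVX2 time), the two machines differ sharply: about 1.06x (exp) and 1.12x (log) on the Ryzen versus about 1.88x (exp) and 1.83x (log) on the i5. A small sketch of that arithmetic; the dictionary only restates the timings from the two tables above, and `speedups` is a hypothetical helper name.

```python
# AVX2 and glibc scalar timings in microseconds, copied from the two tables above.
reported = {
    "Ryzen 7 1700X": {"exp": (31, 33), "log": (33, 37)},
    "i5-4310M": {"exp": (196, 369), "log": (240, 440)},
}


def speedups(table):
    """Speedup of AVX2 over glibc scalar: scalar time / AVX2 time, per CPU and function."""
    return {
        cpu: {fn: round(scalar / avx2, 2) for fn, (avx2, scalar) in rows.items()}
        for cpu, rows in table.items()
    }


print(speedups(reported))
# {'Ryzen 7 1700X': {'exp': 1.06, 'log': 1.12}, 'i5-4310M': {'exp': 1.88, 'log': 1.83}}
```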

juliantaylor · 2019-05-18T17:00:17Z

numpy/core/src/umath/simd.inc.src

    const npy_int num_lanes = @BYTES@/sizeof(npy_float);
+    npy_int indexarr[16];
+    for (npy_int ii = 0; ii < 16; ii++)


Also add braces:

for () { }

juliantaylor · 2019-05-18T17:00:51Z

numpy/core/src/umath/simd.inc.src

    const npy_int num_lanes = @BYTES@/sizeof(npy_float);
    npy_float xmax = 88.72283935546875f;
    npy_float xmin = -87.3365478515625f;
+    npy_int indexarr[16];
+    for (npy_int ii = 0; ii < 16; ii++)


Also add braces:

for () { }
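The two constants in the hunk above, xmax = 88.72283935546875f and xmin = -87.3365478515625f, appear to be the natural logarithms of the largest finite float32 and of the smallest normal float32, rounded to float32. That reading is our assumption, not something stated in the PR; a quick check from Python:

```python
import math

import numpy as np

finfo = np.finfo(np.float32)

# Assumption being checked: xmax/xmin are ln(FLT_MAX) and ln(FLT_MIN) rounded to float32.
xmax_guess = np.float32(math.log(float(finfo.max)))   # finfo.max: largest finite float32
xmin_guess = np.float32(math.log(float(finfo.tiny)))  # finfo.tiny: smallest normal float32

print(float(xmax_guess), float(xmin_guess))
# 88.72283935546875 -87.3365478515625
```

If that reading is right, the constants mark where float32 exp overflows (above xmax) and where its result drops below the normal range (below xmin), which would be consistent with using them as bounds in the kernel.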

charris · 2019-05-19T16:41:05Z

As a point of interest, are these functions currently working correctly in master?

r-devulap · 2019-05-19T20:30:57Z

> As a point of interest, are these functions currently working correctly in master?

Yes, this patch was only addressing performance issues for strided arrays.

mattip · 2019-05-19T20:32:55Z

It seems there is an issue with clang 7.0; see #13586.

juliantaylor reviewed May 18, 2019

numpy/core/src/umath/simd.inc.src (resolved)

juliantaylor reviewed May 18, 2019

numpy/core/src/umath/simd.inc.src (resolved)

juliantaylor reviewed May 18, 2019

r-devulap force-pushed the gather-for-avxsimd branch from 7fc34bb to 3251618 May 18, 2019 16:17

juliantaylor reviewed May 18, 2019

r-devulap force-pushed the gather-for-avxsimd branch from 3251618 to 59b2a1d May 18, 2019 20:02

BUG: fixing build issues on clang6.0

59b2a1d

charris added the "01 - Enhancement" and "component: numpy._core" labels May 19, 2019

juliantaylor merged commit 433d8b2 into numpy:master May 19, 2019

mattip mentioned this pull request May 25, 2019

exp / log / pow for complex<float> xtensor-stack/xsimd#139 (open)

rgommers added the "component: SIMD" label (issues in SIMD code or machinery) Jul 12, 2022
