my norm is slow #5218

argriffing · 2014-10-22T16:31:34Z

def norm_1d_inf_axis_1_ad_hoc():
    return [np.max(np.abs(row)) for row in M]

def norm_1d_inf_axis_1_desired():
    return np.linalg.norm(M, ord=np.inf, axis=1)

For M = np.random.randn(4000, 4000) the less-vectorized implementation is a bit faster on my machine (3.9 seconds vs. 4.6 seconds for timeit(..., number=100)). This seems weird to me. Is there a faster way to do what I want?

The text was updated successfully, but these errors were encountered:

juliantaylor · 2014-10-22T17:48:02Z

there are two things that make the ad_hoc variant faster, it is more efficient on the cpu caches and it due to its smaller temporaries do not requires page zeroing in the memory allocator
the latter is the more significant bottleneck factor for these sizes. reusing a pre-existing array as work buffer already almost equalizes the two variants.

  26.47%  ipython  umath.so             [.] DOUBLE_absolute
  23.21%  ipython  [kernel.kallsyms]    [k] clear_page_c
  16.10%  ipython  umath.so             [.] DOUBLE_maximum

one could block the code in norm to make it faster.

argriffing · 2014-10-22T18:49:43Z

I see that 'intel performance primitives' has maxabs, is this something that could make sense as a ufunc in numpy, or is this too much of an "I want to save a few keystrokes" or "I want the composition of every pair of existing ufuncs to be a new ufunc" argument?

juliantaylor · 2014-10-22T19:06:54Z

performancewise it makes a lot of sense, there are numerous of these type of operations that are used often, max_abs, square_sum, abs_sum (or related combined similar operations like min_max, sincos etc.)

For reductions the api could be something along the lines of this:

ufunc.reduce(d, ..., prefilter=ufunc)

a hacky implementation might be to use the data argument of the ufunc innerloop to pass in another ufunc innerloop.
though having it this generic is probably just asking for trouble. Having extra functions might be better.

argriffing · 2014-10-22T19:16:46Z

maxabs is actually on a short list of intel's statistical functions deemed worthy of special-casing, according to https://software.intel.com/en-us/node/502164 which I expect to 404 by the time you read this.

Sum
Max
MaxIndx
MaxAbs
MaxAbsIndx
Min
MinIndx
MinAbs
MinAbsIndx
MinMax
MinMaxIndx
Mean
StdDev
MeanStdDev
Norm
NormDiff
DotProd
MaxEvery, MinEvery
ZeroCrossing
CountInRange

argriffing · 2014-10-22T23:41:20Z

For reference I checked this against a cython implementation that hard-codes the input array ndim as 2 and the axis as 1, and this takes 2.1 seconds, vs. 3.9 for the ad-hoc implementation and 4.6 for the numpy linalg norm.

jaimefrio · 2014-10-27T06:21:51Z

It will not hold for all input shapes, but your original example can be sped up comparably to Cython (by my measurements) by doing the following:

def inf_norm(a, axis=None, keepdims=False):
    mx = a.max(axis=axis, keepdims=keepdims)
    mn = a.min(axis=axis, keepdims=keepdims)
    return np.maximum(np.abs(mx), np.abs(mn))

In [14]: a = np.random.rand(4000, 4000)

In [15]: %timeit np.linalg.norm(a, ord=np.inf, axis=1)
10 loops, best of 3: 85.4 ms per loop

In [16]: %timeit inf_norm(a, axis=1)
10 loops, best of 3: 48.6 ms per loop

It is of course substantially slower for a worst case shape:

In [17]: b = np.random.rand(4000, 1)

In [18]: %timeit np.linalg.norm(b, ord=np.inf, axis=1)
100000 loops, best of 3: 13.7 us per loop

In [19]: %timeit inf_norm(b, axis=1)
10000 loops, best of 3: 29.2 us per loop

So while I am not sure that such a change makes sense, I tend to think that it does. Any thoughts?

argriffing · 2014-10-27T14:10:41Z

I'd rather not slow down the worst-case shape.

solarjoe · 2015-03-12T08:40:20Z

Are the things mentioned here (worse shapes) also the reason why calculating the 2-norm using

np.sqrt(((a[...,0]**2+a[...,1]**2+a[...,2]**2)))

is significantly faster then the numpy methods

 np.linalg.norm(a, axis=-1)

or its underlying

np.sqrt(np.add.reduce((a.conj() * a).real, axis=0))

?

seberg · 2021-02-24T21:31:08Z

I am going to close this in favor of gh-18483, since optimized functions such as maxabs where mentioned here as well. I expect there is a bit more to this issue, but please reopen or create a new issue it when it comes up again!

argriffing mentioned this issue Jan 27, 2015

ENH/WIP: add a max_abs ufunc #5509

Closed

mattip added 01 - Enhancement component: benchmarks labels Aug 8, 2018

jakirkham mentioned this issue Aug 17, 2018

ENH: Create the ability for fused operations (fused ufuncs or map_reduce) style #11622

Open

seberg closed this as completed Feb 24, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

my norm is slow #5218

my norm is slow #5218

argriffing commented Oct 22, 2014

juliantaylor commented Oct 22, 2014

Uh oh!

argriffing commented Oct 22, 2014

Uh oh!

juliantaylor commented Oct 22, 2014

Uh oh!

argriffing commented Oct 22, 2014

Uh oh!

argriffing commented Oct 22, 2014

Uh oh!

jaimefrio commented Oct 27, 2014

Uh oh!

argriffing commented Oct 27, 2014

Uh oh!

solarjoe commented Mar 12, 2015

Uh oh!

seberg commented Feb 24, 2021

Uh oh!

Uh oh!

my norm is slow #5218

my norm is slow #5218

Comments

argriffing commented Oct 22, 2014

juliantaylor commented Oct 22, 2014

Uh oh!

argriffing commented Oct 22, 2014

Uh oh!

juliantaylor commented Oct 22, 2014

Uh oh!

argriffing commented Oct 22, 2014

Uh oh!

argriffing commented Oct 22, 2014

Uh oh!

jaimefrio commented Oct 27, 2014

Uh oh!

argriffing commented Oct 27, 2014

Uh oh!

solarjoe commented Mar 12, 2015

Uh oh!

seberg commented Feb 24, 2021

Uh oh!