my norm is slow #5218


Closed
argriffing opened this issue Oct 22, 2014 · 9 comments

Comments

@argriffing
Contributor

import numpy as np

def norm_1d_inf_axis_1_ad_hoc(M):
    # per-row infinity norm via a plain Python loop
    return [np.max(np.abs(row)) for row in M]

def norm_1d_inf_axis_1_desired(M):
    # the vectorized call this issue is about
    return np.linalg.norm(M, ord=np.inf, axis=1)

For M = np.random.randn(4000, 4000) the less-vectorized implementation is a bit faster on my machine (3.9 seconds vs. 4.6 seconds for timeit(..., number=100)). This seems weird to me. Is there a faster way to do what I want?
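A minimal way to reproduce the comparison (the numbers above are from the reporter's machine and will vary):

import numpy as np
from timeit import timeit

M = np.random.randn(4000, 4000)

print(timeit(lambda: norm_1d_inf_axis_1_ad_hoc(M), number=100))
print(timeit(lambda: norm_1d_inf_axis_1_desired(M), number=100))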

@juliantaylor
Contributor

There are two things that make the ad_hoc variant faster: it is more efficient on the CPU caches, and because its temporaries are smaller it does not require page zeroing in the memory allocator.
The latter is the more significant bottleneck for these sizes; reusing a pre-existing array as a work buffer already almost equalizes the two variants.

  26.47%  ipython  umath.so             [.] DOUBLE_absolute
  23.21%  ipython  [kernel.kallsyms]    [k] clear_page_c
  16.10%  ipython  umath.so             [.] DOUBLE_maximum

One could block the code in norm to make it faster.
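For illustration, a blocked version along the lines suggested here, reusing one scratch buffer so the allocator never has to hand back freshly zeroed pages (a sketch, not what numpy's norm actually does):

import numpy as np

def inf_norm_blocked(M, block=256):
    # Process rows in blocks so the np.abs temporary stays small enough
    # to be cache-friendly, and reuse a single scratch buffer instead of
    # allocating a new full-size temporary on every call.
    out = np.empty(M.shape[0])
    scratch = np.empty((block, M.shape[1]))
    for start in range(0, M.shape[0], block):
        stop = min(start + block, M.shape[0])
        buf = scratch[:stop - start]
        np.abs(M[start:stop], out=buf)
        out[start:stop] = buf.max(axis=1)
    return out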

@argriffing
Contributor Author

I see that Intel Performance Primitives has maxabs. Is this something that could make sense as a ufunc in numpy, or is this too much of an "I want to save a few keystrokes" or "I want the composition of every pair of existing ufuncs to be a new ufunc" argument?

@juliantaylor
Contributor

Performance-wise it makes a lot of sense. There are numerous operations of this type that are used often: max_abs, square_sum, abs_sum (or related combined operations like min_max, sincos, etc.).

For reductions the api could be something along the lines of this:

ufunc.reduce(d, ..., prefilter=ufunc)

A hacky implementation might be to use the data argument of the ufunc inner loop to pass in another ufunc inner loop, though having it that generic is probably just asking for trouble. Having extra functions might be better.
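For illustration only, the proposed prefilter keyword (not an existing numpy API) would express fusing the two passes below into a single loop, without materializing the full-size temporary:

import numpy as np

a = np.random.randn(4000, 4000)

# Today: np.abs allocates a full-size temporary, then maximum.reduce
# makes a second pass over it.
two_pass = np.maximum.reduce(np.abs(a), axis=1)

# The hypothetical fused form would apply np.absolute element-wise
# inside the reduction loop itself:
# fused = np.maximum.reduce(a, axis=1, prefilter=np.absolute)  # not a real API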

@argriffing
Contributor Author

maxabs is actually on a short list of Intel's statistical functions deemed worthy of special-casing, according to https://software.intel.com/en-us/node/502164, which I expect to 404 by the time you read this.

Sum
Max
MaxIndx
MaxAbs
MaxAbsIndx
Min
MinIndx
MinAbs
MinAbsIndx
MinMax
MinMaxIndx
Mean
StdDev
MeanStdDev
Norm
NormDiff
DotProd
MaxEvery, MinEvery
ZeroCrossing
CountInRange

@argriffing
Contributor Author

For reference, I checked this against a Cython implementation that hard-codes the input array ndim as 2 and the axis as 1; it takes 2.1 seconds, vs. 3.9 for the ad-hoc implementation and 4.6 for np.linalg.norm.

@jaimefrio
Member

It will not hold for all input shapes, but your original example can be sped up comparably to Cython (by my measurements) by doing the following:

def inf_norm(a, axis=None, keepdims=False):
    # max(|x|) == max(|max(x)|, |min(x)|), so two cheap reductions
    # replace the full-size np.abs temporary.
    mx = a.max(axis=axis, keepdims=keepdims)
    mn = a.min(axis=axis, keepdims=keepdims)
    return np.maximum(np.abs(mx), np.abs(mn))

In [14]: a = np.random.rand(4000, 4000)

In [15]: %timeit np.linalg.norm(a, ord=np.inf, axis=1)
10 loops, best of 3: 85.4 ms per loop

In [16]: %timeit inf_norm(a, axis=1)
10 loops, best of 3: 48.6 ms per loop

It is of course substantially slower for a worst case shape:

In [17]: b = np.random.rand(4000, 1)

In [18]: %timeit np.linalg.norm(b, ord=np.inf, axis=1)
100000 loops, best of 3: 13.7 us per loop

In [19]: %timeit inf_norm(b, axis=1)
10000 loops, best of 3: 29.2 us per loop

So while I am not sure that such a change makes sense, I tend to think that it does. Any thoughts?

@argriffing
Contributor Author

I'd rather not slow down the worst-case shape.

@solarjoe
Contributor

Are the things mentioned here (worst-case shapes) also the reason why calculating the 2-norm using

np.sqrt(a[..., 0]**2 + a[..., 1]**2 + a[..., 2]**2)

is significantly faster than the numpy method

np.linalg.norm(a, axis=-1)

or its underlying

np.sqrt(np.add.reduce((a.conj() * a).real, axis=-1))

?
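A quick way to check that comparison on a typical (N, 3) array of vectors (a sketch; the timings depend on the machine):

import numpy as np
from timeit import timeit

a = np.random.randn(1_000_000, 3)

def hand_written():
    return np.sqrt(a[..., 0]**2 + a[..., 1]**2 + a[..., 2]**2)

def library_norm():
    return np.linalg.norm(a, axis=-1)

print(timeit(hand_written, number=10))
print(timeit(library_norm, number=10))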

@seberg
Member

seberg commented Feb 24, 2021

I am going to close this in favor of gh-18483, since optimized functions such as maxabs were mentioned here as well. I expect there is a bit more to this issue, but please reopen or create a new issue when it comes up again!

@seberg seberg closed this as completed Feb 24, 2021