Numpy.dot silently returns all zero matrix when the out argument is the same as the input. #8440

se4u · 2017-01-01T23:16:04Z

Currently numpy.dot allows specifying an out parameter. The documentation warns that this is a performance feature, so dot throws an exception if the out argument does not have the right type. It does not, however, throw an exception if someone passes one of the input matrices as out and thereby overwrites it.

The ideal fix would be to do something smart: perform the matrix multiplication in a memory-efficient way that keeps only one column/row of scratch space [1]. Short of that, it would be better to throw an exception when the out matrix is the same as either of the two matrices being multiplied, instead of returning an all-zero matrix and setting every value in the input to zero.

The patch below suggests one possible error message that can be shown to people, and the python session illustrates the current wrong behavior.

diff --git a/numpy/ma/core.py b/numpy/ma/core.py
index 4466dc0..6d9c53b 100755
--- a/numpy/ma/core.py
+++ b/numpy/ma/core.py
@@ -7307,6 +7307,10 @@ def dot(a, b, strict=False, out=None):
     am = ~getmaskarray(a)
     bm = ~getmaskarray(b)
 
+    if out is a or out is b:
+        raise ValueError("The out matrix is the same as the input. "
+                         "The multiplication output will be zero "
+                         "and this is definitely not what you want to do.")
     if out is None:
         d = np.dot(filled(a, 0), filled(b, 0))
         m = ~np.dot(am, bm)

>>> import numpy.random
>>> import numpy
>>> def f():
...     a = numpy.random.randn(3, 3)
...     b = numpy.random.randn(3, 3)
...     c = numpy.dot(a, b)
...     return a, b, c
...

>>> a,b,c = f()
>>> numpy.dot(a,b,out=b)
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
>>> a,b,c = f()
>>> numpy.dot(a,b,out=a)
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
>>> a,b,c = f()
>>> numpy.dot(a,a,out=a)
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
>>> 
>>> print c
[[ 0.89307654  0.55849275 -0.57240046]
 [-1.93567811 -0.75110132  1.60961766]
 [ 0.85899293  1.16581478 -0.8796278 ]]

[1] I looked into scipy.linalg.blas, and it exposes the *trmm methods, which allow in-place modification of the output when the input matrix is triangular, but there is nothing for a general rectangular matrix times a square matrix.
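
For comparison, here is a minimal sketch of how the result the session above was after can be obtained without passing an input as out. It lets dot allocate its own result and then copies that result back into a, so it uses a full temporary and gives no memory saving. The seed and the variable names are only illustrative.

```python
import numpy as np

rng = np.random.RandomState(0)   # fixed seed, chosen only for reproducibility
a = rng.randn(3, 3)
b = rng.randn(3, 3)
c = np.dot(a, b)                 # reference result in a fresh array

# dot writes into a temporary; the slice assignment then copies it into a.
a[...] = np.dot(a, b)

print(np.allclose(a, c))         # True
```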


charris · 2017-01-02T00:17:10Z

What numpy version?

se4u · 2017-01-02T03:40:28Z

I am using the py27_0 build of numpy 1.11.2 from conda

$ conda list numpy                                          
numpy                     1.11.2                   py27_0

njsmith · 2017-01-02T08:39:45Z

Does #8043 fix this? (The patch is long, and I'm not sure how dot does iteration...)

se4u · 2017-01-02T09:13:53Z

I skimmed #8043 (and the associated #1683), and those changes seem to fix an orthogonal issue; I think numpy.dot will be unaffected by them. Certainly none of the test cases in that pull request cover this case.

FWIW, since in-place multiplication of a tall, thin matrix A by a square matrix B was important to me, I wrote the following Cython code, which repeatedly calls sgemv to compute the product of a rectangular and a square matrix one row at a time. Currently I assume that A is stored in C-contiguous format and B in F-contiguous format, in order to have cache locality, but maybe this code could be generalised to handle both storage formats and to use the appropriate *gemm method.

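# Imports assumed by this snippet; the original comment did not show them.
cimport cython
import numpy as np
cimport numpy as np
from scipy.linalg cimport cython_blas as blas

@cython.initializedcheck(False)
@cython.wraparound(False)
@cython.boundscheck(False)
@cython.overflowcheck(False)
cdef np.ndarray[float,ndim=2] matrix_multiply_impl1(
    np.ndarray[float,ndim=2] a,
    np.ndarray[float,ndim=2] b):
    cdef:
        unsigned int i = 0
        char trans = 't'  # sgemv computes y = alpha * b^T * x + beta * y
        int m=b.shape[0], n=b.shape[1], incx=1, incy=1
        int lda=m  # leading dimension of the F-contiguous b
        float alpha=1, beta=0
        # One row of scratch space; b is square, so m == n.
        np.ndarray[float,ndim=1] y = np.zeros((m,), dtype='float32', order='C')
    for i in range(a.shape[0]):
        # (char *trans, int *m, int *n, float *alpha, float *a, int *lda, float *x, int *incx, float *beta, float *y, int *incy)
        blas.sgemv(&trans, &m, &n, &alpha, &b[0, 0], &lda, &a[i,0], &incx, &beta, &y[0], &incy)
        # Row i of a is not read again, so it is safe to overwrite it now.
        a[i,:] = y
    return a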


def matmul(a, b, method=1):
    assert method == 1
    a=np.ascontiguousarray(a)
    b=np.asfortranarray(b)
    return matrix_multiply_impl1(a,b)

njsmith · 2017-01-03T01:32:44Z

In general the rule in numpy has been that passing overlapping arrays as both inputs and outputs produces undefined behavior. This is because there simply wasn't any fast and reliable way to detect whether this was happening, so rather than slow everything down with super-expensive checks we just punted and made it the user's problem. (It's not trivial: consider things like dot(a.T, b, out=a) or dot(a[:, ::2], b, out=a). In fact, detecting whether two arbitrary numpy arrays overlap is NP-hard.) We recently did gain the ability to detect these cases, and #8043 is the PR to start using it in some cases (but I think not dot? I'm not sure).
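
As a small illustration of the detection problem, the sketch below uses np.may_share_memory (a cheap bounds check that can report false positives) and np.shares_memory (an exact check that can be expensive in the worst case). The arrays are made up for the example; which check dot itself should use is not something this sketch settles.

```python
import numpy as np

a = np.arange(16, dtype=float).reshape(4, 4)

# An identity test only catches the very same object, not views of it.
print(a.T is a)                          # False
print(np.shares_memory(a.T, a))          # True
print(np.shares_memory(a[:, ::2], a))    # True

# Interleaved rows: the address ranges overlap, but no element is shared.
even, odd = a[::2], a[1::2]
print(np.may_share_memory(even, odd))    # True  (bounds check only)
print(np.shares_memory(even, odd))       # False (exact check)
```

This is also why a check such as `out is a or out is b` would miss cases like dot(a.T, b, out=a).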

So for dot, the first question is what the semantics should be. There's a kind of hierarchy of complexity here. When there's overlap between input and output, options from easier to harder are:

(0) Return nonsense
(1) Error out
(2) Make a temporary copy of the overlapping arrays, so you get the answer you expect but there's no efficiency gain
(3) Add special-case code to detect particular patterns of overlap and optimize them to use less scratch space (like your contiguous dot(a, b, out=a) case). There's no general solution better than making a full temporary copy, though, because of cases like dot(a.T, b, out=a).

Right now we're at step (0) on this list. Each item on the list is strictly more complicated to implement than the one before, so incremental progress means moving down the list step-by-step.
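
The "error out" and "temporary copy" options can be sketched in user code as follows. dot_checked and its on_overlap parameter are hypothetical names made up for this example, not part of NumPy, and the exact np.shares_memory check it relies on can be slow for awkward stride patterns.

```python
import numpy as np

def dot_checked(a, b, out, on_overlap="copy"):
    """Hypothetical wrapper around np.dot that handles out/input overlap.

    on_overlap="raise" errors out when out overlaps an input;
    on_overlap="copy" copies the overlapping inputs first.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    overlap_a = np.shares_memory(a, out)
    overlap_b = np.shares_memory(b, out)
    if overlap_a or overlap_b:
        if on_overlap == "raise":
            raise ValueError("out overlaps an input of dot")
        # Copy only the inputs that actually overlap with out.
        if overlap_a:
            a = a.copy()
        if overlap_b:
            b = b.copy()
    return np.dot(a, b, out=out)

a = np.arange(9.0).reshape(3, 3)
b = np.arange(9.0, 18.0).reshape(3, 3)
expected = a @ b
dot_checked(a, b, out=a)
print(np.allclose(a, expected))   # True
```

With on_overlap="raise" the same call would raise ValueError instead of copying.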

pv · 2017-01-10T20:42:07Z

gh-8043 only deals with ufuncs. If dot is supposed to behave like ufuncs here, a temporary copy should be made (but that's a matter for a separate PR).

realitix · 2017-01-18T09:11:26Z

Hello, I lost 3 hours today because of this.
I agree with @njsmith about performance and useless checks, but at least, documentation must be updated with a big warning.
How can I do an "in-place" dot if I can't pass the same array as out? Why is it forbidden?

Thanks.

se4u · 2017-01-18T10:38:52Z

If your matrix is triangular and in Fortran format, then you can use the wrapper of `trmv` found in scipy.linalg.blas. Otherwise, you'll have to loop through the array on your own: basically do the multiplication one row/column at a time, so that the row/column you overwrote is not reused later on. My gist linked above shows a way of doing this in Cython.
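
The same row-at-a-time idea can be sketched in plain NumPy. dot_inplace_rows is a hypothetical helper written for this example; it assumes a is 2-D, b is square, both have the same dtype, and the two arrays do not overlap. It needs only one row of scratch space, but the Python-level loop will be slower than a single BLAS call.

```python
import numpy as np

def dot_inplace_rows(a, b):
    """Overwrite a with a @ b, one row at a time (hypothetical helper)."""
    if a.ndim != 2 or b.ndim != 2:
        raise ValueError("a and b must be 2-D")
    if b.shape[0] != b.shape[1] or a.shape[1] != b.shape[0]:
        raise ValueError("b must be square and match a's column count")
    if a.dtype != b.dtype:
        raise TypeError("a and b must have the same dtype")
    if np.shares_memory(a, b):
        raise ValueError("a and b must not overlap")
    scratch = np.empty(b.shape[1], dtype=a.dtype)   # one row of scratch
    for i in range(a.shape[0]):
        # Row i of the product depends only on row i of a, so it can be
        # computed into scratch and written back without affecting later rows.
        np.dot(a[i], b, out=scratch)
        a[i] = scratch
    return a

a = np.arange(12.0).reshape(4, 3)
b = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [4.0, 0.0, 1.0]])
expected = a @ b
result = dot_inplace_rows(a, b)
print(result is a, np.allclose(a, expected))   # True True
```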

realitix · 2017-01-18T10:52:59Z

Thanks @se4u

pv mentioned this issue Jan 28, 2017

BUG: core: in dot(), make copies if out has memory overlap with input #8539

Merged

charris closed this as completed in #8539 Jan 31, 2017
