Credit goes to github.com


numpy.dot silently returns an all-zero matrix when the out argument is the same as the input #8440


Closed
se4u opened this issue Jan 1, 2017 · 9 comments

Comments

@se4u

se4u commented Jan 1, 2017

Currently numpy.dot accepts an out parameter. The documentation warns that this is a performance feature and that an exception is raised if out does not have the right type, but no exception is raised if out is the same array as one of the inputs; instead, that input is silently overwritten.

Ideally this would be handled smartly, by doing the multiplication in a memory-efficient way that keeps only one row or column of scratch space [1]. Short of that, it would be better to raise an exception when out is the same as either of the two matrices being multiplied, instead of returning an all-zero matrix and setting every value of that input to zero.
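A minimal pure-NumPy sketch of the "one row of scratch space" idea, assuming a has shape (k, m), b is square with shape (m, m), both are already the same floating dtype, and b does not overlap a. The name `dot_inplace_rows` is hypothetical, not a NumPy API, and the Python-level loop makes it slow; it only shows why the row-at-a-time order is safe: row i of a @ b depends only on row i of a and on all of b.

```python
import numpy as np


def dot_inplace_rows(a, b):
    """Overwrite a with a @ b using one row of scratch space.

    Hypothetical helper, not part of NumPy. Assumes a has shape (k, m),
    b is square with shape (m, m), a already has the result dtype, and
    b does not share memory with a.
    """
    if a.ndim != 2 or b.ndim != 2 or b.shape[0] != b.shape[1] or a.shape[1] != b.shape[0]:
        raise ValueError("need a of shape (k, m) and square b of shape (m, m)")
    if np.shares_memory(a, b):
        raise ValueError("b must not overlap a")
    scratch = np.empty(a.shape[1], dtype=np.result_type(a, b))
    if scratch.dtype != a.dtype:
        raise TypeError("a must already have the result dtype")
    for i in range(a.shape[0]):
        # Row i of a @ b depends only on row i of a (and all of b), so it
        # can be computed into scratch and then copied over that row.
        np.dot(a[i], b, out=scratch)
        a[i] = scratch
    return a


rng = np.random.default_rng(0)
a = rng.standard_normal((5, 3))
b = rng.standard_normal((3, 3))
expected = a @ b                  # computed out of place, for comparison
result = dot_inplace_rows(a, b)
print(np.allclose(result, expected), result is a)  # True True
```

The extra memory is a single row of length m, which is the saving the issue asks for; a compiled version would replace the loop with one BLAS gemv call per row.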

The patch below suggests one possible error message, and the Python session after it illustrates the current wrong behavior.

diff --git a/numpy/ma/core.py b/numpy/ma/core.py
index 4466dc0..6d9c53b 100755
--- a/numpy/ma/core.py
+++ b/numpy/ma/core.py
@@ -7307,6 +7307,10 @@ def dot(a, b, strict=False, out=None):
     am = ~getmaskarray(a)
     bm = ~getmaskarray(b)
 
+    if out is a or out is b:
+        raise ValueError("The out matrix is the same as the input. "
+                         "The multiplication output will be zero "
+                         "and this is definitely not what you want to do.")
     if out is None:
         d = np.dot(filled(a, 0), filled(b, 0))
         m = ~np.dot(am, bm)

>>> import numpy.random
>>> import numpy
>>> def f():
...     a = numpy.random.randn(3, 3)
...     b = numpy.random.randn(3, 3)
...     c = numpy.dot(a, b)
...     return a, b, c
...
>>> a,b,c = f()
>>> numpy.dot(a,b,out=b)
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
>>> a,b,c = f()
>>> numpy.dot(a,b,out=a)
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
>>> a,b,c = f()
>>> numpy.dot(a,a,out=a)
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
>>> 
>>> print c
[[ 0.89307654  0.55849275 -0.57240046]
 [-1.93567811 -0.75110132  1.60961766]
 [ 0.85899293  1.16581478 -0.8796278 ]]

[1] I looked into scipy.linalg.blas, and it exposes the *trmm routines, which allow in-place modification of the output when the input matrix is triangular, but there is nothing for a general rectangular-times-square product.

@charris
Member

charris commented Jan 2, 2017

What numpy version?

@se4u
Copy link
Author

se4u commented Jan 2, 2017

I am using the py27_0 build of numpy 1.11.2 from conda

$ conda list numpy                                          
numpy                     1.11.2                   py27_0   

@njsmith
Member

njsmith commented Jan 2, 2017

Does #8043 fix this? (The patch is long, and I'm not sure how dot does iteration...)

@se4u
Author

se4u commented Jan 2, 2017

I skimmed #8043 (and the associated #1683), and those changes seem to address an orthogonal issue; I think numpy.dot will be unaffected by them. Certainly none of the test cases in that pull request cover this case.

FWIW, since in-place multiplication of a tall, thin matrix A by a square matrix B was important to me, I wrote the following Cython code, which repeatedly calls sgemv to compute the product of a rectangular and a square matrix. Currently I assume that A is stored in C-contiguous order and B in Fortran-contiguous order, for cache locality, but the code could perhaps be generalised to handle both storage orders and to use the appropriate *gemm method.

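# cython: language_level=3
cimport cython
import numpy as np
from scipy.linalg.cython_blas cimport sgemv


@cython.initializedcheck(False)
@cython.wraparound(False)
@cython.boundscheck(False)
@cython.overflowcheck(False)
cdef float[:, ::1] matrix_multiply_impl1(float[:, ::1] a, float[::1, :] b):
    # Overwrites each row of a (C-contiguous, float32) with that row times b
    # (Fortran-contiguous, float32, square), using one row of scratch space.
    cdef:
        Py_ssize_t i
        char trans = b't'
        int m = b.shape[0], n = b.shape[1], incx = 1, incy = 1
        int lda = m
        float alpha = 1, beta = 0
        float[::1] y = np.zeros(n, dtype=np.float32)
    for i in range(a.shape[0]):
        # sgemv(trans, m, n, alpha, A, lda, x, incx, beta, y, incy):
        # y = b^T @ a[i, :], which is row i of a @ b.
        sgemv(&trans, &m, &n, &alpha, &b[0, 0], &lda, &a[i, 0], &incx, &beta, &y[0], &incy)
        a[i, :] = y
    return a


def matmul(a, b, method=1):
    assert method == 1
    # These avoid a copy only if a is already C-contiguous float32 and
    # b is already Fortran-contiguous float32.
    a = np.ascontiguousarray(a, dtype=np.float32)
    b = np.asfortranarray(b, dtype=np.float32)
    if b.shape[0] != b.shape[1] or a.shape[1] != b.shape[0]:
        raise ValueError("b must be square with b.shape[0] == a.shape[1]")
    return np.asarray(matrix_multiply_impl1(a, b))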

@njsmith
Member

njsmith commented Jan 3, 2017

In general the rule in numpy has been that passing overlapping arrays as both inputs and outputs produces undefined behavior. This is because there simply wasn't any fast and reliable way to detect whether this was happening, so rather than slow everything down with super-expensive checks we just punted and made it the user's problem. (It's not trivial: consider things like dot(a.T, b, out=a) or dot(a[:, ::2], b, out=a). In fact, detecting whether two arbitrary numpy arrays overlap is NP-hard.) We recently did gain the ability to detect these cases, and #8043 is the PR to start using it in some cases (but I think not dot? I'm not sure).
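A small sketch of the examples above, assuming NumPy 1.11 or later (where, as far as I know, np.shares_memory was added): an identity test like the `out is a or out is b` check in the proposed patch misses views such as a.T and a[:, ::2], while np.shares_memory catches them. np.may_share_memory only compares memory bounds, so it is cheap but can give false positives, for example for interleaved even and odd columns that never touch the same element.

```python
import numpy as np

a = np.zeros((4, 4))

# Views that alias a but are different Python objects.
views = {"a.T": a.T, "a[:, ::2]": a[:, ::2]}
for name, v in views.items():
    # The identity check from the proposed patch misses both views...
    print(name, "is a:", v is a)
    # ...while an exact overlap test catches them.
    print(name, "shares memory with a:", np.shares_memory(v, a))

# Even and odd columns interleave in memory but never share an element.
even, odd = a[:, ::2], a[:, 1::2]
print("bounds check:", np.may_share_memory(even, odd))  # True (false positive)
print("exact check:", np.shares_memory(even, odd))      # False
```

The exact test is the one that can become expensive for arbitrary strides, which is the cost trade-off described above.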

So for dot, the first question is what the semantics should be. There's a kind of hierarchy of complexity here. When there's overlap between input and output, options from easier to harder are:

  0. Return nonsense
  1. Error out
  2. Make a temporary copy of the overlapping arrays, so you get the right answer but there's no efficiency gain
  3. Add special-case code to detect particular patterns of overlap and optimize them to use less scratch space (like your contiguous dot(a, b, out=a) case). There's no general solution better than making a full temporary copy, though, because of cases like dot(a.T, b, out=a).

Right now we're at step (0) on this list. Each item on the list is strictly more complicated to implement than the one before, so incremental progress means moving down the list step-by-step.
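The "error out" and "temporary copy" options from the list above can be sketched as a thin wrapper around np.dot. This is a hypothetical helper (`checked_dot` and its `on_overlap` parameter are made up for illustration), not NumPy's behavior; it uses np.shares_memory for an exact overlap test, which can be expensive in pathological stride patterns.

```python
import numpy as np


def checked_dot(a, b, out, on_overlap="copy"):
    """np.dot(a, b, out=out) with explicit handling of input/output overlap.

    Hypothetical helper: on_overlap="raise" errors out; on_overlap="copy"
    copies any overlapping input first (correct result, no memory saving).
    """
    if on_overlap not in ("raise", "copy"):
        raise ValueError("on_overlap must be 'raise' or 'copy'")
    overlaps_a = np.shares_memory(a, out)
    overlaps_b = np.shares_memory(b, out)
    if overlaps_a or overlaps_b:
        if on_overlap == "raise":
            raise ValueError("out overlaps an input of dot")
        if overlaps_a:
            a = a.copy()
        if overlaps_b:
            b = b.copy()
    return np.dot(a, b, out=out)


rng = np.random.default_rng(1)
x = rng.standard_normal((3, 3))
y = rng.standard_normal((3, 3))
expected = x @ y
checked_dot(x, y, out=x)          # x now holds the product
print(np.allclose(x, expected))   # True
try:
    checked_dot(x, y, out=y, on_overlap="raise")
except ValueError as exc:
    print("error:", exc)
```

Because the copy is made of the whole overlapping input, this also covers cases like dot(a.T, b, out=a), at the cost of a full temporary.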

@pv
Member

pv commented Jan 10, 2017

gh-8043 only deals with ufuncs. If dot is supposed to behave similarly as ufuncs here, a temporary copy should be made (but that's a matter for a separate PR).
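For comparison, here is the ufunc behavior gh-8043 was about. As far as I know this landed in NumPy 1.13: a ufunc whose output overlaps an input now behaves as if the overlapping input had been copied first. The explicit Python loop below is only an illustration of the naive element-by-element order that overlap could otherwise produce; it is not what NumPy does internally.

```python
import numpy as np

a = np.arange(5.0)
np.add(a, a[::-1], out=a)   # out overlaps the second input
print(a)                    # [4. 4. 4. 4. 4.]

# A naive in-place loop, i.e. what "undefined behavior" could look like:
b = np.arange(5.0)
rev = b[::-1]               # a view, not a copy
for i in range(b.size):
    b[i] = b[i] + rev[i]    # later iterations read already-updated values
print(b)                    # [4. 4. 4. 7. 8.]
```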

@realitix

Hello, I lost 3 hours today because of this.
I agree with @njsmith about performance and unnecessary checks, but at the very least the documentation should be updated with a big warning.
How can I do an "in-place" dot if I can't pass the same array as out? Why is it forbidden?

Thanks.

@se4u
Author

se4u commented Jan 18, 2017 via email

@realitix

Thanks @se4u

Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants