save/load and tofile/fromfile fail silently for large arrays on Mac OS X #2806

Closed
lukauskas opened this issue Dec 10, 2012 · 19 comments

@lukauskas

It seems that saving large numpy arrays to disk using numpy.save fails silently for very large arrays; for example:

In [5]: a = np.random.randn(500000000)

In [6]: 

In [6]: a
Out[6]: 
array([-1.00736353, -2.06696394,  1.46569636, ...,  0.89738222,
       -0.06982733,  0.06954417])

In [7]: np.save('a', a)

In [8]: b = np.load('a.npy')

In [9]: b
Out[9]: 
array([-1.00736353, -2.06696394,  1.46569636, ...,  0.        ,
        0.        ,  0.        ])

Note the zeros at the end of the loaded array that do not exist in a.

I'm running numpy version 1.6.2:

In [11]: np.version.version
Out[11]: '1.6.2'

on Python 2.7.3 on Mac OS X 10.8.2, if that helps.

@lukauskas
Author

I think this is specific to Mac OS X and is related to numpy.fromfile failing for large arrays.

See this stackoverflow question:
http://stackoverflow.com/questions/13769545/max-limit-for-file-size-with-np-fromfile

I can reproduce the bug described in the Stack Overflow question:

>>> import numpy
>>> a = numpy.random.randn(300000000)
>>> a.tofile('a.tofile')
>>> b = numpy.fromfile('a.tofile', count=int(8e7))
>>> b
array([-0.57060504,  0.32796127, -1.23472672, ...,  0.28363057,
       -1.69623226,  2.36057118])
>>> b = numpy.fromfile('a.tofile', count=int(8e8))
>>> b
array([-0.57060504,  0.32796127, -1.23472672, ...,  0.        ,
        0.        ,  0.        ])
>>> b = numpy.fromfile('a.tofile', count=int(8e9))
>>> b
array([ 0.,  0.,  0., ...,  0.,  0.,  0.])
>>> numpy.fromfile('a.tofile')
array([ 0.,  0.,  0., ...,  0.,  0.,  0.])

It looks like a bug in Mac OS X's fread and might need a workaround similar to the one provided for issue #2256.
There is another open issue, #574, that asks for a similar workaround for tofile as well.

@lukauskas
Author

I think I found a way to fix both of the issues.

The end of array_fromfile_binary in numpy/core/src/multiarray/ctors.c currently looks like this:

NPY_BEGIN_ALLOW_THREADS;
*nread = fread(PyArray_DATA(r), dtype->elsize, num, fp);
NPY_END_ALLOW_THREADS;

Changing it as follows fixes both the issues with save/load and the issues with fromfile/tofile:

    NPY_BEGIN_ALLOW_THREADS;
#if defined(__APPLE__)
    /* Workaround for fread failures for large reads on OS X. Issue #2806 */
    {
        npy_intp maxsize = 2147483647 / dtype->elsize;
        npy_intp chunksize;

        size_t n = 0;
        size_t n2;

        while (num > 0) {
            chunksize = (num > maxsize) ? maxsize : num;
            n2 = fread((void *)
                       ((char *)PyArray_DATA(r) + (n * dtype->elsize)),
                       dtype->elsize,
                       (size_t) chunksize, fp);
            /* count a partial (short) read before stopping */
            n += n2;
            if (n2 < (size_t) chunksize) {
                break;
            }
            num -= chunksize;
        }
        *nread = n;
    }
#else
    *nread = fread(PyArray_DATA(r), dtype->elsize, num, fp);
#endif
    NPY_END_ALLOW_THREADS;

Now, about the number 2147483647:

I originally copy-pasted the code from the workaround for issue #2256, which uses 2147483648 (== 2^31) as the maxsize threshold. I found that threshold, 2^31, to work around the issue with np.save, but not the issues with fromfile/tofile. 2^31 - 1 fixes both bugs, though. (With dtype->elsize == 8, a maxsize of 2147483648 / 8 elements means each full fread call requests exactly 2 GiB, whereas 2147483647 / 8 rounds down to 268435455 elements, keeping each call just under 2 GiB.)

This worries me, as I feel like I am missing something here: from what I understand, np.load calls fromfile at some point, so the threshold should either work for both or fail for both, not just for one of them.

I would like to hear some thoughts on this.

@lukauskas
Author

I have put a proposed fix, along with unit tests, into my fork:

https://github.com/sauliusl/numpy/tree/bug_large_save

@pv
Member

pv commented Jan 6, 2013

@sauliusl: it would be cleaner to add new helper functions NumPyOS_fread and NumPyOS_fwrite that work around these bugs --- adding them in numpyos.c/h and then using those everywhere instead of fwrite and fread sounds OK.
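
For illustration, a minimal sketch of what the proposed numpyos.h additions could look like; the exact signatures are an assumption here, simply mirroring fread and fwrite:

    /* Chunked drop-in replacements for fread/fwrite (sketch only) */
    NPY_NO_EXPORT size_t
    NumPyOS_fread(void *buf, size_t size, size_t n, FILE *fp);

    NPY_NO_EXPORT size_t
    NumPyOS_fwrite(const void *buf, size_t size, size_t n, FILE *fp);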

@njsmith
Member

njsmith commented Jan 6, 2013

multiarray/convert.c also has an inline copy of this same loop for fwrite, inside an #ifdef _WIN64. This kind of duplication is silly, like @pv says; we should pull these out into wrapper functions, and the wrappers should just use a sensible buffer size that's well clear of these magic limits -- like 2**25 or something (= 32 megabytes per call). It just needs to be large enough that reading/writing that much data is more expensive than a syscall, and syscalls are pretty cheap. And the loop should be used unconditionally on all systems instead of messing about with #ifdefs; there's no downside to calling fread/fwrite 10 times instead of once when reading/writing hundreds of megabytes, and it's always better to have one well-tested code path instead of many poorly tested ones.

(Also please use more descriptive commit messages -- "attempt at fixing the bug" is clear now, but will be pretty puzzling in a few years when someone is looking at 'git log' ;-).)

Are you testing load and fromfile with the same dtype? The strange behaviour you see could be some sort of artifact of different rounding in the integer division by elsize producing off-by-one errors down the line. But it doesn't really matter, since we shouldn't push the limit like this anyway.

Go ahead and submit a pull request and we can sort all this stuff out there.
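
For concreteness, here is a rough sketch (not the actual patch) of what such an unconditional chunked wrapper could look like; the NPY_CHUNK_BYTES name is made up for illustration, and NumPyOS_fwrite would be symmetric:

    /* Read n elements of `size` bytes each in chunks of at most 32 MiB,
     * so no single fread call ever approaches the 2 GiB limits that
     * trip up some platforms. Returns the number of elements read. */
    #define NPY_CHUNK_BYTES (1 << 25)   /* 32 MiB per fread call */

    NPY_NO_EXPORT size_t
    NumPyOS_fread(void *buf, size_t size, size_t n, FILE *fp)
    {
        char *p = (char *)buf;
        size_t chunk, total = 0;

        if (size == 0) {
            return 0;
        }
        chunk = NPY_CHUNK_BYTES / size;
        if (chunk == 0) {
            chunk = 1;                  /* a single element exceeds the buffer */
        }
        while (n > 0) {
            size_t this_chunk = (n > chunk) ? chunk : n;
            size_t got = fread(p, size, this_chunk, fp);

            total += got;
            if (got < this_chunk) {
                break;                  /* short read: EOF or error */
            }
            p += got * size;
            n -= this_chunk;
        }
        return total;
    }

The call sites would then reduce to, e.g., *nread = NumPyOS_fread(PyArray_DATA(r), dtype->elsize, (size_t)num, fp); in ctors.c, with the analogous change for the fwrite loop in convert.c.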

lukauskas added a commit to lukauskas/numpy that referenced this issue Feb 13, 2013
as NumPyOS_fwrite to unify the fwrites.

Also made multiarray/ctors.c and multiarray/convert.c to use these new
functions.

see discussion in issue numpy#2806
lukauskas added a commit to lukauskas/numpy that referenced this issue Feb 17, 2014
as NumPyOS_fwrite to unify the fwrites.

Also made multiarray/ctors.c and multiarray/convert.c to use these new
functions.

see discussion in issue numpy#2806
@charris
Member

charris commented May 5, 2014

I believe this is fixed in Mavericks; closing, as the best fix is an OS X upgrade. Please reopen if the problem is something you think needs to be dealt with.

@astrofrog
Contributor

@charris - just to note that I also just ran into this. Personally I think that a workaround should be provided in Numpy, as an OS upgrade is a pretty extreme solution.

@matthew-brett
Contributor

Does the bug apply on 10.6? If so, and there is a patch, I think it would be worth merging, because 10.6 is still very much with us; current figures from https://www.adium.im/sparkle/#osVersion give 10.6 as 14.5 percent of OS X, and this is for a messaging app which might well be run on machines with a less conservative upgrade scheme than Macs used for science.

@charris
Member

charris commented Nov 6, 2014

@astrofrog It's the Apple way ;) However, chunking is not a bad idea in any case, so if someone is motivated to make a PR I'll take it.

@matthew-brett
Contributor

Chuck - how about #2931 ?

@charris
Member

charris commented Nov 6, 2014

@matthew-brett The old PR languished; it lacked follow-through.

@matthew-brett
Contributor

Would you be prepared to accept a de-languished version of the PR? @astrofrog - do you have any interest in working on this?

@charris
Member

charris commented Nov 6, 2014

Sure.

@astrofrog
Contributor

@matthew-brett - what remained to be done with respect to #2931?

@charris
Member

charris commented Nov 6, 2014

IIRC, chunk size. A chunk size of around 256 MB seemed to be optimal for the zip file fixes. See http://nbviewer.ipython.org/gist/btel/5729671.

@charris
Member

charris commented Nov 6, 2014

Testing was also a problem, but with the smaller chunk size I think one could get away with a smaller test file. IIRC, the Travis environment gives us 2 GiB.

@astrofrog
Contributor

@charris - thanks! I won't have time to look into this over the next week, but hopefully will have some time after.

@lukauskas
Author

@charris It depends what you are testing -- the bug only occurred for file sizes over 2 GB, hence the test doing the massive I/O operation to verify that all is fine.
Testing that the output is being chunked would be a completely different test.

@njsmith
Member

njsmith commented Nov 6, 2014

A test that files larger than the chunk size work would cover 99% of the things that could go wrong, once chunking is implemented. (Of course a proper regression test with >2 GiB files would also be good, but that's a lot harder to run on a regular basis.)
