save/load and tofile/fromfile fail silently for large arrays on Mac OS X #2806

Closed
lukauskas opened this issue Dec 10, 2012 · 19 comments

@lukauskas

It seems that saving large numpy arrays to disk using numpy.save fails silently for very large arrays; for example:

In [5]: a = np.random.randn(500000000)

In [6]: 

In [6]: a
Out[6]: 
array([-1.00736353, -2.06696394,  1.46569636, ...,  0.89738222,
       -0.06982733,  0.06954417])

In [7]: np.save('a', a)

In [8]: b = np.load('a.npy')

In [9]: b
Out[9]: 
array([-1.00736353, -2.06696394,  1.46569636, ...,  0.        ,
        0.        ,  0.        ])

Note the zeros at the end of the loaded array that do not exist in a.

I'm running numpy version 1.6.2:

In [11]: np.version.version
Out[11]: '1.6.2'

on Python 2.7.3 on Mac OS X 10.8.2, if that helps.

@lukauskas
Author

I think this is specific to Mac OS X and is related to numpy.fromfile failing for large arrays.

See this stackoverflow question:
http://stackoverflow.com/questions/13769545/max-limit-for-file-size-with-np-fromfile

I can reproduce the bug described in the Stack Overflow question:

>>> import numpy
>>> a = numpy.random.randn(300000000)
>>> a.tofile('a.tofile')
>>> b = numpy.fromfile('a.tofile', count=int(8e7))
>>> b
array([-0.57060504,  0.32796127, -1.23472672, ...,  0.28363057,
       -1.69623226,  2.36057118])
>>> b = numpy.fromfile('a.tofile', count=int(8e8))
>>> b
array([-0.57060504,  0.32796127, -1.23472672, ...,  0.        ,
        0.        ,  0.        ])
>>> b = numpy.fromfile('a.tofile', count=int(8e9))
>>> b
array([ 0.,  0.,  0., ...,  0.,  0.,  0.])
>>> numpy.fromfile('a.tofile')
array([ 0.,  0.,  0., ...,  0.,  0.,  0.])

It looks like a bug in Mac OS X's fread and might need a workaround similar to the one provided for issue #2256.
There is another open issue, #574, that asks for a similar workaround for tofile as well.

@lukauskas
Author

I think I found a way to fix both of the issues.

The end of array_fromfile_binary in numpy/core/src/multiarray/ctors.c currently looks like this:

NPY_BEGIN_ALLOW_THREADS;
*nread = fread(PyArray_DATA(r), dtype->elsize, num, fp);
NPY_END_ALLOW_THREADS;

Changing it as follows fixes both the issues with save/load and the issues with fromfile/tofile:

    NPY_BEGIN_ALLOW_THREADS;
#if defined(__APPLE__)
    /* Workaround for fread failures for large reads on OS X. Issue #2806 */
    {
        npy_intp maxsize = 2147483647 / dtype->elsize;
        npy_intp chunksize;

        size_t n = 0;
        size_t n2;

        while (num > 0) {
            chunksize = (num > maxsize) ? maxsize : num;
            n2 = fread((void *)
                       ((char *)PyArray_DATA(r) + (n * dtype->elsize)),
                       dtype->elsize,
                       (size_t) chunksize, fp);
            /* count a partial (short) read before stopping */
            n += n2;
            if (n2 < (size_t) chunksize) {
                break;
            }
            num -= chunksize;
        }
        *nread = n;
    }
#else
    *nread = fread(PyArray_DATA(r), dtype->elsize, num, fp);
#endif
    NPY_END_ALLOW_THREADS;

Now, about the number 2147483647:

I originally copy-pasted the code from the workaround for issue #2256, which uses 2147483648 (== 2^31) as the maxsize threshold. I found that threshold, 2^31, to work around the issue with np.save, but not the issues with fromfile/tofile. 2^31 - 1 fixes both bugs, though. (With dtype->elsize == 8, a maxsize of 2147483648 / 8 elements means each full fread call requests exactly 2 GiB, whereas 2147483647 / 8 rounds down to 268435455 elements, keeping each call just under 2 GiB.)

This worries me, as I feel like I am missing something here: from what I understand, np.load calls fromfile at some point, so the threshold should either work for both or fail for both, not just for one of them.

I would like to hear some thoughts on this.

@lukauskas
Author

I have put a proposed fix, along with unit tests, into my fork:

https://github.com/sauliusl/numpy/tree/bug_large_save

@pv
Member

pv commented Jan 6, 2013

@sauliusl: it would be cleaner to add new helper functions NumPyOS_fread and NumPyOS_fwrite that work around these bugs --- adding them in numpyos.c/h and then using those everywhere instead of fwrite and fread sounds OK.
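
For illustration, a minimal sketch of what the proposed numpyos.h additions could look like; the exact signatures are an assumption here, simply mirroring fread and fwrite:

    /* Chunked drop-in replacements for fread/fwrite (sketch only) */
    NPY_NO_EXPORT size_t
    NumPyOS_fread(void *buf, size_t size, size_t n, FILE *fp);

    NPY_NO_EXPORT size_t
    NumPyOS_fwrite(const void *buf, size_t size, size_t n, FILE *fp);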

@njsmith
Member

njsmith commented Jan 6, 2013

multiarray/convert.c also has an inline copy of this same loop for fwrite, inside an #ifdef _WIN64. This kind of duplication is silly, like @pv says; we should pull these out into wrapper functions, and the wrappers should just use a sensible buffer size that's well clear of these magic limits -- like 2**25 or something (= 32 megabytes per call). It just needs to be large enough that reading/writing that much data is more expensive than a syscall, and syscalls are pretty cheap. And the loop should be used unconditionally on all systems instead of messing about with #ifdefs; there's no downside to calling fread/fwrite 10 times instead of once when reading/writing hundreds of megabytes, and it's always better to have one well-tested code path instead of many poorly tested ones.

(Also please use more descriptive commit messages -- "attempt at fixing the bug" is clear now, but will be pretty puzzling in a few years when someone is looking at 'git log' ;-).)

Are you testing load and fromfile with the same dtype? The strange behaviour you see could be some sort of artifact of different rounding in the integer division by elsize producing off-by-one errors down the line. But it doesn't really matter, since we shouldn't push the limit like this anyway.

Go ahead and submit a pull request and we can sort all this stuff out there.
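
For concreteness, here is a rough sketch (not the actual patch) of what such an unconditional chunked wrapper could look like; the NPY_CHUNK_BYTES name is made up for illustration, and NumPyOS_fwrite would be symmetric:

    /* Read n elements of `size` bytes each in chunks of at most 32 MiB,
     * so no single fread call ever approaches the 2 GiB limits that
     * trip up some platforms. Returns the number of elements read. */
    #define NPY_CHUNK_BYTES (1 << 25)   /* 32 MiB per fread call */

    NPY_NO_EXPORT size_t
    NumPyOS_fread(void *buf, size_t size, size_t n, FILE *fp)
    {
        char *p = (char *)buf;
        size_t chunk, total = 0;

        if (size == 0) {
            return 0;
        }
        chunk = NPY_CHUNK_BYTES / size;
        if (chunk == 0) {
            chunk = 1;                  /* a single element exceeds the buffer */
        }
        while (n > 0) {
            size_t this_chunk = (n > chunk) ? chunk : n;
            size_t got = fread(p, size, this_chunk, fp);

            total += got;
            if (got < this_chunk) {
                break;                  /* short read: EOF or error */
            }
            p += got * size;
            n -= this_chunk;
        }
        return total;
    }

The call sites would then reduce to, e.g., *nread = NumPyOS_fread(PyArray_DATA(r), dtype->elsize, (size_t)num, fp); in ctors.c, with the analogous change for the fwrite loop in convert.c.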

lukauskas added a commit to lukauskas/numpy that referenced this issue Feb 13, 2013
as NumPyOS_fwrite to unify the fwrites.

Also made multiarray/ctors.c and multiarray/convert.c to use these new
functions.

see discussion in issue numpy#2806
lukauskas added a commit to lukauskas/numpy that referenced this issue Feb 17, 2014
as NumPyOS_fwrite to unify the fwrites.

Also made multiarray/ctors.c and multiarray/convert.c to use these new
functions.

see discussion in issue numpy#2806
@charris
Member

charris commented May 5, 2014

I believe this is fixed in Mavericks; closing, as the best fix is an OS X upgrade. Please reopen if the problem is something you think needs to be dealt with.

@astrofrog
Contributor

@charris - just to note that I also just ran into this. Personally I think that a workaround should be provided in Numpy, as an OS upgrade is a pretty extreme solution.

@matthew-brett
Contributor

Does the bug apply on 10.6? If so, and there is a patch, I think it would be worth merging, because 10.6 is still very much with us; current figures from https://www.adium.im/sparkle/#osVersion give 10.6 as 14.5 percent of OS X, and this is for a messaging app which might well be run on machines with a less conservative upgrade scheme than Macs used for science.

@charris
Member

charris commented Nov 6, 2014

@astrofrog It's the Apple way ;) However, chunking is not a bad idea in any case, so if someone is motivated to make a PR I'll take it.

@matthew-brett
Contributor

Chuck - how about #2931 ?

@charris
Member

charris commented Nov 6, 2014

@matthew-brett The old PR languished; it lacked follow-through.

@matthew-brett
Contributor

Would you be prepared to accept a de-languished version of the PR? @astrofrog - do you have any interest in working on this?

@charris
Member

charris commented Nov 6, 2014

Sure.

@astrofrog
Contributor

@matthew-brett - what remained to be done with respect to #2931?

@charris
Member

charris commented Nov 6, 2014

IIRC, chunk size. A chunk size of around 256 MB seemed to be optimal for the zip file fixes. See http://nbviewer.ipython.org/gist/btel/5729671.

@charris
Member

charris commented Nov 6, 2014

Testing was also a problem, but with the smaller chunk size I think one could get away with a smaller test file. IIRC, the Travis environment gives us 2 GiB.

@astrofrog
Contributor

@charris - thanks! I won't have time to look into this over the next week, but hopefully will have some time after.

@lukauskas
Author

@charris It depends what you are testing -- the bug only occurred for file sizes over 2 GB, hence the test doing the massive I/O operation to verify that all is fine.
Testing that the output is being chunked would be a completely different test.

@njsmith
Member

njsmith commented Nov 6, 2014

A test that files larger than the chunk size work would cover 99% of the things that could go wrong, once chunking is implemented. (Of course a proper regression test with >2 GiB files would also be good, but that's a lot harder to run on a regular basis.)
