
Bugfix for broken tofile/fromfile and save/load in Mac OS X Issue #2806 #2931


Closed
lukauskas wants to merge 6 commits

Conversation

lukauskas

I have hacked together a fix for the issue where large matrices were not read from file correctly (issue #2806). I have been using this branch personally ever since it was created and have not noticed any problems with it, so I am confident in sending this pull request.

This might need a bit of refactoring to fit the rest of the code more cleanly, though.
Please see #2806 for more discussion on this.

@njsmith
Member

njsmith commented Jan 24, 2013

This still needs the changes pointed out by @pv and me here: #2806 (comment)

@lukauskas
Author

Hi,

I have now implemented the changes pointed out by @pv and @njsmith. Have a look.

The unit test added by 2e8a123 detects another issue with savez, though:

ERROR: Test IO functions when the arrays are massive.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/saulius/.virtualenvs/numpy_dev/lib/python2.7/site-packages/numpy/lib/tests/test_io.py", line 124, in test_large_array
    self.roundtrip(a, file_on_disk=True)
  File "/Users/saulius/.virtualenvs/numpy_dev/lib/python2.7/site-packages/numpy/lib/tests/test_io.py", line 135, in roundtrip
    assert_equal(arr, self.arr_reloaded['arr_%d' % n])
  File "/Users/saulius/.virtualenvs/numpy_dev/lib/python2.7/site-packages/numpy/lib/npyio.py", line 241, in __getitem__
    return format.read_array(value)
  File "/Users/saulius/.virtualenvs/numpy_dev/lib/python2.7/site-packages/numpy/lib/format.py", line 458, in read_array
    data = fp.read(int(count * dtype.itemsize))
SystemError: error return without exception set

If I understand correctly, this is a separate problem, related to reading from a BytesIO object (rather than using the patched fromfile routine). I guess this could also be fixed by chunking the read. The code around the function even suggests this:

# This is not a real file. We have to read it the memory-intensive
# way.
# XXX: we can probably chunk this to avoid the memory hit.
data = fp.read(int(count * dtype.itemsize))
array = numpy.fromstring(data, dtype=dtype, count=count)

Shall I go ahead and do this chunking then? Any ideas on how to share the chunking constant between npy_os.h and the Python file?
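
For illustration, here is a rough sketch of what such chunked reading could look like in Python; the helper name and the chunk size below are made up for this example and are not code from the PR:

import numpy as np

def _read_array_chunked(fp, dtype, count, chunksize=2 ** 25):
    # Hypothetical sketch: read `count` items of `dtype` from a
    # file-like object in bounded chunks instead of one huge fp.read().
    itemsize = np.dtype(dtype).itemsize
    out = np.empty(count, dtype=dtype)
    have = 0
    while have < count:
        n = min(chunksize, count - have)
        data = fp.read(n * itemsize)
        if not data:
            raise ValueError("EOF reached before all data was read")
        chunk = np.frombuffer(data, dtype=dtype)
        out[have:have + len(chunk)] = chunk
        have += len(chunk)
    return out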

@lukauskas
Author

Okay, this seems to have broken the Travis build because of multiple declarations of the functions, caused by npy_os.h being included more than once. I am surprised it built locally.

Can anyone give me a hint where the includes/code should go to avoid this?

@lukauskas
Author

Alright, I have now moved the functions to where they should be and the Travis build passes again.
Yay!

if (n2 < chunksize) {
    break;
}
n += n2;
Member

I doubt it matters (except maybe for the error reporting), but conceptually shouldn't the break come after the addition? (Same in the fwrite function.)

Member

Quoting the GNU libc manual: """If `fread' encounters end of file in the middle of an object, it returns the number of complete objects read, and discards the partial object.""" So the break should indeed come after the addition.

@seberg
Member

seberg commented Feb 14, 2013

Anyway, this looks good as far as my C knowledge goes. I am just a bit unsure about the tests: they use quite large intermediates, etc., so I think they may need some error handling for machines with little RAM or hard-drive space? Or do we assume that "slow" tests are fine in this regard, as they are not normally run?

@seberg seberg closed this Feb 14, 2013
@seberg seberg reopened this Feb 14, 2013
@seberg
Member

seberg commented Feb 14, 2013

Sorry, accidental button press...

@pv
Member

pv commented Feb 14, 2013

Tests requiring that much memory may cause problems for people running them on machines with less physical memory -- they may grind the whole machine to a halt due to swapping. I'd prefer adding a new decorator, largemem or something.

@seberg
Member

seberg commented Feb 14, 2013

Yeah, is it possible to have this inside nose in a way that it doesn't get run accidentally, i.e. with np.test('full')? There are enough things that could use a test which can only be run if you have a 64-bit system with plenty of memory (more than 4 GiB) for all those larger-than-32-bit issues, but maybe they should just be put into a manually run test file, like high_mem_tests next to that print_coercion_tables, or such?

@pv
Member

pv commented Feb 14, 2013

Prediction: manually run test files = will not be run + bitrot. I don't see a problem in adding more nose test labels; those would then also be available for other projects using numpy.testing.

@njsmith
Member

njsmith commented Feb 14, 2013

Really I think the only long-run solution is going to be to use decorators like

@needs(mem=4, disk=4)  # units are gibibytes

plus some simple system to guess reasonable default resource limits for a given system, and to take in user overrides (and a --no-limits flag for release tests). Even better if we print a message at startup saying "Assuming resource availability of: 4 GB RAM, 1 GB scratch disk. --mem and --disk change these values, or --no-limits to run all tests." That will give people some recourse if the defaults kill their machine (well, some recourse before yelling at us), and ensure that a steady stream of people actually will run the full test suite on random hardware.

It's easy to extend nose to add new command line options and test filters:
https://nose.readthedocs.org/en/latest/plugins/writing.html#registering-a-plugin-without-setuptools
(Maybe numpy.testing already does this.)
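
A rough sketch of what such a @needs decorator could look like, assuming the third-party psutil package is used for the resource checks (the decorator, its skip messages, and the test name below are illustrative only, not an existing numpy.testing API):

import functools
import nose

def needs(mem=0, disk=0):
    # Hypothetical decorator: skip the test unless roughly `mem` GiB of
    # free memory and `disk` GiB of free scratch space appear to be
    # available on the current machine.
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                import psutil
            except ImportError:
                raise nose.SkipTest("psutil not available")
            gib = 1024 ** 3
            if psutil.virtual_memory().available < mem * gib:
                raise nose.SkipTest("not enough free memory for this test")
            if psutil.disk_usage('.').free < disk * gib:
                raise nose.SkipTest("not enough free disk space for this test")
            return func(*args, **kwargs)
        return wrapper
    return decorator

@needs(mem=4, disk=4)  # units are gibibytes
def test_large_roundtrip():
    pass  # the actual large-array round-trip test would go here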

@lukauskas
Author

Hi all, thanks for the input.
I have now fixed the issues in the C code that @seberg pointed out.

I will let you guys decide on the unit test issues.
In my personal opinion, it will be infeasible to run more than a few tests like these in the long term, regardless of the decorator used.

Maybe it is worth creating a single integration/performance test that would try to push the system up to its maximum limit (beyond 32-bit sizes) and use it to spot potential failures instead?

@njsmith
Member

njsmith commented Feb 15, 2013

I guess the immediate issue is what to do with the tests in this PR. Do I understand correctly that the tests in the PR currently allocate and write to disk arrays with 2,400,000,000 bytes?

How long do these tests take to run? (It looks like Travis is set to run tests in 'fast' mode, so it's skipping these tests and I can't tell from that.)

It's not reasonable to have only a few tests that use large arrays, because the usual bugs in handling large arrays are things that you can't avoid by localizing them to a particular place -- every C function that handles array indices is another potential bug. And it's entirely possible to run a large set of large-array tests with a little preparation (though it might require a machine with >8 GB RAM and the use of a ram-disk). The question is just, which tests require how much preparation, and how do we prevent people from accidentally running tests that will take multiple hours and trigger the OOM killer -- and for right now, the question is, where do these particular tests fall on that spectrum.

@lukauskas
Author

It is not that slow on my machine (16 GB RAM, SSD); the full test suite runs in two minutes.
However, I would expect it to be much slower on a machine with less RAM and a spinning HDD.

@charris
Member

charris commented Mar 2, 2013

@njsmith Do the Travis bots have memory limits in the virtual environments?

@njsmith
Member

njsmith commented Mar 2, 2013

I don't know how much memory is in those VMs, let's see: #3109


@njsmith
Member

njsmith commented Mar 2, 2013

It looks like the Travis VMs have 3 GB of accessible memory, 1.5 GB of swap, 64-bit address spaces (which means we may be able to exploit overcommit to fake-allocate more memory than actually exists), and tons of disk space.

@@ -1,3 +1,5 @@
#include <numpy/npy_common.h>
Member

Wrong place for include, it should go after the guard. In fact, I'm not sure why the include is here at all.

@charris
Member

charris commented May 4, 2013

This seems somewhat related to #2942, so it may be desirable to make the chunksize smaller. It looks like 256MB was the optimum size.

#define NUMPYOS_FREAD_THRESHOLD 33554432
#define NUMPYOS_FWRITE_THRESHOLD 33554432
/* A general workaround of OS issues with fread/fwrite, see issue #2806 */
NPY_NO_EXPORT size_t NumPyOS_fread( void * ptr, size_t size, size_t count, FILE * stream )
Member

Function name on next line.

@charris
Member

charris commented Jul 9, 2013

I'd like to get this in, but it needs some fixes. Does anyone have an idea how to implement a test decorator? Is there a Python function that will return the amount of usable memory? There doesn't seem to be a package in the Python standard library that does what is needed. Maybe just try to allocate a large array and catch a MemoryError?

I'm curious whether this Mac problem also affects memmap?

@njsmith
Member

njsmith commented Jul 9, 2013

Allocating a large array won't in general let you determine the amount of memory: usually the allocation will "succeed", but then at some point when you're writing to the array you'll get a segfault.

That said, we could use the psutil library, or some simple hacks that support just Mac/Win/Linux.
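
For reference, a minimal sketch of the psutil route (psutil is a third-party package, not in the standard library; the attributes used below are from psutil's documented API):

import psutil

vm = psutil.virtual_memory()
print("RAM: %.1f GiB total, %.1f GiB available"
      % (vm.total / 2.0 ** 30, vm.available / 2.0 ** 30))
print("scratch disk: %.1f GiB free"
      % (psutil.disk_usage('.').free / 2.0 ** 30))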

@charris
Member

charris commented Jul 9, 2013

The psutil library looks small, but it is organized by platform rather than by function, so it is probably non-trivial to pull out just the interesting bits. I suppose we could just run the test on Mac platforms, or perhaps have a psutil dependency and skip the test if it isn't found.

Chuck

@charris
Member

charris commented Jul 10, 2013

For this, I could actually see skipping a test if it can't be reasonably implemented. It's a workaround for a Mac deficiency. Of course, it would be nice to know that it still works on other platforms...

@charris
Member

charris commented Aug 16, 2013

@sauliusl 1.8 branch coming up this Sunday, Aug 18.

@charris
Member

charris commented Feb 17, 2014

@sauliusl Are you still working with this?

@lukauskas
Author

Not really, no. Though as of the last time I checked, the patch was working; the pull request did not go through because nobody wanted a unit test that writes 2 GB of data to disk.
How about I remove the test?


@charris
Member

charris commented Feb 17, 2014

Let's skip the unit test here; I do think the blocking could be smaller, however. Do you know if this bug, Apple ID# 6434977, has been fixed?

@charris
Member

charris commented Feb 17, 2014

Oh, and if you pursue this, add a note in doc/release/1.9.0-notes.rst. This is probably worth a backport to 1.8 also, but we can do that.

@lukauskas
Author

Hm, I cannot reproduce the bug any more on OS X 10.9.1 (Mavericks).

Python 2.7.5 (default, Aug 25 2013, 00:04:04) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.__version__
'1.9.0.dev-297f54b'
>>> x = numpy.random.randn(300000000)
>>> x.tofile('derp.npy')
>>> y = numpy.fromfile('derp.npy')
>>> x[-4:]
array([-0.26568164,  0.846976  , -0.98774847, -0.14033526])
>>> y[-4:] # last elements of y used to be zeros
array([-0.26568164,  0.846976  , -0.98774847, -0.14033526])
>>> from numpy.testing import assert_array_equal
>>> assert_array_equal(x, y)

Not sure how to proceed with this now.
Were there bug reports raised about similar issues recently?

Do we still want this in core?
I would guess it is still broken on earlier OS X versions, but I have no way to check this.

@charris
Member

charris commented Feb 17, 2014

Apparently it broke in Lion, but perhaps it was silently failing in earlier versions. I don't know if it got fixed in later Lion releases.

I haven't seen any recent error messages about this, so one possibility is to just let it go, with a recommendation to upgrade the OS. I know old hardware can't take that route, but...

@charris
Member

charris commented Feb 17, 2014

@rgommers @pv Anything you can add here?

@rgommers
Member

I can check this on OS X 10.6 if you want. Did you check there's no overlap with gh-4214?

@charris
Member

charris commented May 4, 2014

Closing. The fix is to upgrade to Mavericks.
