
Bugfix for broken tofile/fromfile and save/load in Mac OS X Issue #2806 #2931


Closed
lukauskas wants to merge 6 commits

Conversation

lukauskas

I have hacked together a fix for the issue where large matrices were not read from file correctly (issue #2806). I have been using this branch personally ever since it was created and have not noticed any problems with it, so I am confident in sending this pull request.

This might need a bit of refactoring to fit the rest of the code more cleanly, though.
Please see #2806 for more discussion on this.

@njsmith
Member

njsmith commented Jan 24, 2013

This still needs the changes pointed out by @pv and me here: #2806 (comment)

@lukauskas
Author

Hi,

I have now implemented the changes pointed out by @pv and @njsmith. Have a look.

The unit test added by 2e8a123 detects another issue with savez, though:

ERROR: Test IO functions when the arrays are massive.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/saulius/.virtualenvs/numpy_dev/lib/python2.7/site-packages/numpy/lib/tests/test_io.py", line 124, in test_large_array
    self.roundtrip(a, file_on_disk=True)
  File "/Users/saulius/.virtualenvs/numpy_dev/lib/python2.7/site-packages/numpy/lib/tests/test_io.py", line 135, in roundtrip
    assert_equal(arr, self.arr_reloaded['arr_%d' % n])
  File "/Users/saulius/.virtualenvs/numpy_dev/lib/python2.7/site-packages/numpy/lib/npyio.py", line 241, in __getitem__
    return format.read_array(value)
  File "/Users/saulius/.virtualenvs/numpy_dev/lib/python2.7/site-packages/numpy/lib/format.py", line 458, in read_array
    data = fp.read(int(count * dtype.itemsize))
SystemError: error return without exception set

If I understand correctly, this is a separate problem, related to reading from a BytesIO object (rather than using the patched fromfile routine). I guess this could also be fixed by chunking the read. The code around the function even suggests this:

# This is not a real file. We have to read it the memory-intensive
# way.
# XXX: we can probably chunk this to avoid the memory hit.
data = fp.read(int(count * dtype.itemsize))
array = numpy.fromstring(data, dtype=dtype, count=count)

Shall I go ahead and do this chunking then? Any ideas on how to share the chunking constant between npy_os.h and the Python file?
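
For illustration, here is a rough sketch of what such chunked reading could look like in Python; the helper name and the chunk size below are made up for this example and are not code from the PR:

import numpy as np

def _read_array_chunked(fp, dtype, count, chunksize=2 ** 25):
    # Hypothetical sketch: read `count` items of `dtype` from a
    # file-like object in bounded chunks instead of one huge fp.read().
    itemsize = np.dtype(dtype).itemsize
    out = np.empty(count, dtype=dtype)
    have = 0
    while have < count:
        n = min(chunksize, count - have)
        data = fp.read(n * itemsize)
        if not data:
            raise ValueError("EOF reached before all data was read")
        chunk = np.frombuffer(data, dtype=dtype)
        out[have:have + len(chunk)] = chunk
        have += len(chunk)
    return out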

@lukauskas
Author

Okay, this seems to have broken the Travis build because of multiple declarations of the functions, caused by npy_os.h being included more than once. I am surprised it built locally.

Can anyone give me a hint where the includes/code should go to avoid this?

@lukauskas
Author

Alright, I have now moved the functions to where they should be and the Travis build passes again.
Yay!

if (n2 < chunksize) {
    break;
}
n += n2;
Member

I doubt it matters (except maybe for the error reporting), but conceptually shouldn't the break come after the addition? (Same in the fwrite function.)

Member

Quoting the GNU libc manual: """If `fread' encounters end of file in the middle of an object, it returns the number of complete objects read, and discards the partial object.""" So the break should indeed come after the addition.

@seberg
Member

seberg commented Feb 14, 2013

Anyway, this looks good as far as my C knowledge goes. I am just a bit unsure about the tests: they use quite large intermediates, etc., so I think they may need some error handling for machines with little RAM or hard-drive space? Or do we assume that "slow" tests are fine in this regard, as they are not normally run?

@seberg seberg closed this Feb 14, 2013
@seberg seberg reopened this Feb 14, 2013
@seberg
Member

seberg commented Feb 14, 2013

Sorry, accidental button press...

@pv
Member

pv commented Feb 14, 2013

Tests requiring that much memory may cause problems for people running them on machines with less physical memory -- they may grind the whole machine to a halt due to swapping. I'd prefer adding a new decorator, largemem or something.

@seberg
Member

seberg commented Feb 14, 2013

Yeah, is it possible to have this inside nose in a way that it doesn't get run accidentally, i.e. with np.test('full')? There are enough things that could use a test which can only be run if you have a 64-bit system with plenty of memory (more than 4 GiB) for all those larger-than-32-bit issues, but maybe they should just be put into a manually run test file, like high_mem_tests next to that print_coercion_tables, or such?

@pv
Member

pv commented Feb 14, 2013

Prediction: manually run test files = will not be run + bitrot. I don't see a problem in adding more nose test labels; those would then also be available for other projects using numpy.testing.

@njsmith
Member

njsmith commented Feb 14, 2013

Really I think the only long-run solution is going to be to use decorators like

@needs(mem=4, disk=4)  # units are gibibytes

plus some simple system to guess reasonable default resource limits for a given system, and to take in user overrides (and a --no-limits flag for release tests). Even better if we print a message at startup saying "Assuming resource availability of: 4 GB RAM, 1 GB scratch disk. --mem and --disk change these values, or --no-limits to run all tests." That will give people some recourse if the defaults kill their machine (well, some recourse before yelling at us), and ensure that a steady stream of people actually will run the full test suite on random hardware.

It's easy to extend nose to add new command line options and test filters:
https://nose.readthedocs.org/en/latest/plugins/writing.html#registering-a-plugin-without-setuptools
(Maybe numpy.testing already does this.)
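
A rough sketch of what such a @needs decorator could look like, assuming the third-party psutil package is used for the resource checks (the decorator, its skip messages, and the test name below are illustrative only, not an existing numpy.testing API):

import functools
import nose

def needs(mem=0, disk=0):
    # Hypothetical decorator: skip the test unless roughly `mem` GiB of
    # free memory and `disk` GiB of free scratch space appear to be
    # available on the current machine.
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                import psutil
            except ImportError:
                raise nose.SkipTest("psutil not available")
            gib = 1024 ** 3
            if psutil.virtual_memory().available < mem * gib:
                raise nose.SkipTest("not enough free memory for this test")
            if psutil.disk_usage('.').free < disk * gib:
                raise nose.SkipTest("not enough free disk space for this test")
            return func(*args, **kwargs)
        return wrapper
    return decorator

@needs(mem=4, disk=4)  # units are gibibytes
def test_large_roundtrip():
    pass  # the actual large-array round-trip test would go here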

@lukauskas
Author

Hi all, thanks for the input.
I have now fixed the issues in the C code that @seberg pointed out.

I will let you guys decide on the unit test issues.
In my personal opinion, it will be infeasible to run more than a few tests like these in the long term, regardless of the decorator used.

Maybe it is worth creating a single integration/performance test that would try to push the system up to its maximum limit (beyond 32-bit sizes) and use it to spot potential failures instead?

@njsmith
Member

njsmith commented Feb 15, 2013

I guess the immediate issue is what to do with the tests in this PR. Do I understand correctly that the tests in the PR currently allocate and write to disk arrays with 2,400,000,000 bytes?

How long do these tests take to run? (It looks like Travis is set to run tests in 'fast' mode, so it's skipping these tests and I can't tell from that.)

It's not reasonable to have only a few tests that use large arrays, because the usual bugs in handling large arrays are things that you can't avoid by localizing them to a particular place -- every C function that handles array indices is another potential bug. And it's entirely possible to run a large set of large-array tests with a little preparation (though it might require a machine with >8 GB RAM and the use of a ram-disk). The question is just, which tests require how much preparation, and how do we prevent people from accidentally running tests that will take multiple hours and trigger the OOM killer -- and for right now, the question is, where do these particular tests fall on that spectrum.

@lukauskas
Author

It is not that slow on my machine (16 GB RAM, SSD); the full test suite runs in two minutes.
However, I would expect it to be much slower on a machine with less RAM and a spinning HDD.

@charris
Member

charris commented Mar 2, 2013

@njsmith Do the Travis bots have memory limits in the virtual environments?

@njsmith
Member

njsmith commented Mar 2, 2013

I don't know how much memory is in those VMs, let's see: #3109


@njsmith
Member

njsmith commented Mar 2, 2013

It looks like the Travis VMs have 3 GB of accessible memory, 1.5 GB of swap, 64-bit address spaces (which means we may be able to exploit overcommit to fake-allocate more memory than actually exists), and tons of disk space.

@@ -1,3 +1,5 @@
#include <numpy/npy_common.h>
Member

Wrong place for include, it should go after the guard. In fact, I'm not sure why the include is here at all.

@charris
Member

charris commented May 4, 2013

This seems somewhat related to #2942, so it may be desirable to make the chunksize smaller. It looks like 256MB was the optimum size.

#define NUMPYOS_FREAD_THRESHOLD 33554432
#define NUMPYOS_FWRITE_THRESHOLD 33554432
/* A general workaround of OS issues with fread/fwrite, see issue #2806 */
NPY_NO_EXPORT size_t NumPyOS_fread( void * ptr, size_t size, size_t count, FILE * stream )
Member

Function name on next line.

@charris
Member

charris commented Jul 9, 2013

I'd like to get this in, but it needs some fixes. Does anyone have an idea how to implement a test decorator? Is there a Python function that will return the amount of usable memory? There doesn't seem to be a package in the Python standard library that does what is needed. Maybe just try to allocate a large array and catch a MemoryError?

I'm curious whether this Mac problem also affects memmap?

@njsmith
Member

njsmith commented Jul 9, 2013

Allocating a large array won't in general let you determine the amount of memory: usually the allocation will "succeed", but then at some point when you're writing to the array you'll get a segfault.

That said, we could use the psutil library, or some simple hacks that support just Mac/Win/Linux.
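
For reference, a minimal sketch of the psutil route (psutil is a third-party package, not in the standard library; the attributes used below are from psutil's documented API):

import psutil

vm = psutil.virtual_memory()
print("RAM: %.1f GiB total, %.1f GiB available"
      % (vm.total / 2.0 ** 30, vm.available / 2.0 ** 30))
print("scratch disk: %.1f GiB free"
      % (psutil.disk_usage('.').free / 2.0 ** 30))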

@charris
Member

charris commented Jul 9, 2013

The psutil library looks small, but it is organized by platform rather than by function, so it is probably non-trivial to pull out just the interesting bits. I suppose we could just run the test on Mac platforms, or perhaps have a psutil dependency and skip the test if it isn't found.

Chuck

@charris
Member

charris commented Jul 10, 2013

For this, I could actually see skipping a test if it can't be reasonably implemented. It's a workaround for a Mac deficiency. Of course, it would be nice to know that it still works on other platforms...

@charris
Member

charris commented Aug 16, 2013

@sauliusl 1.8 branch coming up this Sunday, Aug 18.

@charris
Member

charris commented Feb 17, 2014

@sauliusl Are you still working with this?

@lukauskas
Author

Not really, no. Though as of the last time I checked, the patch was working; the pull request did not go through because nobody wanted a unit test that writes 2 GB of data to disk.
How about I remove the test?


@charris
Member

charris commented Feb 17, 2014

Let's skip the unit test here; I do think the blocking could be smaller, however. Do you know if this bug, Apple ID# 6434977, has been fixed?

@charris
Member

charris commented Feb 17, 2014

Oh, and if you pursue this, add a note in doc/release/1.9.0-notes.rst. This is probably worth a backport to 1.8 also, but we can do that.

@lukauskas
Author

Hm, I cannot reproduce the bug any more on OS X 10.9.1 (Mavericks).

Python 2.7.5 (default, Aug 25 2013, 00:04:04) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.__version__
'1.9.0.dev-297f54b'
>>> x = numpy.random.randn(300000000)
>>> x.tofile('derp.npy')
>>> y = numpy.fromfile('derp.npy')
>>> x[-4:]
array([-0.26568164,  0.846976  , -0.98774847, -0.14033526])
>>> y[-4:] # last elements of y used to be zeros
array([-0.26568164,  0.846976  , -0.98774847, -0.14033526])
>>> from numpy.testing import assert_array_equal
>>> assert_array_equal(x, y)

Not sure how to proceed with this now.
Were there bug reports raised about similar issues recently?

Do we still want this in core?
I would guess it is still broken on earlier OS X versions, but I have no way to check this.

@charris
Member

charris commented Feb 17, 2014

Apparently it broke in Lion, but perhaps it was silently failing in earlier versions. I don't know if it got fixed in later Lion releases.

I haven't seen any recent error messages about this, so one possibility is to just let it go, with a recommendation to upgrade the OS. I know old hardware can't take that route, but...

@charris
Member

charris commented Feb 17, 2014

@rgommers @pv Anything you can add here?

@rgommers
Member

I can check this on OS X 10.6 if you want. Did you check there's no overlap with gh-4214?

@charris
Member

charris commented May 4, 2014

Closing. The fix is to upgrade to Mavericks.
