~2**32 byte tofile()/fromfile() limit in 64-bit Windows (Trac #1660) #2256
Comments
@cgohlke wrote on 2010-11-03: The tofile() call hangs in numpy\core\src\multiarray\convert.c, line 84. This seems to be the reason: http://social.msdn.microsoft.com/Forums/en-US/vcgeneral/thread/7c913001-227e-439b-bf07-54369ba07994 ("fwrite issues with large data write").
@cgohlke wrote on 2010-11-03: Probably the same issue as http://bugs.python.org/issue9015.
trac user mspacek wrote on 2010-11-03: Well, that's depressing. I should just switch to Linux already. It looks like fseek and ftell are 32-bit in MSVC, even on Win64. Perhaps _fseeki64 and _ftelli64 need to be used instead?
http://msdn.microsoft.com/en-us/library/75yw9bf3%28v=VS.90%29.aspx
http://msdn.microsoft.com/en-us/library/0ys3hc0b%28v=VS.90%29.aspx
http://www.firstobject.com/fseeki64-ftelli64-in-vc++.htm
This might explain why "IOError: could not seek in file" is thrown in /numpy/core/src/multiarray/ctors.c, line 3037, when I call fromfile(). However, I don't understand why the error comes up when I call fromfile() directly, but not when I call it indirectly via np.load().
trac user mspacek wrote on 2010-11-03: As a workaround for the tofile() problem, perhaps it could check whether the platform is Windows and, if so, call fwrite multiple times in, say, 2 GB chunks until all the data is written out. Unfortunately, I don't have the skills to compile on Win64, let alone implement something like this.
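A user-level sketch of that chunking idea in Python (the real fix belongs in numpy's C fwrite loop; the helper name and 1 GiB chunk size are illustrative, and a C-contiguous array is assumed):

```python
import numpy as np

CHUNK = 2**30  # 1 GiB per write, comfortably below the 2**32-byte limit

def tofile_chunked(arr, path):
    # View the array as raw bytes (no copy; assumes C-contiguous data)
    flat = arr.reshape(-1).view(np.uint8)
    with open(path, 'wb') as f:
        for start in range(0, flat.size, CHUNK):
            # Each write stays small enough for MSVC's 32-bit fwrite path
            f.write(flat[start:start + CHUNK].data)
```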
@cgohlke wrote on 2010-11-03: Rather than getting depressed or dumping Windows, how about you write/adjust the unit tests and I'll try to fix the code to pass the tests.
@cgohlke wrote on 2010-11-03: Please consider the attached patch for numpy 1.5.x. The *i64 functions are not available in the default compiler used by 64-bit Python 2.5. This simple test now passes, but it may take several minutes to complete (not appropriate for a standard unit test).
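The test snippet itself was not preserved in the migration; a sketch of such a round-trip test (the size, marker value, and temp path are illustrative) might look like this:

```python
import os
import tempfile
import numpy as np

def check_big_tofile_fromfile(nbytes=5 * 2**30):
    # Round-trip an array larger than 2**32 bytes through tofile()/fromfile().
    # Note: allocates nbytes of RAM and writes nbytes to disk.
    a = np.zeros(nbytes, dtype=np.uint8)
    a[-1] = 10  # marker past the 2**32-byte boundary
    fname = os.path.join(tempfile.gettempdir(), 'ticket1660.bin')
    try:
        a.tofile(fname)
        b = np.fromfile(fname, dtype=np.uint8)
        assert b.size == nbytes and b[-1] == 10
    finally:
        os.remove(fname)
```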
Milestone changed.
Attachment added by @cgohlke on 2010-11-04: ticket1660.diff
trac user mspacek wrote on 2010-11-04: I've tried out Christoph's patch, as provided in his latest binary at http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy, and it all seems to work now! Here are the tests I ran. They take a fair while to run; 6 GB seems a reasonable trade-off between speed and making sure there's no size limit.
I've never written unit tests before. I'll see what I can learn and try to submit something to the ticket.
trac user mspacek wrote on 2010-11-04: Also, I think I've figured out why the IOError was being raised during the direct call to np.fromfile() but not during the indirect call via np.load(). fromfile's count argument defaults to -1, meaning all entries, which drops it into the loop in /numpy/core/src/multiarray/ctors.c around line 3037. There, it explicitly does some ordinary 32-bit seek calls, and those were failing. In np.load(), fromfile is called with count set explicitly to something positive, so all the seeking is skipped and it goes straight through to the fread call without any problems.
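To illustrate the two call paths (the filename and element count here are hypothetical):

```python
import numpy as np

# Direct call: count=-1 makes numpy seek around to measure the file,
# hitting the 32-bit fseek/ftell path on 64-bit Windows.
a = np.fromfile('big.bin', dtype=np.uint8, count=-1)   # IOError here

# np.load()-style call: an explicit count skips the seeking entirely
# and goes straight to fread, so the same file reads fine.
n = 5 * 2**30  # element count known in advance, e.g. from the .npy header
b = np.fromfile('big.bin', dtype=np.uint8, count=n)
```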
trac user mspacek wrote on 2010-11-04: Here's a unit test patch. With this, np.test('full') hangs without Christoph's patch and completes successfully with it. Unfortunately, the test takes over a minute to write slightly more than 4 GB (the required threshold) to a non-SSD drive. I don't think there's much that can be done about that.
Attachment added by trac user mspacek on 2010-11-04: 0001-add-unittest-for-ticket-1660.patch
Attachment added by trac user mspacek on 2010-11-04: test_big_binary.py
@rgommers wrote on 2010-11-04: About the unit test: you applied the slow decorator, which is probably all you can do. Or maybe we need a @dec.very_slow... A few other points:
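For reference, applying numpy.testing's slow decorator of that era looks roughly like this (the test body is elided, and @dec.very_slow was only a suggestion, not an existing decorator):

```python
from numpy.testing import dec

@dec.slow
def test_big_binary():
    # body: the >4 GB tofile()/fromfile() round-trip from the attached patch
    pass
```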
Attachment added by trac user mspacek on 2010-11-04: add-unittest-for-ticket-1660.patch
@charris wrote on 2011-01-23: Code style needs some fixes, which I'll do.
Original ticket http://projects.scipy.org/numpy/ticket/1660 on 2010-11-03 by trac user mspacek, assigned to @charris.
I'm using Christoph Gohlke's amd64 builds (http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy) on 64-bit Windows 7 with the official Python 2.6.6 amd64 installer. I can't tofile(), save(), or fromfile() numpy arrays larger than roughly 2**32 bytes. For save() and tofile(), Python hangs, using 100% CPU on one core. For fromfile(), it raises an IOError. This doesn't happen in 64-bit Linux on the same machine. load() seems to work on any size of file. Here are some examples, and some caveats:
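The original example session did not survive the Trac migration; below is a minimal sketch of the reported behaviour, with illustrative filenames:

```python
import numpy as np

a = np.zeros(2**32 + 2**20, dtype=np.uint8)   # just over 2**32 bytes

a.tofile('a.bin')                             # hangs at 100% CPU on one core
np.save('a.npy', a)                           # hangs the same way

b = np.fromfile('a.bin', dtype=np.uint8)      # IOError: could not seek in file
c = np.load('a.npy')                          # works on files of any size
```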
I have this problem regardless of which of Christoph's amd64 numpy builds (1.4.1, 1.5.0, 1.5.1RC1, MKL or non-MKL) I use. It happens both on my 64-bit Win7 i7 machine with 12 GB of RAM and on another machine running 64-bit WinXP. Christoph has also confirmed it on Python 2.5 through 3.1 and says there is nothing special about his builds, so this will affect all 64-bit Windows distributions of numpy (EPD, ActiveState, etc.).