loop over write chunks added for raw encoding in writer #106
Codecov Report
@@ Coverage Diff @@
## master #106 +/- ##
=====================================
Coverage 100% 100%
=====================================
Files 6 6
Lines 385 391 +6
Branches 125 126 +1
=====================================
+ Hits 385 391 +6
Hello, thanks for the PR. Logically, it all looks good and seems like a good thing to have. It's good to be consistent between NRRDs with and without encoding (i.e. writing in 2GB chunks). Just so I'm clear, is the problem with writing >4GB chunks an issue in Python itself? When the chunked writing for encoded NRRDs was introduced, it seemed to be there to circumvent a bug in Python, but I was unclear whether that bug was still present in recent versions of Python.
From the limited reading I did on this issue, I believe, as you suggest,
that it is related to an issue in Python. In my case, I was trying to write
a uint16 numpy array of about 2400x2400x300 to NRRD with raw encoding;
that's about 3.5GB uncompressed. I got an enigmatic error that seemed to
come from Python itself rather than from my code or the pynrrd code:
OSError: [Errno 22] Invalid Argument
I googled that error and found:
https://stackoverflow.com/questions/48122798/oserror-errno-22-invalid-argument-when-reading-a-huge-file
(and some similar posts).
There does not seem to be consensus between different sources about the
maximum size of a single write, or even which versions of Python have this
issue. Those posts, combined with the comments in the pynrrd/writer.py file,
led me to try chunking the raw write, and it worked with no issues. I was
able to load the data in FIJI (ImageJ) and Paraview without issues.
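For anyone following along, the idea of chunking a raw write can be sketched roughly as below. This is a minimal illustration, not the code in pynrrd/writer.py: the function name `write_in_chunks` and the 1 GiB default chunk size are assumptions made for the example, and it only relies on the standard `memoryview` and file `write` behaviour.

```python
import io

# Hypothetical default for this sketch only; not pynrrd's actual constant.
DEFAULT_CHUNK_SIZE = 2 ** 30  # 1 GiB


def write_in_chunks(fh, raw, chunk_size=DEFAULT_CHUNK_SIZE):
    """Write a bytes-like object to a binary file handle in bounded pieces.

    Each call to fh.write() receives at most chunk_size bytes, so no single
    write has to handle the whole buffer at once.
    """
    # View the buffer as flat bytes without copying it
    # (requires a C-contiguous buffer).
    view = memoryview(raw).cast("B")
    written = 0
    for start in range(0, len(view), chunk_size):
        written += fh.write(view[start:start + chunk_size])
    return written


# Small usage example with an in-memory file and a tiny chunk size.
buf = io.BytesIO()
payload = bytes(range(256)) * 10
count = write_in_chunks(buf, payload, chunk_size=1000)
assert buf.getvalue() == payload
```

A C-contiguous numpy array exposes the buffer protocol, so it could be passed as `raw` directly; a non-contiguous array would need to be made contiguous first.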
This was with Python 3.6.4, and I ran `pip install --upgrade pynrrd` before
making any changes to pynrrd itself to make sure I had the latest release
version.
One can speculate as to what might cause this issue in Python itself:
if an unsigned 32-bit integer keeps track of the offset to which data is
being written on the file system, or read from memory, then it would only
be able to address a maximum of ~4GB of data. But that's just conjecture
of course.
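As a quick sanity check on the sizes involved, the arithmetic can be worked out as below. This only compares the array size from this thread against the two common 32-bit limits; it does not establish which limit, if any, is responsible for the error.

```python
# Size of a 2400 x 2400 x 300 uint16 array (2 bytes per element).
n_bytes = 2400 * 2400 * 300 * 2

signed_32_max = 2 ** 31 - 1    # largest signed 32-bit value (~2 GiB)
unsigned_32_max = 2 ** 32 - 1  # largest unsigned 32-bit value (~4 GiB)

print(n_bytes)                    # 3456000000
print(n_bytes > signed_32_max)    # True
print(n_bytes > unsigned_32_max)  # False
```

So the array in question is larger than a signed 32-bit count can represent but smaller than the unsigned 32-bit limit.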
Thanks for the great tool, I use pynrrd all the time!
(In reply to Addison Elliott's comment of Thu, Dec 26, 2019, 9:08 AM.)
--
Greg M. Fleishman
UCLA Bioengineering, PhD 2016
UCLA Bioengineering, MS 2014
UCSD Bioinformatics, BS 2011
Thanks for the PR!
The writer did not implement a loop over chunks for raw data encoding. For me, in Python 3.6.7, writing NRRDs with raw encoding larger than ~2GB failed. Implementing this loop and writing in chunks resolved the issue. Some programs, notably Paraview in my case, do not read NRRD files with compression (which is silly), so for the time being I need to write NRRDs with raw encoding.