
@billyc
Contributor

@billyc billyc commented Mar 14, 2017

Without compression, orca.run() and orca.write_files() produce massive output files, several gigabytes in size.

Adding a simple HDF5-standard zlib compression parameter cuts output file size by 80% and adds only a few seconds of write time.

  • The new compress parameter is optional and defaults to False, so existing UDST code does not need to be modified and is unaffected unless users opt in to smaller files.

  • zlib is not a very fast compression algorithm, but it is included as standard in every HDF5 implementation, which makes it the most portable choice.
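The design above (an opt-in flag that defaults to off, with zlib doing the work) can be sketched with Python's standard zlib module alone. This is not orca's actual change: encode_table and the choice of level=1 are hypothetical, and plain bytes stand in for an HDF5 table. In pandas, the analogous knobs are the complib='zlib' and complevel arguments accepted by pandas.HDFStore and DataFrame.to_hdf.

```python
import zlib


def encode_table(raw: bytes, compress: bool = False, level: int = 1) -> bytes:
    """Return the bytes to write, compressing with zlib only when asked.

    Hypothetical helper: `compress` defaults to False so existing callers
    keep the old, uncompressed behaviour; `level` 1 favours write speed.
    """
    if not compress:
        return raw
    return zlib.compress(raw, level)


# Synthetic stand-in for a simulation table (an assumption, not real output):
# many repeated values, so it compresses far better than real data might.
rows = "".join(f"{i % 50},{(i * 7) % 13},residential\n" for i in range(100_000))
raw = rows.encode("utf-8")

plain = encode_table(raw)                  # default: bytes pass through unchanged
packed = encode_table(raw, compress=True)  # opt-in: zlib-compressed bytes

print(f"uncompressed: {len(plain):,} bytes")
print(f"compressed:   {len(packed):,} bytes")
print(f"reduction:    {1 - len(packed) / len(raw):.0%}")
```

Because the synthetic rows repeat with a short period, the reduction printed here overstates what real model output would achieve; the point is only that the default path is untouched and the compressed path is lossless.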

Without compression, write_files() produces massive output files on
the order of gigabytes per run.

Adding a simple HDF5-standard zlib compression parameter cuts output
file size by 75%, and adds a few seconds to I/O while writing.

zlib is not a very fast compression algorithm, but it is included as
standard in every HDF5 implementation.
@coveralls

coveralls commented Mar 14, 2017


Coverage increased (+0.004%) to 96.527% when pulling 672a2fa on billyc:master into d8df9f8 on UDST:master.

@janowicz
Contributor

This is great, thanks! I tried this out and see the ~80% reduction in the size of output .h5. In my case, the write-out with compression took about 6 seconds longer.

@bridwell
Contributor

That's awesome. Does the compression impact read times at all?

@billyc
Contributor Author

billyc commented Mar 14, 2017

@bridwell It does not seem to affect read time on my machine; at least, any difference is within the margin of error given everything else running on my desktop.

(edit) In fact, it is sometimes faster, since there is so much less data to read.
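The read-side trade-off discussed here (fewer bytes to read, plus a decompression step) can be sketched with the standard library. This is an illustration under assumptions, not a benchmark of orca or HDF5: the payload is synthetic, and because the files were just written they are likely served from the OS page cache, so the timings mostly reflect decompression cost rather than disk speed. The "less to read" advantage shows up mainly with large files on a cold disk.

```python
import os
import tempfile
import time
import zlib

# Synthetic CSV-like payload (an assumption, not real orca output).
raw = ("parcel_id,zone_id,building_type\n" +
       "".join(f"{i},{i % 800},{i % 4}\n" for i in range(200_000))).encode()

with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "plain.bin")
    packed_path = os.path.join(tmp, "packed.bin")
    with open(plain_path, "wb") as f:
        f.write(raw)
    with open(packed_path, "wb") as f:
        f.write(zlib.compress(raw, 1))

    plain_size = os.path.getsize(plain_path)
    packed_size = os.path.getsize(packed_path)

    # Read path 1: read every byte as stored.
    t0 = time.perf_counter()
    with open(plain_path, "rb") as f:
        plain_back = f.read()
    t_plain = time.perf_counter() - t0

    # Read path 2: read fewer bytes, then pay for decompression.
    t0 = time.perf_counter()
    with open(packed_path, "rb") as f:
        packed_back = zlib.decompress(f.read())
    t_packed = time.perf_counter() - t0

print(f"on disk: {plain_size:,} vs {packed_size:,} bytes")
print(f"read: {t_plain * 1000:.1f} ms plain, {t_packed * 1000:.1f} ms compressed")
```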

@waddell
Member

waddell commented Mar 15, 2017

Thanks, Billy. I've used the compression library on large HDF5 files before, but had not noticed that orca was not using it by default. Good call.

@janowicz janowicz merged commit f49398e into UDST:master Mar 15, 2017