Allow H5T_INTEGER in HDF5 files #2978
src/caffe/util/hdf5.cpp
Could you remove this and the below LOG(INFO)s (or change to VLOG, or one of the LOG variations that only happens the first time it's hit, if you prefer)? I think this creates too much noise while training nets with an HDF5DataLayer, especially if using relatively small HDF5 files.
This LGTM except as commented above. Thanks @lukeyeager!
e1da824 to ebc9963
Updated
Great, thanks again @lukeyeager!
Allow H5T_INTEGER in HDF5 files
Oops -- I missed this in my review: we should have kept the script_dir-relative paths here and below. I'll fix this in the near future (or feel free to send a patch, @lukeyeager or anyone else).
I feel like I had a good reason for doing this, but I can't remember what it was now...
Oh right, this lets you run the generate_sample_data.py script from the src/caffe/test/test_data/ directory and still generate the correct paths in the text file. Otherwise, if you run the script again, the paths in the text files change from:
src/caffe/test/test_data/sample_data.h5
to:
/home/lyeager/caffe/caffe/src/caffe/test/test_data/sample_data.h5
I chose to fix it this way. The other way to fix it would be to remove the os.path.abspath() call on line 8. Would you rather I fix it that way?
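To make the difference concrete, here is a minimal sketch of how the same file ends up listed under two different paths. It is not the actual contents of generate_sample_data.py; the variable names are hypothetical, and the directory values are taken from the example paths above.

```python
import posixpath  # POSIX path rules, so the result is the same on any OS

# Hypothetical checkout location, taken from the example path above.
repo_root = "/home/lyeager/caffe/caffe"
script_file = posixpath.join(
    repo_root, "src/caffe/test/test_data/generate_sample_data.py")

# An abspath-style script_dir is absolute, so any path built from it
# is specific to the machine and checkout where the script ran.
script_dir = posixpath.dirname(script_file)
absolute_entry = posixpath.join(script_dir, "sample_data.h5")

# The same file expressed relative to the repository root, which is
# the form that stays stable across machines.
relative_entry = posixpath.relpath(absolute_entry, start=repo_root)

print(absolute_entry)
print(relative_entry)
```

The relative form only resolves correctly when the consumer runs from the repository root, which is the trade-off being discussed here.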
Ah, thanks for the explanation. I didn't realize that was an issue when I merged #2274. In retrospect, I think we should have stuck with the existing behavior, which required this script to be run from the Caffe root. The alternative would be to run this script to generate the data on the fly at test time, rather than tracking the test data files as we do now; I think that is probably a better way to do it, but it might be rather involved relative to the potential payoff. Anyway, I agree with you and retract my previous suggestion. We should probably revert most or all of #2274, since we can't use the script_dir-relative paths everywhere.
Is it ok to use
Try it out and let me know. It works for me, so I'm assuming
I ran a simple test and found no problems. I was being lazy in not verifying it myself. Thanks for adding this feature, @lukeyeager.
Caffe is currently hardcoded to only support the H5T_FLOAT datatype class (originally added by @sergeyk in #203). All I had to do was allow the H5T_INTEGER class and now I can use HDF5 datasets created with dtype='uint8'. That lets me create datasets with much smaller filesizes, comparable to LMDB sizes with uncompressed data (see comparison here - NVIDIA/DIGITS#226 (comment)).

Was there a particular reason that H5T_INTEGER was disallowed?
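The size difference follows from element width. As a rough back-of-the-envelope sketch, assuming the float data would otherwise be stored as 32-bit floats and ignoring HDF5 metadata, chunking, and compression, the raw payloads compare as below. The dataset shape is hypothetical, and the standard-library array module stands in for the dtype widths; this is not a measurement of real HDF5 files.

```python
from array import array

# Hypothetical dataset: 1000 images, 3 channels, 32x32 pixels each.
num_values = 1000 * 3 * 32 * 32

# Bytes per element for each storage type.
uint8_size = array("B").itemsize    # unsigned 8-bit integer
float32_size = array("f").itemsize  # C float, 4 bytes on common platforms

uint8_bytes = num_values * uint8_size
float32_bytes = num_values * float32_size
ratio = float32_bytes / uint8_bytes

print(uint8_bytes, float32_bytes, ratio)
```

Under these assumptions the uint8 payload is a quarter of the float32 one, which is consistent with the smaller file sizes described above.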