BUG: HDFStore utf8 corrupted reads #6505
Are you looking at master? This changed from 0.13.1. This is very tricky because the decoding should work and be very fast via numpy (could be a bug there too). The vectorized decoding is quite slow, FYI (but I bet it ALWAYS works). |
Yes, it's on master. I'll dig deeper into it. I just wanted to open a ticket already for reference. |
perfect....note that the current code DOES pass 3.3 / 1.8 on windows / linux, but if you have a test case that breaks...by all means |
@wabu update on this? |
@wabu update on this? |
Sorry for the long absence. It's still a problem, so I created a test that fails. The test is still invalid for older Python versions, but it shows that there is a problem. I also tried to reproduce it with plain numpy.astype('U').astype(object), but the problem didn't show up there. Moreover, pytables can read the correct data from the hdf5 file. |
hmm |
here's a more complete test.
|
example with nonempty strings:
|
ok why don't u create a new issue (and link to this one) |
When utf8 encoded data is read back, it is randomly corrupted.
The output contains random values or the read fails with a UnicodeDecodeError.
After commenting out the fast version (astype) in _unconvert_string_array, the issue did not show up anymore.
python 3.3, numpy 1.8.0, pytables 3.1.0
I'll try to get a reproducible test up in the next few days.
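For reference, the round trip that is expected to hold can be sketched in plain Python. This is only an illustration, not the pandas code path: the helper names to_fixed_width and from_fixed_width are hypothetical, and the sketch assumes strings are stored as NUL-padded fixed-width utf8 byte cells and decoded one element at a time (the slow but reliable route).

```python
def to_fixed_width(values, encoding="utf-8"):
    """Encode each string and pad it with NUL bytes to a common width.

    Hypothetical helper: mimics how a fixed-width byte column could hold
    utf8 text. Returns the padded cells and the width in bytes.
    """
    encoded = [v.encode(encoding) for v in values]
    # The width is measured in bytes, not characters: non-ASCII
    # characters take more than one byte in utf8.
    width = max((len(b) for b in encoded), default=0)
    return [b.ljust(width, b"\x00") for b in encoded], width


def from_fixed_width(cells, encoding="utf-8"):
    """Strip the NUL padding and decode every cell individually.

    Note: a string that really ends in a NUL character would lose it here.
    """
    return [c.rstrip(b"\x00").decode(encoding) for c in cells]


values = ["", "a", "häßlich", "日本語"]
cells, width = to_fixed_width(values)
restored = from_fixed_width(cells)
```

If the stored bytes are intact, restored equals values, including the empty string. A mismatch or a UnicodeDecodeError at this step would point at the decoding rather than at the file contents.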