BUG: astype(object) downcasts for datetime-dtype #12550
Comments
Note that the result is usually a
|
Fair enough, as long as it's not an int... ;-) |
Well in this case, it doesn't fit in a date time, so we don't have that option. Is it better to return:
Right now, we do 3. |
To me the answer is clearly 1 (should be clear that a mixture is bound to inconsistencies...). TBH, I was really surprised that (from your example):
rather than
This is also inconsistent because passing an
|
Well, the current state does seem strange. But I am not sure I like to change the behaviour because of something that on first sight seems rather hypothetical? (I mean changing the behaviour to returning datetime64, which seems pretty different.) Do you actually work with dates past 10000 years? If that is a real use case, maybe we should see that python allows it rather? |
I don't really get that point. The underlying values were
This is not an issue of date range limitations (to me). I'm writing lots of parametrized tests for a larger pandas-PR, and this discrepancy between |
This is annoying, but... If you look closer, you will notice that other types also cast to the Python version. For datetime this is a bigger step/change and also annoying. What I mean is that we use
So, yes, this is not a safe cast. And yes it is utterly broken. But, I need a lot more to be convinced that changing the cast from returning If such a fix would actually fix bugs for downstream in the long run, it would be more compelling. But it seems to me that for most users the bug will be very mild... |
Note that this discrepancy occurs also for all of the numerical types in numpy, even if it may be just as surprising there. |
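A minimal sketch of that point for the numerical dtypes (the choice of dtypes and the dictionary name are just for illustration): the elements of the object array come back as plain Python scalars rather than numpy scalars.

```python
import numpy as np

# For each numerical dtype, cast a one-element array to object and
# record the Python type of the element that ends up in the result.
python_types = {}
for dtype in (np.int8, np.float32, np.complex64, np.bool_):
    element = np.ones(1, dtype=dtype).astype(object)[0]
    python_types[np.dtype(dtype).name] = type(element)

for name, python_type in python_types.items():
    print(name, "->", python_type.__name__)
```

So the loss of the numpy type is not specific to datetime64; what is specific to datetime64 is that the Python type depends on the unit.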
@seberg I can work around the |
I noticed that years ago I once glanced at changing the default behaviour for all types here (so that the object cast would retain the numpy type). But I am not quite sure we should do it. EDIT: or well, aim for it maybe. First, there is no way to warn about it. Second, we probably have no clue about how disruptive that would be. |
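For reference, a rough sketch of what "retaining the numpy type" could look like when done by hand today. The helper name `to_object_keep_scalars` is hypothetical and not part of numpy; it simply copies element by element so that each numpy scalar is stored as-is.

```python
import numpy as np


def to_object_keep_scalars(arr):
    """Hypothetical helper: build an object array holding the numpy scalars of `arr`."""
    out = np.empty(arr.shape, dtype=object)
    # Assign one element at a time; assigning a whole sequence would first
    # build a datetime64 array and then go through the usual object cast.
    for index in np.ndindex(arr.shape):
        out[index] = arr[index]
    return out


stamps = np.array(["2018-12-30T15:13:00"], dtype="datetime64[ns]")
kept = to_object_keep_scalars(stamps)
print(type(kept[0]))
```

This is slow for large arrays and only meant to show the semantics under discussion, not a proposed implementation.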
.astype(np.generic) might be an interesting way to allow spelling that,
@seberg
|
There seem to be some similar odd casts noted in gh-5180, will close that one in favor of this one. |
How about adding a keyword argument to |
There seems to be tons of discussion, but it's clear that x.astype(datetime) returning an integer instead of a datetime is a serious bug. Just wasted far too long tracking this down, and was completely shocked to realize that the type was wrong after ".astype(datetime)". An error message or options would be great. |
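A small sketch of the surprise described above, assuming a nanosecond-precision array (the timestamp value is made up): numpy treats `datetime.datetime` as the object dtype, so the cast follows the same path as `.astype(object)`.

```python
import datetime

import numpy as np

x = np.array(["2018-12-30T15:13:00"], dtype="datetime64[ns]")

# A Python class passed as dtype is interpreted as the object dtype.
target = np.dtype(datetime.datetime)
result = x.astype(datetime.datetime)

print(target)           # the object dtype
print(type(result[0]))  # a Python int, not datetime.datetime
```

Casting to `datetime64[us]` before the object cast gives `datetime.datetime` objects for in-range values, at the cost of dropping sub-microsecond precision.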
@eraoul yes, I think we should do something. An error, or at least a warning seems fair to me (it is within the casting machinery, so it may take a bit of care). PRs are welcome, I can give you pointers if you want to look into it. |
I commented a bit on the sister issue #7619, suggesting that I'm fine with either solution, but certainly, the |
It seems we converged to the following when talking about it today:
Note that this would include the conversion of Please just comment on this, doing the actual change should be fairly straightforward. EDIT: Marking with milestone to not forget it, please feel free to move when it comes to it. |
@seberg Still, I think that valid |
This is to avoid implicitly casting e.g. datetime64 to (python-)datetimes when using operations - like .astype(object) - that are expected not to change the type. Fixes numpy#12550. Co-Authored-By: Sebastian Berg <[email protected]>
@h-vetinari Are you still working on this? |
Hey @charris, thanks for checking in; fundamentally yes, I still want to fix it, but since I read somewhere that 1.21 branches in May, that's not gonna happen until then. #18683 is my first foray into the entrails of numpy, and that's a pretty large initial hurdle to overcome (and work/life hasn't been kind recently). |
Bumped to |
Started a branch with just the diff proposed by @seberg here, very similar to #18683. Eventually ended up scaling it back to only try to change the cases that give integers. So the diff looks like
and added tests
With this, everything in numpy/core/tests/test_datetime.py passes. But I still get a segfault in numpy/core/tests/test_array_coercion.py::TestTimeScalars::test_coercion_timedelta_convert_to_number. I also tried to just call convert_timedelta_to_pyobject/convert_datetime_to_pyobject once in each function, but that gave compile-time errors that I punted on. @h-vetinari think you can get this over the finish line? |
Started looking at this again. I don't have a compiler stack set up on the only machine I have easily available currently, so I'm mostly restricted to CI. Might set up a VM if it becomes necessary.
I started digging a bit, but I'm not familiar at all with the numpy code-base. Any pointers which pieces I should look at to avoid the kind of recursion you mentioned? |
Pushing this off again unfortunately since branching is getting close. Please ping to bump, although IIRC the whole thing was a bit tricky to fix unfortunately. |
I don't know if anyone has the cycles/enthusiasm to dive into this one again. But I fixed some infinite recursions here recently which may make this more feasible. |
The dtype np.object_ is the most general catch-all type to contain arbitrary python objects. As such, .astype(object) should preferably not upcast, but certainly never downcast:
Reproducing code example:
The expected outcome for the last two lines is:
as well as
Numpy/Python version information:
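A minimal sketch of the behaviour this report describes (not the original reproducer; the timestamp values and variable names are made up for illustration). The Python type produced by `.astype(object)` depends on the datetime64 unit:

```python
import datetime

import numpy as np

as_days = np.array(["2018-12-30"], dtype="datetime64[D]").astype(object)
as_us = np.array(["2018-12-30T15:13:00"], dtype="datetime64[us]").astype(object)
as_ns = np.array(["2018-12-30T15:13:00"], dtype="datetime64[ns]").astype(object)

# Day precision gives datetime.date, microsecond precision gives
# datetime.datetime, and nanosecond precision falls back to a plain int.
for label, converted in (("D", as_days), ("us", as_us), ("ns", as_ns)):
    print(label, "->", type(converted[0]).__name__)
```

The nanosecond case is the downcast named in the title: the integer is the raw count of nanoseconds since the epoch, and nothing in the result marks it as a timestamp.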