provide converters for datetime64 types #9610

Closed
dstansby opened this issue Oct 29, 2017 · 33 comments · Fixed by #9726
Labels
Difficulty: Medium, third-party integration, topic: date handling

Comments

@dstansby
Member

See https://travis-ci.org/matplotlib/matplotlib/jobs/294376461#L2096 for more details

@dstansby
Member Author

Hmm, actually it seems to be because the units aren't being converted by the call to self.convert_xunits.

@dstansby
Member Author

dstansby commented Oct 29, 2017

The issue is this line:

X, Y, C = [np.asanyarray(a) for a in args]

Running a 'pandas.core.indexes.datetimes.DatetimeIndex' object through np.asanyarray now returns a 'numpy.ndarray', so the unit conversion isn't done properly later. I don't have time to throw together a fix, but hopefully this helps!
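To see why the conversion is lost, here is a toy, hypothetical stand-in for a units registry (not Matplotlib's actual code) that, roughly like Matplotlib's, picks a converter from the type of the first element. A datetime64[ns] ndarray, which is roughly what np.asanyarray makes of a DatetimeIndex, has np.datetime64 elements, so a registry that only knows datetime.date and datetime.datetime finds nothing.

```python
import datetime

import numpy as np

# Hypothetical, simplified stand-in for a units registry: element type -> converter name.
toy_registry = {
    datetime.date: "DateConverter",
    datetime.datetime: "DateConverter",
}


def lookup_converter(values, registry):
    """Return the converter registered for the type of the first element, or None."""
    first = values[0]
    # Walk the MRO so subclasses of a registered type still match.
    for cls in type(first).__mro__:
        if cls in registry:
            return registry[cls]
    return None


as_objects = [datetime.datetime(2000, 1, 1), datetime.datetime(2000, 1, 2)]
# Roughly what np.asanyarray(DatetimeIndex) hands back: a datetime64[ns] ndarray.
as_array = np.array(as_objects, dtype="datetime64[ns]")

print(lookup_converter(as_objects, toy_registry))  # DateConverter
print(lookup_converter(as_array, toy_registry))    # None
```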

@tacaswell tacaswell added this to the v2.1.1 milestone Oct 29, 2017
@anntzer anntzer added the Release critical label Oct 29, 2017
@jklymak
Member

jklymak commented Nov 9, 2017

That's a bother, particularly as we don't handle numpy.datetime64, which is the dtype it gets converted to.

One fix is to check for datetime64 arrays after the call above and convert them (back) to pandas afterwards...

Another fix is for the units registry to recognize datetime64 arrays and convert them there...
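For the second fix, a converter for np.datetime64 could be registered in matplotlib.units.registry, much as pandas does with units.registry[np.datetime64] = DatetimeConverter(). The sketch below is hypothetical (Datetime64Converter is not a Matplotlib class) and assumes a recent Matplotlib whose date2num already accepts datetime64 input; such versions also ship their own datetime64 converter, which this registration would replace.

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import matplotlib.units as munits


class Datetime64Converter(munits.ConversionInterface):
    """Hypothetical converter teaching the units registry about np.datetime64."""

    @staticmethod
    def convert(value, unit, axis):
        # Map datetime64 scalars/arrays to Matplotlib's float date numbers.
        return mdates.date2num(value)

    @staticmethod
    def axisinfo(unit, axis):
        # Reuse Matplotlib's stock date locator/formatter for the ticks.
        locator = mdates.AutoDateLocator()
        formatter = mdates.AutoDateFormatter(locator)
        return munits.AxisInfo(majloc=locator, majfmt=formatter)

    @staticmethod
    def default_units(x, axis):
        # This sketch keeps no per-axis unit state.
        return None


# Register under the element type: for a plain ndarray the registry falls
# back to looking up the type of its first element.
munits.registry[np.datetime64] = Datetime64Converter()

times = np.arange("2000-01-01", "2000-01-21", dtype="datetime64[D]")
fig, ax = plt.subplots()
(line,) = ax.plot(times, np.arange(20))
fig.canvas.draw()  # tick locating and formatting go through axisinfo()
```

Registering by element type rather than by container is what lets a bare datetime64 ndarray, as produced by np.asanyarray, find the converter again.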

@story645
Member

story645 commented Nov 9, 2017

Another fix is for the units registry to recognize datetime64 arrays and convert there....

Pandas does it in their registry code:

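# Excerpt from pandas' converter module: units (matplotlib.units), lib, Period,
# pydt (the datetime module), np and the converter classes are imported or
# defined elsewhere in that module.
def register():
    units.registry[lib.Timestamp] = DatetimeConverter()
    units.registry[Period] = PeriodConverter()
    units.registry[pydt.datetime] = DatetimeConverter()
    units.registry[pydt.date] = DatetimeConverter()
    units.registry[pydt.time] = TimeConverter()
    units.registry[np.datetime64] = DatetimeConverter()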

so I'm wondering if this is an upstream fix that needs to happen on their end...

@jklymak
Member

jklymak commented Nov 9, 2017

@story645 Hmmm, very confused. Do we call their converter? It looks like they rewrote matplotlib's for their plotting?

@jklymak
Member

jklymak commented Nov 9, 2017

No, I don't think so, though we refer to their converter as a good way to get from np.datetime64 to datetime. https://matplotlib.org/faq/howto_faq.html#plot-numpy-datetime64-values
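For comparison, NumPy alone can get from np.datetime64 to datetime objects that Matplotlib's existing date converter understands. This is a minimal sketch, not what the FAQ recommends; the helper name datetime64_to_datetime is hypothetical.

```python
import datetime

import numpy as np


def datetime64_to_datetime(values):
    """Convert datetime64 data to an object array of datetime.datetime."""
    values = np.asarray(values)
    if not np.issubdtype(values.dtype, np.datetime64):
        raise TypeError(f"expected datetime64 data, got {values.dtype}")
    # Cast to microseconds first: object-casting datetime64[ns] yields plain
    # integers, whereas datetime64[us] yields datetime.datetime instances.
    return values.astype("datetime64[us]").astype(object)


times = np.array(["2000-01-01T12:00", "2000-01-02T12:00"], dtype="datetime64[ns]")
converted = datetime64_to_datetime(times)
print(converted[0])                                 # 2000-01-01 12:00:00
print(isinstance(converted[0], datetime.datetime))  # True
```

The intermediate cast is the important design choice: datetime.datetime stops at microsecond resolution, so NumPy refuses to produce datetime objects directly from nanosecond data.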

I wonder why we don't just use this ourselves?

@efiring
Member

efiring commented Nov 9, 2017

The point is that they used to register their converter when pandas was imported; now they do it only when they need it for their plotting. This makes sense in that having our code changed by the simple act of importing pandas is intrusive.
Their converter is their own, not a rewrite of ours. We have never added datetime64 support of our own.

@story645
Member

story645 commented Nov 9, 2017

They don't only import it for plotting, because they also import their converter in their timeseries code:

from pandas.plotting._converter import (register, time2num,
                                        TimeConverter, TimeFormatter,
                                        PeriodConverter, get_datevalue,
                                        DatetimeConverter,
                                        PandasAutoDateFormatter,
                                        PandasAutoDateLocator,
                                        MilliSecondLocator, get_finder,
                                        TimeSeries_DateLocator,
                                        TimeSeries_DateFormatter)

so I think we're still hitting it. Otherwise I think plotting would be totally broken for pandas timeseries objects if not done through their plotting api.

ETA: matplotlib's dates.py only registers converters for datetime objects:

units.registry[datetime.date] = DateConverter()
units.registry[datetime.datetime] = DateConverter()

@jklymak
Member

jklymak commented Nov 9, 2017

OK, I'm slowly understanding...

I run

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

time = pd.date_range('2000-01-01', periods=20)
depth = np.arange(20)
data = np.random.rand(20, 20)

fig, ax = plt.subplots()
ax.pcolormesh(time, depth, data)

For pandas-0.20 things run fine, and the units registry has keys:

dict_keys([<class 'datetime.date'>, <class 'datetime.datetime'>, 
<class 'pandas._libs.tslib.Timestamp'>, <class 'pandas._libs.period.Period'>, 
<class 'datetime.time'>, <class 'numpy.datetime64'>, <class 'str'>, <class 'numpy.str_'>, 
<class 'bytes'>, <class 'numpy.bytes_'>])

For pandas-0.21, we get the error, and the units registry only has keys:

dict_keys([<class 'str'>, <class 'numpy.str_'>, <class 'bytes'>, <class 'numpy.bytes_'>, 
<class 'datetime.date'>, <class 'datetime.datetime'>])

If I include

import pandas.plotting._converter as pandacnv
pandacnv.register()

then I get back the pandas stuff and the example runs fine.

dict_keys([<class 'str'>, <class 'numpy.str_'>, <class 'bytes'>, <class 'numpy.bytes_'>,
 <class 'datetime.date'>, <class 'datetime.datetime'>, <class 'pandas._libs.tslib.Timestamp'>, 
<class 'pandas._libs.period.Period'>, <class 'datetime.time'>, <class 'numpy.datetime64'>])

So, as @efiring said, what happened here was that pandas 0.21 moved the registration of their converters out of the base import and into their plotting code. More to the point, they now explicitly require a call to register().

FWIW, with pandas 0.21 our plotting is indeed broken for other examples too:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

time = pd.date_range('2000-01-01', periods=20)
depth = np.arange(20)
data = np.random.rand(20, 20)

fig, ax = plt.subplots()
ax.plot(time, data[3,:])
plt.show()

... doesn't outright fail but gets:

[Image: figure_1]

Whereas it is decoded correctly in pandas-0.20.

@jklymak
Member

jklymak commented Nov 9, 2017

Yeah, pandas.plotting.__init__ used to call https://github.com/pandas-dev/pandas/blob/3a7f956c30528736beaae5784f509a76d892e229/pandas/plotting/__init__.py#L7

    from pandas.plotting import _converter
    _converter.register()  # needs to override so set_xlim works with str/number

But that was removed in 0.21 (pandas-dev/pandas#17710)

Not at all sure what to do here.

  1. We could call _converter.register() from dates.py and load up all their calls. Pandas older than 0.20 will not work.
  2. We could implement our own datetime64 support.
  3. We could tell pandas users to run _converter.register() themselves (and update our examples/tests).
  4. I guess we could ask pandas to go back to making import pandas register their units.

Option 1 may break examples and code that don't load pandas.
Option 2 may break some pandas examples.
Option 3 is inelegant for pandas users who bypass pandas' plotting functions.

@jklymak
Member

jklymak commented Nov 9, 2017

See also pandas-dev/pandas#18153. It looks like the advice there (@TomAugspurger) is to use the extra verbiage.

@jklymak jklymak changed the title pcolormesh with pandas date_range broken with pandas 0.21 Plotting with pandas date_range broken with pandas 0.21 Nov 9, 2017
@story645
Member

story645 commented Nov 9, 2017

I'm not sure how to do option 1 without making pandas an optional dependency of matplotlib, unless the plan is to only do it in tests. It looks like in #18153 their plan is to eventually implement option 2.

@TomAugspurger
Contributor

Sorry about the headaches here.

If matplotlib is willing to accept it, I can put in the work to port the relevant bits of https://github.com/pandas-dev/pandas/blob/8dac633142daa8d5bcd0cf77ad89b97628d474eb/pandas/plotting/_converter.py over to matplotlib. Basically: does matplotlib want a formatter, locator, etc. for datetime64s? Would adding that formatter be considered an API change that people would need to opt in to?

@jklymak
Member

jklymak commented Nov 9, 2017

@TomAugspurger No problem! Most of the flail on my part was just not knowing what was going on.

After thinking about it, I realized that what @efiring said above is absolutely correct, and what you did in pandas is correct. Importing pandas should not have silently registered new units, for exactly the reason that it's very confusing. I changed the tests that depend on pandas to simply call register().

With respect to datetime64, I personally think adding that would be fabulous. I do wonder about taking a step back and deciding what to do about matplotlib's date handling. One could even imagine moving the date handling out of pandas altogether and making what pandas has the matplotlib default, or at least the non-pandas-dependent parts.

@story645
Member

story645 commented Nov 9, 2017

Import pandas should not have silently registered new units for exactly the reason that it’s very confusing.

But I think kind of the point of the units framework is to silently handle library-specific data types so the user doesn't have to deal with them explicitly. In this case, moving the code to mpl kind of makes sense because matplotlib imports numpy, but then what's the solution for data types matplotlib doesn't import?
Making users explicitly import converters seems to kind of defeat the purpose, but the framework is also kinda designed so that the converters live in the third-party library.

@jklymak
Member

jklymak commented Nov 9, 2017

@story645 So you think pandas should keep the register in their top-level import path?

Right now we silently register our date handling and categoricals. I can see your point, but I actually think it makes things confusing. How do I know, as a user, what units handling is present?

Right now, using pandas 0.20, I have no idea what the units handling and tick handling etc. is for a datetime.datetime object. Does it use the default documented Matplotlib date handling? Does it use pandas? I'm not sure what takes precedence if I have import pandas. What's worse is that there is very little user-facing documentation of all this. I've been using Matplotlib since 2012, and I use dates all the time. I frankly found the date handling to be in such a state that 99% of the time I just switch to Julian days early in my data processing so I don't have to think about it.
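One partial answer to "what units handling is present?" is to ask the registry directly, which is how the key listings earlier in this thread were obtained. A minimal sketch; on a recent Matplotlib the listing will differ from the 2017 output quoted above (for example, np.datetime64 now has a converter of Matplotlib's own).

```python
import datetime

import numpy as np
import matplotlib.units as munits
import matplotlib.category  # noqa: F401  (importing registers the str/bytes converters)
import matplotlib.dates  # noqa: F401  (importing registers the date converters)

# Every type that currently has a converter registered.
for cls in munits.registry:
    print(cls)

# The converter Matplotlib would pick for a given piece of data (None if none applies).
date_conv = munits.registry.get_converter(datetime.datetime(2000, 1, 1))
float_conv = munits.registry.get_converter(np.arange(3.0))
print(type(date_conv).__name__)
print(float_conv)
```

Running the same lines after importing pandas (and, where required, calling its register function) shows whether a third-party converter has taken over a type.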

Ref #9713

@story645
Member

story645 commented Nov 9, 2017

So you think pandas should keep the register in their top-level import path?

I dunno, honestly. I agree that pandas shouldn't have a required mpl dependency, so I understand why they don't want it in their top-level import path, but I also think it's unwieldy to require from pandas.plotting import _converter; _converter.register() at the top of any script that plots timeseries, unless the plan is to push people to always use the pandas plotting routines for dates.

Right now, using pandas 0.20, I have no idea what the units handling and tick handling etc is for a datetime.datetime object. Does it use the default documented Matplotlib date handling? Does it use pandas

I sort of think that's a fair point, except I'm not sure non-dev users would care, and pandas doesn't register the same datatypes as dates.py, so devs can easily find the answer.

I frankly found the date handling to be in such a state that 99% of the time I just switch to Julian-days early in my data processing so I don't have to think about it.

My very first patch was for scikits-timeseries in like 2011, so I remember too. But it's gotten way way better, in large part 'cause of pandas and the stuff it does semi-automagically.

@TomAugspurger
Contributor

TomAugspurger commented Nov 9, 2017

So you think pandas should keep the register in their top-level import path?

FWIW, the change in pandas 0.21.0 was to stop doing a try: import matplotlib; except: ... on import pandas. The motivation was to reduce the import time of pandas. I don't think we'll be changing it back.

Not registering the converters is a side effect of that change. I think it's for the best. We've had questions in the past about "why does importing pandas mess up my plots, when I don't even use it?" Now we just have questions on the other side :)
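The trade-off described here (no import-time side effects, but converters only appear once the library's own plotting code runs) can be sketched with a toy, entirely hypothetical library class and a plain dict standing in for Matplotlib's units registry. None of these names are pandas APIs.

```python
class ToyDataLibrary:
    """Hypothetical data library that registers its converters lazily."""

    def __init__(self, registry):
        # Creating ("importing") the library does not touch the registry.
        self._registry = registry
        self._registered = False

    def register(self):
        # Explicit opt-in, analogous to calling register() by hand.
        self._registry["Timestamp"] = "TimestampConverter"
        self._registered = True

    def plot(self, values):
        # The library's own plotting entry point registers on first use.
        if not self._registered:
            self.register()
        return len(values)


units_registry = {}  # stand-in for matplotlib.units.registry
lib = ToyDataLibrary(units_registry)
print(units_registry)  # {}
lib.plot([1, 2, 3])
print(units_registry)  # {'Timestamp': 'TimestampConverter'}
```

Code that calls ax.plot() directly never passes through a method like plot() here, which is why an explicit register() call became necessary for such scripts.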

And, I think moving pandas' locator / converter for datetimes to matplotlib would be a good thing, so that everyone benefits from them, not just pandas.

@jklymak
Member

jklymak commented Nov 9, 2017

@TomAugspurger I personally agree - for now we need to add a specific register() to the tests that rely on pandas (#9726) , and then assume pandas has documented how to use matplotlib with their library (because we never mention pandas in our docs, except for the FAQ entry you linked).

I think a PR to add more date handling would be most welcome, but it's not urgent (i.e. it would likely be a 2.2 milestone).

I'm not particularly happy with the units/converter handling as it is now (see #9713) but that seems a bigger issue. I think advice from the pandas side would be very helpful as your use of this framework is very complete and seems very robust.

@story645
Member

story645 commented Nov 9, 2017

And, I think moving pandas' locator / converter for datetimes to matplotlib would be a good thing, so that everyone benefits from them, not just pandas.

Agree here, but what happens to the pandas specific data types?

@dstansby
Member Author

Is there a reason this is closed?

@story645
Member

Accidentally/automagically 'cause #9726 got merged and fixes it on the test side.

@story645 story645 reopened this Nov 13, 2017
@jklymak
Member

jklymak commented Nov 13, 2017

What is the remaining issue?

@dstansby
Member Author

dstansby commented Nov 13, 2017

That

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

time = pd.date_range('2000-01-01', periods=20)
depth = np.arange(20)
data = np.random.rand(20, 20)

fig, ax = plt.subplots()
ax.plot(time, data[3,:])
plt.show()

doesn't work without manually registering pandas' converter. (I just spent a while trying to work out why basically all my plotting code was broken before realising it was this bug)

@jklymak
Member

jklymak commented Nov 13, 2017

MPL never handled registering the pandas units converter; pandas did that itself whenever it was imported. They've decided it should move into tseries instead and be a manual register. You can try to change their mind, but it's in their court, not ours. The only places we discuss this are the FAQ (https://matplotlib.org/faq/howto_faq.html#plot-numpy-datetime64-values) and the tests.

@dstansby
Copy link
Member Author

I agree it's in their court, but I also think that doing ax.plot(data['Displacement']), where data is a DataFrame with a datetime index, should be possible without faffing around with registers, since lots of scientific data is time series.

@efiring
Member

efiring commented Nov 13, 2017

I think this should be re-framed as datetime64 support in mpl, not pandas support. Pandas uses one very restricted flavor of datetime64.
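The restricted flavor can be made concrete: pandas has historically stored datetimes as nanosecond-resolution datetime64[ns], and that resolution only spans a few centuries, while NumPy's datetime64 supports many units with very different ranges. A minimal NumPy-only illustration; general datetime64 support therefore cannot assume nanoseconds.

```python
import numpy as np

# datetime64 values are 64-bit integer counts of a unit since 1970-01-01;
# the most negative int64 is reserved for NaT, so the range depends on the unit.
i64 = np.iinfo(np.int64)
ns_min = np.datetime64(int(i64.min) + 1, "ns")
ns_max = np.datetime64(int(i64.max), "ns")
print(ns_min)  # 1677-09-21T00:12:43.145224193
print(ns_max)  # 2262-04-11T23:47:16.854775807

# A date outside that window is perfectly valid at day resolution.
print(np.datetime64("1500-06-01", "D"))  # 1500-06-01
```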

@tacaswell tacaswell modified the milestones: v2.1.1, v2.2 Nov 13, 2017
@tacaswell tacaswell added Difficulty: Medium and removed Release critical labels Nov 13, 2017
@tacaswell tacaswell changed the title Plotting with pandas date_range broken with pandas 0.21 provide converters for datetime64 types Nov 13, 2017
@tacaswell
Member

Changed the title, moved to 2.2 and updated the labels.

@jorisvandenbossche

There actually already was an issue for "datetime64 support in mpl": #1097, which was closed recently, deferring to pandas... (but of course, the situation has changed in the meantime)

@tacaswell I looked into doing that a while ago and iirc it very quickly ended up in the pandas c-extensions and internal details of the time objects in pandas....

@TomAugspurger I also once looked at it, and agree that it is not that easy to port the full pandas functionality (although that was a long time ago).
But I think adding a basic Converter that just understands numpy datetime64 and converts it to matplotlib's internal float date numbers should be rather easy. The bulk of the code in the pandas implementation is the 'smarter' locator/formatter for fixed-frequency data. But that might not be necessary for matplotlib (or at least not for a first, basic version).
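Following that suggestion, the core arithmetic of such a basic converter needs only NumPy. This is a minimal sketch: datetime64_to_num is a hypothetical name, and the epoch is a parameter because Matplotlib's float dates have used different epochs over time. Hooking it into Matplotlib would additionally need a matplotlib.units.ConversionInterface subclass, as pandas' own converter provides.

```python
import numpy as np


def datetime64_to_num(values, epoch="1970-01-01T00:00:00"):
    """Convert datetime64 data to float days since *epoch*.

    The 1970 default mirrors Matplotlib 3.3 and later; the 2.x releases
    discussed in this thread counted days since 0001-01-01 UTC, plus one.
    """
    values = np.asarray(values)
    if not np.issubdtype(values.dtype, np.datetime64):
        raise TypeError(f"expected datetime64 data, got {values.dtype}")
    # The subtraction yields timedelta64 in the finer of the two units, so any
    # input resolution (D, s, ns, ...) works; dividing by one day gives floats.
    return (values - np.datetime64(epoch)) / np.timedelta64(1, "D")


times = np.array(["1970-01-02", "1970-01-01T12:00"], dtype="datetime64[ns]")
print(datetime64_to_num(times))  # 1.0 and 0.5 days after the epoch
```

The epoch must be representable at the data's resolution: with datetime64[ns] input, an epoch of 0001-01-01 falls outside the nanosecond range, so an old-style epoch would need coarser-resolution input.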

@jorisvandenbossche

FYI, given the amount of feedback we got about this, we are considering temporarily undoing this change on the pandas side in the upcoming bug-fix release, to have a smoother deprecation period for the automatic registering of our converters: pandas-dev/pandas#18301

@tacaswell
Member

Closed by #9779
