problems with timestamps in matplotlib seems related to bug #9779 #11649

Closed
danieljmr opened this issue Jul 12, 2018 · 20 comments

Comments

@danieljmr

danieljmr commented Jul 12, 2018

Bug report

Bug summary

I am getting the following error when trying to plot dates on the x-axis. The same bug seems to have been reported in #9779. Since I already have that fix, I am wondering why I still get the error. Maybe I am doing something wrong?

The error is:
TypeError: ufunc subtract cannot use operands with types dtype('<M8[ns]') and dtype('float64')

input data is a csv with format:

TIMESTAMP,Call Waiting,Stand Waiting,Sat Waiting,Blocked,Send email,Break,Printing Document,in Meeting,Sending Mail,Driving car,Driving bike,Driving truck,Typing,Traffic Light Wait,Enter garage,Sleeping,on Call,Browsing
2018-05-04 17:00:13.254403,,,,,,,,,,,,,,,,,,
2018-05-04 17:00:20.486367,,,8,,,,7,,,1,,,47,4,1,,,
2018-05-04 17:00:28.915123,,,8,,,,,,,,,,,,,,,
2018-05-04 17:00:30.476002,,,8,,,,,,,,,,,,,,,
2018-05-04 17:00:30.504923,,,8,,,,2,,1,,,,69,1,1,,,
2018-05-04 17:00:33.612147,,,8,,,,11,,,,1,,47,2,,,,
2018-05-04 17:00:46.782256,,,8,,,,3,,,1,,,64,1,1,,,
2018-05-04 17:00:51.550413,,,7,,,,4,,,4,,,119,3,1,,,
2018-05-04 17:00:51.746899,,,8,,,,,,,,,,,,,,,
2018-05-04 17:00:56.773144,,,8,,,,,,,,,,1,,,,,
2018-05-04 17:01:01.164004,,,8,,,,,,,,,,,,,,,
2018-05-04 17:01:14.989166,1,,8,1,,,15,,,,,,91,2,,,,
2018-05-04 17:01:26.347999,1,,8,,,,14,,,1,,,61,2,1,,,

If TIMESTAMP is not converted to np.datetime64, the x-axis values are converted to numbers, but the graph displays OK.

Code for reproduction

data['TIMESTAMP'] = pd.to_datetime(data['TIMESTAMP'])
bar = axes.bar(data['TIMESTAMP'], data[a], bottom=margin_bottom, label=a, color=palet[a], width=0.03)
margin_bottom += data[a]

import collections
from collections import OrderedDict
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as dates
import mpld3
from mpld3 import plugins
import datetime as dt
from datetime import datetime


input_file = 'activity.csv'

palet = OrderedDict({
'Call Waiting' : 'limegreen',
'Stand Waiting' : 'gold',
'Sat Waiting' : 'goldenrod',
'Blocked' : 'darkorchid',
'Send email' : 'coral',
'Break' : 'lightcoral',
'Printing Document' : 'orangered',
'in Meeting' : 'olivedrab',
'Sending Mail' : 'plum',
'Driving car' : 'orange',
'Driving bike' : 'c',
'Driving truck' : 'skyblue',
'Typing' : 'mediumslateblue',
'Traffic Light Wait' : 'darkturquoise',
'Enter garage' : 'orchid',
'Sleeping' : 'firebrick',
'on Call' : 'indianred',
'Browsing' : 'silver'
})

# Define CSS to control tooltips
css = """
.tooltip {
    position: relative;
    display: inline-block;
}
.tooltip .tooltiptext {
    background-color: #4c4c4c;
    color: #fff;
    text-align: center;
    padding: 5px 0;
    border-radius: 6px;
}
.tooltip .tooltiptext::after {
    content: " ";
    position: absolute;
    top: 50%;
    right: 100%; /* To the left of the tooltip */
    margin-top: -5px;
    border-width: 5px;
    border-style: solid;
    border-color: transparent #4c4c4c transparent transparent;
}
"""

data = pd.read_csv(input_file,sep=',')
data = data.fillna( value = 0, axis = 0 )
data = data.sort_values('TIMESTAMP')
data['TIMESTAMP'] = pd.to_datetime(data['TIMESTAMP'])

fig = plt.figure(figsize=(20,8))
axes = fig.add_subplot(111)

margin_bottom = np.zeros(len(data.TIMESTAMP.drop_duplicates()))

for a in palet.keys():
        bar = axes.bar(data['TIMESTAMP'], data[a], bottom=margin_bottom, label=a, color=palet[a], width=0.03)
        margin_bottom += data[a]
        for rectangle in bar.get_children():
                if rectangle.get_height() > 0.0:
                        tooltip = mpld3.plugins.LineHTMLTooltip(rectangle, label="<div class=\"tooltip\"><span class=\"tooltiptext\">"+ a + ' : ' + str(rectangle.get_height()) + "</span></div>", voffset=-10, hoffset=10, css=css )
                        mpld3.plugins.connect(plt.gcf(), tooltip)

plugins.connect(fig, plugins.MousePosition(fontsize=14))

axes.legend()

mpld3.show()

Actual outcome

TypeError: ufunc subtract cannot use operands with types dtype('<M8[ns]') and dtype('float64')

Traceback (most recent call last):
  File "C:\MyDIR\MyStest.py", line 94, in <module>
    bar = axes.bar(data['TIMESTAMP'], data[a], bottom=margin_bottom, label=a, color=palet[a], width=0.03)
  File "C:\Users\ThatIsMe\AppData\Local\Programs\Python\Python36\lib\site-packages\matplotlib\__init__.py", line 1855, in inner
    return func(ax, *args, **kwargs)
  File "C:\Users\ThatIsMe\AppData\Local\Programs\Python\Python36\lib\site-packages\matplotlib\axes\_axes.py", line 2259, in bar
    left = x - width / 2
TypeError: ufunc subtract cannot use operands with types dtype('<M8[ns]') and dtype('float64')
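
The last frame of the traceback can be reproduced with NumPy alone. The following is a minimal sketch, not Matplotlib's actual code path: the array `x` and the 15-second `half_width` are made up for illustration. It only shows why subtracting a float from a `datetime64` array fails, while subtracting a `timedelta64` is well defined.

```python
import numpy as np

# Illustrative data (not the CSV from the report): two nanosecond timestamps.
x = np.array(["2018-05-04T17:00:13", "2018-05-04T17:00:20"], dtype="datetime64[ns]")
width = 0.03  # a plain float, as passed to axes.bar in the report

try:
    left = x - width / 2  # same expression as in the traceback
    float_subtraction_failed = False
except TypeError as exc:
    float_subtraction_failed = True
    print("TypeError:", exc)

# Subtracting a timedelta64 from a datetime64 array works.
half_width = np.timedelta64(15, "s")  # hypothetical half bar width
left = x - half_width
print(left)
```

This suggests the dates reach `bar` as raw `datetime64` values rather than as floats, which is what the rest of the thread goes on to discuss.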

Expected outcome

X-axis showing dates. Dates can be plotted when using pandas directly but I need to perform customization like fixed colors for data types plotted as bars and sequence of bar stack

Matplotlib version

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: English_United States.1252

pandas: 0.22.0
pytest: None
pip: 10.0.1
setuptools: 39.0.1
Cython: None
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

pip

@jklymak
Member

jklymak commented Jul 13, 2018

You'll get more help with this if you minimize it and get rid of extra stuff like mpld3 (whatever that is). It's hard for me to tell if the issue is matplotlib, or that pd.to_datetime is not returning what you think it is...

@ImportanceOfBeingErnest
Member

#9779 is about the plot function. Here you use the bar function.

A code for reproduction seems to be the following:

import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime
import pandas as pd

dates = [datetime(2018,7,i) for i in range(1,15)]
values = np.cumsum(np.random.rand(len(dates)))

df = pd.DataFrame({"dates":dates, "values" : values})
df["dates"] = pd.to_datetime(df["dates"])

plt.bar(df["dates"],  df["values"].values)

plt.show()

This fails with the same error TypeError: ufunc subtract cannot use operands with types dtype('<M8[ns]') and dtype('float64').

The solution (for now) is to use the numpy array provided by .values instead of the pandas.Series.

plt.bar(df["dates"].values,  df["values"].values)

works as expected.

@ImportanceOfBeingErnest
Member

But I totally agree that it is somehow confusing that

plt.plot(df["dates"], df["values"].values)

works fine, while

plt.bar(df["dates"], df["values"].values)
# or
plt.scatter(df["dates"], df["values"].values)

does not. Essentially the same issue, but for scatter, is raised in #11391.

@jklymak
Member

jklymak commented Jul 13, 2018

Okay....

plot eventually does:

print('pre x', x, type(x), type(x[0]))
if x.ndim == 1:
    x = x[:, np.newaxis]
print('x after', x, type(x), type(x[0, 0]))

and this converts x from a pandas time series to a numpy datetime64 array:

pre x 0   2018-07-01
1   2018-07-02
2   2018-07-03
3   2018-07-04
Name: dates, dtype: datetime64[ns] <class 'pandas.core.series.Series'> <class 'pandas._libs.tslib.Timestamp'>

final x [['2018-07-01T00:00:00.000000000']
 ['2018-07-02T00:00:00.000000000']
 ['2018-07-03T00:00:00.000000000']
 ['2018-07-04T00:00:00.000000000']] <class 'numpy.ndarray'> <class 'numpy.datetime64'>

So for plot the pandas time series actually gets converted to numpy.datetime64 and we use our converter.

For scatter, bar, and everything else, we use pandas' converter, which doesn't appear to convert to matplotlib datetimes (at least not natively).

So, not sure what the right thing to do here is. We could easily pre-process x and y the way plot does, but that seems risky. In fact I'd kind of argue plot is doing the wrong thing and should not be automatically converting to a np array so early. But I don't know why the processing is done that way.

@jklymak
Member

jklymak commented Jul 13, 2018

Checked blame, and the numpy typecast came into plot before the great split of axes.py, so it's pretty hard to know where it came from...

@ImportanceOfBeingErnest
Member

Wow it's kind of depressing to see what I always considered as a great feature turning out to be pure coincidence.

In any case, this "feature" is used a lot in the wild, so even though I somewhat agree that the conversion may in fact be wrong, I would argue that one should not break many people's code by removing it.

In general, I could imagine that adding a couple of pandas examples to the matplotlib gallery would be really helpful; but the problem is that due to pandas not being a dependency, those couldn't be run through sphinx-gallery, producing any output figures.

@danieljmr
Author

Thanks,

The solution proposed by Elan Ernest indeed converted the dates correctly, but as a side effect I got weird behavior with the bars.

Before (not using .values, not doing df["dates"] = pd.to_datetime(df["dates"]))
image

After
image

However this seems to be a problem out of the scope of this topic.

Does the workaround proposed by Elan Ernest deserve an entry in https://matplotlib.org/gallery/recipes/common_date_problems.html? I was trying to find a solution for this problem there before opening this issue.

Thank you very much again

@danieljmr
Author

Anyway, I would appreciate it if someone has a tip for solving both problems, as I have had this problem before and it was one of the reasons I stopped using values :)

@jklymak
Member

jklymak commented Jul 13, 2018

@ImportanceOfBeingErnest - I guess I think the behaviour of plot versus scatter (for instance) was just masking the fact that the Pandas date converter isn't doing its job. When it gets called, it needs to take the pandas time series and turn it into a numpy array of Matplotlib datenums. It really doesn't seem to be doing that, so maybe someone should open a bug report there (I'm hesitant to, just because I don't use pandas)

If that is fixed, then maybe we can talk about making sure plot preserves its x-data in the original form until convert is called, rather than casting to a numpy data type so early.

Getting this right should all be thought about as part of the units MEP. ping @dopplershift @tacaswell
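
To make concrete what "a numpy array of datenums" means, the sketch below converts a `datetime64` array into floating-point days since an epoch using NumPy only. The helper name `to_day_numbers` and the 1970-01-01 epoch are assumptions chosen for illustration. Matplotlib's own epoch depends on its version and configuration, and this is neither the pandas nor the Matplotlib converter.

```python
import numpy as np

# Assumed epoch, for illustration only; not necessarily Matplotlib's.
ASSUMED_EPOCH = np.datetime64("1970-01-01T00:00:00", "ns")


def to_day_numbers(timestamps, epoch=ASSUMED_EPOCH):
    """Return float days elapsed since `epoch` for datetime64-like input."""
    timestamps = np.asarray(timestamps, dtype="datetime64[ns]")
    # timedelta64 / timedelta64 yields a plain float64 array.
    return (timestamps - epoch) / np.timedelta64(1, "D")


stamps = np.array(["1970-01-02T00:00", "1970-01-02T12:00"], dtype="datetime64[ns]")
day_numbers = to_day_numbers(stamps)
print(day_numbers)
```

Once the x-data are floats like these, an expression such as `x - width / 2` is ordinary arithmetic, with `width` interpreted in days.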

@jklymak
Member

jklymak commented Jul 13, 2018

> Does the workaround proposed by Elan Ernest deserve an entry in https://matplotlib.org/gallery/recipes/common_date_problems.html? I was trying to find a solution for this problem there before opening this issue

@danieljmr you are using pandas... As you can probably tell, pandas is not a native part of matplotlib; it's a little hard for us to specify "common problems" for all the downstream packages (though of course we can/should try and smooth out any problems if we can help on our end).

@tacaswell
Member

> @danieljmr you are using pandas... As you can probably tell, pandas is not a native part of matplotlib; it's a little hard for us to specify "common problems" for all the downstream packages (though of course we can/should try and smooth out any problems if we can help on our end).

Pandas and tabular data are a big use case for our users. Even though we do not directly depend on pandas, we need to make sure we work well with it. Pandas is definitely a special case among our downstream consumers.

> In any case, this "feature" is used a lot in the wild, so even though I somehow agree that the conversion may be wrong in fact, I would argue that one should not break many people's codes by removing it.

I do not understand the details of this bug report, but in general I agree. There is a difference between the "official public" API and the "what users rely on, so it is de facto public" API, which is one of the biggest challenges of long-term maintenance of Matplotlib.

@jklymak
Member

jklymak commented Jul 13, 2018

I agree it’d be nice for us to work well with pandas; I guess the question is how much we do natively and how much we ask them to do. On this bug, which is pretty long-standing from what I can tell, I think the ball is in pandas' court because it is their converter that isn’t converting. plot works only because we never give their converter the chance to run there.

@tacaswell
Member

This needs to be a collaborative effort with pandas (users don't care whose problem it is, they just want it to work!). I have added this to the to-do list for the scipy sprints. Hopefully we can get an mpl dev and a pandas dev to sit together and sort out where this is going sideways.

attn @TomAugspurger

@TomAugspurger
Contributor

I won't be available during sprints, but if people need pandas questions answered they can post here and I'll try to answer when I'm back online.

@ImportanceOfBeingErnest
Member

@danieljmr To solve the remaining problem of your bars being too wide, you need to set a useful width for them. The units of matplotlib datetime plots are days, so the width needs to be set in fractions of a day. E.g. if you wanted hour-wide bars, you'd set width=1./24; if you wanted minute-wide bars, width=1./24/60, etc. You may also use an array for bars of different widths.
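
As a small sketch of the arithmetic above, a duration can be turned into a bar width expressed in days with the standard library alone. The helper name `bar_width_in_days` is hypothetical and not part of Matplotlib; it only assumes, as stated above, that date axes are measured in days.

```python
from datetime import timedelta

SECONDS_PER_DAY = 24 * 60 * 60


def bar_width_in_days(duration: timedelta) -> float:
    """Express a duration as a fraction of a day, for use as a bar width."""
    return duration.total_seconds() / SECONDS_PER_DAY


hour_width = bar_width_in_days(timedelta(hours=1))      # same as 1./24
minute_width = bar_width_in_days(timedelta(minutes=1))  # same as 1./24/60
second_width = bar_width_in_days(timedelta(seconds=1))
print(hour_width, minute_width, second_width)
```

The result could then be passed as the `width` argument, for example `axes.bar(x, y, width=minute_width)`.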

@TomAugspurger There isn't really any question to be answered here; it's more that it needs to be discussed where matplotlib and pandas can meet when it comes to datetime plotting.
I'd say ideally all of the following should produce the same plot

  • plt.scatter(df.x.values,df.y.values)
  • plt.scatter(df.x,df.y)
  • plt.scatter(x="x", y="y", data=df)
  • df.plot.scatter(x="x", y="y")

even with datetimes involved.
In principle the same with bars, although this is a bit harder, because pandas treats those as categorical per se (which is also a major point of confusion judging from the numerous stackoverflow questions on that topic - half the people wonder why matplotlib bar plots are numeric, the other half wonders why pandas bar plots are categorical - on all levels of understanding).
One point to start would be to discuss in whose responsibility the datetime conversion lies when it comes to plt.bar(df["dates"], df["values"].values) failing as in this issue.

@danieljmr
Author

Thank you Elan Ernest 🥇 ,

Passing width=0.00001 (seconds scale) to axes.bar worked! Now I got the dates converted and my bars were nicely rendered.
image

Thank you contributors for your support.
pandas and matplotlib is a powerful combination. A good interaction between both libraries will be beneficial for both ends... and of course for us... the users

@danieljmr
Author

Gentlemen,

At this point, having all my questions answered, I will proceed with the closure.

Great support!

@jklymak jklymak reopened this Jul 14, 2018
@jklymak
Member

jklymak commented Jul 14, 2018

This still identifies a bug that we need to talk about.

@jklymak
Member

jklymak commented Jul 15, 2018

Note that if we deregister the pandas converters this works fine:

import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime
import pandas as pd

pd.plotting.deregister_matplotlib_converters()

dates = [datetime(2018,7,i) for i in range(1, 5)]
values = np.cumsum(np.random.rand(len(dates)))

df = pd.DataFrame({"dates":dates, "values" : values})

plt.plot(df["dates"],  df["values"])
plt.scatter(df["dates"],  df["values"])

plt.show()

@jklymak
Member

jklymak commented Jul 15, 2018

Sorry, I will close this in favour of #11391 where the use cross section is more complete....


5 participants