
Axes.violinplot has a small issue when using a pandas.DataFrame without index 0. #15272


Closed
wolfsonliu opened this issue Sep 16, 2019 · 7 comments · Fixed by #16530

Comments

@wolfsonliu

Bug report

Bug summary

When using a pandas.DataFrame (or Series) whose index does not contain the label 0, violinplot raises a KeyError.

Code for reproduction
In the Axes method violinplot:

def violinplot(self, dataset, positions=None, vert=True, widths=0.5,
               showmeans=False, showextrema=True, showmedians=False,
               points=100, bw_method=None):

there is a helper function that indexes X with 0 (X[0]). For pandas objects this is a label lookup, so it fails when the index has no label 0.

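# Helper nested inside Axes.violinplot; np and mlab are module-level
# imports, and bw_method comes from the enclosing violinplot call.
def _kde_method(X, coords):
    # fallback gracefully if the vector contains only one value
    if np.all(X[0] == X):
        return (X[0] == coords).astype(float)
    kde = mlab.GaussianKDE(X, bw_method)
    return kde.evaluate(coords)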
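
A minimal illustration of the failure mode, using made-up values: on a Series with an integer index, X[0] is a label lookup, so it raises KeyError when no row is labelled 0. Positional access (.iloc[0], or indexing the array returned by np.asarray) does not depend on the labels.

```python
import numpy as np
import pandas as pd

# Made-up values; the integer index deliberately has no label 0,
# like a Series taken from a frame after .set_index(0).
X = pd.Series([1.5, 2.0, 2.5], index=[10, 11, 12])

try:
    X[0]  # label-based lookup on an integer index
    lookup_failed = False
except KeyError:
    lookup_failed = True
print("X[0] raised KeyError:", lookup_failed)

# Positional alternatives that ignore the index labels.
print(X.iloc[0])         # first value by position
print(np.asarray(X)[0])  # first value of the underlying array
```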

Actual outcome

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-251-219523b1f9f5> in <module>
----> 1 plt.violinplot(rdata)

/usr/lib64/python3.7/site-packages/matplotlib/pyplot.py in violinplot(dataset, positions, vert, widths, showmeans, showextrema, showmedians, points, bw_method, data)
   3021         showmeans=showmeans, showextrema=showextrema,
   3022         showmedians=showmedians, points=points, bw_method=bw_method,
-> 3023         **({"data": data} if data is not None else {}))
   3024 
   3025 

/usr/lib64/python3.7/site-packages/matplotlib/__init__.py in inner(ax, data, *args, **kwargs)
   1796                         "the Matplotlib list!)" % (label_namer, func.__name__),
   1797                         RuntimeWarning, stacklevel=2)
-> 1798             return func(ax, *args, **kwargs)
   1799 
   1800         inner.__doc__ = _add_data_doc(inner.__doc__,

/usr/lib64/python3.7/site-packages/matplotlib/axes/_axes.py in violinplot(self, dataset, positions, vert, widths, showmeans, showextrema, showmedians, points, bw_method)
   7915             return kde.evaluate(coords)
   7916 
-> 7917         vpstats = cbook.violin_stats(dataset, _kde_method, points=points)
   7918         return self.violin(vpstats, positions=positions, vert=vert,
   7919                            widths=widths, showmeans=showmeans,

Expected outcome
A violin plot.

Matplotlib version

  • Operating system: Fedora 30
  • Matplotlib version: matplotlib: v3.0.3
  • Matplotlib backend (print(matplotlib.get_backend())): module://ipykernel.pylab.backend_inline
  • Python version: 3.7
  • Jupyter version (if applicable): 4.4.0
  • Other libraries:
@jklymak
Member

jklymak commented Sep 16, 2019

This is pretty similar to #15162

@jklymak
Member

jklymak commented Sep 16, 2019

Can you give us a self-contained example?

@jklymak
Member

jklymak commented Sep 16, 2019

Probably should be changed to

 np.all(cbook.safe_first_element(X) == X):
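
Another possible change, sketched here as a standalone function, is to convert X to a NumPy array before indexing, so X[0] is always positional. The name kde_method and the explicit bw_method argument are hypothetical (in matplotlib the helper is nested and bw_method comes from the enclosing call), and this is not necessarily what the eventual fix in #16530 does.

```python
import numpy as np
import pandas as pd
from matplotlib import mlab


def kde_method(X, coords, bw_method=None):
    # Hypothetical standalone variant of the nested _kde_method helper.
    # Converting to an ndarray first makes X[0] a positional lookup,
    # so a pandas Series whose index lacks the label 0 no longer fails.
    X = np.asarray(X)
    # fallback gracefully if the vector contains only one value
    if np.all(X[0] == X):
        return (X[0] == coords).astype(float)
    kde = mlab.GaussianKDE(X, bw_method)
    return kde.evaluate(coords)


coords = np.linspace(0.0, 4.0, 5)
const = pd.Series([2.0, 2.0, 2.0], index=[5, 6, 7])
print(kde_method(const, coords))  # [0. 0. 1. 0. 0.]
varied = pd.Series([1.0, 2.0, 2.5, 3.0], index=[5, 6, 7, 8])
print(kde_method(varied, coords).shape)  # (5,)
```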

@timhoffm
Member

> Probably should be changed to
>
>  np.all(cbook.safe_first_element(X) == X):

Not really. Iterating over a DataFrame yields the column names.
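
A quick check of this point with made-up data: iterating a DataFrame yields its column labels, while iterating a Series yields its values regardless of the index, so "the first element via iteration" means different things for the two types.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]}, index=[7, 8])
s = df["a"]

print(list(df))        # iterating a DataFrame: column labels
print(next(iter(df)))  # first item is the label "a", not a data value
print(next(iter(s)))   # iterating a Series: values, index ignored
```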

@jklymak
Member

jklymak commented Sep 16, 2019

I assume the column names have somehow been resolved by this point? But I'm not familiar enough with pandas to concoct my own example.

@timhoffm
Member

Let's wait for an example.

@jklymak added the "status: needs clarification" label Sep 16, 2019
@wolfsonliu
Author

wolfsonliu commented Sep 17, 2019

Sorry for the delayed reply. I have tried hard to make the code and data as small as possible.

The following code and data should reproduce the error.

import pandas as pd
import matplotlib.pyplot as plt

rdata = list()
for rname in ['one', 'two']:
    tmp = pd.read_csv(rname + '.txt', sep='\t', header=None).set_index(0)
    rdata.append(
        tmp[1]
    )

plt.violinplot(rdata)
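
Since one.txt and two.txt are only available as attachments, here is a self-contained way to build data of the same shape in memory. The file contents below are invented (integer IDs in column 0, values in column 1) and have not been checked against matplotlib 3.0.3; the point is only that after .set_index(0) each Series is indexed by those IDs, so no row has the label 0.

```python
import io

import pandas as pd

# Invented stand-ins for one.txt and two.txt: tab-separated, no header,
# column 0 holds integer IDs (none equal to 0), column 1 holds values.
files = {
    "one": "5\t1.2\n6\t3.4\n7\t2.2\n8\t0.9\n",
    "two": "11\t4.1\n12\t3.3\n13\t5.0\n",
}

rdata = list()
for rname in ['one', 'two']:
    tmp = pd.read_csv(io.StringIO(files[rname]), sep='\t', header=None).set_index(0)
    rdata.append(tmp[1])

# Each Series is indexed by the IDs, so the label 0 is absent.
for series in rdata:
    print(series.index.tolist(), 0 in series.index)
```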

Data used in the code:

one.txt
two.txt


However, if I reset the index after reading the data, the error does not appear at all.

import pandas as pd
import matplotlib.pyplot as plt

rdata = list()
for rname in ['one', 'two']:
    tmp = pd.read_csv(rname + '.txt', sep='\t', header=None).set_index(0)
    rdata.append(
        tmp[1].reset_index(drop=True)
    )

plt.violinplot(rdata)
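
The reset_index(drop=True) workaround works because it replaces the ID index with a default RangeIndex starting at 0, so the label 0 exists again. A sketch of an alternative that sidesteps index labels entirely is to pass plain NumPy arrays. The Series below are made-up stand-ins for rdata, and the Agg backend is an assumption chosen only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed for running as a script
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-ins for rdata: Series whose indexes do not contain 0.
rdata = [
    pd.Series([1.2, 3.4, 2.2, 0.9], index=[5, 6, 7, 8]),
    pd.Series([4.1, 3.3, 5.0], index=[11, 12, 13]),
]

# reset_index(drop=True) swaps in a RangeIndex starting at 0.
reset = [s.reset_index(drop=True) for s in rdata]
print([s.index.tolist() for s in reset])  # [[0, 1, 2, 3], [0, 1, 2]]

# Alternatively, hand matplotlib plain arrays so no index is involved.
arrays = [np.asarray(s) for s in rdata]
fig, ax = plt.subplots()
parts = ax.violinplot(arrays)
print(len(parts["bodies"]))  # one violin body per dataset
```

Converting to arrays also works for Series of different lengths, since violinplot accepts a sequence of 1-D arrays.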

@tacaswell removed the "status: needs clarification" label Sep 24, 2019
@tacaswell added this to the v3.3.0 milestone Sep 24, 2019