Commit b8db6d2

Refactored axes.violinplot into cbook.violin_stats (arranges violin plot data for drawing), axes.violin (draws pre-arranged violin plot data), and axes.violinplot (uses cbook.violin_stats to draw violin plots via axes.violin)
Updated whats_new.rst. Updated CHANGELOG.
1 parent ad0bcd4 commit b8db6d2

4 files changed

+230
-55
lines changed

CHANGELOG

Lines changed: 4 additions & 0 deletions
@@ -30,6 +30,10 @@
             interpolation = 'none' and interpolation = 'nearest' in
             `imshow()` when saving vector graphics files.
 
+2014-04-22 Added violin plotting functions. See `Axes.violinplot`,
+           `Axes.violin`, `cbook.violin_stats` and `mlab.GaussianKDE` for
+           details.
+
 2014-04-10 Fixed the triangular marker rendering error. The "Up" triangle was
            rendered instead of "Right" triangle and vice-versa.
 

doc/users/whats_new.rst

Lines changed: 21 additions & 0 deletions
@@ -172,6 +172,27 @@ Added the Axes method :meth:`~matplotlib.axes.Axes.add_image` to put image
 handling on a par with artists, collections, containers, lines, patches,
 and tables.
 
+Violin Plots
+````````````
+Per Parker, Gregory Kelsie, Adam Ortiz, Kevin Chan, Geoffrey Lee, Deokjae
+Donald Seo, and Taesu Terry Lim added a basic implementation for violin
+plots. Violin plots can be used to represent the distribution of sample data.
+They are similar to box plots, but use a kernel density estimation function to
+present a smooth approximation of the data sample used. The added features are:
+
+:func:`~matplotlib.axes.Axes.violin` - Renders a violin plot from a collection of
+statistics.
+:func:`~matplotlib.cbook.violin_stats` - Produces a collection of statistics
+suitable for rendering a violin plot.
+:func:`~matplotlib.pyplot.violinplot` - Creates a violin plot from a set of
+sample data. This method makes use of :func:`~matplotlib.cbook.violin_stats`
+to process the input data, and :func:`~matplotlib.axes.Axes.violin` to
+do the actual rendering. Users are also free to modify or replace the output of
+:func:`~matplotlib.cbook.violin_stats` in order to customize the violin plots
+to their liking.
+
+This feature was implemented for a software engineering course at the
+University of Toronto, Scarborough, run in Winter 2014 by Anya Tafliovich.
 
 More `markevery` options to show only a subset of markers
 `````````````````````````````````````````````````````````
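
Because the statistics step and the drawing step are now separate, the list of per-violin dictionaries can be written or edited by hand before it is drawn. The following is a minimal sketch of that idea, assuming only the six keys documented in this commit (`coords`, `vals`, `mean`, `median`, `min`, `max`). The helper names `missing_keys` and `clip_stats` are hypothetical and are not part of matplotlib.

```python
# Hypothetical helpers for illustration; they are not part of matplotlib.
REQUIRED_KEYS = ("coords", "vals", "mean", "median", "min", "max")


def missing_keys(stats):
    """Return the required violin keys that are absent from one stats dict."""
    return [key for key in REQUIRED_KEYS if key not in stats]


def clip_stats(stats, lower, upper):
    """Return a copy of one stats dict with the density limited to [lower, upper]."""
    kept = [(c, v) for c, v in zip(stats["coords"], stats["vals"])
            if lower <= c <= upper]
    clipped = dict(stats)  # shallow copy, so the input dict is left untouched
    clipped["coords"] = [c for c, _ in kept]
    clipped["vals"] = [v for _, v in kept]
    clipped["min"] = max(stats["min"], lower)
    clipped["max"] = min(stats["max"], upper)
    return clipped


# A hand-written entry standing in for the output of the statistics step.
stats = {
    "coords": [0.0, 1.0, 2.0, 3.0, 4.0],
    "vals": [0.05, 0.20, 0.40, 0.20, 0.05],
    "mean": 2.0, "median": 2.0, "min": 0.0, "max": 4.0,
}
trimmed = clip_stats(stats, 1.0, 3.0)
```

A list such as `[trimmed]` has the shape that the `vpstats` argument of `Axes.violin` is documented to accept in this commit.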

lib/matplotlib/axes/_axes.py

Lines changed: 104 additions & 31 deletions
@@ -6739,8 +6739,8 @@ def violinplot(self, dataset, positions=None, vert=True, widths=0.5,
 
         Make a violin plot for each column of *dataset* or each vector in
         sequence *dataset*. Each filled area extends to represent the
-        entire data range, with three lines at the mean, the minimum, and
-        the maximum.
+        entire data range, with optional lines at the mean, the median,
+        the minimum, and the maximum.
 
         Parameters
         ----------
@@ -6778,7 +6778,7 @@ def violinplot(self, dataset, positions=None, vert=True, widths=0.5,
           The method used to calculate the estimator bandwidth. This can be
           'scott', 'silverman', a scalar constant or a callable. If a
           scalar, this will be used directly as `kde.factor`. If a
-          callable, it should take a `GaussianKDE` instance as only
+          callable, it should take a `GaussianKDE` instance as its only
           parameter and return a scalar. If None (default), 'scott' is used.
 
         Returns
@@ -6806,6 +6806,91 @@ def violinplot(self, dataset, positions=None, vert=True, widths=0.5,
 
         """
 
+        def _kde_method(X, coords):
+            kde = mlab.GaussianKDE(X, bw_method)
+            return kde.evaluate(coords)
+
+        vpstats = cbook.violin_stats(dataset, _kde_method, points=points)
+        return self.violin(vpstats, positions=positions, vert=vert,
+                           widths=widths, showmeans=showmeans,
+                           showextrema=showextrema, showmedians=showmedians)
+
+    def violin(self, vpstats, positions=None, vert=True, widths=0.5,
+               showmeans=False, showextrema=True, showmedians=False):
+        """
+        Drawing function for violin plots.
+
+        Call signature::
+
+          violin(vpstats, positions=None, vert=True, widths=0.5,
+                 showmeans=False, showextrema=True, showmedians=False)
+
+        Draw a violin plot for each entry of `vpstats`. Each filled area
+        extends to represent the entire data range, with optional lines at the
+        mean, the median, the minimum, and the maximum.
+
+        Parameters
+        ----------
+
+        vpstats : list of dicts
+          A list of dictionaries containing stats for each violin plot.
+          Required keys are:
+          - coords: A list of scalars containing the coordinates that
+            the violin's kernel density estimate was evaluated at.
+          - vals: A list of scalars containing the values of the kernel
+            density estimate at each of the coordinates given in `coords`.
+          - mean: The mean value for this violin's dataset.
+          - median: The median value for this violin's dataset.
+          - min: The minimum value for this violin's dataset.
+          - max: The maximum value for this violin's dataset.
+
+        positions : array-like, default = [1, 2, ..., n]
+          Sets the positions of the violins. The ticks and limits are
+          automatically set to match the positions.
+
+        vert : bool, default = True
+          If true, plots the violins vertically.
+          Otherwise, plots the violins horizontally.
+
+        widths : array-like, default = 0.5
+          Either a scalar or a vector that sets the maximal width of
+          each violin. The default is 0.5, which uses about half of the
+          available horizontal space.
+
+        showmeans : bool, default = False
+          If true, will toggle rendering of the means.
+
+        showextrema : bool, default = True
+          If true, will toggle rendering of the extrema.
+
+        showmedians : bool, default = False
+          If true, will toggle rendering of the medians.
+
+        Returns
+        -------
+
+        A dictionary mapping each component of the violinplot to a list of the
+        corresponding collection instances created. The dictionary has
+        the following keys:
+
+            - bodies: A list of the
+              :class:`matplotlib.collections.PolyCollection` instances
+              containing the filled area of each violin.
+            - means: A :class:`matplotlib.collections.LineCollection` instance
+              created to identify the mean value of each violin's
+              distribution.
+            - mins: A :class:`matplotlib.collections.LineCollection` instance
+              created to identify the bottom of each violin's distribution.
+            - maxes: A :class:`matplotlib.collections.LineCollection` instance
+              created to identify the top of each violin's distribution.
+            - bars: A :class:`matplotlib.collections.LineCollection` instance
+              created to identify the centers of each violin's distribution.
+            - medians: A :class:`matplotlib.collections.LineCollection`
+              instance created to identify the median value of each
+              violin's distribution.
+
+        """
+
         # Statistical quantities to be plotted on the violins
         means = []
         mins = []
@@ -6822,22 +6907,23 @@ def violinplot(self, dataset, positions=None, vert=True, widths=0.5,
             'cmedians': None
         }
 
+        N = len(vpstats)
         datashape_message = ("List of violinplot statistics and `{0}` "
                              "values must have the same length")
 
         # Validate positions
         if positions is None:
-            positions = range(1, len(dataset) + 1)
-        elif len(positions) != len(dataset):
+            positions = range(1, N + 1)
+        elif len(positions) != N:
             raise ValueError(datashape_message.format("positions"))
 
         # Validate widths
         if np.isscalar(widths):
-            widths = [widths] * len(dataset)
-        elif len(widths) != len(dataset):
+            widths = [widths] * N
+        elif len(widths) != N:
             raise ValueError(datashape_message.format("widths"))
 
-        # Calculate mins and maxes for statistics lines
+        # Calculate ranges for statistics lines
         pmins = -0.25 * np.array(widths) + positions
         pmaxes = 0.25 * np.array(widths) + positions
 
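
The validation in this hunk now keys everything off the number of statistics entries rather than the raw dataset. The sketch below restates that logic in plain Python so the defaults and the quarter-width extents of the statistics lines are easy to check by hand. It is an illustration only: `normalize_layout` and `stat_line_extents` are hypothetical names, and an `isinstance` check stands in for `np.isscalar`.

```python
def normalize_layout(n, positions=None, widths=0.5):
    """Apply the position and width checks for n violins; return two lists."""
    message = ("List of violinplot statistics and `{0}` "
               "values must have the same length")

    # Default positions are 1, 2, ..., n; explicit ones must match n.
    if positions is None:
        positions = list(range(1, n + 1))
    elif len(positions) != n:
        raise ValueError(message.format("positions"))

    # Assumption: isinstance replaces np.isscalar to keep this stdlib-only.
    if isinstance(widths, (int, float)):
        widths = [widths] * n
    elif len(widths) != n:
        raise ValueError(message.format("widths"))

    return list(positions), list(widths)


def stat_line_extents(positions, widths):
    """Statistics lines span a quarter of the width on each side of a position."""
    pmins = [-0.25 * w + p for p, w in zip(positions, widths)]
    pmaxes = [0.25 * w + p for p, w in zip(positions, widths)]
    return pmins, pmaxes


positions, widths = normalize_layout(3)
pmins, pmaxes = stat_line_extents(positions, widths)
```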
@@ -6857,33 +6943,20 @@ def violinplot(self, dataset, positions=None, vert=True, widths=0.5,
             blines = self.hlines
 
         # Render violins
-        for data, pos, width in zip(dataset, positions, widths):
-            # Calculate the kernel density
-            kde = mlab.GaussianKDE(data, bw_method)
-            min_val = kde.dataset.min()
-            max_val = kde.dataset.max()
-            mean = np.mean(kde.dataset)
-            median = np.median(kde.dataset)
-            coords = np.linspace(min_val, max_val, points)
-
-            vals = kde.evaluate(coords)
-
-            # Since each data point p is plotted from v-p to v+p,
-            # we need to scale it by an additional 0.5 factor so that we get
-            # correct width in the end.
-            vals = 0.5 * width * vals/vals.max()
-
-            # create the violin bodies
-            artists['bodies'] += [fill(coords,
+        for stats, pos, width in zip(vpstats, positions, widths):
+            # The 0.5 factor reflects the fact that we plot from v-p to
+            # v+p
+            vals = np.array(stats['vals'])
+            vals = 0.5 * width * vals / vals.max()
+            artists['bodies'] += [fill(stats['coords'],
                                        -vals + pos,
                                        vals + pos,
                                        facecolor='y',
                                        alpha=0.3)]
-
-            means.append(mean)
-            mins.append(min_val)
-            maxes.append(max_val)
-            medians.append(median)
+            means.append(stats['mean'])
+            mins.append(stats['min'])
+            maxes.append(stats['max'])
+            medians.append(stats['median'])
 
         # Render means
         if showmeans:
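
The shortened comment about the 0.5 factor is easier to follow with numbers. Each violin body is filled between `pos - vals` and `pos + vals`, so the density is first normalised by its peak and then scaled by half the requested width. A small sketch under that reading, using plain lists instead of NumPy arrays (`scale_half_widths` and `body_edges` are hypothetical names):

```python
def scale_half_widths(vals, width):
    """Scale KDE values so the widest part of the violin spans `width` in total."""
    peak = max(vals)
    # Half the width goes on each side of the violin's position.
    return [0.5 * width * v / peak for v in vals]


def body_edges(vals, pos, width):
    """Return the two edges of a violin body centred on `pos`."""
    half = scale_half_widths(vals, width)
    lower_edge = [pos - h for h in half]
    upper_edge = [pos + h for h in half]
    return lower_edge, upper_edge


lower_edge, upper_edge = body_edges([1.0, 2.0, 4.0], pos=1, width=0.5)
widest = max(u - l for l, u in zip(lower_edge, upper_edge))
```

With a peak density of 4.0 and `width=0.5`, the widest point of the body spans exactly 0.5, which is the maximal width described in the docstring.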

lib/matplotlib/cbook.py

Lines changed: 101 additions & 24 deletions
@@ -1944,28 +1944,7 @@ def _compute_conf_interval(data, med, iqr, bootstrap):
     bxpstats = []
 
     # convert X to a list of lists
-    if hasattr(X, 'shape'):
-        # one item
-        if len(X.shape) == 1:
-            if hasattr(X[0], 'shape'):
-                X = list(X)
-            else:
-                X = [X, ]
-
-        # several items
-        elif len(X.shape) == 2:
-            nrows, ncols = X.shape
-            if nrows == 1:
-                X = [X]
-            elif ncols == 1:
-                X = [X.ravel()]
-            else:
-                X = [X[:, i] for i in xrange(ncols)]
-        else:
-            raise ValueError("input `X` must have 2 or fewer dimensions")
-
-    if not hasattr(X[0], '__len__'):
-        X = [X]
+    X = _reshape_2D(X)
 
     ncols = len(X)
     if labels is None:
@@ -1982,7 +1961,7 @@ def _compute_conf_interval(data, med, iqr, bootstrap):
         stats['mean'] = np.mean(x)
 
         # medians and quartiles
-        q1, med, q3 = np.percentile(x, [25, 50, 75])
+        q1, med, q3 = np.percentile(x, [25, 50, 75])
 
         # interquartile range
         stats['iqr'] = q3 - q1
@@ -2004,7 +1983,7 @@ def _compute_conf_interval(data, med, iqr, bootstrap):
                 hival = np.max(x)
             else:
                 whismsg = ('whis must be a float, valid string, or '
-                           'list of percentiles')
+                           'list of percentiles')
                 raise ValueError(whismsg)
         else:
             loval = np.percentile(x, whis[0])
@@ -2157,6 +2136,104 @@ def is_math_text(s):
     return even_dollars
 
 
+def _reshape_2D(X):
+    if hasattr(X, 'shape'):
+        # one item
+        if len(X.shape) == 1:
+            if hasattr(X[0], 'shape'):
+                X = list(X)
+            else:
+                X = [X, ]
+
+        # several items
+        elif len(X.shape) == 2:
+            nrows, ncols = X.shape
+            if nrows == 1:
+                X = [X]
+            elif ncols == 1:
+                X = [X.ravel()]
+            else:
+                X = [X[:, i] for i in xrange(ncols)]
+        else:
+            raise ValueError("input `X` must have 2 or fewer dimensions")
+
+    if not hasattr(X[0], '__len__'):
+        X = [X]
+
+    return X
+
+
+def violin_stats(X, method, points=100):
+    '''
+    Returns a list of dictionaries of data which can be used to draw a series
+    of violin plots. See the `Returns` section below to view the required keys
+    of the dictionary. Users can skip this function and pass a user-defined set
+    of dictionaries to the `axes.violin` method instead of using MPL to do the
+    calculations.
+
+    Parameters
+    ----------
+    X : array-like
+        Sample data that will be used to produce the gaussian kernel density
+        estimates. Must have 2 or fewer dimensions.
+
+    method : callable
+        The method used to calculate the kernel density estimate for each
+        column of data. When called via `method(v, coords)`, it should
+        return a vector of the values of the KDE evaluated at the values
+        specified in coords.
+
+    points : scalar, default = 100
+        Defines the number of points to evaluate each of the gaussian kernel
+        density estimates at.
+
+    Returns
+    -------
+
+    A list of dictionaries containing the results for each column of data.
+    The dictionaries contain at least the following:
+
+        - coords: A list of scalars containing the coordinates this particular
+          kernel density estimate was evaluated at.
+        - vals: A list of scalars containing the values of the kernel density
+          estimate at each of the coordinates given in `coords`.
+        - mean: The mean value for this column of data.
+        - median: The median value for this column of data.
+        - min: The minimum value for this column of data.
+        - max: The maximum value for this column of data.
+    '''
+
+    # List of dictionaries describing each of the violins.
+    vpstats = []
+
+    # Want X to be a list of data sequences
+    X = _reshape_2D(X)
+
+    for x in X:
+        # Dictionary of results for this distribution
+        stats = {}
+
+        # Calculate basic stats for the distribution
+        min_val = np.min(x)
+        max_val = np.max(x)
+
+        # Evaluate the kernel density estimate
+        coords = np.linspace(min_val, max_val, points)
+        stats['vals'] = method(x, coords)
+        stats['coords'] = coords
+
+        # Store additional statistics for this distribution
+        stats['mean'] = np.mean(x)
+        stats['median'] = np.median(x)
+        stats['min'] = min_val
+        stats['max'] = max_val
+
+        # Append to output
+        vpstats.append(stats)
+
+    return vpstats
+
+
 class _NestedClassGetter(object):
     # recipe from http://stackoverflow.com/a/11493777/741316
     """

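The `method(v, coords)` contract is what makes `violin_stats` independent of any particular density estimator. The sketch below mirrors the loop in the diff using only the standard library, with a fixed-bandwidth Gaussian KDE as the callable. It is an illustration under stated assumptions, not the matplotlib implementation: `linspace`, `fixed_bandwidth_kde` and `violin_stats_sketch` are hypothetical names, the bandwidth of 0.5 is arbitrary, and the input is assumed to already be a list of data sequences (the job `_reshape_2D` does in the real code).

```python
import math
import statistics


def linspace(start, stop, num):
    """Return num evenly spaced values from start to stop inclusive (num >= 2)."""
    return [start + (stop - start) * i / (num - 1) for i in range(num)]


def fixed_bandwidth_kde(data, coords, bandwidth=0.5):
    """Hypothetical `method(v, coords)` callable: Gaussian KDE, fixed bandwidth."""
    norm = len(data) * bandwidth * math.sqrt(2.0 * math.pi)
    return [
        sum(math.exp(-0.5 * ((c - x) / bandwidth) ** 2) for x in data) / norm
        for c in coords
    ]


def violin_stats_sketch(columns, method, points=100):
    """Build one stats dict per data column, with the keys listed in the diff."""
    vpstats = []
    for x in columns:
        min_val, max_val = min(x), max(x)
        # The density is evaluated only across the observed data range.
        coords = linspace(min_val, max_val, points)
        vpstats.append({
            "coords": coords,
            "vals": method(x, coords),
            "mean": statistics.mean(x),
            "median": statistics.median(x),
            "min": min_val,
            "max": max_val,
        })
    return vpstats


result = violin_stats_sketch([[1.0, 2.0, 3.0, 4.0]], fixed_bandwidth_kde, points=5)
```

Swapping in a different estimator only requires another callable with the same two-argument signature; the statistics and the drawing code stay unchanged.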