Commit accd403

Merge pull request #7282 from phobson/MEP-bxp
Draft version of MEP28: Simplification of boxplots
2 parents 6595e4c + 5d2e6c3 commit accd403

2 files changed (+327, -0 lines changed)


doc/devel/MEP/MEP28.rst

Lines changed: 326 additions & 0 deletions

=============================================
MEP 28: Remove Complexity from Axes.boxplot
=============================================

.. contents::
   :local:


Status
======

..
.. MEPs go through a number of phases in their lifetime:

- **Discussion**

..
.. - **Progress**: Consensus was reached on the mailing list and
..   implementation work has begun.
..
.. - **Completed**: The implementation has been merged into master.
..
.. - **Superseded**: This MEP has been abandoned in favor of another
..   approach.

Branches and Pull requests
==========================

- Adding pre- & post-processing options to ``cbook.boxplot_stats``:
  https://github.com/phobson/matplotlib/tree/boxplot-stat-transforms
- Exposing ``cbook.boxplot_stats`` through ``Axes.boxplot`` kwargs: None
- Remove redundant statistical kwargs in ``Axes.boxplot``: None
- Remove redundant style options in ``Axes.boxplot``: None
- Remaining items that arise through discussion: None

Abstract
========

Over the past few releases, the ``Axes.boxplot`` method has grown in
complexity to support fully customizable artist styling and statistical
computation. This led to ``Axes.boxplot`` being split into multiple
parts. The statistics needed to draw a boxplot are computed in
``cbook.boxplot_stats``, while the actual artists are drawn by ``Axes.bxp``.
The original method, ``Axes.boxplot``, remains the most public API: it
handles passing the user-supplied data to ``cbook.boxplot_stats``, feeding
the results to ``Axes.bxp``, and pre-processing style information for
each facet of the boxplots.
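
A minimal sketch of that division of labour follows. It assumes a recent
matplotlib release, selects the ``Agg`` backend only so that it runs
headless, and uses synthetic data. ``Axes.boxplot`` performs both steps
internally:

.. code-block:: python

    import matplotlib
    matplotlib.use("Agg")  # headless backend; no window is opened
    import matplotlib.pyplot as plt
    import numpy as np
    from matplotlib import cbook

    rng = np.random.default_rng(0)
    data = [rng.normal(loc, 1.0, size=100) for loc in (0.0, 1.0, 2.0)]

    # Step 1: compute one statistics dictionary per dataset.
    stats = cbook.boxplot_stats(data, whis=1.5)

    # Step 2: draw the artists from those dictionaries alone.
    fig, ax = plt.subplots()
    artists = ax.bxp(stats)

    # Axes.boxplot does both steps (plus style handling) in one call.
    fig2, ax2 = plt.subplots()
    artists2 = ax2.boxplot(data, whis=1.5)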

This MEP will outline a path forward to roll back the added complexity
and simplify the API while maintaining reasonable backwards
compatibility.

Detailed description
====================

Currently, the ``Axes.boxplot`` method accepts parameters that allow
users to specify medians and confidence intervals for each box that
will be drawn in the plot. These were provided so that advanced users
could supply statistics computed in a different fashion than the simple
method provided by matplotlib. However, handling this input requires
complex logic to make sure that the forms of the data structure match what
needs to be drawn. At the moment, that logic contains 9 separate if/else
statements nested up to 5 levels deep with a for loop, and may raise up to 2 errors.
These parameters were added prior to the creation of the ``Axes.bxp`` method,
which draws boxplots from a list of dictionaries containing the relevant
statistics. Matplotlib also provides a function that computes these
statistics via ``cbook.boxplot_stats``. Note that advanced users can now
either a) write their own function to compute the stats required by
``Axes.bxp``, or b) modify the output returned by ``cbook.boxplot_stats``
to fully customize the position of the artists of the plots. With this
flexibility, the parameters to manually specify only the medians and their
confidence intervals are retained solely for backwards compatibility.
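
Option b) is already enough to reproduce what ``usermedians`` and
``conf_intervals`` do today: overwrite the ``med``, ``cilo``, and ``cihi``
entries before drawing. A sketch, assuming a recent matplotlib; the
replacement medians and intervals are made-up numbers standing in for
externally computed statistics:

.. code-block:: python

    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt
    import numpy as np
    from matplotlib import cbook

    rng = np.random.default_rng(1)
    data = [rng.lognormal(0.0, 0.5, size=200) for _ in range(2)]

    # Placeholder "externally computed" statistics, for illustration only.
    my_medians = [1.05, 0.95]
    my_cis = [(0.98, 1.12), (0.90, 1.01)]

    stats = cbook.boxplot_stats(data)
    for s, med, (lo, hi) in zip(stats, my_medians, my_cis):
        s["med"] = med   # what ``usermedians`` provides today
        s["cilo"] = lo   # what ``conf_intervals`` provides today
        s["cihi"] = hi

    fig, ax = plt.subplots()
    artists = ax.bxp(stats, shownotches=True)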

Around the same time that the two roles of ``Axes.boxplot`` were split into
``cbook.boxplot_stats`` for computation and ``Axes.bxp`` for drawing, both
``Axes.boxplot`` and ``Axes.bxp`` were written to accept parameters that
individually toggle the drawing of all components of the boxplots, and
parameters that individually configure the style of those artists. However,
to maintain backwards compatibility, the ``sym`` parameter (previously used
to specify the symbol of the fliers) was retained. This parameter itself
requires fairly complex logic to reconcile the ``sym`` parameter with the
newer ``flierprops`` parameter and the default style specified by ``matplotlibrc``.
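
For instance, in matplotlib releases that still accept ``sym``, the
shorthand format string and an explicit ``flierprops`` dictionary select the
same flier marker. The sketch below (synthetic data with two obvious
outliers) only shows the overlap, not the reconciliation logic itself:

.. code-block:: python

    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt
    import numpy as np

    # 50 evenly spaced values plus two clear outliers.
    data = np.concatenate([np.linspace(0.0, 1.0, 50), [5.0, 6.0]])

    fig, (ax1, ax2) = plt.subplots(ncols=2)
    # Legacy shorthand: red "+" markers for the fliers.
    old = ax1.boxplot(data, sym="r+")
    # The same intent expressed through the newer style dictionary.
    new = ax2.boxplot(data, flierprops=dict(marker="+", markeredgecolor="r"))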

This MEP seeks to dramatically simplify the creation of boxplots for
novice and advanced users alike. Importantly, the changes proposed here
will also be available to downstream packages like seaborn, as seaborn
smartly allows users to pass arbitrary dictionaries of parameters through
the seaborn API to the underlying matplotlib functions.

This will be achieved in the following way:

1. ``cbook.boxplot_stats`` will be modified to allow pre- and post-computation
   transformation functions to be passed in (e.g., ``np.log`` and ``np.exp``
   for lognormally distributed data).
2. ``Axes.boxplot`` will be modified to also accept these functions and
   naïvely pass them to ``cbook.boxplot_stats`` (Alt: pass the stat
   function and a dict of its optional parameters).
3. Outdated parameters from ``Axes.boxplot`` will be deprecated and
   later removed.

Implementation
==============

Passing transform functions to ``cbook.boxplot_stats``
-------------------------------------------------------

This MEP proposes that two parameters (e.g., ``transform_in`` and
``transform_out``) be added to the cookbook function that computes the
statistics for the boxplot function. These will be optional keyword-only
arguments and can easily be set to ``lambda x: x`` as a no-op when omitted
by the user. The ``transform_in`` function will be applied to the data
as the ``boxplot_stats`` function loops through each subset of the data
passed to it. After the list of statistics dictionaries is computed, the
``transform_out`` function is applied to each value in the dictionaries.
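
Until such parameters exist, the same behaviour can be approximated by
wrapping ``cbook.boxplot_stats``. The sketch below is illustrative rather
than the proposed patch: ``boxplot_stats_transformed`` is a hypothetical
helper, and, as described above, ``transform_out`` is applied to every
entry except ``label``:

.. code-block:: python

    import numpy as np
    from matplotlib import cbook


    def _identity(x):
        return x


    def boxplot_stats_transformed(data, *, transform_in=None,
                                  transform_out=None, **kwargs):
        """Hypothetical helper: compute boxplot stats in a transformed space.

        Extra ``kwargs`` are forwarded unchanged to ``cbook.boxplot_stats``.
        """
        if transform_in is None:
            transform_in = _identity
        if transform_out is None:
            transform_out = _identity

        # Forward-transform each dataset before computing its statistics.
        stats = cbook.boxplot_stats(
            [transform_in(np.asarray(d)) for d in data], **kwargs)
        # Map every statistic except the label back to data units.
        for stat_dict in stats:
            for key, value in stat_dict.items():
                if key != "label":
                    stat_dict[key] = transform_out(value)
        return stats


    rng = np.random.default_rng(2)
    data = [rng.lognormal(mean=0.0, sigma=1.0, size=501) for _ in range(3)]
    stats = boxplot_stats_transformed(data, transform_in=np.log,
                                      transform_out=np.exp)

The result can be passed straight to ``Axes.bxp``. Note that this naive
rule also back-transforms ``iqr``, so with ``np.log``/``np.exp`` it is no
longer ``q3 - q1`` in data units; a real implementation would have to decide
how to treat such derived values.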

These transformations can then be added to the call signature of
``Axes.boxplot`` with little impact on that method's complexity, because
they can be passed directly to ``cbook.boxplot_stats``.
Alternatively, ``Axes.boxplot`` could be modified to accept an optional
statistical function kwarg and a dictionary of parameters to be directly
passed to it.

At this point in the implementation, users and external libraries like
seaborn would have complete control via the ``Axes.boxplot`` method. More
importantly, at the very least, seaborn would require no changes to its
API to allow users to take advantage of these new options.

Simplifications to the ``Axes.boxplot`` API and other functions
---------------------------------------------------------------

Simplifying the boxplot method consists primarily of deprecating and then
removing the redundant parameters. Optionally, a next step would include
rectifying minor terminological inconsistencies between ``Axes.boxplot``
and ``Axes.bxp``.

The parameters to be deprecated and removed include:

1. ``usermedians`` - processed by 10 SLOC, 3 ``if`` blocks, a ``for`` loop
2. ``conf_intervals`` - handled by 15 SLOC, 6 ``if`` blocks, a ``for`` loop
3. ``sym`` - processed by 12 SLOC, 4 ``if`` blocks

Removing the ``sym`` option allows all code handling the remaining
styling parameters to be moved to ``Axes.bxp``. This doesn't remove
any complexity, but does reinforce the single responsibility principle
among ``Axes.bxp``, ``cbook.boxplot_stats``, and ``Axes.boxplot``.

Additionally, the ``notch`` parameter could be renamed ``shownotches``
to be consistent with ``Axes.bxp``. This kind of cleanup could be taken
a step further, and the ``whis``, ``bootstrap``, and ``autorange`` parameters
could be rolled into the kwargs passed to the new ``statfxn`` parameter.
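
A rename like ``notch`` to ``shownotches`` is typically handled by a thin
shim that accepts the old name, warns, and forwards the value to the new
one. The sketch below only illustrates those mechanics: ``resolve_shownotches``
is a hypothetical function, and an actual patch would more likely use
matplotlib's own internal deprecation helpers:

.. code-block:: python

    import warnings

    _UNSET = object()  # sentinel: tells "not passed" apart from False/None


    def resolve_shownotches(notch=_UNSET, shownotches=None):
        """Hypothetical shim mapping the deprecated ``notch`` onto ``shownotches``.

        Returns the boolean that would be forwarded to ``Axes.bxp``.
        """
        if notch is _UNSET:
            return bool(shownotches)
        warnings.warn("'notch' is deprecated; use 'shownotches' instead.",
                      DeprecationWarning, stacklevel=2)
        # Refuse ambiguous calls that pass both names with different values.
        if shownotches is not None and bool(shownotches) != bool(notch):
            raise TypeError("conflicting values for 'notch' and 'shownotches'")
        return bool(notch)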

Backward compatibility
======================

Implementation of this MEP would eventually result in the backwards
incompatible deprecation and then removal of the keyword parameters
``usermedians``, ``conf_intervals``, and ``sym``. Cursory searches on
GitHub indicated that ``usermedians`` and ``conf_intervals`` are used by
few users, all of whom seem to have a very strong knowledge of matplotlib.
A robust deprecation cycle should provide sufficient time for these
users to migrate to a new API.

Deprecation of ``sym``, however, may have a much broader reach into
the matplotlib userbase.

Schedule
--------

An accelerated timeline could look like the following:

#. v2.0.1 add transforms to ``cbook.boxplot_stats``, expose in ``Axes.boxplot``
#. v2.1.0 deprecate ``usermedians``, ``conf_intervals``, ``sym`` parameters
#. v2.2.0 make deprecations noisier
#. v2.3.0 remove ``usermedians``, ``conf_intervals``, ``sym`` parameters
#. v2.3.0 deprecate ``notch`` in favor of ``shownotches`` to be consistent with other parameters and ``Axes.bxp``
#. v2.4.0 remove ``notch`` parameter, move all style and artist toggling logic to ``Axes.bxp``, leaving ``Axes.boxplot`` as little more than a broker between ``Axes.bxp`` and ``cbook.boxplot_stats``


Anticipated Impacts to Users
----------------------------

As described above, deprecating ``usermedians`` and ``conf_intervals``
will likely impact few users. Those who will be impacted are almost
certainly advanced users who will be able to adapt to the change.

Deprecating the ``sym`` option may impact more users, and effort should
be taken to collect community feedback on this.

Anticipated Impacts to Downstream Libraries
-------------------------------------------

The source code (GitHub master as of 2016-10-17) was inspected for
seaborn and python-ggplot to see if these changes would impact their
use. None of the parameters nominated for removal in this MEP are used by
seaborn. The seaborn APIs that use matplotlib's boxplot function allow
users to pass arbitrary ``**kwargs`` through to matplotlib's API. Thus,
seaborn users with modern matplotlib installations will be able to take
full advantage of any new features added as a result of this MEP.

Python-ggplot has implemented its own function to draw boxplots. Therefore,
no impact can come to it as a result of implementing this MEP.

Alternatives
============

Variations on the theme
-----------------------

This MEP can be divided into a few loosely coupled components:

#. Allowing pre- and post-computation transformation functions in ``cbook.boxplot_stats``
#. Exposing that transformation in the ``Axes.boxplot`` API
#. Removing redundant statistical options in ``Axes.boxplot``
#. Shifting all styling parameter processing from ``Axes.boxplot`` to ``Axes.bxp``.

With this approach, #2 depends on #1, and #4 depends on #3.

There are two possible approaches to #2. The first and most direct would
be to mirror the new ``transform_in`` and ``transform_out`` parameters of
``cbook.boxplot_stats`` in ``Axes.boxplot`` and pass them directly.

The second approach would be to add ``statfxn`` and ``statfxn_args``
parameters to ``Axes.boxplot``. Under this implementation, the default
value of ``statfxn`` would be ``cbook.boxplot_stats``, but users could
pass their own function. ``transform_in`` and ``transform_out`` would
then be passed as elements of the ``statfxn_args`` parameter.

.. code-block:: python

    def boxplot_stats(data, ..., transform_in=None, transform_out=None):
        if transform_in is None:
            transform_in = lambda x: x

        if transform_out is None:
            transform_out = lambda x: x

        output = []
        for _d in data:
            d = transform_in(_d)
            stat_dict = do_stats(d)
            # back-transform every statistic except the label
            for key, value in stat_dict.items():
                if key != 'label':
                    stat_dict[key] = transform_out(value)
            output.append(stat_dict)
        return output


    class Axes(...):
        def boxplot_option1(self, data, ..., transform_in=None, transform_out=None):
            stats = cbook.boxplot_stats(data, ...,
                                        transform_in=transform_in,
                                        transform_out=transform_out)
            return self.bxp(stats, ...)

        def boxplot_option2(self, data, ..., statfxn=None, **statopts):
            if statfxn is None:
                statfxn = cbook.boxplot_stats
            stats = statfxn(data, **statopts)
            return self.bxp(stats, ...)
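
Option Two can be prototyped without touching matplotlib by writing it as a
free function over the existing public API. In the sketch below,
``boxplot_option2`` and ``bootstrapped_stats`` are hypothetical names, and
splitting the keyword arguments into ``statopts`` (for the stat function)
and ``bxp_kwargs`` (for ``Axes.bxp``) is an assumption about how the two
would be kept apart:

.. code-block:: python

    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt
    import numpy as np
    from matplotlib import cbook


    def boxplot_option2(ax, data, statfxn=None, bxp_kwargs=None, **statopts):
        """Hypothetical free-function version of Option Two.

        ``statopts`` are forwarded untouched to ``statfxn``; ``bxp_kwargs``
        (artist toggles and styles) are forwarded to ``Axes.bxp``.
        """
        if statfxn is None:
            statfxn = cbook.boxplot_stats
        stats = statfxn(data, **statopts)
        return ax.bxp(stats, **(bxp_kwargs or {}))


    def bootstrapped_stats(data, **kwargs):
        """Hypothetical custom stat function: bootstrapped notch intervals."""
        return cbook.boxplot_stats(data, bootstrap=1000, **kwargs)


    rng = np.random.default_rng(3)
    data = [rng.normal(0.0, 1.0, size=300) for _ in range(2)]

    fig, (ax1, ax2) = plt.subplots(ncols=2)
    # Default stat function, with one of its options (percentile whiskers) forwarded.
    a1 = boxplot_option2(ax1, data, whis=(5, 95))
    # A user-supplied stat function plus a styling option for Axes.bxp.
    a2 = boxplot_option2(ax2, data, statfxn=bootstrapped_stats,
                         bxp_kwargs=dict(shownotches=True))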

Both cases would allow users to do the following:

.. code-block:: python

    fig, ax1 = plt.subplots()
    artists1 = ax1.boxplot_optionX(data, transform_in=np.log,
                                   transform_out=np.exp)


But Option Two lets a user write a completely custom stat function
(e.g., ``my_box_stats``) with fancy BCA confidence intervals and the
whiskers set differently depending on some attribute of the data.

This is available under the current API:

.. code-block:: python

    fig, ax1 = plt.subplots()
    my_stats = my_box_stats(data, bootstrap_method='BCA',
                            whisker_method='dynamic')
    ax1.bxp(my_stats)
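
All such a function has to return is a list of dictionaries with the keys
``Axes.bxp`` reads. The deliberately simple stand-in below shares its name
with the hypothetical ``my_box_stats`` above, but it implements neither BCA
intervals nor dynamic whiskers. Placing the whiskers at fixed percentiles is
an arbitrary choice for illustration, and points beyond them are treated as
fliers:

.. code-block:: python

    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt
    import numpy as np


    def my_box_stats(data, whisker_percentiles=(5, 95)):
        """Hypothetical stat function returning ``Axes.bxp``-style dicts."""
        output = []
        for d in data:
            d = np.asarray(d, dtype=float)
            q1, med, q3 = np.percentile(d, [25, 50, 75])
            lo, hi = np.percentile(d, whisker_percentiles)
            output.append({
                "med": med,
                "q1": q1,
                "q3": q3,
                "whislo": lo,
                "whishi": hi,
                "mean": d.mean(),  # needed only when showmeans=True
                # Points outside the whiskers are drawn as fliers.
                "fliers": d[(d < lo) | (d > hi)],
            })
        return output


    rng = np.random.default_rng(4)
    data = [rng.standard_t(df=3, size=200) for _ in range(3)]

    fig, ax = plt.subplots()
    artists = ax.bxp(my_box_stats(data), showmeans=True)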

And would be more concise with Option Two:

.. code-block:: python

    fig, ax = plt.subplots()
    statopts = dict(transform_in=np.log, transform_out=np.exp)
    ax.boxplot(data, ..., **statopts)

Users could also pass their own function to compute the stats:

.. code-block:: python

    fig, ax1 = plt.subplots()
    ax1.boxplot(data, statfxn=my_box_stats, bootstrap_method='BCA',
                whisker_method='dynamic')

From the examples above, Option Two seems to have only marginal benefit,
but in the context of downstream libraries like seaborn, its advantage
is more apparent, as the following would be possible without any patches
to seaborn:

.. code-block:: python

    import seaborn
    tips = seaborn.load_dataset('tips')
    g = seaborn.factorplot(x="day", y="total_bill", hue="sex", data=tips,
                           kind='box', palette="PRGn", shownotches=True,
                           statfxn=my_box_stats, bootstrap_method='BCA',
                           whisker_method='dynamic')

This type of flexibility was the intention behind splitting the overall
boxplot API into the current three functions. In practice, however, downstream
libraries like seaborn support versions of matplotlib dating back well
before the split. Thus, adding just a bit more flexibility to
``Axes.boxplot`` could expose all the functionality to users of the
downstream libraries with modern matplotlib installations, without intervention
from the downstream library maintainers.

Doing less
----------

Another obvious alternative would be to omit the added pre- and
post-computation transform functionality in ``cbook.boxplot_stats`` and
``Axes.boxplot``, and simply remove the redundant statistical and style
parameters as described above.

Doing nothing
-------------

As with many things in life, doing nothing is an option here. This means
we simply advocate for users and downstream libraries to take advantage
of the split between ``cbook.boxplot_stats`` and ``Axes.bxp`` and let
them decide how to provide an interface to that.

doc/devel/MEP/index.rst

Lines changed: 1 addition & 0 deletions

MEP25
MEP26
MEP27
MEP28
