|
| 1 | +============================================= |
| 2 | + MEP 28: Remove Complexity from Axes.boxplot |
| 3 | +============================================= |
| 4 | + |
| 5 | +.. contents:: |
| 6 | + :local: |
| 7 | + |
| 8 | +Status |
| 9 | +====== |
| 10 | + |
| 11 | +.. |
| 12 | +.. MEPs go through a number of phases in their lifetime: |
| 13 | +
|
| 14 | + - **Discussion** |
| 15 | +.. |
| 16 | +.. - **Progress**: Consensus was reached on the mailing list and |
| 17 | +.. implementation work has begun. |
| 18 | +.. |
| 19 | +.. - **Completed**: The implementation has been merged into master. |
| 20 | +.. |
| 21 | +.. - **Superseded**: This MEP has been abandoned in favor of another |
| 22 | +.. approach. |
| 23 | +
|
| 24 | +Branches and Pull requests |
| 25 | +========================== |
| 26 | + |
| 27 | +Adding pre- & post-processing options to ``cbook.boxplot_stats``: https://github.com/phobson/matplotlib/tree/boxplot-stat-transforms |
| 28 | +Exposing ``cbook.boxplot_stats`` through ``Axes.boxplot`` kwargs: None |
| 29 | +Remove redundant statistical kwargs in ``Axes.boxplot``: None |
| 30 | +Remove redundant style options in ``Axes.boxplot``: None |
| 31 | +Remaining items that arise through discussion: None |
| 32 | + |
| 33 | +Abstract |
| 34 | +======== |
| 35 | + |
| 36 | +Over the past few releases, the ``Axes.boxplot`` method has grown in |
| 37 | +complexity to support fully customizable artist styling and statistical |
| 38 | +computation. This led to ``Axes.boxplot`` being split off into multiple |
| 39 | +parts. The statistics needed to draw a boxplot are computed in |
| 40 | +``cbook.boxplot_stats``, while the actual artists are drawn by ``Axes.bxp``. |
| 41 | +The original method, ``Axes.boxplot``, remains as the most public API that |
| 42 | +handles passing the user-supplied data to ``cbook.boxplot_stats``, feeding |
| 43 | +the results to ``Axes.bxp``, and pre-processing style information for |
| 44 | +each facet of the boxplot. |
| 45 | + |
| 46 | +This MEP will outline a path forward to roll back the added complexity |
| 47 | +and simplify the API while maintaining reasonable backwards |
| 48 | +compatibility. |
| 49 | + |
| 50 | +Detailed description |
| 51 | +==================== |
| 52 | + |
| 53 | +Currently, the ``Axes.boxplot`` method accepts parameters that allow the |
| 54 | +users to specify medians and confidence intervals for each box that |
| 55 | +will be drawn in the plot. These were provided so that advanced users |
| 56 | +could provide statistics computed in a different fashion than the simple |
| 57 | +method provided by matplotlib. However, handling this input requires |
| 58 | +complex logic to make sure that the forms of the data structure match what |
| 59 | +needs to be drawn. At the moment, that logic contains 9 separate if/else |
| 60 | +statements nested up to 5 levels deep with a for loop, and may raise up to 2 errors. |
| 61 | +These parameters were added prior to the creation of the ``Axes.bxp`` method, |
| 62 | +which draws boxplots from a list of dictionaries containing the relevant |
| 63 | +statistics. Matplotlib also provides a function that computes these |
| 64 | +statistics via ``cbook.boxplot_stats``. Note that advanced users can now |
| 65 | +either a) write their own function to compute the stats required by |
| 66 | +``Axes.bxp``, or b) modify the output returned by ``cbook.boxplot_stats`` |
| 67 | +to fully customize the position of the artists of the plots. Given this |
| 68 | +flexibility, the parameters that manually specify only the medians and their |
| 69 | +confidence intervals remain solely for backwards compatibility. |
| 70 | + |
| 71 | +Around the same time that the two roles of ``Axes.boxplot`` were split into |
| 72 | +``cbook.boxplot_stats`` for computation and ``Axes.bxp`` for drawing, both |
| 73 | +``Axes.boxplot`` and ``Axes.bxp`` were written to accept parameters that |
| 74 | +individually toggle the drawing of all components of the boxplots, and |
| 75 | +parameters that individually configure the style of those artists. However, |
| 76 | +to maintain backwards compatibility, the ``sym`` parameter (previously used |
| 77 | +to specify the symbol of the fliers) was retained. This parameter itself |
| 78 | +requires fairly complex logic to reconcile the ``sym`` parameter with the |
| 79 | +newer ``flierprops`` parameter and the default style specified by ``matplotlibrc``. |
| 80 | + |
| 81 | +This MEP seeks to dramatically simplify the creation of boxplots for |
| 82 | +novice and advanced users alike. Importantly, the changes proposed here |
| 83 | +will also be available to downstream packages like seaborn, as seaborn |
| 84 | +smartly allows users to pass arbitrary dictionaries of parameters through |
| 85 | +the seaborn API to the underlying matplotlib functions. |
| 86 | + |
| 87 | +This will be achieved in the following way: |
| 88 | + |
| 89 | + 1. ``cbook.boxplot_stats`` will be modified to allow pre- and post- |
| 90 | + computation transformation functions to be passed in (e.g., ``np.log`` |
| 91 | + and ``np.exp`` for lognormally distributed data) |
| 92 | + 2. ``Axes.boxplot`` will be modified to also accept and naïvely pass them |
| 93 | + to ``cbook.boxplot_stats`` (Alt: pass the stat function and a dict |
| 94 | + of its optional parameters). |
| 95 | + 3. Outdated parameters from ``Axes.boxplot`` will be deprecated and |
| 96 | + later removed. |
| 97 | + |
| 98 | +Implementation |
| 99 | +============== |
| 100 | + |
| 101 | +Passing transform functions to ``cbook.boxplot_stats`` |
| 102 | +------------------------------------------------------ |
| 103 | + |
| 104 | +This MEP proposes that two parameters (e.g., ``transform_in`` and |
| 105 | +``transform_out``) be added to the cookbook function that computes the |
| 106 | +statistics for the boxplot function. These will be optional keyword-only |
| 107 | +arguments and can easily be set to ``lambda x: x`` as a no-op when omitted |
| 108 | +by the user. The ``transform_in`` function will be applied to the data |
| 109 | +as the ``boxplot_stats`` function loops through each subset of the data |
| 110 | +passed to it. After the list of statistics dictionaries is computed, the |
| 111 | +``transform_out`` function is applied to each value in the dictionaries. |
| 112 | + |
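The effect of such a transform pair can be illustrated with a small, standard-library-only sketch. It is not matplotlib code: ``summarize`` and ``stats_with_transforms`` are hypothetical names, the statistics are a reduced stand-in for what ``cbook.boxplot_stats`` computes, and the transforms are applied element by element rather than to whole arrays as ``np.log`` would be.

```python
import math
import statistics


def summarize(values):
    """Hypothetical stand-in for the core statistics step (not matplotlib code)."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"label": "demo", "mean": statistics.fmean(values),
            "q1": q1, "med": med, "q3": q3}


def stats_with_transforms(datasets, transform_in=None, transform_out=None):
    """Apply transform_in to the data and transform_out to each statistic."""
    # Omitted transforms fall back to the identity, as the MEP suggests.
    transform_in = transform_in or (lambda x: x)
    transform_out = transform_out or (lambda x: x)
    output = []
    for values in datasets:
        stats = summarize([transform_in(v) for v in values])
        # Back-transform every statistic except the (non-numeric) label.
        output.append({key: (val if key == "label" else transform_out(val))
                       for key, val in stats.items()})
    return output


# Data spanning several orders of magnitude.
data = [[1.0, 10.0, 100.0, 1000.0, 10000.0]]
plain = stats_with_transforms(data)[0]
logged = stats_with_transforms(data, transform_in=math.log,
                               transform_out=math.exp)[0]
```

In this sketch the quartiles are essentially unchanged by the round trip because the logarithm is monotonic, while the back-transformed mean becomes the geometric mean of the data instead of the arithmetic mean.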
| 113 | +These transformations can then be added to the call signature of |
| 114 | +``Axes.boxplot`` with little impact to that method's complexity. This is |
| 115 | +because they can be directly passed to ``cbook.boxplot_stats``. |
| 116 | +Alternatively, ``Axes.boxplot`` could be modified to accept an optional |
| 117 | +statistical function kwarg and a dictionary of parameters to be directly |
| 118 | +passed to it. |
| 119 | + |
| 120 | +At this point in the implementation, users and external libraries like |
| 121 | +seaborn would have complete control via the ``Axes.boxplot`` method. More |
| 122 | +importantly, at the very least, seaborn would require no changes to its |
| 123 | +API to allow users to take advantage of these new options. |
| 124 | + |
| 125 | +Simplifications to the ``Axes.boxplot`` API and other functions |
| 126 | +--------------------------------------------------------------- |
| 127 | + |
| 128 | +Simplifying the boxplot method consists primarily of deprecating and then |
| 129 | +removing the redundant parameters. Optionally, a next step would include |
| 130 | +rectifying minor terminological inconsistencies between ``Axes.boxplot`` |
| 131 | +and ``Axes.bxp``. |
| 132 | + |
| 133 | +The parameters to be deprecated and removed include: |
| 134 | + |
| 135 | + 1. ``usermedians`` - processed by 10 SLOC, 3 ``if`` blocks, a ``for`` loop |
| 136 | + 2. ``conf_intervals`` - handled by 15 SLOC, 6 ``if`` blocks, a ``for`` loop |
| 137 | + 3. ``sym`` - processed by 12 SLOC, 4 ``if`` blocks |
| 138 | + |
| 139 | +Removing the ``sym`` option allows all code handling the remaining |
| 140 | +styling parameters to be moved to ``Axes.bxp``. This doesn't remove |
| 141 | +any complexity, but does reinforce the single responsibility principle |
| 142 | +among ``Axes.bxp``, ``cbook.boxplot_stats``, and ``Axes.boxplot``. |
| 143 | + |
| 144 | +Additionally, the ``notch`` parameter could be renamed ``shownotches`` |
| 145 | +to be consistent with ``Axes.bxp``. This kind of cleanup could be taken |
| 146 | +a step further and the ``whis``, ``bootstrap``, and ``autorange`` parameters |
| 147 | +could be rolled into the kwargs passed to the new ``statfxn`` parameter. |
| 148 | + |
| 149 | +Backward compatibility |
| 150 | +====================== |
| 151 | + |
| 152 | +Implementation of this MEP would eventually result in the backwards |
| 153 | +incompatible deprecation and then removal of the keyword parameters |
| 154 | +``usermedians``, ``conf_intervals``, and ``sym``. Cursory searches on |
| 155 | +GitHub indicated that ``usermedians`` and ``conf_intervals`` are used by |
| 156 | +few users, who all seem to have a very strong knowledge of matplotlib. |
| 157 | +A robust deprecation cycle should provide sufficient time for these |
| 158 | +users to migrate to a new API. |
| 159 | + |
| 160 | +Deprecation of ``sym``, however, may have a much broader reach into |
| 161 | +the matplotlib userbase. |
| 162 | + |
| 163 | +Schedule |
| 164 | +-------- |
| 165 | +An accelerated timeline could look like the following: |
| 166 | + |
| 167 | +#. v2.0.1 add transforms to ``cbook.boxplot_stats``, expose in ``Axes.boxplot`` |
| 168 | +#. v2.1.0 deprecate ``usermedians``, ``conf_intervals``, ``sym`` parameters |
| 169 | +#. v2.2.0 make deprecations noisier |
| 170 | +#. v2.3.0 remove ``usermedians``, ``conf_intervals``, ``sym`` parameters |
| 171 | +#. v2.3.0 deprecate ``notch`` in favor of ``shownotches`` to be consistent with other parameters and ``Axes.bxp`` |
| 172 | +#. v2.4.0 remove ``notch`` parameter, move all style and artist toggling logic to ``Axes.bxp``. ``Axes.boxplot`` is then little more than a broker between ``Axes.bxp`` and ``cbook.boxplot_stats`` |
| 173 | + |
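As a rough illustration of the deprecation step in the schedule above, the sketch below warns when ``sym`` is supplied and folds it into ``flierprops``. It uses only the standard ``warnings`` module as a stand-in for whatever deprecation helper matplotlib would use; the function name is hypothetical, ``sym`` is assumed to hold just a marker, and letting an explicit ``flierprops`` win over ``sym`` is an assumption, not a decision made by this MEP.

```python
import warnings


def resolve_flierprops(sym=None, flierprops=None):
    """Hypothetical helper: map the legacy ``sym`` value onto ``flierprops``."""
    if sym is not None:
        warnings.warn(
            "'sym' is deprecated; pass flierprops={'marker': ...} instead.",
            DeprecationWarning, stacklevel=2)
        # Assumption: explicit flierprops entries override the legacy value.
        flierprops = {"marker": sym, **(flierprops or {})}
    return dict(flierprops or {})


# Legacy call: still works, but emits a DeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = resolve_flierprops(sym="+")

# Modern call: no warning is emitted.
modern = resolve_flierprops(flierprops={"marker": "o"})
```

Making the deprecation "noisier" in a later release could then amount to switching the warning category, though the exact mechanism is left open here.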
| 174 | + |
| 175 | +Anticipated Impacts to Users |
| 176 | +---------------------------- |
| 177 | + |
| 178 | +As described above, deprecating ``usermedians`` and ``conf_intervals`` |
| 179 | +will likely impact few users. Those who will be impacted are almost |
| 180 | +certainly advanced users who will be able to adapt to the change. |
| 181 | + |
| 182 | +Deprecating the ``sym`` option may impact more users, and effort should |
| 183 | +be taken to collect community feedback on this. |
| 184 | + |
| 185 | +Anticipated Impacts to Downstream Libraries |
| 186 | +------------------------------------------- |
| 187 | + |
| 188 | +The source code (GitHub master as of 2016-10-17) was inspected for |
| 189 | +seaborn and python-ggplot to see if these changes would impact their |
| 190 | +use. None of the parameters nominated for removal in this MEP are used by |
| 191 | +seaborn. The seaborn APIs that use matplotlib's boxplot function allow |
| 192 | +users to pass arbitrary ``**kwargs`` through to matplotlib's API. Thus |
| 193 | +seaborn users with modern matplotlib installations will be able to take |
| 194 | +full advantage of any new features added as a result of this MEP. |
| 195 | + |
| 196 | +Python-ggplot has implemented its own function to draw boxplots. Therefore, |
| 197 | +no impact can come to it as a result of implementing this MEP. |
| 198 | + |
| 199 | +Alternatives |
| 200 | +============ |
| 201 | + |
| 202 | +Variations on the theme |
| 203 | +----------------------- |
| 204 | + |
| 205 | +This MEP can be divided into a few loosely coupled components: |
| 206 | + |
| 207 | +#. Allowing pre- and post-computation transformation functions in ``cbook.boxplot_stats`` |
| 208 | +#. Exposing that transformation in the ``Axes.boxplot`` API |
| 209 | +#. Removing redundant statistical options in ``Axes.boxplot`` |
| 210 | +#. Shifting all styling parameter processing from ``Axes.boxplot`` to ``Axes.bxp``. |
| 211 | + |
| 212 | + |
| 213 | +With this approach, #2 depends on #1, and #4 depends on #3. |
| 214 | + |
| 215 | +There are two possible approaches to #2. The first and most direct would |
| 216 | +be to mirror the new ``transform_in`` and ``transform_out`` parameters of |
| 217 | +``cbook.boxplot_stats`` in ``Axes.boxplot`` and pass them directly. |
| 218 | + |
| 219 | +The second approach would be to add ``statfxn`` and ``statfxn_args`` |
| 220 | +parameters to ``Axes.boxplot``. Under this implementation, the default |
| 221 | +value of ``statfxn`` would be ``cbook.boxplot_stats``, but users could |
| 222 | +pass their own function. ``transform_in`` and ``transform_out`` would |
| 223 | +then be passed as elements of the ``statfxn_args`` parameter. |
| 224 | + |
| 225 | +.. python: |
| 226 | +    def boxplot_stats(data, ..., transform_in=None, transform_out=None): |
| 227 | +        if transform_in is None: |
| 228 | +            transform_in = lambda x: x |
| 229 | + |
| 230 | +        if transform_out is None: |
| 231 | +            transform_out = lambda x: x |
| 232 | + |
| 233 | +        output = [] |
| 234 | +        for _d in data: |
| 235 | +            d = transform_in(_d) |
| 236 | +            stat_dict = do_stats(d) |
| 237 | +            for key, value in stat_dict.items(): |
| 238 | +                if key != 'label': |
| 239 | +                    stat_dict[key] = transform_out(value) |
| 240 | +            output.append(stat_dict) |
| 241 | +        return output |
| 242 | + |
| 243 | + |
| 244 | +    class Axes(...): |
| 245 | +        def boxplot_option1(self, data, ..., transform_in=None, transform_out=None): |
| 246 | +            stats = cbook.boxplot_stats(data, ..., |
| 247 | +                                        transform_in=transform_in, |
| 248 | +                                        transform_out=transform_out) |
| 249 | +            return self.bxp(stats, ...) |
| 250 | + |
| 251 | +        def boxplot_option2(self, data, ..., statfxn=None, **statopts): |
| 252 | +            if statfxn is None: |
| 253 | +                statfxn = cbook.boxplot_stats |
| 254 | +            stats = statfxn(data, **statopts) |
| 255 | +            return self.bxp(stats, ...) |
| 256 | +
|
| 257 | +Both cases would allow users to do the following: |
| 258 | + |
| 259 | +.. python: |
| 260 | +    fig, ax1 = plt.subplots() |
| 261 | +    artists1 = ax1.boxplot_optionX(data, transform_in=np.log, |
| 262 | +                                   transform_out=np.exp) |
| 263 | +
|
| 264 | +
|
| 265 | +But Option Two lets a user write a completely custom stat function |
| 266 | +(e.g., ``my_box_stats``) with fancy BCA confidence intervals and the |
| 267 | +whiskers set differently depending on some attribute of the data. |
| 268 | + |
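For illustration, a custom statistics function only needs to return one dictionary per dataset using the keys that ``Axes.bxp`` reads (``med``, ``q1``, ``q3``, ``whislo``, ``whishi``, and optionally ``fliers`` and ``label``). The sketch below is a deliberately simple, hypothetical stand-in: it relies on the standard library only, does not implement BCA confidence intervals, and its ``whisker_method`` values (``"range"`` and ``"tukey"``) are invented for this example.

```python
import statistics


def simple_box_stats(datasets, whisker_method="tukey", labels=None):
    """Hypothetical custom stats function: one dict per dataset."""
    results = []
    for i, values in enumerate(datasets):
        values = sorted(values)
        q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
        if whisker_method == "range":
            # Whiskers span the full data range, so there are no fliers.
            lo, hi = values[0], values[-1]
        else:
            # Tukey-style whiskers: most extreme points within 1.5 * IQR.
            iqr = q3 - q1
            inside = [v for v in values
                      if q1 - 1.5 * iqr <= v <= q3 + 1.5 * iqr]
            lo, hi = inside[0], inside[-1]
        results.append({
            "label": labels[i] if labels else str(i + 1),
            "med": med, "q1": q1, "q3": q3,
            "whislo": lo, "whishi": hi,
            "fliers": [v for v in values if v < lo or v > hi],
        })
    return results


sample = [[1, 2, 3, 4, 5, 6, 7, 8, 100]]
tukey = simple_box_stats(sample)[0]
full_range = simple_box_stats(sample, whisker_method="range")[0]
```

A list built this way has the same shape as the output of ``cbook.boxplot_stats``, which is what makes it interchangeable from the drawing side.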
| 269 | +This is available under the current API: |
| 270 | + |
| 271 | +.. python: |
| 272 | +    fig, ax1 = plt.subplots() |
| 273 | +    my_stats = my_box_stats(data, bootstrap_method='BCA', |
| 274 | +                            whisker_method='dynamic') |
| 275 | +    ax1.bxp(my_stats) |
| 276 | +
|
| 277 | +And would be more concise with Option Two |
| 278 | + |
| 279 | +.. python: |
| 280 | +    fig, ax = plt.subplots() |
| 281 | +    statopts = dict(transform_in=np.log, transform_out=np.exp) |
| 282 | +    ax.boxplot(data, ..., **statopts) |
| 283 | +
|
| 284 | +Users could also pass their own function to compute the stats: |
| 285 | + |
| 286 | +.. python: |
| 287 | +    fig, ax1 = plt.subplots() |
| 288 | +    ax1.boxplot(data, statfxn=my_box_stats, bootstrap_method='BCA', |
| 289 | +                whisker_method='dynamic') |
| 290 | +
|
| 291 | +From the examples above, Option Two seems to have only marginal benefit, |
| 292 | +but in the context of downstream libraries like seaborn, its advantage |
| 293 | +is more apparent as the following would be possible without any patches |
| 294 | +to seaborn: |
| 295 | + |
| 296 | +.. python: |
| 297 | +    import seaborn |
| 298 | +    tips = seaborn.load_dataset('tips') |
| 299 | +    g = seaborn.factorplot(x="day", y="total_bill", hue="sex", data=tips, |
| 300 | +                           kind='box', palette="PRGn", shownotches=True, |
| 301 | +                           statfxn=my_box_stats, bootstrap_method='BCA', |
| 302 | +                           whisker_method='dynamic') |
| 303 | +
|
| 304 | +This type of flexibility was the intention behind splitting the overall |
| 305 | +boxplot API into the current three functions. In practice, however, downstream |
| 306 | +libraries like seaborn support versions of matplotlib dating back well |
| 307 | +before the split. Thus, adding just a bit more flexibility to |
| 308 | +``Axes.boxplot`` could expose all the functionality to users of the |
| 309 | +downstream libraries with modern matplotlib installations without intervention |
| 310 | +from the downstream library maintainers. |
| 311 | + |
| 312 | +Doing less |
| 313 | +---------- |
| 314 | + |
| 315 | +Another obvious alternative would be to omit the added pre- and post- |
| 316 | +computation transform functionality in ``cbook.boxplot_stats`` and |
| 317 | +``Axes.boxplot``, and simply remove the redundant statistical and style |
| 318 | +parameters as described above. |
| 319 | + |
| 320 | +Doing nothing |
| 321 | +------------- |
| 322 | + |
| 323 | +As with many things in life, doing nothing is an option here. This means |
| 324 | +we simply advocate for users and downstream libraries to take advantage |
| 325 | +of the split between ``cbook.boxplot_stats`` and ``Axes.bxp`` and let |
| 326 | +them decide how to provide an interface to that. |