RFC: new function-based API #14058

Draft
wants to merge 3 commits into
base: main
266 changes: 266 additions & 0 deletions doc/devel/MEP/MEP30.rst
@@ -0,0 +1,266 @@
=======================================
MEP30: Revised pyplot / suggested API
=======================================

.. contents::
   :local:


Status
======
**Discussion**


Branches and Pull requests
==========================

3 half-done attempts


Abstract
========

Matplotlib currently has two main entry points:

1. ``pyplot`` state machine API
2. the ``OO`` API

and two secondary entry points:

1. user-written functions that may create (and may not directly
   return) ``Axes`` / ``Figure`` objects
2. user-written functions which may take an ``Axes`` or ``Figure``


This leads to a wide variety of behaviors that are not incorrect but
conflict with one another, which inhibits interaction between
libraries and greatly confuses users.

The ``pyplot`` API is very convenient for quick work in an interactive
terminal with a small number of single-``Axes`` graphs.  However, for
complex figures with multiple axes or many open ``Figure`` windows the
global 'current axes' quickly becomes unmanageable and results in
plotting commands going to the wrong axes [dig up issue requesting
opting slider axes out of gca consideration].  Additionally, the set
of plotting functions in the ``pyplot`` namespace is controlled by
Matplotlib upstream and is somewhat limited.  There have been
proposals to provide a 'register' hook [dig up PR from 2 years ago]
into the namespace, but currently the only way for third-party
packages to add to the ``pyplot`` namespace is monkey patching.  Either
way, this can be problematic if two packages try to use the same name,
because which one gets it depends on import order.  Users can write
functions that 'feel' like ``pyplot`` functions by using
``pyplot.gca()`` in their code; however, this ties them very tightly
to ``pyplot``, which makes it more difficult to re-use the functions
in embedding applications where the user may not want to import
``pyplot`` at all.
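
The import-order problem described above can be illustrated without Matplotlib at all. The following is only a sketch: ``shared_ns``, ``register``, and the two ``ridge_plot`` functions are hypothetical names invented for the illustration, and ``shared_ns`` merely stands in for a shared namespace such as ``pyplot``.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a shared plotting namespace such as pyplot.
shared_ns = SimpleNamespace()


def register(ns, name, func):
    """Monkey patch *func* onto *ns*, silently replacing any earlier entry."""
    setattr(ns, name, func)


def ridge_plot_from_pkg_a(data):
    return ("pkg_a", data)


def ridge_plot_from_pkg_b(data):
    return ("pkg_b", data)


# Simulates "import pkg_a" followed by "import pkg_b":
# both packages claim the same name, and the later import wins.
register(shared_ns, "ridge_plot", ridge_plot_from_pkg_a)
register(shared_ns, "ridge_plot", ridge_plot_from_pkg_b)

winner = shared_ns.ridge_plot([1, 2, 3])[0]
```

Swapping the two ``register`` calls changes which implementation the user ends up with, and nothing warns about the replacement.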


The ``OO`` API is a bit less concise to work with interactively because
it requires the user to explicitly create an ``Axes`` object to work
with; however, it solves the problem of shared global state.  The
``Axes`` objects serve as both nodes in the draw tree and as the
primary namespace for plotting functions.  It is possible (and
encouraged) for users to write functions that take in ``Axes`` /
``Figure`` objects and internally use the ``OO`` API; however, these
feel qualitatively different from the 'native' plotting routines, which
are ``Axes`` methods.  This can be overcome by, e.g., subclassing
``Axes`` or monkey patching methods onto it, but this has the same
problem of clashing third-party packages.

Among third-party functions in the wild (e.g. ``seaborn``, ``pandas``,
and user code) the return types vary between the artist created
during the call, the ``Axes`` objects plotted to, custom types, and
nothing.

Matplotlib is being used in applications that are well beyond the
original use cases and well beyond the expertise of the current
development team.  To that end, we should make sure it is easy to
write plotting functions that feel "native".  This can then be used to
support an ecosystem of ``mplkit-*`` domain-specific libraries.  These
libraries will be able to depend on a wider range of libraries (e.g.
``scipy``, various scikits, and ``pandas``) than core Matplotlib and can
be built around the 'fundamental' data structures of the domain.

To better support the wide range of use cases, in a way that feels
'native', we propose the following changes:

1. move away from ``Axes`` as the primary namespace for plotting
   routines
2. provide decorators to easily opt third-party code into the
   ``pyplot`` state machine without much boilerplate.


Signatures
----------

There are two obvious ways to write a function that requires an
``Axes`` as input in a way that can be easily wrapped by a decorator::

    def plotting_func(ax, *data_args, **style_kwargs):
        ...

or ::

    def plotting_func(*data_args, ax, **style_kwargs):
        ...

The first case has the advantage that it works in both python2 and
Contributor

hopefully py2 won't really be a thing anymore by the time this is implemented :p

Member Author

indeed! If you dig into the commits on this, I started writing this in 2016...

python3.  Calling many functions while explicitly passing *ax* would
look something like ::

    a1 = func1(ax, data1, ...)
    a2 = func2(ax, data2, ...)
    a3 = func3(ax, data3, ...)

which is only one extra space from the status quo of ``ax.func1(data1,
...)``.  However, wrapping this in a decorator to provide a default
*ax* requires type checking ::

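    from functools import wraps

    def ensure_ax(func):
        @wraps(func)
        def inner(*args, **kwargs):
            # also fall back to gca() when no positional args are given
            if not args or not isinstance(args[0], AxesBase):
                args = (gca(), ) + args
            return func(*args, **kwargs)
        return inner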

Changing the contents of ``*args`` on the way through seems a bit
awkward and possibly a bit hard to explain.  While we have been
advocating this signature in the docs for a few years, it is not the
pattern used by major third-party extensions.

On the other hand, the second option is simpler to decorate ::

    def ensure_ax(func):
        @wraps(func)
        def inner(*args, **kwargs):
            if 'ax' not in kwargs:
                kwargs['ax'] = gca()
            return func(*args, **kwargs)
        return inner

but is a bit more verbose when explicitly passing the *ax* argument ::

    a1 = func1(data1, ..., ax=ax)
    a2 = func2(data2, ..., ax=ax)
    a3 = func3(data3, ..., ax=ax)

which is a few more characters and swaps ``.`` or ``,`` for ``=``.
The axes-as-kwarg pattern matches the API that many third-party
libraries (``pandas``, ``sklearn``, ``seaborn``, ``skimage``) are
already using.

It is possible to support both at the user level via a decorator ::

    def ensure_ax_arg(func):
        # modulo signature and docstring hacking
        @wraps(func)
        def inner(*args, **kwargs):
            ax = kwargs.pop('ax', None)

            if len(args):
                if not isinstance(args[0], AxesBase):
                    if ax is None:
                        ax = gca()
                    args = (ax, ) + args

                elif ax is not None:
                    raise ValueError("passed in 2 axes")
            else:
                if ax is None:
                    ax = gca()
                args = (ax, )
            return func(*args, **kwargs)

        return inner

    def ensure_ax_kwarg(func):
        # modulo signature and docstring hacking
        @wraps(func)
        def inner(*args, **kwargs):
            # peel a leading Axes off the positional arguments, if present
            if len(args) and isinstance(args[0], AxesBase):
                ax, *args = args
            else:
                ax = None
            if 'ax' in kwargs and ax is not None:
                raise ValueError("passed in two axes")
            elif 'ax' not in kwargs:
                if ax is None:
                    ax = gca()
                kwargs['ax'] = ax
            return func(*args, **kwargs)
        return inner

but it is not clear if the complexity is worth it.  It would allow end
users to call plotting functions three ways ::

    a1 = func(*data_args, **style_kwargs)
    a2 = func(ax, *data_args, **style_kwargs)
    a3 = func(*data_args, ax=ax, **style_kwargs)


and allow libraries to internally organize themselves using either of
the above Axes-is-required APIs.  This avoids bike-shedding over the
API and eliminates the first-party 'special' namespace, but is a bit
magical.
Contributor

Personally I find passing the axes as first argument muuuuuuch, much nicer (well, that may be because I essentially never use the pyplot layer which I understand may not be representative of most users).
Either we could use the "magic" decorator, or alternatively we could just have parallel namespaces plt.somefunc(..., ax=None) (None=gca()) & someothernamespace.somefunc(ax, ...) which would at least have the advantage of keeping reasonable signatures for all functions (with the "magic" decorator, inspect.signature can't represent the signature; which is not nice). Note that one namespace could be autogenerated from the other, e.g. in mod.py

@gen_pyplotlike  # registers to module_level registry
def func(ax, ...): ...

@gen_pyplotlike
def otherfunc(ax, ...): ...

pyplotlike = collect_pyplotlike()

and then one can do import mod; mod.func(ax, ...) or from mod import pyplotlike as mod; mod.func(..., ax=ax).

Member Author

#4488 I tried something similar and it got rejected (and I sadly never followed up on making it its own package).

Member

@timhoffm timhoffm May 1, 2019

I find all three calling patterns valid approaches:

plt.plot([1, 2, 3])
plt.plot(ax, [1, 2, 3])
plt.plot([1, 2, 3], ax=ax)

and I welcome supporting all of them. Each one has its use:

  • simple interactive use
  • interactive use with multiple axes (less to type than ax=ax)
  • programmatic use, where the data should be the first arguments for better readability.

I'm against creating multiple namespaces just for the sake of different calling patterns. For one, it's conceptually more difficult to tell people: "Use pyplot.plot() if, or use posax.plot(ax, ...) or use kwargs.plot(..., ax=ax)". Also you would have to create multiple variants of the documentation. While that could be automated, you still have the problem of which one to link. It's much easier to state once: "axes can be automatically determined, or passed as the first positional argument, or passed as kwarg."

As @tacaswell has demonstrated that can all be resolved with a decorator.

I'm not quite sure if the actual function definition should be

@ensure_ax
def func(ax, *data_args, **style_kwargs)

or

def func(*data_args, ax=ax, **style_kwargs)

I tend towards the latter because it's the syntactically correct signature for two out of the three cases. And it puts more emphasis on the data_args rather than on the axes. Also it has the big advantage that it could be built into pyplot in a backward-compatible way. That way, we wouldn't need any new namespace.

Note also, that an axes context could be a valuable addition:

with plt.sca(ax):
    plt.plot([1, 2, 3])
    plt.xlabel('The x')
    plt.ylabel('The y')

(maybe using a more telling name than sca()).

Contributor

I'm not sure if I understand the goal here.
Is the goal to have libraries add plt.my_new_plotting_thing(...) and ax.my_new_plotting_thing(...)?
That's all about entry points, right? Also: what exactly is the benefit of doing that?

Right now, my pattern is having an axes kwarg and if it's None I do plt.gca().
That's basically a single line, which might be slightly longer than adding the ensure_ax decorator in terms of characters but not by much, and seems much easier to understand.

Right now I'm reasonably happy to do some_plotting(ax=ax). Doing ax.some_plotting instead might be nice, but I'm not entirely sure if that is the main goal of this proposal? Doing plt.some_plotting(...) instead of just some_plotting(...) is just more characters, right? I guess it tells you by convention that if it starts with plt it'll modify the current axes? Though that's not even really true: plt.matshow creates a new figure.

Generally I prefer thinking about what code looks like when I use it first instead of thinking about the implementation first. Usually implementing whatever API we settle on is possible so it's more a question of what we want user code to look like.

Member

I think the main goal is for matplotlib to have plotting libraries that do

import matplotlib.basic as mbasic
import matplotlib.2d as m2d

fig, ax = plt.subplots()
mbasic.scatter(x, y, ax=ax)
m2d.pcolormesh(x, y, z, ax=ax)

so the matplotlib library looks more like what user and third party libraries look like.

I think the goal would then be for ax.scatter to just be a wrapper around mbasic.scatter.

But maybe I've completely misunderstood.

Contributor

@jklymak So you're saying you want to change the matplotlib api to no longer do ax.scatter and plt.scatter but do scatter(ax=ax).
That is very different from what I understood, but I'm also very confused ;)

Member

I think it's meant to be a third way. Ahem, I don't particularly want this, but I think the point is to make third-party libraries more plug-and-play with the main library. It also would allow us to have more domain-specific sub libraries without polluting the ax.do_something name space. But maybe it has some deeper advantages as well?

Contributor

Ok yeah I follow your interpretation. Let's see if that's what the others meant ;)



Factories
---------

A design principle which is applied to some parts of the library (e.g.
``contour`` and ``quiver``) is to more cleanly separate the logic of
creating the artists from the logic of adding them to the draw tree.
That is, to write functions that look like ::

    def artist_factory(*data_args, **style_kwargs):
        ...
        return arts

It may be better to return these as a simple iterable ::

    def artist_factory(*data_args, **style_kwargs) -> List[Artist]:
        ...
        return arts

or as a dictionary::

    def artist_factory(*data_args, **style_kwargs) -> Dict[str, Artist]:
        ...
        return arts

The first case is simpler, but the second case exposes more semantics.

In either case, with a few exceptions where the plotting methods
change other properties of the axes (such as ``imshow`` which sets the
extents and may flip the y-axis), many plotting functions can be
implemented as simple wrappers ::

    def add_to_axes(func):
        # modulo signature and docstring hacking
        @wraps(func)
        def inner(*data_args, ax, **style_kwargs):
            # assumes the factory returns a Dict[str, Artist]
            arts = func(*data_args, **style_kwargs)
            for a in arts.values():
                ax.add_artist(a)
            return arts
        return inner
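
As a rough illustration of the wrapper, the sketch below pairs it with a dictionary-returning factory. ``StandInArtist``, ``StandInAxes``, and ``errorbar_factory`` are hypothetical names made up for this example; only the ``add_artist`` method name is borrowed from the real ``Axes`` API.

```python
from functools import wraps


class StandInArtist:
    """Hypothetical stand-in for an Artist."""

    def __init__(self, role):
        self.role = role


class StandInAxes:
    """Hypothetical stand-in for an Axes; it only records added artists."""

    def __init__(self):
        self.artists = []

    def add_artist(self, artist):
        self.artists.append(artist)
        return artist


def add_to_axes(func):
    @wraps(func)
    def inner(*data_args, ax, **style_kwargs):
        arts = func(*data_args, **style_kwargs)
        for a in arts.values():
            ax.add_artist(a)
        return arts
    return inner


def errorbar_factory(x, y, yerr, **style_kwargs):
    """Build the artists but never touch an Axes."""
    return {'line': StandInArtist('line'), 'caps': StandInArtist('caps')}


errorbar = add_to_axes(errorbar_factory)

ax = StandInAxes()
arts = errorbar([0, 1], [1, 2], [0.1, 0.1], ax=ax)
# The dictionary keys tell the caller what each artist is for.
caps = arts['caps']
```

The factory itself stays free of any draw-tree logic, while the dictionary return type lets callers pick out an artist by role rather than by position.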

Thus ::

    @ensure_ax_kwarg
    @add_to_axes
    def art_factory(*data_args, **kwargs):
        ...
        return arts

will produce a first-class plotting function.  From a list of
factories, namespaces for the three levels can easily be produced::

    func_list = [...]
    factory = SimpleNamespace(**{f.__name__: f for f in func_list})
    explicit = SimpleNamespace(**{f.__name__: add_to_axes(getattr(factory, f.__name__))
                                  for f in func_list})
    implicit = SimpleNamespace(**{f.__name__: ensure_ax_kwarg(getattr(explicit, f.__name__))
                                  for f in func_list})