Add Axes.ecdf() method. #24728

anntzer · 2022-12-14T18:56:39Z

PR Summary

See discussion at #16561 (attn @Wrzlprmft). I chose to implement remove_redundant (under the name "compress") as I can see cases where it would be helpful for performance, but left "absolute" ecdfs unimplemented, as I have never seen them (and https://stats.stackexchange.com/questions/451601/what-are-absolute-ecdfs-called-if-anybody-uses-them didn't attract too much activity).

I updated the histogram_cumulative example to showcase this (as one should basically always use ecdf() instead of hist(..., cumulative=True, density=True), but let's not overdo it and immediately consider removing that, as it's probably extremely widely used). Note that I removed the reference to astropy's bin selection examples, as it's already mentioned elsewhere (e.g. the histogram_features example) and also doesn't really apply as is to cumulative histograms.
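To illustrate the difference the summary alludes to, here is a minimal NumPy-only sketch comparing a binned cumulative histogram with an exact ECDF. The sample data, the seed, and the bin count are arbitrary assumptions and are not taken from the PR.

```python
import numpy as np

# Arbitrary sample data (assumption for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=200)

# Binned estimate, analogous to a cumulative, density-normalized histogram:
# the result depends on the chosen bins and has one value per bin.
counts, edges = np.histogram(x, bins=10)
binned_cdf = np.cumsum(counts) / counts.sum()

# Exact ECDF: one step per sample, no binning choice involved.
xs = np.sort(x)
ecdf = np.arange(1, len(xs) + 1) / len(xs)
```

The binned version has only as much resolution as its bins, whereas the ECDF keeps every sample.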

Note: I am not overly convinced by silently dropping nans; we could also error on them. (+/-inf should clearly be supported, as they have a non-ambiguous interpretation, which I have relied on before).

PR Checklist

Documentation and Tests

Has pytest style unit tests (and pytest passes)
Documentation is sphinx and numpydoc compliant (the docs should build without error).
New plotting related features are documented with examples.

Release Notes

New features are marked with a .. versionadded:: directive in the docstring and documented in doc/users/next_whats_new/
API changes are marked with a .. versionchanged:: directive in the docstring and documented in doc/api/next_api_changes/
Release notes conform with instructions in next_whats_new/README.rst or next_api_changes/README.rst

oscargus · 2022-12-14T21:06:46Z

.. added...?

story645 · 2022-12-14T21:33:46Z

I'm ambivalent about adding a new computational plotting method, especially since seaborn provides it, but if it exists then there's a lovely open space for it in plot types: stats and it is the type of basic thing worth adding there.

jklymak · 2022-12-14T21:52:08Z

This is basically just ax.plot(np.sort(x), np.arange(len(x)))? Even with weights and normalizations, this seems pretty trivial for users to compute themselves, and I'm not a huge fan of providing folks with statistical black boxes in Matplotlib core.

anntzer · 2022-12-14T22:25:44Z

@oscargus @story645 I have addressed your comments.

@jklymak This was already argued at #16561 (comment) and #16561 (comment): the tricky part is not the calculation (which is indeed pretty trivial), but in selecting the correct drawstyle and adding the correct point at the correct end so that the ecdf is indeed a step plot (see https://en.wikipedia.org/wiki/Empirical_distribution_function, https://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/ecdfplot.htm, https://fr.mathworks.com/help/stats/ecdf.html, ...) instead of connecting the points with diagonal lines (which is actually wrong).
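A minimal sketch of the point made above, assuming an unweighted, non-complementary ECDF: the sorted values need an extra starting point at height 0 and a "steps-post" drawstyle to render as a step function. The helper names `ecdf_points` and `plot_ecdf` are hypothetical and are not the PR's implementation.

```python
import numpy as np


def ecdf_points(x):
    """Return (xs, ys) describing the ECDF of *x* as a "steps-post" polyline."""
    xs = np.sort(np.asarray(x, dtype=float))
    # Fraction of samples <= each sorted value; the last entry is exactly 1.
    cum = np.arange(1, len(xs) + 1) / len(xs)
    # Prepend (xs[0], 0) so the curve starts at 0 and jumps at the smallest sample.
    return np.concatenate([xs[:1], xs]), np.concatenate([[0.0], cum])


def plot_ecdf(ax, x, **kwargs):
    """Draw the ECDF on an existing Matplotlib Axes (hypothetical helper)."""
    xs, ys = ecdf_points(x)
    # "steps-post" holds each y value until the next x, giving horizontal steps
    # instead of diagonal segments between the points.
    return ax.plot(xs, ys, drawstyle="steps-post", **kwargs)


xs, ys = ecdf_points([3.0, 1.0, 2.0, 2.0])
```

With a plain `ax.plot(np.sort(x), ...)` and the default drawstyle, the same points would be joined by diagonal lines.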

anntzer · 2022-12-15T07:59:56Z

attn @Phlya @jaroslawr @mwaskom who have also commented on the original thread.

jaroslawr · 2022-12-15T09:32:28Z

I think this is great: histograms, box plots and violin plots are already in matplotlib, and ECDFs are a broadly useful type of plot in many different application areas. I would probably use pure matplotlib if only it had this feature. Libraries like seaborn come with the cost of a whole abstraction layer over matplotlib: you often have to understand both seaborn and matplotlib, how seaborn passes arguments to matplotlib, etc. Filling in a few gaps like this PR does would make using pure matplotlib much more attractive.

doc/api/axes_api.rst

doc/users/next_whats_new/ecdf.rst

Wrzlprmft · 2022-12-15T13:25:11Z

Would it make sense to warn the users who are still using hist(...,cumulative=True)? At the very least, the documentation should point to the new function.

lib/matplotlib/axes/_axes.py

anntzer · 2022-12-15T14:14:45Z

@Wrzlprmft While I agree with you that there are rather few use cases of hist(..., cumulative=True) that are not better served by ecdf(), let's first get the method in and decide later whether to include your warning; I don't want to be derailed into a side-discussion.

lib/matplotlib/axes/_axes.py

anntzer · 2023-02-02T20:50:40Z

General agreement during call was to error on nans instead, and also error on any input that has masked values. Also modify the docstring to suggest various ways to handle nans (ignore them, map them to +/-inf).
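The agreed behavior can be sketched as follows. `check_ecdf_input` is a hypothetical validator written for illustration, not the code merged in the PR; the two options after it show the NaN-handling strategies mentioned above (dropping NaNs, or mapping them to +inf).

```python
import numpy as np


def check_ecdf_input(x):
    """Hypothetical validator: reject masked entries and NaNs, return an ndarray."""
    if np.ma.getmask(x).any():
        raise ValueError("masked entries are not supported")
    x = np.asarray(x, dtype=float)
    if np.isnan(x).any():
        raise ValueError("NaNs are not supported")
    return x


raw = np.array([0.5, np.nan, -1.0, 2.0])

# Option 1: ignore NaNs by dropping them before computing the ECDF.
dropped = raw[~np.isnan(raw)]

# Option 2: treat NaNs as larger than every finite value by mapping them to +inf.
as_inf = np.where(np.isnan(raw), np.inf, raw)
```

Which option is appropriate depends on what a NaN means in the data; the second keeps the sample count unchanged, so the curve does not reach 1 at any finite value.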

jklymak · 2023-02-06T18:17:45Z

Draft until above is implemented

anntzer · 2023-02-08T20:40:45Z

Done.

jklymak · 2023-02-11T23:29:02Z

@oscargus did you have a chance to take a second look at this?

dstansby

Looks good to me. I have some minor suggestions, and the version change in the docs is blocking, but feel free to self-merge when that's done.

dstansby · 2023-02-23T18:11:36Z

lib/matplotlib/axes/_axes.py

+        """
+        Compute and plot the empirical cumulative distribution function of *x*.
+
+        .. versionadded:: 3.7


This needs updating to 3.8

dstansby · 2023-02-23T18:13:26Z

lib/matplotlib/axes/_axes.py

+
+        Returns
+        -------
+        Line2D


dstansby · 2023-02-23T18:16:18Z

lib/matplotlib/axes/_axes.py

+        x = x[argsort]
+        if weights is None:
+            # Ensure that we end at exactly 1, avoiding floating point errors.
+            cweights = (1 + np.arange(len(x))) / len(x)


I'm a bit confused as to what cweights is here. Perhaps a comment explaining what it is would be good?
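For readers with the same question: judging from the quoted lines, cweights holds the cumulative, normalized weights, i.e. the ECDF value at each sorted sample. A small worked sketch using the same expressions as the diff, with made-up input values:

```python
import numpy as np

# Made-up data for illustration.
x = np.array([3.0, 1.0, 2.0])
weights = np.array([1.0, 2.0, 1.0])

argsort = np.argsort(x)
x_sorted = x[argsort]

# Unweighted case: each sample contributes 1/n, and building the values from
# integers guarantees that the last entry is exactly 1.
cweights_unweighted = (1 + np.arange(len(x_sorted))) / len(x_sorted)

# Weighted case: reorder the weights like x, normalize, then accumulate.
w_sorted = np.take(weights, argsort)
cweights_weighted = np.cumsum(w_sorted / np.sum(w_sorted))
```

Here the weighted result is 0.5, 0.75, 1.0 for the sorted values 1, 2, 3.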

dstansby · 2023-02-23T18:17:30Z

lib/matplotlib/axes/_axes.py

+            # Ensure that we end at exactly 1, avoiding floating point errors.
+            cweights = (1 + np.arange(len(x))) / len(x)
+        else:
+            weights = np.take(weights, argsort)


Suggested change

-            weights = np.take(weights, argsort)
+            # Sort weights
+            weights = np.take(weights, argsort)

dstansby · 2023-02-23T18:20:01Z

lib/matplotlib/axes/_axes.py

+            weights = np.take(weights, argsort)
+            cweights = np.cumsum(weights / np.sum(weights))
+        if compress:
+            compress_idxs = [0, *(x[:-1] != x[1:]).nonzero()[0] + 1]


Suggested change

-            compress_idxs = [0, *(x[:-1] != x[1:]).nonzero()[0] + 1]
+            # Get indices of unique values in x
+            compress_idxs = [0, *(x[:-1] != x[1:]).nonzero()[0] + 1]
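To make the indexing expression concrete, the sketch below evaluates it on a small sorted array with repeated values. The expression yields the first index of each run of equal values. The pairing with the cumulative weight at the last index of each run is this sketch's own choice (the ECDF value at x counts every sample equal to x) and is not claimed to match the PR's code, whose remaining lines are not shown here.

```python
import numpy as np

# Already-sorted data with duplicates (made up for illustration).
x = np.array([1.0, 1.0, 2.0, 3.0, 3.0, 3.0])
cweights = (1 + np.arange(len(x))) / len(x)

# Positions where consecutive values differ mark the boundaries between runs.
boundaries = (x[:-1] != x[1:]).nonzero()[0]

# First index of each run of equal values, as in the line under review.
first_idxs = np.array([0, *boundaries + 1])
# Last index of each run (assumption of this sketch, see lead-in).
last_idxs = np.array([*boundaries, len(x) - 1])

x_unique = x[first_idxs]
cw_unique = cweights[last_idxs]
```

Plotting `x_unique` against `cw_unique` as steps gives the same curve as the uncompressed data, with fewer points.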

anntzer · 2023-02-23T19:01:11Z

Thanks, I've handled all the comments.

doc/users/next_whats_new/ecdf.rst

anntzer added the topic: plotting methods label Dec 14, 2022

story645 added the New feature label Dec 14, 2022

anntzer force-pushed the ecdf branch 2 times, most recently from c97565c to c407a44 Compare December 14, 2022 20:34

anntzer force-pushed the ecdf branch from c407a44 to 45ec3d9 Compare December 15, 2022 07:52

oscargus reviewed Dec 15, 2022

doc/api/axes_api.rst (outdated)

oscargus reviewed Dec 15, 2022

doc/users/next_whats_new/ecdf.rst

anntzer force-pushed the ecdf branch 2 times, most recently from 66f1b28 to bf9e787 Compare December 15, 2022 10:51

oscargus reviewed Dec 15, 2022

lib/matplotlib/axes/_axes.py

anntzer force-pushed the ecdf branch from bf9e787 to 68aff61 Compare December 15, 2022 14:21

tacaswell added this to the v3.8.0 milestone Dec 15, 2022

github-actions bot added the status: needs rebase label Dec 22, 2022

anntzer force-pushed the ecdf branch from 68aff61 to 8cf71a3 Compare December 28, 2022 12:34

github-actions bot removed the status: needs rebase label Dec 28, 2022

story645 reviewed Dec 28, 2022

lib/matplotlib/axes/_axes.py (outdated)

anntzer force-pushed the ecdf branch from 8cf71a3 to 4ceec8b Compare December 28, 2022 23:04

anntzer force-pushed the ecdf branch from 4ceec8b to 04a3ab8 Compare January 18, 2023 21:48

QuLogic linked an issue Jan 24, 2023 that may be closed by this pull request

Feature request: proper ECDF #16561

Closed

jklymak marked this pull request as draft February 6, 2023 18:17

anntzer force-pushed the ecdf branch from 04a3ab8 to 2cfea70 Compare February 8, 2023 20:40

anntzer marked this pull request as ready for review February 8, 2023 20:40

jklymak approved these changes Feb 9, 2023

jklymak added the status: needs review label Feb 11, 2023

github-actions bot added the status: needs rebase label Feb 21, 2023

dstansby approved these changes Feb 23, 2023

anntzer force-pushed the ecdf branch from 2cfea70 to 5329e80 Compare February 23, 2023 19:01

github-actions bot removed the status: needs rebase label Feb 23, 2023

tacaswell reviewed Feb 23, 2023

doc/users/next_whats_new/ecdf.rst

anntzer force-pushed the ecdf branch from 5329e80 to 86c84dd Compare February 23, 2023 20:14

github-actions bot added the status: needs rebase label Feb 24, 2023

Add Axes.ecdf() method.

7bbca68

anntzer force-pushed the ecdf branch from 86c84dd to 7bbca68 Compare March 26, 2023 13:55

github-actions bot removed the status: needs rebase label Mar 26, 2023

greglucas merged commit 5b85655 into matplotlib:main Mar 26, 2023

anntzer deleted the ecdf branch March 26, 2023 17:30

ksunden mentioned this pull request Mar 29, 2023

Initial implementation of type stubs (mypy/PEP484) #24976

Merged

10 tasks

QuLogic removed the status: needs review label May 12, 2023
