Add Axes.ecdf() method. #24728
c97565c to c407a44
I'm ambivalent about adding a new computational plotting method, especially since seaborn provides it, but if it exists then there's a lovely open space for it in "plot types: stats", and it is the type of basic thing worth adding there.
This is basically just |
@oscargus @story645 I have addressed your comments. @jklymak This was already argued at #16561 (comment) and #16561 (comment): the tricky part is not the calculation (which is indeed pretty trivial), but selecting the correct drawstyle and adding the correct point at the correct end, so that the ECDF is indeed a step plot (see https://en.wikipedia.org/wiki/Empirical_distribution_function, https://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/ecdfplot.htm, https://fr.mathworks.com/help/stats/ecdf.html, ...) instead of connecting the points with diagonal lines (which is actually wrong).
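To make the step-plot point concrete, here is a minimal sketch of how the vertices could be built for an unweighted sample. The helper name `ecdf_step_vertices` is hypothetical and this is not the PR's code; it assumes a non-empty 1D input and only illustrates why an extra point at height 0 is needed at the left end.

```python
import numpy as np


def ecdf_step_vertices(samples):
    """Return (xs, ys) vertices for an unweighted ECDF drawn as a step plot.

    Hypothetical helper for illustration only; assumes a non-empty input.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    heights = (1 + np.arange(n)) / n  # 1/n, 2/n, ..., 1
    # Prepend (x[0], 0) so the curve rises from 0 at the smallest sample.
    # With a "steps-post" drawstyle, each height then holds until the next x.
    xs = np.concatenate([[x[0]], x])
    ys = np.concatenate([[0.0], heights])
    return xs, ys


xs, ys = ecdf_step_vertices([3.0, 1.0, 2.0, 4.0])
print(xs)
print(ys)
```

With matplotlib available, these vertices could be drawn with `ax.plot(xs, ys, drawstyle="steps-post")`; plotting them with the default drawstyle would produce the diagonal segments described above.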
attn @Phlya @jaroslawr @mwaskom who have also commented on the original thread.
I think this is great: histograms, box plots and violin plots are already in matplotlib, and ECDFs are a broadly useful type of plot in many different application areas. I would probably use pure matplotlib if only it had this feature. Libraries like seaborn come with the cost of a whole abstraction layer over matplotlib - you often have to understand both seaborn and matplotlib, how seaborn passes arguments to matplotlib, etc. Filling in a few gaps like this PR does would make using pure matplotlib much more attractive.
66f1b28 to bf9e787
Would it make sense to warn the users who are still using |
@Wrzlprmft While I agree with you that there are rather few use cases of hist(..., cumulative=True) that are not better served by ecdf(), let's first get the method in and decide later whether to include your warning; I don't want to be derailed into a side-discussion.
General agreement during the call was to error on NaNs instead, and also to error on any input that has masked values. Also modify the docstring to suggest various ways to handle NaNs (ignore them, map them to +/-inf).
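As a rough illustration of that decision, the sketch below rejects NaNs and masked values up front and shows the two caller-side options mentioned for NaNs. The function name `check_ecdf_input` and the error messages are assumptions made for this example, not the wording used in matplotlib.

```python
import numpy as np


def check_ecdf_input(x):
    """Hypothetical validator: reject masked values and NaNs, allow +/-inf."""
    # np.ma.getmask returns the mask of a masked array, or "nomask" otherwise.
    if np.ma.getmask(x).any():
        raise ValueError("input contains masked values")
    x = np.asarray(x, dtype=float)
    if np.isnan(x).any():
        raise ValueError("input contains NaNs")
    return x


raw = np.array([2.0, np.nan, 1.0, np.inf])

# Option 1: ignore NaNs by dropping them before plotting.
dropped = raw[~np.isnan(raw)]
# Option 2: map NaNs to +inf so they still count in the total
# (mapping to -inf instead would count them at the left end).
as_pos_inf = np.where(np.isnan(raw), np.inf, raw)

print(check_ecdf_input(dropped))
print(check_ecdf_input(as_pos_inf))
```

Either option makes the caller's intent explicit, which is the motivation for raising an error rather than silently dropping values.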
Draft until above is implemented
Done.
@oscargus did you have a chance to take a second look at this?
Looks good to me - I have some minor suggestions, and the version change in the docs is blocking, but feel free to self-merge when that's done.
lib/matplotlib/axes/_axes.py
Outdated
"""
Compute and plot the empirical cumulative distribution function of *x*.

.. versionadded:: 3.7
This needs updating to 3.8
lib/matplotlib/axes/_axes.py
Outdated

Returns
-------
Line2D
Link this?
lib/matplotlib/axes/_axes.py
Outdated
x = x[argsort]
if weights is None:
    # Ensure that we end at exactly 1, avoiding floating point errors.
    cweights = (1 + np.arange(len(x))) / len(x)
I'm a bit confused as to what cweights is here? Perhaps a comment explaining what it is would be good?
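For readers with the same question: reading the line under review, `cweights` appears to hold the cumulative weights, i.e. the ECDF height reached at each sorted sample. The sketch below reproduces that expression for the unweighted case and contrasts it with a running sum of 1/n, which is included only for comparison and is not part of the PR.

```python
import numpy as np

n = 10
# Same expression as the reviewed line: heights 1/n, 2/n, ..., n/n.
cweights = (1 + np.arange(n)) / n
# For contrast only: repeatedly accumulating 1/n can pick up rounding
# error, so its last value is not guaranteed to be exactly 1.
accumulated = np.cumsum(np.full(n, 1 / n))

print(cweights[-1] == 1.0)
print(accumulated[-1])
```

Dividing the integer count by n in one step is what lets the last height land on exactly 1, as the code comment in the hunk states.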
lib/matplotlib/axes/_axes.py
Outdated
    # Ensure that we end at exactly 1, avoiding floating point errors.
    cweights = (1 + np.arange(len(x))) / len(x)
else:
    weights = np.take(weights, argsort)
- weights = np.take(weights, argsort)
+ # Sort weights
+ weights = np.take(weights, argsort)
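A small worked example may help show why the weights have to be reordered with the same permutation as `x`. The sample values below are made up for illustration; the operations mirror the lines in the hunk.

```python
import numpy as np

x = np.array([3.0, 1.0, 2.0])
weights = np.array([1.0, 2.0, 1.0])  # weights[i] belongs to x[i]

argsort = np.argsort(x)
x_sorted = x[argsort]
# Apply the same permutation to the weights so each pair stays aligned.
w_sorted = np.take(weights, argsort)
# Normalize, then accumulate: fraction of total weight at or below each x.
cweights = np.cumsum(w_sorted / np.sum(w_sorted))

print(x_sorted)
print(w_sorted)
print(cweights)
```

Here the sample `1.0` carries half of the total weight, so the curve reaches 0.5 at the first sorted value.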
    weights = np.take(weights, argsort)
    cweights = np.cumsum(weights / np.sum(weights))
if compress:
    compress_idxs = [0, *(x[:-1] != x[1:]).nonzero()[0] + 1]
- compress_idxs = [0, *(x[:-1] != x[1:]).nonzero()[0] + 1]
+ # Get indices of unique values in x
+ compress_idxs = [0, *(x[:-1] != x[1:]).nonzero()[0] + 1]
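The indexing expression is compact, so here is a step-by-step sketch of what it evaluates to on a small sorted array. The input values are made up for this example, and the intermediate name `changes` is introduced only for readability.

```python
import numpy as np

x = np.array([1.0, 1.0, 2.0, 3.0, 3.0, 3.0])  # must already be sorted

# True wherever a value differs from its left neighbour.
changes = x[:-1] != x[1:]
# Index 0 always starts a run; each change at position i starts one at i + 1.
compress_idxs = [0, *changes.nonzero()[0] + 1]

print(changes)
print(compress_idxs)
print(x[compress_idxs])
```

On sorted input, the selected indices are the first occurrence of each distinct value, so `x[compress_idxs]` matches `np.unique(x)` in this example.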
Thanks, I've handled all the comments.
PR Summary
See discussion at #16561 (attn @Wrzlprmft). I chose to implement remove_redundant (under the name "compress") as I can see cases where it would be helpful for performance, but left "absolute" ecdfs unimplemented, as I have never seen them (and https://stats.stackexchange.com/questions/451601/what-are-absolute-ecdfs-called-if-anybody-uses-them didn't attract too much activity).
I updated the histogram_cumulative example to showcase this (as one should basically always use ecdf() instead of hist(..., cumulative=True, density=True), but let's not overdo it and immediately consider removing that, as it's probably extremely widely used). Note that I removed the reference to astropy's bin selection examples, as it's already mentioned elsewhere (e.g. the histogram_features example) and also doesn't really apply as is to cumulative histograms.
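To illustrate the difference without any plotting, the sketch below compares a binned cumulative distribution with the exact ECDF using NumPy only. `np.histogram` followed by a normalized `np.cumsum` stands in for `hist(..., cumulative=True, density=True)`; that stand-in, the sample data, the bin edges, and the helper `ecdf_at` are all assumptions of this example.

```python
import numpy as np

data = np.array([0.1, 0.2, 0.35, 0.6, 0.9])
edges = np.array([0.0, 0.25, 0.5, 0.75, 1.0])  # arbitrary bin choice

# Binned version: one cumulative value per bin.
counts, _ = np.histogram(data, bins=edges)
binned_cdf = np.cumsum(counts) / len(data)


def ecdf_at(t):
    """Exact ECDF: fraction of samples less than or equal to t."""
    return np.searchsorted(np.sort(data), t, side="right") / len(data)


print(binned_cdf)
print(ecdf_at(edges[1:]))  # agrees with the binned values at these right edges
print(ecdf_at(0.15))       # a step inside the first bin, invisible after binning
```

The binned curve depends on the choice of bins and only matches the ECDF at bin edges (and only when no sample sits exactly on an edge), whereas the ECDF needs no binning parameter at all.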
Note: I am not overly convinced by silently dropping nans; we could also error on them. (+/-inf should clearly be supported, as they have a non-ambiguous interpretation, which I have relied on before).
PR Checklist
Documentation and Tests
pytest passes
Release Notes
.. versionadded:: directive in the docstring and documented in doc/users/next_whats_new/
.. versionchanged:: directive in the docstring and documented in doc/api/next_api_changes/
next_whats_new/README.rst or next_api_changes/README.rst