Commit 5b85655

Merge pull request #24728 from anntzer/ecdf
Add Axes.ecdf() method.
2 parents b7324b2 + 7bbca68 commit 5b85655

9 files changed: 234 additions and 55 deletions.

doc/api/axes_api.rst

Lines changed: 2 additions & 1 deletion
@@ -110,11 +110,12 @@ Statistics
    :template: autosummary.rst
    :nosignatures:
 
+   Axes.ecdf
    Axes.boxplot
    Axes.violinplot
 
-   Axes.violin
    Axes.bxp
+   Axes.violin
 
 Binned
 ------

doc/api/pyplot_summary.rst

Lines changed: 1 addition & 0 deletions
@@ -114,6 +114,7 @@ Statistics
    :template: autosummary.rst
    :nosignatures:
 
+   ecdf
    boxplot
    violinplot
 

doc/users/next_whats_new/ecdf.rst

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+``Axes.ecdf``
+~~~~~~~~~~~~~
+A new Axes method, `~.Axes.ecdf`, allows plotting empirical cumulative
+distribution functions without any binning.
+
+.. plot::
+    :include-source:
+
+    import matplotlib.pyplot as plt
+    import numpy as np
+
+    fig, ax = plt.subplots()
+    ax.ecdf(np.random.randn(100))

Lines changed: 50 additions & 54 deletions
@@ -1,36 +1,27 @@
 """
-==================================================
-Using histograms to plot a cumulative distribution
-==================================================
-
-This shows how to plot a cumulative, normalized histogram as a
-step function in order to visualize the empirical cumulative
-distribution function (CDF) of a sample. We also show the theoretical CDF.
-
-A couple of other options to the ``hist`` function are demonstrated. Namely, we
-use the *density* parameter to normalize the histogram and a couple of different
-options to the *cumulative* parameter. The *density* parameter takes a boolean
-value. When ``True``, the bin heights are scaled such that the total area of
-the histogram is 1. The *cumulative* keyword argument is a little more nuanced.
-Like *density*, you can pass it True or False, but you can also pass it -1 to
-reverse the distribution.
-
-Since we're showing a normalized and cumulative histogram, these curves
-are effectively the cumulative distribution functions (CDFs) of the
-samples. In engineering, empirical CDFs are sometimes called
-"non-exceedance" curves. In other words, you can look at the
-y-value for a given-x-value to get the probability of and observation
-from the sample not exceeding that x-value. For example, the value of
-225 on the x-axis corresponds to about 0.85 on the y-axis, so there's an
-85% chance that an observation in the sample does not exceed 225.
-Conversely, setting, ``cumulative`` to -1 as is done in the
-last series for this example, creates an "exceedance" curve.
-
-Selecting different bin counts and sizes can significantly affect the
-shape of a histogram. The Astropy docs have a great section on how to
-select these parameters:
-http://docs.astropy.org/en/stable/visualization/histogram.html
-
+=================================
+Plotting cumulative distributions
+=================================
+
+This example shows how to plot the empirical cumulative distribution function
+(ECDF) of a sample. We also show the theoretical CDF.
+
+In engineering, ECDFs are sometimes called "non-exceedance" curves: the y-value
+for a given x-value gives probability that an observation from the sample is
+below that x-value. For example, the value of 220 on the x-axis corresponds to
+about 0.80 on the y-axis, so there is an 80% chance that an observation in the
+sample does not exceed 220. Conversely, the empirical *complementary*
+cumulative distribution function (the ECCDF, or "exceedance" curve) shows the
+probability y that an observation from the sample is above a value x.
+
+A direct method to plot ECDFs is `.Axes.ecdf`. Passing ``complementary=True``
+results in an ECCDF instead.
+
+Alternatively, one can use ``ax.hist(data, density=True, cumulative=True)`` to
+first bin the data, as if plotting a histogram, and then compute and plot the
+cumulative sums of the frequencies of entries in each bin. Here, to plot the
+ECCDF, pass ``cumulative=-1``. Note that this approach results in an
+approximation of the E(C)CDF, whereas `.Axes.ecdf` is exact.
 """
 
 import matplotlib.pyplot as plt
@@ -40,33 +31,37 @@
 
 mu = 200
 sigma = 25
-n_bins = 50
-x = np.random.normal(mu, sigma, size=100)
+n_bins = 25
+data = np.random.normal(mu, sigma, size=100)
 
-fig, ax = plt.subplots(figsize=(8, 4))
+fig = plt.figure(figsize=(9, 4), layout="constrained")
+axs = fig.subplots(1, 2, sharex=True, sharey=True)
 
-# plot the cumulative histogram
-n, bins, patches = ax.hist(x, n_bins, density=True, histtype='step',
-                           cumulative=True, label='Empirical')
-
-# Add a line showing the expected distribution.
+# Cumulative distributions.
+axs[0].ecdf(data, label="CDF")
+n, bins, patches = axs[0].hist(data, n_bins, density=True, histtype="step",
+                               cumulative=True, label="Cumulative histogram")
+x = np.linspace(data.min(), data.max())
 y = ((1 / (np.sqrt(2 * np.pi) * sigma)) *
-     np.exp(-0.5 * (1 / sigma * (bins - mu))**2))
+     np.exp(-0.5 * (1 / sigma * (x - mu))**2))
 y = y.cumsum()
 y /= y[-1]
-
-ax.plot(bins, y, 'k--', linewidth=1.5, label='Theoretical')
-
-# Overlay a reversed cumulative histogram.
-ax.hist(x, bins=bins, density=True, histtype='step', cumulative=-1,
-        label='Reversed emp.')
-
-# tidy up the figure
-ax.grid(True)
-ax.legend(loc='right')
-ax.set_title('Cumulative step histograms')
-ax.set_xlabel('Annual rainfall (mm)')
-ax.set_ylabel('Likelihood of occurrence')
+axs[0].plot(x, y, "k--", linewidth=1.5, label="Theory")
+
+# Complementary cumulative distributions.
+axs[1].ecdf(data, complementary=True, label="CCDF")
+axs[1].hist(data, bins=bins, density=True, histtype="step", cumulative=-1,
+            label="Reversed cumulative histogram")
+axs[1].plot(x, 1 - y, "k--", linewidth=1.5, label="Theory")
+
+# Label the figure.
+fig.suptitle("Cumulative distributions")
+for ax in axs:
+    ax.grid(True)
+    ax.legend()
+    ax.set_xlabel("Annual rainfall (mm)")
+    ax.set_ylabel("Probability of occurrence")
+    ax.label_outer()
 
 plt.show()
 
@@ -78,3 +73,4 @@
 # in this example:
 #
 # - `matplotlib.axes.Axes.hist` / `matplotlib.pyplot.hist`
+# - `matplotlib.axes.Axes.ecdf` / `matplotlib.pyplot.ecdf`
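
The new example docstring says that a cumulative histogram only approximates the E(C)CDF, while `Axes.ecdf` is exact. The following is a minimal sketch of why, written in plain Python rather than Matplotlib. The helper names `ecdf_at` and `cumulative_hist_at`, the toy sample, and the bin edges are assumptions made for illustration and are not part of the commit.

```python
from bisect import bisect_left, bisect_right


def ecdf_at(sorted_data, t):
    """Exact ECDF: fraction of observations that are <= t."""
    return bisect_right(sorted_data, t) / len(sorted_data)


def cumulative_hist_at(sorted_data, edges, t):
    """Height of a normalized cumulative step histogram at t.

    Hypothetical helper. Bins are half-open [left, right) except the last,
    which is closed; all data and t are assumed to lie within the edges.
    Every t inside a bin reports the fraction accumulated over the whole bin.
    """
    last_bin = len(edges) - 2
    bin_index = min(bisect_right(edges, t) - 1, last_bin)
    if bin_index == last_bin:
        counted = bisect_right(sorted_data, edges[-1])
    else:
        counted = bisect_left(sorted_data, edges[bin_index + 1])
    return counted / len(sorted_data)


sample = sorted([1, 2, 2, 3, 7])   # toy data, not from the commit
edges = [0, 4, 8]                  # two coarse bins

exact = ecdf_at(sample, 1.5)                       # only the value 1 is <= 1.5
binned = cumulative_hist_at(sample, edges, 1.5)    # whole first bin is counted
print(exact, binned)
```

With these two coarse bins, the binned curve already reports the full first-bin fraction at 1.5, while the exact ECDF has only stepped once. The two agree again near the right edge of the bin, which is why finer bins reduce the discrepancy without removing it.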

galleries/plot_types/stats/ecdf.py

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+"""
+=======
+ecdf(x)
+=======
+
+See `~matplotlib.axes.Axes.ecdf`.
+"""
+
+import matplotlib.pyplot as plt
+import numpy as np
+
+plt.style.use('_mpl-gallery')
+
+# make data
+np.random.seed(1)
+x = 4 + np.random.normal(0, 1.5, 200)
+
+# plot:
+fig, ax = plt.subplots()
+ax.ecdf(x)
+plt.show()

lib/matplotlib/axes/_axes.py

Lines changed: 102 additions & 0 deletions
@@ -7112,6 +7112,108 @@ def hist2d(self, x, y, bins=10, range=None, density=False, weights=None,
 
         return h, xedges, yedges, pc
 
+    @_preprocess_data(replace_names=["x", "weights"], label_namer="x")
+    @_docstring.dedent_interpd
+    def ecdf(self, x, weights=None, *, complementary=False,
+             orientation="vertical", compress=False, **kwargs):
+        """
+        Compute and plot the empirical cumulative distribution function of *x*.
+
+        .. versionadded:: 3.8
+
+        Parameters
+        ----------
+        x : 1d array-like
+            The input data. Infinite entries are kept (and move the relevant
+            end of the ecdf from 0/1), but NaNs and masked values are errors.
+
+        weights : 1d array-like or None, default: None
+            The weights of the entries; must have the same shape as *x*.
+            Weights corresponding to NaN data points are dropped, and then the
+            remaining weights are normalized to sum to 1. If unset, all
+            entries have the same weight.
+
+        complementary : bool, default: False
+            Whether to plot a cumulative distribution function, which increases
+            from 0 to 1 (the default), or a complementary cumulative
+            distribution function, which decreases from 1 to 0.
+
+        orientation : {"vertical", "horizontal"}, default: "vertical"
+            Whether the entries are plotted along the x-axis ("vertical", the
+            default) or the y-axis ("horizontal"). This parameter takes the
+            same values as in `~.Axes.hist`.
+
+        compress : bool, default: False
+            Whether multiple entries with the same values are grouped together
+            (with a summed weight) before plotting. This is mainly useful if
+            *x* contains many identical data points, to decrease the rendering
+            complexity of the plot. If *x* contains no duplicate points, this
+            has no effect and just uses some time and memory.
+
+        Other Parameters
+        ----------------
+        data : indexable object, optional
+            DATA_PARAMETER_PLACEHOLDER
+
+        **kwargs
+            Keyword arguments control the `.Line2D` properties:
+
+            %(Line2D:kwdoc)s
+
+        Returns
+        -------
+        `.Line2D`
+
+        Notes
+        -----
+        The ecdf plot can be thought of as a cumulative histogram with one bin
+        per data entry; i.e. it reports on the entire dataset without any
+        arbitrary binning.
+
+        If *x* contains NaNs or masked entries, either remove them first from
+        the array (if they should not taken into account), or replace them by
+        -inf or +inf (if they should be sorted at the beginning or the end of
+        the array).
+        """
+        _api.check_in_list(["horizontal", "vertical"], orientation=orientation)
+        if "drawstyle" in kwargs or "ds" in kwargs:
+            raise TypeError("Cannot pass 'drawstyle' or 'ds' to ecdf()")
+        if np.ma.getmask(x).any():
+            raise ValueError("ecdf() does not support masked entries")
+        x = np.asarray(x)
+        if np.isnan(x).any():
+            raise ValueError("ecdf() does not support NaNs")
+        argsort = np.argsort(x)
+        x = x[argsort]
+        if weights is None:
+            # Ensure that we end at exactly 1, avoiding floating point errors.
+            cum_weights = (1 + np.arange(len(x))) / len(x)
+        else:
+            weights = np.take(weights, argsort)  # Reorder weights like we reordered x.
+            cum_weights = np.cumsum(weights / np.sum(weights))
+        if compress:
+            # Get indices of unique x values.
+            compress_idxs = [0, *(x[:-1] != x[1:]).nonzero()[0] + 1]
+            x = x[compress_idxs]
+            cum_weights = cum_weights[compress_idxs]
+        if orientation == "vertical":
+            if not complementary:
+                line, = self.plot([x[0], *x], [0, *cum_weights],
+                                  drawstyle="steps-post", **kwargs)
+            else:
+                line, = self.plot([*x, x[-1]], [1, *1 - cum_weights],
+                                  drawstyle="steps-pre", **kwargs)
+            line.sticky_edges.y[:] = [0, 1]
+        else:  # orientation == "horizontal":
+            if not complementary:
+                line, = self.plot([0, *cum_weights], [x[0], *x],
+                                  drawstyle="steps-pre", **kwargs)
+            else:
+                line, = self.plot([1, *1 - cum_weights], [*x, x[-1]],
+                                  drawstyle="steps-post", **kwargs)
+            line.sticky_edges.x[:] = [0, 1]
+        return line
+
     @_preprocess_data(replace_names=["x"])
     @_docstring.dedent_interpd
     def psd(self, x, NFFT=None, Fs=None, Fc=None, detrend=None,
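
The core of the new method is a sort followed by a running sum of normalized weights. The sketch below restates that idea in plain Python so it can be read without NumPy. The function name `ecdf_points` is hypothetical, the `compress` option is deliberately left out, and this is an illustration of the technique rather than the code Matplotlib runs.

```python
from itertools import accumulate


def ecdf_points(values, weights=None):
    """Return (sorted_values, cumulative_weights) for an ECDF.

    Illustrative stand-in for the NumPy logic in the diff: sort the data,
    reorder the weights the same way, then accumulate and normalize.
    """
    values = list(values)
    # NaN is the only float that is not equal to itself.
    if any(v != v for v in values):
        raise ValueError("NaNs are not supported")
    # Stable argsort: indices of the values in ascending order.
    order = sorted(range(len(values)), key=values.__getitem__)
    xs = [values[i] for i in order]
    if weights is None:
        # (i + 1) / n ends at exactly 1, with no accumulated rounding error.
        cum = [(i + 1) / len(xs) for i in range(len(xs))]
    else:
        ws = [weights[i] for i in order]
        total = sum(ws)
        cum = [c / total for c in accumulate(ws)]
    return xs, cum


xs, cum = ecdf_points([3, 1, 2])
wxs, wcum = ecdf_points([3, 1, 2], weights=[1, 1, 2])
print(xs, cum)
print(wxs, wcum)
```

In the weighted call, the value 2 carries half of the total weight, so the curve jumps by 0.5 at that point. The plotting step in the diff then only has to add one extra vertex and pick a step drawstyle.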

lib/matplotlib/pyplot.py

Lines changed: 11 additions & 0 deletions
@@ -2515,6 +2515,17 @@ def csd(
         **({"data": data} if data is not None else {}), **kwargs)
 
 
+# Autogenerated by boilerplate.py. Do not edit as changes will be lost.
+@_copy_docstring_and_deprecators(Axes.ecdf)
+def ecdf(
+        x, weights=None, *, complementary=False,
+        orientation='vertical', compress=False, data=None, **kwargs):
+    return gca().ecdf(
+        x, weights=weights, complementary=complementary,
+        orientation=orientation, compress=compress,
+        **({"data": data} if data is not None else {}), **kwargs)
+
+
 # Autogenerated by boilerplate.py. Do not edit as changes will be lost.
 @_copy_docstring_and_deprecators(Axes.errorbar)
 def errorbar(

lib/matplotlib/tests/test_axes.py

Lines changed: 33 additions & 0 deletions
@@ -8448,3 +8448,36 @@ def test_rc_axes_label_formatting():
     assert ax.xaxis.label.get_color() == 'red'
     assert ax.xaxis.label.get_fontsize() == 20
     assert ax.xaxis.label.get_fontweight() == 'bold'
+
+
+@check_figures_equal(extensions=["png"])
+def test_ecdf(fig_test, fig_ref):
+    data = np.array([0, -np.inf, -np.inf, np.inf, 1, 1, 2])
+    weights = range(len(data))
+    axs_test = fig_test.subplots(1, 2)
+    for ax, orientation in zip(axs_test, ["vertical", "horizontal"]):
+        l0 = ax.ecdf(data, orientation=orientation)
+        l1 = ax.ecdf("d", "w", data={"d": np.ma.array(data), "w": weights},
+                     orientation=orientation,
+                     complementary=True, compress=True, ls=":")
+        assert len(l0.get_xdata()) == (~np.isnan(data)).sum() + 1
+        assert len(l1.get_xdata()) == len({*data[~np.isnan(data)]}) + 1
+    axs_ref = fig_ref.subplots(1, 2)
+    axs_ref[0].plot([-np.inf, -np.inf, -np.inf, 0, 1, 1, 2, np.inf],
+                    np.arange(8) / 7, ds="steps-post")
+    axs_ref[0].plot([-np.inf, 0, 1, 2, np.inf, np.inf],
+                    np.array([21, 20, 18, 14, 3, 0]) / 21,
+                    ds="steps-pre", ls=":")
+    axs_ref[1].plot(np.arange(8) / 7,
+                    [-np.inf, -np.inf, -np.inf, 0, 1, 1, 2, np.inf],
+                    ds="steps-pre")
+    axs_ref[1].plot(np.array([21, 20, 18, 14, 3, 0]) / 21,
+                    [-np.inf, 0, 1, 2, np.inf, np.inf],
+                    ds="steps-post", ls=":")
+
+
+def test_ecdf_invalid():
+    with pytest.raises(ValueError):
+        plt.ecdf([1, np.nan])
+    with pytest.raises(ValueError):
+        plt.ecdf(np.ma.array([1, 2], mask=[True, False]))
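
The first reference curve in `test_ecdf` (the unweighted, non-complementary, vertical case) can be rebuilt by hand from the test data. The sketch below does so in plain Python, using `math.inf` in place of `np.inf`. The variable names `step_x` and `step_y` are chosen here for illustration and do not appear in the commit.

```python
import math

# Same values as the test's `data` array, as a plain list.
data = [0, -math.inf, -math.inf, math.inf, 1, 1, 2]

xs = sorted(data)
n = len(xs)

# Non-complementary vertical case from the diff: repeat the first x value
# and start the curve at height 0, then step up by 1/n per entry.
step_x = [xs[0], *xs]
step_y = [0, *((i + 1) / n for i in range(n))]

print(step_x)
print(step_y)
```

The extra leading vertex is why the test expects `len(l0.get_xdata())` to be the number of entries plus one. The heights match `np.arange(8) / 7` from the reference plot.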

tools/boilerplate.py

Lines changed: 1 addition & 0 deletions
@@ -246,6 +246,7 @@ def boilerplate_gen():
         'contour',
         'contourf',
         'csd',
+        'ecdf',
         'errorbar',
         'eventplot',
         'fill',
