Commit 8cf71a3

Add Axes.ecdf() method.
1 parent e9d1f9c commit 8cf71a3

9 files changed, +218 -55 lines changed

doc/api/axes_api.rst

Lines changed: 2 additions & 1 deletion
@@ -110,11 +110,12 @@ Statistics
    :template: autosummary.rst
    :nosignatures:

+   Axes.ecdf
    Axes.boxplot
    Axes.violinplot

-   Axes.violin
    Axes.bxp
+   Axes.violin

 Binned
 ------

doc/api/pyplot_summary.rst

Lines changed: 1 addition & 0 deletions
@@ -114,6 +114,7 @@ Statistics
    :template: autosummary.rst
    :nosignatures:

+   ecdf
    boxplot
    violinplot

doc/users/next_whats_new/ecdf.rst

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+``Axes.ecdf``
+~~~~~~~~~~~~~
+A new Axes method, `~.Axes.ecdf`, allows plotting empirical cumulative
+distribution functions without any binning.
+
+.. plot::
+
+    import matplotlib.pyplot as plt
+    import numpy as np
+
+    fig, ax = plt.subplots()
+    ax.ecdf(np.random.randn(100))
Lines changed: 50 additions & 54 deletions
@@ -1,36 +1,27 @@
 """
-==================================================
-Using histograms to plot a cumulative distribution
-==================================================
-
-This shows how to plot a cumulative, normalized histogram as a
-step function in order to visualize the empirical cumulative
-distribution function (CDF) of a sample. We also show the theoretical CDF.
-
-A couple of other options to the ``hist`` function are demonstrated. Namely, we
-use the *normed* parameter to normalize the histogram and a couple of different
-options to the *cumulative* parameter. The *normed* parameter takes a boolean
-value. When ``True``, the bin heights are scaled such that the total area of
-the histogram is 1. The *cumulative* keyword argument is a little more nuanced.
-Like *normed*, you can pass it True or False, but you can also pass it -1 to
-reverse the distribution.
-
-Since we're showing a normalized and cumulative histogram, these curves
-are effectively the cumulative distribution functions (CDFs) of the
-samples. In engineering, empirical CDFs are sometimes called
-"non-exceedance" curves. In other words, you can look at the
-y-value for a given-x-value to get the probability of and observation
-from the sample not exceeding that x-value. For example, the value of
-225 on the x-axis corresponds to about 0.85 on the y-axis, so there's an
-85% chance that an observation in the sample does not exceed 225.
-Conversely, setting, ``cumulative`` to -1 as is done in the
-last series for this example, creates an "exceedance" curve.
-
-Selecting different bin counts and sizes can significantly affect the
-shape of a histogram. The Astropy docs have a great section on how to
-select these parameters:
-http://docs.astropy.org/en/stable/visualization/histogram.html
-
+=================================
+Plotting cumulative distributions
+=================================
+
+This example shows how to plot the empirical cumulative distribution function
+(ECDF) of a sample. We also show the theoretical CDF.
+
+In engineering, ECDFs are sometimes called "non-exceedance" curves: the y-value
+for a given x-value gives probability that an observation from the sample is
+below that x-value. For example, the value of 220 on the x-axis corresponds to
+about 0.80 on the y-axis, so there is an 80% chance that an observation in the
+sample does not exceed 220. Conversely, the empirical *complementary*
+cumulative distribution function (the ECCDF, or "exceedance" curve) shows the
+probability y that an observation from the sample is above a value x.
+
+A direct method to plot ECDFs is `.Axes.ecdf`. Passing ``complementary=True``
+results in an ECCDF instead.
+
+Alternatively, one can use ``ax.hist(data, density=True, cumulative=True)`` to
+first bin the data, as if plotting a histogram, and then compute and plot the
+cumulative sums of the frequencies of entries in each bin. Here, to plot the
+ECCDF, pass ``cumulative=-1``. Note that this approach results in an
+approximation of the E(C)CDF, whereas `.Axes.ecdf` is exact.
 """

 import numpy as np
@@ -40,33 +31,37 @@

 mu = 200
 sigma = 25
-n_bins = 50
-x = np.random.normal(mu, sigma, size=100)
+n_bins = 25
+data = np.random.normal(mu, sigma, size=100)

-fig, ax = plt.subplots(figsize=(8, 4))
+fig = plt.figure(figsize=(9, 4), layout="constrained")
+axs = fig.subplots(1, 2, sharex=True, sharey=True)

-# plot the cumulative histogram
-n, bins, patches = ax.hist(x, n_bins, density=True, histtype='step',
-                           cumulative=True, label='Empirical')
-
-# Add a line showing the expected distribution.
+# Cumulative distributions.
+axs[0].ecdf(data, label="CDF")
+n, bins, patches = axs[0].hist(data, n_bins, density=True, histtype="step",
+                               cumulative=True, label="Cumulative histogram")
+x = np.linspace(data.min(), data.max())
 y = ((1 / (np.sqrt(2 * np.pi) * sigma)) *
-     np.exp(-0.5 * (1 / sigma * (bins - mu))**2))
+     np.exp(-0.5 * (1 / sigma * (x - mu))**2))
 y = y.cumsum()
 y /= y[-1]
-
-ax.plot(bins, y, 'k--', linewidth=1.5, label='Theoretical')
-
-# Overlay a reversed cumulative histogram.
-ax.hist(x, bins=bins, density=True, histtype='step', cumulative=-1,
-        label='Reversed emp.')
-
-# tidy up the figure
-ax.grid(True)
-ax.legend(loc='right')
-ax.set_title('Cumulative step histograms')
-ax.set_xlabel('Annual rainfall (mm)')
-ax.set_ylabel('Likelihood of occurrence')
+axs[0].plot(x, y, "k--", linewidth=1.5, label="Theory")
+
+# Complementary cumulative distributions.
+axs[1].ecdf(data, complementary=True, label="CCDF")
+axs[1].hist(data, bins=bins, density=True, histtype="step", cumulative=-1,
+            label="Reversed cumulative histogram")
+axs[1].plot(x, 1 - y, "k--", linewidth=1.5, label="Theory")
+
+# Label the figure.
+fig.suptitle("Cumulative distributions")
+for ax in axs:
+    ax.grid(True)
+    ax.legend()
+    ax.set_xlabel("Annual rainfall (mm)")
+    ax.set_ylabel("Probability of occurrence")
+    ax.label_outer()

 plt.show()

@@ -78,3 +73,4 @@
 # in this example:
 #
 # - `matplotlib.axes.Axes.hist` / `matplotlib.pyplot.hist`
+# - `matplotlib.axes.Axes.ecdf` / `matplotlib.pyplot.ecdf`
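
Note on the "Theory" curve: the example builds it by taking a cumulative sum of the normal density on a grid and dividing by the last value, which is a numerical approximation of the CDF. As a rough cross-check that is not part of this commit, the sketch below compares that approach with the closed-form normal CDF written with math.erf. The fixed grid from 125 to 275 is an assumption made here for reproducibility; the example itself spans data.min() to data.max().

```python
import math

import numpy as np

mu, sigma = 200, 25
# Assumed fixed grid (mu +/- 3 sigma) with 50 points, the np.linspace default.
x = np.linspace(125, 275, 50)

# Approach used in the example: normalized cumulative sum of the density.
pdf = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
approx = pdf.cumsum()
approx /= approx[-1]

# Closed-form normal CDF: 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2)))).
exact = np.array(
    [0.5 * (1 + math.erf((v - mu) / (sigma * math.sqrt(2)))) for v in x]
)

# Largest pointwise difference between the two curves on this grid.
max_gap = np.abs(approx - exact).max()
print(f"max |approx - exact| = {max_gap:.4f}")
```

On a coarse grid the two curves differ by a small but nonzero amount, mostly because the cumulative sum counts each grid point's full contribution and forces the last value to exactly 1.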

lib/matplotlib/axes/_axes.py

Lines changed: 94 additions & 0 deletions
@@ -7146,6 +7146,100 @@ def hist2d(self, x, y, bins=10, range=None, density=False, weights=None,

         return h, xedges, yedges, pc

+    @_preprocess_data(replace_names=["x", "weights"], label_namer="x")
+    @_docstring.dedent_interpd
+    def ecdf(self, x, weights=None, *, complementary=False,
+             orientation="vertical", compress=False, **kwargs):
+        """
+        Compute and plot the empirical cumulative distribution function of *x*.
+
+        .. versionadded:: 3.7
+
+        Parameters
+        ----------
+        x : 1d array-like
+            The input data. NaN entries are dropped, but infinite entries are
+            kept (and move the relevant end of the ecdf from 0/1).
+
+        weights : 1d array-like or None, default: None
+            The weights of the entries; must have the same shape as *x*.
+            Weights corresponding to NaN data points are dropped, and then the
+            remaining weights are normalized to sum to 1. If unset, all
+            entries have the same weight.
+
+        complementary : bool, default: False
+            Whether to plot a cumulative distribution function, which increases
+            from 0 to 1 (the default), or a complementary cumulative
+            distribution function, which decreases from 1 to 0.
+
+        orientation : {"vertical", "horizontal"}, default: "vertical"
+            Whether the entries are plotted along the x-axis ("vertical", the
+            default) or the y-axis ("horizontal"). This parameter takes the
+            same values as in `~.Axes.hist`.
+
+        compress : bool, default: False
+            Whether multiple entries with the same values are grouped together
+            (with a summed weight) before plotting. This is mainly useful if
+            *x* contains many identical data points, to decrease the rendering
+            complexity of the plot. If *x* contains no duplicate points, this
+            has no effect and just uses some time and memory.
+
+        Other Parameters
+        ----------------
+        data : indexable object, optional
+            DATA_PARAMETER_PLACEHOLDER
+
+        **kwargs
+            Keyword arguments control the `.Line2D` properties:
+
+            %(Line2D:kwdoc)s
+
+        Returns
+        -------
+        Line2D
+
+        Notes
+        -----
+        The ecdf plot can be thought of as a cumulative histogram with one bin
+        per data entry; i.e. it reports on the entire dataset without any
+        arbitrary binning.
+        """
+        _api.check_in_list(["horizontal", "vertical"], orientation=orientation)
+        if "drawstyle" in kwargs or "ds" in kwargs:
+            raise TypeError("Cannot pass 'drawstyle' or 'ds' to ecdf()")
+        x = np.asarray(x)
+        # Indices that sort x and drop nans.
+        sort_nonan_idxs = np.argsort(x)[:len(x) - np.isnan(x).sum()]
+        x = x[sort_nonan_idxs]
+        if weights is None:
+            # Ensure that we end at exactly 1, avoiding floating point errors.
+            cweights = (1 + np.arange(len(x))) / len(x)
+        else:
+            weights = np.take(weights, sort_nonan_idxs)
+            cweights = np.cumsum(weights / np.sum(weights))
+        if compress:
+            compress_idxs = [0, *(x[:-1] != x[1:]).nonzero()[0] + 1]
+            x = x[compress_idxs]
+            cweights = cweights[compress_idxs]
+        if orientation == "vertical":
+            if not complementary:
+                line, = self.plot([x[0], *x], [0, *cweights],
+                                  drawstyle="steps-post", **kwargs)
+            else:
+                line, = self.plot([*x, x[-1]], [1, *1 - cweights],
+                                  drawstyle="steps-pre", **kwargs)
+            line.sticky_edges.y[:] = [0, 1]
+        else:  # orientation == "horizontal":
+            if not complementary:
+                line, = self.plot([0, *cweights], [x[0], *x],
+                                  drawstyle="steps-pre", **kwargs)
+            else:
+                line, = self.plot([1, *1 - cweights], [*x, x[-1]],
+                                  drawstyle="steps-post", **kwargs)
+            print(line.get_data())
+            line.sticky_edges.x[:] = [0, 1]
+        return line
+
     @_preprocess_data(replace_names=["x"])
     @_docstring.dedent_interpd
     def psd(self, x, NFFT=None, Fs=None, Fc=None, detrend=None,
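
Note on the data preparation in ecdf(): to follow the sort, NaN-drop, and cumulative-weight steps without the plotting calls, a minimal NumPy sketch may help. The name ecdf_points is a hypothetical helper invented here and is not part of Matplotlib. Passing kind="stable" to np.argsort is also an assumption of this sketch (the diff uses the default sort), added so that tied values keep their input order.

```python
import numpy as np


def ecdf_points(x, weights=None):
    """Return sorted non-NaN values and their cumulative weights.

    Hypothetical helper mirroring the preparation steps in the diff.
    """
    x = np.asarray(x, dtype=float)
    # argsort places NaNs last, so slicing off the NaN count drops them.
    order = np.argsort(x, kind="stable")[:len(x) - np.isnan(x).sum()]
    x = x[order]
    if weights is None:
        # Equal weights: k / n for the k-th sorted value, ending at exactly 1.
        cum = (1 + np.arange(len(x))) / len(x)
    else:
        # Reorder the weights like x, normalize to sum to 1, then accumulate.
        w = np.take(np.asarray(weights, dtype=float), order)
        cum = np.cumsum(w / w.sum())
    return x, cum


# Unweighted sample with one NaN entry.
values, cum = ecdf_points([3.0, 1.0, np.nan, 2.0, 2.0])

# Weighted sample: the value 2.0 carries half of the total weight.
wvalues, wcum = ecdf_points([3.0, 1.0, 2.0], weights=[1, 1, 2])
```

The returned arrays correspond to the x and cweights variables in the method body before the optional compress step and before the extra starting point is prepended for plotting.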

lib/matplotlib/pyplot.py

Lines changed: 11 additions & 0 deletions
@@ -2527,6 +2527,17 @@ def csd(
         **({"data": data} if data is not None else {}), **kwargs)


+# Autogenerated by boilerplate.py. Do not edit as changes will be lost.
+@_copy_docstring_and_deprecators(Axes.ecdf)
+def ecdf(
+        x, weights=None, *, complementary=False,
+        orientation='vertical', compress=False, data=None, **kwargs):
+    return gca().ecdf(
+        x, weights=weights, complementary=complementary,
+        orientation=orientation, compress=compress,
+        **({"data": data} if data is not None else {}), **kwargs)
+
+
 # Autogenerated by boilerplate.py. Do not edit as changes will be lost.
 @_copy_docstring_and_deprecators(Axes.errorbar)
 def errorbar(

lib/matplotlib/tests/test_axes.py

Lines changed: 26 additions & 0 deletions
@@ -8457,3 +8457,29 @@ def test_zorder_and_explicit_rasterization():
     ln, = ax.plot(range(5), rasterized=True, zorder=1)
     with io.BytesIO() as b:
         fig.savefig(b, format='pdf')
+
+
+@check_figures_equal(extensions=["png"])
+def test_ecdf(fig_test, fig_ref):
+    data = np.array([0, np.nan, -np.inf, -np.inf, np.inf, 1, 1, 2])
+    weights = range(len(data))
+    axs_test = fig_test.subplots(1, 2)
+    for ax, orientation in zip(axs_test, ["vertical", "horizontal"]):
+        l0 = ax.ecdf(data, orientation=orientation)
+        l1 = ax.ecdf("d", "w", data={"d": data, "w": weights},
+                     orientation=orientation,
+                     complementary=True, compress=True, ls=":")
+        assert len(l0.get_xdata()) == (~np.isnan(data)).sum() + 1
+        assert len(l1.get_xdata()) == len({*data[~np.isnan(data)]}) + 1
+    axs_ref = fig_ref.subplots(1, 2)
+    axs_ref[0].plot([-np.inf, -np.inf, -np.inf, 0, 1, 1, 2, np.inf],
+                    np.arange(8) / 7, ds="steps-post")
+    axs_ref[0].plot([-np.inf, 0, 1, 2, np.inf, np.inf],
+                    np.array([27, 25, 22, 17, 4, 0]) / 27,
+                    ds="steps-pre", ls=":")
+    axs_ref[1].plot(np.arange(8) / 7,
+                    [-np.inf, -np.inf, -np.inf, 0, 1, 1, 2, np.inf],
+                    ds="steps-pre")
+    axs_ref[1].plot(np.array([27, 25, 22, 17, 4, 0]) / 27,
+                    [-np.inf, 0, 1, 2, np.inf, np.inf],
+                    ds="steps-post", ls=":")
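
Note on the reference values in test_ecdf: the numerators 27, 25, 22, 17, 4, 0 can be reproduced by hand from the weighted, complementary, compressed case. The sketch below follows the same steps as the method body in this commit, using unnormalized integer weights so the arithmetic is easy to read. It is an illustration only; kind="stable" is an assumption added here so that the two -inf entries keep their input order.

```python
import numpy as np

data = np.array([0, np.nan, -np.inf, -np.inf, np.inf, 1, 1, 2])
weights = np.arange(len(data))  # same values as range(len(data)) in the test

# Sort and drop the NaN entry (argsort places NaN last).
# kind="stable" is an assumption so tied values keep their input order.
order = np.argsort(data, kind="stable")[:len(data) - np.isnan(data).sum()]
x = data[order]
w = weights[order]

# Unnormalized cumulative weights; the test divides by the total.
total = int(w.sum())
cum = np.cumsum(w)

# compress=True keeps the first entry of each run of equal values.
first = [0, *(x[:-1] != x[1:]).nonzero()[0] + 1]
x_kept = x[first]

# complementary=True plots 1 - cweights after a starting value of 1,
# and repeats the last x value at the end.
ys = [total, *(int(v) for v in total - cum[first])]
xs = [*x_kept, x_kept[-1]]

print(ys, "over", total)
print(xs)
```

The NaN entry carries weight 1, which is dropped, so the remaining weights sum to 27 rather than 28.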

plot_types/stats/ecdf.py

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+"""
+=======
+ecdf(x)
+=======
+
+See `~matplotlib.axes.Axes.ecdf`.
+"""
+
+import matplotlib.pyplot as plt
+import numpy as np
+
+plt.style.use('_mpl-gallery')
+
+# make data
+np.random.seed(1)
+x = 4 + np.random.normal(0, 1.5, 200)
+
+# plot:
+fig, ax = plt.subplots()
+ax.ecdf(x)
+plt.show()

tools/boilerplate.py

Lines changed: 1 addition & 0 deletions
@@ -246,6 +246,7 @@ def boilerplate_gen():
         'contour',
         'contourf',
         'csd',
+        'ecdf',
         'errorbar',
         'eventplot',
         'fill',
