
Commit 042d366
DOC: normalizing histograms
1 parent 3e98181


galleries/examples/statistics/histogram_normalization.py (21 additions, 10 deletions)

@@ -86,18 +86,30 @@
 
 fig, ax = plt.subplots()
 ax.hist(xdata, bins=xbins, density=True, **style)
-
+ax.set_ylabel('Probability (per dx)')
+ax.set_xlabel('x bins (dx=0.5)')
 
 # %%
 # This normalization can be a little hard to interpret when just exploring the
-# data. The value attached to each bar is divided by the total number of data
-# points _and_ the width of the bin, and the values _integrate_ to one when
-# integrating across the full range of data.
+# data. The value attached to each bar is divided by the total number of data
+# points _and_ the width of the bin, and thus the values _integrate_ to one
+# when integrating across the full range of data;
+# e.g. ``density = counts / (sum(counts) * np.diff(bins))``,
+# and ``np.sum(density * np.diff(bins)) == 1``.
+#
+# This normalization is how `probability density functions
+# <https://en.wikipedia.org/wiki/Probability_density_function>`_ are
+# defined in statistics. If :math:`X` is a random variable on :math:`x`, then
+# :math:`f_X` is the probability density function if :math:`P[a<X<b] =
+# \int_a^b f_X dx`. Note that if the units of x are Volts (for instance), then
+# the units of :math:`f_X` are :math:`V^{-1}`, or probability per change in
+# voltage.
 #
 # The usefulness of this normalization is a little more clear when we draw from
 # a known distribution and try to compare with theory. So, choose 1000 points
-# from a normal distribution, and also calculate the known probability density
-# function
+# from a `normal distribution
+# <https://en.wikipedia.org/wiki/Normal_distribution>`_, and also calculate the
+# known probability density function:
 
 xdata = rng.normal(size=1000)
 xpdf = np.arange(-4, 4, 0.1)
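The formula spelled out in the new comments above, ``density = counts / (sum(counts) * np.diff(bins))``, together with the check ``np.sum(density * np.diff(bins)) == 1``, can be verified directly with NumPy. Below is a minimal standalone sketch; the seed and the 0.5-wide bin grid are assumptions for illustration, not values taken from the example file:

    import numpy as np

    rng = np.random.default_rng(19680801)    # assumed seed, not from the example
    xdata = rng.normal(size=1000)
    xbins = np.arange(-4, 4, 0.5)            # assumed 0.5-wide bin edges

    # np.histogram(..., density=True) applies the same normalization as
    # Axes.hist(..., density=True)
    density, edges = np.histogram(xdata, bins=xbins, density=True)

    # rebuild the normalization by hand from the raw counts
    counts, _ = np.histogram(xdata, bins=xbins)
    manual = counts / (counts.sum() * np.diff(edges))

    print(np.allclose(density, manual))      # True
    print(np.sum(density * np.diff(edges)))  # 1.0: the bars integrate to one

Dividing by the bin widths is what gives the bars units of probability per unit x, which is why this commit relabels the y axes as 'Probability (per dx)'.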
@@ -118,10 +130,9 @@
 
 ax['True'].hist(xdata, bins=xbins, density=True, histtype='step')
 ax['True'].plot(xpdf, pdf)
-ax['True'].set_ylabel('Probability per x')
+ax['True'].set_ylabel('Probability (per dx)')
 ax['True'].set_xlabel('x bins (below -1.25 bins are wider)')
 
-
 # %%
 # Using *density* also makes it easier to compare histograms with different bin
 # widths. Note that in order to get the theoretical distribution, we must
@@ -143,7 +154,7 @@
 # Labels:
 ax['False'].set_xlabel('x bins')
 ax['False'].set_ylabel('Count per bin')
-ax['True'].set_ylabel('Probability per x')
+ax['True'].set_ylabel('Probability (per dx)')
 ax['True'].set_xlabel('x bins')
 ax['True'].legend(fontsize='small')
 
@@ -182,7 +193,7 @@
 
 ax['density'].hist(xdata, bins=xbins, histtype='step', density=True)
 ax['density'].hist(xdata2, bins=xbins, histtype='step', density=True)
-ax['density'].set_ylabel('Probabilty per x')
+ax['density'].set_ylabel('Probability (per dx)')
 ax['density'].set_title('Density=True')
 ax['density'].set_xlabel('x bins')
 

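A closing note on the probability-density wording added in the first hunk: with *density* normalization the bar heights estimate :math:`f_X`, so they can be compared directly against the analytic standard normal PDF, and :math:`P[a<X<b]` can be read off as an integral. A rough standalone sketch; the seed, bin grid, and interval (a, b) are assumptions for illustration only:

    import numpy as np

    rng = np.random.default_rng(19680801)      # assumed seed
    xdata = rng.normal(size=1000)
    xbins = np.arange(-4, 4, 0.5)               # assumed bin grid

    density, edges = np.histogram(xdata, bins=xbins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])

    # analytic standard normal PDF at the bin centers; same units as the
    # bars (probability per unit x)
    pdf = np.exp(-centers**2 / 2) / np.sqrt(2 * np.pi)
    print(np.max(np.abs(density - pdf)))        # small: sampling noise only

    # P[a < X < b] = integral of f_X over (a, b), estimated two ways
    a, b = -1.0, 1.0
    frac = np.mean((xdata > a) & (xdata < b))   # fraction of samples in (a, b)
    xs = np.linspace(a, b, 401)
    integral = np.trapz(np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi), xs)
    print(frac, integral)                       # both close to 0.68

The inverse-x units also explain the comment about Volts: if x is measured in V, then :math:`f_X` is measured in :math:`V^{-1}`.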