Commit 2d232ac

NicolasHug authored and ogrisel committed
[MRG] Add Yeo-Johnson transform to PowerTransformer (#11520)
1 parent 1984dac commit 2d232ac

File tree

10 files changed: +558 -216 lines changed

doc/glossary.rst

Lines changed: 1 addition & 1 deletion
@@ -294,7 +294,7 @@ General Concepts
     convergence of the training loss, to avoid over-fitting. This is
     generally done by monitoring the generalization score on a validation
     set. When available, it is activated through the parameter
-    ``early_stopping`` or by setting a postive :term:`n_iter_no_change`.
+    ``early_stopping`` or by setting a positive :term:`n_iter_no_change`.

 estimator instance
     We sometimes use this terminology to distinguish an :term:`estimator`
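For readers unfamiliar with the glossary entry touched above, here is a minimal sketch (not part of this commit) of activating early stopping on an estimator that supports it; SGDClassifier and its early_stopping, validation_fraction and n_iter_no_change parameters are used purely for illustration:

# Illustrative sketch (not part of this commit): early stopping is enabled
# through ``early_stopping`` together with a positive ``n_iter_no_change``.
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=2000, random_state=0)
clf = SGDClassifier(
    early_stopping=True,       # monitor score on a held-out validation set
    validation_fraction=0.1,   # fraction of training data used for validation
    n_iter_no_change=5,        # stop after 5 epochs without improvement
    max_iter=1000,
    tol=1e-3,
    random_state=0,
)
clf.fit(X, y)
print(clf.n_iter_)  # epochs actually run before stopping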

doc/modules/preprocessing.rst

Lines changed: 30 additions & 15 deletions
@@ -309,20 +309,34 @@ Power transforms are a family of parametric, monotonic transformations that aim
 to map data from any distribution to as close to a Gaussian distribution as
 possible in order to stabilize variance and minimize skewness.

-:class:`PowerTransformer` currently provides one such power transformation,
-the Box-Cox transform. The Box-Cox transform is given by:
+:class:`PowerTransformer` currently provides two such power transformations,
+the Yeo-Johnson transform and the Box-Cox transform.
+
+The Yeo-Johnson transform is given by:

 .. math::
-    y_i^{(\lambda)} =
+    x_i^{(\lambda)} =
     \begin{cases}
-    \dfrac{y_i^\lambda - 1}{\lambda} & \text{if } \lambda \neq 0, \\[8pt]
-    \ln{(y_i)} & \text{if } \lambda = 0,
+    [(x_i + 1)^\lambda - 1] / \lambda & \text{if } \lambda \neq 0, x_i \geq 0, \\[8pt]
+    \ln{(x_i + 1)} & \text{if } \lambda = 0, x_i \geq 0, \\[8pt]
+    -[(-x_i + 1)^{2 - \lambda} - 1] / (2 - \lambda) & \text{if } \lambda \neq 2, x_i < 0, \\[8pt]
+    -\ln{(-x_i + 1)} & \text{if } \lambda = 2, x_i < 0
     \end{cases}

-Box-Cox can only be applied to strictly positive data. The transformation is
-parameterized by :math:`\lambda`, which is determined through maximum likelihood
-estimation. Here is an example of using Box-Cox to map samples drawn from a
-lognormal distribution to a normal distribution::
+while the Box-Cox transform is given by:
+
+.. math::
+    x_i^{(\lambda)} =
+    \begin{cases}
+    \dfrac{x_i^\lambda - 1}{\lambda} & \text{if } \lambda \neq 0, \\[8pt]
+    \ln{(x_i)} & \text{if } \lambda = 0,
+    \end{cases}
+
+Box-Cox can only be applied to strictly positive data. In both methods, the
+transformation is parameterized by :math:`\lambda`, which is determined through
+maximum likelihood estimation. Here is an example of using Box-Cox to map
+samples drawn from a lognormal distribution to a normal distribution::

     >>> pt = preprocessing.PowerTransformer(method='box-cox', standardize=False)
     >>> X_lognormal = np.random.RandomState(616).lognormal(size=(3, 3))
@@ -339,13 +353,14 @@ While the above example sets the `standardize` option to `False`,
 :class:`PowerTransformer` will apply zero-mean, unit-variance normalization
 to the transformed output by default.

-Below are examples of Box-Cox applied to various probability distributions.
-Note that when applied to certain distributions, Box-Cox achieves very
-Gaussian-like results, but with others, it is ineffective. This highlights
-the importance of visualizing the data before and after transformation.
+Below are examples of Box-Cox and Yeo-Johnson applied to various probability
+distributions. Note that when applied to certain distributions, the power
+transforms achieve very Gaussian-like results, but with others, they are
+ineffective. This highlights the importance of visualizing the data before and
+after transformation.

-.. figure:: ../auto_examples/preprocessing/images/sphx_glr_plot_power_transformer_001.png
-   :target: ../auto_examples/preprocessing/plot_power_transformer.html
+.. figure:: ../auto_examples/preprocessing/images/sphx_glr_plot_map_data_to_normal_001.png
+   :target: ../auto_examples/preprocessing/plot_map_data_to_normal.html
    :align: center
    :scale: 100
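As a sanity check on the piecewise Yeo-Johnson definition added above, the following NumPy sketch (not part of the commit; the helper name yeo_johnson is ours) evaluates the formula directly for a fixed lambda:

import numpy as np

def yeo_johnson(x, lmbda):
    # Direct transcription of the piecewise formula above (illustration only,
    # not the code path used by PowerTransformer).
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    if lmbda != 0:
        out[pos] = ((x[pos] + 1) ** lmbda - 1) / lmbda
    else:
        out[pos] = np.log1p(x[pos])
    if lmbda != 2:
        out[~pos] = -((-x[~pos] + 1) ** (2 - lmbda) - 1) / (2 - lmbda)
    else:
        out[~pos] = -np.log1p(-x[~pos])
    return out

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(yeo_johnson(x, lmbda=1.0))  # lambda = 1 is the identity transform
print(yeo_johnson(x, lmbda=0.5))  # defined for negative values, unlike Box-Cox

With lambda = 1 the transform reduces to the identity, and unlike Box-Cox it is defined for zero and negative inputs.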

doc/whats_new/v0.20.rst

Lines changed: 8 additions & 5 deletions
@@ -136,12 +136,15 @@ Preprocessing
   DataFrames. :issue:`9012` by `Andreas Müller`_ and `Joris Van den Bossche`_,
   and :issue:`11315` by :user:`Thomas Fan <thomasjpfan>`.

-- Added :class:`preprocessing.PowerTransformer`, which implements the Box-Cox
-  power transformation, allowing users to map data from any distribution to a
-  Gaussian distribution. This is useful as a variance-stabilizing transformation
-  in situations where normality and homoscedasticity are desirable.
+- Added :class:`preprocessing.PowerTransformer`, which implements the
+  Yeo-Johnson and Box-Cox power transformations. Power transformations try to
+  find a set of feature-wise parametric transformations to approximately map
+  data to a Gaussian distribution centered at zero and with unit variance.
+  This is useful as a variance-stabilizing transformation in situations where
+  normality and homoscedasticity are desirable.
   :issue:`10210` by :user:`Eric Chang <ericchang00>` and
-  :user:`Maniteja Nandana <maniteja123>`.
+  :user:`Maniteja Nandana <maniteja123>`, and :issue:`11520` by :user:`Nicolas
+  Hug <nicolashug>`.

 - Added the :class:`compose.TransformedTargetRegressor` which transforms
   the target y before fitting a regression model. The predictions are mapped
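A minimal usage sketch (not part of the changelog entry) illustrating the default standardize=True behaviour described above, i.e. the transformed output is approximately centered at zero with unit variance:

import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.RandomState(0)
X = rng.exponential(size=(1000, 2))          # skewed, strictly positive data

pt = PowerTransformer(method='yeo-johnson')  # standardize=True by default
X_trans = pt.fit_transform(X)

print(X_trans.mean(axis=0))  # close to 0 for each feature
print(X_trans.std(axis=0))   # close to 1 for each feature
print(pt.lambdas_)           # one fitted lambda per feature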

examples/preprocessing/plot_all_scaling.py

Lines changed: 17 additions & 15 deletions
@@ -87,6 +87,8 @@
             MaxAbsScaler().fit_transform(X)),
     ('Data after robust scaling',
             RobustScaler(quantile_range=(25, 75)).fit_transform(X)),
+    ('Data after power transformation (Yeo-Johnson)',
+            PowerTransformer(method='yeo-johnson').fit_transform(X)),
     ('Data after power transformation (Box-Cox)',
            PowerTransformer(method='box-cox').fit_transform(X)),
     ('Data after quantile transformation (gaussian pdf)',
@@ -294,21 +296,21 @@ def make_plot(item_idx):
 make_plot(4)

 ##############################################################################
-# PowerTransformer (Box-Cox)
-# --------------------------
+# PowerTransformer
+# ----------------
 #
-# ``PowerTransformer`` applies a power transformation to each
-# feature to make the data more Gaussian-like. Currently,
-# ``PowerTransformer`` implements the Box-Cox transform. The Box-Cox transform
-# finds the optimal scaling factor to stabilize variance and mimimize skewness
-# through maximum likelihood estimation. By default, ``PowerTransformer`` also
-# applies zero-mean, unit variance normalization to the transformed output.
-# Note that Box-Cox can only be applied to positive, non-zero data. Income and
-# number of households happen to be strictly positive, but if negative values
-# are present, a constant can be added to each feature to shift it into the
-# positive range - this is known as the two-parameter Box-Cox transform.
+# ``PowerTransformer`` applies a power transformation to each feature to make
+# the data more Gaussian-like. Currently, ``PowerTransformer`` implements the
+# Yeo-Johnson and Box-Cox transforms. The power transform finds the optimal
+# scaling factor to stabilize variance and minimize skewness through maximum
+# likelihood estimation. By default, ``PowerTransformer`` also applies
+# zero-mean, unit variance normalization to the transformed output. Note that
+# Box-Cox can only be applied to strictly positive data. Income and number of
+# households happen to be strictly positive, but if negative values are present
+# the Yeo-Johnson transform is to be preferred.

 make_plot(5)
+make_plot(6)

 ##############################################################################
 # QuantileTransformer (Gaussian output)
@@ -319,7 +321,7 @@ def make_plot(item_idx):
 # Note that this non-parametric transformer introduces saturation artifacts
 # for extreme values.

-make_plot(6)
+make_plot(7)

 ###################################################################
 # QuantileTransformer (uniform output)
@@ -337,7 +339,7 @@ def make_plot(item_idx):
 # any outlier by setting them to the a priori defined range boundaries (0 and
 # 1).

-make_plot(7)
+make_plot(8)

 ##############################################################################
 # Normalizer
@@ -350,6 +352,6 @@ def make_plot(item_idx):
 # transformed data only lie in the positive quadrant. This would not be the
 # case if some original features had a mix of positive and negative values.

-make_plot(8)
+make_plot(9)

 plt.show()
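To make the remark about negative values concrete, here is a small sketch (not part of the example file): PowerTransformer rejects non-positive data with method='box-cox', while method='yeo-johnson' accepts it.

import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 1))  # contains negative values

print(PowerTransformer(method='yeo-johnson').fit_transform(X)[:3])

try:
    PowerTransformer(method='box-cox').fit_transform(X)
except ValueError as exc:
    # Box-Cox requires strictly positive input
    print('Box-Cox rejected the data:', exc)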
examples/preprocessing/plot_map_data_to_normal.py

Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
1+
"""
2+
=================================
3+
Map data to a normal distribution
4+
=================================
5+
6+
This example demonstrates the use of the Box-Cox and Yeo-Johnson transforms
7+
through :class:`preprocessing.PowerTransformer` to map data from various
8+
distributions to a normal distribution.
9+
10+
The power transform is useful as a transformation in modeling problems where
11+
homoscedasticity and normality are desired. Below are examples of Box-Cox and
12+
Yeo-Johnwon applied to six different probability distributions: Lognormal,
13+
Chi-squared, Weibull, Gaussian, Uniform, and Bimodal.
14+
15+
Note that the transformations successfully map the data to a normal
16+
distribution when applied to certain datasets, but are ineffective with others.
17+
This highlights the importance of visualizing the data before and after
18+
transformation.
19+
20+
Also note that even though Box-Cox seems to perform better than Yeo-Johnson for
21+
lognormal and chi-squared distributions, keep in mind that Box-Cox does not
22+
support inputs with negative values.
23+
24+
For comparison, we also add the output from
25+
:class:`preprocessing.QuantileTransformer`. It can force any arbitrary
26+
distribution into a gaussian, provided that there are enough training samples
27+
(thousands). Because it is a non-parametric method, it is harder to interpret
28+
than the parametric ones (Box-Cox and Yeo-Johnson).
29+
30+
On "small" datasets (less than a few hundred points), the quantile transformer
31+
is prone to overfitting. The use of the power transform is then recommended.
32+
"""
33+
34+
# Author: Eric Chang <[email protected]>
35+
# Nicolas Hug <[email protected]>
36+
# License: BSD 3 clause
37+
38+
import numpy as np
39+
import matplotlib.pyplot as plt
40+
41+
from sklearn.preprocessing import PowerTransformer
42+
from sklearn.preprocessing import QuantileTransformer
43+
from sklearn.model_selection import train_test_split
44+
45+
print(__doc__)
46+
47+
48+
N_SAMPLES = 1000
49+
FONT_SIZE = 6
50+
BINS = 30
51+
52+
53+
rng = np.random.RandomState(304)
54+
bc = PowerTransformer(method='box-cox')
55+
yj = PowerTransformer(method='yeo-johnson')
56+
qt = QuantileTransformer(output_distribution='normal', random_state=rng)
57+
size = (N_SAMPLES, 1)
58+
59+
60+
# lognormal distribution
61+
X_lognormal = rng.lognormal(size=size)
62+
63+
# chi-squared distribution
64+
df = 3
65+
X_chisq = rng.chisquare(df=df, size=size)
66+
67+
# weibull distribution
68+
a = 50
69+
X_weibull = rng.weibull(a=a, size=size)
70+
71+
# gaussian distribution
72+
loc = 100
73+
X_gaussian = rng.normal(loc=loc, size=size)
74+
75+
# uniform distribution
76+
X_uniform = rng.uniform(low=0, high=1, size=size)
77+
78+
# bimodal distribution
79+
loc_a, loc_b = 100, 105
80+
X_a, X_b = rng.normal(loc=loc_a, size=size), rng.normal(loc=loc_b, size=size)
81+
X_bimodal = np.concatenate([X_a, X_b], axis=0)
82+
83+
84+
# create plots
85+
distributions = [
86+
('Lognormal', X_lognormal),
87+
('Chi-squared', X_chisq),
88+
('Weibull', X_weibull),
89+
('Gaussian', X_gaussian),
90+
('Uniform', X_uniform),
91+
('Bimodal', X_bimodal)
92+
]
93+
94+
colors = ['firebrick', 'darkorange', 'goldenrod',
95+
'seagreen', 'royalblue', 'darkorchid']
96+
97+
fig, axes = plt.subplots(nrows=8, ncols=3, figsize=plt.figaspect(2))
98+
axes = axes.flatten()
99+
axes_idxs = [(0, 3, 6, 9), (1, 4, 7, 10), (2, 5, 8, 11), (12, 15, 18, 21),
100+
(13, 16, 19, 22), (14, 17, 20, 23)]
101+
axes_list = [(axes[i], axes[j], axes[k], axes[l])
102+
for (i, j, k, l) in axes_idxs]
103+
104+
105+
for distribution, color, axes in zip(distributions, colors, axes_list):
106+
name, X = distribution
107+
X_train, X_test = train_test_split(X, test_size=.5)
108+
109+
# perform power transforms and quantile transform
110+
X_trans_bc = bc.fit(X_train).transform(X_test)
111+
lmbda_bc = round(bc.lambdas_[0], 2)
112+
X_trans_yj = yj.fit(X_train).transform(X_test)
113+
lmbda_yj = round(yj.lambdas_[0], 2)
114+
X_trans_qt = qt.fit(X_train).transform(X_test)
115+
116+
ax_original, ax_bc, ax_yj, ax_qt = axes
117+
118+
ax_original.hist(X_train, color=color, bins=BINS)
119+
ax_original.set_title(name, fontsize=FONT_SIZE)
120+
ax_original.tick_params(axis='both', which='major', labelsize=FONT_SIZE)
121+
122+
for ax, X_trans, meth_name, lmbda in zip(
123+
(ax_bc, ax_yj, ax_qt),
124+
(X_trans_bc, X_trans_yj, X_trans_qt),
125+
('Box-Cox', 'Yeo-Johnson', 'Quantile transform'),
126+
(lmbda_bc, lmbda_yj, None)):
127+
ax.hist(X_trans, color=color, bins=BINS)
128+
title = 'After {}'.format(meth_name)
129+
if lmbda is not None:
130+
title += '\n$\lambda$ = {}'.format(lmbda)
131+
ax.set_title(title, fontsize=FONT_SIZE)
132+
ax.tick_params(axis='both', which='major', labelsize=FONT_SIZE)
133+
ax.set_xlim([-3.5, 3.5])
134+
135+
136+
plt.tight_layout()
137+
plt.show()
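The docstring above notes that QuantileTransformer is non-parametric and prone to overfitting on small datasets. The following sketch (not part of the example) contrasts how much each method has to learn, under the assumption that n_quantiles is set to the number of training samples:

import numpy as np
from sklearn.preprocessing import PowerTransformer, QuantileTransformer

rng = np.random.RandomState(0)
X_small = rng.lognormal(size=(50, 1))  # only 50 training points

qt = QuantileTransformer(output_distribution='normal', n_quantiles=50,
                         random_state=0)
pt = PowerTransformer(method='yeo-johnson')

print(qt.fit(X_small).quantiles_.shape)  # one learned quantile per sample
print(pt.fit(X_small).lambdas_)          # a single fitted parameter per feature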
