
[MRG] Add Yeo-Johnson transform to PowerTransformer #11520


Merged: 44 commits, Jul 20, 2018

Commits
06891eb
WIP - First draft on Yeo-Johnson transform
NicolasHug Jul 14, 2018
a88d168
Fixed lambda param optimization
NicolasHug Jul 14, 2018
ee09d7f
Some first tests
NicolasHug Jul 15, 2018
aea0842
Put helper method for yeo-johnson at the end
NicolasHug Jul 15, 2018
fba12eb
Added inverse transform + some tests
NicolasHug Jul 15, 2018
ed5a411
Added test for the optimization procedures
NicolasHug Jul 15, 2018
8bab32e
Created _box_cox_optimize method for better code symmetry
NicolasHug Jul 15, 2018
0525bab
Opt for yeo-johnson not influenced by Nan
NicolasHug Jul 15, 2018
8e187c4
Added doc
NicolasHug Jul 15, 2018
4173df3
Better test for nan in transform()
NicolasHug Jul 15, 2018
61e2183
Updated more docs and example
NicolasHug Jul 15, 2018
b1ac8d4
updated test
NicolasHug Jul 15, 2018
489bc70
Modified tests according to reviews
NicolasHug Jul 15, 2018
6783e3a
Changed default method from cox-box to yeo-johnson
NicolasHug Jul 15, 2018
dfd1ecc
Addressed most comments from @glemaitre, fixed flake8
NicolasHug Jul 16, 2018
78169f6
Removed box-cox specific checks in estimator_checks
NicolasHug Jul 16, 2018
f48a17b
More explicit variable names for mean and variance
NicolasHug Jul 16, 2018
67eaa98
Addressed comments from glemaitre
NicolasHug Jul 16, 2018
7a2bce7
Changed number of bins in plots to auto
NicolasHug Jul 16, 2018
2b56a9d
Fixed Nan issues (ignored warnings)
NicolasHug Jul 16, 2018
e928d26
Fixed docstring example issue
NicolasHug Jul 16, 2018
948ed2a
Merge branch 'master' into yeojohnson
NicolasHug Jul 16, 2018
5273212
Updated whatsnew
NicolasHug Jul 16, 2018
5efccdf
Addressed comments from glemaitre
NicolasHug Jul 16, 2018
0a10984
Merge branch 'master' into yeojohnson
NicolasHug Jul 16, 2018
0c543e3
Fixed comment issue
NicolasHug Jul 16, 2018
a0d86a0
Updated example
NicolasHug Jul 16, 2018
7b18937
fixed minor typos
NicolasHug Jul 16, 2018
800a2c2
Updated comment in whatsnew following ogrisel comments
NicolasHug Jul 16, 2018
53afa9f
Renamed plot example
NicolasHug Jul 16, 2018
a0d97ee
Should fix test for Python 2.7
NicolasHug Jul 17, 2018
23c3ddd
Should fix example plot
NicolasHug Jul 17, 2018
0c3b268
Addressed comments from TomDLT
NicolasHug Jul 17, 2018
7be0376
OPTIM implement fit_transform
ogrisel Jul 17, 2018
420476c
Added test fit_transform() == fit().transform()
NicolasHug Jul 17, 2018
1287f94
Added tests for the copy parameter
NicolasHug Jul 17, 2018
0ce4b36
Fixed flake8 issues in example plot
NicolasHug Jul 17, 2018
597a85d
set copy to False for the scaler
NicolasHug Jul 17, 2018
593c818
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
NicolasHug Jul 17, 2018
8022cc3
Addressed comments from glemaitre
NicolasHug Jul 17, 2018
0c120fb
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
NicolasHug Jul 18, 2018
8234a3e
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
NicolasHug Jul 19, 2018
7d529df
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
NicolasHug Jul 20, 2018
c0a01df
Updated plot_all_scaling.py example
NicolasHug Jul 20, 2018
doc/glossary.rst: 2 changes (1 addition, 1 deletion)
@@ -294,7 +294,7 @@ General Concepts
convergence of the training loss, to avoid over-fitting. This is
generally done by monitoring the generalization score on a validation
set. When available, it is activated through the parameter
``early_stopping`` or by setting a postive :term:`n_iter_no_change`.
``early_stopping`` or by setting a positive :term:`n_iter_no_change`.

estimator instance
We sometimes use this terminology to distinguish an :term:`estimator`
doc/modules/preprocessing.rst: 45 changes (30 additions, 15 deletions)
@@ -309,20 +309,34 @@ Power transforms are a family of parametric, monotonic transformations that aim
to map data from any distribution to as close to a Gaussian distribution as
possible in order to stabilize variance and minimize skewness.

:class:`PowerTransformer` currently provides one such power transformation,
the Box-Cox transform. The Box-Cox transform is given by:
:class:`PowerTransformer` currently provides two such power transformations,
the Yeo-Johnson transform and the Box-Cox transform.

The Yeo-Johnson transform is given by:

.. math::
y_i^{(\lambda)} =
x_i^{(\lambda)} =
\begin{cases}
\dfrac{y_i^\lambda - 1}{\lambda} & \text{if } \lambda \neq 0, \\[8pt]
\ln{(y_i)} & \text{if } \lambda = 0,
[(x_i + 1)^\lambda - 1] / \lambda & \text{if } \lambda \neq 0, x_i \geq 0, \\[8pt]
\ln{(x_i + 1)} & \text{if } \lambda = 0, x_i \geq 0 \\[8pt]
-[(-x_i + 1)^{2 - \lambda} - 1] / (2 - \lambda) & \text{if } \lambda \neq 2, x_i < 0, \\[8pt]
- \ln (- x_i + 1) & \text{if } \lambda = 2, x_i < 0
\end{cases}

Box-Cox can only be applied to strictly positive data. The transformation is
parameterized by :math:`\lambda`, which is determined through maximum likelihood
estimation. Here is an example of using Box-Cox to map samples drawn from a
lognormal distribution to a normal distribution::
while the Box-Cox transform is given by:

.. math::
x_i^{(\lambda)} =
\begin{cases}
\dfrac{x_i^\lambda - 1}{\lambda} & \text{if } \lambda \neq 0, \\[8pt]
\ln{(x_i)} & \text{if } \lambda = 0,
\end{cases}


Box-Cox can only be applied to strictly positive data. In both methods, the
transformation is parameterized by :math:`\lambda`, which is determined through
maximum likelihood estimation. Here is an example of using Box-Cox to map
samples drawn from a lognormal distribution to a normal distribution::

>>> pt = preprocessing.PowerTransformer(method='box-cox', standardize=False)
>>> X_lognormal = np.random.RandomState(616).lognormal(size=(3, 3))
@@ -339,13 +353,14 @@ While the above example sets the `standardize` option to `False`,
:class:`PowerTransformer` will apply zero-mean, unit-variance normalization
to the transformed output by default.
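
To make the piecewise definition above concrete, here is a minimal NumPy
sketch of the Yeo-Johnson mapping for a single feature and a fixed lambda (the
``yeo_johnson`` helper below is illustrative only, not part of the
scikit-learn API)::

    import numpy as np

    def yeo_johnson(x, lmbda):
        """Yeo-Johnson transform, applied elementwise for a given lambda."""
        x = np.asarray(x, dtype=float)
        out = np.empty_like(x)
        pos = x >= 0
        if lmbda != 0:  # first branch of the positive half
            out[pos] = ((x[pos] + 1) ** lmbda - 1) / lmbda
        else:           # lambda == 0
            out[pos] = np.log(x[pos] + 1)
        if lmbda != 2:  # first branch of the negative half
            out[~pos] = -((-x[~pos] + 1) ** (2 - lmbda) - 1) / (2 - lmbda)
        else:           # lambda == 2
            out[~pos] = -np.log(-x[~pos] + 1)
        return out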

Below are examples of Box-Cox applied to various probability distributions.
Note that when applied to certain distributions, Box-Cox achieves very
Gaussian-like results, but with others, it is ineffective. This highlights
the importance of visualizing the data before and after transformation.
Below are examples of Box-Cox and Yeo-Johnson applied to various probability
distributions. Note that when applied to certain distributions, the power
transforms achieve very Gaussian-like results, but with others, they are
ineffective. This highlights the importance of visualizing the data before and
after transformation.

.. figure:: ../auto_examples/preprocessing/images/sphx_glr_plot_power_transformer_001.png
:target: ../auto_examples/preprocessing/plot_power_transformer.html
.. figure:: ../auto_examples/preprocessing/images/sphx_glr_plot_map_data_to_normal_001.png
:target: ../auto_examples/preprocessing/plot_map_data_to_normal.html
:align: center
:scale: 100

doc/whats_new/v0.20.rst: 13 changes (8 additions, 5 deletions)
Expand Up @@ -136,12 +136,15 @@ Preprocessing
DataFrames. :issue:`9012` by `Andreas Müller`_ and `Joris Van den Bossche`_,
and :issue:`11315` by :user:`Thomas Fan <thomasjpfan>`.

- Added :class:`preprocessing.PowerTransformer`, which implements the Box-Cox
power transformation, allowing users to map data from any distribution to a
Gaussian distribution. This is useful as a variance-stabilizing transformation
in situations where normality and homoscedasticity are desirable.
- Added :class:`preprocessing.PowerTransformer`, which implements the
Yeo-Johnson and Box-Cox power transformations. Power transformations try to
find a set of feature-wise parametric transformations to approximately map
data to a Gaussian distribution centered at zero and with unit variance.
This is useful as a variance-stabilizing transformation in situations where
normality and homoscedasticity are desirable.
:issue:`10210` by :user:`Eric Chang <ericchang00>` and
:user:`Maniteja Nandana <maniteja123>`.
:user:`Maniteja Nandana <maniteja123>`, and :issue:`11520` by :user:`Nicolas
Hug <nicolashug>`.
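
  For reference, a minimal usage sketch (the lognormal toy data below is
  illustrative only, not part of this changelog entry)::

    import numpy as np
    from sklearn.preprocessing import PowerTransformer

    rng = np.random.RandomState(0)
    X = rng.lognormal(size=(100, 1))
    pt = PowerTransformer(method='yeo-johnson')  # 'yeo-johnson' is the default
    X_gauss = pt.fit_transform(X)  # roughly Gaussian, zero mean, unit variance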

- Added the :class:`compose.TransformedTargetRegressor` which transforms
the target y before fitting a regression model. The predictions are mapped
examples/preprocessing/plot_all_scaling.py: 32 changes (17 additions, 15 deletions)
@@ -87,6 +87,8 @@
MaxAbsScaler().fit_transform(X)),
('Data after robust scaling',
RobustScaler(quantile_range=(25, 75)).fit_transform(X)),
('Data after power transformation (Yeo-Johnson)',
PowerTransformer(method='yeo-johnson').fit_transform(X)),
('Data after power transformation (Box-Cox)',
PowerTransformer(method='box-cox').fit_transform(X)),
('Data after quantile transformation (gaussian pdf)',
@@ -294,21 +296,21 @@ def make_plot(item_idx):
make_plot(4)

##############################################################################
# PowerTransformer (Box-Cox)
# --------------------------
# PowerTransformer
# ----------------
#
# ``PowerTransformer`` applies a power transformation to each
# feature to make the data more Gaussian-like. Currently,
# ``PowerTransformer`` implements the Box-Cox transform. The Box-Cox transform
# finds the optimal scaling factor to stabilize variance and minimize skewness
# through maximum likelihood estimation. By default, ``PowerTransformer`` also
# applies zero-mean, unit variance normalization to the transformed output.
# Note that Box-Cox can only be applied to positive, non-zero data. Income and
# number of households happen to be strictly positive, but if negative values
# are present, a constant can be added to each feature to shift it into the
# positive range - this is known as the two-parameter Box-Cox transform.
# ``PowerTransformer`` applies a power transformation to each feature to make
# the data more Gaussian-like. Currently, ``PowerTransformer`` implements the
# Yeo-Johnson and Box-Cox transforms. The power transform finds the optimal
# scaling factor to stabilize variance and minimize skewness through maximum
# likelihood estimation. By default, ``PowerTransformer`` also applies
# zero-mean, unit variance normalization to the transformed output. Note that
# Box-Cox can only be applied to strictly positive data. Income and number of
# households happen to be strictly positive, but if negative values are
# present, the Yeo-Johnson transform should be preferred.
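#
# For instance, a minimal sketch (not part of this example; ``X_neg`` is a
# hypothetical array containing a negative value)::
#
#   import numpy as np
#   from sklearn.preprocessing import PowerTransformer
#   X_neg = np.array([[-1.0], [0.5], [2.0]])
#   PowerTransformer(method='yeo-johnson').fit_transform(X_neg)  # works
#   PowerTransformer(method='box-cox').fit_transform(X_neg)  # raises ValueError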

make_plot(5)
make_plot(6)

##############################################################################
# QuantileTransformer (Gaussian output)
@@ -319,7 +321,7 @@ def make_plot(item_idx):
# Note that this non-parametric transformer introduces saturation artifacts
# for extreme values.

make_plot(6)
make_plot(7)

###################################################################
# QuantileTransformer (uniform output)
@@ -337,7 +339,7 @@ def make_plot(item_idx):
# any outliers by setting them to the a priori defined range boundaries (0 and
# 1).

make_plot(7)
make_plot(8)

##############################################################################
# Normalizer
@@ -350,6 +352,6 @@ def make_plot(item_idx):
# transformed data only lie in the positive quadrant. This would not be the
# case if some original features had a mix of positive and negative values.

make_plot(8)
make_plot(9)

plt.show()
examples/preprocessing/plot_map_data_to_normal.py: 137 changes (137 additions, 0 deletions)
@@ -0,0 +1,137 @@
"""
=================================
Map data to a normal distribution
=================================

This example demonstrates the use of the Box-Cox and Yeo-Johnson transforms
through :class:`preprocessing.PowerTransformer` to map data from various
distributions to a normal distribution.

The power transform is useful as a transformation in modeling problems where
homoscedasticity and normality are desired. Below are examples of Box-Cox and
Yeo-Johnson applied to six different probability distributions: Lognormal,
Chi-squared, Weibull, Gaussian, Uniform, and Bimodal.

Note that the transformations successfully map the data to a normal
distribution when applied to certain datasets, but are ineffective with others.
This highlights the importance of visualizing the data before and after
transformation.

Also note that even though Box-Cox seems to perform better than Yeo-Johnson for
lognormal and chi-squared distributions, keep in mind that Box-Cox does not
support inputs with negative values.

For comparison, we also add the output from
:class:`preprocessing.QuantileTransformer`. It can force an arbitrary
distribution into a Gaussian, provided that there are enough training samples
(thousands). Because it is a non-parametric method, it is harder to interpret
than the parametric ones (Box-Cox and Yeo-Johnson).

On "small" datasets (less than a few hundred points), the quantile transformer
is prone to overfitting. The use of the power transform is then recommended.
"""

# Author: Eric Chang <[email protected]>
# Nicolas Hug <[email protected]>
# License: BSD 3 clause

import numpy as np
import matplotlib.pyplot as plt

from sklearn.preprocessing import PowerTransformer
from sklearn.preprocessing import QuantileTransformer
from sklearn.model_selection import train_test_split

print(__doc__)


N_SAMPLES = 1000
FONT_SIZE = 6
BINS = 30


rng = np.random.RandomState(304)
bc = PowerTransformer(method='box-cox')
yj = PowerTransformer(method='yeo-johnson')
qt = QuantileTransformer(output_distribution='normal', random_state=rng)
size = (N_SAMPLES, 1)


# lognormal distribution
X_lognormal = rng.lognormal(size=size)

# chi-squared distribution
df = 3
X_chisq = rng.chisquare(df=df, size=size)

# weibull distribution
a = 50
X_weibull = rng.weibull(a=a, size=size)

# gaussian distribution
loc = 100
X_gaussian = rng.normal(loc=loc, size=size)

# uniform distribution
X_uniform = rng.uniform(low=0, high=1, size=size)

# bimodal distribution
loc_a, loc_b = 100, 105
X_a, X_b = rng.normal(loc=loc_a, size=size), rng.normal(loc=loc_b, size=size)
X_bimodal = np.concatenate([X_a, X_b], axis=0)


# create plots
distributions = [
    ('Lognormal', X_lognormal),
    ('Chi-squared', X_chisq),
    ('Weibull', X_weibull),
    ('Gaussian', X_gaussian),
    ('Uniform', X_uniform),
    ('Bimodal', X_bimodal)
]

colors = ['firebrick', 'darkorange', 'goldenrod',
          'seagreen', 'royalblue', 'darkorchid']

fig, axes = plt.subplots(nrows=8, ncols=3, figsize=plt.figaspect(2))
axes = axes.flatten()
axes_idxs = [(0, 3, 6, 9), (1, 4, 7, 10), (2, 5, 8, 11), (12, 15, 18, 21),
             (13, 16, 19, 22), (14, 17, 20, 23)]
axes_list = [(axes[i], axes[j], axes[k], axes[l])
             for (i, j, k, l) in axes_idxs]


for distribution, color, axes in zip(distributions, colors, axes_list):
    name, X = distribution
    X_train, X_test = train_test_split(X, test_size=.5)

    # perform power transforms and quantile transform
    X_trans_bc = bc.fit(X_train).transform(X_test)
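    # lambdas_ stores the maximum likelihood estimate of lambda for each
    # feature (a single column in this example)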
    lmbda_bc = round(bc.lambdas_[0], 2)
    X_trans_yj = yj.fit(X_train).transform(X_test)
    lmbda_yj = round(yj.lambdas_[0], 2)
    X_trans_qt = qt.fit(X_train).transform(X_test)

    ax_original, ax_bc, ax_yj, ax_qt = axes

    ax_original.hist(X_train, color=color, bins=BINS)
    ax_original.set_title(name, fontsize=FONT_SIZE)
    ax_original.tick_params(axis='both', which='major', labelsize=FONT_SIZE)

    for ax, X_trans, meth_name, lmbda in zip(
            (ax_bc, ax_yj, ax_qt),
            (X_trans_bc, X_trans_yj, X_trans_qt),
            ('Box-Cox', 'Yeo-Johnson', 'Quantile transform'),
            (lmbda_bc, lmbda_yj, None)):
        ax.hist(X_trans, color=color, bins=BINS)
        title = 'After {}'.format(meth_name)
        if lmbda is not None:
            title += '\n$\\lambda$ = {}'.format(lmbda)
        ax.set_title(title, fontsize=FONT_SIZE)
        ax.tick_params(axis='both', which='major', labelsize=FONT_SIZE)
        ax.set_xlim([-3.5, 3.5])


plt.tight_layout()
plt.show()