[MRG+1] ENH: Feature selection based on mutual information #5372

Closed
wants to merge 25 commits into from
Changes from all commits

Commits (25)
df08def
ENH: Implemented mutual_info function
Oct 23, 2013
54c0783
DOC: Documentation update related to mutual_info
Dec 12, 2015
0245fc8
MAINT: Use six.moves.zip in mutual_info
Dec 12, 2015
c1aea3f
MAINT: Renamed module mutual_info to mutual_info_
Dec 14, 2015
689ed0d
DOC: Example for mutual_information
nmayorov Jan 10, 2016
835102a
API: Split mutual_info into _regression and _classif
nmayorov Jan 12, 2016
8394c1b
MAINT: Add blank lines between parameters in mutual_info_.py
nmayorov Jan 15, 2016
ad2f5f5
MAINT: Add check_classification_targets to mutual_info_classif
nmayorov Jan 15, 2016
ffc4fe9
TST: Change tolerance checks in test_mutual_info.py
nmayorov Jan 15, 2016
824dda3
MAINT: Small changes to plot_f_test_vs_mi.py
nmayorov Jan 15, 2016
051d3a2
MAINT: Slightly improve logic of discrete-continuous MI estimation
nmayorov Jan 15, 2016
7869992
MAINT: Slightly improve copy logic in _estimate_mi
nmayorov Jan 15, 2016
ec17289
DOC: Add short descriptions of methods for mutual info estimation
nmayorov Jan 15, 2016
b0491be
DOC: Add a short explanation of F-test vs MI in narrative doc
nmayorov Jan 16, 2016
d3a497a
BUG: Fix copy logic for mutual info functions
nmayorov Jan 16, 2016
375b070
TST: Speed up 2 tests related to mutual info
nmayorov Jan 17, 2016
d60636a
DOC: Small fixes in mutual_info_.py documentation
nmayorov Jan 17, 2016
094a077
MAINT: Small refactoring in mutual_info_.py
nmayorov Jan 17, 2016
5b3f515
MAINT: Get rid of classes in test_mutual_info.py
nmayorov Jan 17, 2016
b48a108
DOC: Add one more reference for mutual info methods
nmayorov Jan 20, 2016
e1bc056
MAINT: Add a clarification comment in mutual_info_.py
nmayorov Jan 20, 2016
a36edf2
DOC: Modify SelectKBest and SelectPercentile docstrings slightly
nmayorov Jan 20, 2016
e716c64
MAINT: Mention mutual info methods in whats_new.rst
nmayorov Jan 20, 2016
daa73c7
BUG: Remove non-ASCII symbols from mutual_info_.py
nmayorov Jan 20, 2016
4cc82a3
MAINT: Modify whats_new item related to mutual information
nmayorov Jan 21, 2016
2 changes: 2 additions & 0 deletions doc/modules/classes.rst
@@ -534,6 +534,8 @@ From text
feature_selection.chi2
feature_selection.f_classif
feature_selection.f_regression
feature_selection.mutual_info_classif
feature_selection.mutual_info_regression


.. _gaussian_process_ref:
27 changes: 18 additions & 9 deletions doc/modules/feature_selection.rst
@@ -67,8 +67,8 @@ as objects that implement the ``transform`` method:
:class:`SelectFdr`, or family wise error :class:`SelectFwe`.

* :class:`GenericUnivariateSelect` allows to perform univariate feature
selection with a configurable strategy. This allows to select the best
univariate selection strategy with hyper-parameter search estimator.
selection with a configurable strategy. This allows to select the best
univariate selection strategy with hyper-parameter search estimator.

For instance, we can perform a :math:`\chi^2` test to the samples
to retrieve only the two best features as follows:
@@ -84,17 +84,24 @@ to retrieve only the two best features as follows:
>>> X_new.shape
(150, 2)

These objects take as input a scoring function that returns
univariate p-values:
These objects take as input a scoring function that returns univariate scores
and p-values (or only scores for :class:`SelectKBest` and
:class:`SelectPercentile`):

* For regression: :func:`f_regression`
* For regression: :func:`f_regression`, :func:`mutual_info_regression`

* For classification: :func:`chi2` or :func:`f_classif`
* For classification: :func:`chi2`, :func:`f_classif`, :func:`mutual_info_classif`

The methods based on F-test estimate the degree of linear dependency between
two random variables. On the other hand, mutual information methods can capture
any kind of statistical dependency, but being nonparametric, they require more
samples for accurate estimation.

.. topic:: Feature selection with sparse data

If you use sparse data (i.e. data represented as sparse matrices),
only :func:`chi2` will deal with the data without making it dense.
:func:`chi2`, :func:`mutual_info_regression`, :func:`mutual_info_classif`
will deal with the data without making it dense.

.. warning::

@@ -103,7 +110,9 @@ univariate p-values:

.. topic:: Examples:

:ref:`example_feature_selection_plot_feature_selection.py`
* :ref:`example_feature_selection_plot_feature_selection.py`

* :ref:`example_feature_selection_plot_f_test_vs_mi.py`

.. _rfe:

@@ -315,4 +324,4 @@ Then, a :class:`sklearn.ensemble.RandomForestClassifier` is trained on the
transformed output, i.e. using only relevant features. You can perform
similar operations with the other feature selection methods and also
classifiers that provide a way to evaluate feature importances of course.
See the :class:`sklearn.pipeline.Pipeline` examples for more details.
See the :class:`sklearn.pipeline.Pipeline` examples for more details.
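A minimal sketch (not part of this PR's diff) of how the new score functions plug into the selectors described above, mirroring the chi2 example from the narrative documentation; the iris dataset and k=2 are illustrative assumptions, not taken from the diff:

# Sketch only: mutual_info_classif used as the score function of SelectKBest,
# analogous to the chi2 example in doc/modules/feature_selection.rst.
# The iris dataset and k=2 are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

iris = load_iris()
X, y = iris.data, iris.target

X_new = SelectKBest(mutual_info_classif, k=2).fit_transform(X, y)
print(X_new.shape)  # expected (150, 2): two features retained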
10 changes: 10 additions & 0 deletions doc/whats_new.rst
@@ -15,6 +15,13 @@ Changelog
New features
............

- Added two functions for mutual information estimation:
:func:`feature_selection.mutual_info_classif` and
:func:`feature_selection.mutual_info_regression`. These functions can be
used in :class:`feature_selection.SelectKBest` and
:class:`feature_selection.SelectPercentile`, which now accept callables
returning only `scores`. By `Andrea Bravi`_ and `Nikolay Mayorov`_.

- The Gaussian Process module has been reimplemented and now offers classification
and regression estimators through :class:`gaussian_process.GaussianProcessClassifier`
and :class:`gaussian_process.GaussianProcessRegressor`. Among other things, the new
@@ -4037,3 +4044,6 @@ David Huard, Dave Morrill, Ed Schofield, Travis Oliphant, Pearu Peterson.
.. _Imaculate: https://github.com/Imaculate

.. _Bernardo Stein: https://github.com/DanielSidhion

.. _Andrea Bravi: https://github.com/AndreaBravi

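To make the whats_new item above concrete, here is a minimal sketch (not from the PR) of SelectKBest used with a score function that returns only scores and no p-values; the toy data and k=1 are illustrative assumptions:

# Sketch only: SelectKBest with a callable returning only scores
# (mutual_info_regression). The synthetic data is illustrative.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_regression

rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = X[:, 0] + 0.1 * rng.randn(200)  # only the first feature is informative

selector = SelectKBest(score_func=mutual_info_regression, k=1)
X_selected = selector.fit_transform(X, y)
print(X_selected.shape)   # expected (200, 1)
print(selector.scores_)   # one mutual information estimate per feature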
49 changes: 49 additions & 0 deletions examples/feature_selection/plot_f_test_vs_mi.py
@@ -0,0 +1,49 @@
"""
===========================================
Comparison of F-test and mutual information
===========================================

This example illustrates the differences between univariate F-test statistics
and mutual information.

We consider 3 features x_1, x_2, x_3 distributed uniformly over [0, 1]; the
target depends on them as follows:

y = x_1 + sin(6 * pi * x_2) + 0.1 * N(0, 1), that is, the third feature is completely irrelevant.

The code below plots the dependency of y against individual x_i and the
normalized values of the univariate F-test statistics and mutual information.

As F-test captures only linear dependency, it rates x_1 as the most
Member

I think we should also add this to the narrative doc, to help users know when to use what

Contributor Author

Honestly, I think someone needs to write a section explaining what F-tests, chi2 (and mutual info) are, when they are applicable, and how they differ. But I suggest delegating that to another PR.


Sounds great, thank you very much, but what about an overfitting example, do
you have one?


Member

Indeed, but just a line stating the same thing in the user section, i.e. that MI also captures non-linear dependence, won't hurt for now.

Member

+1 and maybe add an issue to track the larger doc problem

discriminative feature. On the other hand, mutual information can capture any
kind of dependency between variables and it rates x_2 as the most
discriminative feature, which probably agrees better with our intuitive
perception for this example. Both methods correctly mark x_3 as irrelevant.
Member

Nice example, thanks :-)

Member

why is this more intuitive? because the variance of p(y | x2) is smaller?

"""
print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from sklearn.feature_selection import f_regression, mutual_info_regression

np.random.seed(0)
X = np.random.rand(1000, 3)
Member

you should fix the random state though

Contributor Author

Why is that necessary? This example gives very similar results for any sample. I think it's a rather good thing, when an example is robust in this sense. Don't you agree?

Member

Yes, but it is sometimes weird when, while rebuilding the documentation, you find that the plot has changed.

y = X[:, 0] + np.sin(6 * np.pi * X[:, 1]) + 0.1 * np.random.randn(1000)

f_test, _ = f_regression(X, y)
f_test /= np.max(f_test)

mi = mutual_info_regression(X, y)
mi /= np.max(mi)

plt.figure(figsize=(15, 5))
for i in range(3):
    plt.subplot(1, 3, i + 1)
    plt.scatter(X[:, i], y)
    plt.xlabel("$x_{}$".format(i + 1), fontsize=14)
    if i == 0:
        plt.ylabel("$y$", fontsize=14)
    plt.title("F-test={:.2f}, MI={:.2f}".format(f_test[i], mi[i]),
              fontsize=16)
plt.show()

2 changes: 1 addition & 1 deletion examples/feature_selection/plot_rfe_digits.py
@@ -33,4 +33,4 @@
plt.matshow(ranking, cmap=plt.cm.Blues)
plt.colorbar()
plt.title("Ranking of pixels with RFE")
plt.show()
plt.show()
Member

newline

7 changes: 6 additions & 1 deletion sklearn/feature_selection/__init__.py
@@ -22,17 +22,22 @@

from .from_model import SelectFromModel

from .mutual_info_ import mutual_info_regression, mutual_info_classif


__all__ = ['GenericUnivariateSelect',
'RFE',
'RFECV',
'SelectFdr',
'SelectFpr',
'SelectFwe',
'SelectKBest',
'SelectFromModel',
'SelectPercentile',
'VarianceThreshold',
'chi2',
Member

why did you remove this?

Contributor Author

Looks like I moved it after SelectKBest. I probably wanted all the functions to be at the end.

'f_classif',
'f_oneway',
'f_regression',
'SelectFromModel']
'mutual_info_classif',
'mutual_info_regression']
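For context (not part of the diff), a minimal sketch of calling the newly exported functions directly; the synthetic data below is an illustrative assumption:

# Sketch only: calling the newly exported estimators directly. Each call
# returns one non-negative mutual information estimate per feature.
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

rng = np.random.RandomState(0)
X = rng.rand(500, 3)
y_class = (X[:, 0] > 0.5).astype(int)                        # depends on feature 0 only
y_reg = np.sin(2 * np.pi * X[:, 1]) + 0.1 * rng.randn(500)   # depends on feature 1 only

print(mutual_info_classif(X, y_class))    # largest estimate expected for feature 0
print(mutual_info_regression(X, y_reg))   # largest estimate expected for feature 1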