Commit 86301ac

FIX always scale continuous features to unit variance in mutual info (#24747)

Authored by glemaitre, with betatim, thomasjpfan, and jeremiedbb.
Co-authored-by: Tim Head <[email protected]>
Co-authored-by: Thomas J. Fan <[email protected]>
Co-authored-by: jeremie du boisberranger <[email protected]>

1 parent 40748d1 · commit 86301ac

File tree: 3 files changed, +38 −4 lines changed

doc/whats_new/v1.2.rst — 6 additions & 0 deletions

@@ -359,6 +359,12 @@ Changelog
 :mod:`sklearn.feature_selection`
 ................................

+- |Fix| Fix a bug in :func:`feature_selection.mutual_info_regression` and
+  :func:`feature_selection.mutual_info_classif`, where the continuous features
+  in `X` are now scaled to unit variance independently of whether the target
+  `y` is continuous or discrete.
+  :pr:`24747` by :user:`Guillaume Lemaitre <glemaitre>`
+
 :mod:`sklearn.gaussian_process`
 ...............................
sklearn/feature_selection/_mutual_info.py — 3 additions & 4 deletions

@@ -280,10 +280,9 @@ def _estimate_mi(
     if copy:
         X = X.copy()

-    if not discrete_target:
-        X[:, continuous_mask] = scale(
-            X[:, continuous_mask], with_mean=False, copy=False
-        )
+    X[:, continuous_mask] = scale(
+        X[:, continuous_mask], with_mean=False, copy=False
+    )

     # Add small noise to continuous features as advised in Kraskov et. al.
     X = X.astype(np.float64, copy=False)
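The change above makes the `scale(..., with_mean=False)` call unconditional, so continuous columns are brought to unit variance for both classification and regression targets. As a NumPy-only sketch of the effect of that call (assuming plain dense arrays, and ignoring sklearn's `copy` handling), scaling without centering is just a column-wise division by the standard deviation:

```python
import numpy as np

rng = np.random.RandomState(0)
# Two continuous columns with very different spreads.
X = rng.normal(loc=5.0, scale=(3.0, 0.1), size=(1000, 2))

# Sketch of preprocessing.scale(X, with_mean=False): divide each column
# by its (ddof=0) standard deviation; the data is NOT mean-centered.
X_scaled = X / X.std(axis=0)

# Every column now has unit standard deviation, but a nonzero mean.
print(X_scaled.std(axis=0))
```

Unit variance matters here because the Kraskov-style estimator used by `_estimate_mi` relies on nearest-neighbor distances, which are not scale-invariant; before the fix this normalization was skipped when the target was discrete.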

sklearn/feature_selection/tests/test_mutual_info.py — 29 additions & 0 deletions

@@ -207,3 +207,32 @@ def test_mutual_info_options(global_dtype):
     assert_allclose(mi_5, mi_6)

     assert not np.allclose(mi_1, mi_3)
+
+
+@pytest.mark.parametrize("correlated", [True, False])
+def test_mutual_information_symmetry_classif_regression(correlated, global_random_seed):
+    """Check that `mutual_info_classif` and `mutual_info_regression` are
+    symmetric by switching the target `y` as `feature` in `X` and vice
+    versa.
+
+    Non-regression test for:
+    https://github.com/scikit-learn/scikit-learn/issues/23720
+    """
+    rng = np.random.RandomState(global_random_seed)
+    n = 100
+    d = rng.randint(10, size=n)
+
+    if correlated:
+        c = d.astype(np.float64)
+    else:
+        c = rng.normal(0, 1, size=n)
+
+    mi_classif = mutual_info_classif(
+        c[:, None], d, discrete_features=[False], random_state=global_random_seed
+    )
+
+    mi_regression = mutual_info_regression(
+        d[:, None], c, discrete_features=[True], random_state=global_random_seed
+    )
+
+    assert mi_classif == pytest.approx(mi_regression)
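The test above exploits the fact that mutual information is symmetric, MI(X; Y) = MI(Y; X), so swapping the roles of feature and target must not change the estimate. That property can be checked directly with a NumPy-only plug-in estimator for two discrete variables (`mutual_information` below is a hypothetical helper, independent of the kNN-based estimator scikit-learn actually uses):

```python
import numpy as np

rng = np.random.RandomState(0)
d = rng.randint(3, size=1000)  # discrete variable with 3 levels
c = rng.randint(4, size=1000)  # discrete variable with 4 levels

def mutual_information(a, b):
    """Plug-in MI estimate from the empirical joint histogram of a and b."""
    joint, _, _ = np.histogram2d(a, b, bins=(np.unique(a).size, np.unique(b).size))
    pxy = joint / joint.sum()                 # joint distribution
    px = pxy.sum(axis=1, keepdims=True)       # marginal of a
    py = pxy.sum(axis=0, keepdims=True)       # marginal of b
    nz = pxy > 0                              # avoid log(0) on empty cells
    # MI is the KL divergence between the joint and the product of marginals.
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

# Symmetric in its arguments, up to floating-point error.
assert np.isclose(mutual_information(c, d), mutual_information(d, c))
```

Before this fix, scikit-learn's two entry points broke this symmetry for mixed continuous/discrete pairs, because the continuous feature was only normalized on one side of the swap.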
