FIX: deprecate integer valued numerical features for PDP #30409


Conversation

ogrisel
Member

@ogrisel ogrisel commented Dec 4, 2024

This is a tentative fix for #30315 and #30378.

The problem is that new pandas versions refuse to assign floating-point values into integer dtyped columns. In PDP, such fractional floating-point values can be generated when creating the grid, even if the original numerical feature is integer valued.
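To illustrate the problem described above, here is a minimal numpy-only sketch (the feature values and grid size are illustrative, not taken from the PR): the evaluation grid built between the feature's extrema is generally fractional even when the column is integer dtyped, and assigning such values back into an integer container either silently truncates (numpy, old pandas) or is refused (new pandas).

```python
import numpy as np

# A hypothetical integer-valued numerical feature, e.g. "number of rooms".
X = np.array([[1], [2], [3], [8]], dtype=np.int64)

# PDP builds an evaluation grid over the feature's range; with np.linspace
# the grid points are generally fractional even though the original column
# is integer dtyped.
grid = np.linspace(X[:, 0].min(), X[:, 0].max(), num=5)
# grid values: 1.0, 2.75, 4.5, 6.25, 8.0

# Assigning a fractional value into an integer numpy array silently
# truncates it; new pandas versions refuse the equivalent assignment into
# an integer dtyped column instead of truncating.
X_eval = X.copy()
X_eval[:, 0] = grid[2]  # 4.5 is truncated to 4
```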

In a first (and later abandoned) version of this PR, I tried to transparently change the dtype of X_eval (or at least of the columns where we insert those values when X is a dataframe). However, this leads to very pandas- and numpy-specific code paths that would later need to be extended for polars. I have the feeling that this would quickly become unmanageable.

Furthermore, changing the dtype of (some columns of) X_eval means that we call the response method of the estimator with different dtypes than those of the X_train used to fit it. I have the feeling that this can cause subtle bugs.

So this PR explicitly deprecates support for integer-dtyped columns used as numerical features in PDPs, with a plan to raise a ValueError later: the error message should be explicit enough to tell users how to update their code. I thought this would be simple, but as you can see in this PR, it is a bit more complex than anticipated.
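A rough sketch of the deprecation pattern described above (the helper name `_check_numerical_feature_dtype` is hypothetical and not the actual code in this PR): emit a FutureWarning when an integer dtyped column is used as a numerical feature, announcing the future ValueError.

```python
import warnings
import numpy as np

def _check_numerical_feature_dtype(X_col):
    """Hypothetical helper sketching the deprecation: warn when an integer
    dtyped column is used as a numerical PDP feature."""
    if np.issubdtype(np.asarray(X_col).dtype, np.integer):
        warnings.warn(
            "Using an integer dtyped column as a numerical feature in "
            "partial dependence is deprecated and will raise a ValueError "
            "in the future. Convert the column to a floating point dtype, "
            "or declare the feature as categorical.",
            FutureWarning,
        )

# Warns for an integer column, stays silent for a floating point one.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    _check_numerical_feature_dtype(np.array([1, 2, 3]))
    _check_numerical_feature_dtype(np.array([1.0, 2.0, 3.0]))
```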

After creating this PR, I am considering a final alternative: instead of changing the dtype of X_eval, we could round the fractional grid values back to the feature's original integer dtype before assigning them into X_eval.

This last alternative would be less intrusive (and in particular would match the implicit behavior of numpy and old pandas versions). But it can be surprising: we generate a fine grid that is then implicitly coarsified when computing the PDP values. This should not be too complex to implement in a somewhat container-agnostic way (using _safe_indexing / _safe_assign and .astype). However, it might feel a bit magic. I am not sure what is best.
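The coarsification effect mentioned above can be shown in a few lines (grid bounds and size are illustrative): casting a fine fractional grid back to the original integer dtype collapses many grid points onto the same value, so the model is evaluated on duplicated points.

```python
import numpy as np

# Fine grid over an integer-valued feature: 0.0, 0.5, 1.0, ..., 5.0
grid = np.linspace(0, 5, num=11)

# Rounding alternative: cast the grid back to the feature's integer dtype
# before evaluating the model. Truncation collapses pairs of points:
# 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5
coarse = grid.astype(np.int64)

# Only 6 distinct evaluation points remain out of the 11 requested, which
# is what makes this option surprising to the user.
n_distinct = len(np.unique(coarse))
```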

Would love to get feedback from @glemaitre or @lesteve for instance.


github-actions bot commented Dec 4, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: ca7fb2e.

Member

@thomasjpfan thomasjpfan left a comment


I'm okay with warning and erroring in the future here.

Member

@glemaitre glemaitre left a comment


Thanks @ogrisel. I should have shared my findings earlier; that might have helped.

I agree with the general idea of the PR. I was leaning towards the same solution: either treat an integer dtype as a categorical feature, or otherwise require explicit floating-point values.

@glemaitre glemaitre merged commit 4a7f96e into scikit-learn:main Dec 8, 2024
30 checks passed
@glemaitre
Member

I just noticed that I merged this PR, but it indeed targets 1.6. We could still add it to 1.6.1 because it is part fix, part deprecation. @jeremiedbb, do you think we still have time to backport the commit?

@jeremiedbb
Member

I merged the release PR but I haven't pushed the tag yet so I guess it's doable.
