
Conversation

@cakedev0
Contributor

@cakedev0 cakedev0 commented Nov 13, 2025

Reference Issues/PRs

Towards #32700 (deprecation before complete removal).

Fixes:

What does this implement/fix? Explain your changes.

  • Remove class FriedmanMSE(MSE) in sklearn/tree/_criterion.pyx
  • Deprecate "friedman_mse" for trees & forests (if criterion="friedman_mse": criterion="squared_error" + deprecation warning; see the sketch after this list)
  • Deprecate criterion param for gradient boosting
  • Adapt the doc/docstrings/tests/... accordingly
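
A minimal sketch of the deprecation mapping described in the second bullet above; the helper name and the warning wording are illustrative only, not the PR's actual code:

```python
# Hypothetical sketch: map the deprecated "friedman_mse" value to
# "squared_error" and emit a deprecation warning. Illustrative only.
import warnings


def _resolve_criterion(criterion):
    if criterion == "friedman_mse":
        warnings.warn(
            'criterion="friedman_mse" is deprecated and will be removed; '
            'it now behaves like criterion="squared_error".',
            FutureWarning,
        )
        return "squared_error"
    return criterion
```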

@github-actions

github-actions bot commented Nov 13, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: bd8187a. Link to the linter CI: here

Comment on lines 1128 to 1129
Training using "absolute_error" is significantly slower
than when using "squared_error".
Contributor Author

Unrelated to this PR, but the MAE criterion is now at most 10x slower than MSE, and usually more like 5x slower. Is it really significantly slower? It will still fit fairly fast for most tabular datasets (fewer than, say, 10M points).

Member

I am inclined to remove this sentence. The reason for choosing squared error or absolute error should not be fit time, but the use case / statistical considerations.

Member

Let's remove.

Contributor Author

Happy to 😄

@cakedev0 cakedev0 marked this pull request as ready for review November 14, 2025 16:59
@cakedev0 cakedev0 changed the title from MNT: start removing FriedmanMSE class to MNT: trees/forests/GBT: deprecate "friedman_mse" criterion Nov 14, 2025
Comment on lines 1554 to 1555
@pytest.mark.skip("Skip for now")
def test_huber_exact_backward_compat():
Contributor Author

@cakedev0 cakedev0 Nov 14, 2025

I propose to delete this test, as the changes proposed by this PR are not exactly backward compatible, given that the criterion calculations now use a different but equivalent formula.

Another option would be to update it with the current values, but I feel that such a test prevents legitimate changes/improvements that slightly affect any calculation.

Member

I wouldn't skip. What's the difference after the change in this test? What should be the tolerance for it to pass?

Contributor Author

What should be the tolerance for it to pass?

rtol=0.1, which doesn't make much sense for a test named test_huber_exact_backward_compat

In this test, the model 100% overfits, so the values checked in the asserts are mostly 0 + some float precision noise, I think 😅, at least the ones for which rtol=0.1 would be needed.

So I propose to transform this test into test_huber_overfit, see my commit 4009fd3
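
For reference, a sketch of what such an overfit check could look like; this is illustrative only, the actual test in commit 4009fd3 may differ:

```python
# Illustrative sketch, not the actual test from commit 4009fd3.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def test_huber_overfit():
    # With no regularization and enough boosting rounds, the Huber GBT
    # should (almost) interpolate the training targets.
    rng = np.random.RandomState(0)
    X = rng.uniform(size=(20, 2))
    y = rng.uniform(size=20)
    gbt = GradientBoostingRegressor(
        loss="huber", n_estimators=100, max_depth=3, learning_rate=1.0, random_state=0
    ).fit(X, y)
    assert gbt.score(X, y) > 0.99  # train R^2 close to 1, i.e. the model overfits
```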

Member

@adrinjalali adrinjalali left a comment

Otherwise LGTM

Member

@adrinjalali adrinjalali left a comment

LGTM. Maybe @lorentzenchr would like to have a look.

@lorentzenchr
Member

lorentzenchr commented Nov 19, 2025

-1 for me at the moment. We need to clarify the issue first, see my comment there #32700 (comment).

Edit: Discussion resolved.

Member

@ogrisel ogrisel left a comment

Since the concerns raised by #32700 (comment) have been addressed in the follow-up discussion, I think we can finalize the review and merge of this PR.

Besides the following points, it looks good to me.

... )
>>> cross_val_score(estimator, X, y, cv=5, scoring=mean_pinball_loss_95p)
array([13.6, 9.7, 23.3, 9.5, 10.4])
array([14.3, 9.8, 23.9, 9.4, 10.8])
Member

Is this change caused by rounding error discrepancies when switching from criterion="friedman_mse" to criterion="squared_error"? Or is this a consequence of #32707?

Contributor Author

It's caused by rounding error discrepancies. #32707 only affects the scale of the impurity, I think (see the fix I made temporarily in PR #32699), and here the scale of the impurity shouldn't impact the results (the trees are controlled by max_depth only).

But since the impurity improvement (criterion.impurity_improvement) is now computed with a different formula, the rounding errors differ, and when several features have very similar (or identical) splits, the selected split might change.

This happens for all the losses except loss="absolute_error": since the gradient is -1 or 1 for this loss, there are no rounding errors (integers everywhere, and integers small enough to fit in the mantissa of a float64).
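
As a toy illustration of the rounding-error argument (not scikit-learn code): two mathematically equivalent formulas for the variance of a node typically disagree in the last float64 digits.

```python
import numpy as np

rng = np.random.RandomState(0)
y = rng.normal(size=1000)

# Two equivalent ways to compute the variance (the "impurity" of a node):
var_a = np.mean((y - y.mean()) ** 2)
var_b = np.mean(y**2) - y.mean() ** 2
print(var_a - var_b)  # tiny but typically nonzero, on the order of 1e-16
```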

Comment on lines 3 to 4
and :class:`ensemble.GradientBoostingRegressor`,
as it had no actual effect.
Member

I think we should rather document this as a fix and rephrase this entry accordingly, since this PR changes the default behavior of the estimators by switching the underlying implementation from the buggy "friedman_mse" criterion to the correct "squared_error" criterion (#32707).

Contributor Author

Hum... I wrote both a .fix.rst and an .api.rst, each focusing on a different aspect. Let me know what you think.

Comment on lines -1554 to -1559
@skip_if_32bit
def test_huber_exact_backward_compat():
    """Test huber GBT backward compat on a simple dataset.
    The results to compare against are taken from scikit-learn v1.2.0.
    """
Member

Please undo the changes to this test. It should still pass. If not, this PR introduces a regression.

Contributor Author

This test doesn't pass after the changes in this PR because of rounding errors. As mentioned above, in this test (test_huber_exact_backward_compat) the model 100% overfits, so the values checked in the asserts are mostly 0/<some_int> + some float precision noise. Checking the exact values means checking the exact float precision noise, which prevents almost any change, even a meaningful and valid one, from passing.

If not, this PR introduces a regression

No. This test was just not a meaningful test (like many others I have found and fixed since I started working on scikit-learn...).

Contributor Author

@cakedev0 cakedev0 Dec 28, 2025

Typically, this test fails when you change median = _weighted_percentile(y_true, sample_weight, 50) to median = _weighted_percentile(y_true, sample_weight, 50, average=True) in HuberLoss.fit_intercept_only, even though I think we consider average=True a better option for computing the weighted median. At least that's what I assumed when I wrote PR #32100, and it was, in a sense, confirmed by @ogrisel, who asked me to test my logic against _weighted_percentile(..., average=True).

Member

I agree that we will need to open another PR to use average=True in HuberLoss.fit_intercept_only and check that optimizing the HuberLoss on a dataset with symmetrically distributed target data and constant features returns the same result as np.median, whether the sum of integer sample weights is even or odd, and thus remove this arbitrary bias.

Contributor Author

I'd be happy to do that 😄 We have the same thing for the losses AbsoluteError and PinballLoss.
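
A toy illustration of the even/odd asymmetry being discussed, assuming a scikit-learn version where the private _weighted_percentile helper accepts the average keyword mentioned above; the expected values are indicative only:

```python
import numpy as np
from sklearn.utils.stats import _weighted_percentile

y = np.array([1.0, 2.0, 3.0, 4.0])  # even number of samples
w = np.ones_like(y)

print(np.median(y))                                  # 2.5 (average of the two middle values)
print(_weighted_percentile(y, w, 50))                # expected 2.0 (lower interpolation)
print(_weighted_percentile(y, w, 50, average=True))  # expected 2.5, matching np.median
```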

export_graphviz(clf, out, class_names=[])


def test_friedman_mse_in_graphviz():
Member

Can this test be modified instead of removed? Or is it tested elsewhere?

Contributor Author

This test was mostly checking that the string "friedman_mse" was present in the appropriate places in the output. We have much more complete tests in this file (for the squared error criterion), so I think it's fine to remove it.

Alternatively, I can keep this test and just ignore the deprecation warnings, and we'll remove it when we remove "friedman_mse" completely.

Member

@ogrisel ogrisel Dec 29, 2025

I would rename the test to test_criterion_in_gradient_boosting_graphviz and check that the name of the new criterion ("squared_error") is present in the nodes.

Contributor Author

@cakedev0 cakedev0 Dec 29, 2025

I think this would be redundant with test_graphviz_toy, where we compare against exact outputs (including examples with "squared_error").

Edit: ah no, this test fits a GradientBoostingClassifier, so ok, let's do what you suggest.

Contributor Author

Though TBH, I don't think this test makes a lot of sense: why test the export of GB.estimators_[0] and not RF.estimators_[0], ET.estimators_[0], etc.?

I would rather suggest a test (parametrized by the criterion) that fits a decision tree and checks the presence of the criterion name in the exported string. That way we test all criteria, and we avoid importing from sklearn.ensemble in a test from sklearn.tree, which looks like a bad pattern in most cases.
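
Something along these lines (illustrative sketch; the test name and the parametrized criteria are not final):

```python
import numpy as np
import pytest
from sklearn.tree import DecisionTreeRegressor, export_graphviz


@pytest.mark.parametrize("criterion", ["squared_error", "absolute_error", "poisson"])
def test_criterion_name_in_graphviz(criterion):
    rng = np.random.RandomState(0)
    X = rng.uniform(size=(30, 3))
    y = rng.uniform(size=30) + 0.1  # strictly positive so "poisson" is valid too
    tree = DecisionTreeRegressor(criterion=criterion, max_depth=2, random_state=0).fit(X, y)
    dot_data = export_graphviz(tree, out_file=None)
    # Node labels include the impurity, prefixed with the criterion name.
    assert criterion in dot_data
```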

Co-authored-by: Christian Lorentzen <[email protected]>
@github-actions github-actions bot added the CI:Linter failure The linter CI is failing on this PR label Dec 28, 2025
@github-actions github-actions bot removed the CI:Linter failure The linter CI is failing on this PR label Dec 28, 2025
@ogrisel
Member

ogrisel commented Dec 29, 2025

The HTTP 403 on fetch_olivetti_faces on CircleCI is another occurrence of the IP deny-listing by figshare.com that we have experienced in the past. Its resolution will be tracked in #32961.

@ogrisel
Member

ogrisel commented Dec 29, 2025

I pushed a few cosmetic fixes in examples/ensemble/plot_gradient_boosting_quantile.py to trigger the CI on them as part of this PR, hopefully making it easier to check that the change in rounding errors induced by the swap to the "squared_error" criterion in this PR does not meaningfully impact its message.

However, I think that the rendering of that example by the CI will not be possible as long as #32961 is not resolved. So here is the output of a local run on my machine:

[plots from the local run: gb_interval_1 and gb_interval_2]

to be compared with the plots and tables in https://scikit-learn.org/dev/auto_examples/ensemble/plot_gradient_boosting_quantile.html as currently rendered on main.

While we do observe some discrepancies in the third digit for some of the loss values, I am willing to believe that they have the same cause as explained in #32708 (comment), and the qualitative message of the example remains unchanged.

@ogrisel
Member

ogrisel commented Dec 29, 2025

Similar observations and conclusion for:

  • examples/ensemble/plot_gradient_boosting_oob.py
    [plot from the local run]
    https://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_oob.html

  • examples/ensemble/plot_gradient_boosting_regression.py
    [plots from the local run]
    https://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_regression.html

  • examples/ensemble/plot_gradient_boosting_regularization.py
    [plot from the local run]
    https://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_regularization.html
