MNT: trees/forests/GBT: deprecate "friedman_mse" criterion
#32708
Conversation
sklearn/tree/_classes.py
Outdated
Training using "absolute_error" is significantly slower
than when using "squared_error".
Unrelated to this PR, but the MAE criterion is now at most 10x slower than the MSE one, and usually more like 5x slower. Is it really significantly slower? It will still fit fairly fast for most tabular datasets (fewer than, let's say, 10M points).
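For reference, a minimal timing sketch one could run to check this claim (the dataset shape, depth, and seed below are illustrative assumptions, not numbers from this PR):

```python
import time

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.standard_normal((50_000, 20))
y = rng.standard_normal(50_000)

for criterion in ("squared_error", "absolute_error"):
    tree = DecisionTreeRegressor(criterion=criterion, max_depth=8, random_state=0)
    tic = time.perf_counter()
    tree.fit(X, y)
    # "absolute_error" is expected to be a small constant factor slower.
    print(f"{criterion}: {time.perf_counter() - tic:.2f} s")
```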
I am inclined to remove this sentence. The reason for choosing either squared error or absolute error should not be fit time, but use case / statistically driven.
Let's remove.
Happy to 😄
"friedman_mse" criterion
@pytest.mark.skip("Skip for now")
def test_huber_exact_backward_compat():
I propose to delete this test, as the changes proposed by this PR are not exactly backward compatible, given that the criterion calculations now use a different but equivalent formula.
Another option would be to update it with the current values, but I feel that such a test prevents legitimate changes/improvements that slightly affect any calculation.
I wouldn't skip. What's the difference after the change in this test? What should be the tolerance for it to pass?
What should be the tolerance for it to pass?
rtol=0.1, which doesn't make much sense for a test named test_huber_exact_backward_compat
In this test, the model 100% overfits, so the values checked in the asserts are mostly 0 + some float precision noise I think 😅 , at least the one for which rtol=0.1 would be needed.
So I propose to transform this test into test_huber_overfit, see my commit 4009fd3
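For illustration, a sketch of what such an overfitting test could look like (the dataset, hyperparameters, and tolerance are assumptions; the actual test added in commit 4009fd3 may differ):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def test_huber_overfit():
    # Tiny dataset with unconstrained tree depth: every leaf becomes pure,
    # so the boosting iterations drive the training error down to float noise.
    rng = np.random.RandomState(42)
    X = rng.uniform(size=(10, 2))
    y = rng.uniform(size=10)
    gbt = GradientBoostingRegressor(
        loss="huber",
        n_estimators=200,
        max_depth=None,
        learning_rate=0.5,
        random_state=0,
    )
    gbt.fit(X, y)
    # The training predictions should (nearly) interpolate the targets.
    np.testing.assert_allclose(gbt.predict(X), y, atol=1e-6)
```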
adrinjalali
left a comment
Otherwise LGTM
adrinjalali
left a comment
LGTM. Maybe @lorentzenchr would like to have a look.
Edit: Discussion resolved.
ogrisel
left a comment
Since the concerns raised by #32700 (comment) have been addressed in the follow-up discussion, I think we can finalize the review and merge of this PR.
Besides the following points, it looks good to me.
... )
>>> cross_val_score(estimator, X, y, cv=5, scoring=mean_pinball_loss_95p)
- array([13.6, 9.7, 23.3, 9.5, 10.4])
+ array([14.3, 9.8, 23.9, 9.4, 10.8])
Is this change caused by rounding error discrepancies when switching from criterion="friedman_mse" to criterion="squared_error"? Or is this a consequence of #32707?
It's caused by rounding error discrepancies. #32707 only affects the scale of the impurity I think, see the fix I made temporarily in PR #32699, and here the scale of impurity shouldn't impact the results (trees are controlled by max_depth only).
But since the impurity calculation (criterion.impurity_improvement) is now implemented differently, the rounding errors differ, and when several features have very similar (or identical) splits, the selected split might differ.
This happens for all the losses except loss="absolute_error": since the gradient is -1 or 1 for this loss, I think there are no rounding errors (integers everywhere, and integers small enough to fit in the mantissa of a float64).
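As a toy illustration of this point (unrelated to the actual Cython criterion code), two algebraically equivalent ways of computing the variance reduction of a split generally differ in the last bits of a float64, which is enough to flip ties between otherwise identical candidate splits:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=1000)
left, right = y[:400], y[400:]


def improvement_two_pass(parent, left, right):
    # Variance computed around an explicit mean (two passes over the data).
    return parent.var() - (left.size * left.var() + right.size * right.var()) / parent.size


def improvement_from_sums(parent, left, right):
    # Same quantity from running sums, E[y^2] - E[y]^2, closer to what a
    # tree criterion accumulates while scanning split positions.
    def var(a):
        return (a**2).sum() / a.size - (a.sum() / a.size) ** 2

    return var(parent) - (left.size * var(left) + right.size * var(right)) / parent.size


a = improvement_two_pass(y, left, right)
b = improvement_from_sums(y, left, right)
print(a, b, a == b)  # equal in exact arithmetic, usually not bit-for-bit in float64
```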
and :class:`ensemble.GradientBoostingRegressor`,
as it had no actual effect.
I think we should rather document this as a fix and rephrase this entry accordingly, since this PR changes the default behavior of the estimators by changing the underlying implementation from the buggy "friedman_mse" implementation to the correct "squared_error" implementation (#32707).
Hum... I wrote both a .fix.rst and a .api.rst, each one focuses on a different aspect. Let me know what you think.
@skip_if_32bit
def test_huber_exact_backward_compat():
    """Test huber GBT backward compat on a simple dataset.

    The results to compare against are taken from scikit-learn v1.2.0.
    """
Please undo the changes to this test. It should still pass. If not, this PR introduces a regression.
This test doesn't pass after the changes in this PR because of rounding errors. As mentioned above, in this test (test_huber_exact_backward_compat) the model 100% overfits, so the values checked in the asserts are mostly 0/<some_int> + some float precision noise. Checking the exact values means checking the exact float precision noise, which prevents almost any change, even a meaningful and valid one, from passing.
If not, this PR introduces a regression
No. This test was just not a meaningful test (as I already found and fixed many since I started working on scikit-learn...).
Typically, this test fails when you change median = _weighted_percentile(y_true, sample_weight, 50) to median = _weighted_percentile(y_true, sample_weight, 50, average=True) in HuberLoss.fit_intercept_only, even though I think we consider average=True the better option for computing the weighted median. At least that's what I assumed when I wrote PR #32100, and it was, in a sense, confirmed by @ogrisel, who asked me to test my logic against _weighted_percentile(..., average=True).
I agree that we will need to open another PR to use average=True in HuberLoss.fit_intercept_only, check that optimizing the HuberLoss on a dataset with symmetrically distributed target data and constant features returns the same as np.median whether the sum of integer sample weights is even or odd, and remove this arbitrary bias.
I'd be happy to do that 😄 We have the same thing for the losses AbsoluteError and PinballLoss.
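A numpy-only illustration of the even/odd point above (this uses the unweighted analogue of the private _weighted_percentile helper, not the helper itself): with an even number of points, the "lower" 50th percentile is biased below np.median, while an averaged definition matches it.

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0])  # symmetric data, even number of points

print(np.percentile(y, 50, method="lower"))     # 2.0, the lower middle value
print(np.percentile(y, 50, method="midpoint"))  # 2.5, average of the two middle values
print(np.median(y))                             # 2.5
```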
export_graphviz(clf, out, class_names=[])


def test_friedman_mse_in_graphviz():
Can this test be modified instead of removed? Or is it tested elsewhere?
This test was mostly testing that the string "friedman_mse" was present in appropriate places in the output. We have some much more complete tests in this file (for the squared error criterion). So I think it's fine removing it.
Alternatively, I can keep this test and just ignore the deprecation warnings. And we'll remove it when we remove "friedman_mse" completely.
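If the test were kept, a minimal sketch of that variant could look like the following (the warning category and message are assumptions about how this PR's deprecation is raised):

```python
import pytest


@pytest.mark.filterwarnings("ignore:.*friedman_mse.*:FutureWarning")
def test_friedman_mse_in_graphviz():
    ...  # existing body unchanged, to be removed together with "friedman_mse"
```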
I would rename the test to test_criterion_in_gradient_boosting_graphviz and check that the name of the new criterion ("squared_error") is present in the nodes.
I think this would be redundant with test_graphviz_toy, where we compare against exact outputs (including examples with "squared_error").
Edit: ah no, this test fits GradientBoostingClassifier, ok let's do what you suggest.
Though TBH, I don't think this test makes a lot of sense: why test the export of GB.estimators_[0] and not RF.estimators_[0], ET.estimators_[0], etc.?
I would rather suggest a test (parametrized by the criterion) that fits a decision tree and checks the presence of the criterion name in the exported string. That way we test all criteria, and we avoid importing from sklearn.ensemble in a test from sklearn.tree, which looks like a bad pattern in most cases.
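A sketch of such a parametrized test (the test name, datasets, and parameter values are assumptions, not code from this PR):

```python
import pytest

from sklearn.datasets import load_diabetes, load_iris
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor, export_graphviz


@pytest.mark.parametrize(
    "Tree, loader, criterion",
    [
        (DecisionTreeClassifier, load_iris, "gini"),
        (DecisionTreeClassifier, load_iris, "entropy"),
        (DecisionTreeRegressor, load_diabetes, "squared_error"),
        (DecisionTreeRegressor, load_diabetes, "absolute_error"),
    ],
)
def test_criterion_name_in_graphviz(Tree, loader, criterion):
    X, y = loader(return_X_y=True)
    tree = Tree(criterion=criterion, max_depth=2, random_state=0).fit(X, y)
    dot_data = export_graphviz(tree)  # out_file=None returns the dot source
    # Node labels report the impurity prefixed by the criterion name.
    assert criterion in dot_data
```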
Co-authored-by: Christian Lorentzen <[email protected]>
Co-authored-by: Christian Lorentzen <[email protected]>
The HTTP 403 on
I pushed a few cosmetic fixes. However, I think that the rendering of that example by the CI will not be possible as long as #32961 is not resolved. So here is the output of a local run on my machine:
to be compared with the plots and tables in https://scikit-learn.org/dev/auto_examples/ensemble/plot_gradient_boosting_quantile.html as currently rendered. While we do observe some discrepancies at the third digit for some of the loss values, I am willing to believe that they have the same cause as explained in #32708 (comment), and the qualitative message of the example remains unchanged.
Similar observations and conclusion for:
https://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_oob.html
https://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_regression.html
https://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_regularization.html
Updates to changelog
Co-authored-by: Christian Lorentzen <[email protected]>








Reference Issues/PRs
Towards #32700 (deprecation before complete removal).
Fixes:
criterion="friedman_mse"is buggy for multi-output #32718What does this implement/fix? Explain your changes.
FriedmanMSE(MSE)insklearn/tree/_criterion.pyx"friedman_mse"for trees & forests (if criterion="friedman_mse": criterion="squared_error"+ deprecation warning)criterionparam for gradient boosting
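For readers, a minimal sketch of the deprecation behavior described above (the helper name is hypothetical; the real handling lives in the estimators' parameter validation):

```python
import warnings


def _resolve_criterion(criterion):
    # Hypothetical helper: map the deprecated value to its replacement and warn.
    if criterion == "friedman_mse":
        warnings.warn(
            'criterion="friedman_mse" is deprecated; '
            'using criterion="squared_error" instead.',
            FutureWarning,
        )
        return "squared_error"
    return criterion
```

Passing criterion="squared_error" explicitly avoids the warning.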