RFC Unify old GradientBoosting estimators and HGBT #27873
Comments
This seems like a somewhat painful path (deprecation-wise), but I think we can greatly benefit from it. So I'm overall +1.
About first and second order optimizations, I think catboost has an interesting take on it, where you can select the optimizer order by setting a parameter.

This will be a huge refactoring and documentation rewrite. From the user's standpoint, it is really frustrating when many parameter interactions are not documented: you set one parameter and it actually changes the available options of other parameters, and you never know this until you try (catboost is actually a great example of such behaviour). Also about HGBM: I think there are some features missing that make it a bit less appealing than XGBoost or LightGBM, for example subsampling (of samples and features).
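For reference, a minimal illustration of the CatBoost option mentioned above, assuming the comment refers to the `leaf_estimation_method` parameter (this is a hedged illustration, not part of the proposal):

```python
# Illustration only: CatBoost exposes the optimizer order explicitly.
# "Gradient" uses 1st order information only, "Newton" uses 2nd order.
from catboost import CatBoostRegressor

first_order = CatBoostRegressor(leaf_estimation_method="Gradient", verbose=False)
second_order = CatBoostRegressor(leaf_estimation_method="Newton", verbose=False)
```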
I support the idea of unifying them. I can see that by switching (massively?) the algorithm you get based on a constructor argument, you can create confusion. However, I wonder how many people who use gradient boosting actually care about the precise algorithm. I'd include people who can't explain the differences between the two in the category of "don't care" (this includes me). Having both versions available under a dedicated parameter value would still let those who do care choose explicitly.
We switch the algorithm with a constructor parameter in many models, right (for good or bad)? @glevv the subsampling of features and samples could be achieved with a …
An idea I had relating to this issue: what do people think of a small tool that transforms users' code? It could either be an automatic tool that makes the required edits, or maybe more like a linter that points you to things that you should investigate. Taking it one step further, a linter for your scikit-learn usage would be cool. It could point out transformations you could make, anti-patterns, etc.
If we do such a tool, I think it'd be nice to have it be a collaboration between a bunch of other projects in the pydata/scipy space.
Someone just pointed me to https://numpy.org/devdocs/numpy_2_0_migration_guide.html#ruff-plugin. There is nothing new under the sun :D
Current situation
We have the unfortunate situation of having 2 different versions of gradient boosting: the old estimators (`GradientBoostingClassifier` and `GradientBoostingRegressor`) as well as the new ones using binning and histogram strategies similar to LightGBM (`HistGradientBoostingClassifier` and `HistGradientBoostingRegressor`). This makes advertising the new ones harder, e.g. #26826, and also results in a larger feature gap between the two.
Based on discussions in #27139 and during a monthly meeting (maybe not documented), I'd like to call for comments on the following:
Proposition
Unify both types of gradient boosting in a single class, i.e. keep the old names `GradientBoostingClassifier` and `GradientBoostingRegressor`, and make them switch the underlying estimator class based on a parameter value, e.g. `max_bins` (`None` -> old classes, integer -> new classes). Note that binning and histograms are not the only difference.
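To make the proposal concrete, here is a minimal sketch of how the dispatch could look, assuming `max_bins=None` selects the old exact-split implementation (the helper name and exact behaviour are illustrative, not a committed API):

```python
# Illustrative sketch only: dispatch on `max_bins` between the two existing
# estimator families. Not a proposed implementation of the unified class.
from sklearn.ensemble import (
    GradientBoostingRegressor,
    HistGradientBoostingRegressor,
)

def make_gbt_regressor(max_bins=None, **params):
    """Return the exact-split GBT when max_bins is None,
    otherwise the histogram-based HGBT."""
    if max_bins is None:
        return GradientBoostingRegressor(**params)
    return HistGradientBoostingRegressor(max_bins=max_bins, **params)
```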
Comparison
Algorithm
The old GBT uses Friedman gradient boosting with a line search step. (The line search sometimes, e.g. for log loss, uses a 2nd order approximation and is therefore sometimes called "hybrid gradient-Newton boosting".) The trees are learned on the gradients. A tree searches for the best split among all (veeeery many) split candidates for all features. After a single tree is fit, the terminal node values are re-computed, which corresponds to a line search step.
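A minimal sketch of this scheme for squared error only, where the line-search leaf value coincides with the mean residual per leaf; the function names are illustrative and this is not the scikit-learn implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbt_squared_error(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    """Fit a toy gradient-boosted ensemble for squared error."""
    baseline = y.mean()                     # initial constant prediction
    pred = np.full_like(y, baseline, dtype=float)
    trees = []
    for _ in range(n_estimators):
        residual = y - pred                 # negative gradient of squared error
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        # For squared error, the optimal ("line search") value of each leaf is
        # the mean residual of the samples in that leaf, which the regression
        # tree already stores, so no explicit re-computation is needed here.
        pred += learning_rate * tree.predict(X)
        trees.append(tree)
    return baseline, trees

def predict_gbt(X, baseline, trees, learning_rate=0.1):
    return baseline + learning_rate * sum(tree.predict(X) for tree in trees)
```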
The new HGBT uses a 2nd order approximation of the loss, i.e. gradients and hessians (XGBoost paper, therefore sometimes called Newton boosting). In addition, it bins/discretizes the features `X` and uses a histogram of gradients/hessians/counts per feature. A tree then searches for the best split candidate, but there are only n_features * n_bins candidates (muuuuch less than in GBT). The terminal node values are given by `sum(gradient)/sum(hessian)` per leaf, so no separate line search step is needed.
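A minimal sketch of these two ingredients (binning a feature and histogram-based split finding with the usual 2nd order gain); this is illustrative only and omits regularization details, missing values, etc.:

```python
import numpy as np

def bin_feature(x, n_bins=255):
    """Map a continuous feature to integer bins via (approximate) quantiles."""
    edges = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1]))
    return np.searchsorted(edges, x)

def best_split(binned_x, gradients, hessians, l2=1.0):
    """Scan bin boundaries and return (bin, gain) of the best split,
    with leaf values -sum(g) / (sum(h) + l2) as in Newton boosting."""
    n_bins = int(binned_x.max()) + 1
    g_hist = np.bincount(binned_x, weights=gradients, minlength=n_bins)
    h_hist = np.bincount(binned_x, weights=hessians, minlength=n_bins)
    g_tot, h_tot = g_hist.sum(), h_hist.sum()
    best_gain, best_bin = -np.inf, None
    g_left = h_left = 0.0
    for b in range(n_bins - 1):             # only n_bins - 1 candidates per feature
        g_left += g_hist[b]
        h_left += h_hist[b]
        g_right, h_right = g_tot - g_left, h_tot - h_left
        gain = (g_left**2 / (h_left + l2)
                + g_right**2 / (h_right + l2)
                - g_tot**2 / (h_tot + l2))
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_bin, best_gain
```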
In fact, one could use a 2nd order loss (gradients and hessians) without binning `X`, and vice versa, use binning while fitting trees on gradients only (without hessians).

Parameters
Comparing the parameters of `HistGradientBoostingRegressor` and `GradientBoostingRegressor`: in fact, only the quantile/alpha and max_iter/n_estimators parameters are conflicting.
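As a concrete illustration of these two naming conflicts (a hypothetical mapping, not an existing scikit-learn utility):

```python
# Hypothetical rename map between the old and the histogram-based regressors,
# covering only the two conflicts mentioned above; illustrative, not an API.
OLD_TO_HIST = {
    "n_estimators": "max_iter",  # number of boosting iterations
    "alpha": "quantile",         # quantile level for the quantile loss
}

def translate_params(old_params):
    """Rename old GradientBoostingRegressor parameters to HGBT naming."""
    return {OLD_TO_HIST.get(key, key): value for key, value in old_params.items()}
```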