RFC Unify old GradientBoosting estimators and HGBT #27873

lorentzenchr opened this issue Nov 29, 2023 · 7 comments

@lorentzenchr
Member

lorentzenchr commented Nov 29, 2023

Current situation

We are in the unfortunate situation of having 2 different versions of gradient boosting: the old estimators (GradientBoostingClassifier and GradientBoostingRegressor) as well as the new ones using binning and histogram strategies similar to LightGBM (HistGradientBoostingClassifier and HistGradientBoostingRegressor).

This makes advertising the new ones harder, e.g. #26826, and also results in a larger feature gap between the two.
Based on discussions in #27139 and during a monthly meeting (maybe not documented), I'd like to call for comments on the following:

Proposition

Unify both types of gradient boosting in a single class, i.e. keep the old names GradientBoostingClassifier and GradientBoostingRegressor and make them switch the underlying implementation based on a parameter value, e.g. max_bins (None -> old classes, integer -> new classes).

Note that binning and histograms are not the only difference.
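To make this concrete, here is a minimal sketch of the dispatching idea (class name and attributes are purely illustrative, not the proposed API):

```python
# Illustrative sketch only -- not scikit-learn's actual implementation.
# It assumes the unified class keeps the old name and forwards to one of
# the two existing implementations depending on `max_bins`.
from sklearn.ensemble import (
    GradientBoostingRegressor as _ExactGBR,
    HistGradientBoostingRegressor as _HistGBR,
)


class UnifiedGradientBoostingRegressor:
    """Hypothetical unified estimator dispatching on max_bins."""

    def __init__(self, max_bins=None, **params):
        self.max_bins = max_bins
        self.params = params

    def fit(self, X, y):
        if self.max_bins is None:
            # None -> old exact splitter (current GradientBoostingRegressor)
            self._impl = _ExactGBR(**self.params)
        else:
            # integer -> new histogram-based estimator (current HGBT)
            self._impl = _HistGBR(max_bins=self.max_bins, **self.params)
        self._impl.fit(X, y)
        return self

    def predict(self, X):
        return self._impl.predict(X)
```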

Comparison

Algorithm

The old GBT uses Friedman gradient boosting with a line search step. (The line search sometimes, e.g. for the log loss, uses a second-order approximation and is therefore sometimes called "hybrid gradient-Newton boosting".) The trees are learned on the gradients. A tree searches for the best split among all (very many) split candidates for all features. After a single tree is fit, the terminal node values are re-computed, which corresponds to a line search step.

The new HGBT uses a second-order approximation of the loss, i.e. gradients and hessians (as in the XGBoost paper, therefore sometimes called Newton boosting). In addition, it bins/discretizes the features X and uses a histogram of gradients/hessians/counts per feature. A tree then searches for the best split candidate, but there are only n_features * n_bins candidates (much fewer than in GBT).
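A rough sketch of the binning/histogram idea for a single feature (illustrative NumPy only, assuming the squared error so the per-sample hessians are constant; not HGBT's actual code):

```python
# Conceptual sketch of the histogram trick for one feature.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)          # one raw feature
gradients = rng.normal(size=1000)  # per-sample gradients of the loss

n_bins = 8
# 1) bin/discretize the feature once, e.g. at quantile edges
edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
binned = np.searchsorted(edges, x)  # integer bin index per sample

# 2) accumulate gradient sums and counts per bin (the "histogram")
grad_hist = np.bincount(binned, weights=gradients, minlength=n_bins)
count_hist = np.bincount(binned, minlength=n_bins)

# 3) only n_bins - 1 split candidates per feature need to be evaluated,
#    using cumulative sums of the histogram, instead of one candidate per
#    distinct sample value as in the exact splitter.
left_grad = np.cumsum(grad_hist)[:-1]
left_count = np.cumsum(count_hist)[:-1]
```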

| estimator | trees train on | node values (consequence of tree training) | features X |
|---|---|---|---|
| GBT | gradients | re-computed in line search | used as is |
| HGBT | gradients/hessians | sum(gradient)/sum(hessian) | binned/discretized |

In fact, one could use a second-order loss approximation (gradients and hessians) without binning X, and vice versa, use binning while fitting trees on gradients only (without hessians).
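As a toy illustration of the difference in terminal node values (a sketch assuming the binary log loss; not the code of either estimator):

```python
# Toy illustration of how a single leaf value is obtained in the two
# estimators, assuming the binary log loss.
import numpy as np

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200).astype(float)   # targets in one leaf
raw_pred = rng.normal(size=200)                  # current raw predictions
p = 1.0 / (1.0 + np.exp(-raw_pred))              # predicted probabilities

gradients = p - y            # first derivative of the log loss
hessians = p * (1.0 - p)     # second derivative

# Old GBT: the tree is grown on the gradients, so its raw leaf prediction
# would be their mean ...
naive_leaf = gradients.mean()
# ... but it is then re-computed by a line search; for the log loss this is
# a single Newton step ("hybrid gradient-Newton boosting"):
leaf_value_gbt = -gradients.sum() / hessians.sum()

# New HGBT: gradients and hessians drive both split finding and the leaf
# value, which is directly sum(gradient) / (sum(hessian) + l2):
l2_regularization = 0.0
leaf_value_hgbt = -gradients.sum() / (hessians.sum() + l2_regularization)
```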

Parameters

| HistGradientBoostingRegressor | GradientBoostingRegressor | Same | Comment |
|---|---|---|---|
| loss | loss | ✅ | |
| quantile | alpha | ❌ | |
| learning_rate | learning_rate | ✅ | |
| max_iter | n_estimators | ❌ | #12807 (comment) |
| max_leaf_nodes | max_leaf_nodes | ✅ | |
| max_depth | max_depth | ✅ | |
| min_samples_leaf | min_samples_leaf | ✅ | |
| l2_regularization | ⛔ | ❌ | |
| max_features | max_features | ✅ | |
| max_bins | ⛔ (nonsense) | ❌ | |
| categorical_features | ⛔ | ❌ | |
| monotonic_cst | ⛔ | ❌ | #27305 |
| interaction_cst | ⛔ | ❌ | |
| warm_start | warm_start | ✅ | |
| early_stopping | ⛔ | ❌ | |
| scoring | ⛔ | ❌ | |
| validation_fraction | validation_fraction | ✅ | |
| n_iter_no_change | n_iter_no_change | ✅ | |
| tol | tol | ✅ | |
| verbose | verbose | ✅ | |
| random_state | random_state | ✅ | |
| class_weight | ⛔ | ❌ | |
| ⛔ | subsample | ❌ | #16062 |
| ⛔ (nonsense) | criterion | ❌ | |
| ⛔ | min_samples_split | ❌ | |
| ⛔ | min_weight_fraction_leaf | ❌ | |
| ⛔ | min_impurity_decrease | ❌ | |
| ⛔ | init | ❌ | #27109 |
| ⛔ | ccp_alpha | ❌ | |

In fact, only the quantile/alpha and max_iter/n_estimators parameter pairs are conflicting.
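A hypothetical sketch of how those two conflicting pairs could be handled during a deprecation cycle (the mapping and the warning text are illustrative, not an agreed design):

```python
# Hypothetical parameter-renaming shim for the two conflicting pairs.
import warnings

_RENAMED = {"alpha": "quantile", "n_estimators": "max_iter"}


def _resolve_params(params):
    """Map old GradientBoosting* names to the unified (HGBT-style) names."""
    resolved = {}
    for name, value in params.items():
        if name in _RENAMED:
            new_name = _RENAMED[name]
            warnings.warn(
                f"Parameter {name!r} was renamed to {new_name!r} and will be "
                "removed in a future release.",
                FutureWarning,
            )
            name = new_name
        resolved[name] = value
    return resolved


# Example: an old-style call keeps working but warns.
print(_resolve_params({"n_estimators": 100, "learning_rate": 0.1}))
```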

@github-actions github-actions bot added the Needs Triage Issue requires triage label Nov 29, 2023
@lorentzenchr lorentzenchr added RFC and removed Needs Triage Issue requires triage labels Nov 29, 2023
@adrinjalali
Member

This seems like a somewhat painful path (deprecation-wise), but I think we can greatly benefit from it. So I'm overall +1.

@glevv
Contributor

glevv commented Dec 6, 2023

About first- and second-order optimization: I think catboost has an interesting take on it, where you can select the optimizer order by setting leaf_estimation_method to {Newton, Gradient, Exact}.
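For reference, a small example of that CatBoost parameter (requires the catboost package; the accepted values are the ones listed above):

```python
# Small example of selecting the optimizer order in CatBoost.
from catboost import CatBoostRegressor

model = CatBoostRegressor(
    leaf_estimation_method="Newton",  # alternatives: "Gradient", "Exact"
    verbose=False,
)
```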

This will be a huge refactoring and documentation rewrite. From the user standpoint, it is really frustrating when a lot of parameter interactions are not documented: you set one parameter and it actually changes the available options of other parameters, and you will never know this until you try (catboost is actually a great example of such behaviour).

Also about HGBM: I think there are some features missing that make it a bit less appealing than XGBoost or LightGBM, for example subsampling (of samples and features).

@betatim
Member

betatim commented Dec 7, 2023

I support the idea of making HistGradientBoosting* available as GradientBoosting*, and therefore the default. Our best "product" should be available at the prime location of our shop (aka name), not hidden under the counter. I think the proposed method of using the value of n_bins to choose between the two is the best idea so far for how to do this. But it does require significant effort and a few deprecation cycles :-/ c'est la vie.

I can see that (massively?) switching the algorithm you get based on a constructor argument can create confusion. However, I wonder how many people who use gradient boosting actually care about the precise algorithm. I'd include people who can't explain the differences between the two in the "don't care" category (this includes me).


Having both versions available under a dedicated name, say, HistGradientBoosting* and ExactGradientBoosting* (?), could be nice for people who want to be super explicit. You could take this idea further by offering more classes that use an estimator but hardwire some of the constructor arguments, as sketched below. The goal would be to make it simpler for users (dealing with the forest of constructor args) and increase discoverability. A downside would be a proliferation of class names. But this is going off-topic.
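A minimal sketch of such a thin, explicit wrapper (the name ExactGradientBoostingRegressor and the chosen hardwired arguments are hypothetical):

```python
# Purely hypothetical sketch of a dedicated, explicit alias class;
# ExactGradientBoostingRegressor does not exist in scikit-learn.
from sklearn.ensemble import GradientBoostingRegressor


class ExactGradientBoostingRegressor(GradientBoostingRegressor):
    """Alias exposing only a few common knobs and hardwiring the rest."""

    def __init__(self, learning_rate=0.1, n_estimators=100, max_depth=3):
        super().__init__(
            learning_rate=learning_rate,
            n_estimators=n_estimators,
            max_depth=max_depth,
            # everything else stays at the library defaults
        )
```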

@amueller
Member

We switch the algorithm with a constructor parameter in many models, right (for good or bad)?
I think unification would be nice, but painful and laborious.

@glevv the subsampling of features and samples could be achieved with a BaggingClassifier around it, right (see the sketch below)? But I guess that's not entirely the same thing. Both of these would be quite easy to add; I think we just didn't want to explode the number of hyper-parameters.
I think dropping the tree pruning parameters from GradientBoostingRegressor wouldn't be too bad, I don't think anyone in their right mind uses anything but max_depth or max_leaf_nodes.
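For reference, a sketch of that bagging workaround (assuming scikit-learn ≥ 1.2, where the parameter is named estimator); note the subsampling happens per bagged member, not per boosting iteration, so it is indeed not the same thing as LightGBM-style subsampling:

```python
# Row/feature subsampling via bagging around HGBT.
from sklearn.ensemble import BaggingClassifier, HistGradientBoostingClassifier

clf = BaggingClassifier(
    estimator=HistGradientBoostingClassifier(max_iter=100),
    n_estimators=10,
    max_samples=0.8,    # subsample rows for each bagged estimator
    max_features=0.8,   # subsample features for each bagged estimator
    random_state=0,
)
```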

@betatim
Member

betatim commented Mar 11, 2024

An idea I had relating to this issue: what do people think of a small tool that transforms users' code? It could either be an automatic tool that makes the required edits, or more like a linter that points you to things you should investigate. Taking it one step further, a linter for your scikit-learn usage would be cool. It could point out transformations you could make, anti-patterns, etc.

@adrinjalali
Member

If we do such a tool, I think it'd be nice to have it be a collaboration between a bunch of other projects in the pydata/scipy space.

@betatim
Member

betatim commented Apr 9, 2024

Someone just pointed me to https://numpy.org/devdocs/numpy_2_0_migration_guide.html#ruff-plugin

There is nothing new under the sun :D
