Should we turn on early stopping in HistGradientBoosting by default? #14303

Using n_iter_no_change=5 (or 10) in HistGradientBoosting makes a huge difference in terms of speed for me, and it seems to be harmless (at a cursory look).
While these models are still experimental, should we make this change?
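A minimal sketch (not from the issue) of what the proposal amounts to: passing an integer n_iter_no_change to the experimental estimator so that training stops once the validation score stops improving. This assumes a scikit-learn release from around this time (~0.21/0.22), where the estimator still sits behind the experimental import; later releases added a separate early_stopping parameter. The dataset and parameter values are illustrative only.

```python
# Sketch: enabling early stopping on the experimental HistGradientBoostingClassifier.
from sklearn.experimental import enable_hist_gradient_boosting  # noqa: F401
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=10_000, random_state=0)

clf = HistGradientBoostingClassifier(
    max_iter=1000,
    n_iter_no_change=5,       # stop after 5 iterations without improvement
    validation_fraction=0.1,  # held-out fraction used for the early-stopping check
    random_state=0,
)
clf.fit(X, y)
print(clf.n_iter_)  # typically far fewer than max_iter once early stopping kicks in
```
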
Comments
I thought it was on by default. Is it not in lightgbm?

Hmm, I can ask them. But I'd be OK too with it being active by default.

I don't think it is in lightgbm. My concern is more about the inconsistent default behavior in scikit-learn: why would early stopping be on by default in GBRT and not in logistic regression, for instance (besides the fact that it is not implemented there)? But aside from that concern, +1 as well.

My opinion is that we should strive for the best defaults, even if it comes at a small cost in consistency.

I opened microsoft/LightGBM#2270. It is not enabled by default because they require the validation set to be passed explicitly by the user. They want to give the user as much liberty as possible, since the train/val split can be application specific. We don't have this "problem": we always use train_test_split.

Hence, I think that we should turn on early stopping :)
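
To illustrate the contrast discussed above, here is a hedged sketch (not from the thread) of what LightGBM asks of the user: the validation data must be split and passed explicitly, whereas HistGradientBoosting carves off a validation set internally via validation_fraction. It assumes a reasonably recent LightGBM (>= 3.3), where early stopping is requested through a callback; older versions used an early_stopping_rounds fit parameter instead.

```python
# Sketch: LightGBM's scikit-learn wrapper with an explicit, user-supplied eval_set.
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10_000, random_state=0)
# The user decides how to split; LightGBM only sees the eval_set it is given.
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = lgb.LGBMClassifier(n_estimators=1000)
model.fit(
    X_train,
    y_train,
    eval_set=[(X_val, y_val)],
    callbacks=[lgb.early_stopping(stopping_rounds=10)],
)
print(model.best_iteration_)  # iteration at which the validation score peaked
```
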
Closing in favor of #14503 :)