Should we turn on early stopping in HistGradientBoosting by default? #14303

Using n_iter_no_change=5 (or 10) in HistGradientBoosting makes a huge difference in terms of speed for me, and it seems to be harmless (at a cursory look).
While these models are still experimental, should we make this change?
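A minimal sketch (not from the issue) of what the proposal amounts to: passing an integer n_iter_no_change to the experimental estimator so that training stops once the validation score stops improving. This assumes a scikit-learn release from around this time (~0.21/0.22), where the estimator still sits behind the experimental import; later releases added a separate early_stopping parameter. The dataset and parameter values are illustrative only.

```python
# Sketch: enabling early stopping on the experimental HistGradientBoostingClassifier.
from sklearn.experimental import enable_hist_gradient_boosting  # noqa: F401
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=10_000, random_state=0)

clf = HistGradientBoostingClassifier(
    max_iter=1000,
    n_iter_no_change=5,       # stop after 5 iterations without improvement
    validation_fraction=0.1,  # held-out fraction used for the early-stopping check
    random_state=0,
)
clf.fit(X, y)
print(clf.n_iter_)  # typically far fewer than max_iter once early stopping kicks in
```
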
Comments
I thought it was on by default. Is it not in lightgbm?

Hmm, I can ask them. But I'd be OK too with it being active by default.

I don't think it is in lightgbm. My concern is more about the inconsistent default behavior in scikit-learn: why would early stopping be on by default in GBRT and not in logistic regression, for instance (besides the fact that it is not implemented there)? But aside from that concern, +1 as well.

My opinion is that we should strive for the best defaults, even if it comes at a small cost in consistency.

I opened microsoft/LightGBM#2270. It is not enabled by default because they require the validation set to be passed explicitly by the user. They want to give the user as much liberty as possible, since the train/val split can be application specific. We don't have this "problem": we always use train_test_split.

Hence, I think that we should turn on early stopping :)
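
To illustrate the contrast discussed above, here is a hedged sketch (not from the thread) of what LightGBM asks of the user: the validation data must be split and passed explicitly, whereas HistGradientBoosting carves off a validation set internally via validation_fraction. It assumes a reasonably recent LightGBM (>= 3.3), where early stopping is requested through a callback; older versions used an early_stopping_rounds fit parameter instead.

```python
# Sketch: LightGBM's scikit-learn wrapper with an explicit, user-supplied eval_set.
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10_000, random_state=0)
# The user decides how to split; LightGBM only sees the eval_set it is given.
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = lgb.LGBMClassifier(n_estimators=1000)
model.fit(
    X_train,
    y_train,
    eval_set=[(X_val, y_val)],
    callbacks=[lgb.early_stopping(stopping_rounds=10)],
)
print(model.best_iteration_)  # iteration at which the validation score peaked
```
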
Closing in favor of #14503 :)