Tweedie deviance loss for tree based models #16668
Comments
I would start with the Gradient Boosting models: they already have the notion of link functions tied to their loss. This is necessary for Bernoulli / logit expected values in binary classification and categorical / softmax expected values in multiclass classification, for instance. At the moment the link function is hard-coded for each loss, and I think this is fine as a first step. +1 for using the log link for Tweedie / Poisson / Gamma by default.
For pure tree-based and RF models, we will indeed need something more hackish, but I would focus on gradient boosting first.
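To make the link-function point above concrete, here is a minimal NumPy sketch (illustrative only, not scikit-learn internals): with a log link, the raw ensemble output can be any real number, while the predicted expected value stays strictly positive, as the Poisson/Gamma/Tweedie deviances require.

```python
import numpy as np
from scipy.special import xlogy

# Raw ensemble output on the link (log) scale; may be any real number.
raw_prediction = np.array([-2.0, 0.0, 1.5])

# Log link: the predicted expected value exp(raw) is always strictly positive.
y_pred = np.exp(raw_prediction)

# Half Poisson deviance per sample; xlogy handles the y == 0 case gracefully.
y_true = np.array([0.0, 1.0, 3.0])
half_dev = xlogy(y_true, y_true / y_pred) - y_true + y_pred
print(y_pred, half_dev.mean())
```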
@ogrisel Fast random forests are also on my personal wish list, as they provide an excellent no-brainer baseline model. As for:

> Let's keep things simple for now and not introduce the power parameter (we don't have that kind of flexible loss API yet)

Yes please!!
Compound Poisson-Gamma (power strictly in (1, 2)) is nice too because it allows a mixture of a point mass at zero and a continuous distribution on R+, which is not possible to model with either Poisson or Gamma alone (plus the optimal variance function is not necessarily linear or quadratic). I think we could have an extra `power` constructor param (set to `None` by default) that would only be used when `loss="tweedie_deviance"`?
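(Side note, not part of the proposal above: the `power` parameter already exists on the metric side as `sklearn.metrics.mean_tweedie_deviance`, so a quick sketch can show how it interpolates between Poisson at `power=1` and Gamma at `power=2`; zeros in `y_true` are allowed for `1 <= power < 2`.)

```python
import numpy as np
from sklearn.metrics import mean_tweedie_deviance

y_true = np.array([0.0, 1.0, 2.0, 5.0])   # zeros allowed for 1 <= power < 2
y_pred = np.array([0.5, 1.2, 1.8, 4.0])   # predictions must stay strictly positive

for power in [1.0, 1.5, 1.9]:
    print(power, mean_tweedie_deviance(y_true, y_pred, power=power))
```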
I agree compound Poisson-Gamma is useful, but I was hoping that the first PR would contain only non-controversial content (i.e. no API change), to avoid long discussions like for the GLMs, at least for now.
Alright.
You mean making scikit-learn RF faster by implementing histogram-based splits? Or having a fast RF implementation with Poisson/Gamma/Tweedie response variables by adding such criteria? Nowadays, whenever I tried HistGradientBoostingClassifier/Regressor vs RandomForestClassifier/Regressor, both with default hyperparameters, the former was always the faster one, with better predictive performance. Hyper-parameter tuning often improves things a bit but is rarely critical. So to me, HistGradientBoostingClassifier/Regressor is the new no-brainer baseline.
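For what it's worth, that comparison is easy to reproduce with a rough sketch like the one below (synthetic data, default hyperparameters; actual timings and scores depend on the dataset):

```python
from time import perf_counter

from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=20_000, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (HistGradientBoostingRegressor(), RandomForestRegressor()):
    tic = perf_counter()
    model.fit(X_train, y_train)
    fit_time = perf_counter() - tic
    print(type(model).__name__, f"fit: {fit_time:.1f}s", f"R^2: {model.score(X_test, y_test):.3f}")
```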
@ogrisel Can you read minds? Around the same time you answered yesterday, a friend of mine came up with the idea of a histogram random forest 😄 That would be great to have, but any RF with a Poisson splitting criterion will do. So far, they are hard to find in the ecosystem.
@ogrisel @lorentzenchr Can I work on the other cases, or is someone already doing so?
Describe the workflow you want to enable
If the target `y` is (approximately) Poisson, Gamma or otherwise Tweedie distributed, it would be beneficial for tree-based regressors to support Tweedie deviance loss functions as splitting criterion. This partially addresses #5975.
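For illustration, such a target can be simulated as a compound Poisson-Gamma variable (a Tweedie distribution with `1 < power < 2`); the parameter values below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10_000

# Compound Poisson-Gamma: y is a sum of N Gamma draws, with N ~ Poisson.
# This yields an exact point mass at zero plus a continuous part on (0, inf).
n_claims = rng.poisson(lam=0.8, size=n_samples)
y = np.array([rng.gamma(shape=2.0, scale=50.0, size=n).sum() for n in n_claims])

print("fraction of exact zeros:", (y == 0).mean())
```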
Describe your proposed solution
Ideally, one first implements … and then adds the different loss criteria to the tree-based models (a usage sketch follows the list below):
- `DecisionTreeRegressor` (Poisson only): [MRG] ENH add Poisson splitting criterion for single trees #17386
- `RandomForestRegressor` (Poisson only): ENH Adds Poisson criterion in RandomForestRegressor #19304, #19836
- `GradientBoostingRegressor`
- `HistGradientBoostingRegressor` (Poisson and Gamma, but no other Tweedie cases): ENH Poisson loss for HistGradientBoostingRegressor #16692
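As a usage sketch, and assuming the Poisson criterion/loss from the PRs linked above is available in the installed scikit-learn version (the Gamma and other Tweedie cases remain open), fitting could look like this:

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor, RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(5_000, 5))
y = rng.poisson(lam=np.exp(0.3 * X[:, 0]))  # count target with a log-linear mean

# Poisson splitting criterion / loss; Gamma and other Tweedie powers are the
# part of this issue that is still open.
tree = DecisionTreeRegressor(criterion="poisson", max_depth=4).fit(X, y)
forest = RandomForestRegressor(criterion="poisson", n_estimators=50).fit(X, y)
hgb = HistGradientBoostingRegressor(loss="poisson").fit(X, y)
```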
Open for Discussion
For Poisson and Tweedie deviance with `1 <= power < 2`, the target `y` may be zero while the prediction `y_pred` must be strictly larger than zero. A tree might find a split where one node has `y = 0` for all samples in that node, resulting naively in `y_pred = mean(y) = 0` for that node. I see 3 different solutions to that:

1. `y_pred = np.exp(tree)`. See ENH Poisson loss for HistGradientBoostingRegressor #16692 for HistGradientBoostingRegressor. This may be no option for DecisionTreeRegressor.
2. Forbid splits that result in a node with `sum(y) = 0`. One might also introduce some option like `min_y_weight`, such that splits with `sum(sample_weight * y) < min_y_weight` are forbidden.
3. Set `y_pred = a * mean(y) + (1 - a) * y_pred_parent` and forbid further splits, see [1]. (Bayes/credibility theory motivates setting `a = sum(sample_weight * y) / (gamma + sum(sample_weight * y))` for some hyperparameter `gamma`; a numeric sketch follows below.)

There is also a dirty solution that allows `y_pred = 0` but uses the value `max(eps, y_pred)` in the loss function for some tiny value of `eps`.
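A tiny numeric sketch of the third option (shrinkage toward the parent node's prediction; the value of `gamma` is arbitrary): with `sum(sample_weight * y) == 0`, the leaf prediction falls back entirely to the strictly positive parent prediction.

```python
import numpy as np

y = np.zeros(8)                      # leaf where all targets are zero
sample_weight = np.ones_like(y)
y_pred_parent = 0.7                  # strictly positive prediction of the parent node
gamma = 1.0                          # credibility hyperparameter (arbitrary value)

a = np.sum(sample_weight * y) / (gamma + np.sum(sample_weight * y))
y_pred = a * np.mean(y) + (1 - a) * y_pred_parent
print(y_pred)  # 0.7: shrunk fully toward the parent because sum(w * y) == 0
```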
References
[1] R rpart library, chapter 8 Poisson regression