Tweedie deviance loss for tree based models #16668


Open · 3 of 5 tasks
lorentzenchr opened this issue Mar 10, 2020 · 12 comments

@lorentzenchr (Member) commented Mar 10, 2020

Describe the workflow you want to enable

If the target y is (approximately) Poisson, Gamma or otherwise Tweedie distributed, it would be beneficial for tree based regressors to support Tweedie deviance loss functions as splitting criteria. This partially addresses #5975.
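
For reference, a minimal sketch of the per-sample unit Tweedie deviance that such a splitting criterion would minimize per node; the function below is illustrative, not scikit-learn code (scikit-learn exposes the aggregated version as `sklearn.metrics.mean_tweedie_deviance`):

```python
import numpy as np
from scipy.special import xlogy

def unit_tweedie_deviance(y, mu, power):
    """Unit Tweedie deviance d(y, mu): power=1 is Poisson, power=2 is Gamma.

    For 1 <= power < 2, y may be zero, but mu must be strictly positive.
    """
    y, mu = np.asarray(y, dtype=float), np.asarray(mu, dtype=float)
    if power == 1:
        # Poisson deviance; xlogy gives y * log(y / mu) = 0 at y = 0
        return 2 * (xlogy(y, y / mu) - y + mu)
    if power == 2:
        # Gamma deviance; requires y > 0
        return 2 * (np.log(mu / y) + y / mu - 1)
    # General Tweedie deviance for 1 < power < 2: finite at y = 0
    p = power
    return 2 * (y ** (2 - p) / ((1 - p) * (2 - p))
                - y * mu ** (1 - p) / (1 - p)
                + mu ** (2 - p) / (2 - p))
```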

Describe your proposed solution

Ideally, one first implements

and then adds the different loss criteria to the tree based models:

Open for Discussion

For Poisson and Tweedie deviance with 1<=power<2, the target y may be zero while the prediction y_pred must be strictly positive. A tree might find a split where one node has y=0 for all samples in that node, naively resulting in y_pred = mean(y) = 0 for that node. I see 3 different solutions to that:

  1. Use a log-link function, i.e. predict y_pred = np.exp(tree).
    See ENH Poisson loss for HistGradientBoostingRegressor #16692 for HistGradientBoostingRegressor. This may not be an option for DecisionTreeRegressor.
  2. Use a splitting rule that forbids splits where one node has sum(y)=0.
    One might also introduce an option like min_y_weight, such that splits with sum(sample_weight*y) < min_y_weight are forbidden.
  3. Use some form of parent-child average y_pred = a * mean(y) + (1-a) * y_pred_parent and forbid further splits, see [1] and the sketch below.
    (Bayes/credibility theory motivates setting a = sum(sample_weight*y)/(gamma+sum(sample_weight*y)) for some hyperparameter gamma.)

There is also a dirty solution that allows y_pred=0 but plugs max(eps, y_pred) into the loss function for some tiny value of eps.
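
A minimal sketch of option 3's node estimate, assuming the splitter can pass down the parent's prediction; node_value and the gamma default here are illustrative, not an existing scikit-learn API:

```python
import numpy as np

def node_value(y, sample_weight, y_pred_parent, gamma=1.0):
    """Credibility-weighted node estimate a * mean(y) + (1 - a) * y_pred_parent.

    With a = sum(sample_weight * y) / (gamma + sum(sample_weight * y)),
    a node with sum(y) = 0 gets a = 0 and inherits the parent's strictly
    positive prediction instead of the forbidden y_pred = 0.
    """
    sw_y = np.sum(sample_weight * y)
    a = sw_y / (gamma + sw_y)
    mean_y = np.average(y, weights=sample_weight)
    return a * mean_y + (1 - a) * y_pred_parent
```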

References

[1] R rpart package vignette, chapter 8: Poisson regression

@ogrisel (Member) commented Mar 13, 2020

I would start with HistGradientBoostingRegressor, which is orders of magnitude more scalable, and therefore more useful, than the other models.

The gradient boosting models already have the notion of a link function tied to their loss. This is necessary, for instance, to get Bernoulli / logit expected values for binary classification and categorical / softmax expected values for multiclass classification. At the moment the link function is hard-coded for each loss and I think this is fine as a first step.

+1 for using the log link for Tweedie / Poisson / Gamma by default.
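
For illustration, a usage sketch with the Poisson loss proposed in #16692 for HistGradientBoostingRegressor (which later landed in scikit-learn 0.23, initially behind the `enable_hist_gradient_boosting` experimental import); the synthetic data is an arbitrary assumption for the example:

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))
y = rng.poisson(lam=np.exp(X[:, 0]))  # count target, may contain zeros

# loss="poisson" uses a built-in log link: the trees fit raw values on the
# log scale and predict() returns exp(raw), so y_pred > 0 by construction.
model = HistGradientBoostingRegressor(loss="poisson").fit(X, y)
assert (model.predict(X) > 0).all()
```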

@ogrisel (Member) commented Mar 13, 2020

For pure tree-based and RF models we will indeed need something more hackish, but I would focus on gradient boosting first.

@ogrisel (Member) commented Mar 13, 2020

@NicolasHug (Member)

I'd be happy to review a PR

(I'd also love to actually submit one but it's better if I review, given that the number of reviewers for the HGBDT is pretty small).

BTW @ogrisel if you could take a look at #15582 that'd be awesome :)

@lorentzenchr (Member, Author)

@ogrisel Fast random forests are also on my personal wish list as they provide an excellent no-brainer baseline model.

For HistGradientBoostingRegressor, would we go just for Poisson and Gamma, or for the whole Tweedie family with 1<=power<=2? Also, if you'd like, I can give it a shot.

@NicolasHug (Member)

> just for Poisson and Gamma or for the whole Tweedie distributions

Let's keep things simple for now and not introduce the power parameter (we don't have that kind of flexible loss API yet)

> If you'd like, I can give it a shot.

Yes please!!

@ogrisel (Member) commented Mar 13, 2020

Compound Poisson-Gamma with power strictly in (1, 2) is nice too because it allows modeling a mixture of a point mass at zero and a continuous distribution on R+, which is not possible with either Poisson or Gamma alone (plus the optimal variance function is not necessarily linear or quadratic).
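
To make this concrete, a Tweedie variable with power in (1, 2) can be simulated as a Poisson number of Gamma summands; the parameters below are arbitrary illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
# Compound Poisson-Gamma: N ~ Poisson(1), y = sum of N iid Gamma(2, 0.5) draws.
counts = rng.poisson(lam=1.0, size=n)
y = np.array([rng.gamma(shape=2.0, scale=0.5, size=k).sum() for k in counts])

# The point mass at zero is P(N = 0) = exp(-1) ≈ 0.368.
print(f"share of exact zeros: {(y == 0).mean():.3f}")
```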

I think we could have an extra `power` constructor param (set to None by default) that would only be used when `loss="tweedie_deviance"`?

@NicolasHug (Member)

I agree compound Poisson-Gamma is useful, but I was hoping that the first PR would only contain non-controversial content (i.e. no API change), to avoid long discussions like for the GLMs, at least for now.

@ogrisel (Member) commented Mar 13, 2020

Alright.

@ogrisel (Member) commented Mar 13, 2020

> @ogrisel Fast random forests are also on my personal wish list as they provide an excellent no-brainer baseline model.

You mean making scikit-learn's RF faster by implementing histogram-based splits?

Or having a fast RF implementation for Poisson/Gamma/Tweedie response variables by adding such losses?

Nowadays, whenever I have tried HistGradientBoostingClassifier/Regressor vs RandomForestClassifier/Regressor, both with default hyperparams, the former was always the fastest and had better predictive performance. Hyper-parameter tuning often improves things a bit but is rarely critical. So to me HistGradientBoostingClassifier/Regressor is the new no-brainer baseline.
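
A quick way to reproduce that comparison on synthetic data (the dataset shape and sizes are arbitrary assumptions):

```python
from time import perf_counter

from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=50_000, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit both models with default hyperparameters and compare time and accuracy.
for est in (HistGradientBoostingRegressor(), RandomForestRegressor()):
    tic = perf_counter()
    est.fit(X_train, y_train)
    print(f"{type(est).__name__}: fit {perf_counter() - tic:.1f}s, "
          f"test R^2 {est.score(X_test, y_test):.3f}")
```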

@lorentzenchr (Member, Author)

@ogrisel Can you read minds? Around the same time you answered yesterday, a friend of mine came up with the idea of a histogram random forest 😄 That would be great to have, but any RF with a Poisson splitting criterion will do. So far, they are hard to find in the ecosystem.

@Reksbril (Contributor) commented Apr 27, 2020

@ogrisel @lorentzenchr can I work on the other cases, or is someone already doing so?

Status: Discussion · No branches or pull requests · 5 participants