Tweedie deviance loss for tree based models #16668


Description

@lorentzenchr

Describe the workflow you want to enable

If the target y is (approximately) Poisson, Gamma or else Tweedie distributed, it would be beneficial for tree based regressors to support Tweedie deviance loss functions as splitting criterion. This partially addresses #5975.
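As a minimal sketch of the intended workflow, assuming a hypothetical `criterion="poisson"` value (the option proposed here, not an existing `DecisionTreeRegressor` parameter value at the time of writing):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 3))
# Count target whose rate depends on the features -> (approximately) Poisson distributed
y = rng.poisson(lam=np.exp(X[:, 0] + 0.5 * X[:, 1]))

# criterion="poisson" is the feature requested here, not an existing option
reg = DecisionTreeRegressor(criterion="poisson", min_samples_leaf=20)
reg.fit(X, y)
```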

Describe your proposed solution

Ideally, one first implements the Poisson, Gamma and Tweedie deviance loss functions themselves, and then adds the new loss criteria to the tree-based models.
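For reference, a minimal NumPy sketch of the mean Tweedie deviance such a criterion would minimize. It mirrors what `sklearn.metrics.mean_tweedie_deviance` computes, but the function below is an illustrative re-implementation, not the project's code:

```python
import numpy as np
from scipy.special import xlogy

def tweedie_deviance(y, y_pred, power):
    """Mean Tweedie deviance.

    power=0: squared error, power=1: Poisson, power=2: Gamma.
    For 1 <= power < 2 the target y may be zero, but y_pred must be > 0;
    for power >= 2, y must be strictly positive as well.
    """
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if power == 0:
        dev = (y - y_pred) ** 2
    elif power == 1:
        # xlogy treats y * log(y / y_pred) as 0 for y == 0
        dev = 2 * (xlogy(y, y / y_pred) - y + y_pred)
    elif power == 2:
        dev = 2 * (np.log(y_pred / y) + y / y_pred - 1)
    else:
        dev = 2 * (
            np.power(y, 2 - power) / ((1 - power) * (2 - power))
            - y * np.power(y_pred, 1 - power) / (1 - power)
            + np.power(y_pred, 2 - power) / (2 - power)
        )
    return np.mean(dev)
```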

Open for Discussion

For Poisson and Tweedie deviance with 1 <= power < 2, the target y may be zero while the prediction y_pred must be strictly larger than zero. A tree might find a split where one node has y = 0 for all samples in that node, naively resulting in y_pred = mean(y) = 0 for that node, which violates this positivity requirement. I see three different solutions to that:

  1. Use a log-link function, i.e. predict y_pred = np.exp(tree).
    See #16692 (ENH Poisson loss for HistGradientBoostingRegressor). This may not be an option for DecisionTreeRegressor.
  2. Use a splitting rule that forbids splits where one node has sum(y) = 0.
    One might also introduce an option like min_y_weight, such that splits with sum(sample_weight * y) < min_y_weight are forbidden. (Options 2 and 3 are sketched in code after this list.)
  3. Use some form of parent-child average, y_pred = a * mean(y) + (1 - a) * y_pred_parent, and forbid further splits, see [1].
    (Bayesian/credibility theory motivates setting a = sum(sample_weight * y) / (gamma + sum(sample_weight * y)) for some hyperparameter gamma.)
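A minimal node-level sketch of options 2 and 3; min_y_weight and gamma are hypothetical hyperparameters taken from the discussion above, not existing scikit-learn parameters:

```python
import numpy as np

def split_is_allowed(y_left, w_left, y_right, w_right, min_y_weight=1e-3):
    # Option 2: reject splits where a child carries (almost) no positive
    # target mass, i.e. sum(sample_weight * y) < min_y_weight in that child.
    return (np.sum(w_left * y_left) >= min_y_weight
            and np.sum(w_right * y_right) >= min_y_weight)

def shrunken_node_value(y, w, y_pred_parent, gamma=1.0):
    # Option 3: credibility-weighted average of node mean and parent prediction.
    # a -> 1 for nodes with much positive target mass, a -> 0 for all-zero
    # nodes, so the value stays strictly positive if the root's value is.
    wy = np.sum(w * y)
    a = wy / (gamma + wy)
    node_mean = wy / np.sum(w)  # weighted mean of y in the node
    return a * node_mean + (1 - a) * y_pred_parent
```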

There is also a dirty solution that allows y_pred = 0 but uses the value max(eps, y_pred) in the loss function for some tiny value of eps.
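In code, this workaround is a one-line clamp before evaluating the loss (eps being an arbitrary small constant):

```python
import numpy as np

eps = 1e-8  # arbitrary tiny floor
y_pred = np.array([0.0, 0.3, 2.0])  # a node value of exactly 0 would make log(y_pred) diverge
y_pred_safe = np.maximum(y_pred, eps)
```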

References

[1] R rpart package, chapter 8: Poisson regression.
