Description
Describe the workflow you want to enable
If the target `y` is (approximately) Poisson, Gamma or otherwise Tweedie distributed, it would be beneficial for tree-based regressors to support Tweedie deviance loss functions as splitting criteria. This partially addresses #5975.
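For reference, the per-sample Tweedie deviance that such a criterion would minimize can be sketched as below. This is a minimal pure-Python illustration, not scikit-learn's implementation; the function name `tweedie_unit_deviance` is hypothetical, and the sketch only covers `power == 0` (squared error) and `power >= 1`.

```python
import math


def tweedie_unit_deviance(y, y_pred, power):
    """Unit Tweedie deviance for a single sample (illustrative sketch)."""
    if power == 0:
        # Normal distribution: squared error.
        return (y - y_pred) ** 2
    if power < 1:
        raise ValueError("this sketch only covers power == 0 or power >= 1")
    if y_pred <= 0:
        raise ValueError("y_pred must be strictly positive for power >= 1")
    if y < 0 or (power >= 2 and y == 0):
        raise ValueError("y is outside the support for this power")
    if power == 1:
        # Poisson: y * log(y / y_pred) is taken as 0 for y == 0.
        y_log = y * math.log(y / y_pred) if y > 0 else 0.0
        return 2 * (y_log - y + y_pred)
    if power == 2:
        # Gamma.
        return 2 * (math.log(y_pred / y) + y / y_pred - 1)
    # General case: compound Poisson-Gamma for 1 < power < 2, and power > 2.
    return 2 * (
        y ** (2 - power) / ((1 - power) * (2 - power))
        - y * y_pred ** (1 - power) / (1 - power)
        + y_pred ** (2 - power) / (2 - power)
    )
```

Note that `y = 0` is allowed for `1 <= power < 2`, while `y_pred = 0` is rejected for every `power >= 1`.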
Describe your proposed solution
Ideally, one first implements
- differentiable loss functions: A common private module for differentiable loss functions used as objective functions in estimators #15123

and then adds the different loss criteria to the tree-based models:
- `DecisionTreeRegressor` (poisson only): [MRG] ENH add Poisson splitting criterion for single trees #17386
- `RandomForestRegressor` (poisson only): ENH Adds Poisson criterion in RandomForestRegressor #19304 #19836
- `GradientBoostingRegressor`
- `HistGradientBoostingRegressor` (poisson and gamma but no other tweedie cases): ENH Poisson loss for HistGradientBoostingRegressor #16692
Open for Discussion
For Poisson and Tweedie deviance with `1<=power<2`, the target `y` may be zero while the prediction `y_pred` must be strictly larger than zero. A tree might find a split where one node has `y=0` for all samples in that node, resulting naively in `y_pred = mean(y) = 0` for that node. I see 3 different solutions to that:
- Use a log-link function, i.e. predict `y_pred = np.exp(tree)`. See ENH Poisson loss for HistGradientBoostingRegressor #16692 for `HistGradientBoostingRegressor`. This may not be an option for `DecisionTreeRegressor`.
- Use a splitting rule that forbids splits where one node has `sum(y)=0`.
  One might also introduce some option like `min_y_weight`, such that splits with `sum(sample_weight*y) < min_y_weight` are forbidden.
- Use some form of parent child average `y_pred = a * mean(y) + (1-a) * y_pred_parent` and forbid further splits, see [1].
  (Bayes/credibility theory motivates setting `a = sum(sample_weight*y)/(gamma+sum(sample_weight*y))` for some hyperparameter `gamma`.)
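The second and third options can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the helper names `split_allowed` and `shrunk_leaf_value` are hypothetical, `min_y_weight` and `gamma` are the proposed (not existing) hyperparameters, and `mean(y)` is read as the sample-weighted mean.

```python
def weighted_sum(y, sample_weight):
    """Return sum(sample_weight * y) for one node."""
    return sum(w * v for v, w in zip(y, sample_weight))


def split_allowed(children, min_y_weight=0.0):
    """Option 2: forbid splits where any child has a zero (or too small) weighted target sum.

    `children` is a list of (y, sample_weight) pairs, one per child node.
    """
    for y, sample_weight in children:
        s = weighted_sum(y, sample_weight)
        if s <= 0 or s < min_y_weight:
            return False
    return True


def shrunk_leaf_value(y, sample_weight, y_pred_parent, gamma):
    """Option 3: credibility-weighted average of node mean and parent prediction.

    Assumes gamma > 0, so the result stays positive whenever y_pred_parent > 0.
    """
    s = weighted_sum(y, sample_weight)
    a = s / (gamma + s)
    mean_y = s / sum(sample_weight)
    return a * mean_y + (1 - a) * y_pred_parent


# A child with only zero targets blocks the split under option 2 ...
print(split_allowed([([0.0, 0.0], [1.0, 1.0]), ([1.0, 3.0], [1.0, 1.0])]))
# ... and falls back to the parent prediction under option 3.
print(shrunk_leaf_value([0.0, 0.0], [1.0, 1.0], y_pred_parent=0.5, gamma=1.0))
```

With an all-zero node the weight `a` is 0, so the node simply inherits the parent's strictly positive prediction.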
There is also a dirty solution that allows `y_pred=0` but uses the value `max(eps, y_pred)` in the loss function for some tiny value of `eps`.
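A minimal sketch of this workaround for the Poisson case, assuming the prediction is clipped from below at `eps` before the deviance is evaluated. The function name `poisson_deviance_clipped` and the default `eps=1e-10` are illustrative choices, not part of any existing API.

```python
import math


def poisson_deviance_clipped(y, y_pred, eps=1e-10):
    """Poisson unit deviance with the prediction clipped from below at eps."""
    mu = max(eps, y_pred)  # keeps log(y / mu) finite even when y_pred == 0
    y_log = y * math.log(y / mu) if y > 0 else 0.0
    return 2 * (y_log - y + mu)


# A leaf with y_pred == 0 now yields a finite, though large, loss when y > 0.
print(poisson_deviance_clipped(3.0, 0.0))
```

The loss stays finite, but its size for `y > 0` depends strongly on the arbitrary choice of `eps`, which is why this is the "dirty" option.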
References
[1] R rpart library, chapter 8 Poisson regression