DOC Improve description of l2_regularization for hgbt models #28652
Conversation
doc/modules/ensemble.rst
Outdated
of length :math:`T` containing the leaf weights.

Notice that :math:`\gamma` penalizes the number of leaves (which makes it a
smooth version of `max_leaf_nodes` and is not implemented in scikit-learn),
See
value = -sum_gradient / (sum_hessian + l2_regularization + 1e-15)
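For context, this quoted line computes the usual second-order (Newton) leaf value with an L2 penalty. As a sketch in the notation of the section under review (:math:`g_i` and :math:`h_i` for the per-sample gradients and hessians are names assumed here, and :math:`\lambda` stands for `l2_regularization`):

.. math::

    w_k = - \frac{\sum_{i \in \mathrm{leaf}\ k} g_i}{\sum_{i \in \mathrm{leaf}\ k} h_i + \lambda}

The `1e-15` in the code is only a numerical guard against division by zero and does not act as regularization.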
…nto hgbt_regularization
Thanks for the PR, here is a pass of feedback.
Co-authored-by: Olivier Grisel <[email protected]>
This is an overall net improvement.
I think some overhaul and disentanglement of the different GBTs would be very good. It's hard to speak about Newton boosting (HGBT) when the original GBT is not yet introduced.
doc/modules/ensemble.rst
Outdated
`n_classes >= 3`, it uses the multi-class log loss function, with multinomial deviance
and categorical cross-entropy as alternative names. The appropriate loss version is
selected based on :term:`y` passed to :term:`fit`.
Available losses for regression are 'squared_error', 'absolute_error', which is
For another PR: quantile is missing as loss.
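For that follow-up: the estimator itself already supports quantile loss (in recent scikit-learn versions); a minimal sketch, where the toy data and parameter values are assumptions for illustration only:

# Sketch of the quantile loss noted as missing from the documented list.
# The data below is made up for illustration.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 2))
y = X[:, 0] + rng.normal(scale=0.1, size=200)

# `quantile` sets the target level of the pinball loss.
model = HistGradientBoostingRegressor(loss="quantile", quantile=0.9)
model.fit(X, y)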
Co-authored-by: Christian Lorentzen <[email protected]>
…nto hgbt_regularization
Otherwise I think all comments have been addressed.
only nitpicks left
doc/modules/ensemble.rst
Outdated
The leaf values :math:`w_k` are then a continuous value corresponding to the
loss function to use in the boosting process. Those values contribute to the
I do not understand this sentence.
@ArturoAmorQ Could you fix, rephrase or remove this sentence? Then I can merge.
I agree the wording here was somewhat vague. I rewrote the sentence in 20254ab, hoping it represents an improvement over the previous wording.
Co-authored-by: Christian Lorentzen <[email protected]>
doc/modules/ensemble.rst
Outdated
The leaf values :math:`w_k` are then a continuous value corresponding to the
loss function to use in the boosting process. Those values contribute to the
model's prediction for a given input that ends up the corresponding leaf. The final
model's prediction for a given input that ends up the corresponding leaf. The final
model's prediction for a given input that ends up in the corresponding leaf. The final
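For what it's worth, the mechanism that sentence tries to describe can be sketched as follows (notation assumed, not taken from the diff): the raw prediction for an input :math:`x` adds up the values of the leaves :math:`x` falls into, one per tree,

.. math::

    \hat{y}(x) = \mathrm{link}^{-1} \left( w_0 + \sum_{m=1}^{M} w^{(m)}_{k_m(x)} \right)

where :math:`k_m(x)` is the leaf of the :math:`m`-th tree containing :math:`x`, :math:`w_0` is the baseline prediction, and the learning rate is assumed folded into the leaf values.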
Reference Issues/PRs
What does this implement/fix? Explain your changes.
The current docstring description of the hyperparameter `l2_regularization` in histogram gradient boosting models is not very descriptive. Visiting the user guide is not helpful either. This PR aims to solve the issue.
Any other comments?
While writing this PR I was left with a doubt: if `l2_regularization` penalizes the magnitude of the individual tree predictions, wouldn't that just mean convergence takes more iterations?
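One way to probe that doubt empirically (a sketch on made-up toy data, not part of the PR): fit the estimator with early stopping for several `l2_regularization` values and compare how many boosting iterations each run needs.

# Sketch with assumed toy data: does stronger L2 regularization delay
# convergence, i.e. require more boosting iterations before early stopping?
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.RandomState(42)
X = rng.uniform(size=(1000, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.2, size=1000)

for l2 in [0.0, 1.0, 100.0]:
    model = HistGradientBoostingRegressor(
        max_iter=1000,
        early_stopping=True,    # stop once validation loss stops improving
        n_iter_no_change=10,
        l2_regularization=l2,
        random_state=0,
    ).fit(X, y)
    # n_iter_ is the number of trees actually built.
    print(f"l2_regularization={l2:>6}: n_iter_={model.n_iter_}")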