DOC Improve description of l2_regularization for hgbt models #28652


Merged · 21 commits from hgbt_regularization into scikit-learn:main · Apr 8, 2024

Conversation

ArturoAmorQ (Member)

Reference Issues/PRs

What does this implement/fix? Explain your changes.

The current docstring description of the hyperparameter l2_regularization in histogram gradient boosting models is not very descriptive:

l2_regularization : float, default=0

    The L2 regularization parameter. Use 0 for no regularization (default).

Visiting the user guide is not helpful either. This PR aims to solve the issue.

Any other comments?

While writing this PR, a doubt came to me: if l2_regularization penalizes the magnitude of the individual tree predictions, wouldn't that just mean that convergence takes more iterations?
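
Not part of the PR itself, but a minimal sketch (assuming a synthetic regression task) one could use to probe that doubt empirically: with a large l2_regularization the leaf values are shrunk, so the train loss should decrease more slowly per boosting iteration.

    # Hypothetical experiment, not from the PR: compare per-iteration train
    # loss with and without L2 regularization of the leaf values.
    from sklearn.datasets import make_regression
    from sklearn.ensemble import HistGradientBoostingRegressor
    from sklearn.metrics import mean_squared_error

    X, y = make_regression(n_samples=1_000, noise=10.0, random_state=0)

    for l2 in (0.0, 100.0):
        model = HistGradientBoostingRegressor(
            l2_regularization=l2, max_iter=50, random_state=0
        ).fit(X, y)
        # staged_predict yields the prediction after each boosting iteration
        losses = [mean_squared_error(y, pred) for pred in model.staged_predict(X)]
        print(f"l2={l2}: train MSE after 10/50 iterations: "
              f"{losses[9]:.1f} / {losses[-1]:.1f}")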

of length :math:`T` containing the leaf weights.

Notice that :math:`\gamma` penalizes the number of leaves (which makes it a
smooth version of `max_leaf_nodes` and is not implemented in scikit-learn),
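
For context, my reading of the objective this fragment describes (standard second-order boosting notation, not copied from the PR), where :math:`\lambda` is the `l2_regularization` parameter:

    \mathcal{L}(w) = \sum_i l(y_i, \hat{y}_i) + \gamma T + \frac{\lambda}{2} \sum_{k=1}^{T} w_k^2

The per-leaf minimizer is then

    w_k^* = -\frac{G_k}{H_k + \lambda}

with :math:`G_k` and :math:`H_k` the sums of loss gradients and hessians over the samples in leaf :math:`k`, which is where `l2_regularization` enters the leaf values (see the code line cited in the next comment).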
ArturoAmorQ (Member, Author):

See

    value = -sum_gradient / (sum_hessian + l2_regularization + 1e-15)

for the claim "not implemented in scikit-learn".
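
Paraphrasing that line as a standalone sketch (the gradient/hessian sums below are made-up numbers): there is no :math:`\gamma T` term, and `l2_regularization` only enters the denominator, shrinking each leaf value toward zero.

    # Sketch of the leaf value formula quoted above; the 1e-15 guards
    # against division by zero when both hessian sum and lambda are 0.
    def leaf_value(sum_gradient, sum_hessian, l2_regularization):
        return -sum_gradient / (sum_hessian + l2_regularization + 1e-15)

    print(leaf_value(-10.0, 5.0, 0.0))   # ~2.0 without regularization
    print(leaf_value(-10.0, 5.0, 5.0))   # ~1.0: same leaf, halved by the L2 term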


github-actions bot commented Mar 18, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 20254ab. Link to the linter CI: here

@ogrisel (Member) left a comment:

Thanks for the PR, here is a pass of feedback.

@lorentzenchr (Member) left a comment:

This is an overall net improvement.
I think some overhaul and entanglement of the different GBTs would be very good. It's hard to speak about Newton boosting (HGBT) when the original GBT is not yet introduced.

`n_classes >= 3`, it uses the multi-class log loss function, with multinomial deviance
and categorical cross-entropy as alternative names. The appropriate loss version is
selected based on :term:`y` passed to :term:`fit`.
Available losses for regression are 'squared_error', 'absolute_error', which is
Member:

For another PR: quantile is missing as loss.
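
For reference, a minimal sketch (not from this PR) of what that quantile loss looks like on the estimator side, here with GradientBoostingRegressor on a synthetic dataset:

    # Predict the conditional 90th percentile instead of the mean.
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor

    X, y = make_regression(n_samples=500, noise=20.0, random_state=0)
    reg = GradientBoostingRegressor(loss="quantile", alpha=0.9, random_state=0)
    reg.fit(X, y)
    # Roughly 90% of the training targets should fall below the prediction.
    print((y <= reg.predict(X)).mean())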

@ArturoAmorQ (Member, Author):

Otherwise I think all comments have been addressed.

@lorentzenchr (Member) left a comment:

only nitpicks left

Comment on lines 148 to 149
The leaf values :math:`w_k` are then a continuous value corresponding to the
loss function to use in the boosting process. Those values contribute to the
Member:

I do not understand this sentence.

Member:

@ArturoAmorQ Could you fix, rephrase or remove this sentence? Then I can merge.

@ArturoAmorQ (Member, Author) commented Apr 8, 2024:

I agree the wording here was somewhat vague. I rewrote the sentence in 20254ab, hoping it represents an improvement over the previous wording.

Co-authored-by: Christian Lorentzen <[email protected]>

The leaf values :math:`w_k` are then a continuous value corresponding to the
loss function to use in the boosting process. Those values contribute to the
model's prediction for a given input that ends up the corresponding leaf. The final
Member:

Suggested change
- model's prediction for a given input that ends up the corresponding leaf. The final
+ model's prediction for a given input that ends up in the corresponding leaf. The final

@lorentzenchr lorentzenchr merged commit 016670e into scikit-learn:main Apr 8, 2024
@ArturoAmorQ ArturoAmorQ deleted the hgbt_regularization branch April 8, 2024 21:01