DOC Improve description of l2_regularization for hgbt models #28652


Merged · 21 commits from hgbt_regularization into scikit-learn:main · Apr 8, 2024

Conversation

ArturoAmorQ (Member)

Reference Issues/PRs

What does this implement/fix? Explain your changes.

The current docstring description of the hyperparameter l2_regularization in histogram gradient boosting models is not very descriptive:

l2_regularization : float, default=0

    The L2 regularization parameter. Use 0 for no regularization (default).

Visiting the user guide is not helpful either. This PR aims to solve the issue.

Any other comments?

While writing this PR, a doubt came to me: if l2_regularization penalizes the magnitude of the individual tree predictions, wouldn't that just mean that convergence takes more iterations?
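
Not part of the PR itself, but a minimal sketch (assuming a synthetic regression task) one could use to probe that doubt empirically: with a large l2_regularization the leaf values are shrunk, so the train loss should decrease more slowly per boosting iteration.

    # Hypothetical experiment, not from the PR: compare per-iteration train
    # loss with and without L2 regularization of the leaf values.
    from sklearn.datasets import make_regression
    from sklearn.ensemble import HistGradientBoostingRegressor
    from sklearn.metrics import mean_squared_error

    X, y = make_regression(n_samples=1_000, noise=10.0, random_state=0)

    for l2 in (0.0, 100.0):
        model = HistGradientBoostingRegressor(
            l2_regularization=l2, max_iter=50, random_state=0
        ).fit(X, y)
        # staged_predict yields the prediction after each boosting iteration
        losses = [mean_squared_error(y, pred) for pred in model.staged_predict(X)]
        print(f"l2={l2}: train MSE after 10/50 iterations: "
              f"{losses[9]:.1f} / {losses[-1]:.1f}")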

of length :math:`T` containing the leaf weights.

Notice that :math:`\gamma` penalizes the number of leaves (which makes it a
smooth version of `max_leaf_nodes` and is not implemented in scikit-learn),
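
For context, my reading of the objective this fragment describes (standard second-order boosting notation, not copied from the PR), where :math:`\lambda` is the `l2_regularization` parameter:

    \mathcal{L}(w) = \sum_i l(y_i, \hat{y}_i) + \gamma T + \frac{\lambda}{2} \sum_{k=1}^{T} w_k^2

The per-leaf minimizer is then

    w_k^* = -\frac{G_k}{H_k + \lambda}

with :math:`G_k` and :math:`H_k` the sums of loss gradients and hessians over the samples in leaf :math:`k`, which is where `l2_regularization` enters the leaf values (see the code line cited in the next comment).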
ArturoAmorQ (Member, Author):

See

    value = -sum_gradient / (sum_hessian + l2_regularization + 1e-15)

for the claim "not implemented in scikit-learn".
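
Paraphrasing that line as a standalone sketch (the gradient/hessian sums below are made-up numbers): there is no :math:`\gamma T` term, and `l2_regularization` only enters the denominator, shrinking each leaf value toward zero.

    # Sketch of the leaf value formula quoted above; the 1e-15 guards
    # against division by zero when both hessian sum and lambda are 0.
    def leaf_value(sum_gradient, sum_hessian, l2_regularization):
        return -sum_gradient / (sum_hessian + l2_regularization + 1e-15)

    print(leaf_value(-10.0, 5.0, 0.0))   # ~2.0 without regularization
    print(leaf_value(-10.0, 5.0, 5.0))   # ~1.0: same leaf, halved by the L2 term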


github-actions bot commented Mar 18, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 20254ab. Link to the linter CI: here

@ogrisel (Member) left a comment:

Thanks for the PR, here is a pass of feedback.

@lorentzenchr (Member) left a comment:

This is an overall net improvement.
I think some overhaul and entanglement of the different GBTs would be very good. It's hard to speak about Newton boosting (HGBT) when the original GBT is not yet introduced.

`n_classes >= 3`, it uses the multi-class log loss function, with multinomial deviance
and categorical cross-entropy as alternative names. The appropriate loss version is
selected based on :term:`y` passed to :term:`fit`.
Available losses for regression are 'squared_error', 'absolute_error', which is
Member:

For another PR: quantile is missing as loss.
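
For reference, a minimal sketch (not from this PR) of what that quantile loss looks like on the estimator side, here with GradientBoostingRegressor on a synthetic dataset:

    # Predict the conditional 90th percentile instead of the mean.
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor

    X, y = make_regression(n_samples=500, noise=20.0, random_state=0)
    reg = GradientBoostingRegressor(loss="quantile", alpha=0.9, random_state=0)
    reg.fit(X, y)
    # Roughly 90% of the training targets should fall below the prediction.
    print((y <= reg.predict(X)).mean())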

@ArturoAmorQ (Member, Author):

Otherwise I think all comments have been addressed.

@lorentzenchr (Member) left a comment:

only nitpicks left

Comment on lines 148 to 149
The leaf values :math:`w_k` are then a continuous value corresponding to the
loss function to use in the boosting process. Those values contribute to the
Member:

I do not understand this sentence.

Member:

@ArturoAmorQ Could you fix, rephrase or remove this sentence? Then I can merge.

@ArturoAmorQ (Member, Author) commented Apr 8, 2024:

I agree the wording here was somewhat vague. I rewrote the sentence in 20254ab, hoping it represents an improvement over the previous wording.

Co-authored-by: Christian Lorentzen <[email protected]>

The leaf values :math:`w_k` are then a continuous value corresponding to the
loss function to use in the boosting process. Those values contribute to the
model's prediction for a given input that ends up the corresponding leaf. The final
Member:

Suggested change
- model's prediction for a given input that ends up the corresponding leaf. The final
+ model's prediction for a given input that ends up in the corresponding leaf. The final

@lorentzenchr lorentzenchr merged commit 016670e into scikit-learn:main Apr 8, 2024
@ArturoAmorQ ArturoAmorQ deleted the hgbt_regularization branch April 8, 2024 21:01