
DOC GradientBoosting* will not implement monotonic constraints, use HistGradientBoosting* instead #27516


Merged: 27 commits into scikit-learn:main on Sep 5, 2024

Conversation

Charlie-XIAO (Contributor)

Reference Issues/PRs

Closes #27305.

What does this implement/fix? Explain your changes.

This PR implements monotonicity constraints for GradientBoostingClassifier and GradientBoostingRegressor. This was dropped from #13649.

Any other comments?

For reference: Friedman, "Greedy Function Approximation: A Gradient Boosting Machine". There were discussions around whether line search should be performed when using monotonic constraints, see #13649 (comment). I actually did not fully understand this, so it would be nice if someone could explain it in more detail. By the way, test_monotonic_constraints_classifications in sklearn/tree/tests/test_monotonic_tree.py would fail if line search is performed.
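For context on what the line search does: in Friedman's formulation, after the m-th regression tree is fit to the pseudo-residuals, each terminal region R_jm gets its value from a one-dimensional optimization of the actual loss,

$$
\gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L\big(y_i,\, F_{m-1}(x_i) + \gamma\big),
$$

so the final leaf values depend on the loss and are only known after the tree structure is fixed.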

Speaking of tests, I'm also a bit confused about where they should be placed. It seems that we should have similar (if not the same) tests as sklearn/tree/tests/test_monotonic_tree.py, so I currently only extended the parametrizations there to include GradientBoostingClassifier and GradientBoostingRegressor. Still, it's a bit strange to test one module under another. Please correct me if this is wrong.

@lorentzenchr Would you want to take a look? I'm not sure if this is what the target issue desired.

github-actions bot commented Oct 2, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 5765414.

glemaitre (Member)

Before reviewing this PR, I would first like to fix the constraint handling in main by moving forward with the following PR: #27639

In case we don't make it for the 1.4 release, we would need to revert more than a single commit, which would be complex. But right after that, I will review this PR.

lorentzenchr added this to the 1.5 milestone on Jan 29, 2024
lorentzenchr (Member)

#27639 is merged, so we can resume this PR.

As to the tests, we should have almost the same as in sklearn/ensemble/_hist_gradient_boosting/tests/test_monotonic_contraints.py.

Charlie-XIAO (Contributor, Author)

@lorentzenchr Thanks for the feedback; I've updated the tests.

It seems that sklearn/ensemble/_hist_gradient_boosting/tests/test_monotonic_contraints.py contains many tests regarding the splitter and the grower, which cannot be directly used for GradientBoosting* if I'm not mistaken. Therefore I only took the main test test_prediction. The rest are mainly taken from sklearn/tree/tests/test_monotonic_tree.py.

Please let me know if I misunderstood what you meant.

lorentzenchr (Member)

There were discussions around whether line search should be performed when using monotonic constraints

AFAIU, line search must be performed, with the addition that node values (after line search) trespassing the monotonicity boundary must be set to the boundary value.
This is more or less the same logic as in HGBT, except that HGBT uses hessians instead of a line search (but node values that exceed the boundary are likewise set back to the boundary, a.k.a. the midvalue).
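A minimal sketch of that clipping step (hypothetical helper and variable names, not scikit-learn API): after the per-leaf line search, any leaf value outside its monotonicity bounds is set back to the boundary value.

```python
import numpy as np

def clip_leaf_values(leaf_values, lower_bounds, upper_bounds):
    """Set line-searched leaf values back to their monotonicity boundaries."""
    return np.clip(leaf_values, lower_bounds, upper_bounds)

# Example: the third leaf exceeds its upper bound of 1.0 and is set back to it.
values = np.array([0.2, -0.5, 1.7])
lower = np.array([-np.inf, -1.0, -np.inf])
upper = np.array([np.inf, np.inf, 1.0])
print(clip_leaf_values(values, lower, upper))  # [ 0.2 -0.5  1. ]
```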

jeremiedbb modified the milestone: 1.5 → 1.6 on May 13, 2024
Charlie-XIAO (Contributor, Author) commented Jul 20, 2024

@lorentzenchr Sorry that I forgot about this PR.

AFAIU, line search must be performed, with the addition that node values (after line search) trespassing the monotonicity boundary must be set to the boundary value.

I think I understand the theory, but I'm not sure if I'm implementing it the right way 😢 The boundary values do not seem to be accessible from the tree instance in the line search step (i.e., _update_terminal_regions)? In cbeb272 I tried to record the boundaries in the nodes, and things seem to work. However, this increases the size of the tree (64 bytes/node -> 80 bytes/node), so I wonder if there is a better way 🤔

glemaitre self-requested a review on July 22, 2024 15:15
lorentzenchr (Member)

@Charlie-XIAO This feature is trickier than anticipated. Modifying sklearn.tree is not an option, as we would then need to investigate the impact on trees very thoroughly. I see 2 options:

  1. Save the boundaries in an array that is aligned with the tree nodes, such that it is easy to index the array given a tree node. This array should live inside GBT and either be passed around as an argument or, if simpler, saved as an attribute (see the sketch after this list).
  2. Reconsider this feature and maybe do not implement it.
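A rough sketch of what Option 1 could look like (hypothetical code, not what was merged): after a tree is fit, recompute per-node bounds by a top-down traversal, storing them in arrays aligned with the node ids so the line-search step can clip leaf values without touching sklearn.tree. The midvalue propagation mirrors the description above.

```python
import numpy as np

def node_bounds(tree, monotonic_cst):
    """Per-node (lower, upper) bound arrays for a fitted regression tree.

    `tree` is a fitted sklearn.tree._tree.Tree and `monotonic_cst[f]` is
    +1/-1/0 per feature. Bounds propagate from the root; on a constrained
    split, the midpoint of the two children values (the "midvalue")
    becomes the boundary between them.
    """
    n = tree.node_count
    lower = np.full(n, -np.inf)
    upper = np.full(n, np.inf)
    stack = [0]  # parents are always processed before their children
    while stack:
        node = stack.pop()
        left, right = tree.children_left[node], tree.children_right[node]
        if left == -1:  # leaf node
            continue
        for child in (left, right):
            lower[child], upper[child] = lower[node], upper[node]
        cst = monotonic_cst[tree.feature[node]]
        if cst != 0:
            mid = 0.5 * (tree.value[left, 0, 0] + tree.value[right, 0, 0])
            low_side, high_side = (left, right) if cst == 1 else (right, left)
            upper[low_side] = min(upper[low_side], mid)
            lower[high_side] = max(lower[high_side], mid)
        stack += [left, right]
    return lower, upper
```

These arrays could then live inside the GBT estimator and be indexed by the leaf ids returned by tree.apply(X) during _update_terminal_regions.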

Charlie-XIAO (Contributor, Author) commented Aug 12, 2024

Sorry for the late reply @lorentzenchr. I don't see an intuitive way to achieve Option 1, again because we cannot access the tree-building process in GBT (it calls DecisionTreeRegressor, so it can only access the built tree instance). One possibility is to add float64_t* lower_bounds and float64_t* upper_bounds to Tree, and then make it accept an additional parameter like record_boundaries, which controls whether we allocate memory for and update those boundaries. Am I missing something here?

Update: Maybe something like in 3ac75ea?

lorentzenchr (Member)

@Charlie-XIAO Modifying the tree does not work either, because we only know the "line search" values after the tree has been fit. And only those line-search values are the ones that count (except for squared error).
Summary: only after fitting a tree do we know the terminal node values that must follow the monotonic constraints, yet those constraints should influence the tree splitting process, i.e. the tree fitting. This is a contradiction.

Could we use the constraints of the decision trees and then during the line search, check that we don't violate the constraints?

If that doesn't work either, it looks bad for this feature.

@adrinjalali @thomasjpfan @NicolasHug @ogrisel Do you see possible solutions?

adrinjalali (Member)

Interesting PR.

Taking a step back, I'm wondering: don't we basically want to make HGBT the sort of "default" for GBs? Wouldn't it make sense to only have this for HGBTs and not for GBs?

thomasjpfan (Member)

I'm +1 on having monotonic constraints only in HGBT and not the regular GB.

Charlie-XIAO (Contributor, Author)

In that case, should we maybe just add a note in the docstring recommending HGBT for monotonic constraints?

adrinjalali (Member)

In that case, should we maybe just add a note in the docstring recommending HGBT for monotonic constraints?

That makes sense to me.
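For reference, monotonic constraints are already available in the histogram-based estimators via the existing monotonic_cst parameter; a small usage example on made-up data:

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 2))
y = 5 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=0.1, size=200)

# 1 = monotonically increasing, -1 = decreasing, 0 = unconstrained.
model = HistGradientBoostingRegressor(monotonic_cst=[1, -1])
model.fit(X, y)
```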

lorentzenchr (Member) left a comment:

LGTM

Charlie-XIAO changed the title from "ENH Implement monotonicity constraints for GradientBoosting*" to "DOC GradientBoosting* will not implement monotonic constraints, use HistGradientBoosting* instead" on Sep 5, 2024
adrinjalali (Member) left a comment:

nit, otherwise LGTM.

Charlie-XIAO (Contributor, Author)

Fixed :)

adrinjalali enabled auto-merge (squash) on September 5, 2024 15:58
adrinjalali merged commit 6d9d09a into scikit-learn:main on Sep 5, 2024
28 checks passed
Charlie-XIAO deleted the gbc-monotonicity-cst branch on September 5, 2024 17:02
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Sep 9, 2024
glemaitre pushed a commit that referenced this pull request Sep 11, 2024

Successfully merging this pull request may close these issues.

Monotonicity constraints for GradientBoostingClassifier and GradientBoostingRegressor