FEA Add missing-value support for ExtraTreeClassifier and ExtraTreeRegressor #27966
Signed-off-by: Adam Li <[email protected]>
Minor comments; otherwise this looks good now.
Perhaps @glemaitre and @thomasjpfan can take one last look then? I think this has changed a bit since @glemaitre approved changes, and I believe I've addressed most of the core issues raised. Thanks everyone for reviewing!
LGTM! Thanks a lot @adam2392.
Sorry for the delay, I was a bit underwater. LGTM, and I see that auto-merge was activated.
Reference Issues/PRs
Towards: #27931
Follow-up to: #26391 and #23595
What does this implement/fix? Explain your changes.
Adds missing-value support to RandomSplitter in _splitter.pyx
DecisionTree* and ExtraTree*
Makes the tests more numerically robust by increasing the tolerance for a GLOBAL_RANDOM_SEED check, and using cross-validation scores rather than a single score
Any other comments?
Compared to BestSplitter, there can be an expected cost to doing splits on missing-values, as we can either:
The push of missing values down the tree can be done randomly (i.e. first option), OR the second option can actually be evaluated. There is a computational cost to doing so, but more importantly there is an interpretation tradeoff. The tradeoff imo comes from the assumption of the missing-values:
However, I think the difference at the level of a single tree is not super important. E.g. #28268 demonstrates that ExtraTrees, when combined as a forest, are resilient and predictive in the presence of missing values.
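To make the random option concrete, here is a minimal pure-Python sketch of a random split that tolerates missing values. It is not the Cython code in _splitter.pyx: the helper name `random_split_with_missing` is hypothetical, and it assumes that all missing values of a feature are sent together to one child chosen at random, which is only one possible reading of "pushed randomly".

```python
import math
import random


def random_split_with_missing(values, rng):
    """Split sample indices at a random threshold; NaNs go to one random side.

    Hypothetical helper for illustration only, not scikit-learn's RandomSplitter.
    """
    observed = [v for v in values if not math.isnan(v)]
    if not observed:
        raise ValueError("feature has no observed values")
    # Extra-trees style: draw the threshold uniformly between min and max
    # of the observed (non-missing) values.
    threshold = rng.uniform(min(observed), max(observed))
    # Assumption: a single coin flip decides where every missing value goes.
    missing_go_left = rng.random() < 0.5
    left, right = [], []
    for i, v in enumerate(values):
        if math.isnan(v):
            (left if missing_go_left else right).append(i)
        elif v <= threshold:
            left.append(i)
        else:
            right.append(i)
    return threshold, left, right


rng = random.Random(0)
feature = [0.2, float("nan"), 1.5, 0.9, float("nan"), 2.4]
threshold, left, right = random_split_with_missing(feature, rng)
```

No candidate placement of the missing values is scored here, which is why this option is cheap; evaluating the alternative would need an extra criterion computation per split.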
Benchmarks demonstrating no significant runtime performance degradation
There is some complexity involved in checking if there are missing values. However, this only occurs at the Python level as shown by the following benchmark. In terms of the Cython code, there is no regression.
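A Python-level check of this kind can be sketched with NumPy as follows. This is an illustration under stated assumptions, not scikit-learn's internal code: the function name `missing_values_in_feature_mask` is hypothetical, and the idea is simply that a cheap sum over the array rules out NaNs in the common case before a per-feature mask is built.

```python
import numpy as np


def missing_values_in_feature_mask(X):
    """Return a per-feature boolean NaN mask, or None if X contains no NaN.

    Hypothetical helper for illustration; not scikit-learn's internal function.
    """
    X = np.asarray(X, dtype=float)
    # Cheap first pass: any NaN in X propagates into the total sum.
    if not np.isnan(X.sum()):
        return None
    mask = np.isnan(X).any(axis=0)
    # A NaN sum can also come from inf + (-inf), so confirm a NaN exists.
    return mask if mask.any() else None


X_clean = np.array([[1.0, 2.0], [3.0, 4.0]])
X_nan = np.array([[1.0, np.nan], [3.0, 4.0]])
mask_clean = missing_values_in_feature_mask(X_clean)
mask_nan = missing_values_in_feature_mask(X_nan)
```

The check runs once per fit on the whole input, so its cost does not grow with the number of nodes in the tree.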
Benchmarks with and without Python Check
I also ran this benchmark for ExtraTrees, which demonstrates that this check is negligible at the forest level, since it only occurs once. See #28268, which has the short code to enable it for ExtraTrees.
Benchmarks on ExtraTrees