@sercant sercant commented Sep 23, 2025

Reference Issues/PRs

What does this implement/fix? Explain your changes.

I noticed one of our tests failing after upgrading from 1.5 to 1.6 and above. I traced the issue to the tree implementation change in #29458. A cdef constant cannot be initialized in the pxd file, so FEATURE_THRESHOLD ended up initialized to 0.0 instead of 1e-7. This PR fixes that by moving the initialization to the pyx file.
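
For illustration only, here is a minimal pure-Python sketch of why the value of the threshold matters. It assumes the splitter treats a feature as constant when its largest value does not exceed its smallest value by more than FEATURE_THRESHOLD; the helper `is_treated_as_constant` and the sample values are hypothetical and are not taken from the scikit-learn source.

```python
# Values quoted in the PR description.
FEATURE_THRESHOLD_INTENDED = 1e-7   # value this PR restores
FEATURE_THRESHOLD_REGRESSED = 0.0   # value reported after the regression


def is_treated_as_constant(values, threshold):
    """Hypothetical helper: True if the value range does not exceed `threshold`.

    Assumption: this mirrors a check of the form
    `max_value <= min_value + FEATURE_THRESHOLD`.
    """
    return max(values) <= min(values) + threshold


# A feature whose values differ, but only by less than 1e-7.
almost_constant = [0.0, 2e-8, 5e-8, 9e-8]

print(is_treated_as_constant(almost_constant, FEATURE_THRESHOLD_INTENDED))   # True
print(is_treated_as_constant(almost_constant, FEATURE_THRESHOLD_REGRESSED))  # False
```

Under that assumption, a threshold of 0.0 makes such a feature eligible for splitting, which would be consistent with a test on an almost constant feature starting to fail.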

Any other comments?

It's my first time contributing to scikit-learn, so please let me know if anything is missing.

  • Implementation
  • Add the change to docs


github-actions bot commented Sep 23, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 97966d1. Link to the linter CI: here

@sercant sercant force-pushed the fix-tree-feature-threshold-regression branch 2 times, most recently from eb15f6b to b3efc12 on September 23, 2025 23:51
@sercant sercant marked this pull request as ready for review September 23, 2025 23:51
@sercant sercant force-pushed the fix-tree-feature-threshold-regression branch from b3efc12 to 76e630e on September 23, 2025 23:52
@sercant sercant force-pushed the fix-tree-feature-threshold-regression branch from 76e630e to 2a3f7ec on September 24, 2025 00:08
Contributor

@cakedev0 cakedev0 left a comment

Overall LGTM. Good catch!

If you have some bandwidth to detail the use-case that relied on this "ignore almost constant features" behavior, I would be happy. But that's just for my curiosity ^^

Comment on lines +21 to +22
# Mitigate precision differences between 32 bit and 64 bit
FEATURE_THRESHOLD = 1e-7
Contributor

I've been working on sklearn/tree/* quite a lot lately, but this comment has remained a mystery to me. It seems you rely on this behavior, so maybe you can detail a bit more what's the purpose of "mitigating precision differences between 32 bit and 64 bit"?

(100% optional though)
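
The thread does not settle what the code comment means. As background only, here is a small sketch of one kind of precision difference between 32-bit and 64-bit floats at the 1e-7 scale: two values that are distinct as 64-bit floats can collapse to the same 32-bit float. The helper `to_float32` and the sample values are hypothetical, and nothing here is confirmed as the original motivation for FEATURE_THRESHOLD.

```python
import struct

FEATURE_THRESHOLD = 1e-7  # value quoted in the diff above


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


a = 1.0
b = 1.0 + 5e-8  # differs from `a` by less than FEATURE_THRESHOLD

print(a == b)                          # False: distinct in 64-bit precision
print(to_float32(a) == to_float32(b))  # True: identical after rounding to 32-bit
```

The spacing between adjacent 32-bit floats just above 1.0 is about 1.19e-7, so differences below roughly 1e-7 may or may not survive a conversion to 32-bit.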

Author

Thanks @cakedev0, I am also unsure of the purpose of this threshold. Actually, the test that failed on our side was based on randomly generated fake data. I don't believe we have real features with such a small difference between their min and max values, so I don't think we rely on this behavior either.


def test_almost_constant_feature():
    random_state = check_random_state(0)
    X = random_state.rand(10, 20)
Contributor

Suggested change
-    X = random_state.rand(10, 20)
+    X = random_state.rand(10, 2)

I think you just need 2 features for this test to work. It would make it clearer IMO.

Author

@sercant sercant Sep 25, 2025

Makes sense, let me push a commit for this. I will also add an assertion that the other feature has an importance higher than 0.

Member

betatim commented Sep 25, 2025

Side note on force pushing: I like doing it as well but it seems to mess with links from notifications. Which means people get a notification, click on the link in it and then end up "in the middle of nowhere". So we recommend that people don't force push. The PR gets merged via squashing, so an "ugly" history doesn't matter so much.

Author

sercant commented Sep 25, 2025

> Which means people get a notification, click on the link in it and then end up "in the middle of nowhere".

@betatim understood. Sorry for the noise! Will keep in mind for future contributions.

@sercant sercant requested review from betatim and cakedev0 September 25, 2025 13:04
Contributor

@cakedev0 cakedev0 left a comment

LGTM.

Member

@betatim betatim left a comment

Thanks for finding this. I think it would be good to get the eyes of a Cython guru on this, as well as other reviewers.

@betatim betatim added the "Waiting for Second Reviewer" label (First reviewer is done, need a second one!) Sep 26, 2025
Labels: cython, module:tree, Waiting for Second Reviewer