[MRG] Added validation test for iforest on uniform data #14771
Conversation
ping @agramfort maybe?
NicolasHug
left a comment
minor comments but LGTM
Co-Authored-By: Nicolas Hug <[email protected]>
agramfort
left a comment
looks reasonable. thx
Thanks @Jay-z007 !
Hi, I am sorry to say so, but this commit doesn't seem to fix or test anything valuable. Imagine the following code: With
…rn#14771)"

This reverts commit bcaf381. The test in the reverted commit is useless and does not exercise the code implementation. The commit claims to fix scikit-learn#7141, where an isolation forest trained on identical values produces degenerate trees.

Under the described circumstances, one may check that the exact score value for every point in the parameter space is zero (or 0.5, depending on whether we refer to the original paper or the scikit-learn implementation). However, there is no special-case code in the existing implementation, and the score value is subject to rounding errors. So, for instance, for 100 identical input samples we get a forest predicting everything as inliers, but for 101 input samples we get a forest predicting everything as outliers. The decision is made only on the basis of the floating-point rounding error. One may check this by changing the number of input samples from X = np.ones((100, 10)) to X = np.ones((101, 10)), or something else.
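The comparison described in this revert message can be sketched as follows (assuming numpy and scikit-learn are installed; the exact scores depend on the library version, so no particular inlier/outlier outcome is guaranteed):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Fit on identical rows: every tree is degenerate, so the anomaly score
# sits right at the decision threshold and the inlier/outlier decision
# can flip with floating-point rounding when the sample count changes.
for n_samples in (100, 101):
    X = np.ones((n_samples, 10))
    iforest = IsolationForest(random_state=0).fit(X)
    # decision_function is ~0 here; predict returns +1 (inlier) or -1 (outlier)
    print(n_samples, iforest.decision_function(X[:1]), iforest.predict(X[:1]))
```

Running this for both sample counts makes the fragility visible: nothing in the fitted model distinguishes the two cases except the sign of a near-zero score.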
Reference Issues/PRs
Fixes #7141
What does this implement/fix? Explain your changes.
Added tests to make sure that the trees predict inliers after being fitted on uniform data.
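A minimal sketch of such a test is shown below (the function name is hypothetical and the merged test in scikit-learn may differ; the PR itself asserts that every prediction is +1):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def test_iforest_uniform_data_predicts_inliers():
    # Uniform (all-identical) training data; the PR expects the fitted
    # forest to label these samples as inliers (+1).
    X = np.ones((100, 10))
    iforest = IsolationForest(random_state=0).fit(X)
    preds = iforest.predict(X)
    assert preds.shape == (100,)
    # predict returns +1 (inlier) or -1 (outlier) for each sample.
    assert set(np.unique(preds).tolist()) <= {-1, 1}
```

As the later revert discussion points out, an assertion that all predictions are exactly +1 is sensitive to floating-point rounding of a near-zero anomaly score, so the sample count chosen here matters.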
Any other comments?