
Conversation

@jayybhatt (Contributor)

Reference Issues/PRs

Fixes #7141

What does this implement/fix? Explain your changes.

Added tests to make sure that the forest predicts all samples as inliers after being fitted on uniform data.
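
For reference, a minimal sketch of the kind of test this adds (the function name and exact shape of the data here are illustrative, not the committed test):

    import numpy as np
    from sklearn.ensemble import IsolationForest

    def test_iforest_predicts_inliers_on_uniform_data():
        # All samples are identical, so none of them can be isolated:
        # the fitted forest should label every sample as an inlier (+1).
        X = np.ones((100, 10))
        iforest = IsolationForest().fit(X)
        assert np.all(iforest.predict(X) == 1)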

Any other comments?

@jayybhatt changed the title from "Added validation test for iforest on uniform data" to "[MRG] Added validation test for iforest on uniform data" on Aug 24, 2019
@amueller (Member)

ping @agramfort maybe?
Looks good to me.

@jayybhatt (Contributor, Author)

@agramfort

@NicolasHug (Member) left a comment

minor comments but LGTM

@agramfort (Member) left a comment

looks reasonable. thx

@NicolasHug (Member)

Thanks @Jay-z007 !

@NicolasHug NicolasHug merged commit bcaf381 into scikit-learn:master Aug 26, 2019
@matwey commented Apr 16, 2020

Hi,

I am sorry to say so, but this commit doesn't seem to fix or test anything valuable.

Imagine the following code:

import numpy as np
from sklearn.ensemble import IsolationForest

for n_samples in range(100, 104):
    X = np.ones((n_samples, 10))
    iforest = IsolationForest()
    iforest.fit(X)
    # True if at least one sample is predicted as an inlier (+1)
    print(n_samples, np.any(iforest.predict(X) == 1))

With sklearn from master I have:

100 True
101 False
102 True
103 False

The iforest.predict(X) result depends only on floating-point rounding errors in this case.
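
One way to see this (a quick check, not part of this PR) is to inspect the decision function directly: for a forest fitted on identical samples, every value should sit within a few machine epsilons of the zero threshold, so predict() effectively reads the sign of rounding noise:

    import numpy as np
    from sklearn.ensemble import IsolationForest

    X = np.ones((101, 10))
    iforest = IsolationForest().fit(X)
    # In exact arithmetic every value here would be exactly 0.0, i.e. right
    # on the default threshold; in practice the values are tiny positive or
    # negative numbers whose sign decides between inlier (+1) and outlier (-1).
    print(iforest.decision_function(X)[:5])
    print(iforest.predict(X)[:5])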

matwey added a commit to matwey/scikit-learn that referenced this pull request Apr 18, 2020
…rn#14771)"

This reverts commit bcaf381.

The test in the reverted commit is useless and does not exercise the code implementation. The commit claims to fix scikit-learn#7141, where an isolation forest trained on identical values produces degenerate trees.

Under the described circumstances, one may check that the exact score value for every point in the parameter space is zero (or 0.5, depending on whether we follow the original paper or the scikit-learn implementation). However, there is no special handling in the existing implementation, and the score value is subject to rounding errors. So, for instance, with 100 identical input samples we get a forest predicting everything as inliers, but with 101 input samples we get a forest predicting everything as outliers. The decision is determined purely by the floating-point rounding error.

One may check this by changing the number of input samples:

    X = np.ones((100, 10))

to

    X = np.ones((101, 10))

or something else.
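
To make the 0.5 figure concrete, here is a sketch of the path-length arithmetic from the original paper; average_path_length is a stand-in for the c(n) normalisation term, not the private scikit-learn helper:

    import numpy as np

    def average_path_length(n):
        # c(n) from Liu et al.: average path length of an unsuccessful
        # search in a binary search tree with n nodes.
        if n <= 1:
            return 0.0
        harmonic = np.log(n - 1.0) + np.euler_gamma  # approximates H(n - 1)
        return 2.0 * harmonic - 2.0 * (n - 1.0) / n

    n = 100
    # On n identical samples a tree cannot split, so every sample ends up in
    # the root leaf at depth 0, plus the c(n) correction for a leaf of size n.
    depth = 0.0 + average_path_length(n)
    score = 2.0 ** (-depth / average_path_length(n))
    print(score)  # 0.5: exactly on the inlier/outlier boundary
    # In the real estimator, subsampling and averaging over many trees turn
    # this exact 0.5 into a value perturbed by rounding, hence the flakiness.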


Merging this pull request closed: IsolationForest degenerates with uniform training data (#7141)