Thanks to visit codestin.com
Credit goes to github.com

Skip to content

The depth formula in iforest is incorrect #8549

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
liuxinuestc opened this issue Mar 7, 2017 · 6 comments
Closed

The depth formula in iforest is incorrect #8549

liuxinuestc opened this issue Mar 7, 2017 · 6 comments
Labels
Bug Easy Well-defined and straightforward way to resolve

Comments

@liuxinuestc
Copy link

liuxinuestc commented Mar 7, 2017

Description

In the paper of iforest the formula of average path length of unsuccessful search in BST as

avgPathLen = 2. * (np.log(n_samples_leaf   - 1.) + 0.5772156649) - 2. * (
                n_samples_leaf - 1.) / n_samples_leaf

but in code sklearn.ensemble.iforest._average_path_length the formula is

2 * (np.log(n_samples_leaf ) + 0.5772156649) - 2. * (
                n_samples_leaf - 1.) / n_samples_leaf

Steps/Code to Reproduce

Actual Results

if isinstance(n_samples_leaf, INTEGER_TYPES):
        if n_samples_leaf <= 1:
            return 1.
        else:
            return 2. * (np.log(n_samples_leaf) + 0.5772156649) - 2. * (
                n_samples_leaf - 1.) / n_samples_leaf

average_path_length[not_mask] = 2. * (
            np.log(n_samples_leaf[not_mask]) + 0.5772156649) - 2. * (
                n_samples_leaf[not_mask] - 1.) / n_samples_leaf[not_mask]

-->

Expected Results

if isinstance(n_samples_leaf, INTEGER_TYPES):
        if n_samples_leaf <= 1:
            return 1.
        else:
            return 2. * (np.log(n_samples_leaf - 1.0) + 0.5772156649) - 2. * (
                n_samples_leaf - 1.) / n_samples_leaf

average_path_length[not_mask] = 2. * (
            np.log(n_samples_leaf[not_mask] - 1.) + 0.5772156649) - 2. * (
                n_samples_leaf[not_mask] - 1.) / n_samples_leaf[not_mask]

Versions

@lesteve
Copy link
Member

lesteve commented Mar 7, 2017

Readability counts, a lot! Please use triple back-quotes aka fenced code blocks to format error messages code snippets. Bonus points if you use syntax highlighting with py for python snippets and pytb for tracebacks.

@ngoix
Copy link
Contributor

ngoix commented Mar 8, 2017

You're right a -1 is missing inside the log: np.log(n_samples_leaf) has to be replaced by np.log(n_samples_leaf - 1.)
@liuxinuestc do you want to open a PR?

@liuxinuestc
Copy link
Author

I'm just a new user in github, and I'm a learner. I don't know much about PR. But recently I have designed an unsupervised DT algorithm, I want to implement this algorithm on GitHub, and for everyone to use. @ngoix

@liuxinuestc
Copy link
Author

thanks for your suggestions~~!! @lesteve

@lesteve lesteve added Documentation Easy Well-defined and straightforward way to resolve Need Contributor and removed Documentation labels Mar 9, 2017
@lesteve
Copy link
Member

lesteve commented Mar 9, 2017

I don't know much about PR

Read about it then, maybe this is a good start. If you are new to Github I would encourage you to browse https://opensource.guide and https://guides.github.com/. For scikit-learn specific guidance, have a look at http://scikit-learn.org/stable/developers/contributing.html#contributing.

@pzbw
Copy link
Contributor

pzbw commented Mar 12, 2017

I'm working on this issue.

@raghavrv raghavrv changed the title the depth formula in iforest is incorrect The depth formula in iforest is incorrect Mar 12, 2017
pzbw pushed a commit to pzbw/scikit-learn that referenced this issue Mar 12, 2017
raghavrv pushed a commit that referenced this issue Mar 14, 2017
* Fixed depth formula in iforest

* Added non-regression test for issue #8549

* reverted some whitespace changes

* Made changes to what's new and whitespace changes

* Update whats_new.rst

* Update whats_new.rst

* fixed faulty whitespace

* faulty whitespace fix and change to whats new

* added constants to iforest average_path_length and the according non regression test

* COSMIT

* Update whats_new.rst

* Corrected IsolationForest average path formula and added integer array equiv test

* changed line to under 80 char

* Update whats_new.rst

* Update whats_new.rst

* reran tests

* redefine np.euler_gamma

* added import statement for euler_gammma in iforest and test_iforest

* changed np.euler_gamma to euler_gamma

* fix small formatting issue

* fix small formatting issue

* modified average_path_length tests

* formatting fix + removed redundant tests

* fix import error

* retry remote server error

* retry remote server error

* retry remote server error

* re-added some iforest tests

* re-added some iforest tests
herilalaina pushed a commit to herilalaina/scikit-learn that referenced this issue Mar 26, 2017
* Fixed depth formula in iforest

* Added non-regression test for issue scikit-learn#8549

* reverted some whitespace changes

* Made changes to what's new and whitespace changes

* Update whats_new.rst

* Update whats_new.rst

* fixed faulty whitespace

* faulty whitespace fix and change to whats new

* added constants to iforest average_path_length and the according non regression test

* COSMIT

* Update whats_new.rst

* Corrected IsolationForest average path formula and added integer array equiv test

* changed line to under 80 char

* Update whats_new.rst

* Update whats_new.rst

* reran tests

* redefine np.euler_gamma

* added import statement for euler_gammma in iforest and test_iforest

* changed np.euler_gamma to euler_gamma

* fix small formatting issue

* fix small formatting issue

* modified average_path_length tests

* formatting fix + removed redundant tests

* fix import error

* retry remote server error

* retry remote server error

* retry remote server error

* re-added some iforest tests

* re-added some iforest tests
massich pushed a commit to massich/scikit-learn that referenced this issue Apr 26, 2017
* Fixed depth formula in iforest

* Added non-regression test for issue scikit-learn#8549

* reverted some whitespace changes

* Made changes to what's new and whitespace changes

* Update whats_new.rst

* Update whats_new.rst

* fixed faulty whitespace

* faulty whitespace fix and change to whats new

* added constants to iforest average_path_length and the according non regression test

* COSMIT

* Update whats_new.rst

* Corrected IsolationForest average path formula and added integer array equiv test

* changed line to under 80 char

* Update whats_new.rst

* Update whats_new.rst

* reran tests

* redefine np.euler_gamma

* added import statement for euler_gammma in iforest and test_iforest

* changed np.euler_gamma to euler_gamma

* fix small formatting issue

* fix small formatting issue

* modified average_path_length tests

* formatting fix + removed redundant tests

* fix import error

* retry remote server error

* retry remote server error

* retry remote server error

* re-added some iforest tests

* re-added some iforest tests
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this issue Jun 14, 2017
* Fixed depth formula in iforest

* Added non-regression test for issue scikit-learn#8549

* reverted some whitespace changes

* Made changes to what's new and whitespace changes

* Update whats_new.rst

* Update whats_new.rst

* fixed faulty whitespace

* faulty whitespace fix and change to whats new

* added constants to iforest average_path_length and the according non regression test

* COSMIT

* Update whats_new.rst

* Corrected IsolationForest average path formula and added integer array equiv test

* changed line to under 80 char

* Update whats_new.rst

* Update whats_new.rst

* reran tests

* redefine np.euler_gamma

* added import statement for euler_gammma in iforest and test_iforest

* changed np.euler_gamma to euler_gamma

* fix small formatting issue

* fix small formatting issue

* modified average_path_length tests

* formatting fix + removed redundant tests

* fix import error

* retry remote server error

* retry remote server error

* retry remote server error

* re-added some iforest tests

* re-added some iforest tests
NelleV pushed a commit to NelleV/scikit-learn that referenced this issue Aug 11, 2017
* Fixed depth formula in iforest

* Added non-regression test for issue scikit-learn#8549

* reverted some whitespace changes

* Made changes to what's new and whitespace changes

* Update whats_new.rst

* Update whats_new.rst

* fixed faulty whitespace

* faulty whitespace fix and change to whats new

* added constants to iforest average_path_length and the according non regression test

* COSMIT

* Update whats_new.rst

* Corrected IsolationForest average path formula and added integer array equiv test

* changed line to under 80 char

* Update whats_new.rst

* Update whats_new.rst

* reran tests

* redefine np.euler_gamma

* added import statement for euler_gammma in iforest and test_iforest

* changed np.euler_gamma to euler_gamma

* fix small formatting issue

* fix small formatting issue

* modified average_path_length tests

* formatting fix + removed redundant tests

* fix import error

* retry remote server error

* retry remote server error

* retry remote server error

* re-added some iforest tests

* re-added some iforest tests
paulha pushed a commit to paulha/scikit-learn that referenced this issue Aug 19, 2017
* Fixed depth formula in iforest

* Added non-regression test for issue scikit-learn#8549

* reverted some whitespace changes

* Made changes to what's new and whitespace changes

* Update whats_new.rst

* Update whats_new.rst

* fixed faulty whitespace

* faulty whitespace fix and change to whats new

* added constants to iforest average_path_length and the according non regression test

* COSMIT

* Update whats_new.rst

* Corrected IsolationForest average path formula and added integer array equiv test

* changed line to under 80 char

* Update whats_new.rst

* Update whats_new.rst

* reran tests

* redefine np.euler_gamma

* added import statement for euler_gammma in iforest and test_iforest

* changed np.euler_gamma to euler_gamma

* fix small formatting issue

* fix small formatting issue

* modified average_path_length tests

* formatting fix + removed redundant tests

* fix import error

* retry remote server error

* retry remote server error

* retry remote server error

* re-added some iforest tests

* re-added some iforest tests
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this issue Nov 15, 2017
* Fixed depth formula in iforest

* Added non-regression test for issue scikit-learn#8549

* reverted some whitespace changes

* Made changes to what's new and whitespace changes

* Update whats_new.rst

* Update whats_new.rst

* fixed faulty whitespace

* faulty whitespace fix and change to whats new

* added constants to iforest average_path_length and the according non regression test

* COSMIT

* Update whats_new.rst

* Corrected IsolationForest average path formula and added integer array equiv test

* changed line to under 80 char

* Update whats_new.rst

* Update whats_new.rst

* reran tests

* redefine np.euler_gamma

* added import statement for euler_gammma in iforest and test_iforest

* changed np.euler_gamma to euler_gamma

* fix small formatting issue

* fix small formatting issue

* modified average_path_length tests

* formatting fix + removed redundant tests

* fix import error

* retry remote server error

* retry remote server error

* retry remote server error

* re-added some iforest tests

* re-added some iforest tests
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this issue Dec 18, 2017
* Fixed depth formula in iforest

* Added non-regression test for issue scikit-learn#8549

* reverted some whitespace changes

* Made changes to what's new and whitespace changes

* Update whats_new.rst

* Update whats_new.rst

* fixed faulty whitespace

* faulty whitespace fix and change to whats new

* added constants to iforest average_path_length and the according non regression test

* COSMIT

* Update whats_new.rst

* Corrected IsolationForest average path formula and added integer array equiv test

* changed line to under 80 char

* Update whats_new.rst

* Update whats_new.rst

* reran tests

* redefine np.euler_gamma

* added import statement for euler_gammma in iforest and test_iforest

* changed np.euler_gamma to euler_gamma

* fix small formatting issue

* fix small formatting issue

* modified average_path_length tests

* formatting fix + removed redundant tests

* fix import error

* retry remote server error

* retry remote server error

* retry remote server error

* re-added some iforest tests

* re-added some iforest tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Easy Well-defined and straightforward way to resolve
Projects
None yet
Development

No branches or pull requests

5 participants