[MRG+1] FIX Correct depth formula in iforest #8576

Merged: 31 commits merged into scikit-learn:master on Mar 14, 2017

Contributor

@pzbw pzbw commented Mar 12, 2017

Reference Issue

What does this implement/fix? Explain your changes.

Fixed issue #8549

Any other comments?

@jnothman
Member

Would it be reasonable to add a non-regression test?

@pzbw
Contributor Author

pzbw commented Mar 12, 2017

@jnothman the change is made in a private method for the IsolationForest class and it appears that the tests in: scikit-learn/sklearn/ensemble/tests/test_iforest.py are passing.

@raghavrv raghavrv changed the title from "Fixed depth formula in iforest" to "[WIP] FIX Correct depth formula in iforest" Mar 12, 2017
@raghavrv
Member

@PtrWang Thanks for the PR. This needs a regression test, as @jnothman pointed out. Indeed the tests in test_iforest.py pass, but that indicates a gap in the tests which let the bug through, so it would be nice to have a test which fails without your bugfix :)

@raghavrv raghavrv added the Bug label Mar 12, 2017
@raghavrv raghavrv requested review from raghavrv and jnothman March 12, 2017 14:35
@jnothman
Member

jnothman commented Mar 12, 2017 via email

@raghavrv
Member

Thanks for the test. Does it fail in master branch?

@raghavrv
Member

A whatsnew and this is good to go...

@raghavrv raghavrv changed the title from "[WIP] FIX Correct depth formula in iforest" to "[MRG + 1] FIX Correct depth formula in iforest" Mar 12, 2017
@jnothman
Member

jnothman commented Mar 12, 2017 via email

@@ -18,7 +18,9 @@ parameters, may produce different models from the previous version. This often
occurs due to changes in the modelling logic (bug fixes or enhancements), or in
random sampling procedures.

* *to be listed*
- Made a change to :class:`sklearn.ensemble.IsolationForest` by
Member

I think you'll just have to list the class :class:`sklearn.ensemble.IsolationForest` and the user is expected to ctrl-f it out... Confirm with @jnothman though...

Member

Yes, I think

* :class:`sklearn.ensemble.IsolationForest` (bug fix)

would more than suffice.

@@ -300,7 +300,7 @@ def _average_path_length(n_samples_leaf):
         if n_samples_leaf <= 1:
             return 1.
         else:
-            return 2. * (np.log(n_samples_leaf) + 0.5772156649) - 2. * (
+            return 2. * (np.log(n_samples_leaf - 1.) + 0.5772156649) - 2. * (
Member

can we please use np.euler_gamma instead of 0.57721...
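
For anyone checking the change against the formula: this is the expression as I understand it from the Isolation Forest paper by Liu, Ting and Zhou. The symbols c, H and γ follow that paper rather than anything stated in this thread, so treat it as a reference sketch.

```latex
\begin{aligned}
c(n) &= 2\,H(n-1) - \frac{2\,(n-1)}{n}, \qquad H(i) \approx \ln(i) + \gamma, \\
c(n) &\approx 2\bigl(\ln(n-1) + \gamma\bigr) - \frac{2\,(n-1)}{n}.
\end{aligned}
```

The harmonic number is taken at n - 1, which is why the argument of the logarithm becomes `n_samples_leaf - 1.` while the trailing fraction stays as it was.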

# for average path length

assert_almost_equal(_average_path_length(1), 1., decimal=10)
assert_almost_equal(_average_path_length(5), 2.327020052, decimal=10)
Member

I think I'd rather this written out as:
2 * np.log(4) + 2 * np.euler_gamma - (2 * 4 / 5)
unless you've got 2.327020052 straight from some reference table.
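
A quick sanity check of that literal, sketched with plain NumPy (the variable names are mine, not from the PR): write the expected value out as suggested, compare it with 2.327020052, and compare it with what the pre-fix expression gives for the same input.

```python
import numpy as np

n = 5

# Expected value written out from the corrected formula (log of n - 1).
expected = 2 * np.log(n - 1) + 2 * np.euler_gamma - 2 * (n - 1) / n

# Pre-fix expression for comparison (log of n instead of n - 1).
pre_fix = 2 * np.log(n) + 2 * np.euler_gamma - 2 * (n - 1) / n

# The hard-coded literal agrees with the written-out form to 9 decimals.
assert abs(expected - 2.327020052) < 1e-9

# The pre-fix value is far enough away that the same assertion would fail.
assert abs(pre_fix - expected) > 0.4
```

Writing the expectation out this way also keeps the test readable: the `n - 1` is visible in the test itself rather than hidden inside a magic number.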

@@ -28,7 +28,7 @@ cannot assure that this list is complete.)
Changelog
---------

New features
- New features
Member

I think this should be reverted...

Member

Yes, something weird has happened here.

@@ -314,7 +314,7 @@ def _average_path_length(n_samples_leaf):

         average_path_length[mask] = 1.
         average_path_length[not_mask] = 2. * (
-            np.log(n_samples_leaf[not_mask]) + 0.5772156649) - 2. * (
+            np.log(n_samples_leaf[not_mask]) + np.euler_gamma) - 2. * (
Member

BTW is it correct to not subtract 1 here? @ngoix

Member

good catch! so much for our LGTMs... look a bit wider in future?

Contributor Author

It subtracts 1 in lines 303-304

Member

I think it should be done here too...

Member

I'd like to see a separate test for average_path_length testing equivalence between the integer and array cases. Please add.

Member

> look a bit wider in future?

Indeed. Sorry for not being alert to that...

Member

I.e. ensure _average_path_length(999) == _average_path_length(np.array([999]))
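
A minimal sketch of what such an equivalence check could look like. `average_path_length_sketch` is a hypothetical stand-in I wrote for illustration, assuming both branches use the corrected `n - 1` form; it is not scikit-learn's private helper.

```python
import numbers

import numpy as np


def average_path_length_sketch(n_samples_leaf):
    """Illustrative stand-in for the private helper; not scikit-learn's code."""
    if isinstance(n_samples_leaf, numbers.Integral):
        # Integer branch: a single leaf size.
        if n_samples_leaf <= 1:
            return 1.0
        return (2.0 * (np.log(n_samples_leaf - 1.0) + np.euler_gamma)
                - 2.0 * (n_samples_leaf - 1.0) / n_samples_leaf)

    # Array branch: the same expression applied under a boolean mask.
    n = np.asarray(n_samples_leaf, dtype=float)
    out = np.ones(n.shape)
    not_mask = n > 1
    out[not_mask] = (2.0 * (np.log(n[not_mask] - 1.0) + np.euler_gamma)
                     - 2.0 * (n[not_mask] - 1.0) / n[not_mask])
    return out


# Equivalence check between the two branches for a few leaf sizes.
for size in (1, 2, 5, 999):
    from_int = average_path_length_sketch(size)
    from_arr = average_path_length_sketch(np.array([size]))[0]
    assert np.isclose(from_int, from_arr)
```

The point of the loop is that the two branches are separate lines of code, so a fix applied to only one of them shows up as a mismatch here.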

@jnothman jnothman changed the title from "[MRG + 1] FIX Correct depth formula in iforest" to "[MRG + 2] FIX Correct depth formula in iforest" Mar 13, 2017
@jnothman
Member

LGTM. Will merge once CI approves

@@ -300,7 +301,7 @@ def _average_path_length(n_samples_leaf):
         if n_samples_leaf <= 1:
             return 1.
         else:
-            return 2. * (np.log(n_samples_leaf) + 0.5772156649) - 2. * (
+            return 2. * (np.log(n_samples_leaf - 1.) + np.euler_gamma) - 2. * (
Member

now you should reference just euler_gamma, not np.

Member

@raghavrv raghavrv left a comment

A small enhancement request... With that I'm done here... Thx

assert_almost_equal(_average_path_length(999), result_two, decimal=10)


def test_average_path_length_arr_int():
Member

Can you remove this test and add to the previous test?

assert_array_almost_equal(_...(np.array([1, 5, 999])), [1., result_one, result_two]), deci...)

Contributor Author

Done.

@raghavrv
Member

is that unethical?

;) I guess we don't want to do that and have users complain to numpy: "I only get euler_gamma if I import sklearn" :p

@@ -36,6 +36,9 @@ def _parse_version(version_string):
         version.append(x)
     return tuple(version)

+euler_gamma = getattr(np,
Member

np and euler_gamma could be put in a single line if you did this for pep8 line limit?

assert_almost_equal(_average_path_length(1), 1., decimal=10)
assert_almost_equal(_average_path_length(5), result_one, decimal=10)
assert_almost_equal(_average_path_length(999), result_two, decimal=10)
assert_almost_equal(_average_path_length(5),
Member

This test is now redundant and can be removed

@pzbw
Contributor Author

pzbw commented Mar 14, 2017

@raghavrv @jnothman changes have all been made, let me know if there's anything more to be done :)

Member

@raghavrv raghavrv left a comment

Thanks for the patience!


result_one = 2. * (np.log(4.) + euler_gamma) - 2. * 4. / 5.
result_two = 2. * (np.log(998.) + euler_gamma) - 2. * 998. / 999.
assert_array_almost_equal(_average_path_length(np.array([1, 5, 999])),
Member

Sorry if it was unclear, I meant to ask for the removal of the redundant _average_path_length(np.array([1])) == _average_path_length(1) line...

We still need to test the int arguments as they are being handled in a different line of code than if the argument is an array...

i.e. Could you add back the assert_almost_equal(_average_path_length(5), result_one) and assert...(999), result_two) lines?

Member

(The current tests would pass happily even if you revert the changes done at https://github.com/scikit-learn/scikit-learn/pull/8576/files#diff-522aed8770bec9fb385e859d53c63983R304)
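
To make that concern concrete, here is a rough sketch under stated assumptions: `scalar_branch`, `array_branch` and `log_shift` are hypothetical names of mine, and the two functions only imitate the separate integer and array code paths. It shows an array-only assertion staying green while the integer path is reverted.

```python
import numpy as np


def scalar_branch(n, log_shift):
    # log_shift=1.0 is the fixed formula; log_shift=0.0 mimics a revert.
    if n <= 1:
        return 1.0
    return 2.0 * (np.log(n - log_shift) + np.euler_gamma) - 2.0 * (n - 1.0) / n


def array_branch(values):
    # Always the fixed formula, so array-only tests never see a scalar revert.
    values = np.asarray(values, dtype=float)
    out = np.ones(values.shape)
    big = values > 1
    out[big] = (2.0 * (np.log(values[big] - 1.0) + np.euler_gamma)
                - 2.0 * (values[big] - 1.0) / values[big])
    return out


result_one = 2.0 * (np.log(4.0) + np.euler_gamma) - 2.0 * 4.0 / 5.0
result_two = 2.0 * (np.log(998.0) + np.euler_gamma) - 2.0 * 998.0 / 999.0

# Array assertion: passes regardless of what the scalar branch does.
array_ok = bool(np.allclose(array_branch([1, 5, 999]),
                            [1.0, result_one, result_two]))

# Scalar assertions: pass with the fix, fail once the scalar branch is reverted.
fixed_ok = bool(np.isclose(scalar_branch(5, log_shift=1.0), result_one))
reverted_ok = bool(np.isclose(scalar_branch(5, log_shift=0.0), result_one))

print(array_ok, fixed_ok, reverted_ok)  # prints "True True False"
```

Only the scalar assertions distinguish the fixed code from the reverted code, which is why the integer-argument checks need to stay in the test.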

@raghavrv raghavrv merged commit 4ab99c7 into scikit-learn:master Mar 14, 2017
@raghavrv
Member

Thanks a lot @PtrWang!

@pzbw pzbw deleted the fix-8549 branch March 14, 2017 17:05
@Przemo10 Przemo10 mentioned this pull request Mar 17, 2017
herilalaina pushed a commit to herilalaina/scikit-learn that referenced this pull request Mar 26, 2017
* Fixed depth formula in iforest

* Added non-regression test for issue scikit-learn#8549

* reverted some whitespace changes

* Made changes to what's new and whitespace changes

* Update whats_new.rst

* Update whats_new.rst

* fixed faulty whitespace

* faulty whitespace fix and change to whats new

* added constants to iforest average_path_length and the according non regression test

* COSMIT

* Update whats_new.rst

* Corrected IsolationForest average path formula and added integer array equiv test

* changed line to under 80 char

* Update whats_new.rst

* Update whats_new.rst

* reran tests

* redefine np.euler_gamma

* added import statement for euler_gammma in iforest and test_iforest

* changed np.euler_gamma to euler_gamma

* fix small formatting issue

* fix small formatting issue

* modified average_path_length tests

* formatting fix + removed redundant tests

* fix import error

* retry remote server error

* retry remote server error

* retry remote server error

* re-added some iforest tests

* re-added some iforest tests
massich pushed a commit to massich/scikit-learn that referenced this pull request Apr 26, 2017
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017

3 participants