[MRG+1] FIX Correct depth formula in iforest #8576

pzbw · 2017-03-12T04:05:07Z

Reference Issue

What does this implement/fix? Explain your changes.

Fixed issue #8549

Any other comments?

jnothman · 2017-03-12T10:42:27Z

Would it be reasonable to add a non-regression test?

pzbw · 2017-03-12T14:14:22Z

@jnothman the change is made in a private method of the IsolationForest class, and the tests in scikit-learn/sklearn/ensemble/tests/test_iforest.py appear to be passing.

raghavrv · 2017-03-12T14:34:39Z

@PtrWang Thanks for the PR. This needs a regression test as @jnothman pointed out. Indeed the tests in test_iforest.py pass, but that indicates a gap in the tests which let the bug through, so it would be nice to have a test which fails without your bugfix :)

jnothman · 2017-03-12T21:14:58Z

yes, but they were passing when it was broken too. so we should add a test which checks for the correct behaviour.

raghavrv · 2017-03-12T23:12:10Z

Thanks for the test. Does it fail in master branch?

raghavrv · 2017-03-12T23:12:28Z

A whatsnew and this is good to go...

jnothman · 2017-03-12T23:15:55Z

Also needs to be added to list of models with changed behaviour in what's new

raghavrv · 2017-03-12T23:45:53Z

(Please add a line at https://github.com/scikit-learn/scikit-learn/pull/8576/files/6de409d809ddf706d3f229d9ebf2377226970479..29fa2c0b567cffa6e5068a0fc4ef98303998489a#diff-3433b590ca8611f80d7687b51a869383R22)

raghavrv · 2017-03-13T00:04:28Z

doc/whats_new.rst

@@ -18,7 +18,9 @@ parameters, may produce different models from the previous version. This often
 occurs due to changes in the modelling logic (bug fixes or enhancements), or in
 random sampling procedures.

-* *to be listed*
+   - Made a change to :class:`sklearn.ensemble.IsolationForest` by 


I think you'll just have to list the class :class:`sklearn.ensemble.IsolationForest` and the user is expected to ctrl-f it out... Confirm with @jnothman though...

Yes, I think

* :class:`sklearn.ensemble.IsolationForest` (bug fix)

would more than suffice.

jnothman · 2017-03-13T00:22:36Z

sklearn/ensemble/iforest.py

@@ -300,7 +300,7 @@ def _average_path_length(n_samples_leaf):
        if n_samples_leaf <= 1:
            return 1.
        else:
-            return 2. * (np.log(n_samples_leaf) + 0.5772156649) - 2. * (
+            return 2. * (np.log(n_samples_leaf - 1.) + 0.5772156649) - 2. * (


can we please use np.euler_gamma instead of 0.57721...

jnothman · 2017-03-13T00:24:29Z

sklearn/ensemble/tests/test_iforest.py

+    # for average path length
+
+    assert_almost_equal(_average_path_length(1), 1., decimal=10)
+    assert_almost_equal(_average_path_length(5), 2.327020052, decimal=10)


I think I'd rather this be written out as:
2 * np.log(4) + 2 * np.euler_gamma - (2 * 4 / 5)
unless you've got 2.327020052 straight from some reference table.
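
As a quick check on the expected value, here is a minimal sketch that evaluates the written-out expression directly. The helper name `expected_average_path_length` is hypothetical and not part of scikit-learn; it assumes the corrected formula 2 * (ln(n - 1) + gamma) - 2 * (n - 1) / n, with the second term inferred from the expected values used in this PR's tests.

```python
import numpy as np


def expected_average_path_length(n):
    """Hypothetical helper: corrected formula for n > 1, and 1.0 otherwise."""
    if n <= 1:
        return 1.0
    # The harmonic number H(n - 1) is approximated by ln(n - 1) + Euler's constant.
    return 2.0 * (np.log(n - 1.0) + np.euler_gamma) - 2.0 * (n - 1.0) / n


# The hard-coded constant in the test agrees with the written-out expression.
value = expected_average_path_length(5)
np.testing.assert_almost_equal(value, 2.327020052, decimal=9)
```

Writing the expected value as an expression keeps the test readable and avoids a magic number whose origin is unclear.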

raghavrv · 2017-03-13T02:39:38Z

doc/whats_new.rst

@@ -28,7 +28,7 @@ cannot assure that this list is complete.)
 Changelog
 ---------

-New features
+- New features


I think this should be reverted...

Yes, something weird has happened here.

raghavrv · 2017-03-13T02:41:11Z

sklearn/ensemble/iforest.py

@@ -314,7 +314,7 @@ def _average_path_length(n_samples_leaf):

        average_path_length[mask] = 1.
        average_path_length[not_mask] = 2. * (
-            np.log(n_samples_leaf[not_mask]) + 0.5772156649) - 2. * (
+            np.log(n_samples_leaf[not_mask]) + np.euler_gamma) - 2. * (


BTW is it correct to not subtract 1 here? @ngoix

good catch! so much for our LGTMs... look a bit wider in future?

It subtracts 1 in lines 303-304

I think it should be done here too...

I'd like to see a separate test for average_path_length testing equivalence between the integer and array cases. Please add.

look a bit wider in future?

Indeed. Sorry for not being alert to that...

I.e. ensure _average_path_length(999) == _average_path_length(np.array([999]))
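
The requested equivalence check could look roughly like the sketch below. `average_path_length` is a hypothetical standalone stand-in for the private `_average_path_length`, written only to illustrate the two code paths (scalar and array); it is not the scikit-learn implementation.

```python
import numbers

import numpy as np


def average_path_length(n_samples_leaf):
    """Hypothetical stand-in with separate scalar and array branches."""
    if isinstance(n_samples_leaf, numbers.Integral):
        # Scalar branch.
        if n_samples_leaf <= 1:
            return 1.0
        return (2.0 * (np.log(n_samples_leaf - 1.0) + np.euler_gamma)
                - 2.0 * (n_samples_leaf - 1.0) / n_samples_leaf)

    # Array branch: the same formula applied elementwise where n > 1.
    n = np.asarray(n_samples_leaf, dtype=float)
    result = np.ones_like(n)
    not_mask = n > 1
    result[not_mask] = (2.0 * (np.log(n[not_mask] - 1.0) + np.euler_gamma)
                        - 2.0 * (n[not_mask] - 1.0) / n[not_mask])
    return result


# Both branches should agree for the same input.
np.testing.assert_almost_equal(average_path_length(999),
                               average_path_length(np.array([999]))[0],
                               decimal=10)
```

If one branch subtracts 1 inside the logarithm and the other does not, this assertion fails, which is the regression the reviewers want covered.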

jnothman · 2017-03-13T02:41:36Z

LGTM. Will merge once CI approves

jnothman · 2017-03-13T10:02:42Z

sklearn/ensemble/iforest.py

@@ -300,7 +301,7 @@ def _average_path_length(n_samples_leaf):
        if n_samples_leaf <= 1:
            return 1.
        else:
-            return 2. * (np.log(n_samples_leaf) + 0.5772156649) - 2. * (
+            return 2. * (np.log(n_samples_leaf - 1.) + np.euler_gamma) - 2. * (


now you should reference just euler_gamma, not np.euler_gamma

raghavrv

A small enhancement request... With that I'm done here... Thx

raghavrv · 2017-03-13T23:31:02Z

sklearn/ensemble/tests/test_iforest.py

+    assert_almost_equal(_average_path_length(999), result_two, decimal=10)
+
+
+def test_average_path_length_arr_int():


Can you remove this test and add to the previous test?

assert_array_almost_equal(_...(np.array([1, 5, 999])), [1., result_one, result_two], deci...)

raghavrv · 2017-03-13T23:32:53Z

is that unethical?

;) I guess we don't want to do that and have users complain to numpy "only if I import sklearn, I get euler_gamma" :p

raghavrv · 2017-03-13T23:58:15Z

sklearn/utils/fixes.py

@@ -36,6 +36,9 @@ def _parse_version(version_string):
            version.append(x)
    return tuple(version)

+euler_gamma = getattr(np,


np and euler_gamma can go on a single line, if you split them only because of the PEP8 line limit.
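
A single-line fallback of the kind discussed here might look like the sketch below. Only the opening `euler_gamma = getattr(np,` is visible in the diff; the attribute name string and the default value passed to `getattr` are assumptions filled in for illustration (Euler's constant to double precision).

```python
import numpy as np

# Assumed fallback: use NumPy's constant when available, else a literal.
# The literal below is an illustrative assumption, not taken from the PR diff.
euler_gamma = getattr(np, "euler_gamma", 0.5772156649015329)
```

Defining the name in a compatibility module avoids patching the `np` namespace, which is the concern raised in the thread.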

raghavrv · 2017-03-13T23:58:40Z

sklearn/ensemble/tests/test_iforest.py

+    assert_almost_equal(_average_path_length(1), 1., decimal=10)
+    assert_almost_equal(_average_path_length(5), result_one, decimal=10)
+    assert_almost_equal(_average_path_length(999), result_two, decimal=10)
+    assert_almost_equal(_average_path_length(5),


This test is now redundant and can be removed

pzbw · 2017-03-14T04:13:56Z

@raghavrv @jnothman changes have all been made, let me know if there's anything more to be done :)

raghavrv

Thanks for the patience!

raghavrv · 2017-03-14T07:55:45Z

sklearn/ensemble/tests/test_iforest.py

+
+    result_one = 2. * (np.log(4.) + euler_gamma) - 2. * 4. / 5.
+    result_two = 2. * (np.log(998.) + euler_gamma) - 2. * 998. / 999.
+    assert_array_almost_equal(_average_path_length(np.array([1, 5, 999])),


Sorry if it was unclear, I meant to ask for the removal of the redundant _average_path_length(np.array([1])) == _average_path_length(1) line...

We still need to test the int arguments as they are being handled in a different line of code than if the argument is an array...

i.e. Could you add back the assert_almost_equal(_average_path_length(5), result_one) and assert...(999), result_two) lines?

(The current tests would pass happily even if you revert the changes done at https://github.com/scikit-learn/scikit-learn/pull/8576/files#diff-522aed8770bec9fb385e859d53c63983R304)
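
To illustrate the point, here is a minimal sketch in which the array branch uses the corrected formula but the scalar branch is deliberately left with the old one. `half_fixed_average_path_length` is a hypothetical function made up for this demonstration; an array-only assertion passes against it, while a scalar check exposes the regression.

```python
import numbers

import numpy as np


def half_fixed_average_path_length(n_samples_leaf):
    """Hypothetical: array branch fixed, scalar branch still uses log(n)."""
    if isinstance(n_samples_leaf, numbers.Integral):
        if n_samples_leaf <= 1:
            return 1.0
        # Old (incorrect) scalar formula: log(n) instead of log(n - 1).
        return (2.0 * (np.log(n_samples_leaf) + np.euler_gamma)
                - 2.0 * (n_samples_leaf - 1.0) / n_samples_leaf)
    n = np.asarray(n_samples_leaf, dtype=float)
    result = np.ones_like(n)
    not_mask = n > 1
    result[not_mask] = (2.0 * (np.log(n[not_mask] - 1.0) + np.euler_gamma)
                        - 2.0 * (n[not_mask] - 1.0) / n[not_mask])
    return result


result_one = 2.0 * (np.log(4.0) + np.euler_gamma) - 2.0 * 4.0 / 5.0
result_two = 2.0 * (np.log(998.0) + np.euler_gamma) - 2.0 * 998.0 / 999.0

# The array-only check passes even though the scalar branch is wrong.
np.testing.assert_array_almost_equal(
    half_fixed_average_path_length(np.array([1, 5, 999])),
    [1.0, result_one, result_two], decimal=10)

# A scalar check exposes the difference between the two branches.
scalar_is_wrong = abs(half_fixed_average_path_length(5) - result_one) > 1e-3
```

Because integer and array inputs go through different lines of code, each path needs its own assertion.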

raghavrv · 2017-03-14T15:50:35Z

Thanks a lot @PtrWang!

* Fixed depth formula in iforest
* Added non-regression test for issue scikit-learn#8549
* reverted some whitespace changes
* Made changes to what's new and whitespace changes
* Update whats_new.rst
* Update whats_new.rst
* fixed faulty whitespace
* faulty whitespace fix and change to whats new
* added constants to iforest average_path_length and the according non regression test
* COSMIT
* Update whats_new.rst
* Corrected IsolationForest average path formula and added integer array equiv test
* changed line to under 80 char
* Update whats_new.rst
* Update whats_new.rst
* reran tests
* redefine np.euler_gamma
* added import statement for euler_gammma in iforest and test_iforest
* changed np.euler_gamma to euler_gamma
* fix small formatting issue
* fix small formatting issue
* modified average_path_length tests
* formatting fix + removed redundant tests
* fix import error
* retry remote server error
* retry remote server error
* retry remote server error
* re-added some iforest tests
* re-added some iforest tests

Fixed depth formula in iforest

16729ac

raghavrv changed the title ~~Fixed depth formula in iforest~~ [WIP] FIX Correct depth formula in iforest Mar 12, 2017

raghavrv added the Bug label Mar 12, 2017

raghavrv requested review from raghavrv and jnothman March 12, 2017 14:35

Peter Wang added 2 commits March 12, 2017 18:49

Added non-regression test for issue scikit-learn#8549

86ab126

reverted some whitespace changes

6de409d

raghavrv changed the title ~~[WIP] FIX Correct depth formula in iforest~~ [MRG + 1] FIX Correct depth formula in iforest Mar 12, 2017

Made changes to what's new and whitespace changes

29fa2c0

Peter Wang added 2 commits March 12, 2017 19:46

Update whats_new.rst

0832c73

Update whats_new.rst

e5e40b3

raghavrv reviewed Mar 13, 2017

Peter Wang added 3 commits March 12, 2017 20:20

fixed faulty whitespace

5df8e14

Merge branch 'fix-8549' of https://github.com/PTRWang/scikit-learn into fix-8549

b42d763

faulty whitespace fix and change to whats new

df5acc4

jnothman reviewed Mar 13, 2017

added constants to iforest average_path_length and the according non regression test

06225a9

raghavrv reviewed Mar 13, 2017

COSMIT

aaaea54

raghavrv reviewed Mar 13, 2017

jnothman changed the title ~~[MRG + 1] FIX Correct depth formula in iforest~~ [MRG + 2] FIX Correct depth formula in iforest Mar 13, 2017

added import statement for euler_gammma in iforest and test_iforest

477f50e

jnothman reviewed Mar 13, 2017

Peter Wang added 3 commits March 13, 2017 09:31

changed np.euler_gamma to euler_gamma

9a37bad

fix small formatting issue

41a4a32

fix small formatting issue

9cbae33

raghavrv suggested changes Mar 13, 2017

modified average_path_length tests

c9bba59

raghavrv suggested changes Mar 13, 2017

Peter Wang and others added 5 commits March 13, 2017 20:02

formatting fix + removed redundant tests

a36870f

fix import error

6d887f4

retry remote server error

e7f98a8

retry remote server error

68b40a7

retry remote server error

d3dc543

raghavrv suggested changes Mar 14, 2017

Peter Wang added 2 commits March 14, 2017 10:55

re-added some iforest tests

2e040dc

re-added some iforest tests

d2084b4

raghavrv approved these changes Mar 14, 2017

raghavrv merged commit 4ab99c7 into scikit-learn:master Mar 14, 2017

pzbw deleted the fix-8549 branch March 14, 2017 17:05

Przemo10 mentioned this pull request Mar 17, 2017

update fork (#1) #8606

Closed
