
[MRG + 1] FIX use high precision cumsum and check it is stable enough #7331


Merged (6 commits) on Sep 9, 2016

Conversation

@jnothman (Member) commented Sep 2, 2016

Fixes #6842. I don't know of a test that will run quickly enough.

@TomDLT TomDLT added this to the 0.18 milestone Sep 2, 2016
@TomDLT (Member) commented Sep 2, 2016

Wow even with float64 there are many errors ...
Small mistake in the check.

"""
out = np.cumsum(arr, dtype=np.float64)
expected = np.sum(arr, dtype=np.float64)
if not np.allclose(out, expected):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not np.allclose(out[-1], expected):

jnothman (Member, Author) replied:

I thought I committed that change :p
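For context, the helper under discussion can be sketched as follows. This is a minimal reconstruction from the snippets in this thread, not the exact code merged; the actual stable_cumsum in sklearn.utils.extmath may differ in details such as the error message and defaults:

```python
import numpy as np

def stable_cumsum(arr, rtol=1e-05, atol=1e-08):
    # Accumulate in float64 regardless of the input dtype, then check that
    # the last cumulative value agrees with np.sum within the tolerances.
    out = np.cumsum(arr, dtype=np.float64)
    expected = np.sum(arr, dtype=np.float64)
    if not np.allclose(out[-1], expected, rtol=rtol, atol=atol):
        raise RuntimeError('cumsum was found to be unstable: '
                           'its last element does not correspond to sum')
    return out

# float32 input is promoted, so the prefix sums stay accurate
x = np.full(10 ** 6, 0.1, dtype=np.float32)
total = stable_cumsum(x)[-1]
```

The check is cheap relative to the cumsum itself, since np.sum is a single extra pass over the data.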

@jnothman (Member, Author) commented Sep 3, 2016

Sorry for opening a PR and running away :)

@yenchenlin (Contributor)

LGTM

@jnothman jnothman added the Bug label Sep 3, 2016
@tguillemot (Contributor) commented Sep 5, 2016

Indeed that could be problematic. LGTM

@TomDLT (Member) commented Sep 5, 2016

LGTM

@jnothman (Member, Author) commented Sep 5, 2016

Any ideas of pathological cases to test with quickly?

@TomDLT (Member) commented Sep 6, 2016

No idea, except comparing roc_auc_score(y_true, y_pred, sample_weight=sample_weight_32) with roc_auc_score(y_true, y_pred, sample_weight=sample_weight_64). But that would test the correction we propose rather than the bug we want to fix, so I am not fond of this solution.

@jnothman (Member, Author) commented Sep 6, 2016

I only think it's an effective test for very large vectors, hence impractical.


@amueller (Member) commented Sep 6, 2016

How long does a test with a large-enough vector take to run?

@amueller (Member) commented Sep 6, 2016

lgtm

@amueller amueller changed the title [MRG] FIX use high precision cumsum and check it is stable enough [MRG + 1] FIX use high precision cumsum and check it is stable enough Sep 6, 2016
@jnothman (Member, Author) commented Sep 7, 2016

```python
import time
import numpy as np
import pandas as pd

n_trials = 50
all_results = []
for dtype in [np.float32, np.float64]:
    for i in range(3, 8):
        n = 10 ** i
        absdiff, reldiff = [], []
        s = time.time()
        for j in range(n_trials):
            x = np.random.rand(n).astype(dtype)
            a = np.cumsum(x)[-1]
            b = np.sum(x)
            absdiff.append(np.abs(a - b))
            reldiff.append(absdiff[-1] / b)
        all_results.append((dtype.__name__, n, (time.time() - s) / n_trials,
                            np.log10(np.mean(absdiff)), np.log10(np.mean(reldiff))))

pd.DataFrame(all_results, columns=['dtype', 'n', 'time', 'log abs diff', 'log rel diff'])
```
dtype      n         time   log abs diff   log rel diff
float32    1000      0.00          -3.84          -6.54
float32    10000     0.00          -2.23          -5.93
float32    100000    0.00          -0.64          -5.34
float32    1000000   0.02           0.71          -4.99
float32    10000000  0.22           2.28          -4.41
float64    1000      0.00         -12.59         -15.28
float64    10000     0.00         -10.96         -14.66
float64    100000    0.00          -9.47         -14.16
float64    1000000   0.03          -7.96         -13.66
float64    10000000  0.23          -6.40         -13.09

np.allclose has default rtol=1e-5, atol=1e-8

Two questions:

  • can we build a non-regression case where the old code would have been broken? possibly: all the absolute diffs above for float32 exceed the atol; for 1e6 samples we get relative diff just greater than allclose's rtol in 0.02s, but this would be slower embedded in the rest of roc_auc_score.
  • can we get test coverage for the error message here? probably not, without having rtol as a parameter to stable_cumsum. This might be worth doing.

So I'll try to build a non-regression test with 1e6 samples.
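The 1e6-sample idea can be sketched like this (a hypothetical reproduction, not the test that ended up in the PR; the seed and thresholds are illustrative): a float32 cumsum on 1e6 uniform values already drifts measurably from a float64 reference, while a float64 cumsum does not.

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.rand(10 ** 6).astype(np.float32)

naive_last = np.cumsum(x)[-1]                      # accumulated in float32
stable_last = np.cumsum(x, dtype=np.float64)[-1]   # the proposed fix
reference = np.sum(x, dtype=np.float64)

rel_naive = abs(naive_last - reference) / reference
rel_stable = abs(stable_last - reference) / reference
```

Per the benchmark above, rel_naive sits around 1e-5 at this size, i.e. right at np.allclose's default rtol, which is what makes the non-regression test delicate.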

@jnothman (Member, Author) commented Sep 8, 2016

A regression test for roc_auc_score turns out to be too slow.

@jnothman (Member, Author) commented Sep 8, 2016

I've added a direct test of stable_cumsum using configurable atol and rtol.

@jnothman (Member, Author) commented Sep 8, 2016

I suppose the remaining question is: are there other places in the codebase where cumsum is happening and might be over float32 data or very very large arrays?

@lesteve (Member) commented Sep 8, 2016

One of the Travis errors will go away if you rebase on master, but there is one on Python 2.7 that seems genuine:

======================================================================
FAIL: sklearn.utils.tests.test_extmath.test_stable_cumsum
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/testenv/lib/python2.6/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/travis/sklearn_build_oldest/scikit-learn/sklearn/utils/tests/test_extmath.py", line 656, in test_stable_cumsum
    stable_cumsum, r, rtol=0, atol=0)
  File "/home/travis/sklearn_build_oldest/scikit-learn/sklearn/utils/testing.py", line 438, in assert_raise_message
    (names, function.__name__))
AssertionError: RuntimeError not raised by stable_cumsum

@jnothman (Member, Author) commented Sep 8, 2016

> One of the Travis errors will go away if you rebase on master but there is one on Python 2.7 that seems genuine:

Yeah, I was wondering if there'd be some platform that didn't fail my test....

@jnothman (Member, Author) commented Sep 8, 2016

I assume that means numpy used to have an unstable implementation of sum, rather than that it used to have a stable implementation of cumsum.

@jnothman (Member, Author) commented Sep 8, 2016

I wonder: Should I only run the test on recent numpy, or just remove it?

@amueller (Member) commented Sep 8, 2016

Hm... testing the error message seems slightly odd since it relies on numpy being broken. I guess having a test with a known correct result that wasn't correct before might be nicer, but not offer the error coverage?

Like

```python
x = np.empty(10 ** 7)
x.fill(1e-7)
np.isclose(stable_cumsum(x)[-1], 1, rtol=0, atol=1e-12)
```

@jnothman (Member, Author) commented Sep 8, 2016

It might just be my bedtime, but I'm not sure I get what test you're suggesting.

@amueller (Member) commented Sep 8, 2016

```python
np.isclose(cumsum(x)[-1], 1, rtol=0, atol=1e-12)
```

fails for np.cumsum currently, but should work for stable_cumsum.

@jnothman (Member, Author) commented Sep 8, 2016

No, I don't think it will work for stable_cumsum in old numpy. I might be wrong though.

@amueller (Member) commented Sep 8, 2016

ah... huh...

@jnothman (Member, Author) commented Sep 8, 2016

So just drop the test, I suppose.

@jnothman (Member, Author) commented Sep 8, 2016

And merge the PR?

@amueller (Member) commented Sep 8, 2016

fine with me

@lesteve (Member) commented Sep 8, 2016

I believe the improved stability of np.sum arrived in numpy 1.9: http://docs.scipy.org/doc/numpy/release.html#better-numerical-stability-for-sum-in-some-cases. I also quickly checked that in 1.8, cumsum and sum give the same wrong result in one of your snippets, but 1.9 is fine. Maybe skip the test for numpy < 1.9?
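For intuition on why numpy 1.9's np.sum got better: it switched to pairwise summation, in which each value passes through O(log n) additions rather than the O(n) of left-to-right accumulation, so rounding error grows much more slowly. A toy illustration (not numpy's actual implementation; the block size and test values are arbitrary):

```python
import numpy as np

def naive_sum_f32(a):
    # Left-to-right accumulation in float32: error grows roughly with n.
    s = np.float32(0.0)
    for v in a:
        s = np.float32(s + v)
    return s

def pairwise_sum_f32(a):
    # Recursive halving down to small blocks: only O(log n) additions deep.
    if len(a) <= 8:
        return naive_sum_f32(a)
    mid = len(a) // 2
    return np.float32(pairwise_sum_f32(a[:mid]) + pairwise_sum_f32(a[mid:]))

a = np.full(100000, 0.1, dtype=np.float32)
exact = 0.1 * len(a)  # float64 reference: 10000.0
err_naive = abs(float(naive_sum_f32(a)) - exact)
err_pairwise = abs(float(pairwise_sum_f32(a)) - exact)
```

Note that cumsum cannot use pairwise summation, since it must produce every running prefix in order, which is exactly why this PR accumulates in float64 instead.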

@@ -648,6 +650,8 @@ def test_softmax():


def test_stable_cumsum():
    if np_version < (1, 19):
lesteve (Member) commented on the diff:
19 -> 9 otherwise we may wait a while until this test gets run ;-).

jnothman (Member, Author) replied:

Argh. Making so many errors. Should be in bed. Should take a break from this stuff, too!

@jnothman (Member, Author) commented Sep 8, 2016

Could you please run a check for numpy 1.9, @lesteve?

@lesteve (Member) commented Sep 8, 2016

> Could you please run a check for numpy 1.9, @lesteve?

Not sure exactly what you mean, but here is what I tried:

```python
import numpy as np

np.random.seed(42)
n_samples = 4 * 10 ** 7
y = np.random.randint(2, size=n_samples)
prediction = np.random.normal(size=n_samples) + y * 0.01
trivial_weight = np.ones(n_samples)

print(np.cumsum(trivial_weight.astype('float32'))[-1])
print(np.sum(trivial_weight.astype('float32')))
```

Output for numpy 1.8:

1.67772e+07
1.67772e+07

Output for numpy 1.9:

1.67772e+07
4e+07
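The repeated 1.67772e+07 above is no accident: float32 has a 24-bit significand, so above 2**24 = 16777216 consecutive representable integers are 2 apart, and a running float32 total that reaches 2**24 no longer changes when 1.0 is added. A small check (assuming a current numpy, where cumsum still accumulates sequentially in the input dtype; the 2e7 size is just a smaller stand-in for the 4e7 used above):

```python
import numpy as np

# Above 2**24, float32 cannot represent odd integers: s + 1 rounds back to s.
s = np.float32(2 ** 24)
stuck = np.float32(s + np.float32(1.0)) == s

# So a float32 cumsum over 2e7 ones stalls at 16777216.0 instead of 2e7.
total = np.cumsum(np.ones(2 * 10 ** 7, dtype=np.float32))[-1]
```

np.sum escapes this in numpy >= 1.9 only because pairwise summation keeps each partial sum small.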

@lesteve (Member) commented Sep 8, 2016

I also made sure that the test was run for numpy 1.9. LGTM, will wait for AppVeyor and then merge.

@lesteve (Member) commented Sep 9, 2016

Merging, thanks!

@lesteve lesteve merged commit 49d126f into scikit-learn:master Sep 9, 2016
rsmith54 pushed a commit to rsmith54/scikit-learn that referenced this pull request on Sep 14, 2016: "FIX use high precision cumsum and check it is stable enough (…scikit-learn#7331)"
TomDLT pushed a commit to TomDLT/scikit-learn that referenced this pull request on Oct 3, 2016: "FIX use high precision cumsum and check it is stable enough (…scikit-learn#7331)"
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request on Jun 14, 2017: "FIX use high precision cumsum and check it is stable enough (…scikit-learn#7331)"
paulha pushed a commit to paulha/scikit-learn that referenced this pull request on Aug 19, 2017: "FIX use high precision cumsum and check it is stable enough (…scikit-learn#7331)"
6 participants