[MRG+1] FIX unstable cumsum #7376
Conversation
out = np.cumsum(arr, dtype=np.float64)
expected = np.sum(arr, dtype=np.float64)
if not np.allclose(out[-1], expected, rtol=rtol, atol=atol):
    if np_version < (1, 9):
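The hunk above centres on a stability check: accumulate in float64, then verify the final cumulative value against an independently computed float64 sum. A minimal self-contained sketch of that idea follows; the function name and warning text here are illustrative, not the exact scikit-learn implementation:

```python
import warnings
import numpy as np

def stable_cumsum_sketch(arr, rtol=1e-05, atol=1e-08):
    # Accumulate in float64 to limit rounding error, then compare the
    # last cumulative value against an independent float64 sum. If they
    # disagree beyond tolerance, warn rather than fail silently.
    out = np.cumsum(arr, dtype=np.float64)
    expected = np.sum(arr, dtype=np.float64)
    if not np.allclose(out[-1], expected, rtol=rtol, atol=atol):
        warnings.warn("cumsum was found to be unstable: its last element "
                      "does not correspond to sum", RuntimeWarning)
    return out
```

On older numpy (< 1.9, per the hunk) `np.sum` itself lacks pairwise summation, so the check is skipped there in the actual PR.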
you should add a comment explaining why we skip the check in this case
LGTM
@@ -333,7 +334,7 @@ def make_multilabel_classification(n_samples=100, n_features=20, n_classes=5,
     generator = check_random_state(random_state)
     p_c = generator.rand(n_classes)
     p_c /= p_c.sum()
-    cumulative_p_c = np.cumsum(p_c)
+    cumulative_p_c = stable_cumsum(p_c)
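For context on what this hunk touches: a cumulative class-probability vector like `cumulative_p_c` is typically consumed by inverse-CDF sampling. A small illustration using plain `np.cumsum` (variable names mirror the hunk; this is not code from the PR):

```python
import numpy as np

# Inverse-CDF sampling of class indices from a probability vector.
rng = np.random.RandomState(0)
n_classes = 5
p_c = rng.rand(n_classes)
p_c /= p_c.sum()                  # normalize to a probability vector
cumulative_p_c = np.cumsum(p_c)   # cumulative distribution over classes
u = rng.rand(3)                   # uniform draws in [0, 1)
classes = np.searchsorted(cumulative_p_c, u)  # sampled class indices
```

With only `n_classes` terms in the sum, the accumulated rounding error is tiny, which is the reviewer's point below.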
I don't think this is of much value. p_c will always be high precision, and problems of numerical instability in cumulative summing aren't likely to be an issue at the scale of "number of classes". Please apply this change with more discretion. It comes at a (small) cost.
LGTM
I've not reviewed this fully yet and don't consider it an immediate priority. I think we should use some discretion with the helper as it is a little more expensive.
I still haven't taken a good look at these. I'd like to be a bit conservative about it.
Is the plan that we are going to raise errors on users? Like @NelleV, I wouldn't find this very useful for the end users. I would much prefer raising a warning. If we want to control for such problems in our test codebase, we could specifically turn this warning into an error (e.g. with warnings.simplefilter) during the tests.
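The approach suggested here, keeping a warning for users but escalating it to an error in the test suite, can be sketched with the standard warnings module (the warning message is illustrative):

```python
import warnings

# Stand-in for a numerical routine that detects instability and warns.
def emit_instability_warning():
    warnings.warn("cumsum was found to be unstable", RuntimeWarning)

# In the test suite: promote RuntimeWarning to a hard error so that
# instability fails loudly there, while end users only see a warning.
with warnings.catch_warnings():
    warnings.simplefilter("error", RuntimeWarning)
    try:
        emit_instability_warning()
        raised = False
    except RuntimeWarning:
        raised = True
```

Outside the `catch_warnings` block the default filters are restored, so regular callers are unaffected.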
Perhaps you're right. I think an error is better than a silent failure with indeterminate consequences, but perhaps a warning is better than an error.
I disagree. Non-convergence and numerical instability are classically warnings. Only the user can know whether or not it is a problem. Of course many users will not look at the warnings. But bad behavior from users is not a good reason to penalize people who know what they are doing, and can just move on.
Honestly, if people are using machine learning without checking what they are doing, they are dangerous. It's not a question of understanding what's going on, it's a question of checking it works.
I'm happy to change the behaviour to a warning.
@yangarbiter do you want to incorporate that change?
Sure. I can do that!
I wanted to give my +1 and merge, but there are test failures both in AppVeyor and in Travis, with different failures: you seem to have a mixture of tabs and spaces for indentation, and the test needs to be upgraded to test for the warning, and not the RuntimeError.
Sorry about that, I've fixed it. |
I think we have a +3 for that.
+1 to merge. Merging. Thanks!
I hadn't actually looked at this in full, so I hope others gave it a proper review.
Thank you too! |
ConvergenceWarning seems like a slightly strange choice for stable_cumsum, doesn't it? Should we not use a RuntimeWarning like numpy seems to do for overflows?
ConvergenceWarning should probably be a RuntimeWarning. But yes, I suppose so.
Let me fix that.
Opened #7922 to replace ConvergenceWarning by RuntimeWarning |
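When swapping the warning class, a quick way to verify which category a helper actually emits is to record warnings and inspect them (illustrative snippet, not code from the PR or from #7922):

```python
import warnings

# Record emitted warnings and inspect the category, e.g. to confirm a
# helper raises RuntimeWarning rather than a library-specific class.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.warn("cumsum was found to be unstable", RuntimeWarning)

category = caught[0].category  # RuntimeWarning
```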
* FIX unstable cumsum in utils.random
* equal_nan=True for isclose; since numpy < 1.9 sum is as unstable as cumsum, fall back to np.cumsum
* added axis parameter to stable_cumsum
* FIX unstable cumsum in ensemble.weight_boosting and utils.stats
* FIX axis problem in stable_cumsum
* FIX unstable cumsum in mixture.gmm and mixture.dpgmm
* FIX unstable cumsum in cluster.k_means_, decomposition.pca, and manifold.locally_linear
* FIX unstable cumsum in datasets.samples_generator
* added docstring for parameter axis of stable_cumsum
* added comment for why we fall back to np.cumsum when numpy version < 1.9
* removed unneeded stable_cumsum
* added stable_cumsum axis testing
* FIX numpy docstring for make_sparse_spd_matrix
* change stable_cumsum from error to warning
Reference Issue
#7359
What does this implement/fix? Explain your changes.
np.cumsum was reported to be unstable when dealing with float32 data or very large arrays of float64 data in #6842. This pull request changes those call sites to use sklearn.utils.extmath.stable_cumsum to solve this problem (#7331).
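The instability motivating the change is easy to reproduce: naive float32 accumulation of many small values drifts badly once the running sum is large, because each addition is rounded to the coarse float32 spacing. A small demonstration (an illustration, not code from the PR):

```python
import numpy as np

# Naive float32 cumulative sum of many small values drifts far from an
# accurate float64 reference sum: once the running total is large, the
# float32 spacing exceeds the precision needed to add 0.1 exactly.
arr = np.full(10 ** 7, 0.1, dtype=np.float32)
last = np.cumsum(arr)[-1]                  # float32 running sum
reference = np.sum(arr, dtype=np.float64)  # accurate float64 reference
drift = abs(float(last) - reference)       # large for float32 cumsum
```

Accumulating in float64, as stable_cumsum does, keeps the drift negligible for the same input.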