[MRG+1] FIX unstable cumsum #7376

Merged: 14 commits, Oct 17, 2016

yangarbiter
Contributor

Reference Issue

#7359

What does this implement/fix? Explain your changes.

In #6842, np.cumsum was reported to be unstable on float32 data and on very large float64 arrays. This pull request replaces those np.cumsum calls with sklearn.utils.extmath.stable_cumsum to solve the problem (#7331).
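
As a rough illustration of the drift being described, the sketch below compares a running sum kept in float32 with one accumulated in float64. The array size and fill value are arbitrary choices for this demo and are not taken from #6842.

```python
import numpy as np

# Illustrative data (an assumption for this demo): one million float32 copies of 0.1.
arr = np.full(10**6, 0.1, dtype=np.float32)

# Reference total accumulated in float64.
expected = np.sum(arr, dtype=np.float64)

# Running sum kept in float32: rounding error builds up at every step.
last32 = float(np.cumsum(arr)[-1])

# Running sum with a float64 accumulator.
last64 = float(np.cumsum(arr, dtype=np.float64)[-1])

err32 = abs(last32 - expected)
err64 = abs(last64 - expected)
print("float32 accumulator error:", err32)
print("float64 accumulator error:", err64)
```

The float64 accumulator costs extra memory and time for float32 inputs, which is the trade-off discussed later in this thread.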

@TomDLT
Member

TomDLT commented Sep 9, 2016

The failure in test_random_choice_csc is due to a call to stable_cumsum(array([nan])), which fails because np.allclose(np.nan, np.nan) returns False.
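
A short sketch of the NaN behaviour in question, and of the `equal_nan` option that NumPy provides on `np.allclose` and `np.isclose` (the variable names here are only for illustration):

```python
import numpy as np

# NaN never compares equal to itself, so the default closeness check fails.
default_check = bool(np.allclose(np.nan, np.nan))

# equal_nan=True treats NaNs in matching positions as equal.
nan_aware_check = bool(np.allclose(np.nan, np.nan, equal_nan=True))

print(default_check, nan_aware_check)
```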

yangarbiter force-pushed the cumsum branch 15 times, most recently from 7198a3f to 64b244b (September 12, 2016 03:02)
yangarbiter changed the title from "[WIP] FIX unstable cumsum" to "[MRG] FIX unstable cumsum" (Sep 12, 2016)
out = np.cumsum(arr, dtype=np.float64)
expected = np.sum(arr, dtype=np.float64)
if not np.allclose(out[-1], expected, rtol=rtol, atol=atol):
if np_version < (1, 9):
Member

You should add a comment explaining why we skip the check in this case.

TomDLT changed the title from "[MRG] FIX unstable cumsum" to "[MRG+1] FIX unstable cumsum" (Sep 12, 2016)
@TomDLT
Member

TomDLT commented Sep 12, 2016

LGTM

@@ -333,7 +334,7 @@ def make_multilabel_classification(n_samples=100, n_features=20, n_classes=5,
     generator = check_random_state(random_state)
     p_c = generator.rand(n_classes)
     p_c /= p_c.sum()
-    cumulative_p_c = np.cumsum(p_c)
+    cumulative_p_c = stable_cumsum(p_c)
Member

I don't think this adds much value. p_c will always be high precision, and numerical instability in cumulative summing isn't likely to be an issue at the scale of "number of classes". Please apply this change with more discretion. It comes at a (small) cost.

jnothman added this to the 0.19 milestone (Sep 13, 2016)
yangarbiter force-pushed the cumsum branch 2 times, most recently from 63ded1e to d771a26 (September 13, 2016 01:49)
@amueller
Member

LGTM

@jnothman
Member

I've not reviewed this fully yet and don't consider it an immediate priority. I think we should use some discretion with the helper as it is a little more expensive.

@jnothman
Member

jnothman commented Oct 8, 2016

I still haven't taken a good look at these. I'd like to be a bit conservative about it.

@GaelVaroquaux
Member

Is the plan that we are going to raise errors on users? Like @NelleV, I wouldn't find this very useful for end users. I would much prefer raising a warning. If we want to control for such problems in our test codebase, we could specifically turn this warning into an error (e.g. with warnings.simplefilter) during the tests.
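
A minimal sketch of that suggestion, using only the standard `warnings` module. The function `noisy_sum` is a hypothetical stand-in for a helper that warns about instability, and `RuntimeWarning` is an assumed category chosen for the example.

```python
import warnings


def noisy_sum(values):
    # Hypothetical stand-in for a helper that warns instead of raising.
    warnings.warn("cumsum was found to be unstable", RuntimeWarning)
    return sum(values)


# End users still get a result; the warning is silenced here only to keep
# the demo output clean.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    user_result = noisy_sum([1, 2, 3])

# In the test suite, the same warning can be escalated to an exception.
raised = False
with warnings.catch_warnings():
    warnings.simplefilter("error", RuntimeWarning)
    try:
        noisy_sum([1, 2, 3])
    except RuntimeWarning:
        raised = True

print(user_result, raised)
```

Scoping the filter with `warnings.catch_warnings()` keeps the stricter behaviour local to the tests rather than changing it for every caller.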

@GaelVaroquaux
Member

GaelVaroquaux commented Oct 8, 2016 via email

@jnothman
Member

I'm happy to change the behaviour to a warning.

@jnothman
Member

@yangarbiter do you want to incorporate that change?

@yangarbiter
Contributor Author

Sure. I can do that!

@GaelVaroquaux
Member

I wanted to give my +1 and merge, but there are test failures on both AppVeyor and Travis, with different causes: you seem to have a mixture of tabs and spaces for indentation, and the test needs to be updated to check for the warning rather than the runtime error.

@yangarbiter
Contributor Author

Sorry about that, I've fixed it.

yangarbiter force-pushed the cumsum branch 2 times, most recently from dabe6f6 to d7d003d (October 14, 2016 11:21)
@tguillemot
Contributor

I think we have a +3 for that.
@jnothman OK to merge?

@GaelVaroquaux
Member

+1 to merge. Merging. Thanks!

GaelVaroquaux merged commit fa59873 into scikit-learn:master on Oct 17, 2016
@jnothman
Member

I hadn't actually looked at this in full, so I hope others gave it a proper
review. Thanks @yangarbiter.

@yangarbiter
Contributor Author

Thank you too!

@lesteve
Member

lesteve commented Oct 19, 2016

ConvergenceWarning seems like a slightly strange choice for stable_cumsum, doesn't it? Should we not use a RuntimeWarning like numpy seems to do for overflows?

@jnothman
Member

ConvergenceWarning should probably be a RuntimeWarning. But yes, I suppose
this is a question of numerical stability rather than parameter choice, so
RuntimeWarning may be more appropriate.

@yangarbiter
Contributor Author

yangarbiter commented Oct 19, 2016

Let me fix that.
Do I need to start a new PR?

@lesteve
Member

lesteve commented Nov 22, 2016

Opened #7922 to replace ConvergenceWarning by RuntimeWarning
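
For readers following along, here is a rough sketch of a cumulative-sum helper along the lines discussed in this thread: float64 accumulation, an `axis` parameter, a NaN-tolerant check of the last element against `np.sum`, and a `RuntimeWarning` instead of an error. The name `stable_cumsum_sketch`, its defaults, and the message text are assumptions for illustration; this is not the code merged in scikit-learn.

```python
import warnings

import numpy as np


def stable_cumsum_sketch(arr, axis=None, rtol=1e-05, atol=1e-08):
    """Cumulative sum in float64; warn if its last element drifts from np.sum."""
    out = np.cumsum(arr, axis=axis, dtype=np.float64)
    expected = np.sum(arr, axis=axis, dtype=np.float64)
    # Final entry along the summed axis (of the flattened array if axis is None).
    last = out.take(-1, axis=axis)
    # equal_nan=True so that NaN inputs do not trigger a spurious warning.
    if not np.all(np.isclose(last, expected, rtol=rtol, atol=atol,
                             equal_nan=True)):
        warnings.warn("cumsum was found to be unstable: its last element "
                      "does not correspond to sum", RuntimeWarning)
    return out


result = stable_cumsum_sketch(np.array([1.0, 2.0, 3.0]))
by_column = stable_cumsum_sketch(np.ones((3, 2)), axis=0)
print(result)
print(by_column)
```

The extra `np.sum` pass is the small cost mentioned earlier in the review, which is why the helper is worth applying selectively.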

Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
* FIX unstable cumsum in utils.random

* equal_nan = true for isclose
since numpy < 1.9 sum is as unstable as cumsum, fallback to np.cumsum

* added axis parameter to stable_cumsum

* FIX unstable cumsum in ensemble.weight_boosting and utils.stats

* FIX axis problem in stable_cumsum

* FIX unstable cumsum in mixture.gmm and mixture.dpgmm

* FIX unstable cumsum in cluster.k_means_, decomposition.pca, and manifold.locally_linear

* FIX unstable cumsum in datasets.samples_generator

* added docstring for parameter axis of stable_cumsum

* added comment for why fall back to np.cumsum when np version < 1.9

* remove unneeded stable_cumsum

* added stable_cumsum's axis testing

* FIX numpy docstring for make_sparse_spd_matrix

* change stable_cumsum from error to warning
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017