[MRG+1] FIX unstable cumsum #7376
Conversation
out = np.cumsum(arr, dtype=np.float64)
expected = np.sum(arr, dtype=np.float64)
if not np.allclose(out[-1], expected, rtol=rtol, atol=atol):
    if np_version < (1, 9):
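The hunk above centres on a stability check: accumulate in float64, then verify the final cumulative value against an independently computed float64 sum. A minimal self-contained sketch of that idea follows; the function name and warning text here are illustrative, not the exact scikit-learn implementation:

```python
import warnings
import numpy as np

def stable_cumsum_sketch(arr, rtol=1e-05, atol=1e-08):
    # Accumulate in float64 to limit rounding error, then compare the
    # last cumulative value against an independent float64 sum. If they
    # disagree beyond tolerance, warn rather than fail silently.
    out = np.cumsum(arr, dtype=np.float64)
    expected = np.sum(arr, dtype=np.float64)
    if not np.allclose(out[-1], expected, rtol=rtol, atol=atol):
        warnings.warn("cumsum was found to be unstable: its last element "
                      "does not correspond to sum", RuntimeWarning)
    return out
```

On older numpy (< 1.9, per the hunk) `np.sum` itself lacks pairwise summation, so the check is skipped there in the actual PR.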
you should add a comment explaining why we skip the check in this case
LGTM
@@ -333,7 +334,7 @@ def make_multilabel_classification(n_samples=100, n_features=20, n_classes=5,
     generator = check_random_state(random_state)
     p_c = generator.rand(n_classes)
     p_c /= p_c.sum()
-    cumulative_p_c = np.cumsum(p_c)
+    cumulative_p_c = stable_cumsum(p_c)
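For context on what this hunk touches: a cumulative class-probability vector like `cumulative_p_c` is typically consumed by inverse-CDF sampling. A small illustration using plain `np.cumsum` (variable names mirror the hunk; this is not code from the PR):

```python
import numpy as np

# Inverse-CDF sampling of class indices from a probability vector.
rng = np.random.RandomState(0)
n_classes = 5
p_c = rng.rand(n_classes)
p_c /= p_c.sum()                  # normalize to a probability vector
cumulative_p_c = np.cumsum(p_c)   # cumulative distribution over classes
u = rng.rand(3)                   # uniform draws in [0, 1)
classes = np.searchsorted(cumulative_p_c, u)  # sampled class indices
```

With only `n_classes` terms in the sum, the accumulated rounding error is tiny, which is the reviewer's point below.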
I don't think this is of much value. p_c will always be high precision, and problems of numerical instability in cumulative summing aren't likely to be an issue at the scale of "number of classes". Please apply this change with more discretion. It comes at a (small) cost.
LGTM
I've not reviewed this fully yet and don't consider it an immediate priority. I think we should use some discretion with the helper as it is a little more expensive.
I still haven't taken a good look at these. I'd like to be a bit conservative about it.
Is the plan that we are going to raise errors on users? Like @NelleV, I wouldn't find this very useful for the end users. I would much prefer raising a warning. If we want to control for such problems in our test codebase, we could specifically turn this warning into an error (e.g. with warnings.simplefilter) during the tests.
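The approach suggested here, keeping a warning for users but escalating it to an error in the test suite, can be sketched with the standard warnings module (the warning message is illustrative):

```python
import warnings

# Stand-in for a numerical routine that detects instability and warns.
def emit_instability_warning():
    warnings.warn("cumsum was found to be unstable", RuntimeWarning)

# In the test suite: promote RuntimeWarning to a hard error so that
# instability fails loudly there, while end users only see a warning.
with warnings.catch_warnings():
    warnings.simplefilter("error", RuntimeWarning)
    try:
        emit_instability_warning()
        raised = False
    except RuntimeWarning:
        raised = True
```

Outside the `catch_warnings` block the default filters are restored, so regular callers are unaffected.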
Perhaps you're right. I think an error is better than a silent failure with indeterminate consequences, but perhaps a warning is better than an error.
I disagree. Non-convergence and numerical instability are classically warnings. Only the user can know whether or not it is a problem. Of course many users will not look at the warnings. But bad behavior from users is not a good reason to penalize people who know what they are doing, and can just move on.
Honestly, if people are using machine learning without checking what they are doing, they are dangerous. It's not a question of understanding what's going on, it's a question of checking it works.
I'm happy to change the behaviour to a warning.
@yangarbiter do you want to incorporate that change?
Sure. I can do that!
I wanted to give my +1 and merge, but there are test failures both in AppVeyor and in Travis, with different failures: you seem to have a mixture of tabs and spaces for indentation, and the test needs to be upgraded to test for the warning, and not the RuntimeError.
Sorry about that, I've fixed it. |
I think we have a +3 for that.
+1 to merge. Merging. Thanks!
I hadn't actually looked at this in full, so I hope others gave it a proper review.
Thank you too! |
ConvergenceWarning seems like a slightly strange choice for stable_cumsum, doesn't it? Should we not use a RuntimeWarning like numpy seems to do for overflows?
ConvergenceWarning should probably be a RuntimeWarning. But yes, I suppose so.
Let me fix that.
Opened #7922 to replace ConvergenceWarning by RuntimeWarning |
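When swapping the warning class, a quick way to verify which category a helper actually emits is to record warnings and inspect them (illustrative snippet, not code from the PR or from #7922):

```python
import warnings

# Record emitted warnings and inspect the category, e.g. to confirm a
# helper raises RuntimeWarning rather than a library-specific class.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.warn("cumsum was found to be unstable", RuntimeWarning)

category = caught[0].category  # RuntimeWarning
```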
* FIX unstable cumsum in utils.random
* equal_nan=True for isclose; since numpy < 1.9 sum is as unstable as cumsum, fall back to np.cumsum
* added axis parameter to stable_cumsum
* FIX unstable cumsum in ensemble.weight_boosting and utils.stats
* FIX axis problem in stable_cumsum
* FIX unstable cumsum in mixture.gmm and mixture.dpgmm
* FIX unstable cumsum in cluster.k_means_, decomposition.pca, and manifold.locally_linear
* FIX unstable cumsum in datasets.samples_generator
* added docstring for parameter axis of stable_cumsum
* added comment for why we fall back to np.cumsum when numpy version < 1.9
* removed unneeded stable_cumsum
* added stable_cumsum axis testing
* FIX numpy docstring for make_sparse_spd_matrix
* change stable_cumsum from error to warning
Reference Issue
#7359
What does this implement/fix? Explain your changes.
np.cumsum was reported to be unstable when dealing with float32 data or very large arrays of float64 data in #6842. This pull request changes those call sites to use sklearn.utils.extmath.stable_cumsum to solve this problem (#7331).
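The instability motivating the change is easy to reproduce: naive float32 accumulation of many small values drifts badly once the running sum is large, because each addition is rounded to the coarse float32 spacing. A small demonstration (an illustration, not code from the PR):

```python
import numpy as np

# Naive float32 cumulative sum of many small values drifts far from an
# accurate float64 reference sum: once the running total is large, the
# float32 spacing exceeds the precision needed to add 0.1 exactly.
arr = np.full(10 ** 7, 0.1, dtype=np.float32)
last = np.cumsum(arr)[-1]                  # float32 running sum
reference = np.sum(arr, dtype=np.float64)  # accurate float64 reference
drift = abs(float(last) - reference)       # large for float32 cumsum
```

Accumulating in float64, as stable_cumsum does, keeps the drift negligible for the same input.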