[MRG+1] Fixes #10393 Fixed error when fitting RidgeCV with integers #10397
Negative alphas should raise an error, right?
Yes! Sorry, I got confused and thought it should not raise an error. It's fixed now; the tests cover negative and positive alphas, both integer and float.
jnothman
left a comment
Please use Fixes #10393 in the PR description, rather than something ad hoc like #Fixes issue #10393 so that GitHub knows to close the issue automatically when this is merged.
A first glance:
sklearn/linear_model/ridge.py
Outdated
| error = scorer is None | ||
|
|
||
| for i, alpha in enumerate(self.alphas): | ||
| if float(alpha) < 0: |
I don't think we need the float cast here...
| ridge = RidgeCV(alphas) | ||
| assert_raises(ValueError, ridge.fit, X, y) | ||
|
|
||
| # Positive alphas |
Do we need to test positive float alphas? I think we are already doing so in all the other tests, aren't we?
| ridge = RidgeCV(alphas) | ||
| ridge.fit(X, y) | ||
|
|
||
| # Negative integers |
I would separate the tests for negative alphas, since they should raise an error.
You can make a test called test_ridgecv_neg_alphas() and parametrize it with pytest over the integer and floating-point types.
| decimal=6) | ||
|
|
||
|
|
||
| def test_ridgecv_alphas(): |
Rename to test_ridgecv_int_alphas.
|
|
||
|
|
||
| def test_ridgecv_alphas(): | ||
| # Test that no error is raised when fitting RidgeCV |
I would remove this comment, since it is obvious from the renamed function.
|
|
||
| # Integers | ||
| alphas = (1, 10, 100) | ||
| ridge = RidgeCV(alphas) |
Pass a list directly when instantiating: RidgeCV(alphas=[1, 10, 100]).
You could also make sure that a numpy array of integers is converted as well. In that case, use a parametrized test:
@pytest.mark.parametrize(
    "alphas",
    [(np.array([1, 10, 100])),
     ([1, 10, 100])])
def test_ridge_cv_alphas(alphas):
    X = ...
    y = ...
    ridge = RidgeCV(alphas)
    ridge.fit(X, y)
Also, I would do the conversion directly at https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/ridge.py#L886
|
sklearn/linear_model/ridge.py
Outdated
| error = scorer is None | ||
|
|
||
| for i, alpha in enumerate(self.alphas): | ||
| if alpha < 0: |
The check needs to be done outside the loop. Otherwise we start computing things only to fail at the end.
So something like:
    if np.any(alphas < 0):
        raise ValueError("alphas cannot be negative. Got {} containing some negative value instead.".format(alphas))
| # Negative integers | ||
| alphas = (-1, -10, -100) | ||
| ridge = RidgeCV(alphas) | ||
| assert_raises(ValueError, ridge.fit, X, y) |
you need to use assert_raises_regex to match the string
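As a rough illustration of the two review points above (check for negative alphas before the loop, and match the error message in the test), here is a minimal sketch. The names `validate_alphas` and `fit_over_alphas` are hypothetical and are not part of scikit-learn, and the standard-library `assertRaisesRegex` stands in for `assert_raises_regex`.

```python
import unittest

import numpy as np


def validate_alphas(alphas):
    """Hypothetical helper: reject negative alphas before any computation."""
    alphas = np.asarray(alphas)
    if np.any(alphas < 0):
        raise ValueError(
            "alphas cannot be negative. "
            "Got {} containing some negative value instead.".format(alphas)
        )
    return alphas


def fit_over_alphas(alphas):
    """Toy stand-in for a per-alpha loop; this is not scikit-learn code."""
    alphas = validate_alphas(alphas)  # fail fast, before the loop starts
    return [1.0 / (1.0 + float(alpha)) for alpha in alphas]


# Match the message as well as the exception type.
case = unittest.TestCase()
with case.assertRaisesRegex(ValueError, "alphas cannot be negative"):
    fit_over_alphas([-1, -10, -100])
```

Matching on the message guards against the test passing because of some unrelated ValueError raised elsewhere in the fit.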
d93e99c to
2f71665
Compare
Hi! Is it ok now?
If you think the work is complete, please change WIP in the title to MRG.
| ] | ||
|
|
||
| @pytest.mark.parametrize("alpha_input, alpha_expected", testdata_alpha) | ||
| def test_conversion(alpha_input, alpha_expected): |
this is never executed.
| decimal=6) | ||
|
|
||
|
|
||
| def test_ridgecv_alpha_conversion_to_array(): |
I think you can remove this line and dedent the rest of this function
Great, I'm new to testing in Python. I've already fixed it.
jnothman
left a comment
Thanks
|
|
||
| @pytest.mark.parametrize("alpha_input, alpha_expected", testdata_alpha) | ||
| def test_conversion(alpha_input, alpha_expected): | ||
| assert((RidgeCV(alpha_input).get_params()['alphas'] == |
I'm not actually sure what this is trying to test. Is it trying to test that the input is validated and turned into floats before fit? We don't usually do this, because the user may also set them with set_params.
I also don't think this is currently asserting that the alphas are floats, only that they are unchanged or equivalent.
And I think we have common tests which do that. In short, I don't think this test adds anything in its current form.
OK, I'll remove it; I only added it at the other reviewer's suggestion. At least I've learned how these tests work.
sklearn/linear_model/ridge.py
Outdated
| normalize=False, random_state=None, solver='auto', tol=0.001) | ||
| """ | ||
|
|
Please avoid introducing unnecessary and unrelated changes like this. It makes it hard to review your work, and may introduce merge conflicts for other changes in the works.
sklearn/linear_model/ridge.py
Outdated
| cv=None, gcv_mode=None, | ||
| store_cv_values=False): | ||
| self.alphas = alphas | ||
| self.alphas = np.asarray(alphas, dtype=np.float64) |
Usually we do not alter parameters in __init__, because they can also be set in other ways. We delay all validation until fit (except in old code)
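To make the convention above concrete, here is a minimal sketch of an estimator that stores its parameters untouched in `__init__` and only validates and converts them in `fit`. `TinyRidgeCV`, its simplified `set_params`, and the `alphas_checked_` attribute are hypothetical names used for illustration; this is not scikit-learn's implementation.

```python
import numpy as np


class TinyRidgeCV:
    """Hypothetical estimator skeleton, not scikit-learn's RidgeCV."""

    def __init__(self, alphas=(0.1, 1.0, 10.0)):
        # Store the argument exactly as given: it may be replaced later
        # through set_params, so validating here would be bypassed anyway.
        self.alphas = alphas

    def set_params(self, **params):
        # Simplified stand-in for the usual set_params behaviour.
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # Validation and conversion happen here, on a local copy, so the
        # user-supplied parameter is left unchanged.
        alphas = np.asarray(self.alphas, dtype=np.float64)
        if np.any(alphas < 0):
            raise ValueError("alphas cannot be negative.")
        self.alphas_checked_ = alphas
        return self
```

With this layout, integer alphas set either through the constructor or through `set_params` go through the same check when `fit` is called.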
My mistake, an error in the file. Changed it again.
b6dc752 to
4444390
Compare
| alpha_expected).all()) | ||
|
|
||
|
|
||
| def test_ridgecv_int_alphas(): |
why did you remove this?
| cv=None, gcv_mode=None, | ||
| store_cv_values=False): | ||
| self.alphas = np.asarray(alphas, dtype=np.float64) | ||
| self.alphas = np.asarray(alphas) |
Oh, I see that this was already done in master. These days we would avoid such validation.
jnothman
left a comment
flake8 error.
Otherwise LGTM
|
Please add an entry to the change log under Bug Fixes at |
f1b47c0 to
4f9d213
Compare
|
|
||
| - :class:`decomposition.IncrementalPCA` in Python 2 (bug fix) | ||
| - :class:`isotonic.IsotonicRegression` (bug fix) | ||
| - :class:`linear_model.ARDRegression` (bug fix) |
You seem to have removed some text from v0.20.rst, probably without realising. Please look at your diff and re-add the text you removed.
| DENSE_FILTER = lambda X: X | ||
| SPARSE_FILTER = lambda X: sp.csr_matrix(X) | ||
|
|
||
| def DENSE_FILTER(X): return X |
Please avoid changes that are not related to your PR. It makes the review less pleasant for everyone involved. Can you put back the lambdas?
I'm sorry, I don't know why this keeps happening. I had already reverted those changes.
0049219 to
9b4a319
Compare
I'm sorry for all the mess. I'm new to open source and did not know how to deal with remote changes and do the pull --rebase; that's why some of the parts got removed. I updated the documentation by adding my line and then reverted the lambdas change again.
9b4a319 to
e3a6d72
Compare
qinhanmin2014
left a comment
LGTM overall. @mabelvj, please try to avoid unrelated changes (there are still some extra blank lines). Also, please try to fill the current line before starting a new one.
doc/whats_new/v0.20.rst
Outdated
| overridden when using parameter ``copy_X=True`` and ``check_input=False``. | ||
| :issue:`10581` by :user:`Yacine Mazari <ymazari>`. | ||
|
|
||
| - Fixed a bug in :class:`linear_model.RidgeCV` where using negative integer |
What do you mean by this? Should it be negative integer -> integer, since the bug is mainly about an unexpected error when using integer alphas? (Negative integers will be rejected, right?)
You're right, both integers raise error.
|
|
||
| - Add test :func:`estimator_checks.check_methods_subset_invariance` to check | ||
| that estimators methods are invariant if applied to a data subset. | ||
| :issue:`10420` by :user:`Jonathan Ohayon <Johayon>` |
Try to get rid of this strange diff.
I don't know; the file in master already has that line. Should I remove it?
If you actually don't change anything and find it hard to get rid of it, you might just keep it. (Hope there won't be some strange things when merging)
| cv=None, gcv_mode=None, | ||
| store_cv_values=False): | ||
| self.alphas = alphas | ||
| self.alphas = np.asarray(alphas) |
Why do this?
It was suggested a few lines above that I make that change: add a conversion in the __init__ and then remove the float. It's done in the __init__ of _RidgeGCV.
| "alphas cannot be negative.", | ||
| ridge.fit, X, y) | ||
|
|
||
| # Negative alphas |
I think this test is redundant. We don't need so many tests for such a minor issue.
I added that test because the initial error stated: ValueError: Integers to negative integer powers are not allowed. So I had to add a line to raise an error for negative alphas and in the tests I was testing it worked.
You don't persuade me here but I won't focus too much on that.
I just think such a minor thing doesn't deserve so many tests.
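For context on the error message quoted in this thread, the sketch below reproduces it with plain NumPy: raising an integer array to a negative integer power raises a ValueError, while the same operation on floats works. This is only an illustration of that NumPy behaviour, presumably the kind of operation behind the reported failure; it is not the actual code path in ridge.py.

```python
import numpy as np

int_alphas = np.array([1, 10, 100])

# Integer dtype: NumPy refuses negative integer powers.
try:
    int_alphas ** -1
except ValueError as exc:
    message = str(exc)

# Float dtype: the same expression is well defined.
float_alphas = int_alphas.astype(np.float64) ** -1
```

Converting the alphas to floating point before such an operation avoids the error, which is consistent with the fix discussed in this PR.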
|
@mabelvj Thanks for the explanation. I don't think I'll focus too much on these minor things, so please: |
ec918fa to
e3a6d72
Compare
…negative alphas and added a test
…alphas do not raise error.
…hecking of negative alphas
…lphas, and check alpha conversion to array. Added raise error when any of alphas is negative
… of integer alphas to float
…on of integer alphas to float
e3a6d72 to
7ce9ba4
Compare
qinhanmin2014
left a comment
LGTM. I've pushed some minor change about the format.
|
Thanks @mabelvj :) |
fixes #10393
class _RidgeGCV(LinearModel), lines 1050 and 1052.