[MRG + 1]Deprecating 1D inputs in fast_mcd #5234
Can you add a test that it does throw a deprecation warning with 1d input?
@amueller
LGTM. Now it warns that 1d isn't good, at least. Having a warning for the right number of samples is for another day.
X = np.arange(100)
try:
    assert_warns(DeprecationWarning, fast_mcd, X)
You could also check that the DeprecationWarning is only due to the 1D X using assert_warns_message, if you don't mind ;)
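The suggestion above amounts to asserting on the warning text, not only on its category. Here is a minimal sketch using only the standard library's `warnings` module. `legacy_fit` and `assert_warns_with_message` are hypothetical stand-ins written for illustration; they are not scikit-learn code and not the project's `assert_warns_message` helper.

```python
import warnings


def legacy_fit(data):
    """Hypothetical stand-in: warns when given a flat (1D) sequence."""
    if data and not isinstance(data[0], (list, tuple)):
        warnings.warn("Passing 1d arrays as data is deprecated.",
                      DeprecationWarning)
    return len(data)


def assert_warns_with_message(category, fragment, func, *args):
    """Call func and check that a warning of `category` mentions `fragment`."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        result = func(*args)
    matching = [w for w in caught
                if issubclass(w.category, category)
                and fragment in str(w.message)]
    assert matching, "no %s containing %r" % (category.__name__, fragment)
    return result


# Passes only if the DeprecationWarning is the one about 1D input.
assert_warns_with_message(DeprecationWarning, "1d arrays",
                          legacy_fit, [1, 2, 3])
```

Matching on a fragment of the message means the test still fails if some unrelated DeprecationWarning happens to be raised during the call.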
ping @ogrisel
@@ -39,6 +40,15 @@ def test_mcd():
    # 1D data set
    launch_mcd_on_dataset(500, 1, 100, 0.001, 0.001, 350)

def test_fast_mcd():

To be consistent with the style of the other functions in this file and the project in general, please remove this blank line.
I don't really see the point of raising a deprecation warning before raising an exception with an obscure error message:
>>> import numpy as np
>>> from sklearn.covariance import fast_mcd
>>> fast_mcd(np.arange(100))
/volatile/ogrisel/code/scikit-learn/sklearn/utils/validation.py:372: DeprecationWarning: Passing 1d arrays as data is deprecated and will be removed in 0.18. Reshape your data either usingX.reshape(-1, 1) if your data has a single feature orX.reshape(1, -1) if it contains a single sample.
  DeprecationWarning)
/volatile/ogrisel/code/scikit-learn/sklearn/covariance/empirical_covariance_.py:75: UserWarning: Only one sample available. You may want to reshape your data array
  warnings.warn("Only one sample available. "
Traceback (most recent call last):
  File "<ipython-input-4-6294736be2de>", line 1, in <module>
    fast_mcd(np.arange(100))
  File "/volatile/ogrisel/code/scikit-learn/sklearn/covariance/robust_covariance.py", line 487, in fast_mcd
    random_state=random_state)
  File "/volatile/ogrisel/code/scikit-learn/sklearn/covariance/robust_covariance.py", line 274, in select_candidates
    random_state=random_state))
  File "/volatile/ogrisel/code/scikit-learn/sklearn/covariance/robust_covariance.py", line 146, in _c_step
    "Singular covariance matrix. "
ValueError: Singular covariance matrix. Please check that the covariance matrix corresponding to the dataset is full rank and that MinCovDet is used with Gaussian-distributed data (or at least data drawn from a unimodal, symmetric distribution.
In my opinion it would be better to directly raise a ValueError:
X = check_array(X, ensure_2d=False, ensure_min_samples=2)
if X.ndim == 1:
    raise ValueError('Calling fast_mcd on a 1D array is invalid.')
WDYT @amueller?
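The idea in the comment above is to reject bad input up front, so the user sees one clear error instead of a warning followed by a "Singular covariance matrix" failure deep in the call stack. The sketch below illustrates that pattern in pure Python on nested lists. `sequence_ndim` and `validate_samples` are hypothetical helpers invented for this example and only loosely imitate what `check_array` does on real arrays.

```python
def sequence_ndim(data):
    """Return the nesting depth of a list/tuple (0 for a scalar)."""
    depth = 0
    while isinstance(data, (list, tuple)):
        depth += 1
        if not data:
            break
        data = data[0]
    return depth


def validate_samples(data, min_samples=2, caller="fast_mcd"):
    """Fail early, with a readable message, before any covariance math."""
    ndim = sequence_ndim(data)
    if ndim != 2:
        raise ValueError("Calling %s on a %dD input is invalid; "
                         "a 2D input is required." % (caller, ndim))
    if len(data) < min_samples:
        raise ValueError("%s needs at least %d samples, got %d."
                         % (caller, min_samples, len(data)))
    return data
```

With this in place, a flat input such as `list(range(100))` is rejected immediately with a message that names the caller and the actual problem.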
@ogrisel There are some other places right now where a 1 sample or 1 feature input doesn't make sense. Would all of them eventually be fixed this way? Why not change
I improved the error messages in #5334 by adding the estimator name when provided:
>>> from sklearn.cluster import AgglomerativeClustering
>>> AgglomerativeClustering().fit([[1, 0, -1]])
Traceback (most recent call last):
  File "<ipython-input-7-2d36cbf780b4>", line 1, in <module>
    AgglomerativeClustering().fit([[1, 0, -1]])
  File "/volatile/ogrisel/code/scikit-learn/sklearn/cluster/hierarchical.py", line 716, in fit
    X = check_array(X, ensure_min_samples=2, estimator=self)
  File "/volatile/ogrisel/code/scikit-learn/sklearn/utils/validation.py", line 403, in check_array
    context))
ValueError: Found array with 1 sample(s) (shape=(1, 3)) while a minimum of 2 is required by AgglomerativeClustering.
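A rough sketch of how a message like the one above could be assembled when the caller passes either an estimator instance or a plain name. `estimator_label` and `min_samples_message` are hypothetical names used for illustration; the actual logic lives in `check_array` and may differ in detail.

```python
def estimator_label(estimator):
    """Return a display name for an estimator instance, a string, or None."""
    if estimator is None:
        return ""
    if isinstance(estimator, str):
        return estimator
    return type(estimator).__name__


def min_samples_message(shape, minimum, estimator=None):
    """Build the 'too few samples' message, naming the caller if known."""
    label = estimator_label(estimator)
    suffix = " by %s" % label if label else ""
    return ("Found array with %d sample(s) (shape=%s) while a minimum of %d "
            "is required%s." % (shape[0], shape, minimum, suffix))


class AgglomerativeClustering:
    """Empty placeholder class, used only to demonstrate the naming."""


print(min_samples_message((1, 3), 2, estimator=AgglomerativeClustering()))
```

Accepting a string as well as an instance lets plain functions such as `fast_mcd` supply their own name.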
For estimators where 1 feature does not make sense I think the current state that is improved in #5334 is fine. For the case where
@ogrisel If I am not wrong the error will be raised 2 versions later, right? Is there anything more you would expect here?
In 2 versions, 1dim input will be rejected with a stronger, more generic error message like: "estimator expect 2 dimensional array-like as input, got 1 instead".
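The deprecation cycle described here (warn now, raise once the period ends) can be sketched as follows. `check_ndim` and its `enforce` flag are hypothetical: the flag stands in for "the release in which the deprecation ends" and is not a real scikit-learn parameter.

```python
import warnings


def check_ndim(ndim, enforce=False):
    """Warn about 1D input during the deprecation period, raise afterwards."""
    if ndim == 1:
        if enforce:
            # Behaviour once the deprecation period is over.
            raise ValueError("estimator expects 2 dimensional array-like "
                             "as input, got 1 instead")
        # Behaviour during the deprecation period.
        warnings.warn("Passing 1d arrays as data is deprecated.",
                      DeprecationWarning)
    return ndim
```

During the deprecation period `check_ndim(1)` only warns; once `enforce` is switched on, the same call fails with the generic message.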
Can you please rebase this on top of the current master and change
@ogrisel I have rebased, but I am not sure I understand what you mean. If I modify
Force-pushed from ef82e61 to 19c72b1.
This will impact only code that use
The deprecation period spans from 0.17 to 0.19. Starting in 0.19 we will always raise a
@ogrisel Thank you, done
@@ -374,6 +374,9 @@ def check_array(array, accept_sparse=None, dtype="numeric", order=None,

    if ensure_2d:
        if array.ndim == 1:
            if ensure_min_samples >= 2:
                raise ValueError("%s expects at least 2 samples provided "
                    "in a 2 dimensional array-like input")
The indentation level does not follow PEP8, use a linter such as https://pypi.python.org/pypi/pep8 or better: https://pypi.python.org/pypi/flake8 to spot such issues.
@ogrisel All done, thank you for your patience.
@@ -40,6 +42,17 @@ def test_mcd():
    launch_mcd_on_dataset(500, 1, 100, 0.001, 0.001, 350)


def test_fast_mcd_on_invalid_input():
    X = np.arange(100)
    assert_raise_message(ValueError, 'expects at least 2 samples', fast_mcd, X)
Please check for the presence of the class name in the message: 'fast_mcd expects at least 2 samples'.
def test_mcd_class_on_invalid_input():
    X = np.arange(100)
    mcd = MinCovDet()
    assert_raise_message(ValueError, 'expects at least 2 samples', mcd.fit, X)
Please check for the presence of the class name in the message: 'MinCovDet expects at least 2 samples'.
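One reason to include the name in the expected text: a message whose `%s` placeholder was never filled in still contains 'expects at least 2 samples', so the looser check would pass. The sketch below shows the difference. `check_raises_with_message`, `broken_fit` and `fixed_fit` are hypothetical and written only for this illustration; the helper is a simplified stand-in, not the project's `assert_raise_message`.

```python
def check_raises_with_message(exc_type, fragment, func, *args):
    """Return True if func raises exc_type with `fragment` in its message."""
    try:
        func(*args)
    except exc_type as exc:
        return fragment in str(exc)
    return False


def broken_fit(X):
    # The name was never interpolated: a literal "%s" stays in the message.
    raise ValueError("%s expects at least 2 samples")


def fixed_fit(X):
    raise ValueError("%s expects at least 2 samples" % "fast_mcd")


# The loose check cannot tell the two apart...
print(check_raises_with_message(ValueError, "expects at least 2 samples",
                                broken_fit, [1, 2, 3]))
# ...while the strict check only accepts the correctly formatted message.
print(check_raises_with_message(ValueError,
                                "fast_mcd expects at least 2 samples",
                                broken_fit, [1, 2, 3]))
```

Only `fixed_fit` satisfies the stricter check, which is the behaviour the requested assertion is meant to guarantee.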
@ogrisel All done
@vighneshbirodkar can you squash all your changes please?
Also: I agree, raising a value error with a good message is better.
Added fast_mcd 1D test
fixed syntax error cause py34 was failing
raise value error in check_array when min_samples>=2
formatting
added check_array to MCD class and a test for the class
pep8 formatting
fixed string formatting and modified tests
Force-pushed from 4cc2152 to f0121b7.
@amueller Done
thanks :)
Thank you very much for your patience @vighneshbirodkar! Merging now!
Addresses #4512
I am just waiting to make sure that Travis does not throw the 1D deprecation warning