[MRG + 1] Warn on 1D arrays, addresses #4511 #5152
@@ -374,6 +378,9 @@ def check_array(array, accept_sparse=None, dtype="numeric", order=None,
     if force_all_finite:
         _assert_all_finite(array)

     if array.ndim == 1:
Surely this is appropriate iff `ensure_2d` is set?
The check is redundant if `ensure_2d` is set.
Oh. So what's wrong with 1d arrays if `ensure_2d=False`?
I am a little confused myself now. Here is what I understand:
- Warn against 1D arrays now and raise an error in later versions.
- If `ensure_2d` is set, nothing needs to be done now. In later versions, is the `ensure_2d` parameter deprecated?
The problem is that the current `ensure_2d` does the wrong thing. I think we should change the behavior of `ensure_2d` from reshaping to warning / erroring.
So should I go ahead with `ensure_2d` then? It will warn now and throw an error in later versions.
Yes. This is only used internally and we can rename it later.
If I remove the `np.atleast_2d` call I get 227 errors in the tests. It's my understanding that this PR should fix all of them?
There are probably many errors caused by the same underlying issues. But yes, indeed.
You should be able to just set the `SingleDimensionWarning` to "error" though (maybe at the top of the file if it is just for testing).
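One way to try this locally is the standard `warnings` filter, as in the rough sketch below. The `SingleDimensionWarning` class is redefined here as a stand-in and `check_1d` is a hypothetical helper, so neither should be read as the actual scikit-learn code.

```python
import warnings


class SingleDimensionWarning(DeprecationWarning):
    """Stand-in for the warning class discussed above (illustrative only)."""


def check_1d(ndim):
    # Hypothetical helper: warn when the input has a single dimension.
    if ndim == 1:
        warnings.warn("1d input is deprecated", SingleDimensionWarning)
    return ndim


def raises_on_1d():
    # Escalate the warning to an exception inside a scoped filter so the
    # global warning configuration is left untouched.
    with warnings.catch_warnings():
        warnings.simplefilter("error", SingleDimensionWarning)
        try:
            check_1d(1)
        except SingleDimensionWarning:
            return True
    return False
```

With the filter set to "error", every code path that still passes a 1d array fails with a traceback pointing at the caller, which makes the remaining call sites easier to find than a list of warnings.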
Many of these will have fixes in #4511.
Mentioning #4511 for visibility (mentioning in the title will not show up there).
@amueller Down to ~130 errors. I hope these little fixes are correct.
 dist = euclidean_distances(
-    node.centroids_, Y_norm_squared=node.squared_norm_, squared=True)
+    node.centroids_, Y_norm_squared=squared_norm, squared=True)
This seems odd. Why does this need to be 2d? `Y_norm_squared` should be 1d.
Many things were internally converted to 2D by the `np.atleast_2d` call. Without it, many tests raise errors.
But then this should probably be fixed in `euclidean_distances`. `Y_norm_squared` should be OK as 1d.
Okay, I have made a note of it.
Can `euclidean_distances` check for 1d and do a conversion to 2d?
yes, you can do that.
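A minimal sketch of that idea, using plain NumPy: precomputed squared norms are accepted as 1d or 2d and converted to a single row before broadcasting. The function `pairwise_sq_dists` is a hypothetical illustration and not the `euclidean_distances` implementation.

```python
import numpy as np


def pairwise_sq_dists(X, Y, Y_norm_squared=None):
    """Squared Euclidean distances between rows of X and rows of Y."""
    X = np.asarray(X, dtype=np.float64)
    Y = np.asarray(Y, dtype=np.float64)
    if Y_norm_squared is None:
        YY = (Y ** 2).sum(axis=1).reshape(1, -1)
    else:
        # Accept 1d or 2d precomputed norms and convert to shape (1, n_Y).
        YY = np.asarray(Y_norm_squared, dtype=np.float64).reshape(1, -1)
    XX = (X ** 2).sum(axis=1).reshape(-1, 1)
    # ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2, broadcast to (n_X, n_Y).
    d = XX - 2.0 * (X @ Y.T) + YY
    # Clip tiny negative values caused by floating point rounding.
    return np.maximum(d, 0.0)
```

Doing the conversion inside the function means callers such as the Birch code above can keep passing a 1d norm vector.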
@amueller The tests are passing now. There are 9 places where the new warning is thrown; I will fix those soon. However, you also mentioned creating a common test for every estimator to ensure that a 1D input throws the new warning. But as far as I know, with a 1D array something will most likely fail later on.
@@ -351,7 +351,7 @@ def safe_sqr(X, copy=True):
     -------
     X ** 2 : element wise square
     """
-    X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
+    X = check_array(X, accept_sparse=['csr', 'csc', 'coo'], ensure_2d=False)
This is mainly checking for sparse formats and finiteness. Since `_ensure_sparse_format` is an internal function, I have kept the `check_array` call as it is.
That's fine.
@amueller
    if ensure_2d:
        array = np.atleast_2d(array)
        if array.ndim == 1:
            warnings.warn("Array needs atleast 2 dimnesions",
typo
typo on the warning message
This needs a better message, like "Passing 1d arrays as data is deprecated and will be removed in 0.18. Reshape your data either using `X.reshape(-1, 1)` if your data has a single feature or `X.reshape(1, -1)` if it contains a single sample."
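For reference, the two reshapes named in that message produce different layouts. A small NumPy illustration, with made-up values:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])  # 1d input, shape (3,)

# Three samples with a single feature each: shape (3, 1).
single_feature = x.reshape(-1, 1)

# One sample with three features: shape (1, 3).
single_sample = x.reshape(1, -1)
```

The ambiguity between these two readings is why silently reshaping a 1d array can do the wrong thing.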
f52f1a7 to 492e364
@@ -435,7 +435,12 @@ def kneighbors(self, X, n_neighbors=None, return_distance=True):
         neighbors, distances = [], []
         bin_queries, max_depth = self._query(X)
         for i in range(X.shape[0]):
             neighs, dists = self._get_candidates(X[i], max_depth[i],

             query = X[i]
Wouldn't `X[[i]]` be easier?
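The difference between the indexing forms can be seen on a small array. This is a generic NumPy illustration and is not taken from the LSHForest code:

```python
import numpy as np

X = np.arange(6).reshape(3, 2)
i = 1

row_1d = X[i]            # integer index drops the axis: shape (2,)
row_2d = X[[i]]          # list index keeps the axis: shape (1, 2), a copy
row_slice = X[i:i + 1]   # slice also keeps the axis: shape (1, 2), a view
```

Both 2d forms avoid a separate reshape step. The list index makes a copy, while the slice shares memory with `X`.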
ping @jnothman and @GaelVaroquaux who might be interested.
Also maybe ping @ogrisel
@@ -599,8 +599,10 @@ def log_logistic(X, out=None):
     http://fa.bianp.net/blog/2013/numerical-optimizers-for-logistic-regression/
     """
     is_1d = X.ndim == 1
     X = np.atleast_2d(X)
     X = check_array(X, dtype=np.float)
Not related to this PR, but this should be changed to `dtype=np.float64` to be explicitly not platform dependent.
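A short sketch of the explicit spelling follows. `np.float` was only an alias for the builtin `float` and was removed in later NumPy releases (1.24), so the example uses the builtin for comparison:

```python
import numpy as np

# The builtin float maps to a 64-bit double in NumPy.
builtin_dtype = np.dtype(float)
explicit_dtype = np.dtype(np.float64)

# Requesting float64 by name states the intended width directly.
x = np.asarray([1, 2, 3], dtype=np.float64)
```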
@@ -119,6 +121,8 @@ def test_check_array():
     X_csr = sp.csr_matrix(X)
     assert_raises(TypeError, check_array, X_csr)
     # ensure_2d
     # This might not be needed
     assert_warns(DeprecationWarning, check_array, [0, 1, 2])
no, this is good. leave it.
Passing 1D arrays to `check_array` without setting `ensure_2d` to false now raises a deprecation warning before reshaping them. This will later throw an error. All Scaler classes also throw warnings when 1D arrays are passed. All unit tests and doctests are modified to ensure that no 1D arrays are passed, except in explicit 1D array tests, where the warnings have been silenced. Additional tests are also included which check different 1D array cases. 2D array tests with one sample and one feature are also added and, where they failed, the `check_array` call has been modified to give a more useful error message.
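The behavior described in this commit message can be sketched roughly as follows. `check_array_sketch` is a simplified, hypothetical stand-in: it covers only the dimensionality check and is not the merged scikit-learn implementation.

```python
import warnings

import numpy as np


def check_array_sketch(array, ensure_2d=True):
    """Simplified illustration of warn-then-reshape for 1d input."""
    array = np.asarray(array)
    if ensure_2d and array.ndim == 1:
        warnings.warn(
            "Passing 1d arrays as data is deprecated. Reshape your data "
            "using X.reshape(-1, 1) for a single feature or "
            "X.reshape(1, -1) for a single sample.",
            DeprecationWarning,
        )
        # Keep the old behavior for now: promote to a single-row 2d array.
        array = np.atleast_2d(array)
    return array
```

Callers that legitimately work on 1d data opt out with `ensure_2d=False`, as the `safe_sqr` change above does.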
cda640d to f760d27
@@ -185,7 +185,7 @@ Now you can predict new values, in particular, we can ask to the
 classifier what is the digit of our last image in the ``digits`` dataset,
 which we have not used to train the classifier::

-    >>> clf.predict(digits.data[-1])
+    >>> clf.predict([digits.data[-1]])
Should we be consistent and also use `[-1:]` here?
LGTM apart from minor nitpicks
@ogrisel wdyt?
I will squash the commit, merge by rebase and document the change in
Merged as 2f09933. Thanks @vighneshbirodkar!
Thanks @vighneshbirodkar, this is really helpful. Thanks for the review @ogrisel.
Just to reiterate the discussion I had with @amueller in person: I have put a check in `check_array` and issued a warning. I will proceed to fix the tests now.