[MRG + 1] Warn on 1D arrays, addresses #4511 #5152

Closed

vighneshbirodkar
Contributor

Just to reiterate the discussion I had with @amueller in person

I have put a check in check_array and issued a warning. I will proceed to fix the tests now.

@@ -374,6 +378,9 @@ def check_array(array, accept_sparse=None, dtype="numeric", order=None,
if force_all_finite:
_assert_all_finite(array)

if array.ndim == 1:
Member

Surely this is appropriate iff ensure_2d is set?

Contributor Author

The check is redundant if ensure_2d is set.

Member

Oh. So what's wrong with 1d arrays if ensure_2d=False?

Contributor Author

I am a little confused myself now. Here is what I understand:

  • Warn against 1D arrays now and raise an error in later versions.
  • If ensure_2d is set, nothing needs to be done now. In later versions, is the ensure_2d parameter deprecated?

Member

The problem is that the current ensure_2d does the wrong thing. I think we should change the behavior of ensure_2d from reshaping to warning / erroring.

Contributor Author

So should I go ahead with ensure_2d then? It will warn now and throw an error in later versions.

Member

Yes. this is only used internally and we can rename it later.
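
For readers following the thread, the behaviour agreed on here can be sketched roughly as follows. This is a minimal illustration and not scikit-learn's actual `check_array`; the function name `check_2d_sketch` and the message text are assumptions made for the example.

```python
import warnings

import numpy as np


def check_2d_sketch(array, ensure_2d=True):
    """Hypothetical stand-in for the 1D handling discussed above."""
    array = np.asarray(array)
    if ensure_2d and array.ndim == 1:
        # Warn now; a later release would raise an error here instead.
        warnings.warn(
            "Passing 1d arrays as data is deprecated.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Keep the old reshaping behaviour during the deprecation period.
        array = np.atleast_2d(array)
    return array
```

With `ensure_2d=False` the sketch passes 1D input through untouched and stays silent, which is the case internal callers would opt into.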

Contributor Author

If I remove the np.atleast_2d call I get 227 errors in the tests. It's my understanding that this PR should fix all of them ?

Member

There are probably many errors caused by the same underlying issues. But yes, indeed.

You should be able to just set the SingleDimensionWarning to "error" though (maybe at the top of the file if it is just for testing).
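
One way to do this with the standard `warnings` module is sketched here. `SingleDimensionWarning` is defined locally as a stand-in for the class named in the comment, and `legacy_call` and `run_strict` are hypothetical helpers for illustration only.

```python
import warnings


class SingleDimensionWarning(DeprecationWarning):
    """Stand-in for the warning class named in the thread (assumed)."""


def legacy_call():
    # Hypothetical code path that still passes a 1D array somewhere.
    warnings.warn("1D array passed", SingleDimensionWarning)
    return "finished"


def run_strict(func):
    """Run func with SingleDimensionWarning escalated to an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", SingleDimensionWarning)
        return func()
```

Calling `warnings.simplefilter("error", SingleDimensionWarning)` at module level, outside the context manager, would have the same effect for the whole test file.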

Member

Many of these will have fixes in #4511.

@amueller
Member

Mentioning #4511 for visibility (mentioning in the title will not show up there).

@vighneshbirodkar
Contributor Author

@amueller Down to ~130 errors. I hope these little fixes are correct.

  dist = euclidean_distances(
-     node.centroids_, Y_norm_squared=node.squared_norm_, squared=True)
+     node.centroids_, Y_norm_squared=squared_norm, squared=True)
Member

this seems odd. why does this need to be 2d? Y_norm_squared should be 1d.

Contributor Author

Many things were internally converted to 2D by the np.atleast_2d call; without it, many tests raise errors.

Member

but then this should probably be fixed in euclidean distances. the Y_norm_squared should be ok to be 1d.

Contributor Author

Okay, I have made a note of it.

Contributor Author

Can euclidean_distances check for 1d and do a conversion to 2d?

Member

yes, you can do that.
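
A rough sketch of what such a conversion could look like, assuming a simplified distance helper. `squared_distances_sketch` is a hypothetical function written for this illustration; it is not the scikit-learn `euclidean_distances` implementation.

```python
import numpy as np


def squared_distances_sketch(X, Y, Y_norm_squared=None):
    """Squared Euclidean distances between rows of X and rows of Y."""
    X = np.asarray(X, dtype=np.float64)
    Y = np.asarray(Y, dtype=np.float64)
    XX = (X ** 2).sum(axis=1)[:, np.newaxis]      # shape (n_x, 1)
    if Y_norm_squared is None:
        YY = (Y ** 2).sum(axis=1)[np.newaxis, :]  # shape (1, n_y)
    else:
        # Accept a 1D vector of norms and turn it into a row internally.
        YY = np.asarray(Y_norm_squared, dtype=np.float64).reshape(1, -1)
        if YY.shape[1] != Y.shape[0]:
            raise ValueError("Y_norm_squared does not match Y")
    distances = XX - 2.0 * (X @ Y.T) + YY
    # Clip tiny negative values produced by floating-point rounding.
    np.maximum(distances, 0, out=distances)
    return distances
```

Because `reshape(1, -1)` works on both a `(n_y,)` vector and a `(1, n_y)` row, callers can pass either form without triggering a 1D check.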

@vighneshbirodkar
Contributor Author

@amueller The tests are passing now. There are 9 places where the new warning is thrown; I will fix them soon.

However, you also mentioned creating a common test for every estimator to ensure that a 1D input throws the new warning. But as far as I know, because of the 1D array, something will most likely fail later on.

@@ -351,7 +351,7 @@ def safe_sqr(X, copy=True):
-------
X ** 2 : element wise square
"""
- X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
+ X = check_array(X, accept_sparse=['csr', 'csc', 'coo'], ensure_2d=False)
Contributor Author

This is mainly checking for sparse formats and finiteness. Since the _ensure_sparse_format is an internal function, I have kept the check_array call as it is.

Member

That's fine.

@vighneshbirodkar
Contributor Author

@amueller
For the common tests with 1d input, I am leaning towards making an assert_warns call and then catching any exceptions that might occur. Thoughts ?
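
The idea could look roughly like this, using only the standard `warnings` module rather than the project's `assert_warns` helper. `warns_on_1d` and `toy_fit` are hypothetical names, and `toy_fit` only imitates an estimator method that warns and then breaks.

```python
import warnings

import numpy as np


def warns_on_1d(fit_like, X_1d):
    """Return True if fit_like(X_1d) emits a DeprecationWarning.

    Any exception raised after the warning is swallowed, because the
    call is expected to break later on for 1D input.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            fit_like(X_1d)
        except Exception:
            pass
    return any(issubclass(w.category, DeprecationWarning) for w in caught)


def toy_fit(X):
    # Hypothetical estimator method: warns on 1D input, then fails.
    X = np.asarray(X)
    if X.ndim == 1:
        warnings.warn("1D input is deprecated", DeprecationWarning)
    return X.shape[1]  # raises IndexError for 1D input
```

Swallowing every exception keeps the check focused on the warning, at the cost of hiding unrelated failures, so a real common test might want to narrow the `except` clause.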

if ensure_2d:
array = np.atleast_2d(array)
if array.ndim == 1:
warnings.warn("Array needs atleast 2 dimnesions",
Contributor Author

typo

typo on the warning message

Member

This needs a better message, like "Passing 1d arrays as data is deprecated and will be removed in 0.18. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample."
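
The two reshapes named in the suggested message differ only in which axis receives the data. A small NumPy illustration with made-up values:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])

# One feature observed on three samples: a column.
single_feature = x.reshape(-1, 1)

# One sample described by three features: a row.
single_sample = x.reshape(1, -1)

print(single_feature.shape)  # (3, 1)
print(single_sample.shape)   # (1, 3)
```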

@@ -435,7 +435,12 @@ def kneighbors(self, X, n_neighbors=None, return_distance=True):
neighbors, distances = [], []
bin_queries, max_depth = self._query(X)
for i in range(X.shape[0]):
neighs, dists = self._get_candidates(X[i], max_depth[i],

query = X[i]
Member

wouldn't X[[i]] be easier?
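
The difference between the two indexing forms, shown on a small made-up array:

```python
import numpy as np

X = np.arange(6).reshape(3, 2)
i = 1

row_1d = X[i]    # shape (2,): the sample axis is dropped
row_2d = X[[i]]  # shape (1, 2): still a 2D array holding one sample
```

Indexing with a list keeps the sample axis, so the result can be passed on to code that expects 2D input without a separate reshape.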

@amueller amueller added this to the 0.17 milestone Aug 31, 2015
@amueller
Member

ping @jnothman and @GaelVaroquaux who might be interested.
One question: do we also want to deprecate 1d input in the scalers? It was explicitly allowed there up till now.

@amueller
Member

Also maybe ping @ogrisel

@giorgiop
Contributor

giorgiop commented Sep 2, 2015

@amueller I was working on this in #5104 for Scalers. So I think the answer is yes.
I have used a 2d-input validator so far (third point in the to-do list), but I will remove it to be consistent with the DeprecationWarning.

@@ -599,8 +599,10 @@ def log_logistic(X, out=None):
http://fa.bianp.net/blog/2013/numerical-optimizers-for-logistic-regression/
"""
is_1d = X.ndim == 1
X = np.atleast_2d(X)
X = check_array(X, dtype=np.float)
Member

Not related to this PR but this should be changed to dtype=np.float64 to be explicitly not platform dependent.
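
A minimal sketch of the same pattern with the explicit dtype, assuming a simplified function. `log_logistic_sketch` is a hypothetical name and the body is not the scikit-learn implementation; note also that `np.float` was only an alias for Python's built-in `float` and has since been removed from NumPy.

```python
import numpy as np


def log_logistic_sketch(X):
    """Compute log(1 / (1 + exp(-X))) element-wise, in float64."""
    X = np.asarray(X, dtype=np.float64)
    is_1d = X.ndim == 1
    X = np.atleast_2d(X)
    # -logaddexp(0, -x) equals log(1 / (1 + exp(-x))) without overflow.
    out = -np.logaddexp(0.0, -X)
    # Hand back the same dimensionality the caller passed in.
    return out[0] if is_1d else out
```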

@@ -119,6 +121,8 @@ def test_check_array():
X_csr = sp.csr_matrix(X)
assert_raises(TypeError, check_array, X_csr)
# ensure_2d
# This might not be needed
assert_warns(DeprecationWarning, check_array, [0, 1, 2])
Member

no, this is good. leave it.

Passing 1D arrays to check_array without setting `ensure_2d` to False now
raises a deprecation warning before reshaping them. This will later throw an
error.

All Scaler classes also throw warnings when 1D arrays are passed.

All unit tests/doctests are modified to ensure that no 1D arrays are passed,
except in explicit 1D array tests where the warnings have been silenced.

Additional tests are also included which check for different 1D array cases.

2D array tests with one sample and one feature are also added, and where
they failed, the `check_array` call has been modified to give a more useful
error message.
@vighneshbirodkar vighneshbirodkar changed the title from "Warn on 1D arrays, addresses #4511 [WIP]" to "Warn on 1D arrays, addresses #4511 [MRG]" Sep 4, 2015
@@ -185,7 +185,7 @@ Now you can predict new values, in particular, we can ask to the
classifier what is the digit of our last image in the ``digits`` dataset,
which we have not used to train the classifier::

- >>> clf.predict(digits.data[-1])
+ >>> clf.predict([digits.data[-1]])
Member

should we be consistent and also use [-1:] here?
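
Both spellings produce a single-sample 2D array. The sketch uses a small made-up array as a stand-in for `digits.data`, so that it runs without scikit-learn installed.

```python
import numpy as np

# Stand-in for digits.data: 4 samples with 3 features each (made-up values).
data = np.arange(12, dtype=np.float64).reshape(4, 3)

last_1d = data[-1]                     # shape (3,): a bare feature vector
last_wrapped = np.asarray([data[-1]])  # shape (1, 3): list wrapping copies
last_sliced = data[-1:]                # shape (1, 3): a view, no copy
```

The slice avoids a copy, while wrapping in a list reads more explicitly as "a batch containing one sample"; which one to prefer in documentation is a style choice.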

@amueller amueller changed the title from "Warn on 1D arrays, addresses #4511 [MRG]" to "[MRG + 1] Warn on 1D arrays, addresses #4511 [MRG]" Sep 4, 2015
@amueller
Member

amueller commented Sep 4, 2015

LGTM apart from minor nitpicks

@amueller amueller changed the title from "[MRG + 1] Warn on 1D arrays, addresses #4511 [MRG]" to "[MRG + 1] Warn on 1D arrays, addresses #4511" Sep 4, 2015
@amueller
Member

amueller commented Sep 8, 2015

@ogrisel wdyt?

@ogrisel
Member

ogrisel commented Sep 9, 2015

I will squash the commit, merge by rebase and document the change in whats_new.rst.

@ogrisel
Member

ogrisel commented Sep 9, 2015

Merged as 2f09933. Thanks @vighneshbirodkar!

@ogrisel ogrisel closed this Sep 9, 2015
@amueller
Member

amueller commented Sep 9, 2015

Thanks @vighneshbirodkar, this is really helpful. Thanks for the review @ogrisel.
