sklearn.preprocessing.MinMaxScaler not preserving symmetry / Add axis=None #4892

Closed · alessiob opened this issue Jun 24, 2015 · 17 comments

@alessiob

MinMaxScaler does not preserve symmetry.

scikit-learn 0.15.2 and 0.16.1
Windows 7 SP1, 64-bit
Python 2.7.9, 32-bit

An affected numpy matrix and the script to reproduce the problem are available at: https://www.dropbox.com/s/vkcuq71wa69jrw7/sklearn-bug.tar?dl=0

@TomDLT (Member) commented Jun 24, 2015

# A simpler example:
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1., 2.],
              [2., 10.]])
# The symmetric X is transformed into the non-symmetric:
MinMaxScaler().fit_transform(X)
# array([[0., 0.],
#        [1., 1.]])

This is not a bug: MinMaxScaler scales each feature (column) individually.
The docstring says:

    This estimator scales and translates each feature individually such
    that it is in the given range on the training set, i.e. between zero and one.

@jnothman (Member)

Perhaps we should consider supporting axis=None? Ping @untom.

@untom (Contributor) commented Jun 25, 2015

Is there a common enough use case to justify adding axis=None? I can't think of one.

In a pinch, the same result can be had by using ravel() on the input and reshape() on the result of the scaler.
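
As a minimal sketch of that workaround (assuming a dense 2-D input; the array X below is illustrative):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1., 2.],
              [2., 10.]])

# Flatten to a single column so the scaler sees one "feature",
# then restore the original shape afterwards; symmetry is preserved.
X_scaled = MinMaxScaler().fit_transform(X.ravel().reshape(-1, 1)).reshape(X.shape)
# array([[0.        , 0.11111111],
#        [0.11111111, 1.        ]])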

@alessiob (Author)

Thanks for the answers and my apologies.
axis=None would be very useful in my case.

@jnothman (Member)

ravel and reshape is not a pretty operation to achieve in a pipeline!


@amueller (Member) commented Jul 1, 2015

Is this for a pairwise distance matrix? Preprocessors other than KernelCenterer are not really supposed to be used on that.
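
For illustration, if the input really is a pairwise (kernel) matrix, here is a rough sketch of the KernelCenterer route mentioned above (the data and the kernel choice are assumptions, not from the thread):

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.preprocessing import KernelCenterer

X = np.array([[1., 2.], [2., 10.], [3., 4.]])
K = rbf_kernel(X)                              # symmetric pairwise similarity matrix
K_centered = KernelCenterer().fit_transform(K)
np.allclose(K_centered, K_centered.T)          # True: centering preserves symmetry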

@jnothman (Member) commented Jul 2, 2015

Is there a reason not to support axis=None, @amueller (except in sparse, where it requires real additional work)?


@amueller (Member)

No, I think it would actually be cool.

@amueller added the Easy, Documentation, and Need Contributor labels on Jul 11, 2015
@amueller changed the title from "sklearn.preprocessing.MinMaxScaler not preserving symmetry" to "sklearn.preprocessing.MinMaxScaler not preserving symmetry / Add axis=None" on Jul 11, 2015
@stephen-hoover (Contributor)

I can work on this, but it appears that none of the Scalers accept an "axis" argument. All of them operate only on single features independent of the other features. Should I add an "axis" argument to all of them, accepting inputs of [0, ..., ndim-1] or None (defaulting to 0)?

@amueller (Member)

ndim is always 2. I thought we had an axis argument in the scalers, but it seems that exists only in the function interface, which feels slightly odd.
Maybe just allow 0 or None, defaulting to 0 for now. That would make sense for both MinMaxScaler and StandardScaler.

@untom (Contributor) commented Jul 11, 2015

I once tried introducing an axis argument to all scalers back in #2514, but IIRC the problem was that axis=1 does not make sense for scalers and is likely overengineering (see #3639 (comment)). Since then, I have come to agree with that viewpoint: most of sklearn assumes a samples-by-features format, and normalizing along anything other than features doesn't make much sense. For the rare cases where it is needed, there are the scaling functions (see the example below).
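
For reference, a small example of the function interface referred to here (minmax_scale; the axis values shown are just for illustration, and axis=None is not supported):

import numpy as np
from sklearn.preprocessing import minmax_scale

X = np.array([[1., 2.],
              [2., 10.]])

minmax_scale(X, axis=0)  # scale each column (feature) independently
# array([[0., 0.],
#        [1., 1.]])

minmax_scale(X, axis=1)  # scale each row (sample) independently
# array([[0., 1.],
#        [0., 1.]])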

@amueller (Member)

We could add axis=None to the function interface? Not sure, though.

@untom (Contributor) commented Jul 11, 2015

I'm not sure how that affects the original issue in this thread (e.g. whether the application scenario involves fitting a scaler on a training set and applying it to the test data or not).
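
To make the fit/transform concern concrete, here is a minimal sketch (not sklearn API; the class name GlobalMinMaxScaler and its details are assumptions) of what an axis=None scaler would have to do so it can be fit on training data and applied to test data:

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class GlobalMinMaxScaler(BaseEstimator, TransformerMixin):
    """Scale all entries to [0, 1] using one global min/range learned from the training matrix."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        self.data_min_ = X.min()                    # global minimum over all entries
        self.data_range_ = X.max() - self.data_min_
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return (X - self.data_min_) / self.data_range_

# Fit on training data, then reuse the same global min/range on the test data.
X_train = np.array([[1., 2.], [2., 10.]])
X_test = np.array([[3., 4.], [4., 5.]])
scaler = GlobalMinMaxScaler().fit(X_train)
scaler.transform(X_test)
# array([[0.22222222, 0.33333333],
#        [0.33333333, 0.44444444]])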

@stephen-hoover (Contributor)

I think that it's useful to allow the "axis=None" option, but that might not be the best option name. What if the Scalers took an option "grouped=False"?

@thomasjpfan (Member)

@amueller I do not see a use case for axis=None. Without a use case, I am overall -1 on adding this feature.

@thomasjpfan added the Needs Decision - Include Feature and New Feature labels and removed the Easy and Documentation labels on Jul 21, 2022
@TomDLT (Member) commented Jul 21, 2022

Apart from numerical reasons, I don't see any use case either for scaling all the features in the same way. I would be surprised if any estimator behaved differently depending on the global scale of the features.

-1 as well

@thomasjpfan (Member)

Given the comments #4963 (comment), #4892 (comment), and #4892 (comment), I do not think we will include this feature.

@thomasjpfan closed this as not planned (won't fix, can't repro, duplicate, stale) on Jul 22, 2022
Repository owner moved this from Todo 📬 to Done 🚀 in Quansight's scikit-learn Project Board on Jul 22, 2022