[MRG+2] add MaxAbsScaler #4828


Merged: 1 commit merged into scikit-learn:master on Jun 11, 2015

Conversation

@untom (Contributor) commented Jun 7, 2015

This PR adds MaxAbsScaler and maxabs_scale to sklearn.preprocessing. The scaler scales each feature by its maximum absolute value. It is especially useful for sparse data, but is probably also a better alternative to MinMaxScaler whenever the data is already centered.

The scaler itself was previously discussed in #1799 and #2514.
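
For context, a minimal usage sketch of the scaler on sparse input (the data below is made up for illustration; attribute names follow released scikit-learn, not necessarily this exact revision):

    # Illustrative sketch only; the values are invented.
    import numpy as np
    from scipy import sparse
    from sklearn.preprocessing import MaxAbsScaler

    X = sparse.csr_matrix(np.array([[1.0, -2.0, 0.0],
                                    [2.0,  0.5, 0.0],
                                    [4.0, -1.0, 3.0]]))

    # Each feature is divided by its maximum absolute value, so the
    # zero entries (the sparsity pattern) are preserved.
    scaler = MaxAbsScaler()
    X_scaled = scaler.fit_transform(X)
    print(scaler.scale_)   # per-feature maximum absolute values: [4. 2. 3.]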

@untom (Contributor, Author) commented Jun 7, 2015

@amueller: I hope this is what you had in mind.

- lie between a given minimum and maximum value, often between zero and one.
- This can be achieved using :class:`MinMaxScaler`.
+ lie between a given minimum and maximum value, often between zero and one,
+ or so that the maximum value of each feature is scaled to unit size.
Review comment (Member):
absolute value?

@amueller (Member) commented Jun 8, 2015

Apart from nitpicks and sparse matrix testing, LGTM.

@amueller changed the title ENH Add MaxAbsScaler → [MRG] dd MaxAbsScaler on Jun 8, 2015
@amueller changed the title [MRG] dd MaxAbsScaler → [MRG] add MaxAbsScaler on Jun 8, 2015
@amueller changed the title [MRG] add MaxAbsScaler → [MRG + 1] add MaxAbsScaler on Jun 9, 2015
@untom force-pushed the maxabs_scaler branch 2 times, most recently from 676d201 to 22074da on June 9, 2015 21:34


As with :func:`scale`, the module further provides a
convenience function function :func:`maxabs_scale` if you don't want to
Review comment (Member):
duplicated word?
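
For reference, a rough sketch of the convenience function mentioned in the quoted text, assuming the released maxabs_scale signature (toy dense data):

    # Toy data; shows the stateless convenience function.
    import numpy as np
    from sklearn.preprocessing import MaxAbsScaler, maxabs_scale

    X = np.array([[ 1.0, -0.5],
                  [ 2.0,  1.0],
                  [-4.0,  0.25]])

    # One-shot scaling, column-wise by default (axis=0) ...
    X_a = maxabs_scale(X)
    # ... equivalent to fitting and transforming with the estimator.
    X_b = MaxAbsScaler().fit_transform(X)
    assert np.allclose(X_a, X_b)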

@TomDLT (Member) commented Jun 10, 2015

Looks good :)

@TomDLT (Member) commented Jun 10, 2015

Since it is not in RobustScaler anymore, do we want to use _handle_zeros_in_scale here?
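
For background, a plain-NumPy sketch of why some zero handling is needed when a feature's maximum absolute value is zero (this only illustrates the idea; it is not the private helper itself):

    # A constant-zero feature has max abs 0, so dividing by the raw scale
    # would produce NaNs; replacing zeros in the scale with 1.0 (which is
    # what a helper like _handle_zeros_in_scale is for) leaves that
    # feature untouched instead.
    import numpy as np

    X = np.array([[0.0,  1.0],
                  [0.0, -2.0]])
    scale = np.abs(X).max(axis=0)   # array([0., 2.])
    scale[scale == 0.0] = 1.0       # avoid division by zero
    X_scaled = X / scale            # the zero feature stays all zeros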

@untom (Contributor, Author) commented Jun 10, 2015

Thanks for that 2nd review. I've implemented the changes you've suggested and squashed the commits.

@@ -146,6 +148,61 @@ full formula is::

X_scaled = X_std * (max - min) + min

:class:`MaxAbsScaler` works in a very similar fashion, but scales data so
it lies within the range ``[-1, 1]``, and is meant for data
Review comment (Member):
Yet this is not true of the following example. Either qualify the statement or add a trim option to the scaler.

Reply (Contributor, Author):

I've changed the text a bit, please have a look to see if you like the new wording better.
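
To illustrate the point under discussion, a small sketch with made-up data: when the input is strictly non-negative, the scaled values land in [0, 1], so [-1, 1] is only a guaranteed bound, not the range that is actually attained.

    # Made-up data: all values non-negative, so nothing maps below 0.
    import numpy as np
    from sklearn.preprocessing import MaxAbsScaler

    X = np.array([[1.0,  5.0],
                  [2.0, 10.0]])
    X_scaled = MaxAbsScaler().fit_transform(X)
    print(X_scaled)   # [[0.5 0.5]
                      #  [1.  1. ]]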

@jnothman (Member):

Remove backticks, add what's new entry, squash commits, and you can haz merge. LGTM!

@jnothman changed the title [MRG + 1] add MaxAbsScaler → [MRG+2] add MaxAbsScaler on Jun 11, 2015
jnothman added a commit that referenced this pull request on Jun 11, 2015
@jnothman merged commit dc8578a into scikit-learn:master on Jun 11, 2015
@jnothman (Member):

Thanks @untom, for your contribution and your perseverance!

@untom (Contributor, Author) commented Jun 11, 2015

Thanks for your review and in general for helping out with this!

@Jeffrey04 (Contributor):

I found a small problem (?) with MaxAbsScaler and am not sure whether I should file a bug report for it (because 0.17 is not officially released yet).

So I have my collection of data scaled with MinMaxScaler. Then I need to transform a new sparse matrix with one row (a sparse vector?), e.g.:

  (0, 56839)    0.462743526481
  (0, 55421)    0.469655562306
  (0, 54368)    0.203714596644
  (0, 54060)    0.0962621236939
  (0, 51441)    0.540495850676
  (0, 48518)    0.056152354043
  (0, 45181)    0.0652388777274
  (0, 38682)    0.230776053348
  (0, 31876)    0.199738544715
  (0, 14641)    0.280892719445
  (0, 434)  0.207189026352

However, if I attempt to transform the 1-row sparse matrix above (so I can compare it with the collection), I get this assertion error:

Traceback (most recent call last):
  File "./address-query.py", line 41, in <module>
    main()
  File "./address-query.py", line 28, in main
    query = scaler.transform(query_)[0].toarray()
  File "/Users/jeffrey04/.local/lib/python3.5/site-packages/sklearn/preprocessing/data.py", line 792, in transform
    inplace_row_scale(X, 1.0 / self.scale_)
  File "/Users/jeffrey04/.local/lib/python3.5/site-packages/sklearn/utils/sparsefuncs.py", line 200, in inplace_row_scale
    inplace_csr_row_scale(X, scale)
  File "/Users/jeffrey04/.local/lib/python3.5/site-packages/sklearn/utils/sparsefuncs.py", line 61, in inplace_csr_row_scale
    assert scale.shape[0] == X.shape[0]

The workaround is to use a matrix that has more than one row, due to this part of the code (if I am not mistaken):

        if sparse.issparse(X):
            if X.shape[0] == 1:
                inplace_row_scale(X, self.scale_)
            else:
                inplace_column_scale(X, self.scale_)
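
For completeness, a rough sketch of the reported scenario and workaround (shapes and values are invented, and the assertion error described above was observed on the 0.17 development branch, so this is not a claim about released versions):

    # Hypothetical reproduction of the report above; sizes are invented.
    import scipy.sparse as sp
    from sklearn.preprocessing import MaxAbsScaler

    X_train = sp.random(100, 60000, density=0.001, format='csr')
    scaler = MaxAbsScaler().fit(X_train)

    query = sp.random(1, 60000, density=0.001, format='csr')  # single-row sparse input
    # scaler.transform(query) triggered the AssertionError in the traceback
    # above on the 0.17 dev branch; the reported workaround is to pass a
    # matrix with more than one row:
    stacked = sp.vstack([query, query]).tocsr()
    result = scaler.transform(stacked)[0]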

@jnothman (Member):

Could you please file this as a separate issue, and also report: X.shape for the training data, X.shape for the new "sparse row vector", and scaler.scale_.shape. Thank you.


@Jeffrey04 (Contributor):

done, thanks (:

@ClimbsRocks (Contributor):

Just wanted to say thanks for this feature! I've already tested it out on several datasets and have found it super useful with sparse arrays. Thanks for all the hard work you've put into this, everyone!
