[MRG+2] add MaxAbsScaler #4828


Merged: 1 commit merged into scikit-learn:master on Jun 11, 2015

Conversation

@untom (Contributor) commented Jun 7, 2015

This PR adds MaxAbsScaler and maxabs_scale to sklearn.preprocessing. The scaler scales each feature by its maximum absolute value. It is especially useful for sparse data, but is probably also a better alternative to MinMaxScaler whenever the data is already centered.

The scaler itself was previously discussed in #1799 and #2514.
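
For context, a minimal usage sketch of the scaler on sparse input (the data below is made up for illustration; attribute names follow released scikit-learn, not necessarily this exact revision):

    # Illustrative sketch only; the values are invented.
    import numpy as np
    from scipy import sparse
    from sklearn.preprocessing import MaxAbsScaler

    X = sparse.csr_matrix(np.array([[1.0, -2.0, 0.0],
                                    [2.0,  0.5, 0.0],
                                    [4.0, -1.0, 3.0]]))

    # Each feature is divided by its maximum absolute value, so the
    # zero entries (the sparsity pattern) are preserved.
    scaler = MaxAbsScaler()
    X_scaled = scaler.fit_transform(X)
    print(scaler.scale_)   # per-feature maximum absolute values: [4. 2. 3.]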

@untom (Contributor, Author) commented Jun 7, 2015

@amueller: I hope this is what you had in mind.

- lie between a given minimum and maximum value, often between zero and one.
- This can be achieved using :class:`MinMaxScaler`.
+ lie between a given minimum and maximum value, often between zero and one,
+ or so that the maximum value of each feature is scaled to unit size.
Review comment (Member):
absolute value?

@amueller (Member) commented Jun 8, 2015

Apart from nitpicks and sparse matrix testing, LGTM.

@amueller changed the title ENH Add MaxAbsScaler → [MRG] dd MaxAbsScaler on Jun 8, 2015
@amueller changed the title [MRG] dd MaxAbsScaler → [MRG] add MaxAbsScaler on Jun 8, 2015
@amueller changed the title [MRG] add MaxAbsScaler → [MRG + 1] add MaxAbsScaler on Jun 9, 2015
@untom force-pushed the maxabs_scaler branch 2 times, most recently from 676d201 to 22074da on June 9, 2015 21:34


As with :func:`scale`, the module further provides a
convenience function function :func:`maxabs_scale` if you don't want to
Review comment (Member):
duplicated word?
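
For reference, a rough sketch of the convenience function mentioned in the quoted text, assuming the released maxabs_scale signature (toy dense data):

    # Toy data; shows the stateless convenience function.
    import numpy as np
    from sklearn.preprocessing import MaxAbsScaler, maxabs_scale

    X = np.array([[ 1.0, -0.5],
                  [ 2.0,  1.0],
                  [-4.0,  0.25]])

    # One-shot scaling, column-wise by default (axis=0) ...
    X_a = maxabs_scale(X)
    # ... equivalent to fitting and transforming with the estimator.
    X_b = MaxAbsScaler().fit_transform(X)
    assert np.allclose(X_a, X_b)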

@TomDLT (Member) commented Jun 10, 2015

Looks good :)

@TomDLT (Member) commented Jun 10, 2015

Since it is not in RobustScaler anymore, do we want to use _handle_zeros_in_scale here?
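
For background, a plain-NumPy sketch of why some zero handling is needed when a feature's maximum absolute value is zero (this only illustrates the idea; it is not the private helper itself):

    # A constant-zero feature has max abs 0, so dividing by the raw scale
    # would produce NaNs; replacing zeros in the scale with 1.0 (which is
    # what a helper like _handle_zeros_in_scale is for) leaves that
    # feature untouched instead.
    import numpy as np

    X = np.array([[0.0,  1.0],
                  [0.0, -2.0]])
    scale = np.abs(X).max(axis=0)   # array([0., 2.])
    scale[scale == 0.0] = 1.0       # avoid division by zero
    X_scaled = X / scale            # the zero feature stays all zeros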

@untom (Contributor, Author) commented Jun 10, 2015

Thanks for that 2nd review. I've implemented the changes you've suggested and squashed the commits.

@@ -146,6 +148,61 @@ full formula is::

X_scaled = X_std * (max - min) + min

:class:`MaxAbsScaler` works in a very similar fashion, but scales data so
it lies within the range ``[-1, 1]``, and is meant for data
Review comment (Member):
Yet this is not true of the following example. Either qualify the statement or add a trim option to the scaler.

Reply (Contributor, Author):

I've changed the text a bit, please have a look to see if you like the new wording better.
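
To illustrate the point under discussion, a small sketch with made-up data: when the input is strictly non-negative, the scaled values land in [0, 1], so [-1, 1] is only a guaranteed bound, not the range that is actually attained.

    # Made-up data: all values non-negative, so nothing maps below 0.
    import numpy as np
    from sklearn.preprocessing import MaxAbsScaler

    X = np.array([[1.0,  5.0],
                  [2.0, 10.0]])
    X_scaled = MaxAbsScaler().fit_transform(X)
    print(X_scaled)   # [[0.5 0.5]
                      #  [1.  1. ]]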

@jnothman (Member):

Remove backticks, add what's new entry, squash commits, and you can haz merge. LGTM!

@jnothman changed the title [MRG + 1] add MaxAbsScaler → [MRG+2] add MaxAbsScaler on Jun 11, 2015
jnothman added a commit that referenced this pull request on Jun 11, 2015
@jnothman merged commit dc8578a into scikit-learn:master on Jun 11, 2015
@jnothman (Member):

Thanks @untom, for your contribution and your perseverance!

@untom (Contributor, Author) commented Jun 11, 2015

Thanks for your review and in general for helping out with this!

@Jeffrey04 (Contributor):

I found a small problem (?) with MaxAbsScaler and am not sure whether I should file a bug report for it (because 0.17 is not officially released yet).

So I have my collection of data scaled with MinMaxScaler. Then I need to transform a new sparse matrix with one row (a sparse vector?), e.g.:

  (0, 56839)    0.462743526481
  (0, 55421)    0.469655562306
  (0, 54368)    0.203714596644
  (0, 54060)    0.0962621236939
  (0, 51441)    0.540495850676
  (0, 48518)    0.056152354043
  (0, 45181)    0.0652388777274
  (0, 38682)    0.230776053348
  (0, 31876)    0.199738544715
  (0, 14641)    0.280892719445
  (0, 434)  0.207189026352

However, if I attempt to transform the 1-row sparse matrix above (so I can compare it with the collection), I get this assertion error:

Traceback (most recent call last):
  File "./address-query.py", line 41, in <module>
    main()
  File "./address-query.py", line 28, in main
    query = scaler.transform(query_)[0].toarray()
  File "/Users/jeffrey04/.local/lib/python3.5/site-packages/sklearn/preprocessing/data.py", line 792, in transform
    inplace_row_scale(X, 1.0 / self.scale_)
  File "/Users/jeffrey04/.local/lib/python3.5/site-packages/sklearn/utils/sparsefuncs.py", line 200, in inplace_row_scale
    inplace_csr_row_scale(X, scale)
  File "/Users/jeffrey04/.local/lib/python3.5/site-packages/sklearn/utils/sparsefuncs.py", line 61, in inplace_csr_row_scale
    assert scale.shape[0] == X.shape[0]

The workaround is to use a matrix that has more than one row, due to this part of the code (if I am not mistaken):

        if sparse.issparse(X):
            if X.shape[0] == 1:
                inplace_row_scale(X, self.scale_)
            else:
                inplace_column_scale(X, self.scale_)
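
For completeness, a rough sketch of the reported scenario and workaround (shapes and values are invented, and the assertion error described above was observed on the 0.17 development branch, so this is not a claim about released versions):

    # Hypothetical reproduction of the report above; sizes are invented.
    import scipy.sparse as sp
    from sklearn.preprocessing import MaxAbsScaler

    X_train = sp.random(100, 60000, density=0.001, format='csr')
    scaler = MaxAbsScaler().fit(X_train)

    query = sp.random(1, 60000, density=0.001, format='csr')  # single-row sparse input
    # scaler.transform(query) triggered the AssertionError in the traceback
    # above on the 0.17 dev branch; the reported workaround is to pass a
    # matrix with more than one row:
    stacked = sp.vstack([query, query]).tocsr()
    result = scaler.transform(stacked)[0]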

@jnothman (Member):

Could you please file this as a separate issue, and also report: X.shape for the training data, X.shape for the new "sparse row vector", and scaler.scale_.shape. Thank you.


@Jeffrey04 (Contributor):

done, thanks (:

@ClimbsRocks (Contributor):

Just wanted to say thanks for this feature! I've already tested it out on several datasets and have found it super useful with sparse arrays. Thanks for all the hard work you've put into this, everyone!
