ENH Add clip parameter to MaxAbsScaler #31790

glevv · 2025-07-19T08:50:58Z

Reference Issues/PRs

Closes #31672

What does this implement/fix? Explain your changes.

Added clip parameter to MaxAbsScaler class;
Added clip parameter to maxabs_scale function;
Added tests for clip parameter in MaxAbsScaler class for sparse and dense arrays;
Added documentation for clip parameter in MaxAbsScaler class.
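For readers skimming the thread, here is a minimal NumPy sketch of the behaviour the `clip` parameter is meant to provide: each feature is divided by the maximum absolute value seen during fit, and with `clip=True` the transformed values are then limited to [-1, 1]. The helper names `fit_max_abs` and `transform_max_abs` are hypothetical, and this is an illustration under those assumptions, not the scikit-learn implementation.

```python
import numpy as np


def fit_max_abs(X):
    """Return the per-feature maximum absolute value (hypothetical helper)."""
    scale = np.max(np.abs(X), axis=0)
    # Avoid division by zero for all-zero features.
    scale[scale == 0.0] = 1.0
    return scale


def transform_max_abs(X, scale, clip=False):
    """Scale X by the fitted values and optionally clip to [-1, 1]."""
    Xt = X / scale
    if clip:
        Xt = np.clip(Xt, -1.0, 1.0)
    return Xt


X_train = np.array([[1.0, -2.0], [4.0, 0.5]])
scale = fit_max_abs(X_train)

# Held-out data that falls outside the range seen during fit.
X_new = np.array([[8.0, -6.0]])
unclipped = transform_max_abs(X_new, scale)
clipped = transform_max_abs(X_new, scale, clip=True)
```

Without clipping, held-out values larger in magnitude than the training maximum map outside [-1, 1]; with clipping they saturate at the boundary.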

Any other comments?

github-actions · 2025-07-19T08:51:52Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 9a4723d. Link to the linter CI: here

jeremiedbb

Thanks for the PR @glevv. Here are some comments

doc/whats_new/upcoming_changes/sklearn.preprocessing/31790.enhancement.rst

sklearn/preprocessing/_data.py

jeremiedbb · 2025-07-21T10:47:35Z

sklearn/preprocessing/tests/test_data.py

+    X_test = (
+        sparse_container([np.r_[X_min.data[:2] - 10, X_max.data[2:] + 10]])
+        if sparse_container
+        else [np.r_[X_min[:2] - 10, X_max[2:] + 10]]
+    )


I'd rather use numpy equivalents that have an explicit name

Suggested change

-    X_test = (
-        sparse_container([np.r_[X_min.data[:2] - 10, X_max.data[2:] + 10]])
-        if sparse_container
-        else [np.r_[X_min[:2] - 10, X_max[2:] + 10]]
-    )
+    X_test = np.hstack((X_min[:2] - 10, X_max[2:] + 10)).reshape(1, -1)
+    if sparse_container:
+        X_test = sparse_container(X_test)

I switched to np.hstack, but sparse arrays cannot be indexed, so this code snippet won't work.

Right, we need to do something like hstack = sp.hstack if sp.issparse(X) else np.hstack.
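The dispatch idea in this comment can be sketched as follows. The helper name `hstack_any` and the toy values are assumptions made for illustration; only `np.hstack`, `sp.hstack`, and `sp.issparse` come from the comment itself.

```python
import numpy as np
import scipy.sparse as sp


def hstack_any(blocks):
    """Stack blocks column-wise, using SciPy when any block is sparse."""
    if any(sp.issparse(block) for block in blocks):
        return sp.hstack(blocks, format="csr")
    return np.hstack(blocks)


# Toy pieces of a single test row: values below and above the fitted range.
low = np.array([[-10.0, -9.0]])
high = np.array([[12.0, 13.0]])

dense_row = hstack_any([low, high])
sparse_row = hstack_any([sp.csr_matrix(low), sp.csr_matrix(high)])
```

Both calls produce a single row with four columns; the second one keeps the result sparse.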

I decided to refactor the test function by splitting it into two. It should be cleaner this way.

The only thing I missed is that it should be:

np.max(np.abs(X))

instead of just max. With these test datasets it does not matter, since they both have only 0 or positive numbers, but I will change it
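A small sketch of why the absolute value matters once a feature contains negative entries. The array below is made up for illustration and is not one of the test datasets from the PR.

```python
import numpy as np

# Second feature has its largest magnitude at a negative value.
X = np.array([[1.0, -5.0], [3.0, 2.0]])

plain_max = np.max(X, axis=0)
max_abs = np.max(np.abs(X), axis=0)

# For non-negative data the two quantities coincide.
X_nonneg = np.abs(X)
same = np.array_equal(np.max(X_nonneg, axis=0), np.max(np.abs(X_nonneg), axis=0))
```

Here `plain_max` misses the magnitude of `-5.0`, while `max_abs` captures it.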

sklearn/preprocessing/tests/test_data.py

sklearn/preprocessing/_data.py

StefanieSenger

Thanks for the PR, @glevv!

We should also test that `clip=True` does not break array API compatibility, either by adding it to `test_scaler_array_api_compliance`, by adding different array types to the new test, or by writing an additional test (`MinMaxScaler` doesn't test this for `clip` either, which could be addressed in a separate PR).

From the np.clip docs, it seems that we can pass min=-1.0, max=1.0 as kwargs (into _modify_in_place_if_numpy) and that it is then array API compatible (which is a bit surprising to me), but the array API spec seems to allow positional passing. Maybe it's fine as it is, but it's better to have a test.

Oh sorry, I now see you have added MaxAbsScaler(clip=True) to test_preprocessing_array_api_compliance.
I had clicked "send review" by accident and too early. Give me a sec to check everything.

Edit:
That looks all fine to me. :)

sklearn/preprocessing/tests/test_data.py

StefanieSenger · 2025-07-22T09:18:02Z

sklearn/preprocessing/_data.py

        else:
            X /= self.scale_
+            if self.clip:
+                device_ = device(X)


device_ = device(X)

I think it's more consistent with the rest of the codebase to use
xp, _, device_ = get_namespace_and_device(X)
at the beginning of transform instead.

I believe that was done to be consistent with the handling of clip in MinMaxScaler which does it that way

I was repeating the clip behavior of MinMaxScaler. If it's not correct, I can change it

It is correct, but there is a small risk that we later accidentally introduce a change in the device of X between the beginning of the transform method and this point. In that case the array API tests would fail, though, so I think it's fine and safe.

sklearn/preprocessing/_data.py

StefanieSenger

Thanks for your further work and for restructuring the tests, @glevv. They read much more intuitively to me now.

I now only have some typo nits.

sklearn/preprocessing/_data.py

StefanieSenger · 2025-07-23T07:00:03Z


StefanieSenger

Thanks @glevv! I checked through everything again and it looks good to me.

StefanieSenger · 2025-07-25T06:59:58Z

sklearn/preprocessing/tests/test_data.py

+    )
+    X = sparse_container(X)
+    scaler = MaxAbsScaler(clip=True).fit(X)
+    X_max = np.max(np.abs(X), axis=0)


Sorry for the nit, but I found this confusing. Using scipy directly is more straightforward, I think. Do you think it's valid?

Suggested change

-    X_max = np.max(np.abs(X), axis=0)
+    X_max = X.max(axis=0)

You can look at this discussion: #31790 (comment)
I decided to go for max(abs(X)) even though it is mathematically unnecessary for these inputs; in terms of readability and versatility of the inputs, it is better in my opinion.

As for np.max() versus .max(), I am almost sure numpy will use scipy's internal method for the calculation, so there should be no difference, but I could be wrong.
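For reference, the same quantities can be computed with SciPy's own sparse methods, which avoids relying on how NumPy functions dispatch on sparse inputs. This is a sketch on a made-up matrix, not the test data from the PR.

```python
import numpy as np
import scipy.sparse as sp

# Made-up sparse input with one negative entry.
X = sp.csr_matrix(np.array([[0.0, -5.0], [3.0, 0.0]]))

# Column-wise maximum, implicit zeros included.
col_max = X.max(axis=0).toarray().ravel()

# Column-wise maximum of absolute values.
col_max_abs = abs(X).max(axis=0).toarray().ravel()
```

With only non-negative entries the two results would be identical, which is why the choice did not affect the tests discussed here.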

jeremiedbb

I pushed a commit to simplify the test. I made it more similar to the one for MinMaxScaler and to the first one you wrote. There was no real reason to split the test into dense and sparse versions and create a new toy dataset. Instead, I added a comment explaining how and for what purpose we construct the test sample. I also added a similar comment in the MinMaxScaler test.

LGTM. Thanks @glevv !

Co-authored-by: Jérémie du Boisberranger <[email protected]>

Added clip parameter to MaxAbsScaler, added tests and documentation

1453dc0

github-actions bot added the module:preprocessing label Jul 19, 2025

glevv changed the title ~~[ENH] Add clip parameter to MaxAbsScaler~~ ENH Add clip parameter to MaxAbsScaler Jul 19, 2025

glevv added 11 commits July 19, 2025 12:15

added changelog, bug fixes

4513d37

fixed test case

0703980

bugfix

0591c35

added test case for sparse arrays

226e711

fix edgecase with coo format

ab3f77c

revert previous, fix test

1839c05

fix tests

ef56622

test bugfix

06c765e

test bugfix

f539c18

test bugfix

a04b540

hotfix

d13618a

jeremiedbb reviewed Jul 21, 2025

glevv added 4 commits July 21, 2025 19:28

implemented suggestions

d0a804a

Merge branch 'main' into maxabs-scaler-clip

b65b358

test fix

fccb6b5

fix assert

51395bf

StefanieSenger reviewed Jul 22, 2025

glevv added 6 commits July 22, 2025 20:26

implemented suggestions

46947f8

Merge branch 'main' into maxabs-scaler-clip

58892be

more test fixes

036ec99

test refactor

753f3c9

fix shapes

00e853f

removed keepdims

a806a56

StefanieSenger reviewed Jul 23, 2025

Merge branch 'main' into maxabs-scaler-clip

e04e190

glevv added 2 commits July 23, 2025 19:01

small changes

80333ec

Merge remote-tracking branch 'refs/remotes/origin/maxabs-scaler-clip'…

e6cf489

… into maxabs-scaler-clip

StefanieSenger approved these changes Jul 25, 2025

simplify test

18621a7

jeremiedbb approved these changes Jul 25, 2025

jeremiedbb enabled auto-merge (squash) July 25, 2025 11:15

lint

9a4723d

jeremiedbb merged commit 25aeaf3 into scikit-learn:main Jul 25, 2025
36 checks passed

glevv deleted the maxabs-scaler-clip branch July 26, 2025 05:00

lucyleeow pushed a commit to lucyleeow/scikit-learn that referenced this pull request Aug 22, 2025

ENH Add clip parameter to MaxAbsScaler (scikit-learn#31790)

f758a29

Co-authored-by: Jérémie du Boisberranger <[email protected]>

jeremiedbb mentioned this pull request Sep 3, 2025

Release 1.7.2 #32092

Merged

13 tasks


3 participants
