
Conversation

Contributor

@glevv glevv commented Jul 19, 2025

Reference Issues/PRs

Closes #31672

What does this implement/fix? Explain your changes.

  • Added clip parameter to MaxAbsScaler class;
  • Added clip parameter to maxabs_scale function;
  • Added tests for clip parameter in MaxAbsScaler class for sparse and dense arrays;
  • Added documentation for the clip parameter in the MaxAbsScaler class (see the short usage sketch below).
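
A minimal usage sketch of the additions listed above (toy numbers, not from the PR's tests; it assumes the `clip` parameter lands in both the class and the function as described):

import numpy as np
from sklearn.preprocessing import MaxAbsScaler, maxabs_scale

X_train = np.array([[1.0, -2.0], [0.5, 4.0]])  # per-column max abs: [1.0, 4.0]
X_new = np.array([[3.0, -10.0]])               # outside the fitted range

scaler = MaxAbsScaler(clip=True).fit(X_train)
print(scaler.transform(X_new))   # [[ 1. -1.]] -- scaled values clipped into [-1, 1]

# Function counterpart; maxabs_scale fits on the data it transforms, so after
# scaling everything already lies in [-1, 1] and clip=True is a no-op here.
print(maxabs_scale(X_new, clip=True))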

Any other comments?

@glevv glevv changed the title from "[ENH] Add clip parameter to MaxAbsScaler" to "ENH Add clip parameter to MaxAbsScaler" on Jul 19, 2025

github-actions bot commented Jul 19, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 9a4723d.

Member

@jeremiedbb jeremiedbb left a comment


Thanks for the PR @glevv. Here are some comments

Comment on lines 2535 to 2539
X_test = (
    sparse_container([np.r_[X_min.data[:2] - 10, X_max.data[2:] + 10]])
    if sparse_container
    else [np.r_[X_min[:2] - 10, X_max[2:] + 10]]
)
Member


I'd rather use numpy equivalents that have an explicit name

Suggested change
-X_test = (
-    sparse_container([np.r_[X_min.data[:2] - 10, X_max.data[2:] + 10]])
-    if sparse_container
-    else [np.r_[X_min[:2] - 10, X_max[2:] + 10]]
-)
+X_test = np.hstack((X_min[:2] - 10, X_max[2:] + 10)).reshape(1, -1)
+if sparse_container:
+    X_test = sparse_container(X_test)

Contributor Author


I switched to np.hstack, but sparse arrays cannot be indexed, so this code snippet won't work.

Member


Right, we need to do something like `hstack = sp.hstack if sp.issparse(X) else np.hstack`.
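
A minimal sketch of that idea, outside the actual test (the helper name is made up for illustration):

import numpy as np
import scipy.sparse as sp

def hstack_like(X):
    """Return the hstack function matching X's container type (hypothetical helper)."""
    return sp.hstack if sp.issparse(X) else np.hstack

# Dense input uses np.hstack, sparse input uses sp.hstack:
dense = np.array([[1.0, 2.0]])
sparse = sp.csr_matrix(dense)
print(hstack_like(dense)((dense, dense)).shape)    # (1, 4)
print(hstack_like(sparse)((sparse, sparse)).shape) # (1, 4)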

Contributor Author

@glevv glevv Jul 23, 2025


I decided to refactor the test function by splitting it into two. It should be cleaner this way.

The only thing I missed is that it should be `np.max(np.abs(X))` instead of just `max`. With these test datasets it does not matter, since they contain only zeros and positive numbers, but I will change it.
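
For illustration, with toy numbers that do include a negative value (not the PR's test data), plain max and max of absolute values differ:

import numpy as np

X = np.array([[-5.0, 2.0],
              [ 3.0, 1.0]])
print(np.max(X, axis=0))          # [3. 2.]  -- plain max misses the -5
print(np.max(np.abs(X), axis=0))  # [5. 2.]  -- the per-column max absolute value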

Member

@StefanieSenger StefanieSenger left a comment


Thanks for the PR, @glevv!

We should also test that `clip=True` does not break array API compatibility, either by adding it to `test_scaler_array_api_compliance`, by adding different array types to the new test, or by adding an additional test (even though `MinMaxScaler` doesn't test this for `clip` either, which could be a separate PR).

From the `np.clip` docs it seems that we can pass `min=-1.0, max=1.0` as kwargs (into `_modify_in_place_if_numpy`, then) and that this is array API compatible (which is a bit surprising to me), but the array API spec seems to allow positional passing as well. Maybe it's fine as it is, but it's better to have a test.

Oh sorry, I now see you have added `MaxAbsScaler(clip=True)` to `test_preprocessing_array_api_compliance`.
I had clicked "send review" by accident and too early. Give me a sec to check everything.

Edit:
That looks all fine to me. :)
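
A tiny sketch of the keyword form discussed above (not the PR's code); it assumes NumPy 2.1+, where `np.clip` accepts the array-API-style `min`/`max` keywords, while the positional form works on older NumPy as well:

import numpy as np

x = np.array([-1.5, 0.3, 2.0])

# Keyword form (array API style; available in NumPy >= 2.1):
print(np.clip(x, min=-1.0, max=1.0))  # values clipped into [-1, 1]

# Positional form, which the array API spec also allows:
print(np.clip(x, -1.0, 1.0))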

else:
    X /= self.scale_
if self.clip:
    device_ = device(X)
Member


device_ = device(X)

I think it's more consistent with the rest of the codebase to use `xp, _, device_ = get_namespace_and_device(X)` at the beginning of `transform` instead.
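
A rough sketch of that pattern, using scikit-learn's private array API helper (private API, so the exact names and return values may differ across versions; this is not the PR's actual `transform` code):

import numpy as np
from sklearn.utils._array_api import get_namespace_and_device


def scale_and_clip(X, scale):
    # Fetch the namespace and the device once, at the top, as suggested above;
    # device_ is unused in this tiny sketch, but later code could use it to
    # create new arrays on the same device as X.
    xp, _, device_ = get_namespace_and_device(X)
    X = X / scale
    # The array API spec allows passing the clip bounds positionally.
    return xp.clip(X, -1.0, 1.0)


print(scale_and_clip(np.array([[3.0, -10.0]]), np.array([1.0, 4.0])))  # [[ 1. -1.]]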

Member


I believe that was done to be consistent with the handling of `clip` in `MinMaxScaler`, which does it that way.

Contributor Author


I was repeating the `clip` behavior of `MinMaxScaler`. If it's not correct, I can change it.

Member


It is correct, but there is a small risk that we later accidentally introduce some change to the device of X between the beginning of the transform method and here; in that case, though, the array API tests would fail. I think it's fine and safe.

Member

@StefanieSenger StefanieSenger left a comment


Thanks for your further work and for restructuring the tests, @glevv. These read much more intuitively to me.

I now only have some typo nits.


Member

@StefanieSenger StefanieSenger left a comment


Thanks @glevv! I checked through everything again and it looks good to me.

)
X = sparse_container(X)
scaler = MaxAbsScaler(clip=True).fit(X)
X_max = np.max(np.abs(X), axis=0)
Member

@StefanieSenger StefanieSenger Jul 25, 2025


Sorry for the nit, but I found this confusing. Using scipy directly is more straightforward, I think. Do you think that's valid?

Suggested change
-X_max = np.max(np.abs(X), axis=0)
+X_max = X.max(axis=0)

Contributor Author


You can look at this discussion: #31790 (comment).
I decided to go for `max(abs(X))` even though it is unnecessary in a mathematical sense; in terms of readability and versatility of the inputs, I think this is better.

As for `np.max()` vs. `.max()`, I'm almost sure NumPy will dispatch to scipy's internal method for the calculation, so there should be no difference, but I could be wrong.

Member

@jeremiedbb jeremiedbb left a comment


I pushed a commit to simplify the test. I made it more similar to the one for MinMaxScaler and to the first one you wrote. There was no real reason to split the test for dense and sparse inputs and create a new toy dataset. Instead, I added a comment to explain how and for what purpose we construct the test sample. I also added a similar comment to the MinMaxScaler test.

LGTM. Thanks @glevv !
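
For context, a condensed sketch of what such a test checks (not the merged test verbatim; it assumes the `clip` parameter from this PR):

import numpy as np
from sklearn.preprocessing import MaxAbsScaler


def test_maxabs_scaler_clip_sketch():
    X_train = np.array([[1.0, -2.0], [0.5, 4.0]])
    scaler = MaxAbsScaler(clip=True).fit(X_train)

    # Build a test sample that falls outside the fitted range, so without
    # clipping the transform would produce values outside [-1, 1].
    X_test = np.array([[10.0, -10.0]])
    X_t = scaler.transform(X_test)
    assert np.all(X_t >= -1.0) and np.all(X_t <= 1.0)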

@jeremiedbb jeremiedbb enabled auto-merge (squash) July 25, 2025 11:15
@jeremiedbb jeremiedbb merged commit 25aeaf3 into scikit-learn:main Jul 25, 2025
36 checks passed
@glevv glevv deleted the maxabs-scaler-clip branch July 26, 2025 05:00
lucyleeow pushed a commit to lucyleeow/scikit-learn that referenced this pull request Aug 22, 2025
@jeremiedbb jeremiedbb mentioned this pull request Sep 3, 2025