Add sample_weight support to QuantileTransformer #30707

Open
antipisa opened this issue Jan 22, 2025 · 3 comments · May be fixed by #31147
Labels
Moderate Anything that requires some knowledge of conventions and best practices New Feature

Comments

antipisa commented Jan 22, 2025

Describe the workflow you want to enable

It would be good to have sample_weight support in QuantileTransformer for dealing with sparse or imbalanced data, à la #15601.

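from sklearn.preprocessing import QuantileTransformer

scaler = QuantileTransformer(output_distribution="normal")
scaler.fit(X, sample_weight=w)  # proposed API: fit does not accept sample_weight yet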

Describe your proposed solution

As far as I know, it would just require passing the weights to the np.nanpercentile call that computes quantiles_.

KBinsDiscretizer supports sample_weight, and with strategy='quantile' and encode='ordinal' this behavior can be achieved, but it is much, much slower.
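To make the proposal concrete, here is a minimal sketch of what a weighted quantile computation could look like. It uses the inverted-CDF definition (the smallest value whose cumulative weight reaches the requested fraction of the total weight). The helper `weighted_quantiles` is hypothetical and written only for illustration; it is not the scikit-learn or NumPy implementation, and it ignores NaN handling and interpolation between neighbouring values.

```python
import numpy as np


def weighted_quantiles(x, weights, probs):
    """Weighted quantiles of a 1D array (inverted-CDF definition).

    Hypothetical helper for illustration; not part of scikit-learn or NumPy.
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    keep = weights > 0  # zero-weight samples must not influence the result
    x, weights = x[keep], weights[keep]
    order = np.argsort(x)
    x_sorted = x[order]
    cum_weights = np.cumsum(weights[order])
    # Smallest value whose cumulative weight reaches probs * total weight.
    targets = np.asarray(probs, dtype=float) * cum_weights[-1]
    idx = np.searchsorted(cum_weights, targets, side="left")
    return x_sorted[np.minimum(idx, len(x_sorted) - 1)]


x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
median_unweighted = weighted_quantiles(x, np.ones(5), [0.5])[0]
median_weighted = weighted_quantiles(x, [1, 1, 1, 1, 10], [0.5])[0]
print(median_unweighted, median_weighted)
```

With unit weights the median is the middle value; giving the last sample a weight of 10 pulls the median onto that sample, which is the effect one would expect sample_weight to have on the fitted quantiles_.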

Describe alternatives you've considered, if relevant

No response

Additional context

No response

@antipisa antipisa added Needs Triage Issue requires triage New Feature labels Jan 22, 2025
@lesteve lesteve removed the Needs Triage Issue requires triage label Jan 23, 2025
lesteve (Member) commented Jan 23, 2025

Searching the issue tracker, it looks like this feature is in scope, if I understand #20522 (comment) correctly.

cc @snath-xoc and @jeremiedbb for an informed opinion, since they have been working on sample weights support recently.

@antipisa antipisa changed the title Add sample_weight support to QuantileTransformer? Add sample_weight support to QuantileTransformer Jan 23, 2025
ogrisel (Member) commented Feb 3, 2025

@antipisa would you be interested in contributing a PR? There is already a common test named check_sample_weight_equivalence_on_dense_data that should be triggered as soon as you add the sample_weight kwarg to fit. Once this is done, the following command should pick it up:

pytest -k "check_sample_weight_equivalence and QuantileTransformer" sklearn/tests/test_common.py -v -s

You might want to adjust some non-default parameter values for that check in PER_ESTIMATOR_CHECK_PARAMS in sklearn/utils/_test_common/instance_generator.py.
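For readers unfamiliar with that check, the property it is named after can be sketched as follows: fitting with integer sample weights should give the same result as fitting on a dataset where each row is repeated that many times (and rows with weight zero are dropped). The class `ToyQuantileFitter` below is a hypothetical stand-in written for this illustration, not QuantileTransformer, and the snippet is only an approximation of the idea, not the actual common test.

```python
import numpy as np


class ToyQuantileFitter:
    """Hypothetical stand-in for an estimator whose fit accepts sample_weight."""

    def __init__(self, n_quantiles=5):
        self.n_quantiles = n_quantiles

    def fit(self, X, sample_weight=None):
        X = np.asarray(X, dtype=float)
        if sample_weight is None:
            sample_weight = np.ones(X.shape[0])
        w = np.asarray(sample_weight, dtype=float)
        keep = w > 0  # a zero weight must behave like a removed row
        X, w = X[keep], w[keep]
        probs = np.linspace(0, 1, self.n_quantiles)
        columns = []
        for col in X.T:
            order = np.argsort(col, kind="stable")
            cum_w = np.cumsum(w[order])
            # Inverted-CDF weighted quantiles, computed per feature.
            idx = np.searchsorted(cum_w, probs * cum_w[-1], side="left")
            columns.append(col[order][np.minimum(idx, len(col) - 1)])
        self.quantiles_ = np.column_stack(columns)
        return self


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
weights = rng.integers(0, 4, size=20)  # integer weights, zeros included

fit_weighted = ToyQuantileFitter().fit(X, sample_weight=weights)
fit_repeated = ToyQuantileFitter().fit(np.repeat(X, weights, axis=0))
assert np.array_equal(fit_weighted.quantiles_, fit_repeated.quantiles_)
```

An implementation that simply ignored sample_weight would fail this assertion, which is why adding the kwarg to fit is enough for the common test to start exercising the new code path.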

Note that when subsampling is enabled, we need to follow a similar strategy as implemented in KBinsDiscretizer. At the time of writing, the code for KBinsDiscretizer in main is still half broken, but there is a fix that is almost ready to be merged: #29907. You can take a look at this PR for inspiration.
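One way to combine weights with subsampling, sketched here as an assumption and not necessarily what #29907 implements, is to draw the subsample with probabilities proportional to the weights and then compute ordinary unweighted quantiles on it. The function `weighted_subsample` and its parameters are hypothetical names chosen for this example.

```python
import numpy as np


def weighted_subsample(X, sample_weight, n_subsample, seed=0):
    """Draw rows with replacement, with probability proportional to their weight.

    Hypothetical helper for illustration; not the scikit-learn implementation.
    """
    rng = np.random.default_rng(seed)
    sample_weight = np.asarray(sample_weight, dtype=float)
    p = sample_weight / sample_weight.sum()
    idx = rng.choice(X.shape[0], size=n_subsample, replace=True, p=p)
    return X[idx]


X = np.arange(10, dtype=float).reshape(-1, 1)
weights = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 2])

X_sub = weighted_subsample(X, weights, n_subsample=1000)
# The weights are already reflected in the subsample, so plain quantiles suffice.
quantiles = np.percentile(X_sub, [0, 50, 100], axis=0)
print(quantiles.ravel())
```

Because the result depends on the random draw, a check like the exact equivalence test no longer applies directly, which is the motivation for the statistical testing tooling mentioned in the next paragraph.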

We are working on tooling to help test the case where the estimator's fit is stochastic (depends on random_state which is the case when subsampling is enabled). You can track the latest progress here: #16298 (comment)

@ogrisel ogrisel added the Moderate Anything that requires some knowledge of conventions and best practices label Feb 3, 2025
@ogrisel ogrisel moved this to Todo in Losses and solvers Feb 3, 2025
kaekkr commented Apr 3, 2025

I will take this!


4 participants