Add `sample_weight` support for `QuantileTransformer` when fit on dense data #31147

kaekkr · 2025-04-04T08:59:45Z

Reference Issues/PRs

Fixes #30707
See also the discussion in #30707.

What does this implement/fix? Explain your changes.

This PR adds a sample_weight parameter to QuantileTransformer.fit so that samples can be weighted when the quantiles are computed. This is useful when samples differ in importance or when the dataset is imbalanced.

Changes made:

Added sample_weight parameter to fit and _dense_fit.
Implemented weighted quantile logic.
Updated tests to check for correct behavior with and without sample_weight.

Any other comments?

The implementation ensures backward compatibility.
Tests pass and maintain previous behavior when sample_weight is not provided.
Would appreciate feedback on edge cases or numerical accuracy concerns.

Thanks for the review!

github-actions · 2025-04-04T09:01:17Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 06960cf. Link to the linter CI: here

ogrisel · 2025-04-04T15:46:22Z

Thanks for the PR. Could you please use sklearn.utils.stats._averaged_weighted_percentile instead of reimplementing a new version of weighted quantiles?

For the unweighted case, we should use np.nanquantile/np.percentile with method="averaged_inverted_cdf" instead.

The two changes together should help make the weighting/repetition semantic check of check_sample_weight_equivalence_on_dense_data pass.

Please mark check_sample_weight_equivalence_on_sparse_data XFAIL in the PER_ESTIMATOR_XFAIL_CHECKS dict.
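
A minimal sketch of the unweighted path suggested in this comment, assuming NumPy 1.22 or later (where the `method` argument is available). The toy array and variable names are illustrative and not taken from the PR.

```python
import numpy as np

# Toy dense input with one missing value; NaNs are disregarded when fitting.
X = np.array([[1.0], [2.0], [np.nan], [3.0], [4.0]])

# Quantile ranks in [0, 1], as in `references_`.
references = np.linspace(0, 1, 5, endpoint=True)

# One row of quantiles per rank, one column per feature; NaNs are skipped.
quantiles = np.nanquantile(X, references, axis=0, method="averaged_inverted_cdf")

median = quantiles[2, 0]  # rank 0.5 over the values 1, 2, 3, 4
```

With four non-missing values, rank 0.5 falls exactly between the second and third sorted values, so this method averages them rather than picking one.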

ogrisel · 2025-04-04T15:56:15Z

Please also don't forget to document your change in a changelog entry by adding a file under doc/whats_new/upcoming_changes. See #29907 for an example.

kaekkr · 2025-04-04T17:36:46Z

@ogrisel Okay, thank you for your reply! I will fix that

lucyleeow · 2025-07-03T01:08:58Z

Are you still interested in working on this?
If so, could you check on the failing tests? Please also add a what's new entry. Thank you.

ogrisel · 2025-07-03T08:56:26Z

sklearn/preprocessing/_data.py

@@ -2844,9 +2867,11 @@ def fit(self, X, y=None):
        # Create the quantiles of reference
        self.references_ = np.linspace(0, 1, self.n_quantiles_, endpoint=True)
        if sparse.issparse(X):
+            if sample_weight is not None:
+                raise ValueError("sample_weight is not supported for sparse input.")


Suggested change

-                raise ValueError("sample_weight is not supported for sparse input.")
+                raise NotImplementedError(
+                    "sample_weight is not supported for sparse input."
+                )

ogrisel · 2025-07-03T08:56:41Z

sklearn/preprocessing/tests/test_data.py

+    X_roundtrip = qt_weighted.inverse_transform(Xt_weighted)
+    np.testing.assert_allclose(
+        X[~np.isnan(X)], X_roundtrip[~np.isnan(X)], rtol=1e-2, atol=1e-2
+    )


I think this test is not necessary since there is already a common test check_sample_weight_equivalence_on_dense_data that should be executed when running the following command:

pytest -v -k "QuantileTransformer and check_sample_weight_equivalence" sklearn/tests/test_common.py

ogrisel · 2025-07-03T09:00:08Z

sklearn/preprocessing/_data.py

+                weights_clean = sample_weight[mask]
+                self.quantiles_[:, i] = _averaged_weighted_percentile(
+                    col_clean, sample_weight=weights_clean, quantile=references / 100.0
+                )


Please check the source code of _averaged_weighted_percentile and the docstring of _weighted_percentile to learn how to use it properly.

In particular:

it accepts percentile ranks (in [0, 100]), not quantile ranks (in [0, 1]);

it can compute the percentile for only a single rank at a time, hence you might need to call it in a list comprehension over all the ranks in the references array.
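
The two points above can be illustrated with a minimal NumPy sketch. `weighted_percentile_one_rank` is a hypothetical stand-in, not scikit-learn's private `_averaged_weighted_percentile` (whose exact signature should be checked in the source). It assumes a plain weighted inverted CDF without averaging, takes one percentile rank in [0, 100], and is called once per rank after scaling `references` from [0, 1].

```python
import numpy as np


def weighted_percentile_one_rank(values, weights, percentile_rank):
    """Hypothetical helper: weighted inverted-CDF percentile for ONE rank in [0, 100]."""
    order = np.argsort(values)
    sorted_values = values[order]
    cum_weights = np.cumsum(weights[order])
    # Smallest value whose cumulative weight reaches the requested share.
    threshold = percentile_rank / 100.0 * cum_weights[-1]
    idx = np.searchsorted(cum_weights, threshold, side="left")
    return sorted_values[min(idx, len(sorted_values) - 1)]


# Quantile ranks in [0, 1], as stored in `references_`.
references = np.linspace(0, 1, 5, endpoint=True)

col = np.array([1.0, 2.0, 3.0])
weights = np.array([1.0, 3.0, 1.0])

# Scale to percentile ranks (multiply by 100) and call once per rank.
weighted_quantiles = np.array(
    [weighted_percentile_one_rank(col, weights, rank) for rank in references * 100]
)

# Same computation on the data with rows repeated according to the weights.
col_repeated = np.repeat(col, weights.astype(int))
repeated_quantiles = np.array(
    [
        weighted_percentile_one_rank(col_repeated, np.ones_like(col_repeated), rank)
        for rank in references * 100
    ]
)
```

In this sketch, weighting a row by 3 gives the same quantiles as repeating that row three times, which is the kind of weighting/repetition equivalence the common check is meant to verify.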

ogrisel · 2025-07-03T09:00:32Z

sklearn/preprocessing/_data.py

+
+        self.quantiles_ = np.zeros((len(references), n_features))
+
+        for i in range(n_features):


Please rename i to feature_idx to make the code easier to follow.

Karassay and others added 3 commits April 3, 2025 12:53

add sample_weight parameter in fit function

841ed26

add support for sample_weight in QuantileTransformer

eb78c16

Merge branch 'main' into quantile-transformer-sample-weight

7d51a7d

github-actions bot added the module:preprocessing label Apr 4, 2025

ogrisel changed the title ~~Quantile transformer sample weight~~ Add sample_weight support for QuantileTransformer when fit on dense data Apr 4, 2025

Karassay and others added 2 commits April 10, 2025 12:29

remove custom weighted_percentile function, use _averaged_weighted_pe…

0b0f0e6

…rcentile, add XFAIL for sparse_data

Merge branch 'main' into quantile-transformer-sample-weight

06960cf

StefanieSenger added the Waiting for Reviewer label Jun 26, 2025

ogrisel reviewed Jul 3, 2025
