Add sample_weight
support for QuantileTransformer
when fit on dense data
#31147
Thanks for the PR. Could you please instead use For the unweighted case, we should use The two changes together should help make the weighting/repetition semantic check of Please mark
sample_weight
support for QuantileTransformer
when fit on dense data
Please also don't forget to document your change in a changelog entry by adding a file under
@ogrisel Okay, thank you for your reply! I will fix that.
…rcentile, add XFAIL for sparse_data
Are you still interested in working on this?
@@ -2844,9 +2867,11 @@ def fit(self, X, y=None):
        # Create the quantiles of reference
        self.references_ = np.linspace(0, 1, self.n_quantiles_, endpoint=True)
        if sparse.issparse(X):
            if sample_weight is not None:
                raise ValueError("sample_weight is not supported for sparse input.")
- raise ValueError("sample_weight is not supported for sparse input.")
+ raise NotImplementedError(
+     "sample_weight is not supported for sparse input."
+ )
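A minimal sketch of the guard suggested here, written without SciPy so that it runs on its own. The function name `check_sparse_sample_weight` and the boolean `is_sparse` flag are hypothetical stand-ins for the `sparse.issparse(X)` check inside `fit`; the sketch only illustrates the unsupported combination surfacing as `NotImplementedError` instead of `ValueError`.

```python
def check_sparse_sample_weight(is_sparse, sample_weight=None):
    """Hypothetical guard: reject sample_weight when the input is sparse.

    `is_sparse` stands in for `sparse.issparse(X)` in the real `fit` method.
    """
    if is_sparse and sample_weight is not None:
        # The combination is valid in principle but not implemented yet,
        # so NotImplementedError describes the situation more precisely
        # than ValueError would.
        raise NotImplementedError(
            "sample_weight is not supported for sparse input."
        )


# Dense input with weights and sparse input without weights both pass.
check_sparse_sample_weight(is_sparse=False, sample_weight=[1.0, 2.0])
check_sparse_sample_weight(is_sparse=True, sample_weight=None)

# Sparse input with weights is rejected.
try:
    check_sparse_sample_weight(is_sparse=True, sample_weight=[1.0, 2.0])
    raised = False
except NotImplementedError:
    raised = True
```

With this exception type, a caller can tell a not-yet-supported feature apart from an invalid argument value.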
X_roundtrip = qt_weighted.inverse_transform(Xt_weighted)
np.testing.assert_allclose(
    X[~np.isnan(X)], X_roundtrip[~np.isnan(X)], rtol=1e-2, atol=1e-2
)
I think this test is not necessary since there is already a common test, check_sample_weight_equivalence_on_dense_data, that should be executed when running the following command:
pytest -v -k "QuantileTransformer and check_sample_weight_equivalence" sklearn/tests/test_common.py
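The property that this kind of common test exercises, as I understand it, is that integer sample weights behave like sample repetition. A small illustration in plain Python follows; `lower_weighted_percentile` is a hypothetical stand-in written for this sketch, not scikit-learn's private helper and not the common test itself.

```python
def lower_weighted_percentile(values, weights, percentile_rank):
    """Smallest value whose cumulative weight reaches the given percentile.

    `percentile_rank` is on the 0-100 scale. Samples with zero weight are
    ignored. This is a simplified stand-in for illustration only.
    """
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    total = sum(w for _, w in pairs)
    threshold = percentile_rank * total / 100.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= threshold:
            return value
    return pairs[-1][0]


values = [3.0, 1.0, 2.0]
weights = [1, 3, 2]

# Repeat each sample as many times as its integer weight.
repeated = [v for v, w in zip(values, weights) for _ in range(w)]
unit_weights = [1] * len(repeated)

ranks = [0, 25, 50, 75, 100]
weighted_result = [lower_weighted_percentile(values, weights, r) for r in ranks]
repeated_result = [
    lower_weighted_percentile(repeated, unit_weights, r) for r in ranks
]
```

Both lists come out identical, which is the equivalence a weighted `fit` is expected to satisfy.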
weights_clean = sample_weight[mask]
self.quantiles_[:, i] = _averaged_weighted_percentile(
    col_clean, sample_weight=weights_clean, quantile=references / 100.0
)
Please check the source code of _averaged_weighted_percentile and the docstring of _weighted_percentile to learn how to use it properly.
In particular:
- it accepts percentile ranks, not quantile ranks;
- it can only compute the percentile for a single rank at a time, hence you might need to call this in a list comprehension for all possible ranks passed in the references array.
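The two points above can be sketched as follows: convert quantile ranks in [0, 1] to percentile ranks in [0, 100], then call a single-rank helper once per rank in a list comprehension. The helper `single_rank_weighted_percentile` is a hypothetical, simplified stand-in (a lower weighted percentile with no averaging); it is not scikit-learn's `_averaged_weighted_percentile`, whose exact signature should be checked in the source.

```python
import math


def single_rank_weighted_percentile(values, weights, percentile_rank):
    """Hypothetical stand-in: weighted percentile for ONE rank in [0, 100]."""
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    total = sum(w for _, w in pairs)
    threshold = percentile_rank * total / 100.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= threshold:
            return value
    return pairs[-1][0]


n_quantiles = 5
# Quantile ranks in [0, 1], analogous to np.linspace(0, 1, n_quantiles).
references = [k / (n_quantiles - 1) for k in range(n_quantiles)]
# The helper expects percentile ranks, so rescale to [0, 100].
percentile_ranks = [100.0 * q for q in references]

column = [10.0, float("nan"), 20.0, 30.0, 40.0]
sample_weight = [1, 7, 1, 1, 5]

# Drop NaN entries together with their weights, as the diff does with a mask.
clean = [(v, w) for v, w in zip(column, sample_weight) if not math.isnan(v)]
col_clean = [v for v, _ in clean]
weights_clean = [w for _, w in clean]

# One call per rank, since the helper handles a single rank at a time.
quantiles = [
    single_rank_weighted_percentile(col_clean, weights_clean, rank)
    for rank in percentile_ranks
]
```

Here the heavily weighted sample 40.0 pulls the median and the upper quantiles towards it, giving `[10.0, 20.0, 40.0, 40.0, 40.0]`.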
self.quantiles_ = np.zeros((len(references), n_features))

for i in range(n_features):
Please rename i to feature_idx to make the code easier to follow.
Reference Issues/PRs
Fixes #30707
See also the discussion in #30707.
What does this implement/fix? Explain your changes.
This PR adds support for the sample_weight parameter to QuantileTransformer, allowing users to apply weights to samples when computing quantiles. This makes the transformation more flexible, especially in cases where samples have varying importance or are part of imbalanced datasets.
Changes made:
sample_weight parameter to fit and _dense_fit.
sample_weight.
Any other comments?
sample_weight is not provided.
Thanks for the review!