-
-
Notifications
You must be signed in to change notification settings - Fork 26.5k
[MRG] Dynamically set n_quantiles to min(n_samples, n_quantiles) in QuantileTransformer #13333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b3e207b to
6e1c01f
Compare
glemaitre
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need an entry in what's new. If we considered it as a bug fix then we don't need a deprecation.
| transformer.transform, 10) | ||
| # check that a warning is raised is n_quantiles > n_samples | ||
| transformer = QuantileTransformer(n_quantiles=100) | ||
| assert_warns_message(UserWarning, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use pytest to catch warning instead?
sklearn/preprocessing/data.py
Outdated
| n_samples = X.shape[0] | ||
|
|
||
| if self.n_quantiles > n_samples: | ||
| self.n_quantiles = n_samples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be called n_quantiles_ isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes indeed I just fixed it
|
Thanks @glemaitre |
| n_samples = X.shape[0] | ||
|
|
||
| if self.n_quantiles > n_samples: | ||
| warnings.warn("n_quantiles (%s) is greater than the total number " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is more verbose than is helpful. "n_quantiles (%d) is being reduced to the number of samples (%d)" if you want to add another phrase explaining why, okay, but I think it's intuitively okay.
jnothman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otherwise LGTM
Co-Authored-By: albertcthomas <[email protected]>
d105f79 to
cfb2b19
Compare
|
Thanks @jnothman |
…eTransformer (scikit-learn#13333)" This reverts commit 6dd1404.
…eTransformer (scikit-learn#13333)" This reverts commit 6dd1404.
Fixes #13315
Values of
n_quantileslarger thann_sampleseither do not lead to a better approximation of the cdf estimator or to a wrong approximation. See #13315 for details.Don't know if this is considered as a bug or if this requires a deprecation cycle?