MRG FIX: order of values of self.quantiles_ in QuantileTransformer #15751
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,6 +25,7 @@ | |
from sklearn.utils._testing import assert_allclose | ||
from sklearn.utils._testing import assert_allclose_dense_sparse | ||
from sklearn.utils._testing import skip_if_32bit | ||
from sklearn.utils._testing import _convert_container | ||
|
||
from sklearn.utils.sparsefuncs import mean_variance_axis | ||
from sklearn.preprocessing._data import _handle_zeros_in_scale | ||
|
@@ -1532,6 +1533,26 @@ def test_quantile_transform_nan(): | |
    assert not np.isnan(transformer.quantiles_[:, 1:]).any() | ||
|
||
|
||
@pytest.mark.parametrize("array_type", ['array', 'sparse']) | ||
def test_quantile_transformer_sorted_quantiles(array_type): | ||
    # Non-regression test for: | ||
    # https://github.com/scikit-learn/scikit-learn/issues/15733 | ||
    # Taken from upstream bug report: | ||
    # https://github.com/numpy/numpy/issues/14685 | ||
    X = np.array([0, 1, 1, 2, 2, 3, 3, 4, 5, 5, 1, 1, 9, 9, 9, 8, 8, 7] * 10) | ||
Review comment: Shoot, I tried to reproduce the failure with different sizes and ways of generating the data, without success.
Review comment: Nice that you got one.
Review comment: The trick was to make that dataset larger than 100 samples, otherwise [...] So I just duplicated the samples 10 times and the monotonicity issue was fortunately still present :)
Review comment: BTW, I am not sure if [...] But that's unrelated to the topic of this PR.
Review comment: We raise a warning in this case, so I think that this is fine (at least we expected it).
||
    X = 0.1 * X.reshape(-1, 1) | ||
    X = _convert_container(X, array_type) | ||
|
||
    n_quantiles = 100 | ||
    qt = QuantileTransformer(n_quantiles=n_quantiles).fit(X) | ||
|
||
    # Check that the estimated quantile thresholds are monotonically | ||
    # increasing: | ||
    quantiles = qt.quantiles_[:, 0] | ||
    assert len(quantiles) == 100 | ||
    assert all(np.diff(quantiles) >= 0) | ||
|
||
|
||
def test_robust_scaler_invalid_range(): | ||
    for range_ in [ | ||
        (-1, 90), | ||
|
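For readers who want to poke at the property this test asserts outside of scikit-learn, here is a minimal NumPy-only sketch. It computes 100 percentile thresholds on the same data as the test and shows one possible way to force them to be non-decreasing, using a running maximum. The helper names `enforce_monotonic` and `is_monotonic` are hypothetical and introduced only for illustration; this diff does not show the estimator-side change, so the sketch should not be read as the fix applied in this PR. Whether the raw percentiles actually come out non-monotonic depends on the installed NumPy version, so the sketch also includes a hand-made array with a tiny floating-point dip.

```python
import numpy as np


def enforce_monotonic(quantiles):
    """Return a non-decreasing copy of `quantiles` along axis 0.

    Hypothetical helper: a running maximum replaces any value that dips
    below an earlier one with that earlier value.
    """
    return np.maximum.accumulate(np.asarray(quantiles, dtype=float), axis=0)


def is_monotonic(quantiles):
    """Same check as the test: all successive differences are >= 0."""
    return bool(np.all(np.diff(quantiles, axis=0) >= 0))


# Same data as the non-regression test (18 values repeated 10 times).
X = np.array([0, 1, 1, 2, 2, 3, 3, 4, 5, 5, 1, 1, 9, 9, 9, 8, 8, 7] * 10)
X = 0.1 * X.reshape(-1, 1)

# 100 evenly spaced percentile levels, one column of thresholds per feature.
references = np.linspace(0, 100, num=100)
raw = np.nanpercentile(X, references, axis=0)  # shape (100, 1)
fixed = enforce_monotonic(raw)

# Hand-made thresholds with a floating-point-sized dip at index 2.
noisy = np.array([[0.1], [0.3], [0.29999999999999993], [0.5]])
repaired = enforce_monotonic(noisy)

print("raw percentiles monotonic:", is_monotonic(raw))
print("after running maximum:   ", is_monotonic(fixed))
print("hand-made example:       ", is_monotonic(noisy), "->", is_monotonic(repaired))
```

The running maximum only ever raises a value to match an earlier one, so it cannot introduce new ordering violations, and thresholds that were already sorted are returned unchanged.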