[MRG+1] Changing default model for IterativeImputer to BayesianRidge #13038
Conversation
Please update the documentation.
Whoops, forgot. Thanks.
I assume this does not also affect documentation in the user guide.
We probably should have had a test that would have broken with this change, i.e. checking the class of each predictor. We should make sure there is such a test at least for custom predictors to ensure no regressions....
Sorry, I don't follow. What would the test ensure about custom predictors?
I meant like #13039, but also for predictor=None |
I think you mean this?
Yeah, that sort of thing. What I really mean is that historically we should have tested the effect of sample_posterior, but now it is more straightforward.
LGTM
I would refactor this test together with the previous one; it would require just one more statement.
Otherwise LGTM
sklearn/tests/test_impute.py
@@ -572,6 +572,24 @@ def test_iterative_imputer_predictors(predictor):
    assert len(set(hashes)) == len(hashes)


def test_iterative_imputer_bayesianridge_default():
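(The body of the new test is not visible in the hunk above. Purely as illustration, and not the author's actual code, a check of the BayesianRidge default might look like the following sketch, assuming the development-branch predictor and n_iter parameter names used elsewhere in this thread:)

def test_iterative_imputer_bayesianridge_default():
    # Hypothetical sketch: with predictor=None, every fitted predictor in the
    # imputation sequence should be a BayesianRidge instance.
    rng = np.random.RandomState(0)
    X = sparse_random_matrix(100, 10, density=0.10, random_state=rng).toarray()
    imputer = IterativeImputer(missing_values=0, n_iter=1, random_state=rng)
    imputer.fit_transform(X)
    for triplet in imputer.imputation_sequence_:
        assert isinstance(triplet.predictor, BayesianRidge)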
Would it be better to just add this case to the previous test?
I meant:
@pytest.mark.parametrize(
    "predictor",
    [None, DummyRegressor(), BayesianRidge(), ARDRegression(), RidgeCV()]
)
def test_iterative_imputer_predictors(predictor):
    rng = np.random.RandomState(0)
    n = 100
    d = 10
    X = sparse_random_matrix(n, d, density=0.10, random_state=rng).toarray()
    imputer = IterativeImputer(missing_values=0,
                               n_iter=1,
                               predictor=predictor,
                               random_state=rng)
    imputer.fit_transform(X)

    # check that types are correct for predictors
    hashes = []
    for triplet in imputer.imputation_sequence_:
        expected_type = (type(predictor) if predictor is not None
                         else type(BayesianRidge()))
        assert isinstance(triplet.predictor, expected_type)
        hashes.append(id(triplet.predictor))

    # check that each predictor is unique
    assert len(set(hashes)) == len(hashes)
Good idea. Done.
Merging when CI turns green.
Thanks for the change.
As discussed in #13026, it turns out that having RidgeCV as the default is problematic for reproducibility. This default should be gentler. Note, this may not pass tests until a more stable example is merged via #13026.
Paging @jnothman
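For context, a minimal sketch of the behavioural change under discussion. It uses the development-branch API visible in this thread (the predictor and n_iter parameters); the imports and data setup here are assumptions, and parameter names may differ in released versions:

import numpy as np
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

rng = np.random.RandomState(0)
X = rng.rand(100, 10)
X[rng.rand(100, 10) < 0.1] = 0  # treat zeros as missing values

# With predictor=None the imputer now falls back to BayesianRidge (previously
# RidgeCV), so repeated runs with a fixed random_state are reproducible.
imputer = IterativeImputer(missing_values=0, n_iter=1, random_state=0)
X_filled = imputer.fit_transform(X)

# Equivalent to passing the new default explicitly:
imputer_explicit = IterativeImputer(missing_values=0, n_iter=1,
                                    predictor=BayesianRidge(),
                                    random_state=0)
X_filled_explicit = imputer_explicit.fit_transform(X)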