
[MRG+1] Changing default model for IterativeImputer to BayesianRidge #13038


Merged

Conversation

sergeyf
Contributor

@sergeyf sergeyf commented Jan 23, 2019

As discussed in #13026, it turns out that having RidgeCV as the default is problematic for reproducibility. The default should be a gentler model.

Note, this may not pass tests until a more stable example is merged via #13026.
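For illustration, the new default behavior can be sketched as follows. Note this uses the released IterativeImputer API names (`estimator`/`max_iter` and the `enable_iterative_imputer` experimental flag), whereas the branch under review still used `predictor`/`n_iter`:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

X = np.array([[1.0, 2.0],
              [3.0, np.nan],
              [5.0, 6.0],
              [np.nan, 8.0]])

# With estimator=None, the imputer falls back to the default model,
# which after this change is BayesianRidge rather than RidgeCV.
imputer = IterativeImputer(estimator=None, max_iter=1, random_state=0)
X_filled = imputer.fit_transform(X)

# Each fitted round-robin step records the estimator it used.
for triplet in imputer.imputation_sequence_:
    assert isinstance(triplet.estimator, BayesianRidge)
```

BayesianRidge needs no internal cross-validation, so its fit is deterministic given the data, which is what makes it a more reproducible default than RidgeCV.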

Paging @jnothman

@jnothman
Member

Please update the documentation.

@sergeyf
Contributor Author

sergeyf commented Jan 23, 2019

Whoops, forgot. Thanks.

Member

@jnothman jnothman left a comment


I assume this does not also affect documentation in the user guide.

We probably should have had a test that would have broken with this change, i.e. one checking the class of each predictor. We should make sure there is such a test, at least for custom predictors, to guard against regressions.

@sergeyf
Contributor Author

sergeyf commented Jan 23, 2019

Sorry, I don't follow. What would such a test ensure about custom predictors?

@jnothman
Copy link
Member

I meant like #13039, but also for predictor=None

@sergeyf
Contributor Author

sergeyf commented Jan 24, 2019

I think you mean this?

import numpy as np
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge
from sklearn.random_projection import sparse_random_matrix


def test_iterative_imputer_bayesianridge_default():
    rng = np.random.RandomState(0)

    n = 100
    d = 10
    X = sparse_random_matrix(n, d, density=0.10, random_state=rng).toarray()

    imputer = IterativeImputer(missing_values=0,
                               n_iter=1,
                               predictor=None,
                               random_state=rng)
    imputer.fit_transform(X)

    # check that the default predictor is BayesianRidge
    hashes = []
    for triplet in imputer.imputation_sequence_:
        assert isinstance(triplet.predictor, BayesianRidge)
        hashes.append(id(triplet.predictor))

    # check that each predictor is unique
    assert len(set(hashes)) == len(hashes)

@sergeyf sergeyf changed the title Changing default model for IterativeImputer to BayesianRidge [WIP] Changing default model for IterativeImputer to BayesianRidge Jan 24, 2019
@jnothman
Member

jnothman commented Jan 24, 2019 via email

Member

@jnothman jnothman left a comment


LGTM

@sergeyf sergeyf changed the title [WIP] Changing default model for IterativeImputer to BayesianRidge [MRG+1] Changing default model for IterativeImputer to BayesianRidge Jan 24, 2019
Member

@glemaitre glemaitre left a comment


I would merge this test into the previous one; it would require just one more statement.
Otherwise LGTM

@@ -572,6 +572,24 @@ def test_iterative_imputer_predictors(predictor):
assert len(set(hashes)) == len(hashes)


def test_iterative_imputer_bayesianridge_default():
Member


Would it be better to just add this case to the previous test?

Member


I meant:

import pytest
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.impute import IterativeImputer
from sklearn.linear_model import ARDRegression, BayesianRidge, RidgeCV
from sklearn.random_projection import sparse_random_matrix


@pytest.mark.parametrize(
    "predictor",
    [None, DummyRegressor(), BayesianRidge(), ARDRegression(), RidgeCV()]
)
def test_iterative_imputer_predictors(predictor):
    rng = np.random.RandomState(0)
    n = 100
    d = 10
    X = sparse_random_matrix(n, d, density=0.10, random_state=rng).toarray()
    imputer = IterativeImputer(missing_values=0,
                               n_iter=1,
                               predictor=predictor,
                               random_state=rng)
    imputer.fit_transform(X)
    # check that types are correct for predictors
    hashes = []
    for triplet in imputer.imputation_sequence_:
        expected_type = (type(predictor) if predictor is not None
                         else BayesianRidge)
        assert isinstance(triplet.predictor, expected_type)
        hashes.append(id(triplet.predictor))
    # check that each predictor is unique
    assert len(set(hashes)) == len(hashes)

Contributor Author


Good idea. Done.

@glemaitre
Member

Merging when CI turns green

@glemaitre glemaitre merged commit cf4670c into scikit-learn:iterativeimputer Jan 24, 2019
@glemaitre
Member

Thanks for the change
