Conversation

Contributor

@rusdes rusdes commented Sep 9, 2022

Reference Issues/PRs

Fixes #24265.

What does this implement/fix? Explain your changes.

This PR deprecates the sparse parameter of OneHotEncoder and introduces sparse_output as its replacement.

Any other comments?

I ran:

pytest sklearn/preprocessing/tests/test_encoders.py -v

and all tests pass with 0 warnings.

Member

@glemaitre glemaitre left a comment

You will also need an entry in the 1.2 changelog to acknowledge the deprecation.

"will be removed in 1.4.",
FutureWarning,
)
self.sparse_output = self.sparse
Member

This is an annoying case here.

Supposedly, we should raise an error if both sparse and sparse_output are set to non-default values. However, to detect that, sparse_output would need None as its default, with None resolving to True. We don't want None as a permanent option, so we would also have to deprecate it as soon as it is introduced in order to remove it later. In practice, this means we would always raise a warning and force users to set sparse_output to a bool.

@jeremiedbb I recall that we had a similar pattern somewhere else (DictionaryLearning or NMF). How did we handle it?

@lorentzenchr @thomasjpfan do you see a better way?
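
The None-default option described in this comment could be sketched roughly as follows. This is only an illustration, not scikit-learn's code: ToyEncoder and resolve_sparse_output are hypothetical names, and the "deprecated" marker for sparse is taken from the check suggested later in this thread.

```python
import warnings


class ToyEncoder:
    """Hypothetical stand-in for an estimator; not scikit-learn's code."""

    def __init__(self, sparse="deprecated", sparse_output=None):
        self.sparse = sparse
        self.sparse_output = sparse_output

    def resolve_sparse_output(self):
        # A conflict is detectable only because None marks
        # "sparse_output was not passed explicitly".
        if self.sparse != "deprecated" and self.sparse_output is not None:
            raise ValueError("Set only sparse_output; sparse is deprecated.")
        if self.sparse != "deprecated":
            warnings.warn(
                "sparse is deprecated; use sparse_output instead.",
                FutureWarning,
            )
            return self.sparse
        # None falls back to the previous default behaviour (True).
        return True if self.sparse_output is None else self.sparse_output
```

The drawback raised in the comment still applies to this sketch: None would become part of the public signature and would itself have to be deprecated later.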

Member

To keep it simple, I would raise in the following case

if self.sparse != "deprecated" and self.sparse != self.sparse_output:
    raise ValueError("Some informative message")
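
To see how this check behaves for different argument combinations, here is a minimal sketch that applies it to plain function arguments. The helper name resolve_with_check is hypothetical, and the sketch assumes sparse_output defaults to True.

```python
def resolve_with_check(sparse="deprecated", sparse_output=True):
    """Hypothetical helper applying the suggested consistency check."""
    if sparse != "deprecated" and sparse != sparse_output:
        raise ValueError("sparse and sparse_output disagree")
    # Use the deprecated value when it was passed, else the new one.
    return sparse_output if sparse == "deprecated" else sparse


for kwargs in ({}, {"sparse": True}, {"sparse": False}):
    try:
        print(kwargs, "->", resolve_with_check(**kwargs))
    except ValueError as exc:
        print(kwargs, "-> ValueError:", exc)
```

With these defaults, a call that passes only sparse=False also takes the raising branch, because False differs from the default value of sparse_output.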

Member

@jeremiedbb jeremiedbb Sep 15, 2022

This doesn't work because you don't want to raise when the user sets sparse to a value different from the default of sparse_output.

To me, the appropriate way is to do as explained in our docs: if a user sets sparse, we raise a warning saying that it is deprecated, that sparse_output should be used instead, and that sparse_output is ignored in the meantime. We then use the value of sparse.
We don't really need to raise when both are set; the FutureWarning is enough.
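
The warn-and-use-sparse approach described here can be sketched as follows. This is a minimal illustration under assumptions, not the merged implementation: ToyEncoder and effective_sparse_output are hypothetical names, and the warning text is only indicative.

```python
import warnings


class ToyEncoder:
    """Hypothetical stand-in; not scikit-learn's actual implementation."""

    def __init__(self, sparse="deprecated", sparse_output=True):
        self.sparse = sparse
        self.sparse_output = sparse_output

    def effective_sparse_output(self):
        if self.sparse != "deprecated":
            # The deprecated parameter wins, and the user is told so.
            warnings.warn(
                "sparse is deprecated; use sparse_output instead. "
                "sparse_output is ignored while sparse is set.",
                FutureWarning,
            )
            return self.sparse
        return self.sparse_output
```

No error is raised when both parameters are set; the FutureWarning alone tells the user which value is in effect.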

@glemaitre glemaitre changed the title from "Rename OneHotEncoder option sparse to sparse_output" to "API Rename OneHotEncoder option sparse to sparse_output" Sep 12, 2022
Member

@lorentzenchr lorentzenchr left a comment

This is already looking quite good.

Member

@jeremiedbb jeremiedbb left a comment

Thanks for the PR @rusdes. Here are just a few comments. Otherwise LGTM.

Member

@jeremiedbb jeremiedbb left a comment

LGTM. Thanks @rusdes.

@glemaitre, are you OK with this handling of the param renaming (see #24412 (comment))?

@glemaitre glemaitre merged commit 7134e41 into scikit-learn:main Sep 16, 2022
@glemaitre
Member

True, this is fine with me. Thanks @rusdes. Good to be merged.

@rusdes rusdes deleted the rename_sparse_output branch September 16, 2022 11:51
@rusdes
Contributor Author

rusdes commented Sep 28, 2022

@lorentzenchr @glemaitre @jeremiedbb I was going through some code and found that sklearn.feature_extraction.DictVectorizer also uses the parameter sparse. In my opinion, we should rename that to sparse_output as well. If you agree, I'm more than happy to work on it. Feel free to create a new issue and assign it to me :)

@rusdes rusdes restored the rename_sparse_output branch September 28, 2022 17:51
@jeremiedbb
Member

> sklearn.feature_extraction.DictVectorizer uses the parameter sparse. In my opinion, we should be renaming that to sparse_output as well

@rusdes I agree with you. Feel free to directly open a PR to make the deprecation. Thanks

Development

Successfully merging this pull request may close these issues.

Rename OneHotEncoder option sparse to sparse_output