API Rename OneHotEncoder option sparse to sparse_output #24412

rusdes · 2022-09-09T19:03:33Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This PR deprecates the sparse parameter of OneHotEncoder and introduces sparse_output as its replacement.

Any other comments?

I ran:

pytest sklearn/preprocessing/tests/test_encoders.py -v

and all tests pass with 0 warnings.

glemaitre

You will also need an entry in the 1.2 changelog to acknowledge the deprecation.

sklearn/compose/tests/test_column_transformer.py

sklearn/preprocessing/_encoders.py

sklearn/preprocessing/tests/test_encoders.py

sklearn/preprocessing/_encoders.py

glemaitre · 2022-09-12T09:19:55Z

sklearn/preprocessing/_encoders.py

+                "will be removed in 1.4.",
+                FutureWarning,
+            )
+            self.sparse_output = self.sparse


This is an annoying case here.

Supposedly, we should raise an error if both sparse and sparse_output are set to non-default values. However, to detect that, we would need to add a None option to sparse_output that is treated as True. We don't want None, so we would also need to deprecate it as soon as it is introduced so that we can remove it later. In practice, that means we would always raise a warning and force users to set it to a bool.

@jeremiedbb I recall that we had a similar pattern somewhere else (DictionaryLearning or NMF). How did we do it?

@lorentzenchr @thomasjpfan do you see a better way?
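A minimal sketch of the detection problem described above. It assumes a hypothetical helper, `resolve_sparse_output` (not part of scikit-learn), and the "deprecated" sentinel default used in the check proposed in this thread. With a plain boolean default, an explicit sparse_output=True cannot be told apart from an unset one. A None default makes the conflict detectable, at the cost of introducing a value that would itself have to be deprecated later.

```python
def resolve_sparse_output(sparse="deprecated", sparse_output=None):
    """Hypothetical helper: return the effective bool, raising if both are set."""
    sparse_set = sparse != "deprecated"
    # None is the only way to know that the user did not pass sparse_output.
    sparse_output_set = sparse_output is not None

    if sparse_set and sparse_output_set:
        raise ValueError("Set only sparse_output; sparse is deprecated.")
    if sparse_set:
        return bool(sparse)
    # None stands in for the intended default, True.
    return True if sparse_output is None else bool(sparse_output)
```

Under this sketch, `resolve_sparse_output(sparse=False)` returns False, while passing both arguments raises.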

To keep it simple, I would raise in the following case

if self.sparse != "deprecated" and self.sparse != self.sparse_output:
    raise ValueError("Some informative message")

This doesn't work because you don't want to raise when the user sets sparse to a value different than the default of sparse_output.

To me, the appropriate way is to do as explained in our docs: if a user sets sparse, we raise a warning saying that it is deprecated, that sparse_output should be used instead, and that sparse_output is ignored, and then we use the value of sparse.
We don't really need to raise when both are set; the FutureWarning is enough.
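The exchange above can be sketched as follows. This is a simplified stand-in, not the scikit-learn implementation: `proposed_check`, `ToyEncoder`, and the warning text other than the "will be removed in 1.4." fragment from the diff are assumptions made for illustration. It first shows why the proposed inequality check raises for a user who only sets sparse, then the pattern that was agreed on: emit a FutureWarning and use the value of sparse.

```python
import warnings


def proposed_check(sparse="deprecated", sparse_output=True):
    """The check suggested in the thread, wrapped in a hypothetical function."""
    if sparse != "deprecated" and sparse != sparse_output:
        raise ValueError("Some informative message")


class ToyEncoder:
    """Hypothetical estimator showing the warn-and-override pattern."""

    def __init__(self, sparse="deprecated", sparse_output=True):
        self.sparse = sparse
        self.sparse_output = sparse_output

    def fit(self):
        if self.sparse != "deprecated":
            warnings.warn(
                "sparse was renamed to sparse_output and "
                "will be removed in 1.4.",
                FutureWarning,
            )
            # The deprecated parameter wins; sparse_output is ignored.
            self.sparse_output = self.sparse
        return self


# A user who only sets the old parameter trips the proposed check,
# because False differs from the default sparse_output=True.
try:
    proposed_check(sparse=False)
    raised = False
except ValueError:
    raised = True

# The adopted pattern warns instead and keeps the user's choice.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    enc = ToyEncoder(sparse=False).fit()
```

Users who only pass sparse_output never reach the warning branch, so migrated code stays silent.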

lorentzenchr

This is already looking quite good.

doc/whats_new/v1.2.rst

sklearn/compose/tests/test_column_transformer.py

sklearn/ensemble/_hist_gradient_boosting/tests/test_gradient_boosting.py

sklearn/ensemble/_hist_gradient_boosting/tests/test_grower.py

sklearn/preprocessing/_encoders.py

lorentzenchr · 2022-09-12T16:14:35Z

sklearn/preprocessing/_encoders.py

+                "will be removed in 1.4.",
+                FutureWarning,
+            )
+            self.sparse_output = self.sparse


To keep it simple, I would raise in the following case

if self.sparse != "deprecated" and self.sparse != self.sparse_output:
    raise ValueError("Some informative message")

sklearn/preprocessing/tests/test_encoders.py

Co-authored-by: Christian Lorentzen

jeremiedbb

Thanks for the PR @rusdes. Here are just a few comments. LGTM otherwise.

sklearn/preprocessing/tests/test_encoders.py

doc/whats_new/v1.2.rst

sklearn/preprocessing/_encoders.py

Co-authored-by: Jérémie du Boisberranger

jeremiedbb

LGTM. Thanks @rusdes.

@glemaitre, are you OK with this handling of the param renaming (see #24412 (comment))?

glemaitre · 2022-09-16T11:47:13Z

True, this is fine with me. Thanks @rusdes. Good to be merged.

rusdes · 2022-09-28T17:51:36Z

@lorentzenchr @glemaitre @jeremiedbb I was going through some code and found that sklearn.feature_extraction.DictVectorizer uses the parameter sparse. In my opinion, we should be renaming that to sparse_output as well. If that's indeed the case, I'm more than happy to work on that. Feel free to create a new issue and assign that to me :)

jeremiedbb · 2022-09-28T19:26:17Z

> sklearn.feature_extraction.DictVectorizer uses the parameter sparse. In my opinion, we should be renaming that to sparse_output as well

@rusdes I agree with you. Feel free to directly open a PR to make the deprecation. Thanks

rusdes added 2 commits September 9, 2022 19:13

added sparse_output

f5e61f5

fix comment

ba99ed0

github-actions bot added the module:preprocessing label Sep 9, 2022

rusdes added 5 commits September 10, 2022 00:41

Merge remote-tracking branch 'upstream/main' into rename_sparse_output

42aaf36

fixed linting issue

ba798a6

Merge remote-tracking branch 'upstream/main' into rename_sparse_output

18f0138

fixed test issues and added sparse_ouyput to examples

f7805c6

fixed linting issue

c5fda65

glemaitre reviewed Sep 12, 2022

glemaitre changed the title ~~Rename OneHotEncoder option sparse to sparse_output~~ API Rename OneHotEncoder option sparse to sparse_output Sep 12, 2022

rusdes added 2 commits September 12, 2022 16:02

resolved issues pointed out

254eb2a

added entry in changelog

7ed87f6

lorentzenchr reviewed Sep 12, 2022

rusdes and others added 4 commits September 12, 2022 23:16

Update doc/whats_new/v1.2.rst

d3003ea

Co-authored-by: Christian Lorentzen

fixed issues pointed out

639fb5e

changed to in _forest.py

60cbcc2

fixed a few test files

d798558

jeremiedbb reviewed Sep 15, 2022

sklearn/preprocessing/tests/test_encoders.py (outdated)

sklearn/preprocessing/tests/test_encoders.py (outdated)

doc/whats_new/v1.2.rst (outdated)

jeremiedbb reviewed Sep 15, 2022

sklearn/preprocessing/_encoders.py (outdated)

jeremiedbb reviewed Sep 15, 2022

sklearn/preprocessing/_encoders.py (outdated)

rusdes and others added 4 commits September 16, 2022 13:29

Merge remote-tracking branch 'upstream/main' into rename_sparse_output

3a7a3af

few changes according to latest review comments

dd957a3

Update sklearn/preprocessing/_encoders.py

36ace5d

Co-authored-by: Jérémie du Boisberranger

Update doc/whats_new/v1.2.rst

2167283

Co-authored-by: Jérémie du Boisberranger

jeremiedbb approved these changes Sep 16, 2022

glemaitre merged commit 7134e41 into scikit-learn:main Sep 16, 2022

rusdes deleted the rename_sparse_output branch September 16, 2022 11:51

rusdes restored the rename_sparse_output branch September 28, 2022 17:51

ndousis mentioned this pull request Dec 21, 2022

sklearn OneHotEncoder FutureWarning modernatx/seqlike#66

Closed

pedwrds mentioned this pull request Feb 1, 2024

Unable to import LazyRegressor from lazypredict.Supervised pedwrds/lazypredict#1

Closed
