ENH Add get_feature_names_out for RandomTreesEmbedding module #21762

MaxwellLZH · 2021-11-23T15:21:38Z

Reference Issues/PRs

Part of issue #21308. This is the same PR as #21459 (I ran into some git issues with the previous PR).

What does this implement/fix? Explain your changes.

Implementing get_feature_names_out for RandomTreesEmbedding, VotingClassifier, VotingRegressor, StackingClassifier and StackingRegressor, with corresponding test cases.

thomasjpfan

Thank you for the PR @MaxwellLZH ! I recommend breaking this PR into 3 PRs:

Keep this one for RandomTreesEmbedding.
Another for Voting*
Another for Stacking* (Do not start this one until Voting* is complete since they are related and will have the same discussions)

This is because I think there is an argument for generating more descriptive names for each of the cases above.

sklearn/ensemble/tests/test_forest.py

sklearn/ensemble/_forest.py

thomasjpfan · 2021-11-26T19:34:52Z

sklearn/ensemble/tests/test_forest.py

+    assert_array_equal(
+        [f"randomtreesembedding{i}" for i in range(hasher._n_features_out)], names
+    )


I think it is better to explicitly test the public API. We can transform and get the number of features out:

    n_features_out = hasher.transform(X).shape[1]
    assert_array_equal(
        [f"randomtreesembedding{i}" for i in range(n_features_out)], names
    )
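The transformed width is the right reference because the embedding routes each sample to one leaf per tree and one-hot encodes those leaf ids per tree. The number of output columns is therefore the number of distinct leaves summed over all trees. The stdlib-only sketch below assumes that model. `one_hot_leaves` and the toy `leaf_ids` data are hypothetical stand-ins for the estimator's `apply`/`transform`, not scikit-learn code.

```python
# Hypothetical stand-in for RandomTreesEmbedding.transform (not the real
# implementation). Each row of `leaf_ids` holds the leaf node id that each
# tree routes the sample to, i.e. one row of `apply` output.
def one_hot_leaves(leaf_ids):
    n_trees = len(leaf_ids[0])
    # Per-tree sorted list of the leaf ids seen in the data.
    categories = [sorted({row[t] for row in leaf_ids}) for t in range(n_trees)]
    encoded = []
    for row in leaf_ids:
        out = []
        for t, leaves in enumerate(categories):
            # One column per known leaf of tree t; 1 marks the leaf reached.
            out.extend(1 if leaf == row[t] else 0 for leaf in leaves)
        encoded.append(out)
    return encoded, categories


leaf_ids = [[2, 3], [3, 5], [5, 5], [6, 2]]
X_t, categories = one_hot_leaves(leaf_ids)

# Count output columns from the transformed data (the public result) rather
# than from a private attribute such as `_n_features_out`.
n_features_out = len(X_t[0])
names = [f"randomtreesembedding{i}" for i in range(n_features_out)]
print(n_features_out)  # 7: four leaves seen in tree 0, three in tree 1
```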

There is an argument for using something like randomtreesembedding_3_10, where 3 represents the tree used to generate the leaf and 10 is the leaf index.
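Here is a minimal sketch of the proposed randomtreesembedding_{tree}_{leaf} scheme. It assumes the per-tree list of leaf node ids that become output columns is already known; here it is a hard-coded, hypothetical `categories` list. `embedding_feature_names` is an illustrative helper, not scikit-learn API, and it numbers trees from 0.

```python
def embedding_feature_names(categories, prefix="randomtreesembedding"):
    # `categories[t]` lists the leaf node ids of tree t, in the same order
    # as the corresponding output columns.
    return [
        f"{prefix}_{tree}_{leaf}"
        for tree, leaves in enumerate(categories)
        for leaf in leaves
    ]


# Tree 0 is a full depth-2 tree; tree 1 has a leaf directly under the root.
categories = [[2, 3, 5, 6], [1, 3, 4]]
names = embedding_feature_names(categories)
print(names[:2])  # ['randomtreesembedding_0_2', 'randomtreesembedding_0_3']
```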

Shall I leave the naming as it is for now? If we decide to go with randomtreesembedding_{i}_{j}, we can update the test cases accordingly later.

We need to make a decision in this PR. I like using randomtreesembedding_{i}_{j}, can we update this PR to use this formatting and see what other reviewers think?

This means writing a custom get_feature_names_out for the tree embedding.

I've added a custom get_feature_names_out for the tree embedding, where i is the tree index (starting from 1) and j is the leaf index, as suggested.
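As a stdlib-only sketch of the shape such a custom method can take: `ToyTreesEmbedding` is hypothetical, and the fitted check and the `input_features` length check are simplified assumptions rather than scikit-learn's own validation helpers. Unlike the comment above, this sketch numbers trees from 0, which a later commit in this PR switches to. It also returns a plain list, whereas scikit-learn's `get_feature_names_out` returns an ndarray of object dtype.

```python
class ToyTreesEmbedding:
    """Hypothetical stand-in used only to sketch the method's shape."""

    def fit(self, leaf_ids, n_features_in):
        # `leaf_ids[i][t]` is the leaf node id reached by sample i in tree t.
        n_trees = len(leaf_ids[0])
        self.categories_ = [
            sorted({row[t] for row in leaf_ids}) for t in range(n_trees)
        ]
        self.n_features_in_ = n_features_in
        return self

    def get_feature_names_out(self, input_features=None):
        if not hasattr(self, "categories_"):
            raise ValueError("This estimator is not fitted yet.")
        # Input feature names do not influence the output names, but a list
        # of the wrong length is still rejected (simplified validation).
        if input_features is not None and len(input_features) != self.n_features_in_:
            raise ValueError("input_features does not match n_features_in_.")
        return [
            f"randomtreesembedding_{tree}_{leaf}"
            for tree, leaves in enumerate(self.categories_)
            for leaf in leaves
        ]


est = ToyTreesEmbedding().fit([[2, 3], [3, 5], [5, 5], [6, 2]], n_features_in=4)
print(est.get_feature_names_out()[:2])
```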

sklearn/ensemble/tests/test_voting.py

sklearn/ensemble/_forest.py

Co-authored-by: Thomas J. Fan <[email protected]>

thomasjpfan

Please add an entry to the change log at doc/whats_new/v1.1.rst with tag |Enhancement|. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

sklearn/ensemble/_forest.py

sklearn/ensemble/tests/test_forest.py

sklearn/ensemble/_forest.py

Co-authored-by: Thomas J. Fan <[email protected]>

thomasjpfan

Thanks for the update! I am happy with the naming for the features here. Let's see what other reviewers think.

ogrisel

LGTM once the suggestions below are taken into account. I found the feature names surprising, so I think it was necessary to make the docstring more explicit and add an inline comment in the test.
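The names look surprising because the leaf part of each name is a node id from the tree's internal array layout, and split nodes take ids too. The following small sketch assumes scikit-learn's depth-first node numbering and its `tree_.children_left` / `tree_.children_right` convention, where -1 marks a missing child. The arrays are hard-coded for a full depth-2 tree instead of being read from a fitted model.

```python
# Full depth-2 binary tree, nodes numbered depth-first (root = 0).
# -1 in children_left/children_right means "no child".
children_left = [1, 2, -1, -1, 5, -1, -1]
children_right = [4, 3, -1, -1, 6, -1, -1]

n_nodes = len(children_left)
# A leaf has neither a left nor a right child.
leaves = [n for n in range(n_nodes) if children_left[n] == children_right[n] == -1]
internal = [n for n in range(n_nodes) if children_left[n] != -1]

# Only leaves become output columns, so ids 0, 1 and 4 never appear in names.
names = [f"randomtreesembedding_0_{leaf}" for leaf in leaves]
print(leaves)    # [2, 3, 5, 6]
print(internal)  # [0, 1, 4]
```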

sklearn/ensemble/_forest.py

sklearn/ensemble/tests/test_forest.py

doc/whats_new/v1.1.rst

thomasjpfan · 2022-03-07T16:00:18Z

I found the feature names surprising, so I think it was necessary to make the docstring more explicit and add an inline comment in the test.

Do you think it is better to ignore the internal indices from the trees and use 0, 1, 2, etc for the leaf indices?

I am okay with that option as well.
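For comparison, here is a sketch of the contiguous alternative, where leaves within each tree are renumbered 0, 1, 2, ... in column order and the internal node ids are dropped. `contiguous_feature_names` is a hypothetical helper. The merged PR keeps the node-id scheme instead.

```python
def contiguous_feature_names(categories, prefix="randomtreesembedding"):
    # `categories[t]` holds tree t's leaf node ids in column order; only their
    # position is used, so the internal node ids disappear from the names.
    return [
        f"{prefix}_{tree}_{position}"
        for tree, leaves in enumerate(categories)
        for position in range(len(leaves))
    ]


categories = [[2, 3, 5, 6]]  # leaf node ids of a full depth-2 tree
node_id_names = [f"randomtreesembedding_0_{leaf}" for leaf in categories[0]]
contiguous_names = contiguous_feature_names(categories)
print(node_id_names[2], contiguous_names[2])
```

The contiguous names are tidier, but only the node-id names line up directly with the leaf ids that `apply` returns.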

ogrisel · 2022-03-07T16:06:40Z

Do you think it is better to ignore the internal indices from the trees and use 0, 1, 2, etc for the leaf indices?

The current implementation is more informative but potentially a bit confusing. Using contiguous, leaf-only indexing would make the feature names less dependent on the internal tree data structure, but in a way this internal detail is already part of the public API, because those are the indices returned by the public apply method.

So +0 for keeping the current indexing / naming scheme.
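The leaf part of each name is the same node id that `apply` returns. As a result, a sample's active features can be read straight off one row of `apply` output, and a name can be parsed back into (tree, leaf). The following hypothetical stdlib sketch covers both directions and assumes the randomtreesembedding_{tree}_{leaf} format with 0-based tree indices.

```python
def active_feature_names(apply_row, prefix="randomtreesembedding"):
    # `apply_row[t]` is the leaf node id that tree t routes the sample to,
    # i.e. one row of the output of `apply`.
    return [f"{prefix}_{tree}_{leaf}" for tree, leaf in enumerate(apply_row)]


def parse_feature_name(name):
    # Split off the last two "_"-separated fields: tree index and leaf id.
    _, tree, leaf = name.rsplit("_", 2)
    return int(tree), int(leaf)


active = active_feature_names([5, 3])
print(active)  # ['randomtreesembedding_0_5', 'randomtreesembedding_1_3']
print(parse_feature_name(active[1]))  # (1, 3)
```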

Co-authored-by: Olivier Grisel <[email protected]>

jeremiedbb · 2022-03-08T17:40:33Z

Do you think it is better to ignore the internal indices from the trees and use 0, 1, 2, etc for the leaf indices?

I don't have a strong preference. I'm fine with reflecting the tree structure with the additional comments from Olivier. It might make it easier to debug if needed as well.

jeremiedbb

LGTM. Thanks @MaxwellLZH

…-learn#21762) Co-authored-by: Thomas J. Fan <[email protected]> Co-authored-by: Jérémie du Boisberranger <[email protected]> Co-authored-by: Olivier Grisel <[email protected]>

MaxwellLZH added 2 commits November 23, 2021 23:00

add feature_names_out for ensemble module

8931db7

bug fix

7dca042

github-actions bot added the module:ensemble label Nov 23, 2021

thomasjpfan reviewed Nov 26, 2021


thomasjpfan mentioned this pull request Jan 3, 2022

Implement get_feature_names_out for all estimators #21308 (Closed, 14 tasks)

MaxwellLZH and others added 8 commits February 10, 2022 11:14

move Mixins to the left

da0b311

Co-authored-by: Thomas J. Fan <[email protected]>

fix typo

9b4379f

Co-authored-by: Thomas J. Fan <[email protected]>

change fit logic and test cases for RandomTreesEmbedding

1203f48

black formatting

b1aac98

bug fix

862d2ff

revert changes in voting and stacking

f05ec1f

use randomtreesembedding_{i}_{j} for feature_names_out

11b5be7

fix failed docstring test

ebd255d

thomasjpfan reviewed Feb 11, 2022


sklearn/ensemble/_forest.py (outdated)

MaxwellLZH and others added 9 commits February 14, 2022 11:26

validate input_features

2b83260

Co-authored-by: Thomas J. Fan <[email protected]>

keep tree index starts with 0

302a178

Co-authored-by: Thomas J. Fan <[email protected]>

better variable naming

be6d5ba

Co-authored-by: Thomas J. Fan <[email protected]>

remove _ClassNamePrefixFeaturesOutMixin and include TransformerMixin

1d9033a

Co-authored-by: Thomas J. Fan <[email protected]>

return ndarray of object dtype

41c55ad

Co-authored-by: Thomas J. Fan <[email protected]>

update documentation

4c5b329

Co-authored-by: Thomas J. Fan <[email protected]>

fix import error

05bcf72

Update test case & apply black formatting

2a6108e

Add entry in whatsnew

6726028

thomasjpfan approved these changes Feb 15, 2022


thomasjpfan changed the title from ~~ENH Add get_feature_names_out for ensemble module~~ to ENH Add get_feature_names_out for RandomTreesEmbedding module Feb 15, 2022

ogrisel approved these changes Mar 7, 2022


sklearn/ensemble/_forest.py (outdated)

sklearn/ensemble/tests/test_forest.py

doc/whats_new/v1.1.rst (outdated)

Update sklearn/ensemble/tests/test_forest.py

d8df622

Co-authored-by: Olivier Grisel <[email protected]>

jeremiedbb and others added 3 commits March 8, 2022 18:32

Update sklearn/ensemble/_forest.py

26d54d4

Co-authored-by: Olivier Grisel <[email protected]>

Update doc/whats_new/v1.1.rst

9dfe863

Co-authored-by: Olivier Grisel <[email protected]>

Merge branch 'main' into fet/ensemble-feature-names-out

a373726

jeremiedbb approved these changes Mar 8, 2022


jeremiedbb merged commit 26f5b26 into scikit-learn:main Mar 8, 2022


MaxwellLZH Feb 10, 2022


thomasjpfan Feb 10, 2022


MaxwellLZH Feb 11, 2022
