Conversation

yenchenlin
Contributor

This is a PR for #6425.
I've added get_feature_names to OneHotEncoder.
Can @jnothman please have a look at this?

feature_names = []
for (i, n_value) in enumerate(self.n_values_):
    for j in xrange(n_value):
        feature_names.append(input_features[i])
Member

I think you want something like "{}={}".format(name, value)

Contributor Author

Sorry, can you elaborate?

"{}={}".format(name, value)

What are name and value here?

Member

Rather, "{}={}".format(input_features[i], j).

Contributor Author

Oh, you mean adding j to each entry of feature_names to make it clearer, so that the output becomes something like:

['x0 0', 'x0 1', 'x1 0', 'x1 1', 'x1 2', 'x2 0', 'x2 1', 'x2 2', 'x2 3']

Am I wrong?

Member

I'd rather have the = in there...

Contributor Author

Yeah I agree that the following output is better:

['x0=0', 'x0=1', 'x1=0', 'x1=1', 'x1=2', 'x2=0', 'x2=1', 'x2=2', 'x2=3']
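
The naming scheme agreed on above can be sketched as a standalone function. This is only an illustration, not the PR's actual method: `one_hot_feature_names` is a hypothetical helper, its `n_values` argument stands in for the encoder's `n_values_` attribute, and it is written for Python 3 (`range` instead of `xrange`).

```python
def one_hot_feature_names(n_values, input_features=None):
    """Return one 'name=value' label per one-hot output column.

    Hypothetical helper for illustration: `n_values` plays the role of the
    encoder's n_values_ (number of output columns per input feature).
    """
    if input_features is None:
        # Assumed default names: x0, x1, ...
        input_features = ["x%d" % i for i in range(len(n_values))]
    feature_names = []
    for name, n_value in zip(input_features, n_values):
        for value in range(n_value):
            # Join the feature name and the category index with '='.
            feature_names.append("{}={}".format(name, value))
    return feature_names


print(one_hot_feature_names([2, 3, 4]))
```

With three features taking 2, 3 and 4 values, this prints the same nine names as the list above.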

Contributor Author

I've updated the code.
Please have a look.
Thanks!

@yenchenlin yenchenlin changed the title [MRG] ENH Add get_feature_names for OneHotEncoder [WIP] ENH Add get_feature_names for OneHotEncoder Feb 24, 2016
@yenchenlin yenchenlin force-pushed the add-get_feature_names-for-onehotencoder branch 2 times, most recently from f4b4d5b to acb09d2 Compare February 24, 2016 06:48
else:
    if len(input_features) != len(self.n_values_):
        raise ValueError("Number of input_features must equal to "
                         "n_feature. it has to be of shape "
Member

What is n_feature?

Contributor Author

Oh, it should be n_features,
like the n_features in this line:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/preprocessing/data.py#L1710

I've updated the code.
Thanks!

@yenchenlin yenchenlin force-pushed the add-get_feature_names-for-onehotencoder branch from acb09d2 to 7aa1754 Compare February 24, 2016 10:48
    input_features = ['x%d' % i for i in range(len(self.n_values_))]
else:
    if len(input_features) != len(self.n_values_):
        raise ValueError("Number of input_features must equal to "
Member

This is still clunky. How about "Length of input_features is {0} but it must equal number of features when fitted: {1}."?

Contributor Author

Yeah, and showing len(self.n_values_) in the error message may be more informative too.

Code updated.
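
A minimal sketch of the length check with the suggested message might look like the following. `check_input_features` is a hypothetical free function used only for illustration, and its `n_features` argument stands in for `len(self.n_values_)`. The updated code in the PR is not shown in this thread and may differ.

```python
def check_input_features(input_features, n_features):
    """Validate user-supplied feature names against the fitted feature count.

    Hypothetical helper: `n_features` stands in for len(self.n_values_).
    """
    if input_features is None:
        # Fall back to generated names x0, x1, ...
        return ["x%d" % i for i in range(n_features)]
    if len(input_features) != n_features:
        raise ValueError(
            "Length of input_features is {0} but it must equal "
            "number of features when fitted: {1}.".format(
                len(input_features), n_features))
    return list(input_features)


try:
    check_input_features(["a", "b"], 3)
except ValueError as exc:
    print(exc)
```

Reporting both the received and the expected length lets the caller see at a glance which side is wrong.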

@yenchenlin yenchenlin force-pushed the add-get_feature_names-for-onehotencoder branch 2 times, most recently from 61e1331 to ac47fab Compare February 24, 2016 13:31
@yenchenlin yenchenlin changed the title [WIP] ENH Add get_feature_names for OneHotEncoder [MRG] ENH Add get_feature_names for OneHotEncoder Feb 25, 2016
@yenchenlin yenchenlin force-pushed the add-get_feature_names-for-onehotencoder branch from ac47fab to 15a4a75 Compare February 27, 2016 14:40
@yenchenlin
Contributor Author

Hello @jnothman,
Is there any problem with this?

@jnothman
Member

LGTM

@jnothman jnothman changed the title [MRG] ENH Add get_feature_names for OneHotEncoder [MRG+1] ENH Add get_feature_names for OneHotEncoder Feb 28, 2016
@jnothman
Member

but it may be subject to an embargo :p

@yenchenlin
Contributor Author

Oh yeah ...
However, I think this function is really useful for OneHotEncoder, since it makes the encoder's output clearer than before.

@MechCoder
Member

LGTM as well.

@MechCoder MechCoder changed the title [MRG+1] ENH Add get_feature_names for OneHotEncoder [MRG+2] ENH Add get_feature_names for OneHotEncoder Feb 29, 2016
@MechCoder
Member

Actually, I take back my +1. n_values_ is derived from the maximum categorical value of every feature (the maximum plus one, as the output below shows), not from the number of categories.

data = [[1, 100], [10, 200]]
enc = OneHotEncoder(handle_unknown="error")
enc.fit(data)
enc.n_values_
[11, 201]

This should probably wait for #5270 and the unique_samples_ attribute in that PR.
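
The mismatch described above can be reproduced without scikit-learn. The sketch below assumes, based on the output shown, that n_values_ equals the per-column maximum plus one. Naming columns from the values actually observed is just one possible alternative and is not necessarily what #5270 implements.

```python
data = [[1, 100], [10, 200]]

# Transpose the rows into per-feature columns: [(1, 10), (100, 200)].
columns = list(zip(*data))

# What n_values_ appears to hold: per-column maximum plus one (assumption).
max_plus_one = [max(col) + 1 for col in columns]

# What the naming loop actually needs: the number of distinct categories.
n_categories = [len(set(col)) for col in columns]

# Illustrative alternative: one name per value actually observed.
observed_names = [
    "x{}={}".format(i, value)
    for i, col in enumerate(columns)
    for value in sorted(set(col))
]

print(max_plus_one)
print(n_categories)
print(observed_names)
```

Looping over `max_plus_one` would generate 11 + 201 = 212 names, even though each column contains only two distinct values.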

@jnothman
Member

jnothman commented Mar 1, 2016

Oh, I forgot about that... Withholding my +1.

@jnothman jnothman changed the title [MRG+2] ENH Add get_feature_names for OneHotEncoder [MRG] ENH Add get_feature_names for OneHotEncoder Mar 1, 2016
@amueller
Member

amueller commented Oct 8, 2016

I think this should wait for the refactoring of OneHotEncoder to accept strings in #7327.

@jorisvandenbossche
Member

This has been added in #10198 in the meantime, so I'm closing this. @yenchenlin, thanks for working on it anyway!

5 participants