[MRG] ENH Add get_feature_names for OneHotEncoder #6441
sklearn/preprocessing/data.py
Outdated
feature_names = []
for (i, n_value) in enumerate(self.n_values_):
    for j in xrange(n_value):
        feature_names.append(input_features[i])
I think you want something like "{}={}".format(name, value)
Sorry, can you elaborate?
"{}={}".format(name, value)
What are name and value here?
.format(input_features[i], j), rather.
Oh, you mean adding j to feature_names to make it clearer, so the output becomes something like:
['x0 0', 'x0 1', 'x1 0', 'x1 1', 'x1 2', 'x2 0', 'x2 1', 'x2 2', 'x2 3']
Am I wrong?
I'd rather have the = in there...
Yeah, I agree that the following output is better:
['x0=0', 'x0=1', 'x1=0', 'x1=1', 'x1=2', 'x2=0', 'x2=1', 'x2=2', 'x2=3']
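For reference, the naming scheme agreed on in this thread can be sketched as a standalone function. This is only an illustration, not the code in the PR: the function name one_hot_feature_names and its n_values argument are hypothetical stand-ins for the method and for self.n_values_, and range replaces xrange so the sketch runs on Python 3.

```python
def one_hot_feature_names(n_values, input_features=None):
    """Build "name=value" labels for each one-hot output column.

    n_values is a hypothetical stand-in for self.n_values_: the number
    of distinct values seen for each input feature.
    """
    if input_features is None:
        # Default names x0, x1, ... as in the diff under review.
        input_features = ["x%d" % i for i in range(len(n_values))]
    feature_names = []
    for i, n_value in enumerate(n_values):
        for j in range(n_value):
            # Combine the input feature name with the encoded value.
            feature_names.append("{}={}".format(input_features[i], j))
    return feature_names


print(one_hot_feature_names([2, 3, 4]))
# ['x0=0', 'x0=1', 'x1=0', 'x1=1', 'x1=2', 'x2=0', 'x2=1', 'x2=2', 'x2=3']
```

With three features taking 2, 3 and 4 values, this reproduces the list quoted above.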
I've updated the code.
Please have a look.
Thanks!
f4b4d5b to acb09d2
sklearn/preprocessing/data.py
Outdated
else:
    if len(input_features) != len(self.n_values_):
        raise ValueError("Number of input_features must equal to "
                         "n_feature. it has to be of shape "
What is n_feature?
Oh, it should be n_features, like n_features in this line:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/preprocessing/data.py#L1710
I've updated the code.
Thanks!
acb09d2 to 7aa1754
sklearn/preprocessing/data.py
Outdated
    input_features = ['x%d' % i for i in range(len(self.n_values_))]
else:
    if len(input_features) != len(self.n_values_):
        raise ValueError("Number of input_features must equal to "
This is still clunky. How about "Length of input_features is {0} but it must equal number of features when fitted: {1}."?
Yeah, and showing len(self.n_values_) in the error message may be more informative too.
Code updated.
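A minimal sketch of the length check with the message wording suggested above. The helper name check_input_features and the n_fitted_features parameter are hypothetical (the latter stands in for len(self.n_values_)); the merged implementation may differ.

```python
def check_input_features(input_features, n_fitted_features):
    """Return validated input feature names, or default ones.

    n_fitted_features is a hypothetical stand-in for
    len(self.n_values_), the number of features seen during fit.
    """
    if input_features is None:
        # Fall back to generated names x0, x1, ...
        return ["x%d" % i for i in range(n_fitted_features)]
    if len(input_features) != n_fitted_features:
        # Report both lengths so the mismatch is easy to diagnose.
        raise ValueError(
            "Length of input_features is {0} but it must equal "
            "number of features when fitted: {1}.".format(
                len(input_features), n_fitted_features))
    return list(input_features)
```

Calling check_input_features(["a"], 2) raises ValueError, while check_input_features(None, 2) returns the default names x0 and x1.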
61e1331 to ac47fab
ac47fab to 15a4a75
Hello @jnothman,
LGTM
but it may be subject to an embargo :p
Oh yeah ...
LGTM as well.
Actually, I take back my +1.
This should probably wait for #5270 and the
Oh, I forgot about that... Withholding my +1.
I think this should wait for the refactoring of OneHotEncoder for accepting strings in #7327
This has been added in #10198 in the meantime. So closing this, but @yenchenlin thanks for working on it anyway!
This is a PR for #6425.
I've added get_feature_names to OneHotEncoder.
Can @jnothman please have a look at this?