[MRG] ENH Add get_feature_names for OneHotEncoder #6441
sklearn/preprocessing/data.py
```python
feature_names = []
for (i, n_value) in enumerate(self.n_values_):
    for j in xrange(n_value):
        feature_names.append(input_features[i])
```
I think you want something like `"{}={}".format(name, value)`.
Sorry, can you elaborate? In `"{}={}".format(name, value)`, what are `name` and `value`?
I mean `.format(input_features[i], j)`, i.e. the input feature name and the category value.
Oh, you mean adding `j` to the feature names to make them clearer? Then the output would become something like
`['x0 0', 'x0 1', 'x1 0', 'x1 1', 'x1 2', 'x2 0', 'x2 1', 'x2 2', 'x2 3']`. Am I right?
I'd rather have the `=` in there...
Yeah, I agree that the following output is better:
`['x0=0', 'x0=1', 'x1=0', 'x1=1', 'x1=2', 'x2=0', 'x2=1', 'x2=2', 'x2=3']`
I've updated the code.
Please have a look.
Thanks!
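A minimal standalone sketch of the naming scheme agreed on above. It assumes `n_values` is a plain list giving the number of integer categories (0 .. n-1) per input feature, mirroring `self.n_values_` in the diff; `make_feature_names` is a hypothetical helper for illustration, not the code that was pushed.

```python
def make_feature_names(n_values, input_features=None):
    """Build one "name=value" label per one-hot output column.

    Hypothetical helper: ``n_values[i]`` is assumed to be the number of
    integer categories (0 .. n_values[i] - 1) of input feature ``i``.
    """
    if input_features is None:
        # Default names follow the 'x%d' pattern used in the diff.
        input_features = ['x%d' % i for i in range(len(n_values))]
    feature_names = []
    for i, n_value in enumerate(n_values):
        for j in range(n_value):
            # Join the input feature name and the category value with '='.
            feature_names.append("{}={}".format(input_features[i], j))
    return feature_names


print(make_feature_names([2, 3, 4]))
# ['x0=0', 'x0=1', 'x1=0', 'x1=1', 'x1=2', 'x2=0', 'x2=1', 'x2=2', 'x2=3']
```

With three input features holding 2, 3 and 4 categories, this reproduces the output listed above.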
Force-pushed from f4b4d5b to acb09d2.
sklearn/preprocessing/data.py
```python
else:
    if len(input_features) != len(self.n_values_):
        raise ValueError("Number of input_features must equal to "
                         "n_feature. it has to be of shape "
```
What is `n_feature`?
Oh, it should be `n_features`,
as in this line:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/preprocessing/data.py#L1710
I've updated the code.
Thanks!
Force-pushed from acb09d2 to 7aa1754.
sklearn/preprocessing/data.py
```python
    input_features = ['x%d' % i for i in range(len(self.n_values_))]
else:
    if len(input_features) != len(self.n_values_):
        raise ValueError("Number of input_features must equal to "
```
This is still clunky. How about "Length of input_features is {0} but it must equal number of features when fitted: {1}."?
Yes, and showing `len(self.n_values_)` in the error message should be more informative too.
Code updated.
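A minimal sketch of the length check with the suggested message. `check_input_features` is a hypothetical helper, and its `n_features` argument stands in for `len(self.n_values_)` on the fitted encoder.

```python
def check_input_features(input_features, n_features):
    """Raise ValueError unless exactly one name is given per fitted feature.

    Hypothetical helper; ``n_features`` plays the role of
    ``len(self.n_values_)`` in the diff.
    """
    if len(input_features) != n_features:
        # Report both the received length and the expected one.
        raise ValueError(
            "Length of input_features is {0} but it must equal number of "
            "features when fitted: {1}.".format(len(input_features), n_features))
    return list(input_features)


try:
    check_input_features(['a', 'b'], 3)
except ValueError as exc:
    print(exc)
# Length of input_features is 2 but it must equal number of features when fitted: 3.
```

Putting both numbers in the message lets the caller see the mismatch without inspecting the fitted encoder.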
Force-pushed from 61e1331 to ac47fab.
Force-pushed from ac47fab to 15a4a75.
Hello @jnothman ,

LGTM

but it may be subject to an embargo :p

Oh yeah ...

LGTM as well.

Actually, I take back my +1. This should probably wait for #5270 and the

Oh, I forgot about that... Withholding my +1.

I think this should wait for the refactoring of OneHotEncoder for accepting strings in #7327.
This has been added in #10198 in the meantime. So closing this, but @yenchenlin thanks for working on it anyway! |
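Once categories can be strings, as in the refactoring mentioned above, the same `name=value` labels could be derived from each feature's list of category values instead of a count. This is a sketch under that assumption: `categories` is a hypothetical list where `categories[i]` holds the category values of input feature `i` in output-column order. It is an illustration, not the code merged in #10198.

```python
def feature_names_from_categories(categories, input_features=None):
    """One "name=value" label per (input feature, category) pair.

    Hypothetical helper: ``categories[i]`` is assumed to list the category
    values (strings or numbers) of input feature ``i`` in column order.
    """
    if input_features is None:
        input_features = ['x%d' % i for i in range(len(categories))]
    # Note: zip() stops at the shorter list, so a length check such as the
    # one discussed above would be needed before calling this.
    return ["{}={}".format(name, value)
            for name, cats in zip(input_features, categories)
            for value in cats]


print(feature_names_from_categories([['female', 'male'], [1, 2, 3]],
                                    ['gender', 'group']))
# ['gender=female', 'gender=male', 'group=1', 'group=2', 'group=3']
```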
This is a PR for #6425.
I've added `get_feature_names` to `OneHotEncoder`. Can @jnothman please have a look at this?