ENH Add get_feature_names_out to FunctionTransformer #21569
Conversation
For validate=False when the feature_names_out parameter is set, I propose we set feature_names_in_ and n_features_in_, but not validate them during fit or transform.
As for the API, I am thinking of restricting feature_names_out to two options at first:
- None: no feature names out
- callable: a user-provided function to compute the feature names out

Two more options for follow-up PRs:
- 'one-to-one': feature names out == feature names in
- array-like of strings: I am currently unsure about a use case for this option that the callable cannot cover, but we can discuss it in a follow-up.
Thanks @thomasjpfan. I'll remove the option to set feature_names_out to an array-like of strings.
I think the default still needs to be Let's add
…ake default 'one-to-one'
I just read your message, I had already updated the PR to remove the option to pass an array-like of strings, and I set the default to 'one-to-one'.
It is, but I do not think we can assume it. If a user passes a function that creates a column, then
We can use sklearn/utils/metaestimators.py, line 140 (at commit 48e83df).
Thanks @thomasjpfan. I updated the PR to make None the default. Right now get_feature_names_out raises a ValueError if
I ran black, flake8, make test-coverage, etc., but they didn't catch the issues with the numpydoc (a newline was missing) or with v1.1.rst (someone else had forgotten a `). I looked in the Contributing doc, but I can't find instructions on how to catch these errors before I push the code to GitHub. Did I miss something?
Hi @thomasjpfan, is there anything else you need me to do for this PR? |
For some reason the numpydoc validation is done externally and not as part of the main test suite. I am not sure why we do that. We should probably run those checks as part of the main test suite to avoid the confusion.
LGTM. I think the PR in its current state should cover most useful cases. I did not see any particular defect. Just a small improvement suggestion for one of the exception messages below:
…geron/scikit-learn into function_transformer_feature_names_out
Thanks for reviewing, Olivier. I just made the change you suggested. |
In such cases, should I pull and merge main?
That would not hurt, and if the PR is "CI green ticked", it might get a better chance to attract reviewers' attention :) |
Thanks @ogrisel , I merged main, now there's a beautiful green tick. 😊 |
Thanks for the update @ageron !
LGTM
Thanks for the review. 👍
I copied your function into my scikit-learn environment and tried to use it. However, I still get the error below, where preprocessor is my ColumnTransformer, when I run the following code:
preprocessor.get_feature_names_out()
Transformer argument looks like this:
('log', FunctionTransformer(np.log1p, validate=True), log_features)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [10], in <cell line: 3>()
1 xt = preprocessor.transform(X_test)
2 #mapie.single_estimator_[1].estimator
----> 3 preprocessor.get_feature_names_out()
File ~\miniconda3\envs\Master_ML\lib\site-packages\sklearn\compose\_column_transformer.py:481, in ColumnTransformer.get_feature_names_out(self, input_features)
479 transformer_with_feature_names_out = []
480 for name, trans, column, _ in self._iter(fitted=True):
--> 481 feature_names_out = self._get_feature_name_out_for_transformer(
482 name, trans, column, input_features
483 )
484 if feature_names_out is None:
485 continue
File ~\miniconda3\envs\Master_ML\lib\site-packages\sklearn\compose\_column_transformer.py:446, in ColumnTransformer._get_feature_name_out_for_transformer(self, name, trans, column, feature_names_in)
444 # An actual transformer
445 if not hasattr(trans, "get_feature_names_out"):
--> 446 raise AttributeError(
447 f"Transformer {name} (type {type(trans).__name__}) does "
448 "not provide get_feature_names_out."
449 )
450 if isinstance(column, Iterable) and not all(
451 isinstance(col, str) for col in column
452 ):
453 column = _safe_indexing(feature_names_in, column)
AttributeError: Transformer log (type FunctionTransformer) does not provide get_feature_names_out.
This feature is not released yet and will be released in v1.1. If you want to try out the feature now, you can install the nightly build: pip install --pre --extra-index https://pypi.anaconda.org/scipy-wheels-nightly/simple scikit-learn
Are you sure this is working as intended? I just installed the nightly build and I still get exactly this error. The code is in my environment; at least function_transformer_.py has this method implemented.
import numpy as np
import pandas as pd
from sklearn.preprocessing import FunctionTransformer
mean_transformer = FunctionTransformer(
func=np.log1p,
feature_names_out="one-to-one",
validate=True
)
X = pd.DataFrame({"my_feature": [1, 2, 3]})
X_trans = mean_transformer.fit_transform(X)
print(mean_transformer.get_feature_names_out())
# ['my_feature']
Thank you Thomas ... sorry for asking all these questions that might be totally obvious :(
Reference Issues/PRs
Follow-up on #18444.
Part of #21308.
This new feature was discussed in #21079.
What does this implement/fix? Explain your changes.
Adds the `get_feature_names_out` method and a new parameter `feature_names_out` to `preprocessing.FunctionTransformer`. By default, `get_feature_names_out` returns the input feature names, but you can set `feature_names_out` to return a different list, which is especially useful when the number of output features differs from the number of input features.

For example, here's a `FunctionTransformer` that outputs a single feature, equal to the input's mean along axis=1:
The `feature_names_out` parameter may also be a callable. This is useful if the output feature names depend on the input feature names, and/or if they depend on parameters like `kw_args`. Here's an example that uses both: a transformer that appends `n` random features to the existing features:

Any other comments?
I have some concerns regarding the fact that `validate` is `False` by default, which means that `n_features_in_` and `feature_names_in_` are not set automatically. So if you create a `FunctionTransformer` with the default `validate=False` and `feature_names_out=None`, then when you call `get_feature_names_out` without any argument, it will raise an exception (unless `transform` was called before and `func` set `n_features_in_` or `feature_names_in_`). I tried to make this clear in the error message, but I'm worried that this will confuse users. Wdyt?
validate=Falseand you setfeature_names_outto a callable, and callget_feature_names_outwith no arguments, then the callable will getinput_features=Noneas input (unlesstransformwas called before andfuncsetn_features_in_orfeature_names_in_). Users may be surprised by this. Should we output a warning in this case? Wdyt?Moreover, as shown in the second code example above, the output feature names may depend on
kw_args, so iffeature_names_outis a callable,get_feature_names_outpassesselfto it, plus theinput_features. I considered checkingfeature_names_out.__code__.co_varnamesto decide whether to pass no arguments, or just theinput_features, or theinput_featuresandself. But__code__is not used anywhere in the code base, andinspectis not used much, so I'm not sure whether such introspection would be frowned upon? I decided that it was simple enough to require users to always have two arguments: the transformer itself, and theinput_features. Wdyt?Lastly, when users want to create a
FunctionTransformerthat outputs a single feature, I expect that many will be tempted to setfeature_names_outto a string instead of a list. To keep things consistent, I decided to raise an exception in this case, and have a clear error message to tell them to use["foo"]instead. Wdyt?