What happened to the idea of adding a 'handle_missing' parameter to the OneHotEncoder? #26543
Comments
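I am not really keen on dropping missing values through an option. I would prefer code that makes this more explicit than a single option. We can currently do this:

```python
import pandas as pd
import sklearn
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder

# Output pandas DataFrames so the encoded column names are available downstream.
sklearn.set_config(transform_output="pandas")

def _drop_None_cols(df):
    # Drop every encoded column whose name contains "None", i.e. the indicator
    # column OneHotEncoder creates for missing (None) values.
    col_names = [col for col in df.columns if "None" in col]
    if len(col_names):
        return df.drop(columns=col_names)
    return df


encoder_dropping_None = make_pipeline(
    OneHotEncoder(sparse_output=False),
    FunctionTransformer(_drop_None_cols),
)

# Hypothetical example data; the test_df used in the original thread is not shown.
test_df = pd.DataFrame({"color": ["red", "blue", None]})
encoder_dropping_None.fit_transform(test_df)
```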
I think it would make sense to have:
The …
@glemaitre: Thanks for your answer. What you suggest is what I'm already doing. To avoid collinearity, I could use the parameter …
@ogrisel: Sorry, I'm not sure I understand what you are suggesting.
I am willing to work on this, and based on @ogrisel's comment, I'm thinking of adding two parameters:
I chose the name `missing_values` for consistency with `sklearn.impute`, and the description follows their style. I'll wait for feedback and will start working on it if everyone agrees.
Discussed in #26531
Originally posted by woodly0 June 7, 2023
Hello,
I'm having trouble understanding what finally happened to the idea of introducing a `handle_missing` parameter for the `OneHotEncoder`. My current project could still benefit from such an implementation. There are many existing issues on this topic; however, I cannot work out what was finally decided or implemented and what wasn't.
Considering the following features:
when using the encoder:
I get the output:
but what I'm actually looking for is to remove the `None` column, i.e., not create a new feature for it but set all the other columns to zero:
Is there a way to achieve this without using another transformer object?