Fix column_transformer to use fitparams like Pipeline #21311
fit_params = {'standard_scaler__sample_weight': df['equal_sample_weight']}
ct_wEqualWeight = ct.fit_transform(X=df[['x1']], y=df['y'], **fit_params)

assert_array_equal(sc1_xWeight, ct_xWeight, err_msg= "These should be equal")
pep8: there shouldn't be a space after "=" here. But you can also just remove the err_msg I think.
return "(%d of %d) Processing %s" % (idx, total, name)

def _fit_transform(self, X, y, func, fitted=False, column_as_strings=False):
def _check_fit_params(self, **fit_params):
Is this copied from pipeline or somewhere else?
Yes, it came from Pipeline. I tried to make it work in the same way.
Thanks for the PR, looks good overall, I think :) This makes sense to unblock this use-case, but really we need to fix how to do this properly, cc @adrinjalali ;)
Thanks for the PR. But I would rather prioritize sample props and #21284 for the next release, in which case we wouldn't need this solution.
@adrinjalali that makes sense. What would the code look like for a simple pipeline with a column transformer to pass sample_weight everywhere?
So taking an example from here:

from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import OneHotEncoder
column_trans = ColumnTransformer(
    [('categories', OneHotEncoder(dtype='int').fit_requests(sample_weight=True), ['city']),
     ('title_bow', CountVectorizer().fit_requests(sample_weight=True), 'title')],
    remainder='drop', verbose_feature_names_out=False)

And if you put this column transformer in a pipeline, you don't need to do anything extra, since the consumer has already requested the metadata. Then if you call
Sorry, just getting back to this. That will work, thank you!
ColumnTransformer now supports metadata routing.
Reference Issues/PRs
Fixes #19465.
What does this implement/fix? Explain your changes.
This fix lets ColumnTransformer accept fit params in the same way as Pipeline, so the two are at parity.
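For reference, a minimal sketch of the existing Pipeline behaviour this PR mirrors: a fit param named `<step>__<param>` is forwarded to that step's `fit`. The toy data and the `standard_scaler` step name are illustrative only, and the sketch assumes metadata routing is left at its default (disabled).

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (hypothetical): one feature column and per-sample weights.
X = np.array([[0.0], [10.0]])
weights = np.array([1.0, 3.0])

pipe = Pipeline([("standard_scaler", StandardScaler())])

# The "standard_scaler__" prefix tells Pipeline which step receives the
# parameter; the step then sees it as plain `sample_weight`.
fit_params = {"standard_scaler__sample_weight": weights}
pipe.fit(X, **fit_params)

pipeline_weighted_mean = pipe.named_steps["standard_scaler"].mean_[0]
```

The fitted `mean_` reflects the weights, which is the behaviour the test in this PR checks for the ColumnTransformer counterpart.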
Any other comments?