jschmidtml
Reference Issues/PRs

Fixes #19465.

What does this implement/fix? Explain your changes.

This fix implements fit_params for ColumnTransformer in the same way as Pipeline, for parity between the two.

Any other comments?

fit_params = {'standard_scaler__sample_weight': df['equal_sample_weight']}
ct_wEqualWeight = ct.fit_transform(X=df[['x1']], y=df['y'], **fit_params)

assert_array_equal(sc1_xWeight, ct_xWeight, err_msg= "These should be equal")
Member

pep8: there shouldn't be a space after "=" here. But you can also just remove the err_msg I think.

return "(%d of %d) Processing %s" % (idx, total, name)

def _fit_transform(self, X, y, func, fitted=False, column_as_strings=False):
def _check_fit_params(self, **fit_params):
Member

Is this copied from pipeline or somewhere else?

Author

Yes, it came from Pipeline. I tried to make it work the same way.
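
For readers unfamiliar with the `step__param` convention mentioned here, the following is a minimal standalone sketch of how such keys can be grouped by step name. It is not the code from this PR or from scikit-learn; the helper name `split_fit_params` and the step names are hypothetical and chosen only for illustration.

```python
def split_fit_params(step_names, **fit_params):
    """Group 'step__param' keyword arguments by step name.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    per_step = {name: {} for name in step_names}
    for key, value in fit_params.items():
        if "__" not in key:
            raise ValueError(
                f"Expected a key of the form 'step__param', got {key!r}."
            )
        # Split only on the first '__' so nested names stay intact.
        step, param = key.split("__", 1)
        if step not in per_step:
            raise ValueError(f"Unknown step {step!r} in fit parameter {key!r}.")
        per_step[step][param] = value
    return per_step


routed = split_fit_params(
    ["standard_scaler", "one_hot"],
    standard_scaler__sample_weight=[1.0, 1.0, 2.0],
)
```

Each transformer would then receive only its own dictionary as keyword arguments to `fit`, which is the behaviour the `fit_params` example in the PR description relies on.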

@amueller
Member

amueller commented Oct 12, 2021

Thanks for the PR, looks good overall, I think :)

This makes sense to unblock this use-case but really we need to fix how to do this properly, cc @adrinjalali ;)

@adrinjalali
Member

Thanks for the PR. But I would rather prioritize sample props and #21284 for the next release, in which case we wouldn't need this solution.

@amueller
Member

@adrinjalali that makes sense. What would the code look like for a simple pipeline with a column transformer to pass sample_weight everywhere?

@adrinjalali
Member

So taking an example from here:

from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import OneHotEncoder
column_trans = ColumnTransformer(
    [('categories', OneHotEncoder(dtype='int').fit_requests(sample_weight=True), ['city']),
     ('title_bow', CountVectorizer().fit_requests(sample_weight=True), 'title')],
    remainder='drop', verbose_feature_names_out=False)

And if you put this column transformer in a pipeline, you don't need to do anything extra, since the consumer has already requested the metadata. Then if you call pipeline.fit(X, y, sample_weight=my_weights), it will forward them all to where they're requested.

@jschmidtml
Author

Sorry, just getting back to this. That will work, thank you!

@adrinjalali
Member

ColumnTransformer now supports metadata routing.

@adrinjalali adrinjalali closed this Mar 7, 2024
Successfully merging this pull request may close these issues.

Passing fitting parameters to transformers of ColumnTransformer