fit_params in conjunction with FeatureUnion #7136

Closed
BenjaminBossan opened this issue Aug 4, 2016 · 21 comments

@BenjaminBossan
Contributor

Description

Using fit_params in conjunction with FeatureUnion may not work as expected. It would be helpful if fit_params names were resolved to match the estimators in transformer_list.

Steps/Code to Reproduce

from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import FunctionTransformer
from sklearn.datasets import make_classification

X, y = make_classification()

class MyTransformer(FunctionTransformer):
    def fit(self, X, y=None, **fit_params):
        print("Fit params are: ", fit_params)
        return super().fit(X, y)

pipe = Pipeline([
    ('step0', FunctionTransformer()),
    ('step1', FeatureUnion([
        ('feature0', MyTransformer()),
        ('feature1', MyTransformer()),
    ])),
    ('step2', FunctionTransformer()),
])

pipe.fit(X, y, step1__feature1__someparam=123)

Expected Results

# prints
Fit params are:  {'someparam': 123}

Actual Results

# prints
Fit params are:  {'feature1__someparam': 123}  # by feature0
Fit params are:  {'feature1__someparam': 123}  # by feature1

Comment

Maybe the actual outcome is what it is supposed to be, but my expectation would be that FeatureUnion resolves the estimator name and only passes the fit_params to the corresponding estimator. At least it would be very useful if it did.

Versions

Python 3.4.5
NumPy 1.10.4
SciPy 0.17.0
Scikit-Learn 0.17.1

@amueller
Member

amueller commented Aug 4, 2016

What exactly are you trying to do?
Usually fit_params is only sample_weight, which is usually the same across transformers.
Though one might argue that the current version is not very useful, as it will raise errors in most cases (not all transformers support the same fit_params).
It's a bit tricky to change this now from a backward-compatibility perspective, though we could check whether there is a __ in the parameter name.

It looks like the pipeline acts the way you expect, and I think they should act the same way...

@amueller amueller added the API label Aug 4, 2016
@BenjaminBossan
Contributor Author

Thanks for the reply. Pipelines work as expected, but intuitively, I would have thought that FeatureUnions would work the same way.

My goal would be to elicit specific behavior in the transformer "feature1" but not "feature0" for some custom transformers I built. However, since the proposed change could indeed break existing code, I will probably just subclass FeatureUnion.

I brought this up because I was not sure whether the current behavior was intended. If you'd like to change it, I could add my implementation as a PR once it's finished; otherwise I'll close this issue.

@amueller
Member

amueller commented Aug 4, 2016

I think we should change it, but in the most backward-compatible way. So if someone passes parameters that don't have the proper estimator names in them, they should be passed to all models. It might be a bit tricky.

Generally, you should only use fit parameters for things that have shape n_samples. Anything that changes behavior should usually be an __init__ parameter.
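
For illustration, a minimal sketch of what that typically looks like in practice, assuming scikit-learn's usual sample_weight support (my example, not from this issue): a per-sample quantity forwarded to a named Pipeline step with the step__param syntax.

# A minimal sketch of the intended use of fit_params: per-sample data such as
# sample_weight, routed to a named step via the step__param syntax that
# Pipeline already supports.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(random_state=0)
weights = np.ones(len(X))  # shape (n_samples,), i.e. one value per sample

pipe = Pipeline([('scale', StandardScaler()), ('clf', LogisticRegression())])
# Pipeline strips the 'clf__' prefix and forwards sample_weight to the final step.
pipe.fit(X, y, clf__sample_weight=weights)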

@BenjaminBossan
Contributor Author

Okay, with that functionality, it will be a little tricky. If you pass step1__feature1__someparam=123, it should only be forwarded to feature1, but if you pass step1__feature2__someparam=123 (where feature2 does not exist), it should be passed to both feature0 and feature1. Did I get that right?

Thanks for the explanation regarding n_samples. I had not actually found any documentation on the scope of fit_params.

@amueller
Member

amueller commented Aug 4, 2016

@BenjaminBossan do you want to add something to the dev docs or the roll your own estimator docs about when to use fit_params?

So let's leave out the step1 prefix, as that is handled by Pipeline and FeatureUnion never sees it.

If you pass feature1__someparam=123 I'd pass it to feature1 but if you pass someparam=123 I'd pass it to all.
The more tricky case is that if you pass feature3__someparam=123 and feature3 doesn't exist in the FeatureUnion, we should probably pass it to all (as the feature union might contain another pipeline).
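
A rough sketch of that routing rule (hypothetical helper, not existing scikit-learn code; split_fit_params and transformer_names are made-up names):

# Hypothetical sketch of the routing rule described above; `transformer_names`
# would come from FeatureUnion.transformer_list. Not actual scikit-learn code.
def split_fit_params(transformer_names, fit_params):
    routed = {name: {} for name in transformer_names}
    for key, value in fit_params.items():
        prefix, sep, rest = key.partition('__')
        if sep and prefix in transformer_names:
            # e.g. 'feature1__someparam' -> only feature1 receives 'someparam'
            routed[prefix][rest] = value
        else:
            # a plain 'someparam', or an unknown prefix like 'feature3__someparam',
            # is broadcast to every transformer (it might be a nested Pipeline)
            for params in routed.values():
                params[key] = value
    return routed

# split_fit_params(['feature0', 'feature1'], {'feature1__someparam': 123})
# -> {'feature0': {}, 'feature1': {'someparam': 123}}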

@BenjaminBossan
Contributor Author

@BenjaminBossan do you want to add something to the dev docs or the roll your own estimator docs about when to use fit_params?

I could, but a few hours ago I did not even know this about fit_params, so maybe someone more knowledgeable should do it ;)

The more tricky case is that if you pass feature3__someparam=123 and feature3 doesn't exist in the FeatureUnion, we should probably pass it to all (as the feature union might contain another pipeline).

Correct me if I'm wrong, but would the same scenario, with Pipeline, not raise a KeyError because it would split on __ and then fail to look up the name in fit_params_steps?
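
For reference, a rough paraphrase (from memory, not verbatim scikit-learn source) of how Pipeline splits fit_params by step name, which is why an unknown step name fails:

# Rough paraphrase of Pipeline's fit_params splitting at the time (not verbatim
# scikit-learn source); an unknown step name triggers a KeyError.
def split_by_step(step_names, fit_params):
    fit_params_steps = {name: {} for name in step_names}
    for pname, pval in fit_params.items():
        step, param = pname.split('__', 1)     # assumes the key contains '__'
        fit_params_steps[step][param] = pval   # KeyError for an unknown step name
    return fit_params_steps

# split_by_step(['step0', 'step1'], {'step3__someparam': 123})  # KeyError: 'step3'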

@amueller
Member

amueller commented Aug 4, 2016

Sorry, maybe I was a bit too terse (or did not think it through).
I was thinking about

from sklearn.pipeline import Pipeline, make_union
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.decomposition import PCA, NMF

thing = make_union(Pipeline([('scaler', StandardScaler()), ('factorization', PCA())]),
                   Pipeline([('scaler', MinMaxScaler()), ('factorization', NMF())]))
X = [[1, 0], [0, 1]]
thing.fit(X, factorization__y=X)

Turns out that this doesn't even work, because FeatureUnion.fit doesn't have fit_params. I think that's a related but separate bug.

@amueller
Member

amueller commented Aug 4, 2016

Let's try again. So the following code doesn't make sense, but someone could have used it like this, and I don't want to break it:

from sklearn.pipeline import Pipeline, make_union
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.decomposition import PCA

class MyPCA(PCA):
    def fit(self, X, y=None, factor=1):
        PCA.fit(self, X * factor, y)
        return self
    def fit_transform(self, X, y=None, factor=1):
        return self.fit(X, y=y, factor=factor).transform(X)

thing = make_union(Pipeline([('scaler', StandardScaler()), ('factorization', MyPCA())]),
                   Pipeline([('scaler', MinMaxScaler()), ('factorization', MyPCA())]))
X = [[1, 0], [0, 1]]
thing.fit_transform(X, factorization__factor=3)

@BenjaminBossan
Contributor Author

Turns out that this doesn't even work, because FeatureUnion.fit doesn't have fit_params.

I did not even notice that. Neither does it use transformer_weights, btw.

For your examples, I agree that they should work.

For the case of feature3__someparam=123, when feature3 does not exist, though, I could imagine that this might be a source of error when it is silently passed and later ignored. As I said, Pipeline would raise a KeyError in the same situation.

@amueller
Member

Do you see a way to fix this without breaking the code example? (Sorry, this discussion paged out of my brain in the meantime.)

@amueller amueller added the Bug label Sep 12, 2016
@BenjaminBossan
Contributor Author

A proposal: We split this issue into

  • fixing the bugs in FeatureUnion
  • dealing with the fit_params.

For the latter, a possibility could be to implement it in a way similar to Pipeline, i.e. raising a KeyError when there is a dunder prefix without a matching name. To me this sounds like the cleaner solution, and my guess is that few people's code would be affected.

Before that change, though, we would just check for the presence of a non-matching name and raise a DeprecationWarning saying that such calls will raise an error in the future. How about that?
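
A sketch of what that deprecation check could look like (hypothetical helper and wording, not actual scikit-learn code):

import warnings

# Hypothetical sketch of the proposed transition: warn about prefixes that match
# no transformer name today, and turn the warning into a KeyError later.
def check_fit_param_names(transformer_names, fit_params):
    for key in fit_params:
        prefix, sep, _ = key.partition('__')
        if sep and prefix not in transformer_names:
            warnings.warn(
                "fit_params key %r does not match any transformer in this "
                "FeatureUnion; this will raise a KeyError in a future release."
                % key,
                DeprecationWarning,
            )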

@amueller
Member

So deprecate current behavior and then do the same as pipeline? Sounds good :)

@jnothman
Member

Before we jump in the deep end, I think we need to clarify something. The API policy is that fit() and other methods should take only parameters that are data-dependent, i.e. have the same shape[0] as X. I suspect that Pipeline's fit_params design hails from an era when this policy was not established or was unclear.

Currently neither Pipeline nor FeatureUnion explicitly supports sample_weight, though they should. Perhaps they should be consistent with fit_params, but I think the right solution is still unclear and needs motivating examples.

@amueller
Member

I think fit_params is more or less just a more general way to implement sample_weight, with a slightly different API. I tried to implement sample_props once, and it was mostly renaming fit_params (and sometimes moving it from __init__ to fit).

@amueller
Member

So you don't think fit_params should have parameter delegation? It seems to have it in Pipeline and not in FeatureUnion, which seems pretty inconsistent.

@amueller amueller modified the milestone: 0.19 Sep 29, 2016
@jnothman jnothman modified the milestones: 0.20, 0.19 Jun 13, 2017
@owlas

owlas commented Jun 8, 2018

Is there currently anybody working on a solution to ensure that either sample_weights or fit_params are being passed correctly through Pipeline and FeatureUnion objects?

@jnothman
Member

jnothman commented Jun 9, 2018 via email

@owlas

owlas commented Jun 9, 2018

@jnothman that looks quite interesting. Why did you decide to move away from fit_params? This seems to be a good solution for now and just needs to be implemented consistently across the API.

@jnothman
Member

jnothman commented Jun 9, 2018 via email

@glemaitre glemaitre modified the milestones: 0.20, 0.21 Jun 13, 2018
@amueller
Member

amueller commented Mar 7, 2019

This issue is basically sample props (#4497). It is part of the general roadmap, but retagging this issue for every release does not seem helpful, so I'm untagging it.

@adrinjalali
Member

Now fixed with metadata routing.
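
For completeness, a minimal sketch of the metadata-routing solution, assuming a recent scikit-learn (1.5 or later) with metadata routing enabled; only estimators that request the parameter receive it.

# Sketch assuming scikit-learn >= 1.5 with metadata routing enabled; 'someparam'
# is delivered only to the transformer that explicitly requests it.
import sklearn
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.pipeline import FeatureUnion

sklearn.set_config(enable_metadata_routing=True)

class MyTransformer(TransformerMixin, BaseEstimator):
    def fit(self, X, y=None, someparam=None):
        print("someparam:", someparam)
        return self

    def transform(self, X):
        return X

X, y = make_classification()
union = FeatureUnion([
    ('feature0', MyTransformer().set_fit_request(someparam=False)),  # opts out
    ('feature1', MyTransformer().set_fit_request(someparam=True)),   # opts in
])
union.fit(X, y, someparam=123)  # prints None for feature0 and 123 for feature1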
