fit_params in conjunction with FeatureUnion #7136
Comments
What exactly are you trying to do? It looks like the pipeline acts the way you expect, and I think they should act the same way...
Thanks for the reply. Pipelines work as expected, but intuitively, I would have thought that FeatureUnions work the same way. My goal would be to elicit specific behavior in the transformer "feature1" but not "feature0" for some custom transformers I built. However, since the proposed change could indeed break existing code, I will probably just subclass FeatureUnion. I brought this up because I was not sure whether the current behavior was intended. If you'd like to change it, I could add my implementation as a PR once it's finished; otherwise I'll close this issue.
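For reference, a minimal sketch of what such a subclass might look like (a hypothetical RoutingFeatureUnion, not the implementation actually written; it ignores transformer_weights, sparse stacking and n_jobs for brevity):

```python
import numpy as np
from sklearn.pipeline import FeatureUnion

class RoutingFeatureUnion(FeatureUnion):
    # Hypothetical subclass: route fit_params by transformer name,
    # mirroring Pipeline's '<name>__<param>' convention.
    def _route(self, fit_params):
        routed = {name: {} for name, _ in self.transformer_list}
        for key, value in fit_params.items():
            name, sep, param = key.partition('__')
            if not sep or name not in routed:
                raise ValueError("Unroutable fit_params key: %r" % key)
            routed[name][param] = value
        return routed

    def fit_transform(self, X, y=None, **fit_params):
        routed = self._route(fit_params)
        Xs = [trans.fit_transform(X, y, **routed[name])
              for name, trans in self.transformer_list]
        return np.hstack(Xs)
```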
I think we should change it, but in the most backward-compatible way. So if someone passes parameters that don't have the proper estimator names in them, they should be passed to all models. It might be a bit tricky. Generally, you should only use …
Okay, with that functionality, it will be a little tricky. If you pass … Thanks for the explanation with the …
@BenjaminBossan do you want to add something to the dev docs or the roll-your-own-estimator docs about when to use …? So let's leave out the … If you pass …
I could, but some hours ago I did not even know this about …
Correct me if I'm wrong, but would the same scenario, with Pipeline, not raise a KeyError because it would split on '__'?
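For context, the prefix handling in Pipeline at the time looked roughly like this (a paraphrase of the pre-metadata-routing code, not a verbatim copy):

```python
# Paraphrase of Pipeline's fit_params handling: keys are split on the
# first '__'; a prefix that is not a step name raises KeyError, and a key
# with no '__' at all fails to unpack (ValueError) before that.
fit_params_steps = {name: {} for name, _ in self.steps}
for pname, pval in fit_params.items():
    step, param = pname.split('__', 1)
    fit_params_steps[step][param] = pval
```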
Sorry, maybe I was a bit too terse (or did not think it through).

```python
from sklearn.pipeline import Pipeline, make_union
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.decomposition import PCA, NMF

thing = make_union(Pipeline([('scaler', StandardScaler()), ('factorization', PCA())]),
                   Pipeline([('scaler', MinMaxScaler()), ('factorization', NMF())]))
X = [[1, 0], [0, 1]]
thing.fit(X, factorization__y=X)
```

Turns out that this doesn't even work, because …
Let's try again. So the following code doesn't make sense, but someone could have used it like this, and I don't want to break it:

```python
from sklearn.pipeline import Pipeline, make_union
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.decomposition import PCA

class MyPCA(PCA):
    def fit(self, X, y=None, factor=1):
        PCA.fit(self, X * factor, y)
        return self

    def fit_transform(self, X, y=None, factor=1):
        return self.fit(X, y=None, factor=factor).transform(X)

thing = make_union(Pipeline([('scaler', StandardScaler()), ('factorization', MyPCA())]),
                   Pipeline([('scaler', MinMaxScaler()), ('factorization', MyPCA())]))
X = [[1, 0], [0, 1]]
thing.fit_transform(X, factorization__factor=3)
```
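This works under the legacy behavior because FeatureUnion simply broadcasts every keyword to every transformer, and each inner Pipeline then strips the factorization__ prefix for its own step. Schematically (a paraphrase, not the actual implementation):

```python
# Legacy FeatureUnion behavior, schematically: every fit_param is passed
# unchanged to every transformer.
for name, trans in self.transformer_list:
    trans.fit_transform(X, y, **fit_params)  # here: factorization__factor=3
# Each inner Pipeline then routes 'factorization__factor' to its
# 'factorization' step, so MyPCA in both branches receives factor=3.
```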
I did not even notice that. Neither does it use … For your examples, I agree that they should work. For the case of …
Do you see a way to fix this without breaking the code example? (Sorry, this discussion paged out of my brain in the meantime.)
A proposal: we split this issue into …

For the latter, a possibility could be to implement it in a way similar to … Before that latter change, though, we just check for the presence of a non-matching name and raise a …
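A sketch of what such a transitional check could look like (hypothetical code, not the merged implementation): keep broadcasting for now, but warn whenever a key does not carry a known transformer-name prefix.

```python
import warnings

def _check_fit_param_names(transformer_list, fit_params):
    # Hypothetical transitional check: warn about keys that cannot be
    # routed by transformer name, before switching to Pipeline-style routing.
    names = {name for name, _ in transformer_list}
    for key in fit_params:
        prefix, sep, _ = key.partition('__')
        if not sep or prefix not in names:
            warnings.warn(
                "fit_params key %r does not match any transformer name; "
                "broadcasting it to all transformers is deprecated." % key,
                FutureWarning)
```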
So deprecate the current behavior and then do the same as Pipeline? Sounds good :)
Before we jump in the deep end, I think we need to clarify something. The API policy is that … Currently neither …
I think …
So you don't think …?
Is there currently anybody working on a solution to ensure that either …?
#9566 is my latest attempt at making this handling more controllable. It could do with some love. It might be an overly complicated solution, and potentially has no simple solution for the case of passing kwargs to methods other than fit.
@jnothman that looks quite interesting. Why did you decide to move away from …?
It's already implemented in a way that can't be made consistent in a backwards-compatible way, if you consider how it is implemented in Pipeline (prefix-based routing) vs. FeatureUnion (pass all fit_params to all steps). And when we need to support the option of passing some params to the estimator, the CV splitter and the scorer when fitting a GridSearchCV, let alone a nested grid search, then I'm not sure there's a much simpler way forward (especially if we require these params to have valid Python identifiers for names).

Although, yes, FeatureUnion should support the same in fit as it does in fit_transform.
This issue is basically sample props (#4497). This is part of the general roadmap, but retagging this issue for every release does not seem helpful. Therefore untagging.
Now fixed with metadata routing.
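For anyone landing here now, a sketch of how per-transformer routing looks with the metadata routing API (assuming a recent scikit-learn, around 1.5 or later, where FeatureUnion supports routing; MyPCA and the factor parameter are this thread's example):

```python
import numpy as np
import sklearn
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion

sklearn.set_config(enable_metadata_routing=True)

class MyPCA(PCA):
    # The thread's example transformer, with a custom fit parameter.
    def fit(self, X, y=None, factor=1):
        return super().fit(np.asarray(X) * factor, y)

union = FeatureUnion([
    # Each transformer declares whether it wants the 'factor' metadata.
    ('a', MyPCA(n_components=1).set_fit_request(factor=True)),
    ('b', MyPCA(n_components=1).set_fit_request(factor=False)),
])
X = [[1.0, 0.0], [0.0, 1.0]]
union.fit(X, factor=3)  # 'factor' is routed only to transformer 'a'
```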
Description
Using fit_params in conjunction with FeatureUnion may not work as expected. It would be helpful if fit_params names were resolved to match the estimators in transformer_list.

Steps/Code to Reproduce
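A minimal sketch of the kind of usage in question (the flag parameter and the transformer implementation are assumptions, not the issue's original code; the names feature0/feature1 come from the discussion above):

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion

class CustomTransformer(BaseEstimator, TransformerMixin):
    # Hypothetical transformer with a custom fit parameter.
    def fit(self, X, y=None, flag=False):
        self.flag_ = flag
        return self

    def transform(self, X):
        return X

union = FeatureUnion([('feature0', CustomTransformer()),
                      ('feature1', CustomTransformer())])
X = [[1, 0], [0, 1]]
# Expectation: flag=True reaches only 'feature1'.
# Actual (at the time): 'feature1__flag' is passed verbatim to both
# transformers' fit, raising TypeError (unexpected keyword argument).
union.fit_transform(X, feature1__flag=True)
```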
Expected Results
Actual Results
Comment
Maybe the actual outcome is what it is supposed to be, but my expectation would be that FeatureUnion resolves the estimator name and only passes the fit_params to the corresponding estimator. At least it would be very useful if it did.

Versions
Python 3.4.5
NumPy 1.10.4
SciPy 0.17.0
Scikit-Learn 0.17.1