[MRG+1] ENH: Adds FunctionTransformer #4798
Custom Transformers
===================

Often, you will want to convert an existing python function into transformer to
"into a transformer"
CallableTransformer allows a user to convert a standard python callable into a transformer for use in a Pipeline.
lgtm
@GaelVaroquaux Could you take a look please?
    be passed after X and y.
kwargs : dict, optional
    A dictionary of keyword arguments to be passed to func.
Could you add a simple / short example here?
The same as in the user guide is fine.
+1 for including a simple example as a doctest in the docstring of the class.
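A minimal sketch of the kind of docstring example requested here, assuming the merged class is importable as `sklearn.preprocessing.FunctionTransformer`. The choice of `np.log1p` as the wrapped function is illustrative and not taken from this thread:

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Wrap a plain NumPy function so it can be used as a pipeline step.
transformer = FunctionTransformer(np.log1p)

X = np.array([[0.0, 1.0], [2.0, 3.0]])

# The wrapped function is stateless, so fit_transform amounts to func(X).
X_log = transformer.fit_transform(X)
print(X_log)
```

Because nothing is learned in `fit`, the same object can be dropped into a `Pipeline` wherever a stateless preprocessing step is needed.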
Suggestion for example: `func=partial(getattr, 'data')`, and feed the transformer a dict {'data': X, other stuff...}
Sorry, I can't think of better names.
I often end up doing this when I do e.g. text classification on conversational data and I have messages in both directions. I store my samples as {'from': from, 'to': to} and use a `FeatureUnion` of two pipelines, each grabbing the respective field and then doing a CountVectorizer.
except that `partial(getattr, 'data')` is more-or-less operator.attrgetter
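To make the comparison concrete, here is a small standard-library sketch. Note that `partial(getattr, 'data')` binds `'data'` as the object rather than as the attribute name, so `operator.attrgetter` is the working form for attribute access, and `operator.itemgetter` is the analogue for dict-shaped samples such as the `{'from': ..., 'to': ...}` records mentioned above. The sample data and variable names are made up for illustration:

```python
from functools import partial
from operator import attrgetter, itemgetter
from types import SimpleNamespace

# Hypothetical conversational samples stored as dicts.
samples = [
    {"from": "hi, is this still available?", "to": "yes it is"},
    {"from": "great, thanks", "to": "no problem"},
]

# itemgetter builds a callable that looks up one key per sample.
get_from = itemgetter("from")
get_to = itemgetter("to")

# These are the per-field selectors each pipeline branch would apply
# before vectorizing the text.
from_texts = [get_from(s) for s in samples]
to_texts = [get_to(s) for s in samples]

# For objects with attributes, attrgetter plays the same role.
record = SimpleNamespace(data=[1, 2, 3])
data = attrgetter("data")(record)

# partial(getattr, "data") calls getattr("data", record), which fails
# because the second argument must be an attribute name (a string).
try:
    partial(getattr, "data")(record)
    partial_worked = True
except TypeError:
    partial_worked = False
```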
Sorry, @llllllllll, I am too tired to review this tonight. The jetlag is still killing me.
@GaelVaroquaux No worries
func : callable, optional default=None
    The callable to use for the transformation. This will be passed
    the same arguments as transform, with args and kwargs forwarded.
    If func is None, then func will be the identity function.
Please insert one blank line before the documentation of the next parameter.
I'm concerned about what happens when a We fundamentally want this to work with
This is a good point, do you think that this should accept an optional function to act on y?
I think it needs to be the same function, as it might need to use the value of The very simplest solution is to just not allow this, and just ignore the The next simplest solution is probably to add an attribute ( I don't know what other solutions are, introspecting the callable, catching the exception? In which case maybe the call should be
If we agree that a supervised pipeline with
Maybe we can have it accept a tuple for the
After thinking about this a bit more, I think that a good thing might just be to make the call:

    try:
        return func(X, y)
    except TypeError:
        return func(X)

I would say that using a Also, to address the
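The fallback proposed above can be sketched without any scikit-learn dependency. `call_with_optional_y` is a hypothetical helper name used only for this illustration. One caveat of the technique: a `TypeError` raised inside `func` for an unrelated reason also triggers the fallback, so the original error can be replaced by a misleading one.

```python
def call_with_optional_y(func, X, y=None):
    """Try func(X, y) first; fall back to func(X) on TypeError."""
    try:
        return func(X, y)
    except TypeError:
        # Reached when func does not accept a second positional argument,
        # but also when func itself raises TypeError internally.
        return func(X)


def double(X):
    return [2 * v for v in X]


def center_by_label(X, y):
    # Uses y, so it needs the two-argument call.
    return [v - label for v, label in zip(X, y)]


unsupervised = call_with_optional_y(double, [1, 2, 3], y=[0, 1, 0])
supervised = call_with_optional_y(center_by_label, [1, 2, 3], y=[1, 1, 1])
```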
I understand your argument. But this also adds more of a maintenance burden. Also, it might not be that simple to do Simply not returning What's the most common use case here? I think it's when So my preferences would be, in order (and with a pretty big gap between 2 and 3):
As for kwargs vs partial, I still vote for partial. Grid search would still look alright with:
And we wouldn't have to reimplement stdlib functionality.
+1 on what @jnothman said. inverse_transform would be great, and an argument to pass y would be, too. I don't think the "searching over func" breaks that. I am not entirely certain about preprocessing vs pipeline module. This only really makes sense when using a pipeline, which I think is a good argument.
@llllllllll Sorry if this is taking more of your time than you anticipated.
Jumping into the conversation, as I have written such a class several times for personal use. However, I often find myself needing to apply a function element-wise rather than on the full X. In this setting, I don't know if proposing a shortcut for vectorizing a user-defined function would be something to consider, e.g. as a flag? (using numpy.vectorize internally, this is easy) I fear this kind of use case might pop up sooner rather than later once such a transformer is shipped in scikit-learn. Just my 2 cents -- I don't want to make this longer than it should be :)
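A rough sketch of the element-wise shortcut described here, using `numpy.vectorize` as suggested. The `elementwise` helper and the `clip_negative` function are hypothetical names for illustration. Keep in mind that `numpy.vectorize` is a convenience wrapper around a Python loop, not a performance optimization.

```python
import numpy as np


def elementwise(func, otypes=(float,)):
    """Return a callable that applies a scalar function to every entry of X."""
    vectorized = np.vectorize(func, otypes=list(otypes))
    return lambda X: vectorized(np.asarray(X))


def clip_negative(value):
    # A scalar function written without NumPy broadcasting in mind.
    return value if value > 0 else 0.0


transform = elementwise(clip_negative)
X = np.array([[-1.0, 2.0], [3.0, -4.0]])
X_clipped = transform(X)
```

The resulting `transform` callable takes the full `X`, so it could be handed to a function-wrapping transformer in place of a hand-written loop.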
Can you give examples of element-wise functions that aren't compositions of
@amueller I prefer
Just so everyone knows, I have not forgotten about this PR; however, I have been busy with work. I will address the comments made sometime this weekend so that they can go under another round of review. Thank you all for the feedback.
Thanks @llllllllll, your contribution is much appreciated :)
Makes `pass_y` an argument to FunctionTransformer to indicate that the labels should be passed to the wrapped function.
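A dependency-free sketch of what an explicit `pass_y` flag means for the call. `MiniFunctionTransformer` is a hypothetical stand-in written for illustration, not the class added in this PR:

```python
class MiniFunctionTransformer:
    """Toy transformer: forwards y to func only when pass_y is True."""

    def __init__(self, func=None, pass_y=False):
        self.func = func
        self.pass_y = pass_y

    def fit(self, X, y=None):
        # Stateless: nothing is learned from the data.
        return self

    def transform(self, X, y=None):
        # With func=None the transformer acts as the identity.
        func = self.func if self.func is not None else (lambda X, *args: X)
        if self.pass_y:
            return func(X, y)
        return func(X)


def scale(X):
    return [10 * v for v in X]


def mask_by_label(X, y):
    return [v if label else 0 for v, label in zip(X, y)]


plain = MiniFunctionTransformer(scale).fit([1, 2]).transform([1, 2], y=[0, 1])
labelled = MiniFunctionTransformer(mask_by_label, pass_y=True).transform([1, 2], y=[0, 1])
identity = MiniFunctionTransformer().transform([1, 2])
```

Compared with catching `TypeError`, an explicit flag keeps the call signature predictable at the cost of one more constructor argument.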

validate : bool, optional default=True
    Indicate that the input X array should be checked before calling
    func. If validate is false, there will be no input validation.
Note that this will ensure the input is a non-empty, 2-dimensional array (or sparse matrix) of finite numbers.
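To illustrate the check described in this comment, the sketch below calls `sklearn.utils.check_array` directly. That `validate=True` routes through this helper is an assumption here, since the thread does not show the implementation. `is_accepted` is a hypothetical helper name.

```python
import numpy as np
from sklearn.utils import check_array


def is_accepted(X):
    """Return True if check_array accepts X, False if it raises ValueError."""
    try:
        check_array(X, accept_sparse=True)
        return True
    except ValueError:
        return False


ok_2d = is_accepted(np.array([[1.0, 2.0], [3.0, 4.0]]))
one_dim = is_accepted(np.array([1.0, 2.0, 3.0]))   # not 2-dimensional
has_nan = is_accepted(np.array([[1.0, np.nan]]))   # not finite
empty = is_accepted(np.empty((0, 3)))              # no samples
```

Inputs that are not numeric 2-dimensional arrays, such as lists of strings or dicts, would need validation switched off.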
DOC expand FunctionTransform docstring
I think we can keep inverses out of this PR. I'm ok with this living in Merge conflict, please rebase or merge in master.
Merged via #5059. Thanks everybody, in particular @llllllllll for his contribution :)
As Olivier would say, 🍻
HurraH!
CallableTransformer allows a user to convert a standard python callable
into a transformer for use in a Pipeline.
Addresses: #3560