Ability to cache FeatureUnion transformers #9008

Open · jnothman opened this issue Jun 6, 2017 · 10 comments

jnothman (Member) commented Jun 6, 2017

It seems reasonable to support a memory parameter for FeatureUnion, like the one recently added to Pipeline (#7990). It is valuable because parameters of some constituent transformers can be searched over while others are left unchanged; the unchanged ones should not need to be re-fit from scratch.
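
For context, a minimal sketch of the Pipeline behaviour this proposal would mirror (the dataset and grid below are illustrative, not from the issue):

```python
# Sketch of the existing Pipeline(memory=...) caching from #7990 that
# this proposal would mirror for FeatureUnion. Assumes scikit-learn >= 0.19.
from tempfile import mkdtemp

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

pipe = Pipeline(
    [("pca", PCA(n_components=2)), ("clf", LogisticRegression())],
    memory=mkdtemp(),  # fitted transformers are cached on disk
)

# Only clf__C varies, so PCA is fit once per CV split and the cached
# fit is reused for the remaining parameter candidates.
GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=3).fit(X, y)
```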

jnothman changed the title from "cache FeatureUnion transformers" to "Ability to cache FeatureUnion transformers" on Jun 6, 2017
lsorber commented Jun 6, 2017

Couldn't this effect be obtained by wrapping the FeatureUnion transformers in cached Pipelines? That is, assuming the full Pipeline would be cached as suggested in #9007.
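
A minimal sketch of that workaround with the current API (the transformers are illustrative). One caveat, if I read the code correctly: Pipeline caches only its non-final steps, so an identity FunctionTransformer is appended to make the wrapped transformer cacheable:

```python
from tempfile import mkdtemp

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer

cachedir = mkdtemp()

def cached(name, transformer):
    # FunctionTransformer() with the default func is the identity; it sits
    # last so the real transformer becomes a cached, non-final step.
    return Pipeline(
        [(name, transformer), ("identity", FunctionTransformer())],
        memory=cachedir,
    )

union = FeatureUnion([
    ("pca", cached("pca", PCA(n_components=2))),
    ("kbest", cached("kbest", SelectKBest(k=1))),
])

X, y = load_iris(return_X_y=True)
union.fit(X, y)
union.fit(X, y)  # identical params: the inner fits are read from the cache
```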

jnothman (Member, Author) commented Jun 6, 2017 via email

GaelVaroquaux (Member) commented Jun 6, 2017 via email

GaelVaroquaux (Member) commented Jun 6, 2017 via email

caioaao (Contributor) commented Jun 7, 2017

Can this be assigned to me? I'm really interested in this, as it should be very useful for the cases described in #8960.

jnothman (Member, Author) commented Jun 7, 2017

We can't use GitHub assignment: it only allows assigning to team members. But as far as I'm concerned, you're welcome to contribute a patch.

psinger commented Jul 18, 2018

Has there been any update on this? It seems to me that FeatureUnion is not cached at all within a Pipeline.

jnothman (Member, Author) commented Jul 19, 2018 via email

psinger commented Jul 20, 2018

@jnothman At first, for testing purposes, I was running only a single FeatureUnion within a pipeline, and it did not get cached. Apparently the pipeline needs more than one step, even if the FeatureUnion itself consists of multiple steps.

Anyway, it was more of a gut feeling after following the discussion in this thread. I have some FeatureUnion operations with BOW vectorizers inside and can't see any speed improvement across consecutive executions after enabling the cache. If I am correct, the main reason is that transforms are not cached, only fits. And I am not 100% sure it works properly for FeatureUnion.

By and large, I don't have clear tests for this, so I will get back to this thread once I have more insight into the topic.
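
One concrete way to check is to pass a verbose joblib.Memory, so that any recomputation of a cached fit is logged; a quiet second fit then indicates a cache hit (sketch, assuming a recent joblib):

```python
from tempfile import mkdtemp

from joblib import Memory
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# verbose=1 logs whenever the cached transformer fit is recomputed.
memory = Memory(location=mkdtemp(), verbose=1)

pipe = Pipeline(
    [("pca", PCA(n_components=2)), ("clf", LogisticRegression())],
    memory=memory,
)

X, y = load_iris(return_X_y=True)
pipe.fit(X, y)  # logs the PCA fit: cache miss
pipe.fit(X, y)  # no PCA log line: the fit came from the cache
```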

nxorable (Contributor) commented

In my experience, this enhancement would apply to ColumnTransformer as well.
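
For what it's worth, the same Pipeline-wrapping workaround can be sketched for ColumnTransformer, which likewise lacks a memory parameter (the transformers and column indices below are illustrative):

```python
from tempfile import mkdtemp

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

cachedir = mkdtemp()

# The identity FunctionTransformer keeps the scaler in a non-final,
# and therefore cached, position of the inner Pipeline.
ct = ColumnTransformer([
    (
        "scale",
        Pipeline(
            [("scaler", StandardScaler()), ("identity", FunctionTransformer())],
            memory=cachedir,
        ),
        [0, 1],  # columns handled by this branch
    ),
])

X = np.random.RandomState(0).rand(20, 3)
ct.fit_transform(X)
ct.fit_transform(X)  # the scaler fit is loaded from the cache
```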
