
[RFC] Standardize parallel meta-estimators #7570

Closed

rth opened this issue Oct 4, 2016 · 3 comments

@rth
Member

rth commented Oct 4, 2016

As there was recently a request to add an equivalent of VotingClassifier for regressors (issue #7555), it might be useful to look at how the API for such estimators can be designed consistently.

Just as it is possible to assemble multiple estimators in series using a pipeline, it should be possible to interact consistently with multiple estimators assembled in parallel (optionally with some reduction applied to the output, e.g. voting).
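For concreteness, here is a minimal sketch of the two composition modes with the current API (the particular sub-estimators are arbitrary examples):

```python
# Series vs. parallel composition with the existing scikit-learn API.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# In series: each step transforms the data for the next one.
serial = Pipeline([('scale', StandardScaler()),
                   ('clf', LogisticRegression())])
serial.fit(X, y)

# In parallel: each sub-estimator sees the same input, and the outputs are
# reduced afterwards (concatenation here; voting for VotingClassifier).
parallel = FeatureUnion([('pca', PCA(n_components=2)),
                         ('kbest', SelectKBest(k=2))])
Xt = parallel.fit_transform(X, y)   # shape: (n_samples, 2 + 2)
```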

Currently scikit-learn has:

FeatureUnion

VotingClassifier

  • parallel classifiers; in addition, reduces the results by voting, except for transform with voting='soft', where it doesn't (see the sketch after this list)
  • init definition: estimators, voting='hard', weights=None, n_jobs=1
  • inherits from BaseEstimator, ClassifierMixin, TransformerMixin
  • stores estimators in self.named_estimators as a dict

"CommitteeRegressor"

The names currently do not reflect at all that these provide similar base functionality (even if they are used in different contexts). It might also be useful to have some common base class, for instance a _BaseUnion, that would handle at least parameter/estimator setting/getting in a consistent manner, together with everything else that could be factorized. For estimators that support it, the way cross-validation is handled should also be defined (e.g. should we provide a grid search for every estimator, or should such _BaseUnion objects accept a cv parameter?). This might affect #7136, #7288, #7484, #7230, and I'm not sure whether this would conflict with issue #2034.
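Purely as a hypothetical sketch of what such a base class could factor out (the name _BaseUnion and everything below are illustrative, not an existing scikit-learn class; it just mirrors the kind of nested-parameter handling Pipeline/FeatureUnion already do):

```python
from sklearn.base import BaseEstimator


class _BaseUnion(BaseEstimator):
    """Hypothetical shared base for parallel meta-estimators (sketch only)."""

    def __init__(self, estimators, n_jobs=1):
        self.estimators = estimators      # list of (name, estimator) tuples
        self.n_jobs = n_jobs

    @property
    def named_estimators(self):
        return dict(self.estimators)

    def get_params(self, deep=True):
        # Expose nested parameters as '<name>__<param>', as Pipeline does.
        params = super().get_params(deep=False)
        if deep:
            for name, est in self.estimators:
                params[name] = est
                for key, value in est.get_params(deep=True).items():
                    params['%s__%s' % (name, key)] = value
        return params

    def set_params(self, **params):
        # Replace whole sub-estimators by name first, then let BaseEstimator
        # handle plain and '<name>__<param>'-style nested parameters.
        names = [name for name, _ in self.estimators]
        for key in list(params):
            if key in names:
                new_est = params.pop(key)
                self.estimators = [(n, new_est if n == key else e)
                                   for n, e in self.estimators]
        super().set_params(**params)
        return self
```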

Backward compatibility would be a major issue though.

Then there is the whole question of stacking classifiers/regressors (#4816). In that issue, it was said that "VotingClassifier is just a special case of stacking", and because there were no constraints on the implementation, this led to re-implementing the parameter setting/getting for parallel estimators in PR #6674, continued in #7427. From the point of view of modularity, it might be better to see stacking as a pipeline of a _BaseUnion followed by some voting estimator, IMO.
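A rough illustration of that modular view, using a made-up PredictionTransformer wrapper (not an existing scikit-learn class) so that base classifiers can sit inside a FeatureUnion; a real implementation would also want cross-validated base predictions to avoid leaking the training labels into the final estimator:

```python
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


class PredictionTransformer(BaseEstimator, TransformerMixin):
    """Expose a classifier's predict_proba as transform() (illustrative only)."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None):
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def transform(self, X):
        return self.estimator_.predict_proba(X)


# Stacking as "a parallel union of base models, followed by a final estimator".
stacker = Pipeline([
    ('base', FeatureUnion([
        ('dt', PredictionTransformer(DecisionTreeClassifier())),
        ('svc', PredictionTransformer(SVC(probability=True))),
    ])),
    ('meta', LogisticRegression()),
])

X, y = load_iris(return_X_y=True)
stacker.fit(X, y)
```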

Not sure how to address all of this, but I think it should be considered before more such meta-estimators get merged into scikit-learn and everything becomes constrained by backward compatibility.

@jnothman
Member

jnothman commented Oct 5, 2016

FeatureUnion has bad parameter names, IMO. I'm not sure that's enough to motivate changing them and breaking people's code, though. I don't object to shared base classes: we've just introduced _BasePipeline. However, we tend to avoid refactoring when only clarity is at stake.

I'll have to take another look at this when I'm more awake.

@jstriebel
Contributor

In this context, MultiOutputRegressor and MultiOutputClassifier can also be considered, as they also have parallel sub-estimators, even though they are not meta-estimators.
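A small usage illustration of that point (MultiOutputRegressor fits one clone of the base estimator per output column, optionally in parallel via n_jobs):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 5)
Y = rng.rand(100, 3)                     # three output columns

reg = MultiOutputRegressor(Ridge(), n_jobs=2).fit(X, Y)
print(len(reg.estimators_))              # one fitted Ridge per output -> 3
```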

@rth
Member Author

rth commented Jun 6, 2017

Closing this issue as outdated.

@rth rth closed this as completed Jun 6, 2017