As there was recently a request to add an equivalent of VotingClassifier for regressors (issue #7555), it might be useful to look at how the API for such estimators can be designed consistently.
Just as it is possible to assemble multiple estimators in series using a pipeline, it should be possible to interact consistently with multiple estimators assembled in parallel (optionally with some reduction applied to the output, e.g. voting).
The names currently do not reflect at all that FeatureUnion and VotingClassifier perform similar base functionality (even if they are used in different contexts). It might also be useful to have some common base class, for instance _BaseUnion, that would handle at least parameter/estimator setting and getting in a consistent manner, plus everything else that could be factorized. For estimators that support it, the way cross-validation is handled should also be defined (e.g. should we provide a grid search for every estimator, or should such _BaseUnion objects accept a cv parameter?). This might affect #7136, #7288, #7484 and #7230, and I'm not sure whether it would conflict with issue #2034.
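To make the idea concrete, here is a hypothetical sketch of what such a shared base class could look like. The _BaseUnion name is the one floated above; nothing like this exists in scikit-learn, and the Stub estimator is only a stand-in for the demo. The point is that the (name, estimator) bookkeeping currently reimplemented separately by FeatureUnion and VotingClassifier could live in one place:

```python
# Hypothetical sketch of the _BaseUnion idea (not part of scikit-learn).
# Factors out the (name, estimator) bookkeeping that FeatureUnion and
# VotingClassifier currently each reimplement.
class _BaseUnion:
    def __init__(self, estimators):
        # estimators: list of (name, estimator) tuples
        self.estimators = estimators

    @property
    def named_estimators(self):
        return dict(self.estimators)

    def get_params(self, deep=True):
        params = {"estimators": self.estimators}
        if deep:
            for name, est in self.estimators:
                params[name] = est
                for key, value in est.get_params(deep=True).items():
                    params["%s__%s" % (name, key)] = value
        return params

    def set_params(self, **params):
        # Supports both replacing `estimators` wholesale and the nested
        # `<name>__<param>` convention that Pipeline/FeatureUnion use.
        if "estimators" in params:
            self.estimators = params.pop("estimators")
        named = dict(self.estimators)
        for key, value in params.items():
            name, _, sub_key = key.partition("__")
            named[name].set_params(**{sub_key: value})
        return self


# Minimal stand-in estimator, just for the demo.
class Stub:
    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def get_params(self, deep=True):
        return {"alpha": self.alpha}

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self


union = _BaseUnion([("a", Stub()), ("b", Stub(alpha=2.0))])
union.set_params(b__alpha=5.0)
```

With this in place, a VotingClassifier-like or FeatureUnion-like subclass would only add its own reduction logic (voting, feature concatenation) on top of the shared parameter handling.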
Backward compatibility would be a major issue though.
Then there is the whole matter of Stacking Classifiers/Regressors (#4816). In that issue, it was said that "VotingClassifier is just a special case of stacking", and because there were no constraints on the implementation, this led to re-implementing the parameter setting/getting for parallel estimators in PR #6674, continued in #7427. From the point of view of modularity, it might be better to see stacking as a pipeline of a _BaseUnion followed by some voting estimator, IMO.
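The "stacking = union of base predictors plus a final combiner" view can already be sketched with today's building blocks. PredictionTransformer below is a hypothetical adapter (not part of scikit-learn) exposing an estimator's predictions as a transformed feature; a real stacking implementation (as discussed in #4816) would use cross-validated predictions here to avoid leaking the training targets:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor


class PredictionTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical adapter: expose an estimator's predictions as a feature."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None):
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def transform(self, X):
        # One output column holding the base estimator's predictions.
        return self.estimator_.predict(X).reshape(-1, 1)


# Stacking seen as: a parallel union of base predictors -> a final combiner.
stack = Pipeline([
    ("base", FeatureUnion([
        ("lr", PredictionTransformer(LinearRegression())),
        ("tree", PredictionTransformer(DecisionTreeRegressor(max_depth=2))),
    ])),
    ("combine", LinearRegression()),
])
```

Under this framing, VotingClassifier really is the special case where the combiner is a fixed (weighted) vote rather than a fitted estimator, which is exactly why a shared base for the parallel part seems attractive.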
Not sure how to address all of this, but I think it should be considered before more such meta-estimators get merged into scikit-learn, after which everything will be constrained by backward compatibility.
FeatureUnion has bad parameter names, IMO. I'm not sure that's enough to motivate changing them and breaking people's code, though. I don't object to shared base classes: we've just introduced _BasePipeline. However, we tend to avoid refactoring when clarity is at stake.
I'll have to take another look at this when I'm more awake.
In this context, MultiOutputRegressor and MultiOutputClassifier can also be considered, as they also have parallel sub-estimators, even though they are not meta-estimators.
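For illustration, MultiOutputRegressor indeed manages one fitted sub-estimator per target, yet it is configured through a single estimator parameter rather than a named list, so it sits slightly apart from the FeatureUnion/VotingClassifier pattern:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

X = np.arange(10, dtype=float).reshape(-1, 1)
Y = np.hstack([2 * X, -X + 3.0])  # two regression targets

# One clone of the base estimator is fitted per output column, so the
# sub-estimators are "parallel" here too -- but they are all clones of a
# single `estimator` parameter, not a heterogeneous (name, estimator) list.
model = MultiOutputRegressor(LinearRegression()).fit(X, Y)
```

After fitting, the clones are available as model.estimators_, one per column of Y.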
Currently scikit-learn has:

FeatureUnion
- constructor parameters: transformer_list, n_jobs=1, transformer_weights=None
- inherits from _BasePipeline and TransformerMixin
- stores its sub-estimators in self.transformer_list (as a list, built with tosequence)

VotingClassifier (and the "CommitteeRegressor" equivalent proposed for regressors)
- constructor parameters: estimators, voting='hard', weights=None, n_jobs=1
- inherits from BaseEstimator, ClassifierMixin and TransformerMixin
- stores its sub-estimators in self.named_estimators as a dict
- transform returns class probabilities with voting == 'soft' and per-estimator predictions when it doesn't
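The divergence shows up directly in get_params. With a current scikit-learn release (where the parameter handling from #6674/#7427 has landed), the top-level parameter names differ even though the nested <name>__<param> convention works for both:

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion
from sklearn.ensemble import VotingClassifier

fu = FeatureUnion([("pca", PCA(n_components=2))])
vc = VotingClassifier(estimators=[("lr", LogisticRegression())])

# Same concept -- a named list of parallel sub-estimators -- but the
# top-level parameter is `transformer_list` in one class and
# `estimators` in the other:
assert "transformer_list" in fu.get_params()
assert "estimators" in vc.get_params()

# The nested <name>__<param> convention, however, is shared:
fu.set_params(pca__n_components=1)
vc.set_params(lr__C=10.0)
```

A common base class would make the first half of this example as uniform as the second half already is.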