Description
I need to make a VotingRegressor ensemble with some estimators that accept sample weights during fitting and some that don't. Currently, mixed ensembles raise an exception:
```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression()
weights = abs(y)
rgr = VotingRegressor(estimators=[('LR', LinearRegression()),
                                  ('KNN', KNeighborsRegressor()),
                                  ('RFR', RandomForestRegressor())])
rgr.fit(X, y, sample_weight=weights)
```

result:

```
TypeError: Underlying estimator KNeighborsRegressor does not support sample weights.
```
A possible solution would be for the ensemble classes (e.g. VotingRegressor, VotingClassifier, StackingRegressor, StackingClassifier) to inspect the fit() signatures of their estimators and not pass sample_weight to estimators that don't accept it. Alternatively, and perhaps more practically, the ensemble could catch the exception raised by calling fit() with sample_weight and fall back to calling fit() without it. This behavior could be the default, or activated by a flag like "enable_mixed_sample_weight" in the ensemble class's __init__ method. If it's important to notify the user when an estimator doesn't accept sample_weight, the notification and the exception currently in place could be re-enabled with a flag like "enforce_sample_weight".
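The signature-inspection approach wouldn't require new machinery: scikit-learn already exposes has_fit_parameter in sklearn.utils.validation, which the ensembles use internally to raise the TypeError above. A minimal sketch of a helper that forwards weights only to estimators that accept them (fit_with_optional_weights is a hypothetical name, not a scikit-learn API):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.utils.validation import has_fit_parameter

def fit_with_optional_weights(estimator, X, y, sample_weight=None):
    # Forward sample_weight only when the estimator's fit() signature accepts it.
    if sample_weight is not None and has_fit_parameter(estimator, "sample_weight"):
        return estimator.fit(X, y, sample_weight=sample_weight)
    return estimator.fit(X, y)

X, y = make_regression(random_state=0)
weights = abs(y)
# LinearRegression supports sample_weight; KNeighborsRegressor does not.
lr = fit_with_optional_weights(LinearRegression(), X, y, sample_weight=weights)
knn = fit_with_optional_weights(KNeighborsRegressor(), X, y, sample_weight=weights)
```

An ensemble applying this helper per estimator would fit mixed ensembles silently, which is why an opt-in/opt-out flag as described above may still be desirable.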
As a workaround I'm using the Ensemble class from the pipecaster library (https://github.com/ajcallegari/pipecaster), which allows mixed ensembles by catching the exception raised by calling fit() with sample_weight and falling back to a fit() call without it. This Ensemble class follows the scikit-learn interface and supports classification, regression, voting, and model stacking.
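A similar effect is possible without an extra dependency by wrapping the offending estimator in a small meta-estimator that accepts sample_weight in fit() and discards it. A sketch, assuming only the public BaseEstimator/clone API (IgnoreSampleWeight is a hypothetical class, not part of scikit-learn):

```python
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

class IgnoreSampleWeight(BaseEstimator, RegressorMixin):
    """Accepts sample_weight in fit() but discards it, so an estimator
    without sample_weight support can join a weighted ensemble."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y, sample_weight=None):
        # sample_weight is deliberately ignored.
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def predict(self, X):
        return self.estimator_.predict(X)

X, y = make_regression(random_state=0)
weights = abs(y)
rgr = VotingRegressor(estimators=[
    ('LR', LinearRegression()),
    ('KNN', IgnoreSampleWeight(KNeighborsRegressor())),
    ('RFR', RandomForestRegressor(n_estimators=10, random_state=0)),
])
rgr.fit(X, y, sample_weight=weights)  # no longer raises TypeError
```

This silently drops the weights for the wrapped estimator, so the explicit-flag behavior proposed above would still be clearer for users.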