Closed
Description
In the OpenML connector, we use the classes_ attribute of an estimator to map a predicted label to a label in the database.
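As a minimal sketch of why classes_ matters for this mapping (the toy data and variable names are made up for illustration and are not taken from the OpenML connector): the columns of predict_proba are ordered like classes_, so a column index can be turned back into a label by indexing into it.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data, purely illustrative.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array(["cat", "cat", "dog", "dog"])

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Column i of predict_proba corresponds to clf.classes_[i].
X_new = np.array([[0.5], [2.5]])
proba = clf.predict_proba(X_new)
labels = clf.classes_[np.argmax(proba, axis=1)]
print(labels)
```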
Now there are some special cases:
- Normal estimators work fine, obviously
- In a Pipeline (which is apparently not an estimator itself, but presumably has an estimator as its last step), the classes_ attribute of the last step is automatically retrieved by sklearn.
- In BaseSearchCV, this attribute is absent. In the OpenML connector we work around this by picking the classes_ attribute from the best_estimator_ attribute, as this one is used to make predictions anyway.
- Now when we combine these (we have a pipeline with BaseSearchCV as last step) sklearn throws an error:
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing.imputation import Imputer
from sklearn.feature_selection import VarianceThreshold
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.datasets import load_iris
p = Pipeline(steps=[('Imputer', Imputer(strategy='median')),
                    ('VarianceThreshold', VarianceThreshold()),
                    ('Estimator', RandomizedSearchCV(
                        DecisionTreeClassifier(),
                        {'min_samples_split': [2 ** x for x in range(1, 7 + 1)],
                         'min_samples_leaf': [2 ** x for x in range(0, 6 + 1)]},
                        cv=3, n_iter=10))])
data = load_iris()
p.fit(data.data, data.target)
print(p.classes_)
Now I know that pipeline officially does not have a classes_ attribute, but it is kind of strange that it sometimes works (normal estimator), and sometimes doesn't (BaseSearchCV).
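For reference, the workaround described above could be sketched roughly like this. find_classes is a hypothetical helper name, not the actual OpenML connector code, and the pipeline is simplified (no imputer, smaller search) so that it runs on current sklearn versions.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import VarianceThreshold
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier


def find_classes(model):
    """Hypothetical helper: unwrap pipelines and search objects, then read classes_."""
    while True:
        if isinstance(model, Pipeline):
            # The last step of a pipeline is the one that makes predictions.
            model = model.steps[-1][1]
        elif hasattr(model, "best_estimator_"):
            # A fitted search object predicts with its refitted best estimator.
            model = model.best_estimator_
        else:
            break
    return model.classes_


search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    {"min_samples_split": [2, 4, 8], "min_samples_leaf": [1, 2, 4]},
    cv=3, n_iter=3, random_state=0)
p = Pipeline(steps=[("VarianceThreshold", VarianceThreshold()),
                    ("Estimator", search)])

data = load_iris()
p.fit(data.data, data.target)
print(find_classes(p))
```

This only reads attributes that exist on fitted objects, so it does not depend on whether the pipeline itself exposes classes_.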
Would it be good to add this classes_ attribute to the BaseSearchCV definition?
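To illustrate the proposal, here is a rough sketch of what such an attribute could look like, written as a subclass rather than a change to sklearn itself. SearchWithClasses is a hypothetical name, and this is not sklearn's implementation; newer sklearn versions may already expose something equivalent, in which case the subclass is redundant.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import VarianceThreshold
from sklearn.tree import DecisionTreeClassifier


class SearchWithClasses(RandomizedSearchCV):
    """Hypothetical subclass that forwards classes_ to the best estimator."""

    @property
    def classes_(self):
        # Raises AttributeError before fitting (no best_estimator_ yet),
        # so hasattr(search, "classes_") is False until fit is called.
        return self.best_estimator_.classes_


search = SearchWithClasses(
    DecisionTreeClassifier(random_state=0),
    {"min_samples_split": [2, 4, 8], "min_samples_leaf": [1, 2, 4]},
    cv=3, n_iter=3, random_state=0)
p = Pipeline(steps=[("VarianceThreshold", VarianceThreshold()),
                    ("Estimator", search)])

data = load_iris()
p.fit(data.data, data.target)
print(p.classes_)
```

With this in place, the pipeline can delegate to its last step in the same way it does for a normal estimator.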