
CalibratedClassifierCV doesn't interact properly with Pipeline estimators  #8710

Closed

Description

@stoddardg

Hi,

I'm trying to use CalibratedClassifierCV to calibrate the probabilities from a Gradient Boosted Tree model. The GBM is the final step of a Pipeline whose earlier stages convert the categorical features (using DictVectorizer) before the GBM is fit. The issue is that when I pass the fitted Pipeline to CalibratedClassifierCV as a prefit estimator, fitting fails as soon as I pass in the raw data. Here's a small example:

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.calibration import CalibratedClassifierCV, _CalibratedClassifier
from sklearn.pipeline import Pipeline

fake_features = [
    {'state':'NY','age':'adult'},
    {'state':'TX','age':'adult'},
    {'state':'VT','age':'child'}
]

labels = [1,0,1]

pipeline = Pipeline([
    ('vectorizer', DictVectorizer()),
    ('clf', RandomForestClassifier())
])

pipeline.fit(fake_features, labels)

clf_isotonic = CalibratedClassifierCV(base_estimator=pipeline, cv='prefit', method='isotonic')
clf_isotonic.fit(fake_features, labels)

When running that, I get the following error on the last line:

TypeError: float() argument must be a string or a number, not 'dict'

On the other hand, if I replace the last two lines with the following, things work fine:

clf_isotonic = _CalibratedClassifier(base_estimator=pipeline, method='isotonic')
clf_isotonic.fit(fake_features, labels)
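
As a quick sanity check (this line isn't in the snippet above), the fitted pipeline itself is perfectly happy with the raw dicts, since DictVectorizer does the conversion inside the pipeline:

# Works: the pipeline's first step converts the dicts before the forest sees them
print(pipeline.predict_proba(fake_features))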

It seems that CalibratedClassifierCV checks whether the X data is valid before invoking anything on the base estimator (https://github.com/scikit-learn/scikit-learn/blob/14031f6/sklearn/calibration.py#L126). In my case this check seems slightly off, since it's the pipeline itself that converts the data into the proper form before feeding it to the estimator.

On the other hand, _CalibratedClassifier doesn't make this check first, so the code works (i.e. the data is fed into the pipeline, the model is fit, and then probabilities are calibrated appropriately).
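
I believe the same failure can be reproduced with the generic input validation alone, which is my guess at the root cause here (the exact wording of the error may differ between versions):

from sklearn.utils import check_X_y

# Roughly what CalibratedClassifierCV.fit does before it ever touches the
# base estimator: coerce X into a numeric array, which fails on the dicts.
check_X_y(fake_features, labels)
# TypeError: float() argument must be a string or a number, not 'dict'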

My use case (which is not reflected in the example) is to use the initial stages of the pipeline to select columns from a DataFrame, encode the categoricals, and then fit the model. I then pickle the fitted pipeline (after using GridSearchCV to select hyperparameters). Later on, I can load the pipeline and use it to predict on new data without having to remember what transformations the raw data needs. I now want to calibrate the model after fitting it, but ran into this problem.
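
For concreteness, here's a rough sketch of that workflow. The select_columns helper, the column names, and the parameter grid are all made up for illustration; my real pipeline is more involved:

import pickle

import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical helper: pull the relevant columns out of a raw DataFrame
# and hand them to DictVectorizer as a list of dicts.
def select_columns(df):
    return df[['state', 'age']].to_dict(orient='records')

train_df = pd.DataFrame({
    'state': ['NY', 'TX', 'VT', 'NY', 'TX', 'VT'],
    'age': ['adult', 'adult', 'child', 'child', 'adult', 'child'],
    'ignored': [1, 2, 3, 4, 5, 6],  # extra column the pipeline should drop
})
train_labels = [1, 0, 1, 0, 1, 0]

full_pipeline = Pipeline([
    ('select', FunctionTransformer(select_columns, validate=False)),
    ('vectorizer', DictVectorizer(sparse=False)),
    ('clf', GradientBoostingClassifier())
])

# Pick hyperparameters, then pickle the fitted pipeline
search = GridSearchCV(full_pipeline, param_grid={'clf__n_estimators': [10, 50]}, cv=2)
search.fit(train_df, train_labels)
with open('model.pkl', 'wb') as f:
    pickle.dump(search.best_estimator_, f)

# Later: load the pipeline and score raw data directly,
# without redoing any of the transformations by hand
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)
print(model.predict_proba(train_df))

# This is the step I now want, and it fails with the TypeError above:
# CalibratedClassifierCV(base_estimator=model, cv='prefit',
#                        method='isotonic').fit(train_df, train_labels)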

For reference, here's all my system info:

Linux-3.10.0-514.2.2.el7.x86_64-x86_64-with-redhat-7.3-Maipo
Python 3.6.0 |Continuum Analytics, Inc.| (default, Dec 23 2016, 12:22:00) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
NumPy 1.12.0
SciPy 0.18.1
Scikit-Learn 0.18.1

Thanks for reading (and for all of your hard work on scikit-learn!).

Metadata

Labels

Easy (Well-defined and straightforward way to resolve), Enhancement, good first issue (Easy with clear instructions to resolve)
