Assymetry of roc_auc_score

#### Description
For binary tasks with wrapped down custom scoring functions by the `metrics.make_scorerer`-scheme `roc_auc_score` is behaving unexpectedly. When requiring the probability output from a binary classifier, which is a shape( n , 2) object, while the training/testing lables are an expected shape (n, ) input, A scoring will fail.
However, the binary task and internal handling of different y shapes is incidentially correctly understood by `metrics.log_loss` by internal evaluations, but `roc_auc_score` currently fails at this. This especially cumbersome, if the scoring function is wrapped down in a `cross_val_score` and a `make_scorer` much deeper in the code with possible nested pipelines etc, where the automatic correct evaluation of this particular metric is required.

#### Steps/Code to Reproduce
This should illustrate what is failing
```python
from sklearn.metrics import roc_auc_score, log_loss
y_true0 = np.array([False, False, True, True])
y_true1 = ~y_true0
y_true = np.matrix([y_true0, y_true1]).T

y_proba0 = np.array([0.1, 0.4, 0.35, 0.8]) #predict_proba component [:,0]
y_proba1 = 1 - y_proba0 #predict_proba component [:,1]
y_proba = np.matrix([y_proba0, y_proba1]).T #as obtained by classifier.predict_proba()

log_loss(y_true1, y_proba1) # compute for positive class component >>> OK
log_loss(y_true, y_proba) # compute for all class >>> OK
log_loss(y_true1, y_proba) # compute for mixed component >>> OK

roc_auc_score(y_true1, y_proba1) # compute for positive class component >>> OK
roc_auc_score(y_true, y_proba) # compute for all class >>> OK
roc_auc_score(y_true1, y_proba) # compute for mixed component >>> FAIL: bad input shape (4, 2)

#last above line is source of error in this snippet: of binary classification and scoring task
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_hastie_10_2(random_state=0)
X_train, X_test = X[:2000], X[2000:]
y_train, y_test = y[:2000], y[2000:]

clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
    max_depth=1, random_state=0).fit(X_train, y_train)

from sklearn.metrics import make_scorer, roc_auc_score
make_scorer(roc_auc_score, needs_proba=True)(clf, X_test, y_test) # >>> FAIL: bad input shape (1000, 2)
#compare
make_scorer(log_loss, greater_is_better=True, needs_proba=True)(clf, X_test, y_test) # >>> OK
```

#### Expected Results
`roc_auc_score` should behave in a similar way as `log_loss`, guessing the binary classification task and handle different shape input correctly


#### Versions
Windows-10-10.0.15063-SP0
Python 3.6.1 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:25:24) [MSC v.1900 64 bit (AMD64)]
NumPy 1.13.3
SciPy 0.19.1
Scikit-Learn 0.19.1


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Assymetry of roc_auc_score #10247

Description

Steps/Code to Reproduce

Expected Results

Versions

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Assymetry of roc_auc_score #10247

Description

Description

Steps/Code to Reproduce

Expected Results

Versions

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions