Description
Describe the workflow you want to enable
Currently, calling .predict_proba() for multioutput predictions returns a list of np.ndarray: one 2D array of shape (n_samples, 2) per output, holding the probabilities of class 0 and class 1. This is quite surprising, since in all other cases a plain np.ndarray is returned. It also makes reshaping the output inconvenient, because it first requires a call to np.array().
For example, to get the positive-class probabilities, e.g. to compute multioutput AUROC, I have to do the following (type annotations added for clarity):
preds: list[np.ndarray] = clf.predict_proba(X_test)
preds: np.ndarray = np.array(preds)
y_score = preds[:, :, 1].T
Only then does the resulting y_score have shape (n_samples, n_tasks), with the predicted class 1 probabilities in columns.
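The shape bookkeeping in this workaround can be traced step by step. The sketch below does not fit a real estimator; it assumes a simulated `preds` list (random numbers standing in for `clf.predict_proba(X_test)`) with the same layout as described above, and the names `stacked` and `positive` are introduced only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_tasks = 5, 3

# Stand-in for clf.predict_proba(X_test): one (n_samples, 2) array per task.
# The values are simulated; only the layout matches the multioutput case.
preds = []
for _ in range(n_tasks):
    p1 = rng.random(n_samples)                   # simulated P(class 1)
    preds.append(np.column_stack([1 - p1, p1]))  # columns: class 0, class 1

stacked = np.array(preds)     # shape (n_tasks, n_samples, 2)
positive = stacked[:, :, 1]   # shape (n_tasks, n_samples)
y_score = positive.T          # shape (n_samples, n_tasks)

print(stacked.shape, positive.shape, y_score.shape)
```

Note that the task axis comes first after `np.array()`, which is why the final transpose is needed.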
Describe your proposed solution
Return an np.ndarray instead of a list of arrays in the multioutput case. Just calling np.array() internally would be enough.
Describe alternatives you've considered, if relevant
It would also be nice to include a utility function to extract the positive-class probabilities. y_score = preds[:, :, 1].T is a quite non-obvious transformation, yet it is necessary in practice to compute column-wise metrics based on probabilities.
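A rough sketch of what such a utility might look like is given below. The name `positive_class_proba` and its `positive_index` parameter are hypothetical and not part of scikit-learn; the function only assumes that each per-task array has the positive class in column 1.

```python
import numpy as np


def positive_class_proba(preds, positive_index=1):
    """Hypothetical helper (not a scikit-learn API).

    Accepts either a single (n_samples, n_classes) array or a sequence of
    such arrays (one per task), and returns the positive-class column(s).
    """
    if isinstance(preds, np.ndarray) and preds.ndim == 2:
        # Single-output case: return a 1D vector of length n_samples.
        return preds[:, positive_index]
    # Multioutput case: one column per task, shape (n_samples, n_tasks).
    return np.column_stack([np.asarray(p)[:, positive_index] for p in preds])


# Two tasks, two samples, hand-written probabilities for illustration.
preds = [
    np.array([[0.9, 0.1], [0.2, 0.8]]),
    np.array([[0.4, 0.6], [0.7, 0.3]]),
]
y_score = positive_class_proba(preds)
print(y_score.shape)  # (2, 2)
```

Because it slices each task separately instead of stacking first, this variant gives the same result whether it receives the current list output or an already stacked 3D array.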
Additional context
No response
EDIT:
I also found another bug caused by the current implementation. In grid search CV, when using any multioutput prediction, this line raises an error (scikit-learn/sklearn/utils/_response.py, line 52 in 545d99e):
if target_type == "binary" and y_pred.shape[1] < 2:
Traceback (most recent call last):
File "/home/jakub/.cache/pypoetry/virtualenvs/scikit-fingerprints-VjWItXgH-py3.9/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 971, in _score
scores = scorer(estimator, X_test, y_test, **score_params)
File "/home/jakub/.cache/pypoetry/virtualenvs/scikit-fingerprints-VjWItXgH-py3.9/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 279, in __call__
return self._score(partial(_cached_call, None), estimator, X, y_true, **_kwargs)
File "/home/jakub/.cache/pypoetry/virtualenvs/scikit-fingerprints-VjWItXgH-py3.9/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 371, in _score
y_pred = method_caller(
File "/home/jakub/.cache/pypoetry/virtualenvs/scikit-fingerprints-VjWItXgH-py3.9/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 89, in _cached_call
result, _ = _get_response_values(
File "/home/jakub/.cache/pypoetry/virtualenvs/scikit-fingerprints-VjWItXgH-py3.9/lib/python3.9/site-packages/sklearn/utils/_response.py", line 214, in _get_response_values
y_pred = _process_predict_proba(
File "/home/jakub/.cache/pypoetry/virtualenvs/scikit-fingerprints-VjWItXgH-py3.9/lib/python3.9/site-packages/sklearn/utils/_response.py", line 49, in _process_predict_proba
if target_type == "binary" and y_pred.shape[1] < 2:
AttributeError: 'list' object has no attribute 'shape'
So this is not only a feature request, but also a bugfix.
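Until this is fixed, one possible workaround is to pass a callable with the signature scorer(estimator, X, y) as the scoring argument, so that the list is stacked before any metric sees it. The sketch below is an assumption-laden illustration, not the library's intended fix: `make_list_proba_scorer`, the dummy estimator, and the toy metric are all hypothetical names, and the positive class is assumed to be in column 1.

```python
import numpy as np


def make_list_proba_scorer(metric):
    """Hypothetical workaround: wrap metric(y_true, y_score) as a scorer.

    The returned callable has the scorer(estimator, X, y) signature and
    handles predict_proba output that is a list of per-task arrays.
    """
    def scorer(estimator, X, y):
        preds = estimator.predict_proba(X)
        if isinstance(preds, list):
            # Multioutput: stack positive-class columns, (n_samples, n_tasks).
            y_score = np.column_stack([p[:, 1] for p in preds])
        else:
            # Single binary output: (n_samples,).
            y_score = preds[:, 1]
        return float(metric(np.asarray(y), y_score))
    return scorer


class ConstantMultiOutput:
    """Dummy estimator for illustration; always predicts the same values."""
    def predict_proba(self, X):
        n = len(X)
        return [np.tile([0.25, 0.75], (n, 1)), np.tile([0.6, 0.4], (n, 1))]


def neg_squared_error(y_true, y_score):
    # Toy metric (higher is better); swap in a real metric in practice.
    return -np.mean((y_true - y_score) ** 2)


scorer = make_list_proba_scorer(neg_squared_error)
X = np.zeros((2, 4))
y = np.array([[1, 0], [1, 1]])
score = scorer(ConstantMultiOutput(), X, y)
print(score)
```

In real use, the metric argument could be something like sklearn.metrics.roc_auc_score, assuming y is a binary indicator matrix with one column per task.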