Return 3D array instead of list of arrays for multioutput .predict_proba() #30013

@j-adamczyk

Describe the workflow you want to enable

Currently, calling .predict_proba() on a multioutput classifier returns a list of np.ndarray, one per output. For binary outputs, each is a 2D array of shape (n_samples, 2) holding the probabilities of class 0 and class 1. This is quite surprising, since in all other cases a plain np.ndarray is returned. It also makes reshaping the output inconvenient, since it first requires a call to np.array().

For example, to get the positive class probabilities, e.g. to compute multioutput AUROC, I have to do the following (type annotations added for clarity):

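import numpy as np

preds: list[np.ndarray] = clf.predict_proba(X_test)  # one (n_samples, 2) array per task
preds: np.ndarray = np.array(preds)  # shape (n_tasks, n_samples, 2)
y_score = preds[:, :, 1].T  # shape (n_samples, n_tasks)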

Only then does the resulting y_score have shape (n_samples, n_tasks), with the predicted class 1 probability of each task in its own column.

Describe your proposed solution

Return a single np.ndarray instead of a list of arrays in the multioutput case. Just calling np.array() internally would be enough.

Describe alternatives you've considered, if relevant

It would also be nice to include a utility function to extract the positive class probabilities. y_score = preds[:, :, 1].T is quite a non-obvious transformation, yet it is necessary in practice to compute column-wise metrics based on probabilities.
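A minimal sketch of what such a utility could look like. The name positive_class_proba is hypothetical and not part of scikit-learn. It assumes every output is binary and that column 1 of each array holds the positive class probability. The predict_proba output is simulated with NumPy here, so the snippet runs without a fitted estimator:

```python
import numpy as np


def positive_class_proba(proba_per_output):
    """Hypothetical helper, not part of scikit-learn.

    Takes the list returned by a multioutput ``predict_proba`` (one array of
    shape (n_samples, 2) per binary output) and returns an array of shape
    (n_samples, n_outputs) with the class 1 probability of each output.
    """
    # Column 1 of each per-output array is assumed to be the positive class.
    return np.column_stack([proba[:, 1] for proba in proba_per_output])


# Simulated predict_proba output: 3 binary outputs, 4 samples each.
rng = np.random.default_rng(0)
p1 = rng.random((3, 4))  # made-up positive class probabilities
proba_list = [np.column_stack([1 - p, p]) for p in p1]

y_score = positive_class_proba(proba_list)
print(y_score.shape)  # (4, 3)
```

The result matches np.array(proba_list)[:, :, 1].T, but avoids building the intermediate 3D array.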

Additional context

No response

EDIT:

I also found another bug caused by the current implementation. In grid search CV, when using any multioutput prediction, this line raises an error:

if target_type == "binary" and y_pred.shape[1] < 2:

Error:

Traceback (most recent call last):
  File "/home/jakub/.cache/pypoetry/virtualenvs/scikit-fingerprints-VjWItXgH-py3.9/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 971, in _score
    scores = scorer(estimator, X_test, y_test, **score_params)
  File "/home/jakub/.cache/pypoetry/virtualenvs/scikit-fingerprints-VjWItXgH-py3.9/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 279, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true, **_kwargs)
  File "/home/jakub/.cache/pypoetry/virtualenvs/scikit-fingerprints-VjWItXgH-py3.9/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 371, in _score
    y_pred = method_caller(
  File "/home/jakub/.cache/pypoetry/virtualenvs/scikit-fingerprints-VjWItXgH-py3.9/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 89, in _cached_call
    result, _ = _get_response_values(
  File "/home/jakub/.cache/pypoetry/virtualenvs/scikit-fingerprints-VjWItXgH-py3.9/lib/python3.9/site-packages/sklearn/utils/_response.py", line 214, in _get_response_values
    y_pred = _process_predict_proba(
  File "/home/jakub/.cache/pypoetry/virtualenvs/scikit-fingerprints-VjWItXgH-py3.9/lib/python3.9/site-packages/sklearn/utils/_response.py", line 49, in _process_predict_proba
    if target_type == "binary" and y_pred.shape[1] < 2:
AttributeError: 'list' object has no attribute 'shape'

So this is not only a feature request, but also a bugfix.
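Until this is resolved, one possible workaround is to pass a callable with the signature (estimator, X, y) as scoring, so that the list returned by predict_proba is handled by hand instead of by the built-in scorer. The sketch below is an assumption-laden illustration, not a proposed fix: the scorer name multioutput_auroc, the synthetic data, the choice of estimator, and the parameter grid are all made up for the example, and every output is assumed to be binary with both classes present in each fold.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV


def multioutput_auroc(estimator, X, y):
    """Hypothetical scorer: macro-averaged AUROC over binary outputs."""
    proba = estimator.predict_proba(X)  # list of (n_samples, 2) arrays
    # Stack the class 1 column of each output into (n_samples, n_outputs).
    y_score = np.column_stack([p[:, 1] for p in proba])
    return roc_auc_score(y, y_score, average="macro")


# Synthetic multilabel data, used only to make the example runnable.
X, y = make_multilabel_classification(
    n_samples=200, n_features=10, n_classes=3, random_state=0
)

search = GridSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=0),
    param_grid={"max_depth": [2, 4]},
    scoring=multioutput_auroc,  # callable scorer bypasses the failing code path
    cv=3,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

This keeps the reshaping logic in one place, at the cost of reimplementing what the built-in scorer would otherwise do.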
