precision_recall_curve - assumed limits can be misleading #4223

Closed
@trevorstephens

Description


Referencing "The last precision and recall values are 1. and 0. respectively and do not have a corresponding threshold. This ensures that the graph starts on the x axis."

Say you have a tie in your highest predicted probability output from the classifier with some false positives and some true positives, for example:

from sklearn.metrics import precision_recall_curve
y_true = [ 1,  0,  0,  1,  0,  1,  0,  0,  0,  1,  0]
y_pred = [.9, .9, .9, .8, .8, .7, .7, .6, .5, .4, .3]
precision, recall, thresholds = precision_recall_curve(y_true, y_pred)

Yields:

Precision: [0.400, 0.333, 0.375, 0.428, 0.400, 0.333, 1.000]
Recall:    [1.000, 0.750, 0.750, 0.750, 0.500, 0.250, 0.000]
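
To make the numbers easier to check by hand, here is a rough pure-Python sketch that computes precision and recall at each distinct score, treating everything scored at or above the threshold as a positive prediction. The helper name `pr_points` is made up for illustration and is not part of scikit-learn. It lists every distinct score, so it may include low thresholds that the output quoted above omits.

```python
def pr_points(y_true, y_score):
    """Return (threshold, precision, recall) per distinct score, highest first.

    Hypothetical helper for illustration only; not a scikit-learn function.
    """
    total_pos = sum(y_true)
    points = []
    for t in sorted(set(y_score), reverse=True):
        # Labels of every sample predicted positive at this threshold.
        predicted_pos = [yt for yt, ys in zip(y_true, y_score) if ys >= t]
        tp = sum(predicted_pos)
        points.append((t, tp / len(predicted_pos), tp / total_pos))
    return points


y_true = [1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0]
y_pred = [.9, .9, .9, .8, .8, .7, .7, .6, .5, .4, .3]

points = pr_points(y_true, y_pred)
for t, p, r in points:
    print(f"threshold={t:.1f}  precision={p:.3f}  recall={r:.3f}")
```

For this example the first row (threshold 0.9) has precision 1/3 and recall 1/4, whereas the appended final point reports precision 1 at recall 0.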

[Image: precision-recall plot for the example above]

I understand that recall goes to zero in the limit, but precision might not be so clear-cut. I believe a really difficult problem where your top prediction is a false positive would have precision go to zero in the limit. For plotting with this function, that case is probably not a big deal, since a vertical line from 0 to 1 would be hidden by the y-axis.

But for tied top predictions, as seen above, the plot becomes misleading and makes the viewer think that at least their top prediction was a true positive. This may seem like a corner case, but I ran into it on a tough classification problem I was working on and was a little baffled by the output until I checked the code.

A quick fix that preserves the intention of a clean P-R plot might be to draw a horizontal line from the point at the highest threshold to the y-axis; this should not alter the output in most cases, like here. Trying to actually calculate where the curve is going in the limit might be a bit overkill :-)
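
A minimal sketch of that quick fix, applied after the fact to the arrays quoted above: the appended final precision is replaced by the precision at the highest threshold, which gives a horizontal segment out to the y-axis. The helper name `flatten_final_point` is hypothetical, and this is only one way the idea could be done, not what scikit-learn does.

```python
def flatten_final_point(precision, recall):
    """Replace the appended final precision with the value at the highest threshold.

    Hypothetical post-processing helper; assumes the last entry is the
    appended (recall 0) point and the one before it is the highest threshold.
    """
    fixed = list(precision)
    if len(fixed) >= 2:
        fixed[-1] = fixed[-2]
    return fixed, list(recall)


# Values copied from the output quoted in this issue.
precision = [0.400, 0.333, 0.375, 0.428, 0.400, 0.333, 1.000]
recall = [1.000, 0.750, 0.750, 0.750, 0.500, 0.250, 0.000]

fixed_precision, fixed_recall = flatten_final_point(precision, recall)
print(fixed_precision[-1], fixed_recall[-1])
```

When the top predictions are not tied and the top one is a true positive, the last two precision values are already both 1, so the curve is unchanged.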

I would also think that adding a 1 to the end of the thresholds vector would be helpful when plotting both precision and recall against the probabilities.
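
As a rough illustration of why the extra threshold helps: with a 1 appended, all three vectors have the same length and can be zipped or plotted together. The `thresholds` values below are my assumption (the six distinct scores that line up with the six non-appended points); they are not quoted in the output above.

```python
# Values copied from the output quoted in this issue.
precision = [0.400, 0.333, 0.375, 0.428, 0.400, 0.333, 1.000]
recall = [1.000, 0.750, 0.750, 0.750, 0.500, 0.250, 0.000]

# Assumed thresholds, inferred from the example scores rather than quoted.
thresholds = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# Append 1 so the final (precision 1, recall 0) point has a threshold too.
padded = thresholds + [1.0]

rows = list(zip(padded, precision, recall))
for t, p, r in rows:
    print(f"threshold={t:.1f}  precision={p:.3f}  recall={r:.3f}")
```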

I'm happy to open a PR if anyone thinks this is worth addressing.
