[MRG+2] Added sample weight support to confusion matrix. #4001


Closed · wants to merge 4 commits
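For orientation before the diff: a minimal sketch of the behavior this PR adds, with made-up data, assuming a scikit-learn build that includes the patch below:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]

# Unweighted: each sample adds 1 to cell (true label, predicted label).
print(confusion_matrix(y_true, y_pred))
# [[1 1 0]
#  [0 2 0]
#  [0 0 1]]

# Weighted: each sample adds its weight instead, so the result may be float.
w = [0.5, 0.5, 1.0, 1.0, 2.0]
print(confusion_matrix(y_true, y_pred, sample_weight=w))
# [[0.5 0.5 0. ]
#  [0.  2.  0. ]
#  [0.  0.  2. ]]
```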
16 changes: 14 additions & 2 deletions sklearn/metrics/classification.py
@@ -17,6 +17,7 @@
# Noel Dawe <[email protected]>
# Jatin Shah <[email protected]>
# Saurabh Jha <[email protected]>
+# Bernardo Stein <[email protected]>
# License: BSD 3 clause

from __future__ import division
@@ -178,7 +179,7 @@ def accuracy_score(y_true, y_pred, normalize=True, sample_weight=None):
    return _weighted_sum(score, sample_weight, normalize)


-def confusion_matrix(y_true, y_pred, labels=None):
+def confusion_matrix(y_true, y_pred, labels=None, sample_weight=None):
    """Compute confusion matrix to evaluate the accuracy of a classification

    By definition a confusion matrix :math:`C` is such that :math:`C_{i, j}`
@@ -201,6 +202,8 @@ def confusion_matrix(y_true, y_pred, labels=None):
        If none is given, those that appear at least once
        in ``y_true`` or ``y_pred`` are used in sorted order.

+    sample_weight : array-like of shape = [n_samples], optional
+        Sample weights.

    Returns
    -------
@@ -239,6 +242,13 @@ def confusion_matrix(y_true, y_pred, labels=None):
    else:
        labels = np.asarray(labels)

+    if sample_weight is None:
+        sample_weight = np.ones(y_true.shape[0], dtype=np.int)
+    else:
+        sample_weight = np.asarray(sample_weight)
+
Member:

nitpick: can you add a check here to see whether sample_weight is the same size as y_true and y_pred, using check_consistent_length?

Author:

@MechCoder sure, that seems like a good addition. I added the check after the highlighted code so that if any of the preceding code changes in the future, the check will still catch problems.

It would be good to have some tests against this behavior, but I'm a little short on time over the coming weeks, so it's probably better to merge this now and I'll open another PR later to improve the tests. What do you think?
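For reference, check_consistent_length is an existing helper in sklearn.utils that raises a ValueError when its arguments disagree in length; a quick illustration with made-up inputs:

```python
from sklearn.utils import check_consistent_length

y_true = [0, 1, 1]
y_pred = [0, 1, 0]

# Same length everywhere: passes silently.
check_consistent_length([.1, .2, .3], y_true, y_pred)

# Mismatched sample_weight: raises ValueError.
try:
    check_consistent_length([.1, .2], y_true, y_pred)
except ValueError as exc:
    print(exc)
```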

+    check_consistent_length(sample_weight, y_true, y_pred)
+
    n_labels = labels.size
    label_to_ind = dict((y, x) for x, y in enumerate(labels))
    # convert yt, yp into index
@@ -249,8 +259,10 @@ def confusion_matrix(y_true, y_pred, labels=None):
    ind = np.logical_and(y_pred < n_labels, y_true < n_labels)
    y_pred = y_pred[ind]
    y_true = y_true[ind]
+    # also eliminate weights of eliminated items
+    sample_weight = sample_weight[ind]

-    CM = coo_matrix((np.ones(y_true.shape[0], dtype=np.int), (y_true, y_pred)),
+    CM = coo_matrix((sample_weight, (y_true, y_pred)),
                    shape=(n_labels, n_labels)
                    ).toarray()
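The swapped-in CM line works because scipy's COO constructor sums duplicate (row, col) coordinates when the matrix is densified, which is exactly the accumulation a confusion matrix needs; a standalone illustration:

```python
import numpy as np
from scipy.sparse import coo_matrix

y_true = np.array([0, 0, 1])              # row indices (true labels)
y_pred = np.array([0, 0, 1])              # column indices (predicted labels)
sample_weight = np.array([0.5, 2.0, 1.0])

# The two entries at (0, 0) are summed: 0.5 + 2.0 = 2.5.
cm = coo_matrix((sample_weight, (y_true, y_pred)), shape=(2, 2)).toarray()
print(cm)
# [[2.5 0. ]
#  [0.  1. ]]
```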
15 changes: 15 additions & 0 deletions sklearn/metrics/tests/test_classification.py
@@ -511,6 +511,21 @@ def test(y_true, y_pred, string_type=False):
         string_type=True)


+def test_confusion_matrix_sample_weight():
+    """Test confusion matrix - case with sample_weight"""
+    y_true, y_pred, _ = make_prediction(binary=False)
+
+    weights = [.1] * 25 + [.2] * 25 + [.3] * 25
+
+    cm = confusion_matrix(y_true, y_pred, sample_weight=weights)
+
+    true_cm = (.1 * confusion_matrix(y_true[:25], y_pred[:25]) +
+               .2 * confusion_matrix(y_true[25:50], y_pred[25:50]) +
+               .3 * confusion_matrix(y_true[50:], y_pred[50:]))
+
+    assert_array_almost_equal(cm, true_cm)
+
+
def test_confusion_matrix_multiclass_subset_labels():
    # Test confusion matrix - multi-class case with subset of labels
    y_true, y_pred, _ = make_prediction(binary=False)
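The new test relies on the weighted confusion matrix being linear in the weights: a constant weight w over a block of samples scales that block's unweighted matrix by w, and disjoint blocks add. A standalone check of that identity with made-up data (labels is passed so every block produces the same matrix shape):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

rng = np.random.RandomState(0)
y_true = rng.randint(0, 3, size=75)
y_pred = rng.randint(0, 3, size=75)
weights = [.1] * 25 + [.2] * 25 + [.3] * 25
labels = [0, 1, 2]

weighted = confusion_matrix(y_true, y_pred, labels=labels,
                            sample_weight=weights)
blockwise = (.1 * confusion_matrix(y_true[:25], y_pred[:25], labels=labels) +
             .2 * confusion_matrix(y_true[25:50], y_pred[25:50], labels=labels) +
             .3 * confusion_matrix(y_true[50:], y_pred[50:], labels=labels))

np.testing.assert_array_almost_equal(weighted, blockwise)
```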
6 changes: 5 additions & 1 deletion sklearn/metrics/tests/test_common.py
@@ -360,7 +360,11 @@
# No Sample weight support
METRICS_WITHOUT_SAMPLE_WEIGHT = [
    "cohen_kappa_score",
-    "confusion_matrix",
+    "confusion_matrix",  # Left this one here because the tests in this file
+                         # do not work for confusion_matrix, as its output is
+                         # a matrix instead of a number. Testing of
+                         # confusion_matrix with sample_weight is in
+                         # test_classification.py
    "median_absolute_error",
]

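For context on why confusion_matrix stays on this exclusion list: the common tests compare scalar metric outputs under sample weighting, for instance checking that integer weights act like sample repetition. A simplified sketch of that invariance for a scalar metric (the real helper in this file carries more machinery):

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0])
weights = np.array([1, 2, 2, 1])

# An integer weight w should be equivalent to repeating the sample w times.
weighted = accuracy_score(y_true, y_pred, sample_weight=weights)
repeated = accuracy_score(np.repeat(y_true, weights),
                          np.repeat(y_pred, weights))
assert np.isclose(weighted, repeated)
```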