metrics.confusion_matrix far too slow for Boolean cases #15388


Closed
GregoryMorse opened this issue Oct 28, 2019 · 20 comments

@GregoryMorse
Contributor

GregoryMorse commented Oct 28, 2019

Description

When using metrics.confusion_matrix with np.bool_ inputs (i.e. only True/False values), it is far faster to avoid the list comprehensions the current code uses. numpy's sum and Boolean logic functions handle this case very efficiently and scale far better.
The code is in https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/metrics/_classification.py; it uses list comprehensions and never checks for np.bool_ types, which would allow skipping the excessive generic code. This is a very common and reasonable use case in practice. The examples below assume no normalization and no sample weights, though those could also be dealt with efficiently.

Steps/Code to Reproduce

import numpy as np
from sklearn.metrics import confusion_matrix

N = 4096
p = 0.5
a = np.random.choice(a=[False, True], size=N, p=[p, 1-p])
b = np.random.choice(a=[False, True], size=N, p=[p, 1-p])
for i in range(1024): confusion_matrix(a, b)

Expected Results

Fast execution time.
E.g., substituting confusion_matrix with this conf_mat (not as efficient as possible, but easier to read, and still more efficient than the current library even with 4 sums, 4 logical ANDs and 4 logical NOTs):

        def conf_mat(x, y):
            return np.array([[np.sum(~x & ~y), np.sum(~x & y)], #true negatives, false positives
                             [np.sum(x & ~y), np.sum(x & y)]]) #false negatives, true positives

Or even faster, with 1 logical AND and 3 sums:

        def conf_mat(x, y):
            truepos, totalpos, totaltrue = np.sum(x & y), np.sum(y), np.sum(x)
            totalfalse = len(x) - totaltrue
            falsepos = totalpos - truepos
            return np.array([[totalfalse - falsepos, falsepos], #true negatives, false positives
                             [totaltrue - truepos, truepos]]) #false negatives, true positives

Actual Results

Slow execution time, around 60 times slower than the efficient code. The np.bool_ case could be identified and the efficient code applied; otherwise, in serious cases of scale, the current code is too slow to be practically usable.

Versions

All - including 0.21

@GregoryMorse GregoryMorse changed the title metrics.confusion_matrix far too slow for binary cases metrics.confusion_matrix far too slow for Boolean cases Oct 29, 2019
@rth
Member

rth commented Oct 29, 2019

Thanks for the report! What data size are you using? With n_samples=4096 from the example, it takes 2 ms on my laptop per iteration. With n_samples = 10_000_000 it takes 6 s per iteration, which is still not that slow.

We can make a specialized, faster solution for booleans, but even better would be to improve the current generic one. I have quickly looked at the code and couldn't see obvious bottlenecks; the list comprehension is not necessarily the culprit, though maybe we can do the same things with numpy functions. Profiling the code would be a good start. Would you be interested in investigating?
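
For example, a minimal profiling sketch using only the standard library's cProfile (reusing a and b from your snippet; the output file name is arbitrary):

import cProfile
import pstats

# profile one call and print the ten most expensive functions by cumulative time
cProfile.run("confusion_matrix(a, b)", "confmat.prof")
pstats.Stats("confmat.prof").sort_stats("cumulative").print_stats(10)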

@GregoryMorse
Contributor Author

GregoryMorse commented Oct 30, 2019

I was using it with batches of about 4096, but calling it a great many times as part of a modeling algorithm for generic detection of Boolean formulas (a sort of variant of a decision tree classifier). 10,000,000 samples once is quite different from 1,000 samples 10,000 times, since the bottleneck can grow or shrink with repeated calls depending on the code path. I noticed that whenever I Ctrl+C'ed, execution was always sitting in confusion_matrix (a sort of poor man's bottleneck finder while debugging), always in a list comprehension. Seriously empirical data :). Anyway, I swapped it out with the code mentioned, and it dramatically improved performance. We could compare, but without any doubt the numpy code is faster.

Of course generic functions are bound to be slower than specially optimized variants. But since the data type is already inspected, and this simple case is easy to detect, I raised the issue. I feel that passing proper Boolean values is a primary use case and not worth overlooking.

In fact, after studying the code, I do not see where the work is being done in the function. I should have paid more attention to where that Ctrl+C was landing.

A couple of questions: what is the typical easy and efficient approach to performance profiling code in Python? Beyond simplistic line-by-line time measurements, are there any tool suggestions? Also, would modifying the scikit-learn code require a virtual environment to avoid interfering with the primary installation?

@GregoryMorse
Contributor Author

GregoryMorse commented Oct 30, 2019

Update: the 1-logical-AND, 3-sums version has been edited into the post above since it is even more efficient; the execution time difference is also now known exactly.

from sklearn.metrics import confusion_matrix
import numpy as np
import datetime
def conf_mat(x, y):
    totaltrue = np.sum(x)
    return conf_mat_opt(x, y, totaltrue, len(x) - totaltrue)

def conf_mat_opt(x, y, totaltrue, totalfalse):
    truepos, totalpos = np.sum(x & y), np.sum(y)
    falsepos = totalpos - truepos
    return np.array([[totalfalse - falsepos, falsepos], #true negatives, false positives
                     [totaltrue - truepos, truepos]]) #false negatives, true positives

def conf_mat_slower(x, y):
    return np.array([[np.sum(~x & ~y), np.sum(~x & y)], #true negatives, false positives
                     [np.sum(x & ~y), np.sum(x & y)]]) #false negatives, true positives

N = 4096
p = 0.5
a = np.random.choice(a=[False, True], size=(N), p=[p, 1-p])
b = np.random.choice(a=[False, True], size=(N), p=[p, 1-p])
t = datetime.datetime.now()
for i in range(1024): _ = confusion_matrix(a, b)

print(int((datetime.datetime.now() - t).total_seconds() * 1000))
t = datetime.datetime.now()
for i in range(1024): _ = conf_mat_slower(a, b)

print(int((datetime.datetime.now() - t).total_seconds() * 1000))

t = datetime.datetime.now()
for i in range(1024): _ = conf_mat(a, b)

print(int((datetime.datetime.now() - t).total_seconds() * 1000))

Output:

2718
52
46

Removing 3 logical ANDs, 4 logical negations and a sum operation hardly makes a difference in numpy.

2718/46=59 times faster...

As for doing large batches:

N = 4096
p = 0.5
a = np.random.choice(a=[False, True], size=(N), p=[p, 1-p])
b = np.random.choice(a=[False, True], size=(N), p=[p, 1-p])
t = datetime.datetime.now()
for i in range(1024): _ = confusion_matrix(a, b)

print(int((datetime.datetime.now() - t).total_seconds() * 1000))

N = 4096*1024
a = np.random.choice(a=[False, True], size=(N), p=[p, 1-p])
b = np.random.choice(a=[False, True], size=(N), p=[p, 1-p])
t = datetime.datetime.now()
_ = confusion_matrix(a, b)

print(int((datetime.datetime.now() - t).total_seconds() * 1000))

I see:

2368
2640

So apparently it is slightly faster on many small batches than on one large batch of the same total size, but still many times slower than native numpy vectorized operations; list comprehensions have no way to match that speed.

@jnothman
Member

Also consider, if x and y are Boolean:

confusion = np.bincount(y_true * 2 + y_pred, minlength=4).reshape(2, 2)

@GregoryMorse
Contributor Author

GregoryMorse commented Oct 30, 2019

Using Ctrl+C with:

for i in range(1024*1024): _ = confusion_matrix(a, b)
  File "C:\Program Files\Python37\lib\site-packages\sklearn\metrics\classification.py", line 274, in confusion_matrix
    y_pred = np.array([label_to_ind.get(x, n_labels + 1) for x in y_pred])
  File "C:\Program Files\Python37\lib\site-packages\sklearn\metrics\classification.py", line 275, in <listcomp>
    y_true = np.array([label_to_ind.get(x, n_labels + 1) for x in y_true])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python37\lib\site-packages\sklearn\metrics\classification.py", line 258, in confusion_matrix
    labels = unique_labels(y_true, y_pred)
  File "C:\Program Files\Python37\lib\site-packages\sklearn\utils\multiclass.py", line 96, in unique_labels
    ys_labels = set(chain.from_iterable(_unique_labels(y) for y in ys))
  File "C:\Program Files\Python37\lib\site-packages\sklearn\utils\multiclass.py", line 96, in <genexpr>
    ys_labels = set(chain.from_iterable(_unique_labels(y) for y in ys))
  File "C:\Program Files\Python37\lib\site-packages\sklearn\utils\multiclass.py", line 24, in _unique_multiclass
    return np.unique(np.asarray(y))
  File "<__array_function__ internals>", line 6, in unique
  File "C:\Program Files\Python37\lib\site-packages\numpy\lib\arraysetops.py", line 262, in unique
    ret = _unique1d(ar, return_index, return_inverse, return_counts)
  File "C:\Program Files\Python37\lib\site-packages\numpy\lib\arraysetops.py", line 310, in _unique1d
    ar.sort()

So, based on this rather silly performance testing (nevertheless legitimate, and highly useful in the past for dramatically reducing bottlenecks), the list comprehension is one of the key bottlenecks compared to numpy arrays, along with iterating over the labels when they are known to be Booleans.

@jnothman
Member

Consider using IPython's %timeit to benchmark.

@GregoryMorse
Contributor Author

GregoryMorse commented Oct 30, 2019

Thanks for the advice, will look into ipython. In the meantime:

t = datetime.datetime.now()
for i in range(1024): _ = np.bincount(a * 2 + b, minlength=4).reshape(2, 2)

print(int((datetime.datetime.now() - t).total_seconds() * 1000))

31 ms, so around 33% faster again; thank you very much for that serious improvement!!! A rather ingenious one-liner solution 👍 Obviously the bincount vectorizes better than 3 separate sums, though I would think multiplication/addition and logical AND should be equivalent; but 3 vector operations are better than 4, not to mention avoiding the assignment/array-construction code. I think enough info is here now to make the PR. I will see about it.
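
As a sanity check, a * 2 + b maps TN to 0, FP to 1, FN to 2 and TP to 3 (with a as y_true), so the result matches confusion_matrix exactly; a minimal sketch reusing a and b from above:

# bin indices: 0 = TN (a=0,b=0), 1 = FP (a=0,b=1), 2 = FN (a=1,b=0), 3 = TP (a=1,b=1)
assert np.array_equal(np.bincount(a * 2 + b, minlength=4).reshape(2, 2),
                      confusion_matrix(a, b))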

@jnothman
Member

jnothman commented Oct 30, 2019 via email

@jnothman
Member

jnothman commented Oct 30, 2019 via email

@GregoryMorse
Contributor Author

GregoryMorse commented Oct 30, 2019

Yes, I realize that a generic solution cannot be as efficient as a specially built one. But since the data type is detected (at least it appeared to be in the code), I think this particular case could be handled separately using your optimized code.

Also a good point, so more detailed profiling might be needed. But anything that is not vectorizable and requires iteration is the most likely candidate for optimization.

I confirmed that Boolean AND and multiplication have equal speed in numpy. I am quite sure all Boolean and basic algebraic operations have processor intrinsics for fast execution, though I am not sure that is how numpy implements them. So the number of vector operations is the key metric here.

@GregoryMorse
Contributor Author

GregoryMorse commented Oct 30, 2019

For the curious, this scenario comes from a binary classification problem. I want to take a Boolean matrix and classify data against a Boolean vector. However, here is where it gets interesting. Obviously decision trees come to mind immediately. But the goal is not to strictly classify the data, as the dataset is far too chaotic for that (even GridSearchCV with DecisionTreeClassifier at best scores 52% when testing the model, even though you can of course overfit the whole dataset to 100% easily without testing the model).

Rather, the goal is to find Boolean combinations (AND/OR) of columns in the matrix which can predict the positives above some rate, say 75%, and another set of combinations to predict the negatives above 75%. Everything else is considered a gray area. It amounts to 3-label classification, but not quite, as the labels are not known in advance. As far as I know, scikit-learn has no model to accomplish such a task. I simply started ANDing together columns to increase the positive rate, then ORing those together to maximize the total true positives. Hence I decided to use the confusion matrix to calculate the positive rate along the way. There is probably a better way to do this, and it would be highly useful to know, since it is painfully slow.

@GregoryMorse
Contributor Author

GregoryMorse commented Oct 31, 2019

I probably should open another issue, since this is fundamentally different. Using pure Boolean values is pretty limiting when dealing with very large data due to memory waste: a np.bool_ array uses 1 byte per Boolean, which, although better than 4 or 8, is still an 8-fold waste.

It is easy enough to pack a Boolean np.array into bytes (np.uint8):

def boolarr_tobytes(arr):
    rem = len(arr) % 8
    if rem != 0: arr = np.concatenate((arr, np.zeros(8 - rem, dtype=np.bool_)))
    arr = np.reshape(arr, (len(arr) // 8, 8))
    return np.packbits(arr) #translates booleans to bytes; array shape (n, 8), high bits first

Then & and | already provide the bitwise operations, and np.sum can be replaced with:

bytebitcounts = np.array([bin(x).count("1") for x in range(256)])
def totalbits_bytearr(arr):
    return np.sum(bytebitcounts[arr])
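
A quick sketch showing the packed representation agrees with the unpacked one (assuming a and b are the Boolean arrays from the earlier snippets):

packed_a, packed_b = boolarr_tobytes(a), boolarr_tobytes(b)
# bitwise AND on the packed bytes, then popcount via the lookup table
assert totalbits_bytearr(packed_a & packed_b) == np.sum(a & b)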

Now I am assuming that the byte table lookup is properly vectorized; since such lookups are used heavily in image processing, I would imagine it is. This is 2 vector operations (the table lookup plus np.sum) instead of 1 np.sum. PSHUFB (packed shuffle bytes) is the processor intrinsic that can do byte-table lookup translation. However, since AVX/SSE2-style instructions process a fixed register width, the packed representation needs 8 times fewer vector operations over the same data; 2 operations on one-eighth of the data versus 1 operation on the full data is still 4 times faster.

So if scikit-learn dared to add a packed-byte representation of Booleans (admittedly a major change to implement), it would decrease memory 8-fold and speed up vector operations roughly 8-fold, except where bit twiddling like the above is needed, where it would still tend to be faster depending on the specific operation.

I can see no reason why this would not be highly desirable for the library especially since large datasets are pretty typical.

I will open a feature request I suppose to continue this discussion.

@jnothman
Member

jnothman commented Nov 1, 2019 via email

@GregoryMorse
Contributor Author

GregoryMorse commented Nov 1, 2019

Yes, the primary problem would be endless indexing oddities like (1 << bitoffset) & value to read a bit, along with value |= (1 << bitoffset) to set one. But a lot of operations are already implicitly supported.
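
For illustration, hypothetical get/set helpers (the names are mine, just a sketch) showing the kind of twiddling involved, assuming the high-bits-first layout np.packbits produces:

def get_bit(packed, i): # read bit i from a packed uint8 array
    return (packed[i // 8] >> (7 - i % 8)) & 1

def set_bit(packed, i): # set bit i in a packed uint8 array
    packed[i // 8] |= 1 << (7 - i % 8)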

I suppose this first belongs over at numpy as a feature request before it moves here :-). Thanks for the advice. Half of the operations are probably trivial, like the multiplication and addition shown, and a few would require some real thinking. Upon further thought, if numpy added the datatype, perhaps no work at all would be required on this side. I had not thought of it this way (now at numpy/numpy#14821).

It would make these Python libraries as flexibly scalable as C, which would be impressive.

@GregoryMorse
Contributor Author

GregoryMorse commented Nov 1, 2019

@jnothman Apparently we can do even better (at least twice as fast)... in fact, milliseconds are almost no longer an appropriate timing unit; microseconds might be better:

import numba
@numba.guvectorize([(numba.boolean[:], numba.boolean[:], numba.int64[:], numba.int64[:])], '(n),(n),(p)->(p)', nopython=True)
def fastbincount(x, y, dim, res):
    res[0], res[1], res[2], res[3] = 0, 0, 0, 0
    for i in range(x.shape[0]):
        res[x[i] * 2 + y[i]] += 1

t = datetime.datetime.now()
for i in range(1024): _ = fastbincount(a, b, np.array([0, 0, 0, 0])).reshape(2,2)
print(int((datetime.datetime.now() - t).total_seconds() * 1000))
#9

I cannot get frozen dimensions to work (so '(4)' in the signature fails, though I read it should work as of numpy 1.16; see numba/numba#1668, but it is still not implemented), so I have to pass a size placeholder. Below I use a 2x2 np.array as direct output, but since Booleans won't index without at least casting, it is basically the same speed as bincount, so it is best to do the math with a one-dimensional vector.

import numba
@numba.guvectorize([(numba.boolean[:], numba.boolean[:], numba.int64[:,:], numba.int64[:,:])], '(n),(n),(p,q)->(p,q)', nopython=True)
def fastbincount(x, y, dim, res):
    res[0,0], res[1,0], res[0,1], res[1,1] = 0, 0, 0, 0
    for i in range(x.shape[0]):
        res[int(x[i])][int(y[i])] += 1
        

t = datetime.datetime.now()
for i in range(1024): _ = fastbincount(a, b, np.array([[0, 0], [0, 0]]))
print(int((datetime.datetime.now() - t).total_seconds() * 1000))
#24

I suppose numba is not used in scikit-learn currently; is there any good reason not to go this route? E.g. platform support (since it compiles to native code through a complicated LLVM optimization pipeline), or the extra dependency (though it is tightly coupled to numpy, which is already required).

I doubt further optimization is possible; a single multiplication and addition per element is really hard to beat unless there is some other Boolean comparison trick.

@jnothman
Member

jnothman commented Nov 1, 2019 via email

@GregoryMorse
Contributor Author

GregoryMorse commented Nov 3, 2019

Just for fun, here is the best solution I could find for packed-byte confusion matrices, even though I understand such a format is not currently properly supported by numpy, where such a change would need to start. The first approach is similar to bincount but uses vectorized bit twiddling; the other approach uses 3 vectorized bit sums and a bitwise AND. I was thinking of trying a 64k byte-pair lookup table mapping to 2+2+2+2 bits, though the whole idea of reducing memory consumption starts to be defeated by such strategies, and that lookup also requires bit twiddling unless it is made 256k to yield bytes instead of 2-bit groups. Update: the best approach found so far uses simple byte-to-bit-component lookup tables requiring only 256*8 = 2k bytes. It halves the time spent on all the twiddling.

import timeit
import numpy as np
import numba
N = 4096
p = 0.5
a = np.random.choice(a=[False, True], size=(N), p=[p, 1-p])
b = np.random.choice(a=[False, True], size=(N), p=[p, 1-p])
placehold = np.array([0, 0, 0, 0], dtype=np.int64)
bitextractor = np.array([[x & 1, (x & 2) >> 1, (x & 4) >> 2, (x & 8) >> 3, (x & 16) >> 4, (x & 32) >> 5, (x & 64) >> 6, (x & 128) >> 7] for x in range(256)], dtype=np.uint8)
@numba.guvectorize([(numba.uint8[:], numba.uint8[:], numba.int64[:], numba.int64[:])], '(n),(n),(p)->(p)', nopython=True)
def fastpackedbincount(x, y, dim, res):
    res[0], res[1], res[2], res[3] = 0, 0, 0, 0
    for i in range(x.shape[0]):
        for j in range(8):
            res[bitextractor[x[i],j] * 2 + bitextractor[y[i],j]] += 1
def conf_mat_packedfast(x, y, sz):
    cm = fastpackedbincount(x, y, placehold).reshape(2, 2)
    cm[0,0] -= (len(x) * 8 - sz)
    return cm
@numba.guvectorize([(numba.uint8[:], numba.uint8[:], numba.int64[:], numba.int64[:])], '(n),(n),(p)->(p)', nopython=True)
def packedbincount(x, y, dim, res):
    res[0], res[1], res[2], res[3] = 0, 0, 0, 0
    for i in range(x.shape[0]):
        res[((x[i] & 1) << 1) + (y[i] & 1)] += 1
        res[(x[i] & 2) + ((y[i] & 2) >> 1)] += 1
        res[((x[i] & 4) >> 1) + ((y[i] & 4) == 4)] += 1
        res[((x[i] & 8) >> 2) + ((y[i] & 8) == 8)] += 1
        res[((x[i] & 0x10) >> 3) + ((y[i] & 0x10) == 0x10)] += 1
        res[((x[i] & 0x20) >> 4) + ((y[i] & 0x20) == 0x20)] += 1
        res[((x[i] & 0x40) >> 5) + ((y[i] & 0x40) == 0x40)] += 1
        res[((x[i] & 0x80) >> 6) + (y[i] >> 7)] += 1
def conf_mat_packed(x, y, sz):
    cm = packedbincount(x, y, placehold).reshape(2, 2)
    cm[0,0] -= (len(x) * 8 - sz)
    return cm
bytebitcounts = np.array([bin(x).count("1") for x in range(256)], dtype=np.uint64)
def totalbits_bytearr(arr):
    return np.sum(bytebitcounts[arr])
@numba.guvectorize([(numba.uint8[:], numba.int64[:])], '(n)->()', nopython=True)
def sum_bits(x, res):
    res[0] = 0
    for xi in x:
        res[0] += bytebitcounts[xi]
@numba.guvectorize([(numba.uint8[:], numba.int64[:])], '(n)->()', nopython=True)
def sum_bits_fast(x, res):
    res[0] = 0
    for xi in x:
        z = (xi & 0x55) + (xi >> 1 & 0x55)
        z = (z & 0x33) + (z >> 2 & 0x33)
        res[0] += (z & 0x0f) + (z >> 4 & 0x0f)
def conf_mat_packedopt(x, y, totaltrue, totalfalse):
    truepos, totalpos = sum_bits(x & y), sum_bits(y)
    falsepos = totalpos - truepos
    return np.array([[totalfalse - falsepos, falsepos], #true negatives, false positives
                            [totaltrue - truepos, truepos]]) #false negatives, true positives
def conf_mat_packedsum(x, y, sz):
    totaltrue = sum_bits(x)
    return conf_mat_packedopt(x, y, totaltrue, sz - totaltrue)
def boolarr_tobytes(arr):
    rem = len(arr) % 8
    if rem != 0: arr = np.concatenate((arr, np.zeros(8 - rem, dtype=np.bool_)))
    arr = np.reshape(arr, (int(len(arr) / 8), 8))
    return np.packbits(arr) #translates boolean to bytes if array shape (n, 8) with high bits first
a, b = boolarr_tobytes(a), boolarr_tobytes(b)
assert(np.array_equal(conf_mat_packed(a, b, N), conf_mat_packedsum(a, b, N)))
assert(np.array_equal(conf_mat_packed(a, b, N), conf_mat_packedfast(a, b, N)))
%timeit _ = conf_mat_packed(a, b, N)
%timeit _ = conf_mat_packedsum(a, b, N)
%timeit _ = conf_mat_packedfast(a, b, N)



assert(sum_bits(a) == totalbits_bytearr(a))
assert(sum_bits_fast(a) == totalbits_bytearr(a))
%timeit _ = sum_bits(a)
%timeit _ = sum_bits_fast(a)
%timeit _ = totalbits_bytearr(a)

7.2 µs ± 117 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
8.61 µs ± 160 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
8.1 µs ± 104 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

1.27 µs ± 8.45 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.91 µs ± 15.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
9.31 µs ± 141 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

So the excessive bit twiddling basically does not cancel the advantage of bincount and a table lookup. Incidentally, the speed is comparable to the above result for guvectorized bincount.

For now, the fastest known Pythonic approach without direct use of C is bit lookup tables, unless I can find something faster :-).

@dyollb

dyollb commented Sep 24, 2021

I am evaluating 3D ML-based segmentation predictions and was looking for a fast confusion matrix implementation. My observation was that sklearn.metrics.confusion_matrix is quite slow; in fact, it is slower than loading the data and doing inference with a UNet.

I did a comparison of different ways to compute the confusion matrix:

import numpy as np
from numba import njit, generated_jit, types
from sklearn.metrics import confusion_matrix as sk_confusion_matrix
import pandas
from timeit import default_timer as timer

def compute_confusion_naive(a: np.ndarray, b: np.ndarray, num_classes: int = 0):
    if num_classes < 1:
        num_classes = max(np.max(a), np.max(b)) + 1
    cm = np.zeros((num_classes, num_classes))
    for i in range(a.shape[0]):
        cm[a[i], b[i]] += 1
    return cm


def compute_confusion_zip(a: np.ndarray, b: np.ndarray, num_classes: int = 0):
    if num_classes < 1:
        num_classes = max(np.max(a), np.max(b)) + 1
    cm = np.zeros((num_classes, num_classes))
    for ai, bi in zip(a, b):
        cm[ai, bi] += 1
    return cm


@njit
def compute_confusion_numba(a: np.ndarray, b: np.ndarray, num_classes: int = 0):
    if num_classes < 1:
        num_classes = max(np.max(a), np.max(b)) + 1
    cm = np.zeros((num_classes, num_classes))
    for i in range(a.shape[0]):
        cm[a[i], b[i]] += 1
    return cm


def compute_confusion_sklearn(a: np.ndarray, b: np.ndarray):
    return sk_confusion_matrix(a, b)


def compute_confusion_pandas(a: np.ndarray, b: np.ndarray):
    return pandas.crosstab(pandas.Series(a), pandas.Series(b))


if __name__ == "__main__":
    A = np.random.randint(15, size=310*310*310)
    B = np.random.randint(15, size=310*310*310)

    start = timer()
    cm1 = compute_confusion_naive(A, B)
    end = timer()
    print("Naive: %g s" % (end-start))

    start = timer()
    cm1 = compute_confusion_zip(A, B)
    end = timer()
    print("Naive-Zip: %g s" % (end-start))

    start = timer()
    cm1 = compute_confusion_sklearn(A, B)
    end = timer()
    print("sklearn: %g s" % (end-start))

    start = timer()
    cm1 = compute_confusion_numba(A, B, 0)
    end = timer()
    print("Numba: %g s" % (end-start))

    start = timer()
    cm1 = compute_confusion_pandas(A, B)
    end = timer()
    print("pandas: %g s" % (end-start))

The results are:

Naive: 18.6546 s
Naive-Zip: 17.86 s
sklearn: 18.5911 s
Numba: 0.674944 s
pandas: 5.81173 s

The timing for the numba implementation can be optimized further (by half) if num_classes is known, using dispatch via generated_jit to skip computing the max of a and b.
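
For reference, a minimal sketch of the fixed-num_classes variant (a plain njit version rather than the generated_jit dispatch mentioned; num_classes is assumed to be supplied by the caller):

from numba import njit
import numpy as np

@njit
def compute_confusion_known(a: np.ndarray, b: np.ndarray, num_classes: int):
    # num_classes is known, so the np.max scan over a and b is skipped entirely
    cm = np.zeros((num_classes, num_classes))
    for i in range(a.shape[0]):
        cm[a[i], b[i]] += 1
    return cm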

@lucyleeow
Member

@jeremiedbb do you think we should close this? As you established in #28578, using an alternative implementation for binary cases does not seem to be the way to improve performance here.
Also, #26820 is working on confusion matrix performance.

@adrinjalali
Member

good point, closing as duplicate of #26808

@adrinjalali closed this as not planned Apr 23, 2024