metrics.confusion_matrix far too slow for Boolean cases #15388


Closed
GregoryMorse opened this issue Oct 28, 2019 · 20 comments

@GregoryMorse
Contributor

GregoryMorse commented Oct 28, 2019

Description

When using metrics.confusion_matrix with np.bool_ inputs (i.e. only True/False values), it is far faster to avoid the list comprehensions the current code uses. numpy's sum and Boolean logic functions handle this case very efficiently and scale far better.
The code is in https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/metrics/_classification.py; it uses list comprehensions and never checks for np.bool_ types, which would allow skipping the excessive generic code. This is a very common and reasonable use case in practice. The examples below assume no normalization and no sample weights, though those could also be dealt with efficiently.

Steps/Code to Reproduce

import numpy as np
from sklearn.metrics import confusion_matrix

N = 4096
p = 0.5
a = np.random.choice(a=[False, True], size=N, p=[p, 1-p])
b = np.random.choice(a=[False, True], size=N, p=[p, 1-p])
for i in range(1024): confusion_matrix(a, b)

Expected Results

Fast execution time.
E.g., substituting confusion_matrix with this conf_mat (not as efficient as possible, but easier to read, and still more efficient than the current library even with 4 sums, 4 logical ANDs and 4 logical NOTs):

        def conf_mat(x, y):
            return np.array([[np.sum(~x & ~y), np.sum(~x & y)], #true negatives, false positives
                             [np.sum(x & ~y), np.sum(x & y)]]) #false negatives, true positives

Or even faster, with 1 logical AND and 3 sums:

        def conf_mat(x, y):
            truepos, totalpos, totaltrue = np.sum(x & y), np.sum(y), np.sum(x)
            totalfalse = len(x) - totaltrue
            falsepos = totalpos - truepos
            return np.array([[totalfalse - falsepos, falsepos], #true negatives, false positives
                             [totaltrue - truepos, truepos]]) #false negatives, true positives

Actual Results

Slow execution time, around 60 times slower than the efficient code. The np.bool_ case could be identified and the efficient code applied; otherwise, in serious cases of scale, the current code is too slow to be practically usable.

Versions

All - including 0.21

@GregoryMorse GregoryMorse changed the title metrics.confusion_matrix far too slow for binary cases metrics.confusion_matrix far too slow for Boolean cases Oct 29, 2019
@rth
Member

rth commented Oct 29, 2019

Thanks for the report! What data size are you using? With n_samples=4096 from the example, it takes 2 ms on my laptop per iteration. With n_samples = 10_000_000 it takes 6 s per iteration, which is still not that slow.

We can make a specialized, faster solution for booleans, but even better would be to improve the current generic one. I have quickly looked at the code and couldn't see obvious bottlenecks; the list comprehension is not necessarily the culprit, though maybe we can do the same things with numpy functions. Profiling the code would be a good start. Would you be interested in investigating?
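
For example, a minimal profiling sketch using only the standard library's cProfile (reusing a and b from your snippet; the output file name is arbitrary):

import cProfile
import pstats

# profile one call and print the ten most expensive functions by cumulative time
cProfile.run("confusion_matrix(a, b)", "confmat.prof")
pstats.Stats("confmat.prof").sort_stats("cumulative").print_stats(10)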

@GregoryMorse
Contributor Author

GregoryMorse commented Oct 30, 2019

I was using it with batches of about 4096, but calling it a great many times as part of a modeling algorithm for generic detection of Boolean formulas (a sort of variant of a decision tree classifier). 10,000,000 samples once is quite different from 1,000 samples 10,000 times, since the bottleneck can grow or shrink with repeated calls depending on the code path. I noticed that whenever I Ctrl+C'ed, execution was always sitting in confusion_matrix (a sort of poor man's bottleneck finder while debugging), always in a list comprehension. Seriously empirical data :). Anyway, I swapped it out with the code mentioned, and it dramatically improved performance. We could compare, but without any doubt the numpy code is faster.

Of course generic functions are bound to be slower than specially optimized variants. But since the data type is already inspected, and this simple case is easy to detect, I raised the issue. I feel that passing proper Boolean values is a primary use case and not worth overlooking.

In fact, after studying the code, I do not see where the work is being done in the function. I should have paid more attention to where that Ctrl+C was landing.

A couple of questions: what is the typical easy and efficient approach to performance profiling code in Python? Beyond simplistic line-by-line time measurements, are there any tool suggestions? Also, would modifying the scikit-learn code require a virtual environment to avoid interfering with the primary installation?

@GregoryMorse
Contributor Author

GregoryMorse commented Oct 30, 2019

Update: the 1-logical-AND, 3-sums version has been edited into the post above since it is even more efficient; the execution time difference is also now known exactly.

from sklearn.metrics import confusion_matrix
import numpy as np
import datetime
def conf_mat(x, y):
    totaltrue = np.sum(x)
    return conf_mat_opt(x, y, totaltrue, len(x) - totaltrue)

def conf_mat_opt(x, y, totaltrue, totalfalse):
    truepos, totalpos = np.sum(x & y), np.sum(y)
    falsepos = totalpos - truepos
    return np.array([[totalfalse - falsepos, falsepos], #true negatives, false positives
                     [totaltrue - truepos, truepos]]) #false negatives, true positives

def conf_mat_slower(x, y):
    return np.array([[np.sum(~x & ~y), np.sum(~x & y)], #true negatives, false positives
                     [np.sum(x & ~y), np.sum(x & y)]]) #false negatives, true positives

N = 4096
p = 0.5
a = np.random.choice(a=[False, True], size=(N), p=[p, 1-p])
b = np.random.choice(a=[False, True], size=(N), p=[p, 1-p])
t = datetime.datetime.now()
for i in range(1024): _ = confusion_matrix(a, b)

print(int((datetime.datetime.now() - t).total_seconds() * 1000))
t = datetime.datetime.now()
for i in range(1024): _ = conf_mat_slower(a, b)

print(int((datetime.datetime.now() - t).total_seconds() * 1000))

t = datetime.datetime.now()
for i in range(1024): _ = conf_mat(a, b)

print(int((datetime.datetime.now() - t).total_seconds() * 1000))

Output:

2718
52
46

Removing 3 logical ANDs, 4 logical negations and a sum operation hardly makes a difference in numpy.

2718/46=59 times faster...

As for doing large batches:

N = 4096
p = 0.5
a = np.random.choice(a=[False, True], size=(N), p=[p, 1-p])
b = np.random.choice(a=[False, True], size=(N), p=[p, 1-p])
t = datetime.datetime.now()
for i in range(1024): _ = confusion_matrix(a, b)

print(int((datetime.datetime.now() - t).total_seconds() * 1000))

N = 4096*1024
a = np.random.choice(a=[False, True], size=(N), p=[p, 1-p])
b = np.random.choice(a=[False, True], size=(N), p=[p, 1-p])
t = datetime.datetime.now()
_ = confusion_matrix(a, b)

print(int((datetime.datetime.now() - t).total_seconds() * 1000))

I see:

2368
2640

So apparently it is slightly faster on many small batches than on one large batch of the same total size, but still many times slower than native numpy vectorized operations; list comprehensions have no way to match that speed.

@jnothman
Member

Also consider, if x and y are Boolean:

confusion = np.bincount(y_true * 2 + y_pred, minlength=4).reshape(2, 2)

@GregoryMorse
Contributor Author

GregoryMorse commented Oct 30, 2019

Using Ctrl+C with:

for i in range(1024*1024): _ = confusion_matrix(a, b)
  File "C:\Program Files\Python37\lib\site-packages\sklearn\metrics\classification.py", line 274, in confusion_matrix
    y_pred = np.array([label_to_ind.get(x, n_labels + 1) for x in y_pred])
  File "C:\Program Files\Python37\lib\site-packages\sklearn\metrics\classification.py", line 275, in <listcomp>
    y_true = np.array([label_to_ind.get(x, n_labels + 1) for x in y_true])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python37\lib\site-packages\sklearn\metrics\classification.py", line 258, in confusion_matrix
    labels = unique_labels(y_true, y_pred)
  File "C:\Program Files\Python37\lib\site-packages\sklearn\utils\multiclass.py", line 96, in unique_labels
    ys_labels = set(chain.from_iterable(_unique_labels(y) for y in ys))
  File "C:\Program Files\Python37\lib\site-packages\sklearn\utils\multiclass.py", line 96, in <genexpr>
    ys_labels = set(chain.from_iterable(_unique_labels(y) for y in ys))
  File "C:\Program Files\Python37\lib\site-packages\sklearn\utils\multiclass.py", line 24, in _unique_multiclass
    return np.unique(np.asarray(y))
  File "<__array_function__ internals>", line 6, in unique
  File "C:\Program Files\Python37\lib\site-packages\numpy\lib\arraysetops.py", line 262, in unique
    ret = _unique1d(ar, return_index, return_inverse, return_counts)
  File "C:\Program Files\Python37\lib\site-packages\numpy\lib\arraysetops.py", line 310, in _unique1d
    ar.sort()

So, based on this rather silly performance testing (nevertheless legitimate, and highly useful in the past for dramatically reducing bottlenecks), the list comprehension is one of the key bottlenecks compared to numpy arrays, along with iterating over the labels when they are known to be Booleans.

@jnothman
Member

Consider using IPython's %timeit to benchmark.

@GregoryMorse
Contributor Author

GregoryMorse commented Oct 30, 2019

Thanks for the advice, will look into ipython. In the meantime:

t = datetime.datetime.now()
for i in range(1024): _ = np.bincount(a * 2 + b, minlength=4).reshape(2, 2)

print(int((datetime.datetime.now() - t).total_seconds() * 1000))

31 ms, so around 33% faster again; thank you very much for that serious improvement!!! A rather ingenious one-liner solution 👍 Obviously the bincount vectorizes better than 3 separate sums, though I would think multiplication/addition and logical AND should be equivalent; but 3 vector operations are better than 4, not to mention avoiding the assignment/array-construction code. I think enough info is here now to make the PR. I will see about it.
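
As a sanity check, a * 2 + b maps TN to 0, FP to 1, FN to 2 and TP to 3 (with a as y_true), so the result matches confusion_matrix exactly; a minimal sketch reusing a and b from above:

# bin indices: 0 = TN (a=0,b=0), 1 = FP (a=0,b=1), 2 = FN (a=1,b=0), 3 = TP (a=1,b=1)
assert np.array_equal(np.bincount(a * 2 + b, minlength=4).reshape(2, 2),
                      confusion_matrix(a, b))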

@jnothman
Member

jnothman commented Oct 30, 2019 via email

@jnothman
Member

jnothman commented Oct 30, 2019 via email

@GregoryMorse
Contributor Author

GregoryMorse commented Oct 30, 2019

Yes, I realize that a generic solution cannot be as efficient as a specially built one. But since the data type is detected (at least it appeared to be in the code), I think this particular case could be handled separately using your optimized code.

Also a good point, so more detailed profiling might be needed. But anything that is not vectorizable and requires iteration is the most likely candidate for optimization.

I confirmed that Boolean AND and multiplication have equal speed in numpy. I am quite sure all Boolean and basic algebraic operations have processor intrinsics for fast execution, though I am not sure that is how numpy implements them. So the number of vector operations is the key metric here.

@GregoryMorse
Contributor Author

GregoryMorse commented Oct 30, 2019

For the curious, this scenario comes from a binary classification problem. I want to take a Boolean matrix and classify data against a Boolean vector. However, here is where it gets interesting. Obviously decision trees come to mind immediately. But the goal is not to strictly classify the data, as the dataset is far too chaotic for that (even GridSearchCV with DecisionTreeClassifier at best scores 52% when testing the model, even though you can of course overfit the whole dataset to 100% easily without testing the model).

Rather, the goal is to find Boolean combinations (AND/OR) of columns in the matrix which can predict the positives above some rate, say 75%, and another set of combinations to predict the negatives above 75%. Everything else is considered a gray area. It amounts to 3-label classification, but not quite, as the labels are not known in advance. As far as I know, scikit-learn has no model to accomplish such a task. I simply started ANDing together columns to increase the positive rate, then ORing those together to maximize the total true positives. Hence I decided to use the confusion matrix to calculate the positive rate along the way. There is probably a better way to do this, and it would be highly useful to know, since it is painfully slow.

@GregoryMorse
Contributor Author

GregoryMorse commented Oct 31, 2019

I probably should open another issue, since this is fundamentally different. Using pure Boolean values is pretty limiting when dealing with very large data due to memory waste: a np.bool_ array uses 1 byte per Boolean, which, although better than 4 or 8, is still an 8-fold waste.

It is easy enough to pack a Boolean np.array into bytes (np.uint8):

def boolarr_tobytes(arr):
    rem = len(arr) % 8
    if rem != 0: arr = np.concatenate((arr, np.zeros(8 - rem, dtype=np.bool_)))
    arr = np.reshape(arr, (len(arr) // 8, 8))
    return np.packbits(arr) #translates booleans to bytes; array shape (n, 8), high bits first

Then & and | already provide the bitwise operations, and np.sum can be replaced with:

bytebitcounts = np.array([bin(x).count("1") for x in range(256)])
def totalbits_bytearr(arr):
    return np.sum(bytebitcounts[arr])
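
A quick sketch showing the packed representation agrees with the unpacked one (assuming a and b are the Boolean arrays from the earlier snippets):

packed_a, packed_b = boolarr_tobytes(a), boolarr_tobytes(b)
# bitwise AND on the packed bytes, then popcount via the lookup table
assert totalbits_bytearr(packed_a & packed_b) == np.sum(a & b)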

Now I am assuming that the byte table lookup is properly vectorized; since such lookups are used heavily in image processing, I would imagine it is. This is 2 vector operations (the table lookup plus np.sum) instead of 1 np.sum. PSHUFB (packed shuffle bytes) is the processor intrinsic that can do byte-table lookup translation. However, since AVX/SSE2-style instructions process a fixed register width, the packed representation needs 8 times fewer vector operations over the same data; 2 operations on one-eighth of the data versus 1 operation on the full data is still 4 times faster.

So if scikit-learn dared to add a packed-byte representation of Booleans (admittedly a major change to implement), it would decrease memory 8-fold and speed up vector operations roughly 8-fold, except where bit twiddling like the above is needed, where it would still tend to be faster depending on the specific operation.

I can see no reason why this would not be highly desirable for the library especially since large datasets are pretty typical.

I will open a feature request I suppose to continue this discussion.

@jnothman
Member

jnothman commented Nov 1, 2019 via email

@GregoryMorse
Contributor Author

GregoryMorse commented Nov 1, 2019

Yes, the primary problem would be endless indexing oddities like (1 << bitoffset) & value to read a bit, along with value |= (1 << bitoffset) to set one. But a lot of operations are already implicitly supported.
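
For illustration, hypothetical get/set helpers (the names are mine, just a sketch) showing the kind of twiddling involved, assuming the high-bits-first layout np.packbits produces:

def get_bit(packed, i): # read bit i from a packed uint8 array
    return (packed[i // 8] >> (7 - i % 8)) & 1

def set_bit(packed, i): # set bit i in a packed uint8 array
    packed[i // 8] |= 1 << (7 - i % 8)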

I suppose this first belongs over at numpy as a feature request before it moves here :-). Thanks for the advice. Half of the operations are probably trivial, like the multiplication and addition shown, and a few would require some real thinking. Upon further thought, if numpy added the datatype, perhaps no work at all would be required on this side. I had not thought of it this way (now at numpy/numpy#14821).

It would make these Python libraries as flexibly scalable as C, which would be impressive.

@GregoryMorse
Contributor Author

GregoryMorse commented Nov 1, 2019

@jnothman Apparently we can do even better (at least twice as fast)... in fact, milliseconds are almost no longer an appropriate timing unit; microseconds might be better:

import numba
@numba.guvectorize([(numba.boolean[:], numba.boolean[:], numba.int64[:], numba.int64[:])], '(n),(n),(p)->(p)', nopython=True)
def fastbincount(x, y, dim, res):
    res[0], res[1], res[2], res[3] = 0, 0, 0, 0
    for i in range(x.shape[0]):
        res[x[i] * 2 + y[i]] += 1

t = datetime.datetime.now()
for i in range(1024): _ = fastbincount(a, b, np.array([0, 0, 0, 0])).reshape(2,2)
print(int((datetime.datetime.now() - t).total_seconds() * 1000))
#9

I cannot get frozen dimensions to work (so '(4)' in the signature fails, though I read it should work as of numpy 1.16; see numba/numba#1668, but it is still not implemented), so I have to pass a size placeholder. Below I use a 2x2 np.array as direct output, but since Booleans won't index without at least casting, it is basically the same speed as bincount, so it is best to do the math with a one-dimensional vector.

import numba
@numba.guvectorize([(numba.boolean[:], numba.boolean[:], numba.int64[:,:], numba.int64[:,:])], '(n),(n),(p,q)->(p,q)', nopython=True)
def fastbincount(x, y, dim, res):
    res[0,0], res[1,0], res[0,1], res[1,1] = 0, 0, 0, 0
    for i in range(x.shape[0]):
        res[int(x[i])][int(y[i])] += 1
        

t = datetime.datetime.now()
for i in range(1024): _ = fastbincount(a, b, np.array([[0, 0], [0, 0]]))
print(int((datetime.datetime.now() - t).total_seconds() * 1000))
#24

I suppose numba is not used in scikit-learn currently; is there any good reason not to go this route? E.g. platform support (since it compiles to native code through a complicated LLVM optimization pipeline), or the extra dependency (though it is tightly coupled to numpy, which is already required).

I doubt further optimization is possible; a single multiplication and addition per element is really hard to beat unless there is some other Boolean comparison trick.

@jnothman
Member

jnothman commented Nov 1, 2019 via email

@GregoryMorse
Contributor Author

GregoryMorse commented Nov 3, 2019

Just for fun, here is the best solution I could find for packed-byte confusion matrices, even though I understand such a format is not currently properly supported by numpy, where such a change would need to start. The first approach is similar to bincount but uses vectorized bit twiddling; the other approach uses 3 vectorized bit sums and a bitwise AND. I was thinking of trying a 64k byte-pair lookup table mapping to 2+2+2+2 bits, though the whole idea of reducing memory consumption starts to be defeated by such strategies, and that lookup also requires bit twiddling unless it is made 256k to yield bytes instead of 2-bit groups. Update: the best approach found so far uses simple byte-to-bit-component lookup tables requiring only 256*8 = 2k bytes. It halves the time spent on all the twiddling.

import timeit
import numpy as np
import numba
N = 4096
p = 0.5
a = np.random.choice(a=[False, True], size=(N), p=[p, 1-p])
b = np.random.choice(a=[False, True], size=(N), p=[p, 1-p])
placehold = np.array([0, 0, 0, 0], dtype=np.int64)
bitextractor = np.array([[x & 1, (x & 2) >> 1, (x & 4) >> 2, (x & 8) >> 3, (x & 16) >> 4, (x & 32) >> 5, (x & 64) >> 6, (x & 128) >> 7] for x in range(256)], dtype=np.uint8)
@numba.guvectorize([(numba.uint8[:], numba.uint8[:], numba.int64[:], numba.int64[:])], '(n),(n),(p)->(p)', nopython=True)
def fastpackedbincount(x, y, dim, res):
    res[0], res[1], res[2], res[3] = 0, 0, 0, 0
    for i in range(x.shape[0]):
        for j in range(8):
            res[bitextractor[x[i],j] * 2 + bitextractor[y[i],j]] += 1
def conf_mat_packedfast(x, y, sz):
    cm = fastpackedbincount(x, y, placehold).reshape(2, 2)
    cm[0,0] -= (len(x) * 8 - sz)
    return cm
@numba.guvectorize([(numba.uint8[:], numba.uint8[:], numba.int64[:], numba.int64[:])], '(n),(n),(p)->(p)', nopython=True)
def packedbincount(x, y, dim, res):
    res[0], res[1], res[2], res[3] = 0, 0, 0, 0
    for i in range(x.shape[0]):
        res[((x[i] & 1) << 1) + (y[i] & 1)] += 1
        res[(x[i] & 2) + ((y[i] & 2) >> 1)] += 1
        res[((x[i] & 4) >> 1) + ((y[i] & 4) == 4)] += 1
        res[((x[i] & 8) >> 2) + ((y[i] & 8) == 8)] += 1
        res[((x[i] & 0x10) >> 3) + ((y[i] & 0x10) == 0x10)] += 1
        res[((x[i] & 0x20) >> 4) + ((y[i] & 0x20) == 0x20)] += 1
        res[((x[i] & 0x40) >> 5) + ((y[i] & 0x40) == 0x40)] += 1
        res[((x[i] & 0x80) >> 6) + (y[i] >> 7)] += 1
def conf_mat_packed(x, y, sz):
    cm = packedbincount(x, y, placehold).reshape(2, 2)
    cm[0,0] -= (len(x) * 8 - sz)
    return cm
bytebitcounts = np.array([bin(x).count("1") for x in range(256)], dtype=np.uint64)
def totalbits_bytearr(arr):
    return np.sum(bytebitcounts[arr])
@numba.guvectorize([(numba.uint8[:], numba.int64[:])], '(n)->()', nopython=True)
def sum_bits(x, res):
    res[0] = 0
    for xi in x:
        res[0] += bytebitcounts[xi]
@numba.guvectorize([(numba.uint8[:], numba.int64[:])], '(n)->()', nopython=True)
def sum_bits_fast(x, res):
    res[0] = 0
    for xi in x:
        z = (xi & 0x55) + (xi >> 1 & 0x55)
        z = (z & 0x33) + (z >> 2 & 0x33)
        res[0] += (z & 0x0f) + (z >> 4 & 0x0f)
def conf_mat_packedopt(x, y, totaltrue, totalfalse):
    truepos, totalpos = sum_bits(x & y), sum_bits(y)
    falsepos = totalpos - truepos
    return np.array([[totalfalse - falsepos, falsepos], #true negatives, false positives
                            [totaltrue - truepos, truepos]]) #false negatives, true positives
def conf_mat_packedsum(x, y, sz):
    totaltrue = sum_bits(x)
    return conf_mat_packedopt(x, y, totaltrue, sz - totaltrue)
def boolarr_tobytes(arr):
    rem = len(arr) % 8
    if rem != 0: arr = np.concatenate((arr, np.zeros(8 - rem, dtype=np.bool_)))
    arr = np.reshape(arr, (int(len(arr) / 8), 8))
    return np.packbits(arr) #translates boolean to bytes if array shape (n, 8) with high bits first
a, b = boolarr_tobytes(a), boolarr_tobytes(b)
assert(np.array_equal(conf_mat_packed(a, b, N), conf_mat_packedsum(a, b, N)))
assert(np.array_equal(conf_mat_packed(a, b, N), conf_mat_packedfast(a, b, N)))
%timeit _ = conf_mat_packed(a, b, N)
%timeit _ = conf_mat_packedsum(a, b, N)
%timeit _ = conf_mat_packedfast(a, b, N)



assert(sum_bits(a) == totalbits_bytearr(a))
assert(sum_bits_fast(a) == totalbits_bytearr(a))
%timeit _ = sum_bits(a)
%timeit _ = sum_bits_fast(a)
%timeit _ = totalbits_bytearr(a)

7.2 µs ± 117 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
8.61 µs ± 160 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
8.1 µs ± 104 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

1.27 µs ± 8.45 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.91 µs ± 15.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
9.31 µs ± 141 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

So the excessive bit twiddling basically does not cancel the advantage of bincount and a table lookup. Incidentally, the speed is comparable to the above result for guvectorized bincount.

For now, the fastest known Pythonic approach without direct use of C is bit lookup tables, unless I can find something faster :-).

@dyollb

dyollb commented Sep 24, 2021

I am evaluating 3D ML-based segmentation predictions and was looking for a fast confusion matrix implementation. My observation was that sklearn.metrics.confusion_matrix is quite slow; in fact, it is slower than loading the data and doing inference with a UNet.

I did a comparison of different ways to compute the confusion matrix:

import numpy as np
from numba import njit, generated_jit, types
from sklearn.metrics import confusion_matrix as sk_confusion_matrix
import pandas
from timeit import default_timer as timer

def compute_confusion_naive(a: np.ndarray, b: np.ndarray, num_classes: int = 0):
    if num_classes < 1:
        num_classes = max(np.max(a), np.max(b)) + 1
    cm = np.zeros((num_classes, num_classes))
    for i in range(a.shape[0]):
        cm[a[i], b[i]] += 1
    return cm


def compute_confusion_zip(a: np.ndarray, b: np.ndarray, num_classes: int = 0):
    if num_classes < 1:
        num_classes = max(np.max(a), np.max(b)) + 1
    cm = np.zeros((num_classes, num_classes))
    for ai, bi in zip(a, b):
        cm[ai, bi] += 1
    return cm


@njit
def compute_confusion_numba(a: np.ndarray, b: np.ndarray, num_classes: int = 0):
    if num_classes < 1:
        num_classes = max(np.max(a), np.max(b)) + 1
    cm = np.zeros((num_classes, num_classes))
    for i in range(a.shape[0]):
        cm[a[i], b[i]] += 1
    return cm


def compute_confusion_sklearn(a: np.ndarray, b: np.ndarray):
    return sk_confusion_matrix(a, b)


def compute_confusion_pandas(a: np.ndarray, b: np.ndarray):
    return pandas.crosstab(pandas.Series(a), pandas.Series(b))


if __name__ == "__main__":
    A = np.random.randint(15, size=310*310*310)
    B = np.random.randint(15, size=310*310*310)

    start = timer()
    cm1 = compute_confusion_naive(A, B)
    end = timer()
    print("Naive: %g s" % (end-start))

    start = timer()
    cm1 = compute_confusion_zip(A, B)
    end = timer()
    print("Naive-Zip: %g s" % (end-start))

    start = timer()
    cm1 = compute_confusion_sklearn(A, B)
    end = timer()
    print("sklearn: %g s" % (end-start))

    start = timer()
    cm1 = compute_confusion_numba(A, B, 0)
    end = timer()
    print("Numba: %g s" % (end-start))

    start = timer()
    cm1 = compute_confusion_pandas(A, B)
    end = timer()
    print("pandas: %g s" % (end-start))

The results are:

Naive: 18.6546 s
Naive-Zip: 17.86 s
sklearn: 18.5911 s
Numba: 0.674944 s
pandas: 5.81173 s

The timing for the numba implementation can be optimized further (by half) if num_classes is known, using dispatch via generated_jit to skip computing the max of a and b.
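
For reference, a minimal sketch of the fixed-num_classes variant (a plain njit version rather than the generated_jit dispatch mentioned; num_classes is assumed to be supplied by the caller):

from numba import njit
import numpy as np

@njit
def compute_confusion_known(a: np.ndarray, b: np.ndarray, num_classes: int):
    # num_classes is known, so the np.max scan over a and b is skipped entirely
    cm = np.zeros((num_classes, num_classes))
    for i in range(a.shape[0]):
        cm[a[i], b[i]] += 1
    return cm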

@lucyleeow
Member

@jeremiedbb do you think we should close this? As you established in #28578, using an alternative implementation for binary cases does not seem to be the way to improve performance here.
Also, #26820 is working on confusion matrix performance.

@adrinjalali
Member

good point, closing as duplicate of #26808

@adrinjalali closed this as not planned Apr 23, 2024