HDBSCAN performance issues compared to original hdbscan implementation (likely because Boruvka algorithm is not implemented)

### Describe the bug

When switching from Sklearn HDBSCAN implementation to original one from `hdbscan` library, I've notice that Sklearn's implementation has much worse implementation. I've tried investigating different parameters but it doesn't seem to have an effect on the performance.

I've created synthetic benchmark using `make_blobs` function.  And those are my results:

CPU: Ryzen 5 1600, 12 Threads@3.6Ghz*
RAM: 32GB DDR4

```python
# dataset
X, y = make_blobs(n_samples=10000, centers=5, cluster_std=0.60, random_state=0, n_features=10)

# hdbscan params 
og_hdbscan = OGHDBSCAN(core_dist_n_jobs=-1)
sk_hdbscan = SKHDBSCAN(n_jobs=-1)
```

![Image](https://github.com/user-attachments/assets/42bc818c-8547-4297-9020-e87a02b7bd90)

* Tested out on Google Collab with similar results

### Steps/Code to Reproduce

I am starting both algorithms with `n_jobs=-1` to rule out the difference that may occure because of default setting of `core_dist_n_jobs=4` in `hdbscan`

```python
from hdbscan import HDBSCAN as OGHDBSCAN
from sklearn.cluster import HDBSCAN as SKHDBSCAN

import matplotlib.pyplot as plt
import numpy as np
import time

from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=10000, centers=5, cluster_std=0.60, random_state=0, n_features=10)

og_hdbscan = OGHDBSCAN(core_dist_n_jobs=-1)
sk_hdbscan = SKHDBSCAN(n_jobs=-1)

RUNS = 10

def time_hdbscan(hdbscan, X, runs):
    times = []
    for _ in range(runs):
        start = time.time()
        hdbscan.fit(X)
        end = time.time()
        times.append(end - start)
    return times

times_og = time_hdbscan(og_hdbscan, X, RUNS)
times_sk = time_hdbscan(sk_hdbscan, X, RUNS)

print("Mean time OGHDBSCAN: ", np.mean(times_og))
print("Mean time SKHDBSCAN: ", np.mean(times_sk))

plt.plot(range(RUNS), times_og, label='OGHDBSCAN', marker='o')
plt.plot(range(RUNS), times_sk, label='SKHDBSCAN', marker='x')
plt.xlabel('Run')
plt.ylabel('Time (seconds)')
plt.title('HDBSCAN Timing Comparison')
plt.legend()
plt.show()
```

### Expected Results

Similar performance between algorithms from Sklearn and `hdbscan` library

### Actual Results

Sklearn implementation of `HDBSCAN` gets much worse performance than original library. For example when testing much bigger dataset, i.e. 
```python
X, y = make_blobs(n_samples=100000, centers=5, cluster_std=0.60, random_state=0, n_features=20)
```

`hdbscan` library performs `fit` in 25s on my hardware, while Sklearn needs 5 minutes to perform clustering.

### Versions

```shell
System:
    python: 3.11.9 (tags/v3.11.9:de54cf5, Apr  2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)]
executable: d:\Documents\Projects\Machine-Learning-Basics\.venv\Scripts\python.exe
   machine: Windows-10-10.0.19045-SP0

Python dependencies:
      sklearn: 1.6.1
          pip: None
   setuptools: 80.9.0
        numpy: 1.26.4
        scipy: 1.15.3
       Cython: None
       pandas: 2.3.0
   matplotlib: 3.10.3
       joblib: 1.5.1
threadpoolctl: 3.6.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 12
         prefix: libopenblas
       filepath: D:\Documents\Projects\Machine-Learning-Basics\.venv\Lib\site-packages\numpy.libs\libopenblas64__v0.3.23-293-gc2f4bdbb-gcc_10_3_0-2bde3a66a51006b2b53eb373ff767a3f.dll
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: Zen

       user_api: openmp
   internal_api: openmp
    num_threads: 12
         prefix: vcomp
       filepath: D:\Documents\Projects\Machine-Learning-Basics\.venv\Lib\site-packages\sklearn\.libs\vcomp140.dll
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 12
         prefix: libscipy_openblas
       filepath: D:\Documents\Projects\Machine-Learning-Basics\.venv\Lib\site-packages\scipy.libs\libscipy_openblas-f07f5a5d207a3a47104dca54d6d0c86a.dll
        version: 0.3.28
threading_layer: pthreads
   architecture: Haswell
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

HDBSCAN performance issues compared to original hdbscan implementation (likely because Boruvka algorithm is not implemented) #31503

Describe the bug

Steps/Code to Reproduce

Expected Results

Actual Results

Versions

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

HDBSCAN performance issues compared to original hdbscan implementation (likely because Boruvka algorithm is not implemented) #31503

Description

Describe the bug

Steps/Code to Reproduce

Expected Results

Actual Results

Versions

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions