Introduction
This is a (hopefully) exhaustive list of ongoing and future work for HDBSCAN. All items have been discussed and are considered desirable, but some still require thorough investigation (especially heuristic evaluations).
Priority List
The higher priority items appear earlier in this list.
- ENH Add `np.float32` data support for `HDBSCAN` #26888
- ENH Add `float32` implementations for `BallTree` and `KDTree` #25914
- Finalize `{KD, Ball}Tree` API to avoid writing custom dispatcher
- ENH Add
- Support `np.nan` in Cython implementation for sparse matrices
- Reintroduce `Boruvka` algorithm (removed in b7736ef)
- Implement PWD backend for weighted `argkmin` in medoid calculation
- Investigate PWD backend for `mst_from_*` functions in `_linkage.pyx`
- Investigate PWD backend for `_reachability.pyx`
- Benchmark KD vs Ball Tree efficiency
- Add consistent threading semantics to enable `prange`, e.g. in `_reachability.pyx`
- Improve partition strategy in `_reachability.pyx` (cf. CLN Cleaned `cluster/_hdbscan/_reachability.pyx` #24701 (comment))
- Add support for `np.inf` values when `metric=='precomputed'` and `X` is sparse.
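
For context on the `_reachability.pyx` items above, here is a minimal pure-Python sketch of the mutual reachability computation for a dense precomputed distance matrix. It is not the Cython implementation in scikit-learn: the function name `mutual_reachability` is hypothetical, and the sketch assumes the core distance of a point is its distance to the `min_samples`-th nearest neighbor, counting the point itself.

```python
def mutual_reachability(dist, min_samples):
    """Return the mutual reachability matrix for a dense distance matrix.

    Hypothetical helper for illustration only.
    dist: square list of lists of pairwise distances (dist[i][i] == 0).
    min_samples: neighbor rank used for the core distance (assumption:
    the point itself counts as its own first neighbor).
    """
    n = len(dist)
    k = min(min_samples, n) - 1  # index of the k-th smallest entry in a row
    # Core distance of each point: k-th smallest distance in its row.
    core = [sorted(row)[k] for row in dist]
    # Mutual reachability: max(core(i), core(j), d(i, j)).
    return [
        [max(core[i], core[j], dist[i][j]) for j in range(n)]
        for i in range(n)
    ]


# Three points on a line at positions 0, 1 and 3.
dist = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]
mr = mutual_reachability(dist, min_samples=2)
```

Once the core distances are known, each output row depends only on them and on the matching input row, so the rows can be computed independently. That independence is what makes a row-wise `prange` loop and the choice of partition strategy relevant here.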