ParChain: A Framework for Parallel Hierarchical Agglomerative Clustering using Nearest-Neighbor Chain
This repository contains a cleaned version of the code for ParChain: A Framework for Parallel Hierarchical Agglomerative Clustering using Nearest-Neighbor Chain.
To get the submodules:
git pull
git submodule update --init

Compiler:
- g++ = 7.5.0
- Hardware support for __sync_bool_compare_and_swap_16 needed.
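On x86-64 Linux, support for the 16-byte compare-and-swap instruction (CMPXCHG16B, which the -mcx16 flag enables in g++) shows up as the cx16 flag in /proc/cpuinfo. A minimal check, assuming a Linux host; the helper name has_cx16 is hypothetical and not part of this repository:

```shell
# Hypothetical helper, not part of ParChain: succeeds if the CPU flags
# listed in the given file (default: /proc/cpuinfo) include cx16.
has_cx16() {
    grep -qw cx16 "${1:-/proc/cpuinfo}" 2>/dev/null
}

if has_cx16; then
    echo "cx16 supported"
else
    echo "cx16 not found: 16-byte compare-and-swap may be unavailable"
fi
```

On other operating systems or architectures this file may not exist, in which case the check reports "not found" and the hardware documentation should be consulted instead.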
g++ -O3 -std=c++20 -mcx16 -ldl -pthread -I../external/parlaylib/include linkage.cpp -o linkage
Run the command in the hac folder to compile the version that uses cache tables and range queries.
Run the command in the hac-matrix folder to compile the version that uses a distance matrix.
./linkage -method [METHOD] -cachesize [cache size] -d [dim] -o [output] [dataset]
- METHOD can be "complete", "ward", "avg" (average linkage with Euclidean distance metric), or "avgsq" (average linkage with squared Euclidean distance metric).
- cache size is the size of each hash table. If cache size = 1, no cache will be used.
- output is the output file for the dendrogram.
example: ./linkage -method complete -d 2 /path/to/dataset
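The options can also be assembled by a small wrapper that rejects unknown METHOD values before starting a run. This is only a sketch: the function name run_linkage and the LINKAGE_BIN variable are assumptions made for illustration and are not part of this repository; only the flags themselves come from the usage line.

```shell
# Hypothetical wrapper, not part of ParChain. The flags are taken from the
# usage line; the names below are illustrative only.
LINKAGE_BIN="${LINKAGE_BIN:-./linkage}"

run_linkage() {
    # Arguments: METHOD CACHE_SIZE DIM OUTPUT DATASET
    if [ "$#" -ne 5 ]; then
        echo "usage: run_linkage METHOD CACHE_SIZE DIM OUTPUT DATASET" >&2
        return 2
    fi
    # Accept only the linkage methods listed in this README.
    case "$1" in
        complete|ward|avg|avgsq) ;;
        *)
            echo "unknown method: $1" >&2
            return 2
            ;;
    esac
    "$LINKAGE_BIN" -method "$1" -cachesize "$2" -d "$3" -o "$4" "$5"
}
```

For instance, run_linkage avgsq 64 2 dendrogram.txt /path/to/dataset would run average linkage with squared Euclidean distance; the cache size 64 and the output name dendrogram.txt are placeholder values, not recommended settings.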