- This repository provides implementations of Ex-DPC, Approx-DPC, and S-Approx-DPC.
- They are fast algorithms for density-peaks clustering (DPC), which was proposed by Rodriguez and Laio in Science (2014).
- For details about these algorithms, please see our SIGMOD 2021 paper, "Fast Density-Peaks Clustering: Multicore-based Parallelization Approach".
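For readers new to DPC, the two quantities it computes for each point are the local density ρ (how many points lie within a cutoff distance d_cut) and the dependent distance δ (the distance to the nearest point with higher density). The sketch below is a plain O(n²) baseline written only to illustrate these definitions; it is not Ex-DPC or any code from this repository. The names `brute_force_dpc`, `DpcResult`, and `d_cut` are hypothetical, and the sketch assumes a strict `< d_cut` test, excludes the point itself from ρ, breaks density ties by smaller index, and sets δ to infinity for the densest point. The conventions in the paper may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

// Euclidean distance between two points of equal dimensionality.
double dist(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) s += (a[k] - b[k]) * (a[k] - b[k]);
    return std::sqrt(s);
}

struct DpcResult {
    std::vector<int> rho;        // local density
    std::vector<double> delta;   // dependent distance (infinity for the densest point)
    std::vector<int> dependent;  // nearest denser point, -1 for the densest point
};

// Assumed total order: higher rho is denser; ties go to the smaller index.
bool denser(const std::vector<int>& rho, std::size_t j, std::size_t i) {
    return rho[j] > rho[i] || (rho[j] == rho[i] && j < i);
}

// Brute-force O(n^2) DPC quantities (illustration only, not Ex-DPC).
DpcResult brute_force_dpc(const std::vector<Point>& pts, double d_cut) {
    const long long n = static_cast<long long>(pts.size());
    DpcResult r{std::vector<int>(pts.size(), 0),
                std::vector<double>(pts.size(), std::numeric_limits<double>::infinity()),
                std::vector<int>(pts.size(), -1)};

    // rho[i]: number of other points strictly closer than d_cut.
    // Each iteration writes only r.rho[i], so the loop is safe to parallelize.
#pragma omp parallel for
    for (long long i = 0; i < n; ++i)
        for (long long j = 0; j < n; ++j)
            if (i != j && dist(pts[i], pts[j]) < d_cut) ++r.rho[i];

    // delta[i]: distance to the nearest denser point; dependent[i]: that point.
#pragma omp parallel for
    for (long long i = 0; i < n; ++i)
        for (long long j = 0; j < n; ++j)
            if (denser(r.rho, j, i)) {
                const double d = dist(pts[i], pts[j]);
                if (d < r.delta[i]) {
                    r.delta[i] = d;
                    r.dependent[i] = static_cast<int>(j);
                }
            }
    return r;
}
```

For example, on the points (0,0), (0.1,0), (5,5), (5.1,5), (5,5.1) with `d_cut = 0.5`, this yields ρ = {1, 1, 2, 2, 2} and dependent points {2, 0, none, 2, 2}. These quadratic loops are the cost that faster DPC algorithms aim to reduce.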
- Required libraries:
  - spatial library (we used version 2.1.8)
  - Boost (we used version 1.67.0)
  - We have not confirmed whether other versions work.
- The library paths in the DPC source code have to be changed to match where these libraries are installed on your system.
- We provide code for Windows (Visual Studio) and Linux (Ubuntu).
- We assume low-dimensional datasets, because the implementation relies on a kd-tree.
  - The kd-tree implementation is based on https://github.com/gishi523/kd-tree and the spatial library.
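Because the local density ρ is essentially a range count around each point, a kd-tree lets most of the data be pruned when the dimensionality is low. The following is a minimal kd-tree range-count sketch that illustrates the idea; it is not the gishi523/kd-tree API or the tree used in this repository, and the class `KdTree` and its `range_count` method are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using Point = std::vector<double>;

// Hypothetical minimal kd-tree supporting range counting.
class KdTree {
public:
    explicit KdTree(std::vector<Point> pts) : pts_(std::move(pts)) {
        assert(!pts_.empty());
        dim_ = pts_[0].size();
        assert(dim_ > 0);
        std::vector<std::size_t> ids(pts_.size());
        for (std::size_t i = 0; i < ids.size(); ++i) ids[i] = i;
        root_ = build(ids, 0, ids.size(), 0);
    }

    // Number of stored points p with dist(p, q) < r (q itself counts if stored).
    std::size_t range_count(const Point& q, double r) const {
        return count(root_.get(), q, r * r);
    }

private:
    struct Node {
        std::size_t idx = 0;   // index of the splitting point in pts_
        std::size_t axis = 0;  // splitting dimension
        std::unique_ptr<Node> left, right;
    };

    std::vector<Point> pts_;
    std::size_t dim_ = 0;
    std::unique_ptr<Node> root_;

    // Recursively splits ids[lo, hi) at the median along axis depth % dim_.
    std::unique_ptr<Node> build(std::vector<std::size_t>& ids, std::size_t lo,
                                std::size_t hi, std::size_t depth) {
        if (lo >= hi) return nullptr;
        const std::size_t axis = depth % dim_;
        const std::size_t mid = lo + (hi - lo) / 2;
        std::nth_element(ids.begin() + lo, ids.begin() + mid, ids.begin() + hi,
                         [&](std::size_t a, std::size_t b) {
                             return pts_[a][axis] < pts_[b][axis];
                         });
        auto node = std::make_unique<Node>();
        node->idx = ids[mid];
        node->axis = axis;
        node->left = build(ids, lo, mid, depth + 1);
        node->right = build(ids, mid + 1, hi, depth + 1);
        return node;
    }

    std::size_t count(const Node* node, const Point& q, double r2) const {
        if (!node) return 0;
        const Point& p = pts_[node->idx];
        double d2 = 0.0;
        for (std::size_t k = 0; k < dim_; ++k) d2 += (p[k] - q[k]) * (p[k] - q[k]);
        std::size_t c = d2 < r2 ? 1 : 0;
        const double diff = q[node->axis] - p[node->axis];
        const Node* near_side = diff < 0 ? node->left.get() : node->right.get();
        const Node* far_side = diff < 0 ? node->right.get() : node->left.get();
        c += count(near_side, q, r2);
        // The far side can only contain hits if the query ball crosses the splitting plane.
        if (diff * diff < r2) c += count(far_side, q, r2);
        return c;
    }
};
```

With a strict `< d_cut` test that excludes the point itself, the local density of point i would be `tree.range_count(pts[i], d_cut) - 1`. The pruning test compares the squared distance to the splitting plane with r², and in high dimensions more subtrees pass it, which is why a kd-tree suits low-dimensional data.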
- Windows (Visual Studio):
  - Create a new project -> Console Application.
  - Put our source code into the project together with the `dataset`, `parameter`, and `result` folders.
  - Add the include paths to the spatial library and the Boost library.
  - Enable OpenMP.
    - You may need `/Zc:twoPhase-` as an additional option under C/C++ -> Command Line in the project properties.
  - To compile, use the `/Ox` option.
  - Set SDL checks to No.
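Since all three programs rely on OpenMP, it can be worth confirming that it is actually enabled before measuring run times. The small check below is a hedged sketch, not part of the repository: `_OPENMP` is the standard macro that compilers define when OpenMP is on (`/openmp` in Visual Studio, `-fopenmp` with g++), while the helper names `openmp_enabled`, `available_threads`, and `parallel_sum` are hypothetical.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// True when the compiler was invoked with OpenMP enabled.
bool openmp_enabled() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}

// Maximum number of threads an OpenMP parallel region may use (1 without OpenMP).
int available_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Sums 1..n with an OpenMP reduction; without OpenMP the pragma is ignored
// and the loop runs sequentially, giving the same result.
long long parallel_sum(long long n) {
    assert(n >= 0);
    long long sum = 0;
#pragma omp parallel for reduction(+ : sum)
    for (long long i = 1; i <= n; ++i) sum += i;
    return sum;
}
```

Printing `openmp_enabled()` and `available_threads()` at the start of `main` makes it obvious when a build silently ran without OpenMP.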
- Linux (Ubuntu):
  - Ex-DPC
    - Compile: `g++ -O3 Ex-DPC.cpp -o exdpc.out -fopenmp` and run: `./exdpc.out`.
  - Approx-DPC
    - Compile: `g++ -O3 main.cpp -o approxdpc.out -fopenmp` and run: `./approxdpc.out`.
  - S-Approx-DPC
    - Compile: `g++ -O3 S-Approx-DPC.cpp -o sapproxdpc.out -fopenmp` and run: `./sapproxdpc.out`.
- As an example, we provide a 2-dimensional synthetic dataset used in our paper.
- To test your own dataset:
  - Put the file in the `_dataset` or `dataset` directory.
  - Assign a unique dataset ID.
  - Set the dimensionality in `data.hpp`.
  - Write code for reading the data file in the `input_data()` function of `file_io.hpp`.
  - Add a directory in `result` and update the function `compute_direcotry()`.
  - Compile the code and run the .exe or .out file.
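The file format is whatever `input_data()` in `file_io.hpp` is written to parse. As one possible starting point, here is a hedged sketch that assumes a plain-text file with one point per line and coordinates separated by commas and/or whitespace; `load_points` is a hypothetical helper, and its `dim` argument plays the role of the dimensionality set in `data.hpp`.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Point = std::vector<double>;

// Hypothetical reader: one point per line, coordinates separated by commas
// and/or whitespace. Lines that do not start with a number (blank lines, a
// text header) are skipped; a line with the wrong number of values throws.
std::vector<Point> load_points(std::istream& in, std::size_t dim) {
    assert(dim > 0 && "dimensionality must be positive");
    std::vector<Point> pts;
    std::string line;
    std::size_t line_no = 0;
    while (std::getline(in, line)) {
        ++line_no;
        for (char& ch : line)
            if (ch == ',') ch = ' ';  // treat commas as separators
        std::istringstream fields(line);
        Point p;
        double v;
        while (fields >> v) p.push_back(v);  // stops at the first non-numeric token
        if (p.empty()) continue;
        if (p.size() != dim)
            throw std::runtime_error("line " + std::to_string(line_no) + ": expected " +
                                     std::to_string(dim) + " values, got " +
                                     std::to_string(p.size()));
        pts.push_back(std::move(p));
    }
    return pts;
}
```

For example, `std::ifstream in("dataset/my_data.txt"); auto pts = load_points(in, 2);` would read a 2-dimensional file (the path is purely illustrative). Failing loudly on a dimensionality mismatch catches a forgotten change in `data.hpp` early.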
- Parameters:
  - Set the values in the corresponding txt file in `parameter` or `_parameter`.
  - ρ_min and δ_min are specified in `file_io.hpp`.
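For reference, the sketch below shows how thresholds like ρ_min and δ_min are typically used once ρ, δ, and each point's dependent point are known. It assumes the convention that a point is noise if ρ < ρ_min, a cluster center if ρ ≥ ρ_min and δ ≥ δ_min, and otherwise inherits the label of its dependent point; check `file_io.hpp` and the paper for the exact rule used here. The function `assign_labels` and its argument names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns one label per point, -1 for noise,
// 0, 1, 2, ... for clusters (numbered in index order of their centers).
// Assumes dependent[i] is a strictly denser point (or -1 for the densest),
// so following dependent links never cycles.
std::vector<int> assign_labels(const std::vector<double>& rho,
                               const std::vector<double>& delta,
                               const std::vector<int>& dependent,
                               double rho_min, double delta_min) {
    const std::size_t n = rho.size();
    assert(delta.size() == n && dependent.size() == n);
    const int kUnset = -2, kNoise = -1;
    std::vector<int> label(n, kUnset);

    // Noise and cluster centers are decided by the two thresholds alone.
    int next_id = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (rho[i] < rho_min) label[i] = kNoise;
        else if (delta[i] >= delta_min) label[i] = next_id++;
    }

    // Every other point takes the label found by following its dependent chain.
    std::vector<std::size_t> path;
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t cur = i;
        path.clear();
        while (label[cur] == kUnset) {
            path.push_back(cur);
            if (dependent[cur] < 0) {  // no dependent point to inherit from
                label[cur] = kNoise;
                break;
            }
            cur = static_cast<std::size_t>(dependent[cur]);
        }
        for (std::size_t p : path) label[p] = label[cur];
    }
    return label;
}
```

Resolving whole chains at once and writing the result back to every point on the path means each point is labeled only once, which keeps the pass linear in practice.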
- Ex-DPC:
  - Uncomment line 211 if you need the cluster labels of the exact result.
  - If you are interested in Ex-DPC+, check here.
- To compute the Rand index of an approximate result, you first have to run Ex-DPC and obtain its cluster labels.
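The Rand index is the fraction of point pairs on which two clusterings agree (both put the pair in the same cluster, or both separate it), so an approximate result that partitions the points exactly as Ex-DPC does scores 1. Below is a minimal O(n²) sketch, assuming both labelings are integer vectors over the same points and that noise is encoded as a label like any other (how the repository handles noise is not assumed here); `rand_index` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fraction of point pairs (i, j), i < j, on which labelings a and b agree:
// both put i and j in the same cluster, or both put them in different ones.
double rand_index(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    if (n < 2) return 1.0;
    unsigned long long agree = 0, total = 0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j) {
            const bool same_a = a[i] == a[j];
            const bool same_b = b[i] == b[j];
            if (same_a == same_b) ++agree;
            ++total;
        }
    return static_cast<double>(agree) / static_cast<double>(total);
}
```

Label values themselves do not matter, only co-membership, so `{0, 0, 1, 1}` and `{1, 1, 0, 0}` score 1. For large n, the same pair counts can be derived from a contingency table of the two labelings in roughly O(n + k₁k₂) time, where k₁ and k₂ are the numbers of clusters.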
If you use our implementation, please cite the following paper.
@inproceedings{amagata2021dpc,
title={Fast Density-Peaks Clustering: Multicore-based Parallelization Approach},
author={Amagata, Daichi and Hara, Takahiro},
booktitle={SIGMOD},
pages={49--61},
year={2021}
}
Copyright (c) 2020 Daichi Amagata
This software is released under the MIT license.