Dynamic Parallelization for High-utility Itemset Mining (HUIM)
- Only Linux is supported
- A C++ compiler that supports C++20 or later
  - GCC
  - Clang (>12.x) + libc++
- Boost library
- libnuma
- (Optional) libpmem2, libvmem
- include, src: These directories contain the HUIM-specific source code
  - The dphim::dphim_base class (include/dphim/dphim_base.hpp) provides implementations shared among several HUIM algorithms
    - dphim::dphim_base::parseTransactions(): parsing of input files
    - dphim::dphim_base::calcTWU(): calculation of TWU (transaction-weighted utilization)
  - The dphim::DPEFIM class (include/dphim/dpefim.hpp, src/dpefim.cpp) is the main class of the DPHIM implementation for the EFIM algorithm
    - dphim::DPEFIM::calcFirstSU() mainly corresponds to the Build step in the paper
    - dphim::DPEFIM::search() mainly corresponds to the Search step in the paper
  - The dphim::DPFHM class (include/dphim/dpfhm.hpp) is the main class of the DPHIM implementation for the FHM algorithm
    - dphim::DPEFIM::calcFMAP() mainly corresponds to the Build step in the paper
    - dphim::DPEFIM::search() mainly corresponds to the Search step in the paper
- nova: This directory contains a mechanism for task-parallel execution
  - nova/include/nova/task.hpp implements a task management structure based on C++ coroutines
  - nova/include/nova/*_scheduler.hpp provides implementations of various task schedulers
    - e.g., global task queue, local task queue, NUMA-aware local task queue
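To make the parsing and TWU steps listed above more concrete, here is a minimal single-threaded sketch. It is not the code in dphim_base.hpp: the names Transaction, parse_line, parse_transactions, and calc_twu are hypothetical and chosen only for illustration. It assumes the SPMF utility format, in which each line reads "items:transaction utility:item utilities", and computes the TWU of an item as the sum of the transaction utilities of all transactions that contain it.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One transaction in the SPMF utility format (illustrative type, not the project's).
struct Transaction {
    std::vector<int> items;
    std::vector<std::int64_t> utilities;
    std::int64_t transaction_utility = 0;
};

// Hypothetical helper: parses one line such as "1 2 3:10:2 3 5".
inline Transaction parse_line(const std::string& line) {
    Transaction t;
    std::istringstream fields(line);
    std::string items, tu, utils;
    std::getline(fields, items, ':');  // space-separated item ids
    std::getline(fields, tu, ':');     // transaction utility
    std::getline(fields, utils);       // space-separated item utilities
    std::istringstream item_stream(items), util_stream(utils);
    for (int item; item_stream >> item;) t.items.push_back(item);
    for (std::int64_t u; util_stream >> u;) t.utilities.push_back(u);
    t.transaction_utility = std::stoll(tu);
    // Sanity check: every item must come with exactly one utility value.
    assert(t.items.size() == t.utilities.size());
    return t;
}

// Hypothetical helper: reads a whole database, skipping lines without a ':' separator.
inline std::vector<Transaction> parse_transactions(std::istream& in) {
    std::vector<Transaction> db;
    for (std::string line; std::getline(in, line);) {
        if (line.find(':') == std::string::npos) continue;
        db.push_back(parse_line(line));
    }
    return db;
}

// TWU(item) = sum of the transaction utilities of all transactions containing the item.
inline std::map<int, std::int64_t> calc_twu(const std::vector<Transaction>& db) {
    std::map<int, std::int64_t> twu;
    for (const auto& t : db)
        for (int item : t.items) twu[item] += t.transaction_utility;
    return twu;
}
```

An item whose TWU is below ${minutil} cannot appear in any high-utility itemset, so a map like the one returned by calc_twu is typically used to discard such items before the search starts.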
This project can be built using CMake.
$ mkdir build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release && make
Run the following command:
$ ./run -a ${algorithm} -s ${execution method} -t ${# of threads} -i ${dataset} -o ${output} -m ${minutil}
- ${algorithm} is the HUIM algorithm to run
  - This implementation supports efim and fhm
- ${execution method} should be one of sp, global, local, local-numa, or dphim
- To run on persistent memory, add the --pmem option and execute with root privileges
  - For example:
    $ sudo ./run -a efim -t ${# of threads} -i ${dataset} -o ${output} -m ${minutil} --pmem=numa
You can download datasets from the SPMF open-source repository.
- Kosarak
$ wget http://www.philippe-fournier-viger.com/spmf/datasets/kosarak_utility_spmf.txt
- Chainstore
$ wget http://www.philippe-fournier-viger.com/spmf/datasets/chainstore.txt
- BMS
$ wget http://www.philippe-fournier-viger.com/spmf/datasets/BMS_utility_spmf.txt
- Accidents
$ wget https://www.philippe-fournier-viger.com/spmf/datasets/accidents_utility_spmf.txt
You can use docker to execute DPHIM.
$ docker build . -t dphim
$ docker run -t dphim ./build/run -a ${algorithm} -i ${dataset} -o ${output} -m ${minutil}
This Docker container downloads the above datasets into the dataset directory in advance.