Please include the following references when citing the YACCLAB project/dataset:
- Allegretti, Stefano; Bolelli, Federico; Grana, Costantino "Optimized Block-Based Algorithms to Label Connected Components on GPUs." IEEE Transactions on Parallel and Distributed Systems, 2019.
- Bolelli, Federico; Cancilla, Michele; Baraldi, Lorenzo; Grana, Costantino "Towards Reliable Experiments on the Performance of Connected Components Labeling Algorithms" Journal of Real-Time Image Processing, 2018.
- Grana, Costantino; Bolelli, Federico; Baraldi, Lorenzo; Vezzani, Roberto "YACCLAB - Yet Another Connected Components Labeling Benchmark" Proceedings of the 23rd International Conference on Pattern Recognition, Cancun, Mexico, 4-8 Dec 2016.
YACCLAB is an open source C++ project that enables researchers to test CCL algorithms from many different points of view, running them on the collection of datasets described below. The benchmark performs the following tests, which are described later in this readme: correctness, average run-time (average), average run-time with steps (average_ws), density, size, granularity and memory accesses (memory).
Notice that 8-connectivity is always used in the project.
This project follows the Reproducible Research paradigms and received the Reproducible Label in Pattern Recognition (RLPR).
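To make the 8-connectivity convention concrete, here is a minimal two-scan labeling sketch built on a plain union-find. It is an illustration only and is not one of the algorithms shipped with YACCLAB; the names `FindRoot`, `Merge` and `LabelImage8` are hypothetical, and the image is assumed to be a row-major byte array with 0 as background.

```cpp
#include <cstdint>
#include <vector>

// Union-find "find" with path halving over a parent array.
static int FindRoot(std::vector<int>& parent, int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

// Merges the sets of a and b, keeping the smaller root as representative.
static void Merge(std::vector<int>& parent, int a, int b) {
    a = FindRoot(parent, a);
    b = FindRoot(parent, b);
    if (a < b) parent[b] = a; else parent[a] = b;
}

// Labels a row-major binary image (0 = background, non-zero = foreground)
// using 8-connectivity. Returns the number of components; "labels" receives
// 0 for background pixels and 1..n for the components.
int LabelImage8(const std::vector<uint8_t>& img, int rows, int cols,
                std::vector<int>& labels) {
    labels.assign(img.size(), 0);
    std::vector<int> parent(1, 0);  // index 0 is reserved for the background

    // First scan: assign provisional labels and record equivalences.
    const int dr[4] = {0, -1, -1, -1};  // left, upper-left, upper, upper-right
    const int dc[4] = {-1, -1, 0, 1};
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            if (!img[r * cols + c]) continue;
            int label = 0;
            for (int k = 0; k < 4; ++k) {
                const int nr = r + dr[k], nc = c + dc[k];
                if (nr < 0 || nc < 0 || nc >= cols) continue;
                const int n = labels[nr * cols + nc];
                if (n == 0) continue;
                if (label == 0) label = n; else Merge(parent, label, n);
            }
            if (label == 0) {  // no labeled neighbor: create a new label
                label = static_cast<int>(parent.size());
                parent.push_back(label);
            }
            labels[r * cols + c] = label;
        }
    }

    // Flatten: map every provisional label to a consecutive final label.
    std::vector<int> final_label(parent.size(), 0);
    int n_labels = 0;
    for (int i = 1; i < static_cast<int>(parent.size()); ++i) {
        const int root = FindRoot(parent, i);
        if (final_label[root] == 0) final_label[root] = ++n_labels;
        final_label[i] = final_label[root];
    }

    // Second scan: replace provisional labels with final ones.
    for (int& l : labels) l = final_label[l];
    return n_labels;
}
```

With 8-connectivity, two foreground pixels that touch only diagonally end up in the same component, whereas a 4-connectivity labeling would keep them separate.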
To correctly install and run YACCLAB, the following packages, libraries and utilities are needed:
- CMake 3.18 or higher (https://cmake.org);
- OpenCV 3.0 or higher (http://opencv.org), required packages are core, imgcodecs, imgproc;
- Gnuplot (http://www.gnuplot.info/);
- An IDE/compiler of your choice with C++14 support.
GPU algorithms also require:
- CUDA Toolkit 9.2 or higher (https://developer.nvidia.com/cuda-toolkit) and the OpenCV cudafeatures2d package (as of OpenCV 4.5.3, package dependencies entail that the required packages for CUDA algorithms are core, cudafeatures2d, cudaarithm, cudafilters, cudaimgproc, cudawarping, cudev, features2d, imgcodecs, imgproc).
Notes for gnuplot:
- on Windows systems: be sure to add gnuplot to the system path if you want YACCLAB to generate charts automatically.
- on macOS systems: the 'pdf terminal' seems to be unavailable due to an old version of cairo, so 'postscript' is used instead.
- Clone the GitHub repository (HTTPS clone URL: https://github.com/prittt/YACCLAB.git) or simply download the full master branch zip file and extract it (e.g. into a YACCLAB folder).
- Install the software in the YACCLAB/bin subfolder (suggested) or wherever you want using CMake (point 2 of the example image). Note that CMake should automatically find the OpenCV path if it is correctly installed on your OS (3), download the YACCLAB Dataset (be sure to check the boxes (4a) and (4b) if you want to download it, or to select the correct path if the dataset is already on your file system (7)), and create a C++ project for the selected IDE/compiler (9-10). Moreover, if you want to test 3D or GPU algorithms, tick the corresponding boxes (5) and (6).
- Set the configuration file (config.yaml) placed in the installation folder (bin in this example) in order to select the desired tests.
- Open the project, compile and run it: the work is done!
| Name | Meaning | Default | 
|---|---|---|
| YACCLAB_DOWNLOAD_DATASET | whether to automatically download the 2D YACCLAB dataset or not | OFF | 
| YACCLAB_DOWNLOAD_DATASET_3D | whether to automatically download the 3D YACCLAB dataset or not | OFF | 
| YACCLAB_ENABLE_3D | enable/disable the support for 3D algorithms | OFF | 
| YACCLAB_ENABLE_CUDA | enable/disable CUDA support | OFF | 
| YACCLAB_ENABLE_EPDT_19C | enable/disable the EPDT_19C 3D algorithm which is based on a heuristic decision tree generated from a 3D mask with 19 conditions (may noticeably increase compilation time), it has no effect when YACCLAB_ENABLE_3D is OFF | OFF | 
| YACCLAB_ENABLE_EPDT_22C | enable/disable the EPDT_22C 3D algorithm which is based on a heuristic decision tree generated from a 3D mask with 22 conditions (may noticeably increase compilation time), it has no effect when YACCLAB_ENABLE_3D is OFF | OFF | 
| YACCLAB_ENABLE_EPDT_26C | enable/disable the EPDT_26C 3D algorithm which is based on a heuristic decision tree generated from a 3D mask with 26 conditions (may noticeably increase compilation time), it has no effect when YACCLAB_ENABLE_3D is OFF | OFF | 
| YACCLAB_FORCE_CONFIG_GENERATION | whether to force the generation of the default configuration file (config.yaml) or not. When this flag is turned OFF any existing configuration file will not be overwritten | OFF | 
| YACCLAB_INPUT_DATASET_PATH | path to the input dataset folder, where to find test datasets | ${CMAKE_INSTALL_PREFIX}/input | 
| YACCLAB_OUTPUT_RESULTS_PATH | path to the output folder, where to save output results | ${CMAKE_INSTALL_PREFIX}/output | 
| OpenCV_DIR | OpenCV installation path | - | 
If your project requires a Connected Components Labeling algorithm and you are not interested in the whole YACCLAB benchmark, you can use the connectedComponents function of the OpenCV library, which implements the BBDT and SAUF algorithms since version 3.2, and the Spaghetti Labeling algorithm and BKE (for GPU only) since version 4.6.
However, when the connectedComponents function is called, a lot of additional code is executed together with the core function. If your project requires the best performance, you can include an algorithm implemented in YACCLAB by adding the following files to your project:
- labeling_algorithms.h and labeling_algorithms.cc, which define the base class from which every algorithm derives;
- yacclab_tensor.h and yacclab_tensor.cc, which define input and output data tensors;
- labels_solver.h and labels_solver.cc, which contain the implementation of the labels solving algorithms;
- memory_tester.h, performance_evaluator.h, volume_util.h, volume_util.cc, utilities.h, utilities.cc, system_info.h, system_info.cc, check_labeling.h, check_labeling.cc, file_manager.h, file_manager.cc, stream_demultiplexer.h, config_data.h, register.h, yacclab_test.h, progress_bar.h, cuda_mat3.hpp, cuda_types3.hpp, and cuda_mat3.inl.hpp, just to make things work without changing the code;
- header and source files of the required algorithm/s. The association between algorithms and header/source files is reported in the tables below.
| Algorithm Name | Authors | Year | Acronym | Required Files | Templated on Labels Solver | 
|---|---|---|---|---|---|
| - | L. Di Stefano, A. Bulgarelli [3] | 1999 | DiStefano | labeling_distefano_1999.h | ❌ | 
| Contour Tracing | F. Chang, C.J. Chen, C.J. Lu [1] | 1999 | CT | labeling_fchang_2003.h | ❌ | 
| Run-Based Two-Scan | L. He, Y. Chao, K. Suzuki [30] | 2008 | RBTS | labeling_he_2008.h | ✔ | 
| Scan Array-based with Union Find | K. Wu, E. Otoo, K. Suzuki [6] | 2009 | SAUF | labeling_wu_2009.h, labeling_wu_2009_tree.inc | ✔ | 
| Stripe-Based Labeling Algorithm | H.L. Zhao, Y.B. Fan, T.X. Zhang, H.S. Sang [8] | 2010 | SBLA | labeling_zhao_2010.h | ❌ | 
| Block-Based with Decision Tree | C. Grana, D. Borghesani, R. Cucchiara [4] | 2010 | BBDT | labeling_grana_2010.h, labeling_grana_2010_tree.inc | ✔ | 
| Configuration Transition Based | L. He, X. Zhao, Y. Chao, K. Suzuki [7] | 2014 | CTB | labeling_he_2014.h, labeling_he_2014_graph.inc | ✔ | 
| Block-Based with Binary Decision Trees | W.Y. Chang, C.C. Chiu, J.H. Yang [2] | 2015 | CCIT | labeling_wychang_2015.h, labeling_wychang_2015_tree.inc, labeling_wychang_2015_tree_0.inc | ✔ | 
| Light Speed Labeling | L. Cabaret, L. Lacassagne, D. Etiemble [5] | 2016 | LSL_STD<sup>I</sup> LSL_STDZ<sup>II</sup> LSL_RLE<sup>III</sup> | labeling_lacassagne_2016.h, labeling_lacassagne_2016_code.inc | ✔<sup>IV</sup> | 
| Pixel Prediction | C.Grana, L. Baraldi, F. Bolelli [9] | 2016 | PRED | labeling_grana_2016.h, labeling_grana_2016_forest.inc, labeling_grana_2016_forest_0.inc | ✔ | 
| Directed Rooted Acyclic Graph | F. Bolelli, L. Baraldi, M. Cancilla, C. Grana [23] | 2018 | DRAG | labeling_bolelli_2018.h, labeling_grana_2018_drag.inc | ✔ | 
| Spaghetti Labeling | F. Bolelli, S. Allegretti, L. Baraldi, C. Grana [26] | 2019 | Spaghetti | labeling_bolelli_2019.h, labeling_bolelli_2019_forest.inc, labeling_bolelli_2019_forest_firstline.inc, labeling_bolelli_2019_forest_lastline.inc, labeling_bolelli_2019_forest_singleline.inc | ✔ | 
| PRED++ | F. Bolelli, S. Allegretti, C. Grana [33] | 2021 | PREDpp | labeling_PREDpp_2021.h, labeling_PREDpp_2021_center_line_forest_code.inc.h, labeling_PREDpp_2021_first_line_forest_code.inc.h | ✔ | 
| Tagliatelle Labeling | F. Bolelli, S. Allegretti, C. Grana [33] | 2021 | Tagliatelle | labeling_tagliatelle_2021.h, labeling_tagliatelle_2021_center_line_forest_code.inc.h, labeling_tagliatelle_2021_first_line_forest_code.inc.h, labeling_tagliatelle_2021_last_line_forest_code.inc.h, labeling_tagliatelle_2021_single_line_forest_code.inc.h | ✔ | 
| Bit-Run Two Scan | W. Lee, F. Bolelli, S. Allegretti, C. Grana [32] | 2021 | BRTS<sup>VII</sup> | labeling_lee_2021_brts.h | ✔ | 
| Bit-Merge-Run Scan | W. Lee, F. Bolelli, S. Allegretti, C. Grana [32] | 2021 | BMRS<sup>VII</sup> | labeling_lee_2021_bmrs.h | ✔ | 
| Null Labeling | F. Bolelli, M. Cancilla, L. Baraldi, C. Grana [13] | - | NULL<sup>V</sup> | labeling_null.h | ❌ | 
| SAUF 3D | F. Bolelli, S. Allegretti, C. Grana [33] | 2021 | SAUF_3D | labeling3D_SAUF_2021.h, labeling3D_SAUF_2021_tree_code.inc.h | ✔ | 
| SAUF++ 3D | F. Bolelli, S. Allegretti, C. Grana [33] | 2021 | SAUFpp_3D | labeling3D_SAUFpp_2021.h, labeling3D_SAUFpp_2021_tree_code.inc.h | ✔ | 
| PRED 3D | F. Bolelli, S. Allegretti, C. Grana [33] | 2021 | PRED_3D | labeling3D_PRED_2021.h, labeling3D_PRED_2021_center_line_forest_code.inc.h, labeling3D_PRED_2021_first_line_forest_code.inc.h, labeling3D_PRED_2021_last_line_forest_code.inc.h, labeling3D_PRED_2021_single_line_forest_code.inc.h | ✔ | 
| PRED++ 3D | F. Bolelli, S. Allegretti, C. Grana [33] | 2021 | PREDpp_3D | labeling3D_PREDpp_2021.h, labeling3D_PREDpp_2021_center_line_forest_code.inc.h, labeling3D_PREDpp_2021_first_line_forest_code.inc.h, labeling3D_PREDpp_2021_last_line_forest_code.inc.h, labeling3D_PREDpp_2021_single_line_forest_code.inc.h | ✔ | 
| Entropy Partitioning Decision Tree RLPR | M. Söchting, S. Allegretti, F. Bolelli, C. Grana [31] | 2021 | EPDT_19c and EPDT_22c<sup>VI</sup> | labeling3D_BBDT_2019.h, labeling_bolelli_2019_forest.inc, labeling_bolelli_2019_forest_firstline.inc, labeling_bolelli_2019_forest_lastline.inc, labeling_bolelli_2019_forest_singleline.inc | ✔ | 
<sup>I</sup> standard version. 
<sup>II</sup> with zero-offset optimization. 
<sup>III</sup> with RLE compression. 
<sup>IV</sup> only on TTA and UF. 
<sup>V</sup> it only copies the pixels from the input image to the output one, thus defining a lower bound for the execution time of CCL algorithms on a given machine and dataset.
<sup>VI</sup> EPDT_19c and EPDT_22c algorithms are based on very big decision trees that translate to many lines of C++ code. They may thus noticeably increase the build time. For this reason, a special flag (YACCLAB_ENABLE_EPDT_ALGOS) to enable/disable such algorithms is provided in the CMake file. By default the flag is OFF.
<sup>VII</sup> CCL algorithms for images in bitonal (1 bit per pixel) format. For these algorithms, the average tests also include the time for the conversion from 1 byte to 1 bit per pixel. On the other hand, when performing average with steps tests the conversion time is ignored.
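The conversion from 1 byte to 1 bit per pixel mentioned in the last note can be pictured with the sketch below. It is only an illustration under stated assumptions: the helper name `PackRow`, the 64-bit word size and the least-significant-bit-first order are hypothetical choices and are not necessarily those used by the bitonal algorithms in YACCLAB.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs one image row stored at 1 byte per pixel (0 = background,
// non-zero = foreground) into 64-bit words at 1 bit per pixel.
// Pixel x is stored in bit (x % 64) of word (x / 64).
std::vector<uint64_t> PackRow(const std::vector<uint8_t>& row) {
    std::vector<uint64_t> bits((row.size() + 63) / 64, 0);
    for (std::size_t x = 0; x < row.size(); ++x) {
        if (row[x]) bits[x / 64] |= uint64_t{1} << (x % 64);
    }
    return bits;
}
```

A pass of this kind touches every pixel once, which is why including or excluding it changes the timings reported by the average and average with steps tests.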
| Algorithm Name | Authors | Year | Acronym | Required Files | 2D/3D | 
|---|---|---|---|---|---|
| Union Find | V. Oliveira, R. Lotufo [18] | 2010 | UF | labeling_oliveira_2010.cu | 2D and 3D | 
| Optimized Label Equivalence | O. Kalentev, A. Rai, S. Kemnitz, R. Schneider [19] | 2011 | OLE | labeling_kalentev_2011.cu | 2D | 
| Block-run-based | P. Chen, H.L. Zhao, C. Tao, H.S. Sang [25] | 2011 | BRB | labeling_chen_2011.cu | 2D | 
| Stava | O. Stava, B. Benes [38] | 2011 | STAVA | labeling_stava_2011.cu | 2D | 
| Rasmusson | A. Rasmusson, T.S. Sørensen, G. Ziegler [37] | 2013 | RASMUSSON | labeling_rasmusson_2013.cu | 2D | 
| Accelerated CCL | F. N. Paravecino, D. Kaeli [34] | 2014 | ACCL | labeling_paravecino_2014.cu | 2D | 
| 8-Directional Label Selection | Y. Soh, H. Ashraf, Y. Hae, I. Kim [36] | 2014 | DLS | labeling_soh_2014_8DLS.cu | 2D | 
| Modified 8-Directional Label Selection | Y. Soh, H. Ashraf, Y. Hae, I. Kim [36] | 2014 | M8DLS | labeling_soh_2014_M8DLS.cu | 2D | 
| Line-based Union-Find | K. Yonehara, K. Aizawa [39] | 2015 | LBUF | labeling_yonehara_2015.cu | 2D | 
| Block Equivalence | S. Zavalishin, I. Safonov, Y. Bekhtin, I. Kurilin [20] | 2016 | BE | labeling_zavalishin_2016.cu | 2D and 3D | 
| Distanceless Label Propagation | L. Cabaret, L. Lacassagne, D. Etiemble [21] | 2017 | DLP | labeling_cabaret_2017.cu | 2D | 
| Komura Equivalence (8-conn) | S. Allegretti, F. Bolelli, M. Cancilla, C. Grana [22] | 2018 | KE | labeling_allegretti_2018.cu | 2D | 
| Hardware Accelerated 4-connected | A. Hennequin, L. Lacassagne, L. Cabaret, Q. Meunier [35] | 2018 | HA4 | labeling_hennequin_2018_HA4.cu | 2D | 
| Hardware Accelerated 8-connected | A. Hennequin, L. Lacassagne, L. Cabaret, Q. Meunier [35] | 2018 | HA8 | labeling_hennequin_2018_HA8.cu | 2D | 
| CUDA SAUF | S. Allegretti, F. Bolelli, M. Cancilla, C. Grana [29] | 2019 | C-SAUF | labeling_allegretti_2019_SAUF.cu, labeling_wu_2009_tree.inc | 2D | 
| CUDA BBDT | S. Allegretti, F. Bolelli, M. Cancilla, C. Grana [29] | 2019 | C-BBDT | labeling_allegretti_2019_BBDT.cu, labeling_grana_2010_tree.inc | 2D | 
| CUDA DRAG | S. Allegretti, F. Bolelli, M. Cancilla, C. Grana [29] | 2019 | C-DRAG | labeling_allegretti_2019_DRAG.cu | 2D | 
| Block-based Union Find | S. Allegretti, F. Bolelli, C. Grana [24] | 2019 | BUF | labeling_allegretti_2019_BUF.cu | 2D and 3D | 
| Block-based Komura Equivalence | S. Allegretti, F. Bolelli, C. Grana [24] | 2019 | BKE | labeling_allegretti_2019_BKE.cu | 2D and 3D | 
#include "labels_solver.h"
#include "labeling_algorithms.h"
#include "labeling_grana_2010.h" // To include the algorithm code (BBDT in this example)
#include <opencv2/opencv.hpp>
using namespace cv;
int main()
{
    BBDT<UFPC> BBDT_UFPC; // To create an object of the desired algorithm (BBDT in this example)
                          // templated on the labels solving strategy. See the README for the
                          // complete list of the available labels solvers, available algorithms
                          // (N.B. not all the algorithms are templated on the solver) and their
                          // acronyms.
    BBDT_UFPC.img_ = imread("test_image.png", IMREAD_GRAYSCALE); // To load into the CCL object
                                                                 // the BINARY image to be labeled
    threshold(BBDT_UFPC.img_, BBDT_UFPC.img_, 100, 1, THRESH_BINARY); // Just to be sure that the
                                                                      // loaded image is binary
    BBDT_UFPC.PerformLabeling(); // To perform Connected Components Labeling!
    Mat1i output = BBDT_UFPC.img_labels_; // To get the output labeled image  
    unsigned n_labels = BBDT_UFPC.n_labels_; // To get the number of labels found in the input img
    return EXIT_SUCCESS;
}
A YAML configuration file placed in the installation folder lets you specify which kinds of tests should be performed, on which datasets and on which algorithms. Four categories of algorithms are supported: 2D CPU, 2D GPU, 3D CPU and 3D GPU. For each of them, the configuration parameters are reported below.
- execute - boolean value which specifies whether the current category of algorithms will be tested:
execute:    true
- perform - dictionary which specifies the kind of tests to perform:
perform:
  correctness:        false
  average:            true
  average_with_steps: false
  density:            false
  granularity:        false
  memory:             false
  blocksize:          false
- correctness_tests - dictionary indicating the kind of correctness tests to perform:
correctness_tests:
  eight_connectivity_standard:  true
  eight_connectivity_steps:     true
  eight_connectivity_memory:    true
  eight_connectivity_blocksize: true
- tests_number - dictionary which sets the number of runs for each available test:
tests_number:
  average:            10
  average_with_steps: 10
  density:            10
  granularity:        10
- algorithms - list of algorithms to which the chosen tests are applied:
algorithms:
  - SAUF_RemSP
  - SAUF_TTA
  - BBDT_RemSP
  - BBDT_UFPC
  - CT
  - labeling_NULL
- check_datasets, average_datasets, average_ws_datasets, memory_datasets and blocksize_datasets - lists of datasets on which, respectively, correctness, average, average_ws, memory and blocksize tests should be run:
...
average_datasets: ["3dpes", "fingerprints", "hamlet", "medical", "mirflickr", "tobacco800", "xdocs"]
...
- blocksize - only for the 2D GPU and 3D GPU categories, this dictionary configures blocksize test parameters. For each axis, a list of three values specifies [<first>, <last>, <step>]:
blocksize:
  x: [2, 64, 2]
  y: [2, 64, 2]
  z: [2, 64, 2]
Finally, the following configuration parameters are common to all categories.
- paths - dictionary with both input (datasets) and output (results) paths. It is automatically filled by CMake during the creation of the project:
paths: {input: "<datasets_path>", output: "<output_results_path>"}
- write_n_labels - whether to report the number of connected components in the output files:
write_n_labels: false
- color_labels - whether to output a colored version of labeled images during tests:
color_labels: {average: false, density: false}
- save_middle_tests - dictionary specifying, separately for every test, whether to save the output of single runs, or only a summary of the whole test:
save_middle_tests: {average: false, average_with_steps: false, density: false, granularity: false}
YACCLAB has been designed with extensibility in mind, so that new resources can be easily integrated into the project. A CCL algorithm is coded with a .h header file (placed in the include folder), a .cc source file (placed in the src folder), and optional additional files containing a tree/drag definition (placed in the include folder).
The source file should be as follows:
#include "<header_file_name>.h"
REGISTER_LABELING_WITH_EQUIVALENCES_SOLVERS(<algorithm_name>);
// Replace the above line with "REGISTER_LABELING(<algorithm_name>);" if the algorithm
// is not templated on the equivalence solver algorithm.
The header file should follow the structure below (see include/labeling_bolelli_2018.h for a complete example):
// [...]
template <typename LabelsSolver> // Remove this line if the algorithm is not templated
                                 // on the equivalence solver algorithm
class <algorithm_name> : public Labeling2D<Connectivity2D::CONN_8> { // the class must extend one of the labeling
                                                     // classes Labeling2D, Labeling3D, .. that
                                                     // are templated on the connectivity type
                                                    
public:
    <algorithm_name>() {}
    // This member function should implement the labeling procedure reading data from the
    // input image "img_" (OpenCV Mat1b) and storing labels into the output one "img_labels_"
    // (OpenCV Mat1i)
    void PerformLabeling()
    {
      // [...]
      LabelsSolver::Alloc(UPPER_BOUND_8_CONNECTIVITY); // Memory allocation of the labels solver
      LabelsSolver::Setup(); // Labels solver initialization
      // [...]
      
      LabelsSolver::GetLabel(<label_id>); // To get label value from its index
      LabelsSolver::NewLabel(); // To create a new label
      LabelsSolver::Flatten(); // To flatten the equivalence solver array
    }
    // This member function should implement the with step version of the labeling procedure.
    // This is required to perform tests with steps.
    void PerformLabelingWithSteps()
    {
      double alloc_timing = Alloc(); // Alloc() should be a member function responsible
                                     // for memory allocation of the required data structures
      perf_.start();
      FirstScan(); // FirstScan should be a member function that implements the 
                   // first scan step of the algorithm (if it has one)
      perf_.stop();
      perf_.store(Step(StepType::FIRST_SCAN), perf_.last());
      perf_.start();
      SecondScan(); // SecondScan should be a member function that implements the 
                    // second scan step of the algorithm (if it has one)
      perf_.stop();
      perf_.store(Step(StepType::SECOND_SCAN), perf_.last());
      // If the algorithm does not have distinct first and second scans, replace the lines
      // above with the following ones:
      // perf_.start();
      // AllScans(); // AllScans() should be a member function which implements the entire
                     // algorithm but the allocation/deallocation 
      // perf_.stop();
      // perf_.store(Step(StepType::ALL_SCANS), perf_.last());
      perf_.start();
      Dealloc(); // Dealloc() should be a member function responsible for memory
                 // deallocation.
      perf_.stop();
      perf_.store(Step(StepType::ALLOC_DEALLOC), perf_.last() + alloc_timing);
      // [...]
    }
    // This member function should implement the labeling procedure using the OpenCV Mat
    // wrapper (MemMat) implemented by YACCLAB 
    void PerformLabelingMem(std::vector<uint64_t>& accesses){
      // [...]
    }
};
When implementing a GPU algorithm, only the .cu file is required. The file should be placed in the cuda/src folder. The general structure of a GPU algorithm is the following:
// [...]
// Kernel definitions:
__global__ void <kernel_name_1>(...)
{
  ...
}
__global__ void <kernel_name_2>(...)
{
  ...
}
                                 
class <algorithm_name> : public GpuLabeling2D<Connectivity2D::CONN_8> { // the class must extend one of the labeling
                                                     // classes GpuLabeling2D, GpuLabeling3D, .. that
                                                     // are templated on the connectivity type
                                                    
public:
    <algorithm_name>() {}
    // This member function should implement the labeling procedure reading data from the
    // input image "d_img_" (OpenCV cuda::GpuMat) and storing labels into the output one "d_img_labels_"
    // (OpenCV cuda::GpuMat)
    void PerformLabeling()
    {
      // Create the output image
      d_img_labels_.create(d_img_.size(), CV_32SC1);
      // [...]
      // Call necessary kernels
      <kernel_name_1> <<<...>>> (...);
      <kernel_name_2> <<<...>>> (...);
      // [...]
      
      // Wait for the end of the last kernel
      cudaDeviceSynchronize();
    }
    // This member function should implement the with step version of the labeling procedure.
    // This is required to perform tests with steps.
    void PerformLabelingWithSteps()
    {
      double alloc_timing = Alloc(); // Alloc() should be a member function responsible
                                     // for memory allocation of the required data structures
      perf_.start();
      FirstScan(); // FirstScan should be a member function that implements the 
                   // first scan step of the algorithm (if it has one)
      perf_.stop();
      perf_.store(Step(StepType::FIRST_SCAN), perf_.last());
      perf_.start();
      SecondScan(); // SecondScan should be a member function that implements the 
                    // second scan step of the algorithm (if it has one)
      perf_.stop();
      perf_.store(Step(StepType::SECOND_SCAN), perf_.last());
      // If the algorithm does not have a distinct first and second scan replace the lines
      // above with the following ones:
      // perf_.start();
      // AllScans(); // AllScans() should be a member function which implements the entire
                     // algorithm but the allocation/deallocation 
      // perf_.stop();
      // perf_.store(Step(StepType::ALL_SCANS), perf_.last());
      perf_.start();
      Dealloc(); // Dealloc() should be a member function responsible for memory
                 // deallocation.
      perf_.stop();
      perf_.store(Step(StepType::ALLOC_DEALLOC), perf_.last() + alloc_timing);
      // [...]
    }
    void PerformLabelingBlocksize(int x, int y, int z)
    {
      // Create the output image
      d_img_labels_.create(d_img_.size(), CV_32SC1);
      // [...]
      // Call necessary kernels through a macro that measures times separately
      BLOCKSIZE_KERNEL(<kernel_name_1>, <grid_size>, <block_size>, <dynamic_shared_mem>, <arguments>...);
      BLOCKSIZE_KERNEL(<kernel_name_2>, <grid_size>, <block_size>, <dynamic_shared_mem>, <arguments>...);
      // [...]
    }
};
REGISTER_LABELING(<algorithm_name>);
// Only necessary for blocksize test
REGISTER_KERNELS(<algorithm_name>, <kernel_name_1>, <kernel_name_2>, ...);
Once an algorithm has been added to YACCLAB, it is ready to be tested and compared to the others. Don't forget to update the configuration file! We look at YACCLAB as a growing effort towards better reproducibility of CCL algorithms, so implementations of new and existing labeling methods are very welcome.
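The `perf_.start()`, `perf_.stop()`, `perf_.last()` and `perf_.store(...)` calls in the templates above follow a simple stopwatch pattern. The sketch below shows one way such a helper could work; it is a hypothetical stand-in written with `std::chrono`, not the code in performance_evaluator.h, and it assumes that steps are identified by a string and that times are kept in milliseconds.

```cpp
#include <chrono>
#include <map>
#include <string>

// Hypothetical stopwatch mirroring the start/stop/last/store calls used
// in the PerformLabelingWithSteps() templates. Times are in milliseconds.
class StepTimer {
    using Clock = std::chrono::steady_clock;

public:
    // Marks the beginning of a timed region.
    void start() { begin_ = Clock::now(); }

    // Marks the end of the timed region and remembers its duration.
    void stop() {
        last_ms_ = std::chrono::duration<double, std::milli>(Clock::now() - begin_).count();
    }

    // Duration of the most recent start()/stop() pair.
    double last() const { return last_ms_; }

    // Records a duration under a step name (e.g. "first_scan").
    void store(const std::string& step, double ms) { steps_[step] = ms; }

    // Returns the duration stored for a step, or 0 if none was stored.
    double get(const std::string& step) const {
        const auto it = steps_.find(step);
        return it == steps_.end() ? 0.0 : it->second;
    }

private:
    Clock::time_point begin_{};
    double last_ms_ = 0.0;
    std::map<std::string, double> steps_;
};
```

Keeping `last()` separate from `store()` is what lets the allocation time returned by `Alloc()` be added to the deallocation time before a single value is stored for the allocation/deallocation step.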
The YACCLAB dataset includes both synthetic and real images and it is suitable for a wide range of applications, ranging from document processing to surveillance, and features a significant variability in terms of resolution, image density, variance of density, and number of components. All images are provided in 1 bit per pixel PNG format, with 0 (black) being background and 1 (white) being foreground. The dataset will be automatically downloaded by CMake during the installation process as described in the installation paragraph.
- MIRflickr [10]: Otsu-binarized version of the MIRflickr dataset, publicly available under a Creative Commons License. It contains 25,000 standard resolution images taken from Flickr. These images have an average resolution of 0.17 megapixels, contain few connected components (495 on average) and are generally composed of not too complex patterns, so the labeling is quite easy and fast.
- Hamlet: A set of 104 images scanned from a version of Hamlet found on Project Gutenberg (http://www.gutenberg.org). Images have an average of 2.71 million pixels to analyze and 1447 components to label, with an average foreground density of 0.0789.
- Tobacco800: A set of 1290 document images. It is a realistic database for document image analysis research as these documents were collected and scanned using a wide variety of equipment over time. Resolutions of documents in Tobacco800 vary significantly from 150 to 300 DPI and the dimensions of images range from 1200 by 1600 to 2500 by 3200 pixels. Since CCL is one of the initial preprocessing steps in most layout analysis or OCR algorithms, hamlet and tobacco800 allow testing the algorithm performance in such scenarios.
- 3DPeS [14]: It comes from 3DPeS (3D People Surveillance Dataset), a surveillance dataset designed mainly for people re-identification in multi camera systems with non-overlapped fields of view. 3DPeS can also be exploited to test many other tasks, such as people detection, tracking, action analysis and trajectory analysis. The background models for all cameras are provided, so a very basic technique of motion segmentation has been applied to generate the foreground binary masks, i.e., background subtraction and fixed thresholding. The analysis of the foreground masks to remove small connected components and for nearest neighbor matching is a common application for CCL.
- Medical [15]: This dataset is composed of histological images and allows us to cover this fundamental medical field. The process used for nuclei segmentation and binarization is described in [15]. The resulting dataset is a collection of 343 binary histological images with an average of 1.21 million pixels to analyze and 484 components to label.
- Fingerprints [16]: This dataset contains 960 fingerprint images collected by using low-cost optical sensors or synthetically generated. These images were taken from the three Fingerprint Verification Competitions FVC2000, FVC2002 and FVC2004. In order to fit the CCL application, fingerprints have been binarized using an adaptive threshold and then negated in order to have foreground pixels with value 255. Most of the original images have a resolution of 500 DPI and their dimensions range from 240 by 320 up to 640 by 480 pixels.
| Samples of the YACCLAB 2D (real) datasets. From left to right: 3DPeS, Fingerprints, Medical, MIRflickr, Tobacco800, XDOCS, Hamlet. | 
- Synthetic Images:
- Classical [4]: A set of synthetic images that contain black and white random noise with 9 different foreground densities (10% up to 90%), from a low resolution of 32x32 pixels to a maximum resolution of 4096x4096 pixels, which allows testing the scalability and the effectiveness of different approaches when the number of labels gets high. For every combination of size and density, 10 images are provided, for a total of 720 images. The resulting subset allows evaluating performance in terms of scalability both on the number of pixels and on the number of labels (density). 
- Granularity [5]: This dataset allows testing algorithms by varying not only the pixel density but also the granularity g (i.e., the dimension of the minimum foreground block), underlining the behaviour of different proposals when the number of provisional labels changes. All the images have a resolution of 2048x2048 and are generated with the Mersenne Twister MT19937 random number generator implemented in the C++ standard library, starting with a "seed" equal to zero. The density of the images ranges from 0% to 100% with a step of 1%, and for every density value 16 images with pixel blocks of gxg with g ∈ [1,16] are generated. Moreover, the procedure has been repeated 10 times for every density-granularity pair, for a total of 16160 images. 
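A granularity image of the kind described above could be produced along the following lines. This is a sketch under stated assumptions rather than the generator actually used to build the dataset: the function name `MakeGranularityImage`, the per-block sampling with `std::uniform_int_distribution` and the row-major byte layout are all hypothetical; only the use of `std::mt19937` with seed 0 and the g x g block structure come from the description.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Generates a size x size binary image (row-major, 1 = foreground) made of
// g x g blocks. Each block is foreground with probability
// density_percent / 100 (assumed sampling scheme).
std::vector<uint8_t> MakeGranularityImage(int size, int g, int density_percent,
                                          unsigned seed = 0) {
    std::mt19937 rng(seed);  // Mersenne Twister MT19937, seed 0 by default
    std::uniform_int_distribution<int> percent(0, 99);
    std::vector<uint8_t> img(static_cast<std::size_t>(size) * size, 0);
    for (int br = 0; br < size; br += g) {
        for (int bc = 0; bc < size; bc += g) {
            // One random draw decides the value of the whole block.
            const uint8_t value = percent(rng) < density_percent ? 1 : 0;
            for (int r = br; r < br + g && r < size; ++r) {
                for (int c = bc; c < bc + g && c < size; ++c) {
                    img[static_cast<std::size_t>(r) * size + c] = value;
                }
            }
        }
    }
    return img;
}
```

With 101 density values, 16 granularities and 10 repetitions, a loop over such a generator yields 101 x 16 x 10 = 16160 images, matching the total stated above. Larger values of g produce fewer, larger blobs and therefore fewer provisional labels at the same density.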
 
| Samples of the YACCLAB 2D granularity dataset: reported images have a foreground density of 30% and, from left to right, granularities are 1, 2, 4, 6, 8, 12, 14, 16. | 
- OASIS [27]: This is a dataset of medical MRI data taken from the Open Access Series of Imaging Studies (OASIS) project. It consists of 373 volumes of 256 × 256 × 128 pixels, binarized with the Otsu threshold.
- Mitochondria [28]: It is the Electron Microscopy Dataset, which contains binary sections taken from the CA1 hippocampus for a total of three volumes composed of 165 slices with a resolution of 1024 × 768 pixels.
- Hilbert [24]: This dataset contains six volumes of 128 × 128 × 128 pixels, filled with the 3D Hilbert curve obtained at different iterations (1 to 6) of the construction method. The Hilbert curve is a fractal space-filling curve that represents a challenging test case for the labeling algorithms.
| Samples of the YACCLAB 3D datasets. From left to right we have the Hilbert space-filling curve, the OASIS dataset and Mitochondria medical imaging data. | 
- Granularity [24]: It contains 3D synthetic images generated as described for the 2D version. In this case, images have a resolution of 256 x 256 x 256 pixels and only three different images for every density-granularity pair have been generated. 
| Samples of the YACCLAB 3D granularity dataset: reported images have a foreground density of 2% and, from left to right, granularities are 4, 8, 16. | 
- 
Average run-time tests: execute an algorithm on every image of a dataset. The process can be repeated more times in a single test, to get the minimum execution time for each image: this allows to get more reproducible results and overlook delays produced by other running processes. It is also possible to compare the execution speed of different algorithms on the same dataset: in this case, selected algorithms (see Configuration File for more details) are executed sequentially on every image of the dataset. Results are presented in three different formats: a plain text file, histogram charts (.pdf/.ps), either in color or in gray-scale, and a LaTeX table, which can be directly included in research papers. 
- 
Average run-time tests with steps: evaluate the performance of an algorithm by separating the allocation/deallocation time from the time required to compute the labeling. Moreover, if an algorithm employs multiple scans to produce the correct output labels, YACCLAB stores the time of every scan and displays them separately. To understand how YACCLAB computes the memory allocation time for an algorithm on a reference image, it is important to underline the subtleties involved in the allocation process. All modern operating systems (not real-time or embedded ones, but certainly Windows and Unix) handle virtual memory with a demand paging technique, i.e. demand paging with no pre-paging for most Unix OSes and cluster demand paging for Windows. This means that a disk page is copied into physical memory only when a process accesses it for the first time, and not when the allocation function is called. Therefore, it is not possible to calculate the exact allocation time required by an algorithm that computes CCL on a reference image, but its upper bound can be estimated using the following approach:
- forcing the allocation of the entire memory by reserving it (malloc), filling it with zeros (memset), and tracing the time;
- calculating the time required by the assignment operation alone (a second memset), and subtracting it from the one obtained at the previous step;
- repeating the previous points for all data structures needed by an algorithm and summing the times together.
 This produces an upper bound of the allocation time because caches may shorten the second assignment operation, which increases the estimated allocation time. Moreover, in real cases, CCL algorithms may reserve more memory than they really need, but demand paging, unlike our measuring system, will allocate only the accessed pages. 
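The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code YACCLAB uses: the function names `EstimateAllocMs` and `EstimateTotalAllocMs` are hypothetical, and the `volatile` function pointer is an assumption added here so that an optimizing compiler cannot merge `malloc` + `memset` into a single zeroed allocation or drop the second `memset`.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical helper: upper bound (in ms) of the allocation time of one
// buffer of `bytes` bytes, following the malloc/memset procedure.
inline double EstimateAllocMs(std::size_t bytes) {
    using Clock = std::chrono::steady_clock;
    using Ms = std::chrono::duration<double, std::milli>;
    if (bytes == 0) return 0.0;

    // Calling memset through a volatile pointer keeps both calls opaque
    // to the optimizer (assumption of this sketch).
    void* (*volatile fill)(void*, int, std::size_t) = &std::memset;

    // Step 1: reserve the memory and touch every page, tracing the time.
    const auto t0 = Clock::now();
    void* buffer = std::malloc(bytes);
    if (buffer == nullptr) return 0.0;
    fill(buffer, 0, bytes);
    const auto t1 = Clock::now();

    // Step 2: time the assignment alone, now that pages are already mapped.
    fill(buffer, 0, bytes);
    const auto t2 = Clock::now();
    std::free(buffer);

    const double alloc_and_fill = Ms(t1 - t0).count();
    const double fill_only = Ms(t2 - t1).count();
    // Timer noise can make the difference slightly negative: clamp to zero.
    return std::max(0.0, alloc_and_fill - fill_only);
}

// Step 3: repeat for every data structure and sum the estimates.
inline double EstimateTotalAllocMs(const std::vector<std::size_t>& sizes) {
    double total = 0.0;
    for (std::size_t bytes : sizes) total += EstimateAllocMs(bytes);
    return total;
}
```

For a two-scan algorithm one would typically pass the sizes of the label image and of the equivalence structures, but which buffers to include depends on the algorithm being measured.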
- 
Density and size tests: check the performance of different CCL algorithms when they are executed on images with varying foreground density and size. To this aim, a list of algorithms selected by the user is run sequentially on every image of the test_random dataset. As for run-time tests, it is possible to repeat this test for more than one run. The output is presented as both plain text and charts (.pdf/.ps). For a density test, the mean execution time of each algorithm is reported for densities ranging from 10% up to 90%, while for a size test the same is reported for resolutions ranging from 32 × 32 up to 4096 × 4096. 
- 
Memory tests: help to understand the reasons behind the good performance of an algorithm or, more generally, to explain its behavior. Memory tests compute the average number of accesses to the label image (i.e. the image used to store the provisional and then the final labels of the connected components), the average number of accesses to the binary image to be labeled, and, finally, the average number of accesses to the data structures used to solve the equivalences between label classes. Moreover, if an algorithm requires extra data, memory tests summarize them as "other" accesses and return the average. Furthermore, all average contributions of an algorithm on a dataset are summed together in order to show the total amount of memory accesses. Since counting the number of memory accesses imposes additional computations, functions implementing memory access tests are different from those implementing run-time and density tests, to keep run-time tests as objective as possible. 
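One way to count accesses per data structure is to wrap each buffer in a thin class that increments a counter on every read and write. The sketch below is only an assumption about how such counting could be done; `AccessKind`, `AccessCounters` and `CountedBuffer` are hypothetical names, not YACCLAB types.

```cpp
#include <array>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical categories mirroring the ones listed in the text.
enum AccessKind { kBinaryImage = 0, kLabelImage, kEquivalence, kOther, kKindCount };

// Per-category access counters plus their total.
struct AccessCounters {
    std::array<unsigned long long, kKindCount> counts{};

    unsigned long long Total() const {
        return std::accumulate(counts.begin(), counts.end(), 0ULL);
    }
};

// Buffer whose reads and writes are charged to one category.
template <typename T>
class CountedBuffer {
public:
    CountedBuffer(std::size_t size, AccessKind kind, AccessCounters& counters)
        : data_(size), kind_(kind), counters_(counters) {}

    T Read(std::size_t i) const {
        ++counters_.counts[kind_];  // every read is one access
        return data_[i];
    }

    void Write(std::size_t i, T value) {
        ++counters_.counts[kind_];  // every write is one access
        data_[i] = value;
    }

private:
    std::vector<T> data_;
    AccessKind kind_;
    AccessCounters& counters_;
};
```

Because every access goes through `Read` or `Write`, an instrumented version of an algorithm is slower than the plain one, which is why the text keeps the counting functions separate from the timed ones.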
- 
Granularity tests: evaluate an algorithm while varying the density (from 1% to 100%, using a 1% step) and the pixel granularity, but not the image resolution. The output results display the average execution time over images with the same density and granularity. 
- Blocksize tests: this test, which only makes sense for CUDA algorithms, is aimed at finding the best block size for each kernel with grid search parameter optimization. The range of values for each block axis can be specified in the configuration file. Given a set of CUDA algorithms, the blocksize test reports execution times of each kernel on one or multiple datasets, repeating the measurement for every different block size. Results are presented in a csv file. For every kernel, dataset and block size, the total execution time in ms is reported. 
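The grid search over block sizes can be sketched on the host side as follows. This is an illustration under assumptions: `BlockTiming`, `GridSearch` and `BestBlock` are hypothetical names, the `time_kernel` callable stands in for launching and timing a CUDA kernel, and stepping each axis by powers of two is a choice made for the example (the actual ranges and steps come from the configuration file).

```cpp
#include <algorithm>
#include <vector>

// One measurement: block dimensions and total execution time in ms.
struct BlockTiming {
    unsigned x;
    unsigned y;
    double ms;
};

// Hypothetical grid search: calls time_kernel(x, y) for every block size in
// [min_x, max_x] x [min_y, max_y], doubling each axis at every step.
template <typename TimeKernel>
std::vector<BlockTiming> GridSearch(unsigned min_x, unsigned max_x,
                                    unsigned min_y, unsigned max_y,
                                    TimeKernel time_kernel) {
    std::vector<BlockTiming> results;
    if (min_x == 0 || min_y == 0) return results;  // avoid an endless loop
    for (unsigned x = min_x; x <= max_x; x *= 2) {
        for (unsigned y = min_y; y <= max_y; y *= 2) {
            results.push_back({x, y, time_kernel(x, y)});
        }
    }
    return results;
}

// Returns the fastest configuration; `results` must not be empty.
inline BlockTiming BestBlock(const std::vector<BlockTiming>& results) {
    return *std::min_element(
        results.begin(), results.end(),
        [](const BlockTiming& a, const BlockTiming& b) { return a.ms < b.ms; });
}
```

Each `BlockTiming` entry corresponds to one row of the kind of table described above (kernel, dataset, block size, time in ms), here limited to a single kernel and dataset.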
| Fingerprints | XDOCS | 
Thanks goes to these wonderful people (emoji key):
| Federico Bolelli 💻 📆 🚧 🚇 🤔 | Stefano Allegretti 💻 🚧 🐛 🤔 🚇 | Costantino Grana 💻 📆 🤔 🚇 | Michele Cancilla 💻 📦 🚧 | Lorenzo Baraldi 💻 📦 | Maximilian Söchting 💻 | 
| patrickhwood 🐛 | WalnutVision 🐛 | 
This project follows the all-contributors specification. Contributions of any kind welcome.
| [1] | F. Chang, C.-J. Chen, and C.-J. Lu, “A linear-time component-labeling algorithm using contour tracing technique,” Computer Vision and Image Understanding, vol. 93, no. 2, pp. 206–220, 2004. | 
| [2] | W.-Y. Chang, C.-C. Chiu, and J.-H. Yang, “Block-based connected-component labeling algorithm using binary decision trees,” Sensors, vol. 15, no. 9, pp. 23763–23787, 2015. | 
| [3] | L. Di Stefano and A. Bulgarelli, “A Simple and Efficient Connected Components Labeling Algorithm,” in International Conference on Image Analysis and Processing. IEEE, 1999, pp. 322–327. | 
| [4] | C. Grana, D. Borghesani, and R. Cucchiara, “Optimized Block-based Connected Components Labeling with Decision Trees,” IEEE Transactions on Image Processing, vol. 19, no. 6, pp. 1596–1609, 2010. | 
| [5] | L. Lacassagne and B. Zavidovique, “Light speed labeling: efficient connected component labeling on risc architectures,” Journal of Real-Time Image Processing, vol. 6, no. 2, pp. 117–135, 2011. | 
| [6] | K. Wu, E. Otoo, and K. Suzuki, "Optimizing two-pass connected-component labeling algorithms," Pattern Analysis and Applications, vol. 12, no. 2, pp. 117–135, 2009. | 
| [7] | L. He, X. Zhao, Y. Chao, and K. Suzuki, "Configuration-Transition-Based Connected-Component Labeling", IEEE Transactions on Image Processing, vol. 23, no. 2, pp. 943–951, 2014. | 
| [8] | H. Zhao, Y. Fan, T. Zhang, and H. Sang, "Stripe-based connected components labelling," Electronics letters, vol. 46, no. 21, pp. 1434–1436, 2010. | 
| [9] | C. Grana, L. Baraldi, and F. Bolelli, "Optimized Connected Components Labeling with Pixel Prediction," in Advanced Concepts for Intelligent Vision Systems, 2016, pp. 431-440. | 
| [10] | M. J. Huiskes and M. S. Lew, “The MIR Flickr Retrieval Evaluation,” in MIR ’08: Proceedings of the 2008 ACM International Conference on Multimedia Information Retrieval. New York, NY, USA: ACM, 2008. | 
| [11] | G. Agam, S. Argamon, O. Frieder, D. Grossman, and D. Lewis, “The Complex Document Image Processing (CDIP) Test Collection Project,” Illinois Institute of Technology, 2006. | 
| [12] | D. Lewis, G. Agam, S. Argamon, O. Frieder, D. Grossman, and J. Heard, “Building a test collection for complex document information processing,” in Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 2006, pp. 665–666. | 
| [13] | F. Bolelli, M. Cancilla, L. Baraldi, C. Grana, "Towards Reliable Experiments on the Performance of Connected Components Labeling Algorithms," Journal of Real-Time Image Processing, 2018. | 
| [14] | D. Baltieri, R. Vezzani, and R. Cucchiara, “3DPeS: 3D People Dataset for Surveillance and Forensics,” in Proceedings of the 2011 joint ACM workshop on Human gesture and behavior understanding. ACM, 2011, pp. 59–64. | 
| [15] | F. Dong, H. Irshad, E.-Y. Oh, M. F. Lerwill, E. F. Brachtel, N. C. Jones, N. W. Knoblauch, L. Montaser-Kouhsari, N. B. Johnson, L. K. Rao et al., “Computational Pathology to Discriminate Benign from Malignant Intraductal Proliferations of the Breast,” PloS one, vol. 9, no. 12, p. e114885, 2014. | 
| [16] | D. Maltoni, D. Maio, A. Jain, and S. Prabhakar, "Handbook of fingerprint recognition", Springer Science & Business Media, 2009. | 
| [17] | C. Grana, F. Bolelli, L. Baraldi, and R. Vezzani, "YACCLAB - Yet Another Connected Components Labeling Benchmark," Proceedings of the 23rd International Conference on Pattern Recognition, Cancun, Mexico, 4-8 Dec 2016. | 
| [18] | V. Oliveira and R. Lotufo, "A study on connected components labeling algorithms using GPUs," in SIBGRAPI. vol. 3, p. 4, 2010. | 
| [19] | O. Kalentev, A. Rai, S. Kemnitz, R. Schneider, "Connected component labeling on a 2D grid using CUDA," in Journal of Parallel and Distributed Computing 71(4), 615–620, 2011. | 
| [20] | S. Zavalishin, I. Safonov, Y. Bekhtin, I. Kurilin, "Block Equivalence Algorithm for Labeling 2D and 3D Images on GPU," in Electronic Imaging 2016(2), 1–7, 2016. | 
| [21] | L. Cabaret, L. Lacassagne, D. Etiemble, "Distanceless Label Propagation: an Efficient Direct Connected Component Labeling Algorithm for GPUs," in Seventh International Conference on Image Processing Theory, Tools and Applications, IPTA, 2017. | 
| [22] | S. Allegretti, F. Bolelli, M. Cancilla, C. Grana, "Optimizing GPU-Based Connected Components Labeling Algorithms," in Third IEEE International Conference on Image Processing, Applications and Systems, IPAS, 2018. | 
| [23] | F. Bolelli, L. Baraldi, M. Cancilla, C. Grana, "Connected Components Labeling on DRAGs," in International Conference on Pattern Recognition, 2018, pp. 121-126. | 
| [24] | S. Allegretti, F. Bolelli, C. Grana, "Optimized Block-Based Algorithms to Label Connected Components on GPUs," in IEEE Transactions on Parallel and Distributed Systems, 2019. | 
| [25] | P. Chen, H. Zhao, C. Tao, H. Sang, "Block-run-based connected component labelling algorithm for gpgpu using shared memory." Electronics Letters, 2011 | 
| [26] | F. Bolelli, S. Allegretti, L. Baraldi, and C. Grana, "Spaghetti Labeling: Directed Acyclic Graphs for Block-Based Connected Components Labeling," IEEE Transactions on Image Processing, vol. 29, no. 1, pp. 1999-2012, 2019. | 
| [27] | D. S. Marcus, A. F. Fotenos, J. G. Csernansky, J. C. Morris, R. L. Buckner, “Open Access Series of Imaging Studies (OASIS): Longitudinal MRI Data in Nondemented and Demented Older Adults,” J. Cognitive Neurosci., vol. 22, no. 12, pp. 2677–2684, 2010. | 
| [28] | A. Lucchi, Y. Li, and P. Fua, “Learning for Structured Prediction Using Approximate Subgradient Descent with Working Sets,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 1987–1994. | 
| [29] | S. Allegretti, F. Bolelli, M. Cancilla, F. Pollastri, L. Canalini, C. Grana, "How does Connected Components Labeling with Decision Trees perform on GPUs?," in 18th International Conference on Computer Analysis of Images and Patterns, 2019. | 
| [30] | L. He, Y. Chao, K. Suzuki. "A run-based two-scan labeling algorithm." IEEE Transactions on Image Processing, 2008. | 
| [31] | M. Söchting, S. Allegretti, F. Bolelli, C. Grana. "A Heuristic-Based Decision Tree for Connected Components Labeling of 3D Volumes." 25th International Conference on Pattern Recognition, 2021 | 
| [32] | W. Lee, F. Bolelli, S. Allegretti, C. Grana. "Fast Run-Based Connected Components Labeling for Bitonal Images." 5th International Conference on Imaging, Vision & Pattern Recognition, 2021 | 
| [33] | F. Bolelli, S. Allegretti, C. Grana. "One DAG to Rule Them All." IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021 | 
| [34] | F. N. Paravecino, D. Kaeli, "Accelerated Connected Component Labeling Using CUDA Framework." International Conference on Computer Vision and Graphics, ICCVG, 2014 | 
| [35] | A. Hennequin, L. Lacassagne, L. Cabaret, Q. Meunier, "A new Direct Connected Component Labeling and Analysis Algorithms for GPUs", DASIP, 2018 | 
| [36] | Y. So, H. Ashraf, Y. Hae, I. Kim, "Fast Parallel Connected Component Labeling Algorithm Using CUDA Based On 8-Directional Label Selection", International Journal of Latest Research in Science and Technology, 2014 | 
| [37] | A. Rasmusson, T.S. Sørensen, G. Ziegler, "Connected Components Labeling on the GPU with Generalization to Voronoi Diagrams and Signed Distance Fields", International Symposium on Visual Computing, 2013 | 
| [38] | O. Stava, B. Benes, "Connected Components Labeling in CUDA", GPU Computing Gems, 2011 | 
| [39] | K. Yonehara, K. Aizawa, "A Line-Based Connected Component Labeling Algorithm Using GPUs", Third International Symposium on Computing and Networking, 2015 |