This repo contains the C++ implementation of CounterSnake, a hierarchical compression framework that reduces the memory consumption of sketch counters. We integrate CounterSnake with nine sketches and also implement several state-of-the-art frameworks for comparison. In addition, we provide a unified testing framework that runs these frameworks and sketches on different datasets and automatically gathers the evaluation results.
The directory layout is as follows:

```
CounterSnake
├── data: datasets are put here
├── doc
│   ├── image
│   └── tutorial
├── exp: configuration files used in our experiments
│   ├── old
│   └── vldb
├── src
│   ├── common: header files for our testing framework
│   ├── driver: automatically generated driver code
│   ├── impl: implementation of some methods declared in common
│   ├── layer_counter: source code of CounterSnake and the compared frameworks
│   ├── pcap_parser: scripts that convert datasets into a unified format
│   ├── sketch: implementation of the sketches used in our experiments
│   └── sketch_test: code that runs sketch instances and gathers results
├── test: unit tests that verify implementation correctness
└── third_party: dependencies
```
We require the following dependencies to build this project on Linux or Mac.
| Dependency | Installation (on Linux) | Installation (on Mac) |
|---|---|---|
| CMake (>=3.20) | `sudo apt-get install cmake -y` | `brew install cmake` |
| boost (>=1.75) | `sudo apt-get install libboost-all-dev -y` | `brew install boost` |
| libpcap (>=1.9) | `sudo apt-get install libpcap-dev -y` | `brew install libpcap` |
| PcapPlusPlus (>=21.05) | Method 1: build from source using its default configuration and installation directory. Method 2: `brew install pcapplusplus` | `brew install pcapplusplus` |
This repo also depends on three third-party libraries, namely eigen, fmt, and tomlplusplus. They are maintained as git submodules, so don't forget to fetch them with `git submodule update --init`. Also make sure that your C++ compiler supports C++17 and that your Python interpreter is at least version 3.7, which are required for essential library features.
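For example, a typical checkout that also fetches the submodules looks like this (the repository URL is a placeholder):

```bash
# Clone the repo (replace <repository-url> with the actual URL) and enter it
git clone <repository-url> CounterSnake
cd CounterSnake
# Fetch the eigen, fmt, and tomlplusplus submodules
git submodule update --init
```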
The following shell script builds the testing framework. As long as the dependencies are correctly installed, the script should run successfully.
```bash
mkdir build && cd build
cmake ..
make
```

After executing the script, we will see a number of executable files in the `build/` directory, including:
- vanilla sketches without optimization: CM, DT, ES, FR, HP, MV, PR, SL (no need to run them directly in our experiments)
- compression frameworks that are integrated with the sketches: BitMatcher, Bitsense, Diamond, Dway, DwayNeg, Pyramid, Sac, Stingy
- dataset parsers: parser, parser-kosarak, synthesizer
Note that the executable files of our CounterSnake framework are named `Dway` and `DwayNeg`; the latter supports negative counters via the sign-bit encoding technique.
Before running the executable files, we should prepare the stream datasets and convert them into a format accepted by the testing framework.
Currently we provide scripts for three datasets, namely Caida, Kosarak, and Zipf. The first two can be downloaded from their corresponding websites, and the last one is generated with our scripts.
For the Caida trace:
- You should get a bunch of `.pcap` files whose names start with `equinix-nyc.dira.20190117-130000` or something similar. Choose one of them and put it into `data/`.
- We will truncate it and only use the first 1.0M distinct items. To do this, set the `input` entry in `src/pcap_parser/parser.toml` to your pcap file name, then go to the `build` directory and run `./parser -c ../src/pcap_parser/parser.toml` (the full flow is sketched below).
- After this, you should see a new file `data-1000K.bin` in `data/`; this is the parsed Caida dataset.
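Putting these steps together, a sketch of the whole Caida flow, assuming the commands are issued from the repository root (the pcap file name below is only an example):

```bash
# Copy one downloaded trace into data/ (the file name is just an example)
cp /path/to/equinix-nyc.dira.20190117-130000.pcap data/
# Point the `input` entry of src/pcap_parser/parser.toml at that file, then parse it
cd build
./parser -c ../src/pcap_parser/parser.toml
# The parsed trace should now be at ../data/data-1000K.bin
ls -lh ../data/data-1000K.bin
```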
For the Kosarak trace:
- You should get a file `kosarak.dat`.
- Put it into `data/` and run `./parser_kosarak -i ../data/kosarak.dat -o ../data/kosarak.bin` to parse it.
- After this, you should see a new file `kosarak.bin` in `data/`; this is the parsed Kosarak dataset.
For the Zipf datasets:
- Go to `src/pcap_parser/parser.toml` and change the `skew` there. You should also adjust the `flow_number` (number of distinct items) so that the total number of items is roughly 25M.
- Run `./synthesizer -c ../src/pcap_parser/parser.toml` to generate the file.
- Set the `output_file` name properly so that you end up with the following files: `zipf-000.bin`, `zipf-025.bin`, `zipf-050.bin`, `zipf-075.bin`, `zipf-100.bin`, `zipf-125.bin`, `zipf-150.bin`, `zipf-175.bin`, `zipf-200.bin`, which correspond to skewness $0.0 \sim 2.0$ (a scripted way to generate them all is sketched after this list).
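If you prefer not to edit `parser.toml` by hand nine times, the following sketch automates the loop. It assumes the keys are literally named `skew` and `output_file`, each on its own line, and that the output path is interpreted relative to `build/`; on macOS, replace `sed -i` with `sed -i ''`, and remember that `flow_number` may still need manual tuning per skew to keep the total around 25M.

```bash
# Hedged sketch: generate one Zipf dataset per skewness value by rewriting
# the skew/output_file entries of parser.toml before each synthesizer run.
cd build
while read -r skew suffix; do
    sed -i "s|^skew.*|skew = $skew|" ../src/pcap_parser/parser.toml
    sed -i "s|^output_file.*|output_file = \"../data/zipf-$suffix.bin\"|" ../src/pcap_parser/parser.toml
    ./synthesizer -c ../src/pcap_parser/parser.toml
done <<'EOF'
0.0  000
0.25 025
0.5  050
0.75 075
1.0  100
1.25 125
1.5  150
1.75 175
2.0  200
EOF
```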
The executable file of each compared framework can be run with `exec_file -c config_file`, so we must provide a configuration file for each framework. Setting all of these frameworks' diverse parameters is tedious work, so we have prepared the configuration files in `exp/vldb/`.
For example, to run CounterSnake on frequency estimation tasks (Exp#1) with a 1MB memory budget, try `./Dway -c ../exp/vldb/exp1-2-4-5-freq/config_1M/dway_1M.toml` in the `build/` directory.
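As a hedged sketch, comparing CounterSnake against one of the baselines under the same budget might look like the following; the config file name `pyramid_1M.toml` is an assumption based on the `dway_1M.toml` naming pattern, so check `exp/vldb/exp1-2-4-5-freq/config_1M/` for the actual names:

```bash
cd build
# CounterSnake (Dway) with a 1MB budget, config path taken from this README
./Dway -c ../exp/vldb/exp1-2-4-5-freq/config_1M/dway_1M.toml
# A baseline under the same budget; "pyramid_1M.toml" is an assumed file name
./Pyramid -c ../exp/vldb/exp1-2-4-5-freq/config_1M/pyramid_1M.toml
```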
We also provide shell scripts that run a batch of related experiments at once. These scripts are in the subdirectories of `exp/vldb/`.
For example, to test CounterSnake on frequency estimation tasks on all Zipf datasets (Exp#3), try `../exp/vldb/exp3-zipf-freq/run.bash ./Dway dway` in the `build/` directory. The first argument is the path of the executable file, and the second is the prefix of the configuration files (see `../exp/vldb/exp3-zipf-freq/config_000/` for all possible prefixes).
Unfortunately, we recorded all results manually, so there are no automatic scripts that gather the results and turn them into the figures presented in the paper. However, the output of the executable files should be easy to read, and the relevant metrics can be extracted from it.
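One simple way to keep the raw output around for later extraction is to pipe it through `tee`; the `logs/` directory below is not part of the repo layout, just a suggestion:

```bash
cd build
mkdir -p logs
# tee keeps the output on screen and also writes it to a log file for later inspection
./Dway -c ../exp/vldb/exp1-2-4-5-freq/config_1M/dway_1M.toml | tee logs/dway_1M_freq.log
```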