Indexing intervals is a fundamental problem with a wide range of applications, most notably in temporal and uncertain databases. We propose HINT, a novel and efficient in-memory index for range selection queries over interval collections. HINT applies a hierarchical partitioning approach, which assigns each interval to at most two partitions per level and has controlled space requirements. We reduce the information stored at each partition to the absolute minimum by dividing its intervals based on whether they begin inside or before the partition boundaries. In addition, our index includes storage optimization techniques for the effective handling of data sparsity and skewness. We show how HINT can be used to efficiently process queries based on Allen’s relationships. Experiments on real and synthetic interval sets of different characteristics show that HINT is typically one order of magnitude faster than existing interval indexing methods.
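To make the partitioning idea concrete, here is a minimal sketch of how an interval could be assigned bottom-up to a hierarchy with levels 0 (one partition) to m (2^m partitions), so that it lands in at most two partitions per level. It assumes integer endpoints already mapped to the domain [0, 2^m - 1]. The names `Assignment` and `assignInterval` are hypothetical and are not taken from this repository; the `original` flag marks the partition in which the interval begins, as opposed to partitions it merely continues into.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One (level, partition) slot an interval is stored in (hypothetical type).
struct Assignment {
    int level;           // 0 = top level (1 partition), m = bottom level (2^m partitions)
    std::int64_t index;  // partition number within the level
    bool original;       // true if the interval begins inside this partition
};

// Assign the closed interval [start, end], with 0 <= start <= end < 2^m,
// to partitions of the hierarchy, moving from the bottom level upwards.
std::vector<Assignment> assignInterval(std::int64_t start, std::int64_t end, int m) {
    assert(m >= 0 && m < 62);
    assert(0 <= start && start <= end && end < (std::int64_t{1} << m));

    std::vector<Assignment> out;
    std::int64_t a = start;  // first partition still to be covered at this level
    std::int64_t b = end;    // last partition still to be covered at this level
    for (int level = m; level >= 0 && a <= b; --level) {
        // Partition of this level that contains the interval's start point.
        const std::int64_t startCell = start >> (m - level);

        if (a & 1) {  // a is a right child: its parent would cover too much
            out.push_back({level, a, a == startCell});
            ++a;
        }
        if (!(b & 1)) {  // b is a left child: its parent would cover too much
            out.push_back({level, b, b == startCell});
            --b;
        }
        if (a > b) break;  // everything is covered, stop climbing
        a >>= 1;           // continue with the parents on the level above
        b >>= 1;
    }
    return out;
}
```

For example, with m = 4 the interval [5, 9] is stored in partition 5 of level 4 (where it begins) and in partitions 3 and 4 of level 3, which cover [6, 7] and [8, 9] respectively.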
Source code from the following publications:
- George Christodoulou, Panagiotis Bouros and Nikos Mamoulis, HINT: A Hierarchical Interval Index for Allen Relationships, https://doi.org/10.1007/s00778-023-00798-w, VLDB Journal 33(1), pp. 73-100 (2024)
- George Christodoulou, Panagiotis Bouros and Nikos Mamoulis, HINT: A Hierarchical Index for Intervals in Main Memory, https://doi.org/10.1145/3514221.3517873, Proceedings of the 2022 ACM International Conference on Management of Data (ACM SIGMOD), pp. 1257-1270 (2022)
- Panagiotis Bouros
- George Christodoulou
- Nikos Mamoulis
- g++/gcc
- Boost Library
The samples directory includes the BOOKS dataset used in the experiments and a query file containing 20k queries:
- AARHUS-BOOKS_2013.dat
- AARHUS-BOOKS_2013_20k.qry
Compile using make all or make <option> where <option> can be one of the following:
- lscan
- 1dgrid
- hint
- hint_m
| Parameter | Description | Comment |
|---|---|---|
| -? or -h | display help message | |
| -v | activate verbose mode; print the trace for every query; otherwise only the final report is displayed | |
| -q | set predicate type: (1) basic relationships from Allen's algebra, "EQUALS", "STARTS", "STARTED", "FINISHES", "FINISHED", "MEETS", "MET", "OVERLAPS", "OVERLAPPED", "CONTAINS", "CONTAINED", "BEFORE", "AFTER"; (2) generalized overlaps, "gOVERLAPS", from the ACM SIGMOD'22 publication | basic predicates work only for the linear scan method, 1D-grid and the most advanced HINTm variants, with SUBS+SORT+SS+CM or ALL optimizations; the rest of the methods return 0 |
| -r | set the number of runs per query; by default 1 | in our experimental analysis set to 10 |
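As a reading aid for the predicate names accepted by -q, the sketch below spells out the thirteen basic relationships of Allen's algebra on closed integer intervals. It follows a common textbook convention, where each function answers "a RELATION b"; which operand plays the role of the query and which the role of the indexed record in this code base is not stated here, so treat the argument order as an assumption. The type `Interval` and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct Interval { std::int64_t start, end; };  // closed interval [start, end]

// Build an interval, checking that the endpoints are ordered.
inline Interval makeInterval(std::int64_t s, std::int64_t e) {
    assert(s <= e);
    return Interval{s, e};
}

// Each function answers the question "a <RELATION> b".
inline bool equals(const Interval& a, const Interval& b)   { return a.start == b.start && a.end == b.end; }
inline bool starts(const Interval& a, const Interval& b)   { return a.start == b.start && a.end < b.end; }
inline bool finishes(const Interval& a, const Interval& b) { return a.end == b.end && a.start > b.start; }
inline bool meets(const Interval& a, const Interval& b)    { return a.end == b.start; }
inline bool overlaps(const Interval& a, const Interval& b) { return a.start < b.start && b.start < a.end && a.end < b.end; }
inline bool contains(const Interval& a, const Interval& b) { return a.start < b.start && b.end < a.end; }
inline bool before(const Interval& a, const Interval& b)   { return a.end < b.start; }

// The remaining six relationships are the inverses of the ones above.
inline bool startedBy(const Interval& a, const Interval& b)    { return starts(b, a); }
inline bool finishedBy(const Interval& a, const Interval& b)   { return finishes(b, a); }
inline bool metBy(const Interval& a, const Interval& b)        { return meets(b, a); }
inline bool overlappedBy(const Interval& a, const Interval& b) { return overlaps(b, a); }
inline bool containedBy(const Interval& a, const Interval& b)  { return contains(b, a); }
inline bool after(const Interval& a, const Interval& b)        { return before(b, a); }
```

For instance, with b = [10, 20], the interval [5, 15] OVERLAPS b, [12, 18] is CONTAINED by b, and [20, 30] is MET by b under this convention.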
The code supports two types of workload:
- Counting the qualifying records, or
- XOR'ing between their ids
You can switch between the two by appropriately setting the WORKLOAD_COUNT flag in def_global.h; remember to use make clean after resetting the flag.
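The difference between the two workloads can be illustrated with a small linear scan. This is only a sketch: the repository selects the workload at compile time through WORKLOAD_COUNT, whereas the hypothetical `runQuery` below takes it as a runtime argument so both variants can be compared side by side. The `Record` layout and the gOVERLAPS test (two closed intervals share at least one point) are assumptions as well.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Record {
    std::uint64_t id;
    std::int64_t start, end;  // closed interval [start, end]
};

enum class Workload { Count, XorIds };

// Assumed generalized-overlap test: the record and the query range intersect.
inline bool gOverlaps(const Record& r, std::int64_t qStart, std::int64_t qEnd) {
    return r.start <= qEnd && qStart <= r.end;
}

// Scan all records and either count the qualifying ones or XOR their ids.
std::uint64_t runQuery(const std::vector<Record>& data,
                       std::int64_t qStart, std::int64_t qEnd, Workload w) {
    assert(qStart <= qEnd);
    std::uint64_t result = 0;
    for (const Record& r : data) {
        if (!gOverlaps(r, qStart, qEnd)) continue;
        if (w == Workload::Count)
            ++result;        // counting workload
        else
            result ^= r.id;  // XOR workload: touches the id of every result
    }
    return result;
}
```

Counting only needs to know how many records qualify, while XOR'ing forces every qualifying id to be read, so the second workload is closer to actually materializing the result.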
- main_lscan.cpp
- containers/relation.h
- containers/relation.cpp
$ ./query_lscan.exec -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry
$ ./query_lscan.exec -q gOVERLAPS -v samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry
- main_1dgrid.cpp
- containers/relation.h
- containers/relation.cpp
- indices/1dgrid.h
- indices/1dgrid.cpp
| Extra parameter | Description | Comment |
|---|---|---|
| -p | set the number of partitions | 500 for BOOKS in the experiments |
$ ./query_1dgrid.exec -p 500 -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry
$ ./query_1dgrid.exec -p 500 -q gOVERLAPS -v samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry
- main_hint.cpp
- containers/relation.h
- containers/relation.cpp
- indices/hierarchicalindex.h
- indices/hierarchicalindex.cpp
- indices/hint.h
- indices/hint.cpp
| Extra parameter | Description | Comment |
|---|---|---|
| -o | "SS" to activate the skewness & sparsity optimization | |
$ ./query_hint.exec -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry
$ ./query_hint.exec -o SS -q gOVERLAPS -v samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry
- main_hint_m.cpp
- containers/relation.h
- containers/relation.cpp
- containers/offsets.h
- containers/offsets.cpp
- containers/offsets_templates.h
- containers/offsets_templates.cpp
- indices/hierarchicalindex.h
- indices/hierarchicalindex.cpp
- indices/hint_m.h
- indices/hint_m.cpp
- indices/hint_m_subs+sort.cpp
- indices/hint_m_subs+sopt.cpp
- indices/hint_m_subs+sort+sopt.cpp
- indices/hint_m_subs+sort+sopt+ss.cpp
- indices/hint_m_subs+sort+cm.cpp
- indices/hint_m_subs+sort+sopt+cm.cpp
- indices/hint_m_subs+sort+ss+cm.cpp
- indices/hint_m_all.cpp
| Extra parameter | Description | Comment |
|---|---|---|
| -m | set the number of bits; if not set, a value will be automatically determined using the cost model | 10 for BOOKS in the experiments |
| -o | set optimizations to be used: "SUBS+SORT" or "SUBS+SOPT" or "SUBS+SORT+SOPT" or "SUBS+SORT+SOPT+SS" or "SUBS+SORT+CM" or "SUBS+SORT+SOPT+CM" or "SUBS+SORT+SS+CM" or "ALL" | omit parameter for base HINTm; "CM" for cache misses optimization |
| -t | evaluate query traversing the hierarchy in a top-down fashion; by default the bottom-up strategy is used | currently supported only by base HINTm |
$ ./query_hint_m.exec -m 10 -t -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry
$ ./query_hint_m.exec -m 10 -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry
$ ./query_hint_m.exec -m 10 -o subs+sort -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry
$ ./query_hint_m.exec -m 10 -o subs+sopt -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry
$ ./query_hint_m.exec -m 10 -o subs+sort+sopt -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry
$ ./query_hint_m.exec -m 10 -o subs+sort+sopt+ss -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry
$ ./query_hint_m.exec -m 10 -o subs+sort+cm -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry
$ ./query_hint_m.exec -m 10 -o subs+sort+sopt+cm -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry
$ ./query_hint_m.exec -m 10 -o subs+sort+ss+cm -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry
$ ./query_hint_m.exec -m 10 -o all -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry
The following are missing from the current version of the code:
- HINT with SS optimization answering the basic predicates from Allen's algebra
- Updates