HINT: A Hierarchical Index for Intervals in Main Memory

Indexing intervals is a fundamental problem, finding a wide range of applications, most notably in temporal and uncertain databases. We propose HINT, a novel and efficient in-memory index for range selection queries over interval collections. HINT applies a hierarchical partitioning approach, which assigns each interval to at most two partitions per level and has controlled space requirements. We reduce the information stored at each partition to what is absolutely necessary by dividing the intervals in it, based on whether they begin inside or before the partition boundaries. In addition, our index includes storage optimization techniques for the effective handling of data sparsity and skewness. We show how HINT can be used to efficiently process queries based on Allen’s relationships. Experiments on real and synthetic interval sets of different characteristics show that HINT is typically one order of magnitude faster than existing interval indexing methods.
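To make the partitioning described above concrete, here is a minimal sketch of a hierarchical index over a small discrete domain [0, 2^m - 1]. It is not the repository's implementation: all names (Interval, MiniHint, place, insert, query) are hypothetical, levels are numbered here from the bottom (level 0, 2^m partitions) up to the root (level m), and the query compares every candidate instead of using any comparison-free shortcuts. It shows the two properties stated in the abstract: each interval goes to at most two partitions per level, and each partition separates originals (intervals that begin inside it) from replicas (intervals that begin before it).

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical names throughout; this is not the repository's code.
// Intervals have closed, integer endpoints in the domain [0, 2^m - 1].
struct Interval { int id; uint64_t st, end; };

struct MiniHint {
    int m;
    // originals[l][p]: intervals that start inside partition p of level l;
    // replicas[l][p]:  intervals assigned to p that start before it.
    // Level 0 is the bottom (2^m partitions), level m the root (1 partition).
    std::vector<std::vector<std::vector<Interval>>> originals, replicas;

    explicit MiniHint(int bits) : m(bits), originals(bits + 1), replicas(bits + 1) {
        for (int l = 0; l <= m; ++l) {
            originals[l].resize(std::size_t(1) << (m - l));
            replicas[l].resize(std::size_t(1) << (m - l));
        }
    }

    void place(int l, int64_t p, const Interval& r) {
        // Partition p of level l covers cells [p << l, ((p + 1) << l) - 1].
        bool startsHere = (uint64_t(p) << l) == r.st;
        (startsHere ? originals : replicas)[l][p].push_back(r);
    }

    // Bottom-up assignment: per level, at most one partition at the left
    // end (a odd) and one at the right end (b even) receive the interval.
    void insert(const Interval& r) {
        int64_t a = int64_t(r.st), b = int64_t(r.end);
        for (int l = 0; l <= m; ++l) {
            if (a & 1) place(l, a++, r);
            if (!(b & 1)) place(l, b--, r);
            if (a > b) break;   // the interval is fully covered
            a >>= 1;            // move both ends to the parent level
            b >>= 1;
        }
    }

    // Overlap query, bottom-up: originals from every partition in [a, b]
    // of each level, replicas only from the first one (a), so each result
    // is reported once. Every candidate is compared against the query.
    std::vector<int> query(uint64_t qst, uint64_t qend) const {
        std::vector<int> out;
        auto hit = [&](const Interval& r) { return r.st <= qend && qst <= r.end; };
        uint64_t a = qst, b = qend;
        for (int l = 0; l <= m; ++l, a >>= 1, b >>= 1) {
            for (uint64_t p = a; p <= b; ++p)
                for (const Interval& r : originals[l][p])
                    if (hit(r)) out.push_back(r.id);
            for (const Interval& r : replicas[l][a])
                if (hit(r)) out.push_back(r.id);
        }
        return out;
    }
};
```

Reporting replicas only from the first relevant partition of each level is what keeps the result duplicate-free in this sketch: an interval that starts before the query is found exactly once, in whichever of its partitions contains the query start.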

Source code from the following publications:

George Christodoulou, Panagiotis Bouros and Nikos Mamoulis, HINT: A Hierarchical Interval Index for Allen Relationships, https://doi.org/10.1007/s00778-023-00798-w, VLDB Journal 33(1), pp. 73-100 (2024)
George Christodoulou, Panagiotis Bouros and Nikos Mamoulis, HINT: A Hierarchical Index for Intervals in Main Memory, https://doi.org/10.1145/3514221.3517873, Proceedings of the 2022 ACM International Conference on Management of Data (ACM SIGMOD), pp. 1257-1270 (2022)

Contributors

Panagiotis Bouros
George Christodoulou
Nikos Mamoulis

Dependencies

g++/gcc
Boost Library

Data

The samples directory contains the BOOKS dataset used in the experiments and a query file with 20k queries:

AARHUS-BOOKS_2013.dat
AARHUS-BOOKS_2013_20k.qry

Compile

Compile using make all, or make <option>, where <option> is one of the following:

lscan
1dgrid
hint
hint_m

Shared parameters among all methods

Parameter	Description	Comment
-? or -h	display help message
-v	activate verbose mode; print the trace for every query; otherwise only the final report is displayed
-q	set predicate type: (1) basic relationships from Allen's algebra: "EQUALS", "STARTS", "STARTED", "FINISHES", "FINISHED", "MEETS", "MET", "OVERLAPS", "OVERLAPPED", "CONTAINS", "CONTAINED", "BEFORE", "AFTER"; (2) generalized overlaps, "gOVERLAPS", from the ACM SIGMOD'22 publication	basic predicates are supported only by the linear scan, the 1D-grid and the most advanced HINT^m variants (SUBS+SORT+SS+CM or ALL optimizations); all other methods return 0
-r	set the number of runs per query; by default 1	in our experimental analysis set to 10
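For reference, the predicates accepted by -q can be written as simple endpoint comparisons. The sketch below assumes closed intervals and reads every predicate as "data interval r PRED query interval q"; that orientation is our assumption, and the code may orient the converse pairs (e.g. STARTS/STARTED) the other way. Range and satisfies are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper type: a closed interval [st, end].
struct Range { int64_t st, end; };

// Assumed reading: "r PRED q". The converse pairs (STARTS/STARTED,
// FINISHES/FINISHED, MEETS/MET, OVERLAPS/OVERLAPPED, CONTAINS/CONTAINED,
// BEFORE/AFTER) may be oriented the other way in the repository.
bool satisfies(const std::string& pred, Range r, Range q) {
    if (pred == "EQUALS")     return r.st == q.st && r.end == q.end;
    if (pred == "STARTS")     return r.st == q.st && r.end < q.end;
    if (pred == "STARTED")    return r.st == q.st && r.end > q.end;
    if (pred == "FINISHES")   return r.end == q.end && r.st > q.st;
    if (pred == "FINISHED")   return r.end == q.end && r.st < q.st;
    if (pred == "MEETS")      return r.end == q.st;
    if (pred == "MET")        return r.st == q.end;
    if (pred == "OVERLAPS")   return r.st < q.st && q.st < r.end && r.end < q.end;
    if (pred == "OVERLAPPED") return q.st < r.st && r.st < q.end && q.end < r.end;
    if (pred == "CONTAINS")   return r.st < q.st && q.end < r.end;
    if (pred == "CONTAINED")  return q.st < r.st && r.end < q.end;
    if (pred == "BEFORE")     return r.end < q.st;
    if (pred == "AFTER")      return r.st > q.end;
    if (pred == "gOVERLAPS")  return r.st <= q.end && q.st <= r.end;  // share a point
    return false;  // unknown predicate
}
```

For intervals with st < end, exactly one of the 13 basic predicates holds for any pair, while gOVERLAPS is true whenever the two intervals share at least one point.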

Workloads

The code supports two types of workload:

Counting the qualifying records, or
XOR-ing their ids

You can switch between the two by setting the WORKLOAD_COUNT flag in def_global.h; remember to run make clean (and recompile) after changing the flag.
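The two workloads differ only in how the qualifying ids are combined into a per-query result. A minimal sketch, assuming WORKLOAD_COUNT is a preprocessor macro tested with #ifdef (how def_global.h actually defines and uses it is not reproduced here); countIds, xorIds and aggregate are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: the flag is a plain macro; remove this line for the XOR workload.
#define WORKLOAD_COUNT

// Counting workload: number of qualifying records.
uint64_t countIds(const std::vector<uint64_t>& ids) { return ids.size(); }

// XOR workload: XOR of all qualifying ids (independent of reporting order).
uint64_t xorIds(const std::vector<uint64_t>& ids) {
    uint64_t x = 0;
    for (uint64_t id : ids) x ^= id;
    return x;
}

// The flag selects the aggregate at compile time, which is why a rebuild
// (make clean) is needed after changing it.
#ifdef WORKLOAD_COUNT
inline uint64_t aggregate(const std::vector<uint64_t>& ids) { return countIds(ids); }
#else
inline uint64_t aggregate(const std::vector<uint64_t>& ids) { return xorIds(ids); }
#endif
```

Both results are independent of the order in which a method reports qualifying records, so either can be compared across methods for the same query file.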

Indexing and query processing methods

Linear scan:

Source code files

main_lscan.cpp
containers/relation.h
containers/relation.cpp

Examples

$ ./query_lscan.exec -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry

$ ./query_lscan.exec -q gOVERLAPS -v samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry
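The linear scan is the baseline: every record is tested against the query. A minimal gOVERLAPS sketch, assuming closed intervals and a hypothetical Record layout (the actual containers/relation.h may differ); it returns the XOR of qualifying ids and writes their count to an out parameter, so both workloads are covered.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record layout: id plus closed endpoints [st, end].
struct Record { uint64_t id, st, end; };

// gOVERLAPS by linear scan: r qualifies iff it shares a point with [qst, qend].
uint64_t lscanGOverlaps(const std::vector<Record>& R, uint64_t qst, uint64_t qend,
                        std::size_t& count) {
    uint64_t x = 0;
    count = 0;
    for (const Record& r : R)
        if (r.st <= qend && qst <= r.end) {
            x ^= r.id;   // XOR workload
            ++count;     // counting workload
        }
    return x;
}
```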

1D-grid:

Source code files

main_1dgrid.cpp
containers/relation.h
containers/relation.cpp
indices/1dgrid.h
indices/1dgrid.cpp

Execution

Extra parameter	Description	Comment
-p	set the number of partitions	500 for BOOKS in the experiments

Examples

$ ./query_1dgrid.exec -p 500 -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry

$ ./query_1dgrid.exec -p 500 -q gOVERLAPS -v samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry
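A sketch of a uniform 1D grid, assuming -p is the number of equal-width partitions over the data domain and that each interval is stored in every partition it overlaps. To avoid reporting an interval more than once, it uses the reference-value technique (report only from the partition containing max(r.st, qst)); this is a common choice for replicated partitioning, not necessarily what indices/1dgrid.cpp does. Grid1D and Rec are hypothetical names, and all values are assumed to lie in [lo, hi].

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Rec { uint64_t id, st, end; };

// Hypothetical uniform grid over [lo, hi] with P equal-width partitions.
class Grid1D {
public:
    Grid1D(uint64_t lo, uint64_t hi, std::size_t P)
        : lo_(lo), width_((hi - lo) / P + 1), cells_(P) {}

    // Replicate the interval into every partition it overlaps.
    void insert(const Rec& r) {
        for (std::size_t c = cell(r.st); c <= cell(r.end); ++c) cells_[c].push_back(r);
    }

    // gOVERLAPS query: scan the partitions overlapping [qst, qend] and
    // report a record only from the partition holding max(r.st, qst).
    std::vector<uint64_t> query(uint64_t qst, uint64_t qend) const {
        std::vector<uint64_t> out;
        for (std::size_t c = cell(qst); c <= cell(qend); ++c)
            for (const Rec& r : cells_[c])
                if (r.st <= qend && qst <= r.end && cell(std::max(r.st, qst)) == c)
                    out.push_back(r.id);
        return out;
    }

private:
    std::size_t cell(uint64_t v) const { return std::size_t((v - lo_) / width_); }
    uint64_t lo_, width_;
    std::vector<std::vector<Rec>> cells_;
};
```

Whenever r and the query overlap, the reference value max(r.st, qst) lies in both of them, so exactly one visited partition passes the test.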

HINT:

Source code files

main_hint.cpp
containers/relation.h
containers/relation.cpp
indices/hierarchicalindex.h
indices/hierarchicalindex.cpp
indices/hint.h
indices/hint.cpp

Execution

Extra parameter	Description	Comment
-o	"SS" to activate the skewness & sparsity optimization

Examples

$ ./query_hint.exec -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry

$ ./query_hint.exec -o SS -q gOVERLAPS -v samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry
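The SS option targets sparse and skewed data, where many partitions of a level end up empty. The sketch below illustrates only the sparsity side, under our own assumptions: a level keeps its entries in one array grouped by partition, plus an offset entry for each non-empty partition, and a range of partitions is located with a binary search. SparseLevel and forRange are hypothetical names; the repository's SS layout may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// One level stored compactly: only non-empty partitions have an entry.
struct SparseLevel {
    std::vector<uint64_t> pids;      // ids of non-empty partitions, ascending
    std::vector<std::size_t> begin;  // begin[i]: first entry of pids[i]; plus end sentinel
    std::vector<uint64_t> entries;   // record ids, grouped by partition

    // Build from (partition id, record id) pairs.
    explicit SparseLevel(std::vector<std::pair<uint64_t, uint64_t>> items) {
        std::sort(items.begin(), items.end());
        for (const auto& [pid, rid] : items) {
            if (pids.empty() || pids.back() != pid) {  // first entry of a new partition
                pids.push_back(pid);
                begin.push_back(entries.size());
            }
            entries.push_back(rid);
        }
        begin.push_back(entries.size());  // sentinel
    }

    // Visit entries of the non-empty partitions in [a, b]; empty partitions
    // in the range cost nothing.
    template <class F>
    void forRange(uint64_t a, uint64_t b, F&& visit) const {
        auto it = std::lower_bound(pids.begin(), pids.end(), a);
        for (std::size_t i = static_cast<std::size_t>(it - pids.begin());
             i < pids.size() && pids[i] <= b; ++i)
            for (std::size_t k = begin[i]; k < begin[i + 1]; ++k) visit(pids[i], entries[k]);
    }
};
```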

HINT^m:

Source code files

main_hint_m.cpp
containers/relation.h
containers/relation.cpp
containers/offsets.h
containers/offsets.cpp
containers/offsets_templates.h
containers/offsets_templates.cpp
indices/hierarchicalindex.h
indices/hierarchicalindex.cpp
indices/hint_m.h
indices/hint_m.cpp
indices/hint_m_subs+sort.cpp
indices/hint_m_subs+sopt.cpp
indices/hint_m_subs+sort+sopt.cpp
indices/hint_m_subs+sort+sopt+ss.cpp
indices/hint_m_subs+sort+cm.cpp
indices/hint_m_subs+sort+sopt+cm.cpp
indices/hint_m_subs+sort+ss+cm.cpp
indices/hint_m_all.cpp

Execution

Extra parameter	Description	Comment
-m	set the number of bits; if not set, a value is determined automatically using the cost model	10 for BOOKS in the experiments
-o	set optimizations to be used: "SUBS+SORT" or "SUBS+SOPT" or "SUBS+SORT+SOPT" or "SUBS+SORT+SOPT+SS" or "SUBS+SORT+CM" or "SUBS+SORT+SOPT+CM" or "SUBS+SORT+SS+CM" or "ALL"	omit the parameter for base HINT^m; "CM" stands for the cache-misses optimization
-t	evaluate query traversing the hierarchy in a top-down fashion; by default the bottom-up strategy is used	currently supported only by base HINT^m
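To see what -m controls: assuming, as we read the HINT^m design, that endpoints are first mapped to [0, 2^M - 1] and that only their m most significant bits are used to pick partitions, a larger m means finer bottom-level partitions and a taller hierarchy. M, prefix, bitsFor and totalPartitions are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Smallest M such that maxValue < 2^M (domain width in bits).
unsigned bitsFor(uint64_t maxValue) {
    unsigned M = 0;
    while (M < 64 && (maxValue >> M) != 0) ++M;
    return M;
}

// Bottom-level partition of value v: its m most significant bits (requires m <= M).
constexpr uint64_t prefix(uint64_t v, unsigned M, unsigned m) { return v >> (M - m); }

// Bottom level with 2^m partitions up to a single root:
// 2^m + 2^(m-1) + ... + 1 = 2^(m+1) - 1 partitions in total.
constexpr uint64_t totalPartitions(unsigned m) { return (uint64_t(1) << (m + 1)) - 1; }
```

Under this assumption, with m = 10 (the value used for BOOKS) the hierarchy has 2^11 - 1 = 2047 partitions.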

Examples

base with top-down

$ ./query_hint_m.exec -m 10 -t -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry

base with bottom-up

$ ./query_hint_m.exec -m 10 -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry

subs+sort (only bottom-up)

$ ./query_hint_m.exec -m 10 -o subs+sort -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry

subs+sopt (only bottom-up)

$ ./query_hint_m.exec -m 10 -o subs+sopt -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry

subs+sort+sopt (only bottom-up)

$ ./query_hint_m.exec -m 10 -o subs+sort+sopt -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry

subs+sort+sopt+ss (only bottom-up)

$ ./query_hint_m.exec -m 10 -o subs+sort+sopt+ss -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry

subs+sort+cm (only bottom-up)

$ ./query_hint_m.exec -m 10 -o subs+sort+cm -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry

subs+sort+sopt+cm (only bottom-up)

$ ./query_hint_m.exec -m 10 -o subs+sort+sopt+cm -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry

subs+sort+ss+cm optimizations (only bottom-up)

$ ./query_hint_m.exec -m 10 -o subs+sort+ss+cm -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry

all optimizations (only bottom-up)

$ ./query_hint_m.exec -m 10 -o all -q gOVERLAPS -r 10 samples/AARHUS-BOOKS_2013.dat samples/AARHUS-BOOKS_2013_20k.qry
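As an illustration of the SUBS idea, the sketch below splits the entries of one partition into four groups using two tests: does the interval begin inside the partition (original) or before it (replica), and does it end inside the partition ("in") or after it ("aft")? This reflects our reading of the subdivisions in the publications; the names are hypothetical, and SORT and the other optimizations are not shown.

```cpp
#include <cassert>
#include <cstdint>

enum class Subdivision { OriginalIn, OriginalAft, ReplicaIn, ReplicaAft };

// The partition covers the closed range [pst, pend]; the interval
// [st, end] is assumed to overlap it.
Subdivision classify(uint64_t st, uint64_t end, uint64_t pst, uint64_t pend) {
    bool original = st >= pst;      // begins inside the partition
    bool endsInside = end <= pend;  // ends inside the partition
    if (original) return endsInside ? Subdivision::OriginalIn : Subdivision::OriginalAft;
    return endsInside ? Subdivision::ReplicaIn : Subdivision::ReplicaAft;
}
```

A replica that also ends after the partition covers the whole partition, so in this sketch any query that overlaps the partition overlaps every interval in the ReplicaAft group without further comparisons.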

Notes / TODOs

The following are missing from the current version of the code:

HINT with the SS optimization answering the basic predicates from Allen's algebra
Support for updates
