Orthogonal Range Searching
Lecture 4, CS 631100
Fall 2011 National Tsing Hua University (NTHU)
Outline
Problem: querying a database
Solution in one dimension
Data structure in \(\mathbb{R}^2\): range trees
Extension to higher dimensions
\(\log n\) factor improvement
Reference
Textbook, chapter 5; Mount's Lectures 17 and 18
An Example of Application on Database
A database in a bank records transactions
A query: find all the transactions such that
  the amount is between $1000 and $2000
  it happened between 10:40am and 11:20am
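Geometric interpretation: each transaction is a point (time, amount) in the plane, and the query is an axis-parallel rectangle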
Query problems
Assume n is the total number of transactions in the database
We will show how to build a data structure in \(O(n \log n)\) time that answers this type of query in \(O(k + \log n)\) time, where k is the size of the output (the number of transactions reported)
The data structure is built only once; a large number of queries can then be answered quickly
\(O(n \log n)\) is the preprocessing time
\(O(k + \log n)\) is the query time
Boxes
2D box
  Also known as a rectangle
  Sides parallel to the coordinate axes
3D box
  Example: \([0, 3] \times [0, 2.5] \times [0, 2]\)
Generalizes to any dimension
Algorithmic problems with boxes are relatively easy
Problem statement
Let P be a set of n points in \(\mathbb{R}^d\)
We assume \(d = O(1)\)
Preprocess P so as to answer queries of the type
  Input: \((a_1, b_1, a_2, b_2, \dots, a_d, b_d)\)
  Output: \(P \cap ([a_1, b_1] \times [a_2, b_2] \times \dots \times [a_d, b_d])\)
We denote \(k = |P \cap ([a_1, b_1] \times [a_2, b_2] \times \dots \times [a_d, b_d])|\)
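As a baseline for the data structures that follow, the query in the problem statement can always be answered by scanning P. The sketch below is only an illustration: the names `in_box` and `range_query_naive`, the encoding of a query as a list of (a_i, b_i) pairs, and the sample transactions are assumptions, not part of the lecture.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Box = Sequence[Tuple[float, float]]  # one closed interval (a_i, b_i) per dimension


def in_box(p: Point, box: Box) -> bool:
    """True if a_i <= p_i <= b_i holds in every dimension."""
    return all(a <= x <= b for x, (a, b) in zip(p, box))


def range_query_naive(points: List[Point], box: Box) -> List[Point]:
    """Linear scan: no preprocessing, but every query reads all n points."""
    return [p for p in points if in_box(p, box)]


# Bank example (made-up data): (minutes since midnight, amount in dollars)
transactions = [(630, 500.0), (645, 1500.0), (670, 1999.0), (700, 1200.0)]
query = [(640, 680), (1000.0, 2000.0)]  # 10:40am-11:20am, $1000-$2000
print(range_query_naive(transactions, query))  # [(645, 1500.0), (670, 1999.0)]
```

The scan costs time proportional to n for every query, regardless of k; the range trees in this lecture trade preprocessing time and space for query times that depend on k and on powers of log n only.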
One Dimensional Case (d=1): Using BBST
Problem statement
P is a set of n real numbers
Queries: find all the points in P that are between a and b
Data structure: balanced binary search tree (BBST)
  Preprocessing time: \(\Theta(n \log n)\) to build the BBST
  Space usage: \(\Theta(n)\)
  Query time: \(\Theta(k + \log n)\). How?
Answering a query
Algorithm Report(T, a, b)
Input: a BBST T storing P, an interval [a, b]
Output: \(P \cap [a, b]\)
  if T = NULL
    then return
  x ← value stored at the root of T
  if a < x
    then Report(T.left, a, b)
  if a ≤ x ≤ b
    then output x
  if x < b
    then Report(T.right, a, b)
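The Report procedure above translates almost line by line into Python. This is a minimal sketch under a few assumptions: the `Node` class and the `build` helper (which builds a balanced tree from an already sorted list) are illustrative names, the keys are assumed distinct, and results are appended to a list instead of being printed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(sorted_keys: List[float]) -> Optional[Node]:
    """Build a balanced BST from keys that are already sorted and distinct."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2  # the median becomes the root
    return Node(sorted_keys[mid],
                build(sorted_keys[:mid]),
                build(sorted_keys[mid + 1:]))


def report(t: Optional[Node], a: float, b: float, out: List[float]) -> None:
    """Append every key of t that lies in [a, b] to out, in increasing order."""
    if t is None:
        return
    x = t.key
    if a < x:            # keys in [a, b] may exist in the left subtree
        report(t.left, a, b, out)
    if a <= x <= b:
        out.append(x)
    if x < b:            # keys in [a, b] may exist in the right subtree
        report(t.right, a, b, out)


root = build([3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 100, 105])
result: List[float] = []
report(root, 18, 77, result)
print(result)  # [19, 23, 30, 37, 49, 59, 62, 70]
```

Because the left subtree is visited before the root and the right subtree after it, the points come out in sorted order.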
Analysis of query time
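Report visits the path from the root to \(v_{split}\), the left path, the right path, and the subtrees in between.
The path from the root to \(v_{split}\), the left path and the right path all have length \(O(\log n)\)
The subtrees between the two paths (red in the figure) contain only reported points, so the sum of their sizes is at most k
Query time: \(O(k + \log n)\)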
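For a static set, the same \(O(k + \log n)\) bound can also be seen with a sorted array in place of the BBST: two binary searches locate the ends of the answer, and the answer is a contiguous slice. This is a swapped-in illustration rather than the tree from the slides; `report_sorted` is a hypothetical name, and `bisect_left` / `bisect_right` come from Python's standard `bisect` module.

```python
from bisect import bisect_left, bisect_right
from typing import List


def report_sorted(keys: List[float], a: float, b: float) -> List[float]:
    """Return all keys in [a, b]; keys must be sorted in increasing order."""
    lo = bisect_left(keys, a)    # first index with keys[lo] >= a
    hi = bisect_right(keys, b)   # first index with keys[hi] > b
    return keys[lo:hi]           # k consecutive entries


print(report_sorted([1, 3, 5, 7, 9], 3, 7))  # [3, 5, 7]
```

The two searches cost \(O(\log n)\) and copying the slice costs \(O(k)\), which mirrors the path-plus-subtrees accounting above.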
Two Dimensional Case (d=2): Using range tree
Introduction
A set P of n points in \(\mathbb{R}^2\)
Query: given \((a_1, b_1, a_2, b_2)\), find all points (x, y) from P in the rectangle \([a_1, b_1] \times [a_2, b_2]\)
Results presented in this section
  \(\Theta(n \log n)\) preprocessing time
  \(\Theta(n \log n)\) space usage
  \(\Theta(k + \log^2 n)\) query time
Query time will be slightly improved in the last section
Canonical sets
First store P in a BBST T using the x-coordinates as keys
We associate each node v of T with a canonical set \(C_v\) containing the points of P stored in the subtree rooted at v
Range trees in \(\mathbb{R}^2\)
Each canonical set \(C_v\) is stored in a BBST \(T_v\) using the y-coordinates as keys. \(T_v\) is called the canonical tree at node v.
We answer a query in two steps: first on the x-coordinates, then on the y-coordinates (as shown in the following slides).
Step 1: Querying x-coordinates
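First make the query with range \([a_1, b_1]\) on the x-coordinates
Let \(P' = P \cap ([a_1, b_1] \times (-\infty, \infty))\)
Let \(P''\) be the set of points stored on the left path and the right path (when searching for \(a_1\) and \(b_1\))
We partition \(P' \setminus P''\) into c canonical subsets \(C_1, C_2, \dots, C_c\)
Thus \(P' \subseteq P'' \cup C_1 \cup C_2 \cup \dots \cup C_c\)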
Partitioning \(P' \setminus P''\)
After we make the query with range \([a_1, b_1]\) on the x-coordinates:
  Take the nodes on the left path and the right path; the points stored there form \(P''\).
  For each node on the left path where the search for \(a_1\) goes left, select the canonical tree \(T_i\) of its right child (this gives some \(C_i\)).
  For each node on the right path where the search for \(b_1\) goes right, select the canonical tree \(T_i\) of its left child (this gives some \(C_i\)).
It takes \(O(\log n)\) time (the height of the BBST).
There are \(c = O(\log n)\) canonical sets in our partition.
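The selection of path nodes and canonical subsets can be sketched on an implicit balanced BST over a sorted array of x-coordinates. Everything in this block is an illustrative assumption: the function name `decompose`, the convention that the node for the half-open index range [lo, hi) stores the median `xs[(lo + hi) // 2]`, and the representation of each canonical set as an index range. A child subtree is selected only at path nodes where the search turns away from it.

```python
from typing import List, Tuple


def decompose(xs: List[float], a: float, b: float) -> Tuple[List[int], List[Tuple[int, int]]]:
    """xs is sorted. Returns (path, canon): indices of the nodes on the left and
    right paths (starting at the split node), and half-open index ranges of the
    selected canonical subsets, all of whose keys lie in [a, b]."""
    lo, hi = 0, len(xs)
    while lo < hi:                       # walk down to the split node
        mid = (lo + hi) // 2
        if b < xs[mid]:
            hi = mid
        elif xs[mid] < a:
            lo = mid + 1
        else:
            break
    if lo >= hi:
        return [], []                    # no key lies in [a, b]
    split = (lo + hi) // 2
    path, canon = [split], []
    l, h = lo, split                     # left path: search for a
    while l < h:
        m = (l + h) // 2
        path.append(m)
        if a <= xs[m]:
            if m + 1 < h:
                canon.append((m + 1, h))  # right subtree is entirely inside
            h = m
        else:
            l = m + 1
    l, h = split + 1, hi                 # right path: search for b
    while l < h:
        m = (l + h) // 2
        path.append(m)
        if xs[m] <= b:
            if l < m:
                canon.append((l, m))      # left subtree is entirely inside
            l = m + 1
        else:
            h = m
    return path, canon


xs = list(range(1, 11))
path, canon = decompose(xs, 2.5, 8.5)
print(path, canon)
```

Points at the path indices still have to be tested individually against [a, b], whereas every point in a canonical range is inside by construction. Both lists have length \(O(\log n)\).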
Step 2: Querying y -coordinates
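For every point p stored on the left and right paths (the set \(P''\)), check whether \(p \in [a_1, b_1] \times [a_2, b_2]\), and report it if it is.
For all i, use the interval \([a_2, b_2]\) to perform a 1-dim. search query in \(C_i\) using the canonical tree \(T_i\).
The union of all these results gives \(P \cap ([a_1, b_1] \times [a_2, b_2])\)
Analysis of query time:
  Let \(k_i\) = no. of points reported from \(T_i\); then \(\sum_{i=1}^{c} k_i \le k\)
  Query time:
  \[ \sum_{i=1}^{c} O(\log n + k_i) = O\Big(c \log n + \sum_{i=1}^{c} k_i\Big) = O(\log^2 n + k) \]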
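The two query steps can be put together in a short sketch. Several things here are assumptions made for illustration: the class name `Node`, the function names `report_y` and `query`, storing one point per node of the x-tree, and, most notably, keeping each canonical set as a y-sorted array searched with `bisect` instead of a canonical BBST (the 1-dim. query cost is the same). The canonical arrays are built with `sorted`, so this sketch does not attain the best preprocessing time.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class Node:
    def __init__(self, pts: List[Point]):
        """pts: non-empty list of points sorted by x; the median is stored here."""
        mid = len(pts) // 2
        self.point = pts[mid]
        self.left = Node(pts[:mid]) if mid > 0 else None
        self.right = Node(pts[mid + 1:]) if mid + 1 < len(pts) else None
        self.by_y = sorted(pts, key=lambda p: p[1])   # canonical set, sorted by y
        self.ys = [p[1] for p in self.by_y]


def report_y(v: Optional[Node], a2: float, b2: float, out: List[Point]) -> None:
    """1-dim. query on the canonical set of v with the interval [a2, b2]."""
    if v is not None:
        out.extend(v.by_y[bisect_left(v.ys, a2):bisect_right(v.ys, b2)])


def query(root: Node, a1: float, b1: float, a2: float, b2: float) -> List[Point]:
    out: List[Point] = []

    def check(v: Node) -> None:          # path nodes are tested one by one
        x, y = v.point
        if a1 <= x <= b1 and a2 <= y <= b2:
            out.append(v.point)

    v: Optional[Node] = root
    while v is not None and not (a1 <= v.point[0] <= b1):
        v = v.left if b1 < v.point[0] else v.right    # walk to the split node
    if v is None:
        return out
    check(v)
    u = v.left
    while u is not None:                 # left path (search for a1)
        check(u)
        if a1 <= u.point[0]:
            report_y(u.right, a2, b2, out)
            u = u.left
        else:
            u = u.right
    u = v.right
    while u is not None:                 # right path (search for b1)
        check(u)
        if u.point[0] <= b1:
            report_y(u.left, a2, b2, out)
            u = u.right
        else:
            u = u.left
    return out


pts = sorted([(1, 5), (2, 1), (3, 4), (4, 2), (5, 3)])
print(sorted(query(Node(pts), 2, 5, 2, 4)))  # [(3, 4), (4, 2), (5, 3)]
```

Each of the \(O(\log n)\) calls to `report_y` performs two binary searches, which is where the \(\log^2 n\) term in the query time comes from.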
Analysis of total query time
[Figure: each canonical set \(C_i\) is stored in its canonical tree \(T_i\)]
Query on x-coordinates on T:
  Obtain the points on the left and right paths, and the canonical trees \(T_i\).
  It takes \(O(\log n)\) time.
Query on y-coordinates on the \(T_i\):
  It takes \(O(\log^2 n + k)\) time (refer to previous slide).
Total query time = \(O(\log n) + O(\log^2 n + k) = O(\log^2 n + k)\).
Space complexity (Proof 1)
A point p belongs to all the canonical sets on the path from the vertex of T that stores p to the root (and only these canonical sets)
Thus p lies in \(O(\log n)\) canonical sets
Hence
\[ \sum_{v \in T} |C_v| = O(n \log n), \]
where \(C_v\) = the canonical set at node v.
The memory space used is \(O(n \log n)\). Actually, it is \(\Theta(n \log n)\). Why?
Space complexity (Proof 2)
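Total size of the canonical trees \(T_i\), level by level:
  Level 0: 1 tree of size n, total n
  Level 1: 2 trees of size \(n/2\), total \(2(n/2) = n\)
  Level 2: 4 trees of size \(n/4\), total \(4(n/4) = n\)
  ...
  Last level: n trees of size 1, total \(n(1) = n\)
There are \(\log n\) levels, so Total = \(\Theta(n \log n)\)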
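The level-by-level count can be checked numerically. The sketch below assumes a balanced BST that stores one point per node with the median at the root, so the sizes are slightly smaller than the rounded \(n/2^i\) figures of the slide; `total_canonical_size` is a hypothetical helper name.

```python
from functools import lru_cache
from math import log2


@lru_cache(maxsize=None)
def total_canonical_size(n: int) -> int:
    """Sum of |C_v| over all nodes v of a balanced BST storing n points."""
    if n == 0:
        return 0
    left = n // 2            # points strictly left of the median
    right = n - 1 - left     # points strictly right of the median
    return n + total_canonical_size(left) + total_canonical_size(right)


for n in (7, 1023, 10**6):
    print(n, total_canonical_size(n), round(n * log2(n)))
```

For n = 7 the sum is 7 + (3 + 3) + (1 + 1 + 1 + 1) = 17, and for larger n the total stays within a constant factor of \(n \log n\), in line with the \(\Theta(n \log n)\) space bound.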
Preprocessing time
\(T_v\) can be built in \(O(|C_v| \log |C_v|)\) time, and \(|C_v| \log |C_v| \le |C_v| \log n\)
Hence the range tree can be built in time
\[ O\Big(\log n \sum_{v \in T} |C_v|\Big) = \log n \cdot O(n \log n) = O(n \log^2 n) \]
We can do better ...
  Compute the \(T_v\)s from leaves to root
  Computing \(T_v\) amounts to merging two sorted sequences
  It takes \(O(|C_v|)\) time
  Overall, we can build the range tree in time \(\Theta\big(\sum_{v \in T} |C_v|\big) = \Theta(n \log n)\)
Range trees in higher dimensions
Idea
We assume \(d > 1\) and \(d = O(1)\). We want to perform range searching in \(\mathbb{R}^d\).
We still build T with respect to the \(x_1\)-coordinate.
For each canonical set of T we build a \((d-1)\)-dimensional range searching data structure using coordinates \((x_2, x_3, \dots, x_d)\).
To answer a d-dimensional query
  Find the canonical trees of T associated with \([a_1, b_1]\)
  Make a \((d-1)\)-dimensional query on each canonical tree recursively, using \([a_2, b_2] \times [a_3, b_3] \times \dots \times [a_d, b_d]\)
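The recursive idea can be sketched in a few dozen lines. This is an illustration under stated assumptions, not the lecture's implementation: the class name `RangeTree`, nodes represented as tuples, one point stored per node, 0-based coordinate indices, and a sorted array standing in for the 1-dim. BBST on the last coordinate. The nested structures are built by re-sorting, so the preprocessing time of this sketch is not the best possible.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Box = Sequence[Tuple[float, float]]  # closed interval (a_i, b_i) per coordinate


class RangeTree:
    """Range tree over coordinates dim, dim+1, ..., d-1 of a non-empty point list."""

    def __init__(self, pts: List[Point], dim: int = 0):
        self.dim = dim
        self.pts = sorted(pts, key=lambda p: p[dim])
        self.last = dim == len(pts[0]) - 1
        if self.last:
            self.keys = [p[dim] for p in self.pts]   # sorted array for the last level
        else:
            self.root = self._build(self.pts)

    def _build(self, pts: List[Point]):
        """Node = (point, left, right, structure on the remaining coordinates)."""
        if not pts:
            return None
        mid = len(pts) // 2
        return (pts[mid], self._build(pts[:mid]), self._build(pts[mid + 1:]),
                RangeTree(pts, self.dim + 1))

    def query(self, box: Box, out: Optional[List[Point]] = None) -> List[Point]:
        out = [] if out is None else out
        a, b = box[self.dim]
        if self.last:
            out.extend(self.pts[bisect_left(self.keys, a):bisect_right(self.keys, b)])
            return out
        k = self.dim
        v = self.root
        while v is not None and not (a <= v[0][k] <= b):   # walk to the split node
            v = v[1] if b < v[0][k] else v[2]
        if v is None:
            return out
        self._check(v[0], box, out)
        u = v[1]
        while u is not None:                                # left path
            self._check(u[0], box, out)
            if a <= u[0][k]:
                if u[2] is not None:
                    u[2][3].query(box, out)                 # right child is canonical
                u = u[1]
            else:
                u = u[2]
        u = v[2]
        while u is not None:                                # right path
            self._check(u[0], box, out)
            if u[0][k] <= b:
                if u[1] is not None:
                    u[1][3].query(box, out)                 # left child is canonical
                u = u[2]
            else:
                u = u[1]
        return out

    @staticmethod
    def _check(p: Point, box: Box, out: List[Point]) -> None:
        """Path nodes are tested individually against the whole box."""
        if all(lo <= x <= hi for x, (lo, hi) in zip(p, box)):
            out.append(p)


tree = RangeTree([(1, 2, 3), (4, 5, 6), (7, 8, 9), (2, 9, 1)])
print(sorted(tree.query([(0, 5), (0, 9), (0, 6)])))  # [(1, 2, 3), (2, 9, 1), (4, 5, 6)]
```

Each level of nesting contributes one logarithmic factor to the search, which matches the d nested levels described above.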
Analysis
Query time: \(O(\log^d n + k)\)
  Because of the d nested levels in a d-dim. range tree, searching through the d levels takes \(O(\log^d n)\) time.
  Reporting all points inside the query range takes \(O(k)\) time.
Space complexity: \(O(n \log^{d-1} n)\)
  By induction on d (see next slide ...)
Preprocessing time: \(O(n \log^{d-1} n)\)
  Compute the \(T_v\)s from leaves to root
  As the size of the range tree is \(O(n \log^{d-1} n)\), building the whole range tree takes \(O(n \log^{d-1} n)\) time.
Space complexity (Proof by Induction)
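Suppose a \((d-1)\)-dim. range tree on m points has size \(O(m \log^{d-2} m)\).
Sizes of the \((d-1)\)-dim. structures \(T_i\) attached to the nodes of T, level by level:
  Level 0: \(O(n \log^{d-2} n)\)
  Level 1: \(2 \cdot O\big(\tfrac{n}{2} \log^{d-2} \tfrac{n}{2}\big) = O\big(n \log^{d-2} \tfrac{n}{2}\big)\)
  Level 2: \(4 \cdot O\big(\tfrac{n}{4} \log^{d-2} \tfrac{n}{4}\big) = O\big(n \log^{d-2} \tfrac{n}{4}\big)\)
  ...
  Last level: \(n \cdot O(1) = O(n)\)
There are \(\log n\) levels. Then the size of the d-dim. range tree is
\[ O(n \log^{d-2} n) + O\big(n \log^{d-2} \tfrac{n}{2}\big) + O\big(n \log^{d-2} \tfrac{n}{4}\big) + \dots + O(n) = \log n \cdot O(n \log^{d-2} n) = O(n \log^{d-1} n). \]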
Improved range trees: Fractional cascading
Motivation
In \(\mathbb{R}^2\) the query time of range trees is \(\Theta(k + \log^2 n)\)
For comparison-based algorithms, \(\Omega(k + \log n)\) is a lower bound.
Can we do better and achieve the lower bound?
Yes, we will now show how to obtain the optimal \(\Theta(k + \log n)\) query time.
Step 1: Querying x-coordinates (same as before)
Make the query with range \([a_1, b_1]\) on the x-coordinates.
Take the nodes on the left path and the right path.
  Select the canonical set \(C_i\) at the right child of a node on the left path;
  Select the canonical set \(C_j\) at the left child of a node on the right path.
It takes \(O(\log n)\) time (the height of the BBST T).
Let \(\{C_1, C_2, \dots, C_c\}\) = the canonical sets selected, where \(c = O(\log n)\).
Step 2: Querying y-coordinates (Modified)
When processing a query \((a_1, b_1, a_2, b_2)\), we search the canonical trees \(T_v\), always with the same two keys \(a_2\) and \(b_2\). For each such tree, we spend \(O(\log n)\) searching time.
Main idea: as \(C_{v.left}\) and \(C_{v.right}\) are subsets of \(C_v\), we keep pointers from each node of \(T_v\) to the nodes of \(T_{v.left}\) and \(T_{v.right}\) that hold the same key, or the next larger key.
[Figure: array \(A_v\) with pointers into \(A_{v.left}\) and \(A_{v.right}\)]
Thus after performing a search on \(a_2\) or \(b_2\) in \(T_v\), we can perform the search on \(a_2\) or \(b_2\) in \(T_{v.left}\) and \(T_{v.right}\) in \(O(1)\) time.
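Step 2: Querying y-coordinates (Modified)
Minor idea: replace each canonical tree \(T_i\) by a canonical array \(A_i\) storing the canonical set \(C_i\) sorted by y-coordinate:
  Search for key \(a_2\) in array \(A_i\);
  Starting from the position of \(a_2\), walk along array \(A_i\) until \(b_2\) is exceeded.
[Figure: array \(A_v\) with pointers into \(A_{v.left}\) and \(A_{v.right}\)]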
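Step 2: Querying y-coordinates (Modified)
First make a binary search for \(a_2\) in \(A_{root}\), which takes \(O(\log n)\) time.
[Figure: the search descends from \(A_{root}\) through \(A_u\), \(A_v\), \(A_w\) and reaches the canonical arrays \(A_i\), \(A_j\) of the selected canonical sets \(C_i\), \(C_j\)]
By following pointer links down the tree, we can locate \(a_2\) in each canonical array \(A_i\) in \(O(1)\) time per link.
Starting from the position of \(a_2\), walk along array \(A_i\), reporting the points, until \(b_2\) is exceeded.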
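The pointer-following query can be sketched as below. All names (`Node`, `_links`, `_lower_bound`, `_walk`, `query`) are assumptions made for this illustration, as is the choice to store one point per node and to represent pointers as array indices: `lptr[i]` is the position in the left child's array of the first entry whose y is at least that of entry i, with one extra entry for "past the end". The arrays are built with `sorted` for brevity, so the preprocessing time of this sketch is not the optimal one.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def _links(arr: List[Point], child: Optional["Node"]) -> List[int]:
    """links[i] = first index j in child's array with y >= arr[i].y."""
    child_arr = child.arr if child is not None else []
    links, j = [], 0
    for p in arr:                         # one linear two-pointer scan
        while j < len(child_arr) and child_arr[j][1] < p[1]:
            j += 1
        links.append(j)
    links.append(len(child_arr))          # entry for position len(arr)
    return links


class Node:
    def __init__(self, pts: List[Point]):
        """pts: non-empty list sorted by x; the median is stored at this node."""
        mid = len(pts) // 2
        self.point = pts[mid]
        self.left = Node(pts[:mid]) if mid > 0 else None
        self.right = Node(pts[mid + 1:]) if mid + 1 < len(pts) else None
        self.arr = sorted(pts, key=lambda p: p[1])   # canonical array A_v
        self.lptr = _links(self.arr, self.left)
        self.rptr = _links(self.arr, self.right)


def _lower_bound(arr: List[Point], a2: float) -> int:
    """Binary search: first index whose y-coordinate is >= a2."""
    lo, hi = 0, len(arr)
    while lo < hi:
        mid = (lo + hi) // 2
        if arr[mid][1] < a2:
            lo = mid + 1
        else:
            hi = mid
    return lo


def _walk(v: Optional[Node], i: int, b2: float, out: List[Point]) -> None:
    """Report entries of A_v from position i until b2 is exceeded."""
    while v is not None and i < len(v.arr) and v.arr[i][1] <= b2:
        out.append(v.arr[i])
        i += 1


def query(root: Node, a1: float, b1: float, a2: float, b2: float) -> List[Point]:
    out: List[Point] = []

    def check(v: Node) -> None:
        if a1 <= v.point[0] <= b1 and a2 <= v.point[1] <= b2:
            out.append(v.point)

    v: Optional[Node] = root
    i = _lower_bound(root.arr, a2)        # the only binary search of the query
    while v is not None and not (a1 <= v.point[0] <= b1):
        if b1 < v.point[0]:
            v, i = v.left, v.lptr[i]
        else:
            v, i = v.right, v.rptr[i]
    if v is None:
        return out
    check(v)
    u, iu = v.left, v.lptr[i]
    while u is not None:                  # left path
        check(u)
        if a1 <= u.point[0]:
            _walk(u.right, u.rptr[iu], b2, out)
            u, iu = u.left, u.lptr[iu]
        else:
            u, iu = u.right, u.rptr[iu]
    u, iu = v.right, v.rptr[i]
    while u is not None:                  # right path
        check(u)
        if u.point[0] <= b1:
            _walk(u.left, u.lptr[iu], b2, out)
            u, iu = u.right, u.rptr[iu]
        else:
            u, iu = u.left, u.lptr[iu]
    return out


pts = sorted([(1, 5), (2, 1), (3, 4), (4, 2), (5, 3)])
print(sorted(query(Node(pts), 2, 5, 2, 4)))  # [(3, 4), (4, 2), (5, 3)]
```

The position of \(a_2\) is carried along with the current node, so every step down the tree costs constant time and only the initial search in the root array is logarithmic.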
Improving d-dim. range trees
Hence we can answer a 2-dim. range query in optimal \(O(\log n + k)\) time.
This technique is known as fractional cascading.
By induction, it also improves by a factor \(O(\log n)\) the results in \(d > 2\) (by using canonical arrays at the last level, and the linking pointers).
Hence range trees with fractional cascading in \(d \ge 2\) yield
  Query time: \(O(k + \log^{d-1} n)\) (improved by an \(O(\log n)\) factor)
  Space usage: \(O(n \log^{d-1} n)\) (same as before)
  Preprocessing time: \(O(n \log^{d-1} n)\) (same as before)
Remarks on 2-dim. improved range trees
\(O(\log n + k)\) query time and \(O(n \log n)\) preprocessing time are optimal.
But the space complexity is NOT optimal.
  \(O(n \log n / \log \log n)\) space is possible in 2 dimensions with the same query time, and this is optimal (not covered in this course).
Concluding remarks
Range trees:
  simple
  nearly optimal
Spatial databases mainly use R-trees
  not covered in this course
  good in practice with real data sets
  but no performance guarantee (no good worst-case bound on the query time)
Next Lecture
Summary of this lecture: Orthogonal Range Searching
  2-dim. range trees
  d-dim. range trees
  Fractional cascading
Next lecture: Segment Trees and Interval Trees
  Segment Trees
  Interval Trees