Ranking Spatial Data by Quality Preferences
Man Lung Yiu, Hua Lu, Member, IEEE, Nikos Mamoulis, and Michail Vaitis
Abstract: A spatial preference query ranks objects based on the qualities of features in their spatial neighborhood. For example,
using a real estate agency database of flats for lease, a customer may want to rank the flats with respect to the appropriateness of their
location, defined after aggregating the qualities of other features (e.g., restaurants, cafes, hospital, market, etc.) within their spatial
neighborhood. Such a neighborhood concept can be specified by the user via different functions. It can be an explicit circular region
within a given distance from the flat. Another intuitive definition is to assign higher weights to the features based on their proximity to
the flat. In this paper, we formally define spatial preference queries and propose appropriate indexing techniques and search
algorithms for them. Extensive evaluation of our methods on both real and synthetic data reveals that an optimized branch-and-bound
solution is efficient and robust with respect to different parameters.
Index Terms: Query processing, spatial databases.
1 INTRODUCTION
Spatial database systems manage large collections of geographic entities, which apart from spatial attributes
contain nonspatial information (e.g., name, size, type, price,
etc.). In this paper, we study an interesting type of preference
queries, which select the best spatial location with respect to
the quality of facilities in its spatial neighborhood.
Given a set D of interesting objects (e.g., candidate locations), a top-k spatial preference query retrieves the k objects in D with the highest scores. The score of an
object is defined by the quality of features (e.g., facilities or
services) in its spatial neighborhood. As a motivating
example, consider a real estate agency office that holds a
database with available flats for lease. Here, "feature" refers
to a class of objects in a spatial map such as specific facilities
or services. A customer may want to rank the contents of
this database with respect to the quality of their locations,
quantified by aggregating nonspatial characteristics of other
features (e.g., restaurants, cafes, hospital, market, etc.) in the
spatial neighborhood of the flat (defined by a spatial range
around it). Quality may be subjective and query-parametric.
For example, a user may define quality with respect to
nonspatial attributes of restaurants around it (e.g., whether
they serve seafood, price range, etc.).
As another example, the user (e.g., a tourist) wishes to find a hotel p that is close to a high-quality restaurant and a high-quality cafe. Fig. 1a illustrates the locations of an object data set D (hotels) in white, and two feature data sets: the set F_1 (restaurants) in gray, and the set F_2 (cafes) in black. Feature points are labeled by quality values that can be obtained from rating providers (e.g., http://www.zagat.com/). For the ease of discussion, the qualities are normalized to values in [0, 1]. The score τ(p) of a hotel p is defined in terms of: 1) the maximum quality for each feature in the neighborhood region of p, and 2) the aggregation of those qualities.
A simple score instance, called the range score, binds the neighborhood region to a circular region at p with radius ε (shown as a circle), and the aggregate function to SUM. For instance, the maximum qualities of gray and black points within the circle of p_1 are 0.9 and 0.6, respectively, so the score of p_1 is τ(p_1) = 0.9 + 0.6 = 1.5. Similarly, we obtain τ(p_2) = 1.0 + 0.1 = 1.1 and τ(p_3) = 0.7 + 0.7 = 1.4. Hence, the hotel p_1 is returned as the top result.
In fact, the semantics of the aggregate function is relevant to the user's query. The SUM function attempts to balance the overall qualities of all features. For the MIN function, the top result becomes p_3, with the score τ(p_3) = min{0.7, 0.7} = 0.7. It ensures that the top result has reasonably high qualities in all features. For the MAX function, the top result is p_2, with τ(p_2) = max{1.0, 0.1} = 1.0. It is used to optimize the quality in a particular feature, but not necessarily all of them.
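To make the example concrete, the following short Python sketch recomputes the scores above by brute force. This is our own illustration, not the paper's algorithms: the coordinates are invented so that the distances mirror Fig. 1a, and only the quality values and the resulting scores come from the example.

import math

def range_score(p, feature_sets, eps, agg=sum):
    # Component score per feature set: max quality within distance eps, or 0.
    comps = []
    for F in feature_sets:
        within = [w for (x, y, w) in F if math.dist(p, (x, y)) <= eps]
        comps.append(max(within, default=0.0))
    return agg(comps)

# Toy layout mirroring Fig. 1a: (x, y, quality); coordinates are made up.
F1 = [(0.15, 0.20, 0.9), (0.60, 0.70, 1.0), (0.40, 0.45, 0.7)]   # restaurants
F2 = [(0.20, 0.15, 0.6), (0.65, 0.75, 0.1), (0.45, 0.40, 0.7)]   # cafes
hotels = {"p1": (0.18, 0.18), "p2": (0.62, 0.72), "p3": (0.42, 0.42)}
for name, p in hotels.items():
    print(name,
          range_score(p, [F1, F2], 0.2),            # SUM: p1 wins with 1.5
          range_score(p, [F1, F2], 0.2, agg=min),   # MIN: p3 wins with 0.7
          range_score(p, [F1, F2], 0.2, agg=max))   # MAX: p2 wins with 1.0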
The neighborhood region in the above spatial preference query can also be defined by other score functions. A meaningful score function is the influence score (see Section 4). As opposed to the crisp radius ε constraint in the range score, the influence score smoothens the effect of ε and assigns higher weights to cafes that are closer to the hotel. Fig. 1b shows a hotel p_5 and three cafes s_1, s_2, s_3 (with their quality values). The circles have their radii as multiples of ε. Now, the score of a cafe s_i is computed by multiplying its quality with the weight 2^{-j}, where j is the order of the smallest circle containing s_i. For example, the scores of s_1, s_2, and s_3 are 0.3 · 2^{-1} = 0.15, 0.9 · 2^{-2} = 0.225, and 1.0 · 2^{-3} = 0.125, respectively. The influence score of p_5 is taken as the highest value (0.225).
Traditionally, there are two basic ways for ranking
objects: 1) spatial ranking, which orders the objects
according to their distance from a reference point, and
2) nonspatial ranking, which orders the objects by an
aggregate function on their nonspatial values. Our top-k spatial preference query integrates these two types of
ranking in an intuitive way. As indicated by our examples,
this new query has a wide range of applications in service
recommendation and decision support systems.
To our knowledge, there is no existing efficient solution for processing the top-k spatial preference query. A brute-force approach (to be elaborated in Section 3.2) for evaluating it is to compute the scores of all objects in D and select the top-k ones. This method, however, is
expected to be very expensive for large input data sets. In
this paper, we propose alternative techniques that aim at
minimizing the I/O accesses to the object and feature data
sets, while being also computationally efficient. Our
techniques apply to spatial-partitioning access methods
and compute upper score bounds for the objects indexed by
them, which are used to effectively prune the search space.
Specifically, we contribute the branch-and-bound (BB)
algorithm and the feature join (FJ) algorithm for efficiently
processing the top-k spatial preference query.
Furthermore, this paper studies three relevant extensions
that have not been investigated in our preliminary work [1].
The first extension (Section 3.4) is an optimized version of BB
that exploits a more efficient technique for computing the
scores of the objects. The second extension (Section 3.6)
studies adaptations of the proposed algorithms for aggregate
functions other than SUM, e.g., the functions MIN and MAX.
The third extension (Section 4) develops solutions for the
top-k spatial preference query based on the influence score.
The rest of this paper is structured as follows: Section 2
provides background on basic and advanced queries on
spatial databases, as well as top-k query evaluation in relational databases. Section 3 defines the top-k spatial preference query and presents our solutions. Section 4 studies the query extension for the influence score. In Section 5, our
query algorithms are experimentally evaluated with real and
synthetic data. Finally, Section 6 concludes the paper with
future research directions.
2 BACKGROUND AND RELATED WORK
Object ranking is a popular retrieval task in various
applications. In relational databases, we rank tuples using
an aggregate score function on their attribute values [2]. For
example, a real estate agency maintains a database that
contains information of flats available for rent. A potential
customer wishes to view the top 10 flats with the largest
sizes and lowest prices. In this case, the score of each flat is
expressed by the sum of two qualities: size and price, after
normalization to the domain [0, 1] (e.g., 1 means the largest
size and the lowest price). In spatial databases, ranking is often associated with nearest neighbor (NN) retrieval. Given a query location, we are interested in retrieving the set of nearest objects to it that satisfy a condition (e.g., restaurants). Assuming that the set of interesting objects is
indexed by an R-tree [3], we can apply distance bounds
and traverse the index in a branch-and-bound fashion to
obtain the answer [4].
Nevertheless, it is not always possible to use multidimensional indexes for top-k retrieval. First, such indexes break down in high-dimensional spaces [5], [6]. Second, top-k queries may involve an arbitrary set of user-specified
attributes (e.g., size and price) from possible ones (e.g., size,
price, distance to the beach, number of bedrooms, floor,
etc.) and indexes may not be available for all possible
attribute combinations (i.e., they are too expensive to create
and maintain). Third, information for different rankings to
be combined (i.e., for different attributes) could appear in
different databases (in a distributed database scenario) and
unified indexes may not exist for them. Solutions for top-k
queries [7], [2], [8], [9] focus on the efficient merging of
object rankings that may arrive from different (distributed)
sources. Their motivation is to minimize the number of
accesses to the input rankings until the objects with the top-k aggregate scores have been identified. To achieve this,
upper and lower bounds for the objects seen so far are
maintained while scanning the sorted lists.
In the following sections, we first review the R-tree, which is the most popular spatial access method, and the NN search algorithm of [4]. Then, we survey recent research on feature-based spatial queries.
2.1 Spatial Query Evaluation on R-Trees
The most popular spatial access method is the R-tree [3], which indexes minimum bounding rectangles (MBRs) of objects. Fig. 2 shows a set D = {p_1, ..., p_8} of spatial objects (e.g., points) and an R-tree that indexes them. R-trees can efficiently process main spatial query types, including spatial range queries, nearest neighbor queries, and spatial joins. Given a spatial region W, a spatial range query retrieves from D the objects that intersect W. For instance, consider a range query that asks for all objects within the shaded area in Fig. 2. Starting from the root of the tree, the query is processed by recursively following entries having MBRs that intersect the query region. For instance, e_1 does not intersect the query region, thus the subtree pointed by e_1 cannot contain any query result. In contrast, e_2 is followed by the algorithm and the points in the corresponding node are examined recursively to find the query result p_7.

Fig. 1. Examples of top-k spatial preference queries. (a) Range score, ε = 0.2 km. (b) Influence score, ε = 0.2 km.
Fig. 2. Spatial queries on R-trees.
A nearest neighbor query takes as input a query object q and returns the closest object in D to q. For instance, the nearest neighbor of q in Fig. 2 is p_7. Its generalization is the k-NN query, which returns the k closest objects to q, given a positive integer k. NN (and k-NN) queries can be efficiently processed using the best-first (BF) algorithm of [4], provided that D is indexed by an R-tree. A min-heap H, which organizes R-tree entries based on the (minimum) distance of their MBRs to q, is initialized with the root entries. In order to find the NN of q in Fig. 2, BF first inserts into H the entries e_1, e_2, e_3 and their distances to q. Then, the nearest entry e_2 is retrieved from H and the objects p_1, p_7, p_8 are inserted into H. The next nearest entry in H is p_7, which is the nearest neighbor of q. In terms of I/O, the BF algorithm is shown to be no worse than any NN algorithm on the same R-tree [4].
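As an illustration of the BF principle, here is a minimal best-first NN search in Python over a hand-built two-level hierarchy standing in for the R-tree of Fig. 2. The mindist computation and the heap discipline are the essential parts; the node layout and coordinates are our own invention.

import heapq

def mindist(q, mbr):
    # Minimum distance from point q to rectangle mbr = (xlo, ylo, xhi, yhi).
    dx = max(mbr[0] - q[0], q[0] - mbr[2], 0.0)
    dy = max(mbr[1] - q[1], q[1] - mbr[3], 0.0)
    return (dx * dx + dy * dy) ** 0.5

def best_first_nn(q, root_entries):
    """Entry: (mbr, payload); payload is a child entry list, or None for a point."""
    H, tie = [], 0
    for e in root_entries:
        heapq.heappush(H, (mindist(q, e[0]), tie, e)); tie += 1
    while H:
        d, _, (mbr, child) = heapq.heappop(H)
        if child is None:
            return mbr, d        # the first deheaped point is the NN
        for e in child:
            heapq.heappush(H, (mindist(q, e[0]), tie, e)); tie += 1

# Points get degenerate MBRs; e2 groups p1, p7, p8 as in Fig. 2.
p7 = ((4.0, 4.0, 4.0, 4.0), None)
p1 = ((2.0, 6.0, 2.0, 6.0), None)
p8 = ((5.0, 3.0, 5.0, 3.0), None)
e2 = ((2.0, 3.0, 5.0, 6.0), [p1, p7, p8])
e1 = ((7.0, 7.0, 9.0, 9.0), [((8.0, 8.0, 8.0, 8.0), None)])
print(best_first_nn((4.2, 4.1), [e1, e2]))   # p7's MBR and its distance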
The aggregate R-tree (aR-tree) [10] is a variant of the R-tree, where each nonleaf entry augments an aggregate measure for some attribute value (measure) of all points in its subtree. As an example, the tree shown in Fig. 2 can be upgraded to a MAX aR-tree over the point set, if the entries e_1, e_2, e_3 contain the maximum measure values of the sets {p_2, p_3}, {p_1, p_8, p_7}, {p_4, p_5, p_6}, respectively. Assume that the measure values of p_4, p_5, p_6 are 0.2, 0.1, 0.4, respectively. In this case, the aggregate measure augmented in e_3 would be max{0.2, 0.1, 0.4} = 0.4. In this paper, we employ MAX aR-trees for indexing the feature data sets (e.g., restaurants), in order to accelerate the processing of top-k spatial preference queries.
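The augmentation itself is a one-pass bottom-up computation. Below is a minimal sketch over dictionary-based nodes, a stand-in for a real aR-tree; the measures of the second leaf are invented for the example.

def augment_max(node):
    # Store in each nonleaf node the MAX measure of all points below it.
    if node["leaf"]:
        node["agg"] = max(node["measures"])
    else:
        node["agg"] = max(augment_max(child) for child in node["children"])
    return node["agg"]

e3 = {"leaf": True, "measures": [0.2, 0.1, 0.4]}   # p4, p5, p6 from the example
e2 = {"leaf": True, "measures": [0.7, 0.5, 0.9]}   # invented measures
root = {"leaf": False, "children": [e2, e3]}
augment_max(root)
print(e3["agg"], root["agg"])   # 0.4 0.9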
Given a feature data set F and a multidimensional region R, the range top-k query selects the tuples (from F) within the region R and returns only those with the k highest qualities. Hong et al. [11] indexed the data set by a MAX aR-tree and developed an efficient tree traversal algorithm to answer the query. Instead of finding the best k qualities from F in a specified region, our (range score) query considers multiple spatial regions based on the points from the object data set D, and attempts to find out the best k regions (based on scores derived from multiple feature data sets F_c).
2.2 Feature-Based Spatial Queries
Xia et al. [12] solved the problem of finding top-k sites (e.g., restaurants) based on their influence on feature points (e.g., residential buildings). As an example, Fig. 3a shows a set of sites (white points) and a set of features (black points with weights), such that each line links a feature point to its nearest site. The influence of a site p_i is defined by the sum of weights of the feature points having p_i as their closest site. For instance, the score of p_1 is 0.9 + 0.5 = 1.4. Similarly, the scores of p_2 and p_3 are 1.5 and 1.2, respectively. Hence, p_2 is returned as the top-1 influential site.
Related to the top-k influential sites query are the optimal
location queries studied in [13], [14]. The goal is to find the
location in space (not chosen from a specific set of sites) that
minimizes an objective function. In Figs. 3b and 3c, feature
points and existing sites are shown as black and gray
points, respectively. Assume that all feature points have the
same quality. The maximum influence optimal location
query [13] finds the location (to insert to the existing set of
sites) with the maximum influence (as defined in [12]),
whereas the minimum distance optimal location query [14]
searches for the location that minimizes the average
distance from each feature point to its nearest site. The
optimal locations for both queries are marked as white
points in Figs. 3b and 3c, respectively.
The techniques proposed in [12], [13], [14] are specific to the particular query types described above and cannot be extended to our top-k spatial preference queries. Also, they deal with a single feature data set, whereas our queries consider multiple feature data sets.
Recently, novel spatial queries and joins [15], [16], [17],
[18] have been proposed for various spatial decision
support problems. However, they do not utilize nonspatial
qualities of facilities to define the score of a location. Finally,
[19], [20] studied the evaluation of textual location-based
queries on spatial objects.
3 SPATIAL PREFERENCE QUERIES
Section 3.1 formally defines the top-k spatial preference query problem and describes the index structures for the data sets. Section 3.2 studies two baseline algorithms for processing the query. Section 3.3 presents an efficient branch-and-bound algorithm for the query, and its further optimization is proposed in Section 3.4. Section 3.5 develops a specialized spatial join algorithm for evaluating the query. Finally, Section 3.6 extends the above algorithms for answering top-k spatial preference queries involving other aggregate functions.
3.1 Definitions and Index Structures
Let F_c be a feature data set, in which each feature object s ∈ F_c is associated with a quality ω(s) and a spatial point. We assume that the domain of ω(s) is the interval [0, 1]. As an example, the quality ω(s) of a restaurant s can be obtained from a ratings provider.
Let D be an object data set, where each object p ∈ D is a spatial point. In other words, D is the set of interesting points (e.g., hotel locations) considered by the user.
Given an object data set D and m feature data sets F_1, F_2, ..., F_m, the top-k spatial preference query retrieves the k points in D with the highest score. Here, the score of an object point p ∈ D is defined as
\[ \tau^{\theta}(p) = \operatorname{AGG}\{\, \tau_c^{\theta}(p) \mid c \in [1, m] \,\}, \tag{1} \]
where AGG is an aggregate function and τ_c^θ(p) is the (cth) component score of p with respect to the neighborhood condition θ and the (cth) feature data set F_c.

Fig. 3. Influential sites and optimal location queries. (a) Top-k influential. (b) Max-influence. (c) Min-distance.
We proceed to elaborate on the aggregate function and the component score function. Typical examples of the aggregate function AGG are SUM, MIN, and MAX. We first focus on the case where AGG is SUM. In Section 3.6, we will discuss the generic scenario where AGG is an arbitrary monotone aggregate function.
An intuitive choice for the component score function τ_c^θ(p) is the range score τ_c^rng(p), taken as the maximum quality ω(s) of points s ∈ F_c that are within a given parameter distance ε from p, or 0 if no such point exists:
\[ \tau_c^{rng}(p) = \max\big(\{\, \omega(s) \mid s \in F_c \wedge \mathit{dist}(p, s) \le \varepsilon \,\} \cup \{0\}\big). \tag{2} \]
In our problem setting, the user requires that an object p ∈ D must not be considered as a result if there exists some F_c such that the neighborhood region of p does not contain any feature point of F_c.
There are other choices for the component score function τ_c^θ(p). One example is the influence score function τ_c^inf(p), which will be considered in Section 4. Another example is the NN score τ_c^nn(p) that has been studied in our previous work [1], so it will not be examined again in this paper. The condition θ is dropped whenever the context is clear.
In this paper, we assume that the object data set D is indexed by an R-tree and each feature data set F_c is indexed by a MAX aR-tree, where each nonleaf entry augments the maximum quality (of features) in its subtree. Nevertheless, our solutions are directly applicable to data sets that are indexed by other hierarchical spatial indexes (e.g., point quad-trees). The rationale for indexing different feature data sets by separate aR-trees is that: 1) a user queries for only a few features (e.g., restaurants and cafes) out of all possible features (e.g., restaurants, cafes, hospital, market, etc.), and 2) different users may consider different subsets of features.
Based on the above indexing scheme, we develop various algorithms for processing top-k spatial preference queries. Table 1 lists the notations to be used throughout the paper.
3.2 Probing Algorithms
We first introduce a brute-force solution that computes the score of every point p ∈ D in order to obtain the query results. Then, we propose a group evaluation technique that computes the scores of multiple points concurrently.

3.2.1 Simple Probing Algorithm
According to Section 3.1, the quality ω(s) of any feature point s falls into the interval [0, 1]. Thus, for a point p ∈ D whose component scores are not all known, its upper bound score τ_+(p) is defined as
\[ \tau_+(p) = \sum_{c=1}^{m} \begin{cases} \tau_c(p), & \text{if } \tau_c(p) \text{ is known}, \\ 1, & \text{otherwise}. \end{cases} \tag{3} \]
It is guaranteed that the bound τ_+(p) is greater than or equal to the actual score τ(p). The simple probing (SP) algorithm computes the score of each object p ∈ D one component at a time; as soon as the bound τ_+(p) drops to γ or below, where γ denotes the kth highest score found so far, the remaining component scores of p need not be computed. Otherwise, the exact score of p is obtained and the top-k result set W_k (and γ) are updated by p.
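The following Python sketch captures the incremental idea of SP under AGG = SUM. It is our own simplification: component scores come from an arbitrary callback instead of aR-tree traversals, and all names are ours rather than the paper's.

import heapq

def sp_topk(objects, m, component_score, k):
    """objects: list of (id, point); component_score(p, c) -> tau_c(p) in [0, 1]."""
    Wk = []          # min-heap keeping the k best (score, id) pairs
    gamma = 0.0      # kth best score found so far
    for oid, p in objects:
        comps = []
        for c in range(m):
            comps.append(component_score(p, c))
            tau_plus = sum(comps) + (m - len(comps))   # eq. (3): unknowns count as 1
            if len(Wk) == k and tau_plus <= gamma:
                break                                  # prune p, skip remaining components
        else:
            score = sum(comps)
            if len(Wk) < k:
                heapq.heappush(Wk, (score, oid))
            elif score > Wk[0][0]:
                heapq.heapreplace(Wk, (score, oid))
            gamma = Wk[0][0] if len(Wk) == k else 0.0
    return sorted(Wk, reverse=True)

# Example with m = 2 and a made-up component function:
objs = [("p1", (0.2, 0.2)), ("p2", (0.6, 0.7))]
print(sp_topk(objs, 2, lambda p, c: min(1.0, p[0] + p[1] + 0.1 * c), 1))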
3.2.2 Group Probing Algorithm
Due to separate score computations for different objects, SP is inefficient for large object data sets. In view of this, we propose the group probing (GP) algorithm, a variant of SP, that reduces I/O cost by computing the scores of objects in the same leaf node of the R-tree concurrently. In GP, when a leaf node is visited, its points are first stored in a set V and then their component scores are computed concurrently in a single traversal of the F_c tree.
We now introduce some distance notations for MBRs. Given a point p and an MBR e, the value mindist(p, e) (maxdist(p, e)) [4] denotes the minimum (maximum) possible distance between p and any point in e. Similarly, given two MBRs e_a and e_b, the value mindist(e_a, e_b) (maxdist(e_a, e_b)) denotes the minimum (maximum) possible distance between any point in e_a and any point in e_b.

TABLE 1. List of Notations.
Algorithm 2 shows the procedure for computing the cth component score for a group of points. Consider a subset V of D for which we want to compute their τ_c^rng(p) scores over the feature tree F_c. Initially, the procedure is called with N being the root node of F_c. If e is a nonleaf entry and its mindist from some point p ∈ V is within the range ε, then the procedure is applied recursively on the child node of e, since the subtree of F_c rooted at e may contribute to the component score of p. In case e is a leaf entry (i.e., a feature point), the scores of points in V are updated if they are within distance ε from e.

Algorithm 2. Group Range Score Algorithm
algorithm Group_Range(Node N, Set V, Value c, Value ε)
1: for each entry e ∈ N do
2:   if N is nonleaf then
3:     if ∃ p ∈ V, mindist(p, e) ≤ ε then
4:       read the child node N' pointed by e;
5:       Group_Range(N', V, c, ε);
6:   else
7:     for each p ∈ V such that dist(p, e) ≤ ε do
8:       τ_c(p) := max{τ_c(p), ω(e)};
3.3 Branch-and-Bound Algorithm
GP is still expensive as it examines all objects in D and computes their component scores. We now propose an algorithm that can significantly reduce the number of objects to be examined. The key idea is to compute, for nonleaf entries e in the object tree D, an upper bound T(e) of the score τ(p) for any point p in the subtree of e. If T(e) ≤ γ, then we need not access the subtree of e, thus we can save numerous score computations.
Algorithm 3 is a pseudocode of our BB algorithm, based on this idea. BB is called with N being the root node of D. If N is a nonleaf node, Lines 3-5 compute the scores T(e) of its nonleaf entries e concurrently. Recall that T(e) is an upper bound score for any point in the subtree of e. The techniques for computing T(e) will be discussed shortly. Like (3), with the component scores T_c(e) known so far, we can derive T_+(e), an upper bound of T(e). If T_+(e) ≤ γ, then the subtree of e cannot contain better results than those in W_k, and e is removed from V. In order to obtain points with high scores early, we sort the entries in descending order of T(e) before invoking the above procedure recursively on the child nodes pointed by the entries in V. If N is a leaf node, we compute the scores of all points of N concurrently and then update the set W_k of the top-k results. Since both W_k and γ are global variables, their values are updated during the recursive calls of BB.
Algorithm 3. Branch-and-Bound Algorithm
W_k : new min-heap of size k (initially empty);
γ := 0;  ▷ kth score in W_k
algorithm BB(Node N)
1: V := {e | e ∈ N};
2: if N is nonleaf then
3:   for c := 1 to m do
4:     compute T_c(e) for all e ∈ V concurrently;
5:     remove entries e in V such that T_+(e) ≤ γ;
6:   sort entries e ∈ V in descending order of T(e);
7:   for each entry e ∈ V such that T(e) > γ do
8:     read the child node N' pointed by e;
9:     BB(N');
10: else
11:   for c := 1 to m do
12:     compute τ_c(e) for all e ∈ V concurrently;
13:     remove entries e in V such that τ_+(e) ≤ γ;
14:   update W_k (and γ) by entries in V;
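The pruning skeleton of BB is easy to isolate. The sketch below is a simplification in which a callback supplies the upper bound T(e) of each subtree, so the interplay between γ, the descending-bound ordering, and the recursion is visible without any R-tree machinery; the tree and scores are invented.

import heapq

def bb(node, k, Wk, score_fn, bound_fn):
    # node: {"leaf": bool, "entries": [...]}; Wk: min-heap of the k best scores.
    if node["leaf"]:
        for obj in node["entries"]:
            s = score_fn(obj)
            if len(Wk) < k:
                heapq.heappush(Wk, s)
            elif s > Wk[0]:
                heapq.heapreplace(Wk, s)
    else:
        # Visit children in descending order of their upper bounds (Line 6).
        for child in sorted(node["entries"], key=bound_fn, reverse=True):
            gamma = Wk[0] if len(Wk) == k else 0.0   # current kth best score
            if bound_fn(child) > gamma:              # prune hopeless subtrees (Line 7)
                bb(child, k, Wk, score_fn, bound_fn)

leaf1 = {"leaf": True, "entries": [1.5, 1.1]}
leaf2 = {"leaf": True, "entries": [1.4, 0.3]}
root = {"leaf": False, "entries": [leaf1, leaf2]}
Wk = []
bb(root, 2, Wk, score_fn=lambda s: s, bound_fn=lambda n: max(n["entries"]))
print(sorted(Wk, reverse=True))    # [1.5, 1.4]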
3.3.1 Upper Bound Score Computation
It remains to clarify how the (upper bound) scores T_c(e) of nonleaf entries (within the same node N) can be computed concurrently (at Line 4). Our goal is to compute these upper bound scores such that
. the bounds are computed with low I/O cost, and
. the bounds are reasonably tight, in order to facilitate effective pruning.
To achieve this, we utilize only level-1 entries (i.e., lowest level nonleaf entries) in F_c for deriving upper bound scores because: 1) there are much fewer level-1 entries than leaf entries (i.e., points), and 2) high-level entries in F_c cannot provide tight bounds. In our experimental study, we will also verify the effectiveness and the cost of using level-1 entries for upper bound score computation.
Algorithm 2 can be modified for the above upper bound computation task (where the input V corresponds to a set of nonleaf entries), after changing Line 2 to check whether the child nodes of N are above the leaf level.
The following example illustrates how upper bound range scores are derived. In Fig. 4a, v_1 and v_2 are nonleaf entries in the object tree D and the others are level-1 entries in the feature tree F_c. For the entry v_1, we first define its Minkowski region [21] (i.e., the gray region around v_1), the area whose mindist from v_1 is within ε. Observe that only the entries e_i intersecting the Minkowski region of v_1 can contribute to the score of some point in v_1. Thus, the upper bound score T_c(v_1) is simply the maximum quality of the entries e_1, e_5, e_6, e_7, i.e., 0.9. Similarly, T_c(v_2) is computed as the maximum quality of the entries e_2, e_3, e_4, e_8, i.e., 0.7. Assuming that v_1 and v_2 are entries in the same tree node of D, their upper bounds are computed concurrently to reduce I/O cost.

Fig. 4. Examples of deriving scores. (a) Upper bound scores. (b) Optimized computation.
3.4 Optimized Branch-and-Bound Algorithm
This section develops a more efficient score computation technique to reduce the cost of the BB algorithm.

3.4.1 Motivation
Recall that Lines 11-13 of the BB algorithm are used to compute the scores of object points (i.e., leaf entries of the R-tree on D). A leaf entry e is pruned if its upper bound score τ_+(e) does not exceed the best score γ found so far. As an example, consider computing the score τ(p_1) of the point p_1 in Fig. 4b. The entry e_{11} is a nonleaf entry from the feature tree F_1. Its augmented quality value is ω(e_{11}) = 0.8. The entry points to a leaf node containing two feature points, whose quality values are 0.6 and 0.8, respectively. Similarly, e_{12} is a nonleaf entry from the tree F_2, and it points to a leaf node of feature points.
Suppose that the best score found so far in BB is γ = 1.4 (not shown in the figure). We need to check whether the score of p_1 can be higher than γ. For this, we compute the first component score τ_1(p_1) = 0.6 by accessing the child node of e_{11}. The upper bound score of p_1 can now be tightened: any feature point of F_2 relevant to p_1 lies in the subtree of e_{12}, so its quality is capped by the augmented value ω(e_{12}). If τ_1(p_1) + ω(e_{12}) ≤ γ, then p_1 can be discarded without accessing the child node of e_{12} at all. In general, let μ_c denote an upper bound on the quality of any unvisited feature point of F_c that can lie within distance ε from p; in Algorithm 4 below, μ_c is maintained as the maximum key among the pending F_c entries in the heap H_c. We define the upper bound score τ_+(p) as
\[ \tau_+(p) = \sum_{c=1}^{m} \max\big(\{\, \omega(s) \mid s \in F_c,\ \mathit{dist}(p, s) \le \varepsilon,\ \omega(s) \ge \mu_c \,\} \cup \{\mu_c\}\big). \tag{4} \]
In the max function, the first set denotes the upper bound quality of any visited feature point within distance ε from p. The following lemma shows that the value τ_+(p) is always greater than or equal to the actual score τ(p).
Lemma 1. It holds that τ_+(p) ≥ τ(p).

3.4.2 Optimized Score Computation
According to (4), the bound τ_+(p) tightens as more feature entries are visited. Algorithm 4 exploits this to compute the exact scores of a set V of object points while discarding, as early as possible, the points whose bound τ_+(p) drops to γ or below (Lines 13-15).
Next, we access the child node pointed to by e, and examine each entry e' in the node (Lines 16-17). A nonleaf entry e' is inserted into the heap H_c if its minimum distance from some p ∈ V is within ε (Lines 18-20), whereas a leaf entry e' is used to update the component score τ_c(p) of any p ∈ V within distance ε from e' (Lines 22-23). At Line 24, we apply a round-robin strategy to find the next value of c such that the heap H_c is not empty. The loop at Line 8 repeats while V is not empty and there exists a nonempty heap H_c. At the end, the algorithm derives the exact scores for the remaining points of V.
3.4.3 The BB* Algorithm
Based on the above, we extend BB (Algorithm 3) to an optimized BB* algorithm as follows: First, Lines 11-13 of BB are replaced by a call to Algorithm 4, for computing the exact scores of the object points in the set V. Second, Lines 3-5 of BB are replaced by a call to a modified Algorithm 4, for deriving the upper bound scores of the nonleaf entries (in V). Such a modified Algorithm 4 is obtained after replacing Line 18 by a check of whether the node CN is a nonleaf node above level-1.
3.5 Feature Join Algorithm
An alternative method for evaluating a top-k spatial preference query is to perform a multiway spatial join [23] on the feature trees F_1, F_2, ..., F_m to obtain combinations of feature points which can be in the neighborhood of some object from D. Spatial regions which correspond to combinations of high scores are then examined, in order to find data objects in D having the corresponding feature combination in their neighborhood. In this section, we first introduce the concept of a combination, then discuss the conditions for a combination to be pruned, and finally elaborate the algorithm used to progressively identify the combinations that correspond to query results.
A tuple ⟨f_1, f_2, ..., f_m⟩ is a combination if, for any c ∈ [1, m], f_c is an entry (either leaf or nonleaf) in the feature tree F_c. The score of the combination is defined by
\[ \tau(\langle f_1, f_2, \ldots, f_m \rangle) = \sum_{c=1}^{m} \omega(f_c). \tag{5} \]
For a nonleaf entry f_c, ω(f_c) is the MAX of all feature qualities in its subtree (stored with f_c, since F_c is an aR-tree). A combination disqualifies the query if
\[ \exists\, i \ne j,\ i, j \in [1, m]:\ \mathit{mindist}(f_i, f_j) > 2\varepsilon. \tag{6} \]
When such a condition holds, it is impossible to have a point in D whose mindist from f_i and from f_j are both within ε. The above validity check acts as a multiway join condition that significantly reduces the number of combinations to be examined.
Figs. 5a and 5b illustrate the condition for a nonleaf combination and a leaf combination, respectively, to be a candidate combination for the query.
Algorithm 5 is a pseudocode of our feature join algorithm. It employs a max-heap H for managing combinations of feature entries in descending order of their combination scores. The score of a combination ⟨f_1, f_2, ..., f_m⟩ as defined in (5) is an upper bound on the scores of all combinations ⟨s_1, s_2, ..., s_m⟩ of feature points, such that s_c is located in the subtree of f_c for each c ∈ [1, m]. Initially, the combination with the root pointers of all feature trees is enheaped. We progressively deheap the combination with the largest score. If all its entries point to leaf nodes, then we load these nodes L_1, ..., L_m and call Find_Result to traverse the object R-tree D and find potential results. Find_Result is a variant of the BB algorithm, with the following difference: L_1, ..., L_m are viewed as m tiny feature trees (each with one node) and accesses to them incur no extra I/O cost.
Algorithm 5. Feature Join Algorithm
W_k : new min-heap of size k (initially empty);
γ := 0;  ▷ kth score in W_k
algorithm FJ(Tree D, Trees F_1, F_2, ..., F_m)
1: H := new max-heap (combination score as the key);
2: insert ⟨F_1.root, F_2.root, ..., F_m.root⟩ into H;
3: while H is not empty do
4:   deheap ⟨f_1, f_2, ..., f_m⟩ from H;
5:   if ∀ c ∈ [1, m], f_c points to a leaf node then
6:     for c := 1 to m do
7:       read the child node L_c pointed by f_c;
8:     Find_Result(D.root, L_1, ..., L_m);
9:   else
10:    f_c := the highest level entry among f_1, f_2, ..., f_m;
11:    read the child node N_c pointed by f_c;
12:    for each entry e_c ∈ N_c do
13:      insert ⟨f_1, f_2, ..., e_c, ..., f_m⟩ into H if its score is greater than γ and it qualifies the query;
algorithm Find_Result(Node N, Nodes L_1, ..., L_m)
1: for each entry e ∈ N do
2:   if N is nonleaf then
3:     compute T(e) by the entries in L_1, ..., L_m;
4:     if T(e) > γ then
5:       read the child node N' pointed by e;
6:       Find_Result(N', L_1, ..., L_m);
7:   else
8:     compute τ(e) by the entries in L_1, ..., L_m;
9:     update W_k (and γ) by e (when necessary);

Fig. 5. Qualified combinations for the join. (a) Nonleaf combination. (b) Leaf combination.
In case not all entries of the deheaped combination point to leaf nodes (Line 9 of FJ), we select the one at the highest level, access its child node N_c, and then form new combinations with the entries in N_c. A new combination is inserted into H for further processing if its score is higher than γ and it qualifies the query. The loop (at Line 3) continues until H becomes empty.
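The combination machinery of FJ can be illustrated in a few lines. The sketch below is a toy of our own: entries are flat (MBR, quality) pairs rather than aR-tree nodes, and combinations are enumerated exhaustively instead of being refined level by level, but it exercises the score (5), the disqualification test (6), and the max-heap ordering.

import heapq, itertools

def mindist(mbr_a, mbr_b):
    # mbr = (xlo, ylo, xhi, yhi); 0 if the rectangles overlap.
    dx = max(mbr_b[0] - mbr_a[2], mbr_a[0] - mbr_b[2], 0.0)
    dy = max(mbr_b[1] - mbr_a[3], mbr_a[1] - mbr_b[3], 0.0)
    return (dx * dx + dy * dy) ** 0.5

def qualifies(combo, eps):
    # eq. (6): drop the combination if two entries lie more than 2*eps apart.
    return all(mindist(a[0], b[0]) <= 2 * eps
               for a, b in itertools.combinations(combo, 2))

# Entries per feature set: (MBR, max quality in subtree); values invented.
F1 = [((0.0, 0.0, 1.0, 1.0), 0.9), ((5.0, 5.0, 6.0, 6.0), 1.0)]
F2 = [((0.5, 0.5, 1.5, 1.5), 0.6), ((9.0, 9.0, 10.0, 10.0), 0.8)]
H = []
for combo in itertools.product(F1, F2):
    if qualifies(combo, eps=0.4):
        score = sum(q for _, q in combo)       # eq. (5)
        heapq.heappush(H, (-score, combo))     # max-heap via negated keys
while H:
    score, combo = heapq.heappop(H)
    print(-score, [mbr for mbr, _ in combo])   # only one combination survives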
3.6 Extension to Monotonic Aggregate Functions
We now extend our proposed solutions to process the top-k spatial preference query defined by any monotonic aggregate function AGG. Examples of AGG include (but are not limited to) the MIN and MAX functions.

3.6.1 Adaptation of Incremental Computation
Recall that the incremental computation technique is applied by the algorithms SP, GP, and BB for reducing I/O cost. Specifically, even if some component score τ_c(p) of a point p has not been computed yet, an upper bound score τ_+(p) of p can still be derived; whenever τ_+(p) drops below the best score found so far γ, the point p can be discarded immediately without needing to compute the unknown component scores of p.
In fact, the algorithms SP, GP, and BB are directly applicable to any monotonic aggregate function AGG because (3) can be generalized for AGG. Now, the upper bound score τ_+(p) of p is defined as
\[ \tau_+(p) = \operatorname{AGG}_{c=1}^{m} \begin{cases} \tau_c(p), & \text{if } \tau_c(p) \text{ is known}, \\ 1, & \text{otherwise}. \end{cases} \tag{7} \]
Due to the monotonicity property of AGG, the bound τ_+(p) is guaranteed to be greater than or equal to the actual score τ(p).
3.6.2 Adaptation of Upper Bound Computation
The BB* and FJ algorithms compute the upper bound score of a nonleaf entry of the object tree D, or of a combination of entries from the feature trees, by summing its upper bound component scores. Both BB* and FJ are applicable to any monotonic aggregate function AGG, with only the slight modifications discussed below. For BB*, we replace the summation operator by AGG in (4), and at Lines 14 and 26 of Algorithm 4. For FJ, we replace the summation by AGG in (5).
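A three-line sketch of the generalized bound (7) also explains an effect observed later in the experiments: with one of two component scores known, the bound is tight for MIN, partially tight for SUM, and loose for MAX.

def upper_bound(known, m, agg):
    # eq. (7): every unknown component score is replaced by its maximum value 1.
    return agg(known + [1.0] * (m - len(known)))

print(upper_bound([0.6], 2, sum),   # 1.6
      upper_bound([0.6], 2, min),   # 0.6  (tight)
      upper_bound([0.6], 2, max))   # 1.0  (loose)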
4 INFLUENCE SCORE
This section first studies the influence score function that
combines both the qualities and relative locations of feature
points. It then presents the adaptations of our solutions in
Section 3 for the influence score function. Finally, we
discuss how our solutions can be used for other types of
influence score functions.
4.1 Score Definition
The range score has a drawback: the parameter ε is not easy to set. Consider, for instance, the example of the range score τ^rng in Fig. 6a, where the white points are object points in D, and the gray points and black points are feature points in the feature sets F_1 and F_2, respectively. If ε is set to 0.2 (shown by circles), then the object p_2 has the score τ^rng(p_2) = 0.9 + 0.1 = 1.0 and it cannot be the best object (as τ^rng(p_1) = 1.2). This happens because a high-quality black feature is barely outside the ε-range of p_2. Had ε been slightly larger, that black feature (with weight 0.6) would contribute to the score of p_2, making it the best object.
In the field of statistics, the Gaussian density function [24] has been used to estimate the density in the space from a set F of points. The density at location p is estimated as G(p) = Σ_{f ∈ F} exp(−dist²(p, f) / (2σ²)), where σ is a parameter. Its advantage is that the value G(p) is not sensitive to a slight change in σ. G(p) is mainly contributed by the points (of F) close to p and is weakly affected by the points far away.
Inspired by the above function, we devise a score function that is not too sensitive to the range parameter ε. In addition, the users in our application usually prefer a high-quality restaurant (i.e., a feature point) rather than a large number of low-quality restaurants. Therefore, we use the maximum operator rather than the summation in G(p). Specifically, we define the influence score of an object point p with respect to the feature set F_c as
\[ \tau_c^{inf}(p) = \max\{\, \omega(s) \cdot 2^{-\mathit{dist}(p, s)/\varepsilon} \mid s \in F_c \,\}, \tag{8} \]
where ω(s) is the quality of s, ε is a user-specified range, and dist(p, s) is the distance between p and s.
The overall score τ^inf(p) of p is then defined as
\[ \tau^{inf}(p) = \operatorname{AGG}\{\, \tau_c^{inf}(p) \mid c \in [1, m] \,\}, \tag{9} \]
where AGG is a monotone aggregate operator and m is the number of feature data sets. Again, we focus on the case where AGG is the SUM function.
Let us compute the influence score τ^inf for the points in Fig. 6a, assuming ε = 0.2. From Fig. 6a, we obtain
\[ \tau^{inf}(p_1) = \max\{0.7 \cdot 2^{-\frac{0.18}{0.20}},\ 0.9 \cdot 2^{-\frac{0.50}{0.20}}\} + \max\{0.5 \cdot 2^{-\frac{0.18}{0.20}},\ 0.1 \cdot 2^{-\frac{0.60}{0.20}},\ 0.6 \cdot 2^{-\frac{0.80}{0.20}}\} = 0.643 \]
and
\[ \tau^{inf}(p_2) = \max\{0.9 \cdot 2^{-\frac{0.18}{0.20}},\ 0.7 \cdot 2^{-\frac{0.65}{0.20}}\} + \max\{0.1 \cdot 2^{-\frac{0.19}{0.20}},\ 0.6 \cdot 2^{-\frac{0.22}{0.20}},\ 0.5 \cdot 2^{-\frac{0.70}{0.20}}\} = 0.762. \]

Fig. 6. Example of influence scores (ε = 0.2). (a) Exact score. (b) Upper bound property.

The top-1 point is p_2, implying that the influence score can capture feature points outside the range ε = 0.2. In fact, the influence score function possesses two nice properties. First, a feature point s that is barely outside the range ε (from the object point p) still has the potential to contribute to the score, provided that its quality ω(s) is sufficiently high. Second, the distance dist(p, s) has an exponentially decaying effect on the score, meaning that feature points nearer to p contribute higher scores.
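The numbers above are easy to verify. In the snippet below, the distances are hard-coded from Fig. 6a rather than derived from coordinates:

def inf_component(dists_and_quals, eps):
    # tau_inf_c(p) = max over features of quality * 2^(-dist/eps), eq. (8).
    return max(w * 2 ** (-d / eps) for d, w in dists_and_quals)

eps = 0.2
p1 = inf_component([(0.18, 0.7), (0.50, 0.9)], eps) + \
     inf_component([(0.18, 0.5), (0.60, 0.1), (0.80, 0.6)], eps)
p2 = inf_component([(0.18, 0.9), (0.65, 0.7)], eps) + \
     inf_component([(0.19, 0.1), (0.22, 0.6), (0.70, 0.5)], eps)
print(round(p1, 3), round(p2, 3))   # 0.643 0.762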
4.2 Query Processing for SP, GP, BB, and BB*
We now examine the extensions of the SP, GP, BB, and BB* algorithms for top-k spatial preference queries defined by the influence score in (8).

4.2.1 Incremental Computation Technique
Observe that the upper bound of τ_c^inf(p) is 1. Therefore, (3) still holds for the influence score, and the incremental computation technique (see Section 3.2) can still be applied in SP, GP, and BB.
4.2.2 Exact Score Computation for a Single Object
For the SP algorithm, we elaborate on how to compute the score τ_c^inf(p) (see (8)) of an object p ∈ D. This is challenging because some feature s ∈ F_c outside the ε-range of p may contribute to the score. Unlike the computation of the range score, we can no longer use the ε-range to restrict the search space.
Given an object point p and an entry e from the feature tree of F_c, we define the upper bound function
\[ \omega^{inf}(e, p) = \omega(e) \cdot 2^{-\mathit{mindist}(p, e)/\varepsilon}. \tag{10} \]
In case e is a leaf entry (i.e., a feature point s), we have ω^inf(s, p) = ω(s) · 2^{-dist(p, s)/ε}. The following lemma shows that the value ω^inf(e, p) is an upper bound of ω^inf(e', p) for any entry e' in the subtree of e.
Lemma 2. Let e and e' be entries from the feature tree F_c such that e' is in the subtree of e. It holds that ω^inf(e, p) ≥ ω^inf(e', p), for any object point p ∈ D.
Proof. Let p be any object point p ∈ D. Since e' falls into the subtree of e, we have mindist(p, e) ≤ mindist(p, e'). As F_c is a MAX aR-tree, we have ω(e) ≥ ω(e'). Thus, we have ω^inf(e, p) ≥ ω^inf(e', p). □
As an example, Fig. 6b shows an object point p and three entries e_1, e_2, e_3 from the same feature tree. Note that e_2 and e_3 are in the subtree of e_1. The dotted lines indicate the minimum distances from p to e_1 and e_3, respectively. Thus, we have ω^inf(e_1, p) = 0.8 · 2^{-0.2/0.2} = 0.4 and ω^inf(e_3, p) = 0.7 · 2^{-0.3/0.2} = 0.247. Clearly, ω^inf(e_1, p) is larger than ω^inf(e_3, p).
By using Lemma 2, we apply the best-first approach to compute the exact component score τ_c(p) of the point p. Algorithm 6 employs a max-heap H in order to visit the entries of the tree in descending order of their ω^inf values. We first insert the root of the tree F_c into H, and initialize τ_c(p) to 0. The loop at Line 4 continues as long as H is not empty. At Line 5, we deheap an entry e from H. If the value ω^inf(e, p) is above the current τ_c(p), then there is potential to update τ_c(p) by using some point in the subtree of e. In that case, we read the child node pointed to by e, and examine each entry e' in that node (Lines 7-8). If e' is a nonleaf entry, it is inserted into H provided that its ω^inf(e', p) value is above τ_c(p). Otherwise, it is used to update τ_c(p).

Algorithm 6. Object Influence Score Algorithm
algorithm Object_Influence(Point p, Value c, Value ε)
1: H := new max-heap (with ω^inf value as key);
2: insert ⟨F_c.root, 1.0⟩ into H;
3: τ_c(p) := 0;
4: while H is not empty do
5:   deheap an entry e from H;
6:   if ω^inf(e, p) > τ_c(p) then
7:     read the child node CN pointed to by e;
8:     for each entry e' of CN do
9:       if CN is a nonleaf node then
10:        if ω^inf(e', p) > τ_c(p) then
11:          insert ⟨e', ω^inf(e', p)⟩ into H;
12:      else  ▷ update component score
13:        τ_c(p) := max{τ_c(p), ω^inf(e', p)};
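Below is a compact Python rendering of the same best-first idea, over a hand-built fragment of a MAX aR-tree; the MBRs, qualities, and query point are invented, and a real tree would have larger fan-outs.

import heapq

def mindist(p, mbr):
    # mbr = (xlo, ylo, xhi, yhi)
    dx = max(mbr[0] - p[0], p[0] - mbr[2], 0.0)
    dy = max(mbr[1] - p[1], p[1] - mbr[3], 0.0)
    return (dx * dx + dy * dy) ** 0.5

def w_inf(q, d, eps):                 # eq. (10)
    return q * 2 ** (-d / eps)

def object_influence(p, root, eps):
    """root: list of entries; a nonleaf entry is (mbr, q, child_entries),
    a feature point is (mbr, q, None) with a degenerate MBR."""
    tau, H, tie = 0.0, [], 0
    for e in root:
        heapq.heappush(H, (-w_inf(e[1], mindist(p, e[0]), eps), tie, e)); tie += 1
    while H:
        key, _, (mbr, q, child) = heapq.heappop(H)
        if -key <= tau:
            break                     # by Lemma 2, no remaining entry can raise tau
        if child is None:
            tau = max(tau, -key)      # exact contribution of a feature point
        else:
            for e in child:
                b = w_inf(e[1], mindist(p, e[0]), eps)
                if b > tau:
                    heapq.heappush(H, (-b, tie, e)); tie += 1
    return tau

# Toy feature tree echoing Fig. 6b: e1 bounds two points of quality 0.8, 0.7.
s2 = ((0.30, 0.30, 0.30, 0.30), 0.8, None)
s3 = ((0.40, 0.40, 0.40, 0.40), 0.7, None)
e1 = ((0.30, 0.30, 0.40, 0.40), 0.8, [s2, s3])
print(round(object_influence((0.10, 0.30), [e1], eps=0.2), 3))   # 0.4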
4.2.3 Group Computation and Upper Bound Computation
Recall that, for the case of range scores, both the GP and BB algorithms apply the group computation technique (Algorithm 2) for concurrently computing the component score τ_c(p) of every object point p in a given set V. Now, Algorithm 6 can be modified as follows to support the concurrent computation of influence scores. First, the parameter p is replaced by a set V of objects. Second, we initialize the value τ_c(p) of each object p ∈ V at Line 3 and perform the score update for each p ∈ V at Line 13. Third, the conditions at Lines 6 and 10 are checked on whether they are satisfied by some object p ∈ V.
In addition, the BB algorithm (see Algorithm 3) needs to compute the upper bound component scores T_c(e) of all nonleaf entries in the current node simultaneously. Again, Algorithm 6 can be modified for this purpose.
4.2.4 Optimized Computation of Scores in BB*
Given an entry e (from a feature tree), we define the upper bound score of e using a set V of points as
\[ \omega^{inf}(e, V) = \max_{p \in V} \omega^{inf}(e, p). \tag{11} \]
The BB* algorithm applies Algorithm 4 to compute the range scores of a set V of object points. With (11), we can modify Algorithm 4 to compute the influence score, with the following changes. First, the heap H_c (at Line 2) is used to organize its entries e in descending order of the key ω^inf(e, V), and the value ω(e) (at Line 10) is replaced by ω^inf(e, V). Second, the restrictions based on the ε-range (at Lines 11-12, 19, and 22) are removed. Third, when the component scores are updated for a leaf entry, the quality value of the entry is replaced by its ω^inf value.

5 EXPERIMENTAL EVALUATION
5.1 Experimental Setting
In the synthetic data sets, for each feature data set F_c, an anchor point s* is selected such that its neighborhood region contains a high number of points. Let dist_min (dist_max) be the minimum (maximum) distance of a point in F_c from the anchor s*. Then, the quality of a feature point s is generated as
\[ \omega(s) = \left( \frac{\mathit{dist}_{max} - \mathit{dist}(s, s^{*})}{\mathit{dist}_{max} - \mathit{dist}_{min}} \right)^{\theta}, \tag{12} \]
where dist(s, s*) is the distance between s and the anchor s*, and θ controls the skewness (default: 1.0) of the quality distribution. In this way, the qualities of points in F_c lie in [0, 1] and the points closer to the anchor have higher qualities. Also, the quality distribution is highly skewed for large values of θ.
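For instance, a minimal generator following (12) could look as follows; the anchor is fixed by hand here, whereas the actual setup picks it inside a dense region:

import math, random

def gen_qualities(points, anchor, theta=1.0):
    d = [math.dist(s, anchor) for s in points]
    dmin, dmax = min(d), max(d)
    # eq. (12): quality in [0, 1], higher near the anchor, skewed by theta.
    return [((dmax - di) / (dmax - dmin)) ** theta for di in d]

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(5)]
print([round(q, 2) for q in gen_qualities(pts, anchor=(0.5, 0.5), theta=2.0)])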
We study the performance of our algorithms with
respect to various parameters, which are displayed in
Table 2 (their default values are shown in bold). In each
experiment, only one parameter varies while the others are
fixed to their default values.
5.2 Performance on Queries with Range Scores
This section studies the performance of our algorithms for
top-k spatial preference queries on range scores.
Table 3 shows the I/O cost and execution time of the algorithms for the different aggregate functions (SUM, MIN, and MAX). GP has lower cost than SP because GP computes the scores of points within the same leaf node concurrently. The incremental computation technique (used by SP and GP) derives a tight upper bound score (of each point) for the MIN function, a partially tight bound for SUM, and a loose bound for MAX (see Section 3.6). This explains the performance of SP and GP across the different aggregate functions. The costs of the other methods, however, are mainly influenced by the effectiveness of pruning. BB employs an effective technique to prune unqualified nonleaf entries in the object tree, so it outperforms GP. The optimized score computation method enables BB* to save, on average, 20 percent of the I/O and 30 percent of the time of BB. FJ outperforms its competitors as it discovers qualified combinations of feature entries early.
We ignore SP in subsequent experiments, and compare
the cost of the remaining algorithms on synthetic data sets
with respect to different parameters.
Next, we empirically justify the choice of using level-1 entries of the feature trees F_c for the upper bound score computation routine in the BB algorithm (see Section 3.3). In this experiment, we use the default parameter setting and study how the number of node accesses of BB is affected by the level of F_c used. Table 4 shows the decomposition of node accesses over the tree D and the trees F_c, and the statistics of the upper bound score computation. Each accessed nonleaf node of D invokes a call of the upper bound score computation routine.
When level-0 entries of F_c are used, each upper bound computation call incurs a high number (617.5) of node accesses (of F_c). On the other hand, using level-2 entries for the upper bound computation leads to very loose bounds, making it difficult to prune the leaf nodes of D. Observe that the total cost is minimized when level-1 entries (of F_c) are used. In that case, the number of node accesses per upper bound computation call is low (15), and yet the obtained bounds are tight enough for pruning most leaf nodes of D.
Fig. 8 plots the cost of the algorithms as a function of the
buffer size. As the buffer size increases, the I/O of all
algorithms drops. FJ remains the best method, BB* the
second, and BB the third; all of them outperform GP by a
wide margin. Since the buffer size does not affect the pruning
effectiveness of the algorithms, it has a small impact on the
execution time.
Fig. 9 compares the cost of the algorithms with respect to the object data size |D|. Since the cost of FJ is dominated by the cost of joining the feature data sets, it is insensitive to |D|. On the other hand, the cost of the other methods (GP, BB, and BB*) increases with |D|, as score computations need to be done for more objects in D.
TABLE 2. Range of Parameter Values.
TABLE 3. Effect of the Aggregate Function, Range Scores.
TABLE 4. Effect of the Level of F_c Used for Upper Bound Score Computation in the BB Algorithm.
Fig. 10 plots the I/O cost of the algorithms with respect to the feature data size |F| (of each feature data set). As |F| increases, the cost of GP, BB, and FJ increases. In contrast, BB* experiences a slight cost reduction, as its optimized score computation method (for objects and nonleaf entries) is able to perform pruning early at a large |F| value.
Fig. 11 plots the cost of the algorithms with respect to the number m of feature data sets. The costs of GP, BB, and BB* rise linearly as m increases because the number of component score computations is at most linear in m. On the other hand, the cost of FJ increases significantly with m, because the number of qualified combinations of entries is exponential in m.
Fig. 12 shows the cost of the algorithms as a function of the number k of requested results. GP, BB, and BB* compute the scores of objects in D in batches, so their performance is insensitive to k. As k increases, FJ has weaker pruning power and its cost increases slightly.
Fig. 13 shows the cost of the algorithms when varying the query range ε. As ε increases, all methods access more nodes in the feature trees to compute the scores of the points. The difference in execution time between BB* and FJ shrinks as ε increases. In summary, although FJ is the clear winner in most of the experimental instances, its performance is significantly affected by the number m of feature data sets. BB* is the most robust algorithm to parameter changes and it is recommended for problems with large m.
5.3 Performance on Queries with Influence Scores
We proceed to examine the cost of our algorithms for top-k spatial preference queries on influence scores.
Fig. 14 compares the cost of the algorithms with respect to the number m of feature data sets. The cost follows the trend in Fig. 11. Again, the number of combinations examined by FJ increases exponentially with m, so its cost increases rapidly.
Fig. 15 plots the cost of the algorithms by varying the number k of requested results. Observe that FJ becomes more expensive than BB* (in both I/O and time) when the value of k is beyond 8. This is attributed to two reasons. First, FJ incurs extra computational cost as it needs to invoke Algorithm 7 for computing the upper bound score of a combination of feature entries. Second, FJ incurs a high I/O cost to identify the objects in D that produce high scores with the current combination of features.
Fig. 16 shows the cost of the algorithms as a function of the parameter ε. Interestingly, the trend here is different from the one in Fig. 13. According to (8), when ε decreases, the influence score also decreases, rendering it more difficult to distinguish the scores among different objects. Thus, the cost of BB, BB*, and FJ becomes high at a low ε value. Summing up, for the newly introduced influence score, FJ is more sensitive to parameter changes and it loses to BB* not only when there are multiple feature data sets, but also at large k.
Fig. 8. Effect of buffer size, range scores. (a) I/O. (b) Time.
Fig. 9. Effect of |D|, range scores. (a) I/O. (b) Time.
Fig. 10. Effect of |F|, range scores. (a) I/O. (b) Time.
Fig. 11. Effect of m, range scores. (a) I/O. (b) Time.
Fig. 12. Effect of k, range scores. (a) I/O. (b) Time.
Fig. 13. Effect of ε, range scores. (a) I/O. (b) Time.
5.4 Results on Real Data
In this section, we conduct experiments on real object and feature data sets in order to demonstrate the application of top-k spatial preference queries.
We obtained three real spatial data sets from a travel portal website, http://www.allstays.com/. Locations in these data sets correspond to (longitude and latitude) coordinates in the US. We cleaned the data sets by discarding records without longitude and latitude. Each remaining location is normalized to a point in the 2D space [0, 10000]^2. One data set is used as the object data set and the other two are used as feature data sets. The object data set D contains 11,399 camping locations. The feature data set F_1 contains 30,921 hotel records, each with a room price (quality) and a location. The feature data set F_2 has 3,848 records of Wal-Mart stores, each with a gasoline availability (quality) and a location. The domain of each quality attribute (e.g., room price and gasoline availability) is normalized to the unit interval [0, 1]. Intuitively, a camping location is considered good if it is close to a Wal-Mart store with high gasoline availability (i.e., convenient supply) and a hotel with a high room price (which indirectly reflects the quality of the nearby outdoor environment).
Fig. 17 plots the cost of the algorithms with respect to ε, for queries with range scores. At a very small ε value, most of the objects have a zero score as they have no feature points within their neighborhood. This forces BB, BB*, and FJ to access a large number of objects (or feature combinations) before finding an object with a nonzero score, which can then be used for pruning other unqualified objects.
Fig. 18 compares the cost of the algorithms with respect to ε, for queries with influence scores. In general, the cost follows the trend in Fig. 16. BB* outperforms BB at low ε values, whereas BB incurs a slightly lower cost than BB* at high ε values. Observe that the cost of BB and BB* is close to that of FJ when ε is sufficiently high. In summary, the relative performance of the algorithms in all experiments is consistent with the results on synthetic data.
6 CONCLUSION
In this paper, we studied top-k spatial preference queries, which provide a novel type of ranking for spatial objects based on the qualities of features in their neighborhood. The neighborhood of an object p is captured by the scoring function: 1) the range score restricts the neighborhood to a crisp region centered at p, whereas 2) the influence score relaxes the neighborhood to the whole space and assigns higher weights to locations closer to p.
We presented five algorithms for processing top-k spatial preference queries. The baseline algorithm SP computes the score of every object by querying the feature data sets. The algorithm GP is a variant of SP that reduces I/O cost by computing the scores of objects in the same leaf node concurrently. The algorithm BB derives upper bound scores for nonleaf entries in the object tree, and prunes those that cannot lead to better results. The algorithm BB* is a variant of BB that utilizes an optimized method for computing the scores of objects (and the upper bound scores of nonleaf entries). The algorithm FJ performs a multiway join on the feature trees to obtain qualified combinations of feature points, and then searches for their relevant objects in the object tree.
Based on our experimental findings, BB* is scalable to large data sets and it is the most robust algorithm with respect to various parameters. However, FJ is the best algorithm in cases where the number m of feature data sets is low and each feature data set is small.
Fig. 14. Effect of m, influence scores. (a) I/O. (b) Time.
Fig. 15. Effect of k, influence scores. (a) I/O. (b) Time.
Fig. 16. Effect of ε, influence scores. (a) I/O. (b) Time.
Fig. 17. Effect of ε, range scores, real data. (a) I/O. (b) Time.
Fig. 18. Effect of ε, influence scores, real data. (a) I/O. (b) Time.
In the future, we will study the top-k spatial preference query on a road network, in which the distance between two points is defined by their shortest path distance rather than their Euclidean distance. The challenge is to develop alternative methods for computing the upper bound scores for a group of points on a road network.
ACKNOWLEDGMENTS
This work was supported by grant HKU 715509E from
Hong Kong RGC.
REFERENCES
[1] M.L. Yiu, X. Dai, N. Mamoulis, and M. Vaitis, "Top-k Spatial Preference Queries," Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2007.
[2] N. Bruno, L. Gravano, and A. Marian, "Evaluating Top-k Queries over Web-Accessible Databases," Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2002.
[3] A. Guttman, "R-Trees: A Dynamic Index Structure for Spatial Searching," Proc. ACM SIGMOD, 1984.
[4] G.R. Hjaltason and H. Samet, "Distance Browsing in Spatial Databases," ACM Trans. Database Systems, vol. 24, no. 2, pp. 265-318, 1999.
[5] R. Weber, H.-J. Schek, and S. Blott, "A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces," Proc. Int'l Conf. Very Large Data Bases (VLDB), 1998.
[6] K.S. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft, "When is Nearest Neighbor Meaningful?" Proc. Seventh Int'l Conf. Database Theory (ICDT), 1999.
[7] R. Fagin, A. Lotem, and M. Naor, "Optimal Aggregation Algorithms for Middleware," Proc. Int'l Symp. Principles of Database Systems (PODS), 2001.
[8] I.F. Ilyas, W.G. Aref, and A. Elmagarmid, "Supporting Top-k Join Queries in Relational Databases," Proc. 29th Int'l Conf. Very Large Data Bases (VLDB), 2003.
[9] N. Mamoulis, M.L. Yiu, K.H. Cheng, and D.W. Cheung, "Efficient Top-k Aggregation of Ranked Inputs," ACM Trans. Database Systems, vol. 32, no. 3, p. 19, 2007.
[10] D. Papadias, P. Kalnis, J. Zhang, and Y. Tao, "Efficient OLAP Operations in Spatial Data Warehouses," Proc. Int'l Symp. Spatial and Temporal Databases (SSTD), 2001.
[11] S. Hong, B. Moon, and S. Lee, "Efficient Execution of Range Top-k Queries in Aggregate R-Trees," IEICE Trans. Information and Systems, vol. 88-D, no. 11, pp. 2544-2554, 2005.
[12] T. Xia, D. Zhang, E. Kanoulas, and Y. Du, "On Computing Top-t Most Influential Spatial Sites," Proc. 31st Int'l Conf. Very Large Data Bases (VLDB), 2005.
[13] Y. Du, D. Zhang, and T. Xia, "The Optimal-Location Query," Proc. Int'l Symp. Spatial and Temporal Databases (SSTD), 2005.
[14] D. Zhang, Y. Du, T. Xia, and Y. Tao, "Progressive Computation of the Min-Dist Optimal-Location Query," Proc. 32nd Int'l Conf. Very Large Data Bases (VLDB), 2006.
[15] Y. Chen and J.M. Patel, "Efficient Evaluation of All-Nearest-Neighbor Queries," Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2007.
[16] P.G.Y. Kumar and R. Janardan, "Efficient Algorithms for Reverse Proximity Query Problems," Proc. 16th ACM Int'l Conf. Advances in Geographic Information Systems (GIS), 2008.
[17] M.L. Yiu, P. Karras, and N. Mamoulis, "Ring-Constrained Join: Deriving Fair Middleman Locations from Pointsets via a Geometric Constraint," Proc. 11th Int'l Conf. Extending Database Technology (EDBT), 2008.
[18] M.L. Yiu, N. Mamoulis, and P. Karras, "Common Influence Join: A Natural Join Operation for Spatial Pointsets," Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2008.
[19] Y.-Y. Chen, T. Suel, and A. Markowetz, "Efficient Query Processing in Geographic Web Search Engines," Proc. ACM SIGMOD, 2006.
[20] V.S. Sengar, T. Joshi, J. Joy, S. Prakash, and K. Toyama, "Robust Location Search from Text Queries," Proc. 15th Ann. ACM Int'l Symp. Advances in Geographic Information Systems (GIS), 2007.
[21] S. Berchtold, C. Boehm, D. Keim, and H. Kriegel, "A Cost Model for Nearest Neighbor Search in High-Dimensional Data Space," Proc. ACM Symp. Principles of Database Systems (PODS), 1997.
[22] E. Dellis, B. Seeger, and A. Vlachou, "Nearest Neighbor Search on Vertically Partitioned High-Dimensional Data," Proc. Seventh Int'l Conf. Data Warehousing and Knowledge Discovery (DaWaK), pp. 243-253, 2005.
[23] N. Mamoulis and D. Papadias, "Multiway Spatial Joins," ACM Trans. Database Systems, vol. 26, no. 4, pp. 424-475, 2001.
[24] A. Hinneburg and D.A. Keim, "An Efficient Approach to Clustering in Large Multimedia Databases with Noise," Proc. Fourth Int'l Conf. Knowledge Discovery and Data Mining (KDD), 1998.
Man Lung Yiu received the bachelor's degree in computer engineering and the PhD degree in
computer science from the University of Hong
Kong in 2002 and 2006, respectively. Prior to his
current post, he worked at Aalborg University for
three years starting in the Fall of 2006. He is now
an assistant professor in the Department of
Computing, Hong Kong Polytechnic University.
His research focuses on the management of
complex data, in particular, query processing
topics on spatiotemporal data and multidimensional data.
Hua Lu received the BSc and MSc degrees
from Peking University, China, in 1998 and
2001, respectively, and the PhD degree in
computer science from the National University
of Singapore in 2007. He is currently an
assistant professor in the Department of
Computer Science, Aalborg University, Den-
mark. His research interests include skyline
queries, spatiotemporal databases, geographic
information systems, and mobile computing.
He is a member of the IEEE.
Nikos Mamoulis received the diploma in com-
puter engineering and informatics from the
University of Patras, Greece, in 1995, and the
PhD degree in computer science from the Hong
Kong University of Science and Technology in
2000. He is currently an associate professor in
the Department of Computer Science, University
of Hong Kong, which he joined in 2001. In the
past, he has worked as a research and devel-
opment engineer at the Computer Technology
Institute, Patras, Greece and as a postdoctoral researcher at the
Centrum voor Wiskunde en Informatica (CWI), the Netherlands. During
2008-2009, he was on leave to the Max-Planck Institute for Informatics
(MPII), Germany. His research focuses on the management and mining
of complex data types, including spatial, spatiotemporal, object-
relational, multimedia, text, and semistructured data. He has served
on the program committees of more than 70 international conferences
and workshops on data management and data mining. He was the
general chair of SSDBM 2008, the PC chair of SSTD 2009, and he
organized the SSTDM 2006 and DBRank 2009 workshops. He has
served as PC vice chair of ICDM 2007, ICDM 2008, and CIKM 2009. He
was the publicity chair of ICDE 2009. He is an editorial board member
for Geoinformatica Journal and was a field editor of the Encyclopedia of
Geographic Information Systems.
Michail Vaitis received the diploma in 1992 and the PhD degree in 2001, both in computer engineering and informatics from the University of Patras, Greece. He was collaborating for five
years with the Computer Technology Institute
(CTI), Greece, as a research and development
engineer, working on hypertext and database
systems. Now, he is an assistant professor at
the Department of Geography, University of the
Aegean, Greece, which he joined in 2003. His
research interests include geographical databases, spatial data infra-
structures, and hypermedia models and services.