
PPQP Ieee 04092014

The document presents a framework for a privacy-preserving query service that integrates data from multiple heterogeneous databases while ensuring the anonymity of customers and data sources. It emphasizes the importance of maintaining identity, query, data, and result privacy through various encryption techniques, particularly commutative encryption. The proposed system allows for horizontal integration of data, enabling efficient query processing without compromising sensitive information.

Uploaded by

Subrata Bose
Copyright
© All Rights Reserved
Horizontal Integration of Distributed Data for Privacy Preserving Query Processing Service

Abstract – Query service for information on demand is the need of the hour. Getting information at a mouse click in a web browser saves the time and effort of searching, collecting and collating information from different data sources, subject to availability. In this paper we propose a framework in which a service provider offers a privacy preserving query service to its customers by integrating data from multiple heterogeneous databases, each of which holds a horizontal subset of the query-specific data. The customer and the databases are unknown to each other, and no database is aware of the identity of any other database. The service provider acts as a bridge between them for servicing a specific query. The service aims to preserve the following kinds of privacy: a) Identity privacy – identity of the customer from the data sources, of the data sources from the customer, and of one data source from the other data sources; b) Query privacy – privacy of the customer's query from the service provider and the data sources; c) Data privacy – privacy of a data source's data from the customer, the service provider and other data sources; and d) Result privacy – privacy of the query result of a data source from the service provider and other data sources. Commutative encryption is the basic method used in this work.

Keywords – Secure computation; commutative encryption; privacy preserving query service; distributed processing; horizontal integration

I. INTRODUCTION

Service providers provide services on demand over the Internet. Typically these services use large databases and powerful servers that host web applications and services. Anyone with a suitable internet connection and a browser can access these applications [4]. We consider one such web service, which we call a query service. A set of data sources allows a service provider to offer this service to its customers on their databases. The service allows a customer to issue a query to the service provider, who in turn finds potential and willing data sources relevant to the query and solves the query with the help of their data. The data sources should generally have a prior arrangement with the service provider, though some of them could be spotted on the fly and included in the query service depending on the need. The databases, maintained independently by different owners, are not expected to be homogeneous. Database software products, their versions and their schemas are likely to vary from one data source to another.

The customer submits queries through a suitably designed web application interface that caters to a wide variety of queries (Table I) on different fields, sectors and businesses such as banking and finance, insurance, airlines, geography, science, art and culture, history, literature, sports, cinema, crime records and company related information, to name a few. The service provider decides this list depending on its business interest and potential, within legal and ethical boundaries.

TABLE I. Query Examples

| Subject | Query | Possible privacy concerns |
|---|---|---|
| Airlines | Find list of young men who had flown from City-A to City-B on a particular morning | Gender and age limits of the passengers, names of the cities, date and time of the flights |
| Banks | List all bank account details of person X having a minimum balance of M million in the accounts as on a particular date | Person's identity in any form such as name or SSN, minimum balance amount, bank's identity such as name, address, country etc. |
| Crime records | Find crime records of person X for crimes committed between Year-1 and Year-2 from the Police and CID departments' records of the state | Person's identity in any form such as name, nationality, identification mark, physical measures such as height and weight, values of the years, name of the state, names of the departments |

In this business model the customer need not know who the data sources are, and the data sources need not be concerned with the identity of the customer or of other data sources. Privacy is a serious concern in such an open environment for all concerned: the customer for their query and identity, the data sources for their data, results and identities. The service provider, an untrusted third party who is honest but curious, acts as an intermediary in the system. It receives the query from the customer and solves it with the help of the data sources without gaining any knowledge about the query, the data or the result. The current work offers a privacy solution for query processing in this open scenario.

A. Privacy Analysis

Our proposed service model demands four types of privacy, namely identity, query, data and result privacy. Each player in the system (the service provider, the customer or any data source) wants to be protected from the others, while maximizing the utility, attractiveness and efficiency of the service. In our model a customer issues a parametric query to the
service provider through a web application program interface. The application program accepts the customer's choices and data through the interface and generates a dynamic SQL query¹. The dynamic query has placeholders² for values to be supplied by the user at execution time. The textual content of such an SQL query, for example

    SELECT airline, flight_number, departure_time, date_of_journey
    FROM Airlines
    WHERE City-A = ? AND City-B = ? AND departure_time <= ?
      AND date_of_journey BETWEEN ? AND ?

may not reveal any private information, but the complete query, with the values of the parameters replacing the placeholders, is sensitive [9]. This is a concern for the customer.

We approach the problem of preserving query privacy as that of hiding these constants from the service provider and optionally from the databases. Hiding constants during query processing has an obvious cost, because the system must use some hiding (encryption and decryption) mechanism to achieve it. Thus, instead of treating every constant identically, we divide the set of constants into three disjoint sets – nonsensitive (NS), sensitive (S) and highly sensitive (HS) – where each category has its defined protection level as shown in Table II. The application program allows the customer to assign each constant to one of these sets, but in specific cases the choice could be predefined by the system depending on the severity. For example, in searching crime records a criminal's name should normally belong to HS. Depending on the cardinality of the sets HS, S and NS we get four types of query privacy, as shown in Table III.

TABLE II. Graded Query Constants

| Query constant set | Protected from |
|---|---|
| Highly sensitive (HS) | Both the service provider and the databases |
| Sensitive (S) | Only the service provider |
| Nonsensitive (NS) | None |

TABLE III. Query Privacy Level

| \|NS\| | \|S\| | \|HS\| | Privacy level |
|---|---|---|---|
| 0 | 0 | x | Private query with highly sensitive constants |
| x | 0 | x | Private query with highly sensitive constants |
| 0 | x | x | Private query with sensitive and highly sensitive constants |
| x | x | x | Private query with sensitive and highly sensitive constants |
| 0 | x | 0 | Private query with sensitive constants |
| x | x | 0 | Private query with sensitive constants |
| 0 | 0 | 0 | Public query |
| x | 0 | 0 | Public query |

x indicates a value greater than zero.

The nonsensitive constants provide increased efficiency as they are visible to everyone. Both sensitive and nonsensitive constants are visible to the databases. This adds to the interest of different data sources in participating, because they know what data they are revealing to the (unknown) customer and the service provider. The highly sensitive constants, in contrast, primarily protect the interests of the customer, who is reluctant to reveal them even to the (unknown) databases. The data, being the private asset of the data sources, should always be protected from the service provider, the customer and other data sources. Similarly, the query result obtained from a data source needs to be protected from the service provider and the other data sources. Similar privacy constraints also apply to any intermediate results.

Our query processing framework is based on a semi-honest model in which each participant is expected to follow the protocol, while being free to use intermediate results to its own advantage. As such, no database has any real incentive to keep quiet or to change its part of the result. The reason is that the customer and the data sources are not known to each other; their identities are kept secret. If a database remains silent, the customer would conclude either that the database is not interested in disclosing anything or that it obtained an empty result; there is no great distinction between the two for the customer. The service provider, being a reputed intermediary, will never risk its reputation by keeping silent or falsifying; rather it prefers to remain oblivious to both query and result. Query privacy of the customer and data privacy (which includes result privacy) of the data sources are the two primary considerations, besides maintaining identity privacy, in this semi-honest model.

B. Privacy of Sensitive Query Constants

Our approach to dealing with the three sets of query constants is as follows:

a) Elements of NS are substituted in the query text by the application program before the query is sent to the service provider, who in turn sends it to the databases.

b) Elements of S are sent in encrypted form to the databases via the service provider, along with the query text. The service provider acts as a conduit in this data transfer. The databases in turn get them decrypted with the help of the customer, again with the service provider used as a conduit. After obtaining S, each database substitutes these values in the respective placeholders and executes its local query. We propose commutative encryption for this secret message transfer.

c) Elements of HS are sent in encrypted form to the service provider, who uses them to select the tuples of interest to the customer's query from the result sets obtained by local query processing in each database. The original query is stripped of the part (or the whole) of the WHERE clause involving these constants, and the attributes thereby removed from the WHERE clause are added to the SELECT list. This ensures that the answer to the original query is contained in the query result of the transformed query.

In this paper we provide a solution for the complete case in which all three sets HS, S and NS are non-empty. If one or more sets are empty, the specific solution can be derived by suitable modification of the above solution.

¹ An incomplete SQL statement within a software system, meant to be fully constructed and executed at runtime [11].
² Question marks (?) are used to denote the placeholders for the constants.
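The grading of constants in Tables II and III can be expressed as a small lookup. The following is only a sketch: the names `PROTECTED_FROM` and `privacy_level` are hypothetical and do not come from the paper, and the level is derived purely from whether the S and HS sets are empty, as Table III indicates.

```python
# Table II: the parties each class of constant is hidden from.
PROTECTED_FROM = {
    "HS": {"service provider", "databases"},
    "S": {"service provider"},
    "NS": set(),
}


def privacy_level(ns, s, hs):
    """Return the Table III privacy level for the three sets of constants.

    Only the emptiness of S and HS matters; |NS| never changes the level,
    so the ns argument is accepted just to mirror the table's columns.
    """
    has_s, has_hs = bool(s), bool(hs)
    if has_s and has_hs:
        return "Private query with sensitive and highly sensitive constants"
    if has_hs:
        return "Private query with highly sensitive constants"
    if has_s:
        return "Private query with sensitive constants"
    return "Public query"


# Constants of a query with one NS, two S and one HS constant.
level = privacy_level(ns=[70], s=[1000, 2000], hs=["ABC Limited"])
print(level)
```

A query with all three sets non-empty falls in the "sensitive and highly sensitive" row, which is the complete case the paper goes on to solve.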
C. Data Integration – Horizontal and Vertical

The query service collects and integrates data from multiple data sources through the service provider, virtually forming a distributed database. We can classify the data integration into one of the following types:

a. Horizontal Integration: A query can be resolved by retrieving data from a set of databases, each of which has in its schema all the attributes (possibly along with other attributes) of the target list and of the predicate in the customer query. The final query result is obtained by taking the union of the local query results of the individual databases. This is akin to horizontal fragmentation in distributed databases, although the databases need not have identical sets of attributes and can be heterogeneous.

b. Vertical Integration: A query can be resolved by retrieving data from a set of databases, each of which has only a part (but not all) of the attributes from the target list and the predicate in the customer query. The final query result is obtained by joining the local query results of the individual databases on a common key. This is akin to vertical fragmentation in distributed databases.

c. Mixed/Hybrid Integration: This is a combination of horizontal and vertical integration.

Any query processing would be accomplished through a sequence of horizontal integration and vertical integration operations. In this paper we have worked on horizontal data integration.

II. PROBLEM DEFINITION

In our system, we have a service provider SP, a customer C and a number of independent heterogeneous databases D1, …, Dn of n data sources. The databases are assumed to be relational. We assume that the data sources cannot communicate with each other. SP itself is not a data source. It receives the customer's query Q and solves the query with the help of a set of databases. C does not have any knowledge about the databases that can resolve the given query. The data sources registered with SP share their data catalogue for the part of their data they are willing to share for the query service. With the help of the catalogues, SP locates the relevant databases by comparing the schemas of the databases with the attributes in the target list and the predicate of the query. Alternatively, SP can send the formatted query to a set of potential data sources based on the available catalogues or on prior knowledge about them. The data sources would then match their database schemas with Q, examine the constant sets {NS, S, HS} (without looking into the constant values) against their business objectives, and decide whether to participate in the query resolution. Finally the service provider reformulates (and splits) the query into local queries and sends them to the relevant databases for processing. Each database processes its local query. The query processing engine at each database generates results in accordance with a common schema and sends the result in encrypted form to SP. With the help of SP and the databases, C combines the partial query results to obtain the final query result.

The idea is to perform the entire operation in a privacy preserving manner, so that SP does not learn anything about the query constants S and HS, about the contents of the databases, or about the results (except possibly their sizes). No database learns about the contents of other databases, about others' results, or about the identities of C and of other data sources. C does not learn anything about the contents of the databases or the identity of the corresponding data source. It only learns the overall result R or the individual results Ri, without being able to locate the data source.

III. RELATED WORK

Aggarwal et al. [1] proposed a two-party storage model to enable a secure database query service for outsourced data. The key idea in their approach is to partition the data into two logically independent database systems according to the privacy constraints; these databases cannot communicate with each other, yet they execute database queries in that distributed architecture. Their scheme does not allow queries to execute on multiple databases. Agrawal, Evfimievski and Srikant [2] developed protocols for secure intersection, intersection size and equijoin database operations for two databases using commutative encryption and hashing. Their work exposes partial information, such as table sizes and the query, to the databases and does not support aggregation queries. In [5] Chow, Lee and Subramanian proposed a two-party computation model for privacy-preserving queries over distributed databases in an honest but curious adversarial model comprising two semi-honest parties, a randomizer and a computing engine, in addition to the customer and the databases. Scalability of query computation over large databases was their focus, though the proposed model does not support comparison across databases. Their work assumes that the randomizer picks a random string and sends it to the databases and the customer via confidential channels for de-randomization. Their model supports data privacy and result privacy but does not consider query privacy. Emekci, Agrawal, Abbadi and Gulbeden [6] proposed a privacy preserving intersection, equijoin and aggregation query solution over a hash-based P2P system in which selected third parties perform the query computation, to speed up the query response while preserving the privacy of the data sources. These third parties are selected from a peer-to-peer (P2P) system, namely Chord [12], for computation of the query results. The secrets are distributed to the third parties using Shamir's secret sharing method [10]. Their model considers data privacy but not query privacy.

Most of the existing privacy preserving query processing solutions deal with data privacy and query privacy separately. However, in a recent work Hu, Xu, Ren and Choi [7] dealt with data, query and result privacy. Their model is for a single database. They used homomorphic encryption, a computationally heavy encryption algorithm, to compute the Euclidean distance for distance-based queries such as kNN queries and distance range queries. Moreover, homomorphic encryption imposes some restrictions on the domain of the plaintext. There are a number of privacy preserving
techniques for query processing over distributed databases [5, 6, 7, 9].

IV. MATHEMATICAL PRELIMINARIES

A. Commutative encryption

In commutative encryption, if a plaintext message m is encrypted with two different keys $k_a$ and $k_b$ in either order, the result is the same ciphertext. Thus if some data is encrypted more than once, the order in which it is decrypted does not matter; the order of encryption need not be followed for decryption. We have used commutative encryption for secure data exchange between any two parties via a third party. We have proposed a variant of the RSA one-way accumulator, also known as the exponential accumulator, originally advocated by Benaloh and de Mare [3]. A one-way accumulator has the commutative property.

Definition 1: If M and K are the message space and key space respectively, then $\forall m \in M$ and for any $k_a, k_b \in K$ we have

$E_{k_a}(E_{k_b}(m)) = E_{k_b}(E_{k_a}(m))$

It follows from the above that

$D_{k_a}(D_{k_b}(E_{k_a}(E_{k_b}(m)))) = D_{k_a}(D_{k_b}(E_{k_b}(E_{k_a}(m)))) = m$

The commutative property enables message sharing between parties without any advance arrangement such as key distribution or public-key disclosure.

B. One-way accumulator

The common form of a one-way accumulator is defined by starting with a seed value $y_0$, which signifies the empty set, and then defining the accumulation value incrementally from $y_0$ for a set of values $X = \{x_1, x_2, \ldots, x_n\}$ so that $y_i = f(y_{i-1}, x_i)$, where f is a one-way function whose final value does not depend on the order of the $x_i$. A well-known example of a one-way accumulator function is the RSA accumulator,

$f(y, x) = y^x \bmod n$

for suitably chosen values of the seed $y_0$ and modulus n. In particular, choosing $n = pq$ with p and q being two strong primes makes the RSA accumulator function as difficult to invert as RSA cryptography.

C. SRA Protocol

This is RSA, except that, unlike in RSA, both e and d are kept secret by the party. Each party generates its own secret key pair $K = (e, d)$ and uses e for encryption and d for decryption, or vice versa. Let p and q be two primes selected jointly by the parties, and $n = pq$. Let $k_A = (e_1, d_1)$ be the secret key pair of A, where $e_1 d_1 \equiv 1 \pmod{\varphi(n)}$³, and similarly let $k_B = (e_2, d_2)$ be the secret key pair of B. Given the key pair $(e, d)$, the encryption of a message m is defined as $E_K(m) = m^e \bmod n$ and the decryption of a ciphertext c is defined as $D_K(c) = c^d \bmod n$.

This protocol is commutative, as

$E_{k_A}(E_{k_B}(m)) = E_{k_A}(m^{e_2} \bmod n) = (m^{e_2} \bmod n)^{e_1} \bmod n = m^{e_1 e_2} \bmod n = (m^{e_1} \bmod n)^{e_2} \bmod n = E_{k_B}(m^{e_1} \bmod n) = E_{k_B}(E_{k_A}(m))$

D. Quasi-commutative hash functions

Hash functions are defined on a single argument: $h: X \to Y$. One-way hash functions are those for which $h^{-1}$ is difficult to find. For the quasi-commutative property we need to define hash functions which take two arguments, i.e. $h: A \times B \to C$ where $|A| \approx |B| \approx |C|$.

Definition 2: A function $f: X \times Y \to X$ is said to be quasi-commutative if $\forall x \in X$ and $\forall y_1, y_2 \in Y$ we have

$f(f(x, y_1), y_2) = f(f(x, y_2), y_1)$

Addition modulo n, multiplication modulo n and exponentiation modulo n all have this property. Among these, exponentiation modulo n, $h(x, y) = x^y \bmod n$, is also one-way under suitable conditions. One-way hash functions for which the range is equal to the domain of the first argument, i.e. $h: X \times Y \to X$, and which also have the quasi-commutative property, are one-way accumulators.

Definition 3: A family of one-way hash functions, each of which is quasi-commutative, is said to be a family of one-way accumulators. The quasi-commutative property of a one-way accumulator h ensures that $\forall x \in X$ and $y_1, y_2, \ldots, y_m \in Y$ the accumulated hash value

$z = h(h(\ldots h(h(x, y_1), y_2) \ldots, y_{m-1}), y_m)$

is unchanged for every permutation of the $y_i$.

E. One-way accumulator function construction

For any n, the function $e_n(x, y) = x^y \bmod n$ is clearly quasi-commutative. In RSA, n is constructed as a product of two primes p and q. The family $e_n$ constitutes a family of one-way hash functions. Benaloh and de Mare [3] placed further restrictions on n in order to use $e_n$ as one-way accumulators. They defined n to be a product of two safe primes⁴ p and q and defined the hash function h as $h(x, y_i) = x^{y_i} \bmod n$, $n = pq$. To prevent collisions they require p and q to be safe primes. Kantarcioglu and Clifton [8] modified the encryption as follows. Instead of generating random $y_i$ values, they generate a secret key pair $S_i = (e_i, d_i)$ such that $e_i d_i \equiv 1 \pmod{\varphi(n)}$. $e_i$ is used for encryption whereas $d_i$ is used for decryption, or vice versa. This is identical to RSA except that p and q are safe primes instead of arbitrary primes.

F. Privacy Preserving Message Passing Protocols

We offer two privacy preserving message passing protocols which act as the basic building blocks of our proposed algorithm. Our privacy preserving algorithms are based on commutative encryption. Though any commutative encryption is applicable to our solution, we propose to use the construction of one-way accumulators by Kantarcioglu and Clifton [8]. This construction offers a collision resistant hash function for encryption by choosing two safe primes p and q, so that the matching of encrypted values behaves identically to that of plaintext values.
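The commutativity of the SRA construction in Section IV-C can be checked numerically. The sketch below is illustrative only: the primes 1019 and 1187 are toy safe primes that are far too small for real use, the exponents 65537 and 257 are arbitrary choices, and the helper names `make_key`, `encrypt` and `decrypt` are hypothetical rather than taken from the paper.

```python
from math import gcd

# Toy parameters: two small safe primes (p = 2p' + 1 with p' prime).
P, Q = 1019, 1187
N = P * Q                      # shared modulus n = pq
PHI = (P - 1) * (Q - 1)        # Euler's totient of n


def make_key(e):
    """Build a secret pair (e, d) with e*d = 1 (mod phi(n))."""
    if gcd(e, PHI) != 1:
        raise ValueError("e must be coprime with phi(n)")
    return e, pow(e, -1, PHI)  # modular inverse, needs Python 3.8+


def encrypt(key, m):
    """E_K(m) = m^e mod n."""
    return pow(m, key[0], N)


def decrypt(key, c):
    """D_K(c) = c^d mod n."""
    return pow(c, key[1], N)


k_a = make_key(65537)   # secret pair of party A (kept private, unlike RSA)
k_b = make_key(257)     # secret pair of party B
m = 424242              # any message with 0 <= m < n

# Encrypting in either order gives the same ciphertext (Definition 1).
both_ab = encrypt(k_a, encrypt(k_b, m))
both_ba = encrypt(k_b, encrypt(k_a, m))
assert both_ab == both_ba

# The decryption order does not matter either.
assert decrypt(k_a, decrypt(k_b, both_ab)) == m
assert decrypt(k_b, decrypt(k_a, both_ab)) == m
```

Because both layers reduce to a single exponent $e_1 e_2$ modulo $\varphi(n)$, removing the layers in any order recovers m.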

³ $\varphi(n) = (p - 1)(q - 1)$ is Euler's totient function.
⁴ A prime p is safe if p = 2p′ + 1 where p′ is an odd prime.
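The two footnote definitions, together with the quasi-commutative property of Definitions 2 and 3, can be exercised with a few lines of code. This is a minimal sketch under stated assumptions: trial division is only adequate for toy-sized numbers, the seed and the $y_i$ values are arbitrary, and the function names are hypothetical.

```python
from itertools import permutations


def is_prime(k):
    """Trial-division primality test; adequate only for small numbers."""
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True


def is_safe_prime(p):
    """p is safe if p = 2p' + 1 where p' is an odd prime."""
    half = (p - 1) // 2
    return is_prime(p) and half % 2 == 1 and is_prime(half)


def totient(p, q):
    """Euler's totient of n = pq for distinct primes p and q."""
    return (p - 1) * (q - 1)


def accumulate(x, ys, n):
    """Apply h(x, y) = x^y mod n once per element of ys, in the given order."""
    for y in ys:
        x = pow(x, y, n)
    return x


p, q = 1019, 1187            # toy safe primes, far too small for real use
n = p * q
seed, ys = 5, [3, 11, 29]    # arbitrary seed and values to accumulate

# Every ordering of the y values yields the same accumulated value.
values = {accumulate(seed, perm, n) for perm in permutations(ys)}
print(is_safe_prime(p), is_safe_prime(q), len(values))
```

The accumulated value equals the seed raised to the product of all the $y_i$, which is why the order of accumulation cannot matter.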
V. BUILDING BLOCKS OF OUR PRIVACY PRESERVING PROTOCOL

The following protocol is used by SP at the beginning of query execution.

A. Setup

a. SP chooses two safe primes p and q and shares these with C and each D.
b. Using p and q, C and each D create their own secret keys K_C and K_D following RSA. Let E_D and E_C (D_D and D_C) denote the encryption (decryption) functions with the secret keys of D and C respectively. Both (E, D) pairs satisfy the commutative properties.

The following protocols are used to privately exchange a secret message between two parties via an intermediary.

B. Privacy Preserving Message⁵ Passing from C to D via SP: PPMP (Source C, Destination D, Intermediary SP, Message m)

a. C encrypts m with its encryption key and sends E_C(m) to D via SP.
b. D encrypts E_C(m) with its encryption key and sends E_D(E_C(m)) to C via SP.
c. C decrypts E_D(E_C(m)) to obtain E_D(m) and sends E_D(m) to D via SP.
d. D decrypts E_D(m) to obtain m.

C. Privacy Preserving Message Passing from D to C via SP: PPMP (Source D, Destination C, Intermediary SP, Message m)

This is the symmetrically opposite version of the above protocol.

D. Privacy Preserving Encrypted Message Passing from C to SP via SP: PPMP (Source C, Destination SP, Intermediary SP, Message m, Encryption D)

a. C encrypts m with its encryption key and sends E_C(m) to D via SP.
b. D encrypts E_C(m) with its encryption key and sends E_D(E_C(m)) to C via SP.
c. C decrypts E_D(E_C(m)) to obtain E_D(m) and sends E_D(m) to SP.

E. Privacy Preserving Message Decryption by C with the help of D via SP: PPMP (Source C, Intermediary SP, Message E_D(m), Encryption D)

a. C encrypts E_D(m) with its encryption key and sends E_C(E_D(m)) to D via SP.
b. D decrypts E_C(E_D(m)) with its decryption key to obtain E_C(m) and sends E_C(m) to C via SP.
c. C decrypts E_C(m) to obtain m.

VI. PROPOSED SOLUTION

In our query service the customer's query Q has three sets of constants {NS, S, HS}. The privacy requirement of each type of constant is listed in Table II. How the privacy of each type of constant is handled has been discussed in Section I. Before we present our algorithm we take a simple example query to explain our methodology.

Example: Consider a hypothetical query of the form

    SELECT col1, col2 FROM table1
    WHERE col1 <= NS1 AND col2 BETWEEN S1 AND S2 AND col3 = HS1

Let us assume that the constants entered by the customer through the query service's web interface are as follows:

    NS1 = 70, S1 = 1000, S2 = 2000 and HS1 = "ABC Limited"

Before being sent to SP the query text goes through the following transformations:

1. The value of the nonsensitive constant NS1 is substituted in the query text, as it can be made public as per the user's choice.
2. The predicate on the attribute corresponding to the highly sensitive constant HS1 is stripped from the query text and the attribute is added to the select list, so that the answer to the original query is found within the query result of the transformed query.

The modified query Q' after the above transformations is

    SELECT col1, col2, col3 FROM table1
    WHERE col1 <= 70 AND col2 BETWEEN ? AND ?

The sensitive constants S1 and S2 remain as placeholders in Q' and reach the databases separately (Step 4 below).

A. Privacy Preserving Query Service Protocol

Step 1. SP chooses a set of databases D1, ..., Dn to process customer C's query Q and runs the Setup process to distribute the safe primes p and q to C and each Di.
Step 2. C transforms Q into Q' (by the client side program at C) and sends it to SP.
Step 3. SP sends Q' to each Di.
Step 4. C sends the set of sensitive constants S to each Di using PPMP (C, Di, SP, S).
Step 5. Each Di executes Q' to generate the result set Ri, encrypts it with its key and sends the encrypted Ri to SP for further processing.
Step 6. C sends the set of highly sensitive constants HS to SP for each Di using PPMP (C, SP, SP, HS, Di).
Step 7. On receipt of the query result Ri from each data source, SP picks the tuples of interest by (equality) matching the encrypted data E(Ri) with each s in E(HS) and sends the resultant data E(ri) to C.
Step 8. For each data source Di, C decrypts E(ri) using PPMP (C, SP, E_Di(ri), Di).
Step 9. C combines the result sets ri to get the final answer.

The procedure above covers the case where all the constant sets are non-empty. Each variation (see Table III) calls for a suitable modification of the algorithm. For example, if HS is empty, we need not strip col3 from the WHERE clause and the databases need not send the partial result to SP for tuple selection. Instead each database can send its local query result to C using PPMP (D, C, SP, m).
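The message passing from C to D via SP (protocol B of Section V) can be simulated in a few lines. This is a sketch under stated assumptions rather than the authors' implementation: the safe primes and exponents are toy values, the `Party` class and `ppmp_c_to_d` function are hypothetical names, and the `transcript` list simply records what the relaying SP would observe.

```python
from math import gcd

# Setup: SP shares two (toy) safe primes with C and D.
P, Q = 1019, 1187
N, PHI = P * Q, (P - 1) * (Q - 1)


class Party:
    """A participant holding a secret SRA key pair (e, d)."""

    def __init__(self, name, e):
        if gcd(e, PHI) != 1:
            raise ValueError("e must be coprime with phi(n)")
        self.name = name
        self._e, self._d = e, pow(e, -1, PHI)  # Python 3.8+

    def encrypt(self, m):
        return pow(m, self._e, N)

    def decrypt(self, c):
        return pow(c, self._d, N)


def ppmp_c_to_d(c, d, m, transcript):
    """Pass m from C to D; every relayed value is logged in transcript."""
    msg1 = c.encrypt(m)        # a. C -> SP -> D : E_C(m)
    transcript.append(msg1)
    msg2 = d.encrypt(msg1)     # b. D -> SP -> C : E_D(E_C(m))
    transcript.append(msg2)
    msg3 = c.decrypt(msg2)     # c. C -> SP -> D : E_D(m)
    transcript.append(msg3)
    return d.decrypt(msg3)     # d. D recovers m


customer = Party("C", 65537)
database = Party("D", 257)
seen_by_sp = []
received = ppmp_c_to_d(customer, database, 1000, seen_by_sp)
print(received)
```

Each value relayed by SP carries at least one encryption layer whose key SP does not hold, which is the property the protocol relies on.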
⁵ Message m can be a scalar or a vector.
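Steps 5 to 7 of the query service protocol, with a vector message encrypted element by element, can be sketched as follows. Several details are assumptions made only for this illustration: two known Mersenne primes stand in for the safe primes so that a short string fits below n, strings are encoded as big-endian integers, and the sample rows and helper names are hypothetical.

```python
from math import gcd

# Assumption: Mersenne primes replace the safe primes of the paper purely so
# that short strings can be encoded injectively below n in this toy example.
P, Q = 2**61 - 1, 2**89 - 1
N, PHI = P * Q, (P - 1) * (Q - 1)


def make_key(e):
    """Secret pair (e, d) with e*d = 1 (mod phi(n))."""
    if gcd(e, PHI) != 1:
        raise ValueError("e must be coprime with phi(n)")
    return e, pow(e, -1, PHI)


def encode(value):
    """Toy encoding: integers as-is, short strings as big-endian bytes."""
    code = value if isinstance(value, int) else int.from_bytes(
        value.encode("utf-8"), "big")
    if not 0 <= code < N:
        raise ValueError("value does not fit below n")
    return code


def encrypt_vector(key, values):
    """Encrypt a vector message element by element."""
    return [pow(encode(v), key[0], N) for v in values]


k_c, k_d = make_key(257), make_key(65537)

# Step 5: D runs Q' and encrypts its result rows (col1, col2, col3).
rows = [(10, 1500, "ABC Limited"), (20, 1200, "XYZ Corp"),
        (65, 1900, "ABC Limited")]
enc_rows = [encrypt_vector(k_d, row) for row in rows]

# Step 6: SP obtains E_D(HS) through C and D without seeing HS itself.
hs = ["ABC Limited"]
step_a = encrypt_vector(k_c, hs)                 # C: E_C(HS)
step_b = [pow(c, k_d[0], N) for c in step_a]     # D: E_D(E_C(HS))
enc_hs = {pow(c, k_c[1], N) for c in step_b}     # C strips its layer: E_D(HS)

# Step 7: SP selects tuples by equality on ciphertexts only.
selected = [row for row in enc_rows if row[2] in enc_hs]
print(len(selected))
```

Equality matching works here because the encryption is deterministic, so equal plaintexts under the same key give equal ciphertexts.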
B. Security Analysis

Our query processing framework is based on a semi-honest (honest but curious) model, which means that all the parties follow the protocol correctly but may record any intermediate input received during the protocol execution and try to derive some benefit from it. The databases are not known to the customer; they are chosen by the service provider from a set of data sources based on the query type. Moreover, they do not have any common interest with the customer. So in practice they have no chance of colluding with the customer. However, being known to the service provider, the databases may collude with each other. Our attempt is to prevent collusion between them. We analyze the security aspect of each protocol. Query privacy of the customer and data privacy of the data sources are the primary considerations in this semi-honest model. The service provider remains completely in the dark during the process but provides the service without learning the query, the data or the result of the computation.

Any party – the customer, the service provider or a data source – may act as an adversary. The encryption/decryption algorithms are known to all the parties, but the secret key pair of each party is unknown to the others. The following situations may arise:

a. The service provider may like to know the values of the constants of the parametric query given by the customer, the data values of the attributes corresponding to the sensitive constants, as well as the query result.
b. The customer may like to know the data values of the attributes corresponding to the sensitive constants.
c. The data source may like to know the values of the sensitive constants and other data sources' data.

Security of the data depends on the security of the encryption/decryption key pair of the individual players. We have proposed the one-way accumulator as our choice of commutative encryption. Benaloh and de Mare [3] showed that it is computationally infeasible to invert.

VII. CONCLUSION AND FUTURE WORK

The problem of preserving the privacy of sensitive constants in a customer's query in a query service framework has been studied in this paper. In this model a service provider provides a query service with the help of different databases, virtually forming a distributed database with horizontal partitioning, though the databases could be heterogeneous. Query processing in this model preserves the data privacy of the data sources and the query privacy of the customer. It also preserves the identity privacy of the customer and the data sources, and the result privacy. The protocols have been built on commutative encryption for secretly transferring data or messages between two parties via a third party. We have suggested the one-way accumulator as our choice of commutative encryption scheme because inverting it is computationally infeasible. For efficiency of processing, the problem of hiding the sensitive query constants has been studied in three different practical scenarios depending on the degree of disclosure of the constants allowed by the customer to other players (Table II, Section I). The computation and communication complexity will vary depending on the degree of disclosure. Our future plan is to work on privacy preserving queries on vertically distributed databases.

REFERENCES

[1] G. Aggarwal, M. Bawa, P. Ganesan, H. Garcia-Molina, K. Kenthapadi, R. Motwani, U. Srivastava, D. Thomas, and Y. Xu. "Two can keep a secret: A distributed architecture for secure database services". In Proc. of CIDR 2005.
[2] R. Agrawal, A. Evfimievski, and R. Srikant. "Information sharing across private databases". In Proc. of the 2003 ACM SIGMOD International Conference on Management of Data, pages 86–97, 2003.
[3] J. Benaloh and M. de Mare. "One-way accumulators: A decentralized alternative to digital signatures". In EUROCRYPT '93: Workshop on the Theory and Application of Cryptographic Techniques on Advances in Cryptology, pages 274–285. Springer-Verlag New York, Inc., 1994.
[4] G. Boss, P. Malladi, D. Quan, L. Legregni, and H. Hall. Service provider computing. IBM Corporation, 2007.
[5] S. S. M. Chow, J. H. Lee, and L. Subramanian. "Two-party computation model for privacy-preserving queries over distributed databases". In NDSS, 2009.
[6] F. Emekci, D. Agrawal, A. E. Abbadi, and A. Gulbeden. "Privacy preserving query processing using third parties". In ICDE 2006, page 27. IEEE Computer Society, 2006.
[7] H. Hu, J. Xu, C. Ren, and B. Choi. "Processing private queries over untrusted data service provider through privacy homomorphism". In ICDE, pages 601–612. IEEE Computer Society, 2011.
[8] M. Kantarcioglu and C. Clifton. "Privacy-preserving distributed mining of association rules on horizontally partitioned data". In The ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD'02), pages 24–31, June 2, 2002.
[9] F. Olumofin and I. Goldberg. "Privacy-preserving queries over relational databases". In PETS'10, Berlin, 2010.
[10] A. Shamir. "How to share a secret". Commun. ACM, vol. 22, no. 11, pages 612–613, 1979.
[11] A. Silberschatz, H. F. Korth, and S. Sudarshan. Database System Concepts. McGraw-Hill, Inc., New York, NY, USA, 5th edition, 2005.
[12] I. Stoica, R. Morris, D. R. Karger, M. F. Kaashoek, and H. Balakrishnan. "Chord: A scalable peer-to-peer lookup service for internet applications". In SIGCOMM 2001, pages 149–160, 2001.
