Unit 4
Data Security
Cloud data security involves far more than simply data
encryption.
Prospective cloud adopters often have security concerns
about storing and processing sensitive data in a public,
hybrid, or even a community cloud.
These concerns usually center on two areas:
Decreased control by the owning organization when data is no
longer managed within the organization’s premises.
Concern by the owning organization that
multitenant clouds inherently pose risks to
sensitive data.
Control over Data and Public Cloud Economics
In contrast to using a public cloud, maintaining physical
organizational control over stored data, over data as it traverses internal
networks, and over data processed by on-premises computers does
offer potential advantages for security.
The basic problem is that most organizations are
neither qualified to be in the information security
business nor actually in that business; they are
simply using computers and networks to get their
work done.
CryptDB
Cloud users demand security for their data, which is stored in the
data repositories of cloud service providers.
Network security concepts can therefore be applied to the
cloud network, where several cryptographic algorithms are
used to provide confidentiality and integrity of the data.
Such algorithms include symmetric encryption, asymmetric
encryption, hashing algorithms, and digital signatures.
Even though these algorithms provide security, they
are not applied to query-based data retrieval from databases,
where queries are issued to retrieve the data.
Because the operations are performed in a database located remotely,
away from the user, encrypting the queries and the data
together makes for an efficient approach.
Mechanisms such as homomorphic encryption and order-preserving
encryption are examined, and a novel approach, termed
“CryptDB”, is defined to address these security issues in the cloud.
In a cloud computing environment, the conventional role of the
service provider is divided into two categories: the infrastructure
providers, who manage cloud platforms and lease resources such
as CPU and storage according to a usage-based pricing
model, and the service providers, who rent resources from one or
many infrastructure providers to serve cloud users.
DaaS (data as a service): this type of service provides only the data
requested by the user; it also stores user data
remotely and makes it available upon user request.
Among cloud services, DaaS is considered the one most
demanded by cloud users, as they exchange, store, and
modify their information over the internet.
Problem: Confidential Data Leaks
[Figure: users 1, 2 and 3 access an application backed by a SQL DB server. Threats to confidential data include curious DB administrators, hackers, curious cloud employees, and physical attacks, on both private and public clouds; regulatory laws also apply.]
Overview of Security Algorithms:
Encryption algorithms can be classified as symmetric key algorithms and
asymmetric key algorithms.
In symmetric key encryption, a single key, known as the common key,
shared key, or symmetric key, is shared between sender and
receiver and is used both to encrypt and to decrypt the data. Algorithms
like AES (Advanced Encryption Standard), Blowfish, and Twofish are based
on single-key encryption.
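The shared-key idea can be shown with a minimal Python sketch in which the same key both encrypts and decrypts. Python's standard library has no AES, Blowfish, or Twofish, so this toy cipher is our own stand-in: it builds a keystream from HMAC-SHA256 in counter mode and XORs the keystream with the message. It is for illustration only. It provides no integrity protection and is not a substitute for AES.

```python
import hashlib
import hmac
import secrets

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from key and nonce (HMAC-SHA256, counter mode)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)  # fresh nonce per message, sent with the ciphertext
    ks = keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, ks))

def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:16], ciphertext[16:]
    ks = keystream(key, nonce, len(body))
    return bytes(c ^ k for c, k in zip(body, ks))  # XOR with the same keystream undoes it

shared_key = secrets.token_bytes(32)  # the single key held by both sender and receiver
ct = encrypt(shared_key, b"salary=100")
print(decrypt(shared_key, ct))  # b'salary=100'
```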
Other algorithms, such as RSA, are based on
two-key schemes in which the public
key is used to encrypt the data and the secret
key/private key is used to decrypt the data. These
algorithms are called asymmetric algorithms
since they use different keys for encryption and
decryption. DSA is also asymmetric, but it is used
for digital signatures rather than for encryption.
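Textbook RSA on tiny primes illustrates the public/private key split. The primes p = 61 and q = 53 are chosen only for illustration. The sketch uses no padding and an absurdly small modulus, so it is insecure. It only shows that a message encrypted with the public exponent e is recovered with the private exponent d.

```python
# Textbook RSA with tiny primes: illustration only (no padding, insecure key size).
p, q = 61, 53
n = p * q                  # public modulus (3233)
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent: modular inverse of e (Python 3.8+)

def rsa_encrypt(m: int) -> int:
    return pow(m, e, n)    # anyone holding the public key (n, e) can do this

def rsa_decrypt(c: int) -> int:
    return pow(c, d, n)    # only the holder of the private exponent d can undo it

c = rsa_encrypt(65)
print(c, rsa_decrypt(c))   # 2790 65
```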
Because ciphertext is sent through the network instead of the original text,
there is less chance of leaking the original message, and some other
attacks are partly prevented.
Since this encryption revolves around keys, the receiver must hold the right
key to decrypt the message. With symmetric encryption, the sender has to
share the secret key with the receiver separately from the ciphertext; with
asymmetric encryption, the receiver decrypts with its own private key, which
is never sent.
CryptDB
Goal: protect confidentiality of data
[Figure: users 1, 2 and 3, each with a password, use an application that sends SQL queries through the CryptDB proxy to the SQL DB server. Threat 1: passive attacks on the DB server. Threat 2: active/passive attacks on all servers.]
1. Process SQL queries on encrypted data.
2. Capture and cryptographically enforce access control in
SQL: chain keys from user passwords to data items.
CryptDB is a system that provides practical and provable
confidentiality in the face of these attacks for applications
backed by SQL databases.
It works by executing SQL queries over encrypted data using
a collection of efficient SQL-aware encryption schemes.
CryptDB leverages the typical structure of database-backed
applications, consisting of a DBMS server and a separate
application server; the latter runs the application code and issues
DBMS queries on behalf of one or more users.
CryptDB’s approach is to execute queries over encrypted data,
and the key insight that makes it practical is that SQL uses a
well-defined set of operators, each of which CryptDB can
support efficiently over encrypted data.
It works by intercepting all user-issued SQL queries in a CryptDB
database proxy, which rewrites the queries to execute on encrypted
data; CryptDB assumes that all queries go through its proxy.
The proxy encrypts and decrypts data and queries, generating a
parse tree that preserves the semantics of the query. The DBMS server
never receives the keys needed to decrypt the ciphertext all the way
to plaintext, so the DBMS server cannot read the confidential data.
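The proxy's rewriting step for a single equality query can be sketched as follows. The sketch assumes a hypothetical anonymization map that mirrors the emp/table1 example in these notes (emp becomes table1, salary becomes col3). It uses HMAC-SHA256 as a stand-in for the DET layer. Unlike CryptDB's DET layer, an HMAC token cannot be decrypted. It only preserves equality, which is all this WHERE clause needs. MASTER_KEY, TABLES, COLUMNS, det_token and rewrite_equality_select are all hypothetical names.

```python
import hashlib
import hmac

MASTER_KEY = b"hypothetical-master-key"  # held only by the proxy, never by the DBMS

# Hypothetical anonymization map kept by the proxy (part of the schema it stores).
TABLES = {"emp": "table1"}
COLUMNS = {"emp": {"rank": "col1", "name": "col2", "salary": "col3"}}

def det_token(column: str, value: str) -> str:
    """Per-column deterministic token: equal values give equal tokens (HMAC stand-in for DET)."""
    col_key = hmac.new(MASTER_KEY, column.encode(), hashlib.sha256).digest()
    return "x" + hmac.new(col_key, value.encode(), hashlib.sha256).hexdigest()[:12]

def rewrite_equality_select(table: str, column: str, value: str) -> str:
    """Rewrite SELECT * FROM <table> WHERE <column> = <value> into what the DBMS sees."""
    enc_table = TABLES[table]
    enc_col = COLUMNS[table][column]
    token = det_token(f"{table}.{column}", value)
    return f"SELECT * FROM {enc_table} WHERE {enc_col} = {token}"

# Prints the rewritten query: SELECT * FROM table1 WHERE col3 = x<token>
print(rewrite_equality_select("emp", "salary", "100"))
```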
CryptDB addresses two kinds of threats, shown as dotted lines in the figure.
In threat 1, a curious database administrator with complete access
to the DBMS server snoops on private data; in this case CryptDB
prevents the DBA from accessing secret information.
In threat 2, an adversary gains complete control over both the
software and hardware of the application, proxy, and DBMS
servers; in this case
CryptDB ensures the adversary cannot obtain data belonging to
users that are not logged in (e.g., user 2), but the data of
logged-in users may be exposed.
Threat 1: Passive attacks on the DB server
[Figure: the application (trusted) sends unencrypted queries to the proxy (trusted), which stores the schema and master key, decrypts results, and executes no queries; the proxy sends SQL over encrypted data to the DB server, which is under attack.]
Perform SQL query processing on encrypted data:
1. Support standard SQL queries on encrypted data.
2. Process queries completely at the DB server.
Example
[Figure: the application issues SELECT * FROM emp WHERE salary = 100. The proxy rewrites it to SELECT * FROM table1 WHERE col3 = x5a8c34, where table1 is the anonymized emp table (col1/rank, col2/name, col3/salary) and the DB server holds only encrypted values such as x5a8c34.]
Working with CryptDB
Prerequisites:
CryptDB typically runs on Linux systems. Ensure you have a
suitable Linux distribution installed (e.g., Ubuntu, CentOS).
You'll need to have MySQL or PostgreSQL installed on your
system, as CryptDB supports integration with these database
management systems.
Install required dependencies such as GCC, Make, OpenSSL, and
others.
The CryptDB source code can be obtained from its public Git repository
(git://g.csail.mit.edu/cryptdb).
Build CryptDB:
Extract the downloaded CryptDB source code.
Navigate to the directory containing the source code and follow
the provided instructions to build CryptDB. For many projects this involves
running the ./configure script followed by the make and make install
commands; the public CryptDB release uses the scripts/install.rb script instead.
Set Up Database:
Install and configure MySQL or PostgreSQL as per CryptDB's
requirements. You may need to create a new database instance
or use an existing one.
Configuration:
Configure CryptDB to work with your database management
system. This typically involves editing configuration files to
specify database connection details, encryption parameters,
and other settings.
Start CryptDB:
Once CryptDB is built and configured, start the CryptDB
proxy using the provided startup command. This initializes
the proxy and allows it to accept connections from client
applications.
Commands to configure CryptDB and the MySQL proxy server
On server
sudo apt-get update
sudo apt-get -y install git ruby
git clone -b public git://g.csail.mit.edu/cryptdb
cd cryptdb
sudo ./scripts/install.rb .
sudo nano /etc/mysql/my.cnf
# in my.cnf, comment out the line: bind-address = 127.0.0.1
mysql -u root -pletmein
# at the mysql> prompt:
GRANT ALL ON *.* TO 'root'@'%' IDENTIFIED BY 'letmein';
FLUSH PRIVILEGES;
exit
sudo service mysql restart
touch conf/config.mk
make udf
On proxy
sudo apt-get update
sudo apt-get -y install git ruby
git clone -b public git://g.csail.mit.edu/cryptdb
cd cryptdb
sudo ./scripts/install.rb .
sudo nano /etc/mysql/my.cnf
# in my.cnf, comment out the line: bind-address = 127.0.0.1
mysql -u root -pletmein
# at the mysql> prompt:
GRANT ALL ON *.* TO 'root'@'%' IDENTIFIED BY 'letmein';
FLUSH PRIVILEGES;
exit
sudo service mysql restart
export EDBDIR=$HOME/cryptdb
# 10.0.0.6 is the proxy host and 10.0.0.5 the MySQL server in this example
$HOME/cryptdb/bins/proxy-bin/bin/mysql-proxy --plugins=proxy \
  --event-threads=4 --max-open-files=1024 \
  --proxy-lua-script=$EDBDIR/mysqlproxy/wrapper.lua \
  --proxy-address=10.0.0.6:3307 \
  --proxy-backend-addresses=10.0.0.5:3306
On client
sudo apt-get update
sudo apt-get install mysql-client
mysql -u root -pletmein -h 10.0.0.6 -P 3307
Create database and execute queries
Downloading CryptDB
CryptDB was developed at MIT; its public version is
stored in a public Git repository hosted by MIT CSAIL.
This version of CryptDB is freely available; anyone
interested can make use of it.
The command to download it is
“git clone -b public
git://g.csail.mit.edu/cryptdb” .
Processing a Query in CryptDB
1. The user issues a query, which is intercepted by the database proxy;
the proxy rewrites the table and column names and encrypts the query's
constants using a key.
2. The proxy checks whether the DBMS server must be given keys to adjust
onion layers; if so, the proxy issues an UPDATE command that calls the
appropriate UDF, passing it the key for the layer being removed.
3. The database proxy forwards the query to the DBMS server,
which executes it using standard SQL.
4. The DBMS server returns the query result to the database proxy,
which decrypts it and returns plaintext to the user.
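The four steps can be simulated end to end in a few lines. This is a toy sketch, not CryptDB's implementation. For each row, the "server" stores only two things: an HMAC equality token for the salary, which stands in for DET, and a randomized encryption of the whole row, which stands in for RND and is built from an HMAC keystream because the standard library has no AES. Step 2 (onion adjustment) is skipped because the salary column is assumed to already be at the DET layer. The key, the names, and the rows are hypothetical sample data.

```python
import hashlib
import hmac
import json
import secrets

KEY = secrets.token_bytes(32)  # hypothetical proxy key; never sent to the server

def det(value: str) -> str:
    # Deterministic equality token (HMAC stand-in for DET).
    return hmac.new(KEY, b"det:" + value.encode(), hashlib.sha256).hexdigest()

def _stream(nonce: bytes, n: int) -> bytes:
    out, i = b"", 0
    while len(out) < n:
        out += hmac.new(KEY, b"rnd:" + nonce + i.to_bytes(4, "big"), hashlib.sha256).digest()
        i += 1
    return out[:n]

def rnd_encrypt(row: dict) -> bytes:
    data = json.dumps(row).encode()
    nonce = secrets.token_bytes(16)  # random nonce: equal rows encrypt differently
    return nonce + bytes(a ^ b for a, b in zip(data, _stream(nonce, len(data))))

def rnd_decrypt(blob: bytes) -> dict:
    nonce, body = blob[:16], blob[16:]
    return json.loads(bytes(a ^ b for a, b in zip(body, _stream(nonce, len(body)))))

# What the untrusted DBMS server stores: tokens and ciphertexts only.
server_table = [
    {"col3": det(str(r["salary"])), "row": rnd_encrypt(r)}
    for r in [
        {"name": "Alice", "salary": 60},
        {"name": "Bob", "salary": 100},
        {"name": "Carol", "salary": 800},
        {"name": "Dave", "salary": 100},
    ]
]

def server_execute(token: str) -> list:
    # Step 3: the DBMS evaluates WHERE col3 = token without seeing any plaintext.
    return [r["row"] for r in server_table if r["col3"] == token]

def proxy_query(salary: int) -> list:
    token = det(str(salary))                     # Step 1: encrypt the query constant
    encrypted_rows = server_execute(token)       # Step 3: forward and execute
    return [rnd_decrypt(b) for b in encrypted_rows]  # Step 4: decrypt for the user

print(proxy_query(100))  # [{'name': 'Bob', 'salary': 100}, {'name': 'Dave', 'salary': 100}]
```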
Two techniques
1. SQL-aware encryption strategy
◦ Observation: the set of SQL operators is limited
◦ Different encryption schemes provide different
functionality
2. Adjustable query-based encryption
◦ Adapt the encryption of data based on user queries
Check it out! 20 minutes for
reference
Go through the following encryption algorithms and mode of operation
and understand them.
◦ AES
◦ DES
◦ CBC (Cipher Block Chaining, a block cipher mode of operation)
◦ Blowfish
Onion Layers of Encryption
CryptDB uses a construct called onion layers: layered encryption
schemes whose layers are removed as needed when a specific
query is issued.
In SQL-aware encryption, different kinds of computation rely on
different fundamental principles.
For example, the GROUP BY operator relies on equality
checks over the encrypted data, while functions like
SUM rely on the ability to add encrypted
data.
CryptDB deals with these different computational aspects by
clustering functions by their underlying operations, as
mentioned above.
Around these different aspects or clusters, CryptDB builds a
construct that the developers call an onion:
an onion features different layers of encryption, from least
revealing on the outside to most revealing on the inside.
Random (RND): a randomly generated initialization vector (IV) is used to construct
the ciphertext of each column value. RND provides strong encryption, suitable for
sensitive data, but it allows no computation on the ciphertexts; queries such as
ORDER BY, MIN, or SUM must use a different layer or onion.
Deterministic (DET): security is weaker because the same plaintext always produces
the same ciphertext, which leaks which values are equal. It is a pseudo-random
permutation.
Order-Preserving Encryption (OPE): it preserves the order of the plaintexts in the
ciphertexts. For instance, if a < b, then OPE_K(a) < OPE_K(b) for a key K. It is
weaker than DET.
Homomorphic (HOM): useful for data that requires computation. The server can compute
on ciphertexts without seeing the plaintext (in CryptDB, additions such as SUM), and
the proxy decrypts the result.
JOIN and OPE-JOIN: because each column uses a separate DET key, equal values in
different columns encrypt differently, which hides the relationships between columns.
The JOIN and OPE-JOIN layers allow equality and order checks across columns, so that
joins can be performed.
Word search (SEARCH): allows keyword searches over encrypted text. The encrypted search
term reaches the DBMS, which finds matches using a few user-defined functions (UDFs).
The matching data is returned to the proxy, decrypted, and sent to the application.
The transformation from one layer into another (“peeling off a
layer”) happens automatically when the need arises (i.e. when
a query with a certain operator or function is issued). In this case
CryptDB automatically decrypts the outer layer of the entire column
and remembers the column's new state.
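The onion structure and the rule of peeling only what a query needs can be written as a small lookup. The layer order follows the onion descriptions in these notes. RND is outermost, followed by DET and JOIN for equality or by OPE and OPE-JOIN for order, and HOM and SEARCH are single-layer onions. The operator-to-layer mapping and the function name layers_to_peel are our own simplification, not CryptDB's code.

```python
# Onions, outermost (least revealing) layer first.
ONIONS = {
    "Eq":     ["RND", "DET", "JOIN"],
    "Ord":    ["RND", "OPE", "OPE-JOIN"],
    "Add":    ["HOM"],
    "Search": ["SEARCH"],
}

# Which onion and layer each operation needs (simplified, hypothetical mapping).
REQUIRED = {
    "=": ("Eq", "DET"), "GROUP BY": ("Eq", "DET"), "DISTINCT": ("Eq", "DET"),
    "equi-join": ("Eq", "JOIN"),
    "<": ("Ord", "OPE"), "ORDER BY": ("Ord", "OPE"), "MIN": ("Ord", "OPE"), "MAX": ("Ord", "OPE"),
    "SUM": ("Add", "HOM"),
    "LIKE": ("Search", "SEARCH"),
}

def layers_to_peel(operation: str, current: dict) -> list:
    """Outer layers that must be removed so `operation` can run.

    `current` maps each onion name to its current outermost layer.
    """
    onion, target = REQUIRED[operation]
    layers = ONIONS[onion]
    start, stop = layers.index(current[onion]), layers.index(target)
    return layers[start:stop]  # empty if the onion is already at (or inside) the target

fresh = {name: layers[0] for name, layers in ONIONS.items()}  # every onion starts outermost
print(layers_to_peel("=", fresh))          # ['RND']
print(layers_to_peel("equi-join", fresh))  # ['RND', 'DET']
print(layers_to_peel("SUM", fresh))        # []
```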
Encryption Types
Each type uses a different algorithm that meets the requirements
of that type; the algorithm can be exchanged for another one
should the need arise,
e.g. when a cipher in use is broken. In such an event, existing
encrypted data would have to be decrypted with the old
algorithm and re-encrypted using the new one.
The different layers are listed below from most to least secure.
Random (RND):
The RND onion layer provides the strongest security
assurances: it is probabilistic, meaning that the same plaintext
is encrypted to a different ciphertext each time.
It does not allow any feasible computation on the ciphertexts in a
reasonable amount of time.
The current implementation of RND uses the Advanced
Encryption Standard (AES) to encrypt strings and Blowfish to
encrypt integers.
The reason lies in the respective block sizes of the two ciphers: Blowfish has a
block size of 64 bits, which should be large enough to store 99%
of all integers, whereas AES is used with a block size of 128 bits.
This means using Blowfish to store integers needs only half the
space that AES needs. Both implementations use the Cipher
Block Chaining (CBC) mode.
Homomorphic encryption (HOM)
Homomorphic encryption is the conversion of data into
ciphertext that can be analyzed and worked with as if it were
still in its original form.
Homomorphic encryption enables complex mathematical
operations to be performed on encrypted data without
compromising the encryption.
The HOM onion layer provides a security assurance as strong
as RND.
It is specifically designed for columns of the data type integer
and allows the database to perform operations of an additive
nature.
This includes the addition of several entries, as well as
operations like SUM or AVG.
The ability to perform mathematical operations on encrypted
data means that there needs to be a relationship between
plaintexts and ciphertexts.
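The sketch below illustrates an additive homomorphic scheme using Paillier encryption with tiny primes (p = 1009, q = 1013, far too small for real use). Multiplying two ciphertexts modulo n^2 yields an encryption of the sum of the plaintexts. The server can therefore compute SUM without any key, and the proxy decrypts the result. AVG follows by dividing that sum by the row count. Treat this as an illustrative sketch, not CryptDB's implementation.

```python
import math
import secrets

# Toy Paillier key: tiny primes for illustration only.
p, q = 1009, 1013
n = p * q
n2 = n * n
g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
mu = pow(lam, -1, n)                               # valid shortcut because g = n + 1

def hom_encrypt(m: int) -> int:
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1] coprime with n
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def hom_decrypt(c: int) -> int:
    u = pow(c, lam, n2)
    return ((u - 1) // n) * mu % n

def hom_add(*ciphertexts: int) -> int:
    # Multiplying ciphertexts adds the underlying plaintexts (mod n).
    result = 1
    for c in ciphertexts:
        result = result * c % n2
    return result

salaries = [60, 100, 800, 100]
encrypted = [hom_encrypt(s) for s in salaries]  # what the server stores
encrypted_sum = hom_add(*encrypted)             # computed by the server, no key needed
print(hom_decrypt(encrypted_sum))               # 1060, decrypted by the proxy
```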
Word search (SEARCH)
The SEARCH onion layer is exclusive to columns
of the data type text.
It allows for a keyword-level text search with the
LIKE operator.
The implementation splits the string that is to be
stored in the database by a specified delimiter
(e.g. space or semicolon) and stores each distinct
substring in a concatenated and encrypted form
in the database.
Each substring is padded to a certain size, and its position
inside the concatenated string is permuted, thus obfuscating
the position where it appears in the original string.
When the user wants to perform a search using the LIKE
operator, CryptDB applies the padding to the search term and
sends the encrypted version to the DBMS.
The DBMS can then search for this specific encrypted string and
return the matching results.
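The following is a simplified model of keyword search. The text is split on a delimiter, each distinct word becomes a keyed token, and the token list is shuffled so that storage order does not reveal word positions. HMAC-SHA256 tokens stand in for the real scheme, and their fixed length plays the role of padding by hiding word length. SEARCH_KEY and the helper names are hypothetical. Deterministic tokens reveal which rows share a word, so this sketch is weaker than a proper searchable-encryption scheme.

```python
import hashlib
import hmac
import secrets

SEARCH_KEY = secrets.token_bytes(32)  # hypothetical per-column key kept by the proxy

def word_token(word: str) -> str:
    # Fixed-length keyed token: equal words match, word length is hidden.
    return hmac.new(SEARCH_KEY, word.lower().encode(), hashlib.sha256).hexdigest()

def encrypt_for_search(text: str, delimiter: str = " ") -> list:
    # Split on the delimiter, keep each distinct word once, tokenize it,
    # then shuffle so the stored order does not reveal word positions.
    tokens = [word_token(w) for w in set(text.lower().split(delimiter)) if w]
    secrets.SystemRandom().shuffle(tokens)
    return tokens

def server_like(stored: list, search_token: str) -> bool:
    # The DBMS only compares opaque tokens.
    return search_token in stored

stored = encrypt_for_search("cloud data security")      # kept in the DBMS
print(server_like(stored, word_token("data")))           # True (keyword match)
print(server_like(stored, word_token("encryption")))     # False
```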
Deterministic (DET)
The DET onion layer provides the second strongest security assurance.
In contrast to RND, this layer is deterministic, meaning that the same
plaintext is always encrypted to the same ciphertext.
This means that the DBMS can identify fields with equal (encrypted)
content.
This allows us to use functions like GROUP BY, to group identical
fields together, or DISTINCT, to select only fields that are
different.
It does not, however, reveal whether a certain field is bigger or smaller
than another field.
For this type the developers used Blowfish and AES again, although
this time they do not distinguish between integers and strings but
choose the cipher depending on the size of the plaintext:
Blowfish is used for any plaintext that is smaller than 64 bits and AES
for any plaintext that is bigger than 64 bits.
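The difference between RND and DET is easy to observe. Only DET produces repeated ciphertexts, and that repetition is exactly what lets the server run GROUP BY or DISTINCT. In this sketch, keyed HMAC-SHA256 stands in for both ciphers because Python's standard library has no AES or Blowfish. The outputs therefore model only the equality behaviour of the ciphertexts and cannot be decrypted. The department values are hypothetical sample data.

```python
import hashlib
import hmac
import secrets
from collections import Counter

KEY = secrets.token_bytes(32)  # hypothetical column key held by the proxy

def det_encrypt(value: str) -> str:
    # Deterministic: same plaintext -> same output (keyed HMAC as a DET stand-in).
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

def rnd_encrypt(value: str) -> str:
    # Probabilistic: a fresh random IV makes every output different (models RND only).
    iv = secrets.token_bytes(16)
    return iv.hex() + hmac.new(KEY, iv + value.encode(), hashlib.sha256).hexdigest()[:16]

departments = ["sales", "hr", "sales", "it", "sales"]  # hypothetical column values
det_column = [det_encrypt(d) for d in departments]
rnd_column = [rnd_encrypt(d) for d in departments]

# GROUP BY over the DET column: the server counts identical ciphertexts.
print(sorted(Counter(det_column).values()))  # [1, 1, 3]
# Over the RND column all five ciphertexts differ, so nothing can be grouped.
print(len(set(rnd_column)))                  # 5
```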
Order-preserving encryption (OPE):
The OPE onion layer is significantly weaker than the DET layer,
as it reveals the order of the different entries.
This means that if x < y, then OPE(x) < OPE(y), and if x = y,
then OPE(x) = OPE(y). This allows us to use ordered operations
like MIN, MAX or ORDER BY.
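A toy order-preserving encoding makes this property concrete. Each integer in a small domain maps to a running sum of key-dependent positive gaps. The mapping is therefore strictly increasing, and the server can evaluate MIN, MAX or ORDER BY on ciphertexts alone. The key, the domain size and the gap range are assumptions chosen for illustration. This is not the OPE scheme CryptDB implements, and it is not secure.

```python
import bisect
import hashlib
import hmac

OPE_KEY = b"hypothetical-ope-key"
DOMAIN = 2000  # toy plaintext domain: integers 0..1999

def _gap(i: int) -> int:
    digest = hmac.new(OPE_KEY, i.to_bytes(4, "big"), hashlib.sha256).digest()
    return 1 + int.from_bytes(digest[:4], "big") % 1000  # always >= 1, so values strictly increase

# Precompute the encoding table: OPE(x) = gap(0) + ... + gap(x).
_TABLE = []
_running = 0
for _i in range(DOMAIN):
    _running += _gap(_i)
    _TABLE.append(_running)

def ope_encrypt(x: int) -> int:
    return _TABLE[x]

def ope_decrypt(c: int) -> int:
    i = bisect.bisect_left(_TABLE, c)  # table is sorted, so binary search finds x
    if i == len(_TABLE) or _TABLE[i] != c:
        raise ValueError("not a valid ciphertext")
    return i

salaries = [60, 100, 800, 100]
encrypted = [ope_encrypt(s) for s in salaries]
# The server finds MIN and MAX by comparing ciphertexts only; the proxy decrypts.
print(ope_decrypt(min(encrypted)), ope_decrypt(max(encrypted)))  # 60 800
```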
Join (JOIN, OPE-JOIN)
The JOIN and OPE-JOIN layers are “sub-layers” of DET
and OPE, respectively. That means each of them features the
computational abilities of its “parent layer”.
This type works over multiple columns: JOIN allows determining
whether a plaintext in column a is equal to a plaintext in column
b, and OPE-JOIN whether a plaintext in column a is bigger or
smaller than a plaintext in column b.
Both operators work with multiple columns, allowing for
constructs like: SELECT * FROM test_table WHERE
name1=name2 AND name2=name3
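The per-column-key idea behind JOIN can be sketched with keyed HMAC tokens. At the DET layer each column has its own key, so equal names in name1 and name2 produce different tokens. Once both columns are moved to the JOIN layer, equality across the columns becomes visible. In this sketch, the JOIN layer is simply a key shared by the joined columns, which is a simplification of CryptDB's JOIN layer. The keys and the helper name are hypothetical.

```python
import hashlib
import hmac

def keyed_token(key: bytes, value: str) -> str:
    # Deterministic keyed token (HMAC stand-in for a deterministic cipher).
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]

# Hypothetical keys: at the DET layer every column has its own key...
DET_KEYS = {"name1": b"det-key-name1", "name2": b"det-key-name2"}
# ...while at the JOIN layer, columns that are joined share one key.
JOIN_KEY = b"join-key-shared"

row = {"name1": "alice", "name2": "alice"}
det_layer = {col: keyed_token(DET_KEYS[col], v) for col, v in row.items()}
join_layer = {col: keyed_token(JOIN_KEY, v) for col, v in row.items()}

print(det_layer["name1"] == det_layer["name2"])    # False: relationship hidden
print(join_layer["name1"] == join_layer["name2"])  # True: name1 = name2 can be evaluated
```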
2. Adjustable query-based encryption
Start out the database with the most secure
encryption scheme.
Adjust encryption dynamically.
Strip off levels of the onions: the proxy gives the
key to the server using a UDF.
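Example
[Figure: table emp with columns rank, name and salary; the columns are stored in onions whose layers are RND (outermost), DET and JOIN around the value, with a SEARCH onion shown as well.]
Application query:
SELECT * FROM emp WHERE salary = 100
The proxy first strips the RND layer from the salary column:
UPDATE table1 SET col3onion1 =
DecryptRND(key, col3onion1)
It then sends the rewritten query:
SELECT * FROM table1 WHERE col3onion1 =
x5a8c34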
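The proxy-side bookkeeping for adjustable encryption can be sketched as a small state machine. It remembers each column's current outer layer and emits an UPDATE with a decryption UDF only when a query needs a weaker layer, so repeated queries cause no further peeling. The UDF names follow the DecryptRND pattern of the example above. DecryptDET is our hypothetical extension of that pattern. The names column_state, issued and ensure_layer are illustrative only.

```python
EQ_ONION = ["RND", "DET", "JOIN"]  # Eq onion, outermost layer first

column_state = {"table1.col3": "RND"}  # hypothetical: columns start at the most secure layer
issued = []                            # statements the proxy sends to the DBMS server

def ensure_layer(column: str, needed: str) -> None:
    """Peel the Eq onion of `column` down to `needed` if it is not already there."""
    current = EQ_ONION.index(column_state[column])
    target = EQ_ONION.index(needed)
    table, col = column.split(".")
    for layer in EQ_ONION[current:target]:
        # One UPDATE per layer removed; the UDF receives that layer's key.
        issued.append(f"UPDATE {table} SET {col}onion1 = Decrypt{layer}(key, {col}onion1)")
    if target > current:
        column_state[column] = needed

# SELECT * FROM emp WHERE salary = 100 needs equality, i.e. the DET layer.
ensure_layer("table1.col3", "DET")
ensure_layer("table1.col3", "DET")  # a second equality query peels nothing more
print(issued)  # ['UPDATE table1 SET col3onion1 = DecryptRND(key, col3onion1)']
```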
FPE (Format-Preserving Encryption)
An encryption algorithm that preserves the format of the
information while it is being encrypted.
FPE is generally considered weaker than standard
Advanced Encryption Standard (AES) encryption, but FPE can
preserve the length of the data as well as its format.
FPE works with existing databases to encrypt data while
keeping it in the same format, so data is encrypted without
harming the function of existing applications.
FPE takes plaintext and converts it to ciphertext of
the same format.
An application can operate on the data as if it were the
plaintext, without revealing the sensitive information
that is encrypted.
Personally Identifiable Information (PII), credit card
information, social security numbers, and other sensitive data
are commonly encrypted with Format-Preserving Encryption.
Using FPE, a 16-digit card number encrypts to a 16-digit
number, for example, and a nine-digit Social Security number
encrypts to a nine-digit number.
This differs from other ciphers that would produce ciphertext
with a different data type and length.
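To show how a 16-digit card number can encrypt to another 16-digit number, here is a toy Feistel network over decimal digits. It loosely follows the alternating-halves structure of FF1. It is not NIST FF1: it has no tweak, no AES-based round function, and a fixed 10 rounds with an HMAC-SHA256 round function of our choosing. It must not be used for real data. The sketch assumes inputs of at least two digits.

```python
import hashlib
import hmac

ROUNDS = 10

def _f(key: bytes, rnd: int, half: int, m: int) -> int:
    # Round function: keyed pseudo-random number reduced to m decimal digits.
    digest = hmac.new(key, f"{rnd}:{half}".encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % (10 ** m)

def _split(digits: str):
    if len(digits) < 2 or not digits.isdigit():
        raise ValueError("expected a string of at least two decimal digits")
    u = len(digits) // 2
    return u, len(digits) - u, int(digits[:u]), int(digits[u:])

def fpe_encrypt(key: bytes, digits: str) -> str:
    u, v, a, b = _split(digits)
    for i in range(ROUNDS):
        m = u if i % 2 == 0 else v           # halves alternate between u and v digits
        a, b = b, (a + _f(key, i, b, m)) % (10 ** m)
    return str(a).zfill(u) + str(b).zfill(v)  # same length, digits only

def fpe_decrypt(key: bytes, digits: str) -> str:
    u, v, a, b = _split(digits)
    for i in reversed(range(ROUNDS)):
        m = u if i % 2 == 0 else v
        a, b = (b - _f(key, i, a, m)) % (10 ** m), a  # undo one forward round
    return str(a).zfill(u) + str(b).zfill(v)

key = b"hypothetical-fpe-key"
card = "4111111111111111"
ct = fpe_encrypt(key, card)
print(len(ct), ct.isdigit())         # 16 True
print(fpe_decrypt(key, ct) == card)  # True
```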
FPE on the Cloud
Three FPE modes of operation, FF1, FF2, and FF3, were proposed to NIST;
FF1 is based on the FFX construction, and NIST SP 800-38G approved FF1 and FF3.
Some cloud service providers (CSPs) offer options to use
FPE within their platforms, but far fewer than traditional encryption vendors do.
Of the three biggest CSPs, Microsoft Azure, Amazon Web
Services (AWS), and Google Cloud Platform (GCP), only
GCP offers users the ability to work with Format-Preserving
Encryption.
Trust
Zero trust model in cloud
Trust in cloud:
Zero trust is a security model used to secure an
organization based on the idea that no person or
device should be trusted by default, even if they are
already inside an organization’s network.
A zero-trust approach aims to remove implicit trust
by enforcing strict identity authentication and
authorization throughout the network, not just at a
trusted perimeter.
In this model, every request to access resources is treated as if it
comes from an untrusted network until it has been inspected,
authenticated, and verified.
Zero trust is a cloud security model designed to secure modern
organizations by removing implicit trust and enforcing strict
identity authentication and authorization.
Under zero trust, every user, device, and component is
considered untrusted at all times, regardless of whether they are
inside or outside of an organization’s network.
Data and resources are inaccessible by default, and connections
are only granted strictly controlled access after they have been
authenticated and authorized.
This process is applied for any user or connected endpoint, and
identity is continuously authenticated. In addition, all network
traffic is logged, monitored, and analyzed closely for any signals
of a compromise.
Three zero-trust principles shape the model
(https://cloud.google.com/learn/what-is-zero-trust):
Assume all network traffic is a threat, at all
times:
Zero trust takes the view that every user is hostile and that threats
are omnipresent, both inside and outside the network. Therefore,
any traffic that does not have explicit permission is automatically
denied access. Every device, user, and network flow is
authenticated, authorized, and validated when requesting access on
an ongoing basis.
Enforce least-privileged access. Zero-trust security
approaches grant least-privilege access, the minimum privileges
and access to the necessary resources when they are needed
without impacting the ability to complete a task.
Least-privilege access helps restrict attackers from moving laterally
to more critical resources if an account or device is compromised.
Always monitor. The zero-trust model advocates
continuous monitoring, analyzing and managing activity on
the network at all times. This enables a real-time understanding
of which entities are trying to access resources and helps
identify potential threats, active incidents, and any anomalies
that should be investigated.
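The three principles can be combined into a single authorization check. Every request is denied unless three conditions hold: the caller is authenticated, the device is compliant, and an explicit least-privilege grant exists. Every decision is logged for monitoring, and network location plays no role. The Request fields, the GRANTS table and the user names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    user: str
    authenticated: bool     # identity verified for this very request
    device_compliant: bool  # e.g. managed and patched device (hypothetical signal)
    resource: str
    action: str

# Hypothetical least-privilege policy: only explicit grants, nothing implied.
GRANTS = {
    ("alice", "payroll-db"): {"read"},
    ("bob", "payroll-db"): {"read", "write"},
}

audit_log = []  # "always monitor": every decision is recorded

def authorize(req: Request) -> bool:
    allowed = (
        req.authenticated
        and req.device_compliant
        and req.action in GRANTS.get((req.user, req.resource), set())  # default deny
    )
    audit_log.append((req.user, req.resource, req.action, "allow" if allowed else "deny"))
    return allowed

print(authorize(Request("alice", True, True, "payroll-db", "read")))   # True
print(authorize(Request("alice", True, True, "payroll-db", "write")))  # False: not granted
print(authorize(Request("bob", True, False, "payroll-db", "write")))   # False: device not compliant
```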
Risk-based authentication in cloud:
A common cause of breaches is password theft. Once a
hacker gets hold of a password, it can be used to access any of
the victim’s online accounts that share it, compromising privacy
and security.
One way to counter this threat is to have a strong
risk-based authentication process in place.
Closely related to multi-factor authentication, it is an access
control method that adds layers of identity verification, asking for
additional factors when a login attempt looks risky, to
ensure only authorized users gain network access.
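Risk-based authentication can be sketched as a score over login signals. The score decides whether to allow the login, step up to another factor, or deny it. The signals, weights and thresholds below are hypothetical illustrations, not values from any particular product.

```python
from dataclasses import dataclass

@dataclass
class LoginAttempt:
    known_device: bool
    usual_country: bool
    recent_failures: int
    impossible_travel: bool  # e.g. two logins from far-apart places within minutes

def risk_score(a: LoginAttempt) -> int:
    # Hypothetical weights; real systems tune these from data.
    score = 0
    if not a.known_device:
        score += 30
    if not a.usual_country:
        score += 30
    score += min(a.recent_failures, 5) * 10
    if a.impossible_travel:
        score += 50
    return score

def decide(a: LoginAttempt) -> str:
    s = risk_score(a)
    if s < 30:
        return "allow"        # password alone is accepted
    if s < 80:
        return "require_mfa"  # step up: ask for another factor
    return "deny"

print(decide(LoginAttempt(known_device=True, usual_country=True,
                          recent_failures=0, impossible_travel=False)))  # allow
print(decide(LoginAttempt(known_device=False, usual_country=True,
                          recent_failures=0, impossible_travel=False)))  # require_mfa
```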
In a typical organization where applications are deployed
within the organization’s perimeter, the “trust boundary” is
mostly static and is monitored and controlled by the IT
department.
In the traditional model, the trust boundary encompasses the
network, systems, and applications hosted in a private data
center managed by the IT department.
Access to the network, systems, and applications is
secured via network security controls including virtual private
networks (VPNs), intrusion detection systems (IDSs),
intrusion prevention systems (IPSs), and multifactor
authentication.
When applications move to the cloud, the organization loses much of this
network control. To compensate for that loss and to
strengthen risk assurance, organizations will be forced to rely
on other higher-level software controls, such as application
security and user access controls.
These controls manifest as strong authentication, authorization
based on role or claims, trusted sources with accurate
attributes, identity federation, single sign-on (SSO), user
activity monitoring, and auditing.