
Eindhoven University of Technology

MASTER

Efficient implementation of homomorphic cryptosystems

Makkes, M.X.

Award date:
2010

Link to publication

Disclaimer
This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student
theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document
as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required
minimum study period may vary in duration.

General rights
Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners
and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
• You may not further distribute the material or use it for any profit-making activity or commercial gain
Efficient Implementation of
Homomorphic Cryptosystems

Marc X. Makkes
Master Thesis in Information Security Technology

Technische Universiteit Eindhoven


Wiskunde en Informatica

Marc Xander Makkes


Supervisor: Prof.dr. T. Lange
Committee members: Prof.dr. S. Etalle
dr.ir. L.A.M. Schoenmakers
Faculteit der Wiskunde en Informatica
Technische Universiteit Eindhoven
Eindhoven
5800 MB Eindhoven

The investigations were partially supported by the EU's Seventh Framework
Programme (FP7), project CACE (Computer Aided Cryptography Engineering)
under contract number ICT-2008-216499.

Copyright © 2010 by Marc X. Makkes

Cover by Hans Kanters ( http://www.hanskanters.com/ )



To my Parents, for always being there for me

Jaap Makkes & Lisselotte Makkes-Bock


Contents

Preface
Outline
Acknowledgments

1 Number Theory
1.1 Introduction
1.2 Mathematical Preliminaries
1.2.1 Congruences
1.2.2 Chinese Remainder Theorem
1.2.3 Orders
1.2.4 Carmichael's λ-function

2 Cryptographic Terminology
2.1 Introduction
2.2 Terminology
2.2.1 Symmetric and Asymmetric Cryptography
2.2.2 Malleable
2.2.3 Random Self-Reducibility
2.3 Bases of Cryptography

3 The Paillier Cryptographic system
3.1 Introduction
3.2 The Paillier Cryptosystem
3.2.1 Composite Residuosity
3.2.2 The Paillier Cryptosystem
3.2.3 Homomorphic Properties
3.3 Optimization
3.3.1 The CRT method
3.3.2 Smart Choices for g
3.4 Security
3.4.1 Recommended Key-Sizes

4 DGK-Crypto System
4.1 Introduction
4.2 DGK cryptosystem
4.2.1 Small Message Space variant
4.2.2 Proper Decryption Variant
4.2.3 Homomorphic Properties
4.3 Security and Key-sizes

5 Simultaneous Multi-Exponentiation
5.1 Introduction
5.2 Pre-computation
5.3 Simultaneous multi exponentiation for non-fixed bases
5.3.1 Binary Left to Right-Method
5.3.2 The 2^k-ary Methods
5.3.3 2^k-ary matrix exponentiation
5.3.4 Simultaneous Sliding Window method
5.3.5 Simultaneous Sliding Window Matrix method
5.3.6 Unsigned Fractional Windows
5.4 Fixed-base exponentiation
5.4.1 Pre-computation of squares of g
5.4.2 Fixed Base Comb method

6 Speedup, Results and Implementation
6.1 Implementation
6.2 Benchmarking
6.2.1 Encryption
6.2.2 Decryption Benchmarks
6.3 Implementation Details
6.4 Side Channel Attack Prevention
6.5 API description
6.6 Comparison

7 Secure Multi-Party Computation
7.1 Introduction
7.2 Verifiable Secret Sharing
7.3 Orlandi Protocol
7.4 Orlandi Implementation
7.4.1 Optimization of TripleGen()
7.4.2 Benchmarks
7.5 Conclusion and Future work

A Appendix
A.1 Unbalanced Benchmarks Fixed

Bibliography

Index

Abstract

Preface

"Mathematics is concerned only with the enumeration and comparison of relations."

– C.F. Gauß

The field of cryptology is one of the most intriguing subjects to study. Today's
world is filled with secrets that we do not want to share with each other,
such as PINs, credit card numbers, and social identification numbers. One of the
goals of cryptography is to hide information by encoding or obscuring it, such
that when it travels over public channels like telephone lines or the internet,
only the receiving party or parties can decode or de-obfuscate the data to read
the original message.

Outline
Chapter 1 gives an informal introduction to number theory. Chapter 2 gives an
introduction to basic cryptographic terminology, as far as it is necessary to follow
the remainder of the text. Chapter 3 is concerned entirely with the description
of the Paillier asymmetric homomorphic cryptosystem, including in-depth
information on the encryption and decryption process, parameter estimation, and
algebraic optimization. In Chapter 4 we present the DGK cryptosystem and
explain it in detail. Chapter 5 is the main part of this thesis, as it discusses different
algorithms for simultaneous multi-exponentiation, which is the core operation of both
the Paillier and the DGK cryptosystems. Chapter 6 discusses the implementation
of both the Paillier and DGK cryptosystems together with all the algorithms of
Chapter 5. In addition, the chapter shows the performance of every simultaneous
multi-exponentiation algorithm in the context of the Paillier and DGK cryptosystems,
as well as the implementation of countermeasures against simple
side-channel attacks. Finally, Chapter 7 presents collaborative work on the
implementation and modification of a secure multiparty computation protocol by
Orlandi, which relies on the homomorphic property of the Paillier cryptosystem.
Partial results of Chapters 6 and 7 were published in [40] and the extended version
in [41].


Acknowledgments
First and foremost I would like to express my gratitude to Tanja Lange, who
introduced me to the problem of optimization of cryptographic protocols and
has been an infinite source of advice. I thank her for giving me a glimpse of
research, for the many enjoyable discussions, and for giving me the opportunity
to attend ACNS 2010 in Beijing and the SPEED-CC congress in Berlin, and to join the
EIDMA/DIAMANT Cryptography Working Group meetings in Utrecht. Many
thanks also to Daniel Bernstein for joining the discussions and bringing advice on
the subject matter and joyful anecdotes. For advice and for interesting, useful and
often fun discussions I would like to thank Peter Schwabe.
In addition I would like to thank Janus Dam Nielsen and Thomas P. Jakobsen
for great cooperation, for testing the Paillier and DGK code, and for the collaborative
work on the ACNS paper. For some pointers and suggestions I would
like to thank Ivan Damgård and Tomas Toft.
Thanks to Peter Schwabe, Sebastiaan de Hoogh and Ilse Groot for
proofreading this thesis. All remaining errors are mine.
Also, I would like to thank Hans Kanters for letting me use one of his beautiful
paintings for my front cover.
I would also like to thank Sandro Etalle and Berry Schoenmakers for being on
the committee.
Furthermore, I want to thank Peter Fonts and Nicole Makkes, who advised
me to enter university in the first place. Finally, I would like to thank my
family and friends for being who they are.
Amsterdam Marc X. Makkes
June, 2010.
Chapter 1
Number Theory

1.1 Introduction
In this chapter we state definitions and simple properties of the algebraic
structures we shall use constantly in the remainder of this thesis. Readers who are
unfamiliar with number theory are referred to the book [73] for a friendly
introduction.

1.2 Mathematical Preliminaries

1.2.1 Congruences
For a positive integer n, two integers a and b are said to be congruent modulo n,
written:
a ≡ b (mod n)

if their difference a − b is an integer multiple of n. The number n is called the
modulus of the congruence. This is equivalent to the notion that both numbers
a and b have the same remainder when divided by n.

1.2.2 Chinese Remainder Theorem

1.2.1. Theorem. Suppose $m_1, \ldots, m_n$ are pairwise co-prime positive
integers and let $a_1, \ldots, a_n$ be integers. Then there exists an integer x
that solves the system of simultaneous congruences

$$x \equiv a_1 \pmod{m_1} \tag{1.1}$$
$$x \equiv a_2 \pmod{m_2} \tag{1.2}$$
$$\vdots \tag{1.3}$$
$$x \equiv a_n \pmod{m_n} \tag{1.4}$$

Furthermore, all solutions x to this system are congruent modulo the product
$M = m_1 \cdots m_n$. Writing $M_i = M/m_i$, there is for each i an integer
$y_i$, unique modulo $m_i$, with $y_i M_i \equiv 1 \pmod{m_i}$, such that

$$x \equiv \sum_{i=1}^{n} a_i y_i M_i \pmod{M}$$

This is known as the Chinese Remainder Theorem (CRT) [29].
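
As an illustration, the construction in the theorem can be sketched in Python. This is a minimal sketch and not code from the thesis: the function name `crt` is hypothetical, `M_i` stands for the product of all moduli except the i-th, `y_i` is the inverse of `M_i` modulo the i-th modulus, and the modular inverse relies on the three-argument `pow` of Python 3.8 or later.

```python
from math import prod

def crt(residues, moduli):
    """Solve x = a_i (mod m_i) for pairwise co-prime moduli; return x mod M."""
    M = prod(moduli)
    x = 0
    for a_i, m_i in zip(residues, moduli):
        M_i = M // m_i             # product of all the other moduli
        y_i = pow(M_i, -1, m_i)    # inverse of M_i modulo m_i (Python 3.8+)
        x += a_i * y_i * M_i
    return x % M

# x = 2 (mod 3), x = 3 (mod 5), x = 2 (mod 7)
print(crt([2, 3, 2], [3, 5, 7]))   # 23
```

Each summand is congruent to `a_i` modulo its own modulus and to 0 modulo all the others, which is why the sum solves every congruence at once.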

1.2.3 Orders
In this part we assume that the reader has a basic understanding of algebraic
structures such as groups, rings and fields. Orders of multiplicative groups and
of their elements play an important role in cryptography.

Fermat’s Little Theorem


Fermat's little theorem states that if p is a prime number, then for any integer a,
$a^p \equiv a \pmod{p}$, and $a^{p-1} \equiv 1 \pmod{p}$ if and only if $\gcd(a, p) = 1$.

Euler φ-function
This is also known as Euler's totient function. The totient φ(n) of a positive
integer n is defined to be the number of positive integers less than or equal to n
that are co-prime to n. We can distinguish three cases.

1. Prime numbers: all positive integers less than the prime number p are
co-prime to p, so $\phi(p) = p - 1$.

2. Powers of primes: $\phi(p^k) = (p - 1)p^{k-1}$.

3. Composite integers: for $n = p_1^{k_1} \cdots p_r^{k_r}$ with distinct primes $p_i$,
$\phi(n) = (p_1 - 1)p_1^{k_1 - 1} \cdots (p_r - 1)p_r^{k_r - 1}$.

This function is very important in the field of number theory, as the value of
the function is also the size of the multiplicative group of integers modulo n.
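
The three cases can be checked against the definition with a short Python sketch. The helper names `phi_by_counting` and `phi_from_factors` are hypothetical, and the factorization is supplied by hand as a dictionary, which is an assumption made to keep the example small.

```python
from math import gcd

def phi_by_counting(n):
    """Totient straight from the definition: count 1 <= k <= n co-prime to n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def phi_from_factors(factors):
    """Totient from a factorization given as {prime: exponent}."""
    result = 1
    for p, k in factors.items():
        result *= (p - 1) * p ** (k - 1)
    return result

print(phi_by_counting(7), phi_from_factors({7: 1}))                 # 6 6
print(phi_by_counting(27), phi_from_factors({3: 3}))                # 18 18
print(phi_by_counting(360), phi_from_factors({2: 3, 3: 2, 5: 1}))   # 96 96
```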
Next, we introduce the order of group elements and its properties. Let G be a
group that is written multiplicatively with neutral element 1.


1.2.2. Definition. Let g ∈ G. If there is a positive integer e with $g^e = 1$, then
the smallest such integer is called the order of g in G. If
such an e does not exist we say that the order of g in G is infinite. The order of
g ∈ G is denoted by $\mathrm{ord}_G(g)$.

1.2.3. Theorem. Let g ∈ G and e ∈ Z. Then $g^e = 1$ if and only if e is divisible
by the order of g in G.

Proof: Let $n = \mathrm{ord}_G(g)$ and write e = kn, then

$$g^e = g^{kn} = (g^n)^k = 1^k = 1$$

Vice versa, let $g^e = 1$ and e = qn + r with 0 ≤ r < n. Then

$$g^r = g^{e - qn} = g^e (g^n)^{-q} = 1.$$

Because n is the least positive integer with $g^n = 1$, and since 0 ≤ r < n, we
have r = 0 and therefore e = qn. Hence, n is a divisor of e, as asserted. □

1.2.4. Theorem. Let g ∈ G and let k, l be integers. Then $g^l = g^k$ if and only if
$l \equiv k \pmod{\mathrm{ord}_G(g)}$.

Proof: We can set e = l − k and apply Theorem 1.2.3. □

1.2.5. Theorem. If g ∈ G is of finite order e and if n is an integer, then

$$\mathrm{ord}_G(g^n) = e / \gcd(e, n)$$

Proof: We have

$$(g^n)^{e/\gcd(e,n)} = (g^e)^{n/\gcd(e,n)} = 1$$

So Theorem 1.2.4 implies that e/gcd(e, n) is a multiple of the order of $g^n$.
Suppose

$$1 = (g^n)^k = g^{nk}.$$

Then Theorem 1.2.4 implies that e is a divisor of nk. Therefore e/gcd(e, n) is a
divisor of k, which implies that our assertion holds. □
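
Theorem 1.2.5 can be checked numerically in a small group. The sketch below works in the multiplicative group modulo the prime 31 with g = 3; both values, and the helper name `order`, are illustrative choices and not taken from the thesis. The order is found by naive repeated multiplication, which is only sensible for toy sizes.

```python
from math import gcd

def order(g, n):
    """Order of g in the multiplicative group modulo n (naive search)."""
    if gcd(g, n) != 1:
        raise ValueError("g must be invertible modulo n")
    e, x = 1, g % n
    while x != 1:
        x = x * g % n
        e += 1
    return e

p, g = 31, 3
e = order(g, p)
print(e)   # 30, so 3 generates the whole group modulo 31

# ord(g^n) = e / gcd(e, n) for every exponent n we try
for n in range(1, 61):
    assert order(pow(g, n, p), p) == e // gcd(e, n)
```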

1.2.6. Theorem. Finding a generator of $\mathbb{Z}_p^*$. If the factorization of p − 1 is
unknown, no efficient algorithm is known, but if p − 1 has known factorization,
it is easy to find a generator. Generators of $\mathbb{Z}_p^*$ are relatively common: there
are φ(p − 1) of them, and φ(n) ≥ n/(6 ln ln n) for n ≥ 5. So one can be found by
searching at random for an element g whose order is p − 1. (Note that g has
order p − 1 if $g^{p-1} \equiv 1 \pmod{p}$ but $g^{(p-1)/q} \not\equiv 1 \pmod{p}$ for all
prime divisors q of p − 1.)
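
The random search described above can be sketched as follows. The names `is_generator` and `find_generator` are hypothetical, the prime p = 31 with p − 1 = 2 · 3 · 5 is a toy choice, and the prime divisors of p − 1 are assumed to be known and passed in by the caller.

```python
import random

def is_generator(g, p, prime_divisors):
    """g generates the group modulo p iff g^((p-1)/q) != 1 for each prime q | p-1."""
    return all(pow(g, (p - 1) // q, p) != 1 for q in prime_divisors)

def find_generator(p, prime_divisors, rng=random):
    """Sample random candidates until one has order p - 1."""
    while True:
        g = rng.randrange(2, p)
        if is_generator(g, p, prime_divisors):
            return g

p, divisors = 31, [2, 3, 5]
print(is_generator(3, p, divisors))   # True
print(is_generator(2, p, divisors))   # False, since 2^5 = 32 = 1 (mod 31)
g = find_generator(p, divisors)
```

Because a fraction φ(p − 1)/(p − 1) of all candidates are generators, the loop is expected to finish after a small number of samples.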


1.2.4 Carmichael's λ-function

Carmichael defined the function λ(N) as the smallest positive integer such that
$a^{\lambda(N)} \equiv 1 \pmod{N}$ for all a relatively prime to N. If N is a product of
two distinct primes p and q, then λ(N) = lcm(p − 1, q − 1), and for all
$a \in (\mathbb{Z}/N^2\mathbb{Z})^*$ the following relations hold:

$$a^{\lambda(N)} \equiv a^{\mathrm{lcm}(p-1,\,q-1)} \equiv 1 \pmod{N} \tag{1.5}$$

$$a^{N \cdot \mathrm{lcm}(p-1,\,q-1)} \equiv 1 \pmod{N^2} \tag{1.6}$$

Let p and q be two distinct primes and let N = pq.

1. $\phi(N) = (p - 1)(q - 1) = |\mathbb{Z}_N^*|$

2. $|\mathbb{Z}_{N^2}^*| = \phi(N^2) = N\phi(N)$
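
Relations (1.5) and (1.6) and the two group sizes can be confirmed by brute force for small primes. The function name `check_carmichael_relations` is hypothetical and the exhaustive loops are only feasible for toy parameters.

```python
from math import gcd

def check_carmichael_relations(p, q):
    """Brute-force check of (1.5), (1.6) and both group sizes for N = p*q."""
    N = p * q
    N2 = N * N
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    phi = (p - 1) * (q - 1)
    units_N = [a for a in range(1, N) if gcd(a, N) == 1]
    units_N2 = [a for a in range(1, N2) if gcd(a, N) == 1]
    return (
        len(units_N) == phi
        and len(units_N2) == N * phi
        and all(pow(a, lam, N) == 1 for a in units_N)
        and all(pow(a, N * lam, N2) == 1 for a in units_N2)
    )

print(check_carmichael_relations(5, 7))   # True
```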

1.2.7. Definition. An element $y \in \mathbb{Z}_{N^2}^*$ is said to be an N-th residue if there
exists an element $x \in \mathbb{Z}_{N^2}^*$ such that

$$y \equiv x^N \pmod{N^2}$$

1.2.8. Theorem. Every N-th residue has exactly N different N-th roots in $\mathbb{Z}_{N^2}^*$.

Proof: In every finite cyclic group G, a solvable equation $x^d = a$ has gcd(d, ord(G))
different solutions. Since the group $\mathbb{Z}_{N^2}^*$ is not a cyclic group we cannot apply
this property directly, but it can be applied to $\mathbb{Z}_{p^2}^*$ and $\mathbb{Z}_{q^2}^*$, which are cyclic
of orders $\phi(p^2) = p(p-1)$ and $\phi(q^2) = q(q-1)$, respectively. So from the
equation $y \equiv x^N \pmod{N^2}$, consider the equations

$$y_p \equiv x^N \pmod{p^2} \tag{1.8}$$
and
$$y_q \equiv x^N \pmod{q^2} \tag{1.9}$$
where $y_p$ and $y_q$ denote y reduced modulo $p^2$ and $q^2$. Assuming
gcd(N, φ(N)) = 1, equation (1.8) has gcd(N, p(p − 1)) = p different solutions and
equation (1.9) has gcd(N, q(q − 1)) = q different solutions. Using Theorem 1.2.1,
these solutions can be combined to pq = N different solutions modulo $N^2$. □
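
For a toy modulus the statement of Theorem 1.2.8 can be verified exhaustively. The sketch below uses p = 3 and q = 5, an illustrative choice for which gcd(N, φ(N)) = 1 holds; the function name `nth_root_counts` is hypothetical.

```python
from collections import Counter
from math import gcd

def nth_root_counts(p, q):
    """Map each N-th residue modulo N^2 to the number of its N-th roots."""
    N = p * q
    N2 = N * N
    units = [x for x in range(1, N2) if gcd(x, N) == 1]
    return Counter(pow(x, N, N2) for x in units)

counts = nth_root_counts(3, 5)
print(len(counts))            # 8 residues, i.e. phi(N) of them
print(set(counts.values()))   # {15}: every residue has exactly N roots
```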

1.2.9. Theorem. The set

$$T = \{(1 + xN) \bmod N^2 : x \in \mathbb{Z}_N^*\}$$

is a subset of $\mathbb{Z}_{N^2}^*$ of cardinality φ(N). Every element in T has order N.

Proof: The map $x \mapsto (1 + xN) \bmod N^2$ is injective on $\mathbb{Z}_N^*$, so T and
$\mathbb{Z}_N^*$ have the same cardinality, |T| = φ(N). Note that every element
(1 + yN) ∈ T satisfies the following relation for all integers z ≥ 0:

$$(1 + yN)^z \equiv 1 + yzN \pmod{N^2}$$

This implies that $(1 + yN)^N \equiv 1 \pmod{N^2}$ for all (1 + yN) ∈ T, so the
order of every element in T has to be a divisor of N. Apart from 1 and N itself, the
only divisors of N are the two primes p and q. Since by definition of T all
elements (1 + yN) ∈ T satisfy gcd(y, N) = 1, the congruence
$1 + yzN \equiv 1 \pmod{N^2}$ holds only if N divides z. Hence the smallest positive z
satisfying $(1 + yN)^z \equiv 1 \pmod{N^2}$ is z = N. □
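
Both the power rule and the order claim can be checked by brute force. This is a sketch with hypothetical helper names `order` and `check_T`, using small primes so that the naive order computation stays cheap.

```python
from math import gcd

def order(a, m):
    """Order of a in the multiplicative group modulo m (naive search)."""
    e, x = 1, a % m
    while x != 1:
        x = x * a % m
        e += 1
    return e

def check_T(p, q):
    """Check (1 + yN)^z = 1 + yzN (mod N^2) and ord(1 + yN) = N for gcd(y, N) = 1."""
    N = p * q
    N2 = N * N
    for y in range(1, N):
        if gcd(y, N) != 1:
            continue
        t = (1 + y * N) % N2
        if any(pow(t, z, N2) != (1 + y * z * N) % N2 for z in range(2 * N)):
            return False
        if order(t, N2) != N:
            return False
    return True

print(check_T(5, 7))   # True
```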

1.2.10. Theorem. Let y be an N-th residue modulo $N^2$ and let $w \in \mathbb{Z}_{N^2}^*$ be such that
$w^N \equiv y \pmod{N^2}$. Consider the following set

$$R = \{(1 + xN)w \bmod N^2 : x \in \mathbb{Z}_N\}$$

Then there exists exactly one element in R that is smaller than N.

Proof: Write w = a + bN with 0 < a < N and 0 ≤ b < N. An element of R can be
written as

$$(a + bN)(1 + xN) \equiv a + (ax + b)N \pmod{N^2}$$

Since w ≡ a (mod N) is an invertible element of $\mathbb{Z}_N$, we can choose x as
follows:

$$x \equiv -a^{-1} b \pmod{N} \tag{1.10}$$

and obtain

$$a + (ax + b)N \equiv a + (-a^{-1}ba + b)N \equiv a \pmod{N^2}$$

The uniqueness comes from the fact that equation (1.10) has a unique solution. □
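
The choice of x in equation (1.10) translates directly into code. The following sketch uses the hypothetical function name `small_element` and the toy values N = 35 and w = 123; it returns x together with the resulting element of R, which equals a = w mod N.

```python
def small_element(w, N):
    """Return (x, r) with r = (1 + x*N) * w mod N^2 and r < N."""
    N2 = N * N
    w %= N2
    a, b = w % N, w // N            # w = a + b*N with 0 <= a, b < N
    x = (-pow(a, -1, N) * b) % N    # equation (1.10); needs Python 3.8+
    return x, (1 + x * N) * w % N2

N, w = 35, 123
x, r = small_element(w, N)
print(x, r)   # 29 18

# r is an N-th root of the same residue as w, and it is the only small one in R
assert pow(r, N, N * N) == pow(w, N, N * N)
assert sum(1 for t in range(N) if (1 + t * N) * w % (N * N) < N) == 1
```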

Chapter 2
Cryptographic Terminology

2.1 Introduction
The word cryptography comes from two Greek words: κρυπτός, kryptos (hidden),
and γράφειν, graphein (to write), and has a history going back more than
4000 years. Today, cryptography is the art and science of making communications
unintelligible to all except the intended recipients. The bases of public key
cryptographic systems are found in number theory. In this chapter we describe
the basic terminology of cryptography, which is needed to understand
the remainder of this thesis. Advanced readers who are already familiar with
basic cryptographic terminology are advised to skip this chapter and move on to
Chapter 3.

2.2 Terminology
In the field of cryptography many situations are described where parties
want to communicate with each other. The basic action is that a party wants
to send a message to another party, revealing neither the contents of the message
nor information about the contents while the message is in transit.
Cryptography is used to obscure the sent message, with the following definitions:

• The set M is known as the message space or plaintext space; all elements
in this set are considered to be messages.

• The set C is the ciphertext space; all elements in this set are called
ciphertexts.

• The key space K is the set of all possible keys; an element k from this space is
known as a key.

• The encryption function E(k, m) is a function which maps an element m
from the message space M to an element c in the ciphertext space C for a given
key k ∈ K.

• The decryption function D(k, c) is a function which maps an element
c = E(k, m) in the ciphertext space C to an element in the message space M for
a given k ∈ K.

2.2.1 Symmetric and Asymmetric Cryptography


There are two basic classes of cryptographic systems: symmetric and
asymmetric. The main difference is that a symmetric cryptosystem uses the same
key for encrypting and decrypting, while an asymmetric cryptosystem uses two
different keys for encrypting and decrypting. So for a symmetric cryptosystem,
encrypting and decrypting a message m ∈ M under key k ∈ K satisfies
$D_k(E_k(m)) = m$.
There are many well-known symmetric cryptosystems, such as AES [24],
Blowfish [69] and RC5 [67]. They are widely used and can be found in machines
that we use in our everyday lives; for example, we find these ciphers in public
transport chipcards and mobile phones. A symmetric cryptosystem is ideal for
encrypting and decrypting long messages because the message space is vast and
the encryption speed is high compared to asymmetric cryptosystems.
Symmetric cryptosystems require that the key k, which is distributed to all
communicating parties, remains secret, and therefore rely on every communicating
party keeping the key k secret. Trusting other parties to keep the key k secret
can be a weak spot, which asymmetric cryptography addresses.
In asymmetric cryptography the key consists of two parts, a public part
$k_{pub} \in K$ and a private part $k_{priv} \in K$, where $k_{priv}$ is the inverse of $k_{pub}$.
This means that the transformation that is done with $k_{pub}$ can only be undone
with $k_{priv}$. These keys are known as the public key and the private key of the
asymmetric cryptosystem.
The public part is distributed amongst communicating peers and the
private part is kept to oneself. The party receiving an encrypted message
can then decrypt it as follows: $D_{k_{priv}}(E_{k_{pub}}(m)) = m$. Some cryptosystems use
special properties of elements as their keys, as in Chapters 3 and 4.
The advantage of asymmetric cryptography with respect to symmetric
cryptography is that one can distribute keys relatively easily, and often only one key
is needed for communicating with many peers, as the public key can be disclosed
to the public. The disadvantage is that it is practical only for short messages, as
the computational cost per bit is relatively high.
In many implementations symmetric and asymmetric cryptography are used
together to set up and use a secure communication channel between two parties.
Asymmetric cryptography is used to set up a key, which is then used by symmetric
cryptography to build the secure channel.
The focus of this thesis is only on asymmetric cryptography; readers
who are interested in the subject of symmetric cryptography are referred to
Chapters 2 through 7 of [75].

Probabilistic Encryption
We call a cryptosystem deterministic if every encryption of the same message m
under the same key k results in the same ciphertext c. A deterministic
cryptosystem leaks information if the same message is sent
twice using the same key. An example of such an encryption system is the RSA
system, where the message m is raised to a special exponent e, i.e. the ciphertext
c becomes $c \equiv m^e \pmod{N}$, and because e is fixed, encrypting the same message
m twice will always result in the same ciphertext c.
A non-deterministic or probabilistic cryptosystem is a
system where the encryption of a message results in a different ciphertext
each time the same message is encrypted. Decryption, on the other hand, will
always result in the same message m that was encrypted. The non-determinism
comes from the fact that a random number (a nonce) enters the
encryption function. When decrypting, the nonce is removed thanks to some
properties of the chosen values, as we see later on in this thesis.
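
The contrast can be made concrete with toy parameters. The sketch below is an illustration only, not the implementation of this thesis: it pairs textbook RSA with a toy instance of the Paillier scheme that is described in Chapter 3, under the simplifying assumption that the generator is g = 1 + N. All function names are hypothetical and the primes 61 and 53 are far too small for real use. The nonce r disappears during decryption because $r^{N\lambda(N)} \equiv 1 \pmod{N^2}$, which is relation (1.6).

```python
import math

p, q = 61, 53
N, N2 = p * q, (p * q) ** 2
e = 17                                                # RSA public exponent
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)     # lcm(p-1, q-1)
mu = pow(lam, -1, N)                                  # Python 3.8+

def rsa_encrypt(m):
    """Textbook RSA: no randomness, so equal messages give equal ciphertexts."""
    return pow(m, e, N)

def paillier_encrypt(m, r):
    """Toy Paillier with g = 1 + N; r is the nonce, co-prime to N."""
    return pow(1 + N, m, N2) * pow(r, N, N2) % N2

def paillier_decrypt(c):
    u = pow(c, lam, N2)          # the nonce vanishes here, see relation (1.6)
    return (u - 1) // N * mu % N

assert rsa_encrypt(42) == rsa_encrypt(42)             # deterministic
c1, c2 = paillier_encrypt(42, 2), paillier_encrypt(42, 3)
assert c1 != c2                                       # different nonces, different ciphertexts
assert paillier_decrypt(c1) == paillier_decrypt(c2) == 42
```

In a real system the nonce would be drawn from a cryptographically secure random source rather than passed in by hand.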

Computationally Secure
We call a cryptosystem computationally secure if attacking
the cryptographic system is infeasible in practice. In this security model, we do
not grant an adversary unlimited computational resources. Instead, we are
concerned with the amount of computation required to break the security of a
system. We say that a system is computationally secure if the level of computation
necessary to defeat it exceeds the computational resources of any hypothetical
adversary by a comfortable margin. The adversary is thereby allowed to use the
best known attacks against the system. Closely related are the concepts of
complexity-theoretic security and provable security.
In this thesis, we will use the term computational security to include both
notions. In complexity-theoretic security, the adversary is modelled as having only
polynomial computational power. This means that any attack uses time and
space polynomial in the size of the underlying security parameters of the system.
In the setting of provable security, defeating the system's security
is proven to be as difficult as solving a well-known problem which is thought to
be hard. Note that this does not prove the protocol to be unconditionally secure,
but only makes a statement of equivalence between the security of the protocol
and a hard-to-compute problem. In practice, these are often number-theoretic
problems such as factoring and the discrete logarithm problem.

Semantically Secure
A cryptosystem is called semantically secure if it is infeasible for a computationally
bounded adversary to derive significant information about a plaintext when
given only its ciphertext and the corresponding public encryption key. Semantic
security considers only a "passive" adversary, i.e. an adversary who only
collects ciphertexts and network traffic.

Chosen Plaintext Attack


The Chosen Plaintext Attack (CPA) is an attack on symmetric cryptosystems in
which the adversary can choose arbitrary plaintexts to be encrypted
and obtains the corresponding ciphertexts, in order to gain information about
the secret key; this reduces the security of the symmetric cryptosystem. There
are two types of chosen-plaintext attacks:

• A non-adaptive chosen-plaintext attack is one where the adversary chooses all
the plaintexts beforehand, before any plaintext is encrypted. This is also
known as a "batched" chosen-plaintext attack.

• An adaptive chosen-plaintext attack is one where the adversary makes a series
of interactive queries; subsequent choices of new plaintexts are based on the
information gathered from the previous encryptions, in order to gain
information about the secret key.

Chosen Ciphertext Attack


A Chosen Ciphertext Attack (CCA), as opposed to a chosen plaintext attack, is an
attack model in which the adversary gathers information by choosing a ciphertext
and obtaining its decryption under an unknown secret key. There are two distinct
types of chosen ciphertext attacks:

• CCA1: the "lunchtime attack" or "non-adaptive" chosen ciphertext attack.
This notion was first introduced by Cramer and Shoup in [23].
The adversary may make adaptive chosen-ciphertext queries only up to
a certain point. After the attack, the adversary must demonstrate some
improved ability to attack the system.

• CCA2: the adaptive chosen ciphertext attack [8]. It extends CCA1: the
adversary is able to choose ciphertexts dynamically, and to alter his or her
choices based on the results of previous decryptions, also after the challenge
ciphertext has been received. The only limitation of the adversary is that he
or she may not query the challenge ciphertext itself.


2.2.2 Malleable
An encryption algorithm is malleable if it is possible for an adversary to transform
a ciphertext into another valid ciphertext which decrypts to another plaintext.
For example, if an adversary intercepts a ciphertext c, he or she is able to
transform c by means of some function f(c) without necessarily knowing or learning
the encrypted message. The receiver is still able to decrypt the altered message,
without knowing that it has been altered.
A cryptosystem may be semantically secure against chosen plaintext attacks or
even non-adaptive chosen ciphertext attacks (CCA1) while still being malleable.
However, security against CCA2 is equivalent to non-malleability.
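
A standard illustration of malleability is textbook RSA, where multiplying a ciphertext by the encryption of 2 doubles the hidden plaintext. The sketch below uses toy parameters and hypothetical variable names; the function f from the text is here multiplication by $2^e$ modulo N.

```python
p, q = 61, 53
N = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent, Python 3.8+

def encrypt(m):
    return pow(m, e, N)

def decrypt(c):
    return pow(c, d, N)

m = 65
c = encrypt(m)

# The adversary never sees m, yet produces a valid ciphertext of 2*m.
c_forged = c * pow(2, e, N) % N
print(decrypt(c_forged))   # 130
```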

2.2.3 Random Self-Reducibility


A problem is said to be randomly self-reducible if solving it in the average case
is as hard as solving it in the worst case. The field of cryptography
utilizes the fact that certain number-theoretic functions are randomly self-reducible
to create cryptographic systems. An example of such a randomly self-reducible
problem is the discrete logarithm problem, i.e. finding x given g and b with
$g^x \equiv b \pmod{p}$. Thus, random self-reducibility provides assurance that an
average or random instance is hard to solve.
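
For the discrete logarithm the reduction is short enough to write down. In the sketch below a specific instance b is blinded into a uniformly random instance $b \cdot g^r$, handed to a solver, and the answer is corrected by subtracting r. The brute-force function `dlog_bruteforce` merely stands in for an algorithm that works on random instances; it and the parameters p = 31, g = 3 are assumptions made for illustration.

```python
import random

p, g = 31, 3   # toy group: 3 generates the multiplicative group modulo 31

def dlog_bruteforce(b):
    """Stand-in for a solver that handles random instances."""
    x, acc = 0, 1
    while acc != b % p:
        acc = acc * g % p
        x += 1
    return x

def dlog_via_random_instance(b, solver, rng=random):
    """Solve a specific instance by asking the solver about a random one."""
    r = rng.randrange(p - 1)
    b_random = b * pow(g, r, p) % p      # uniformly distributed in the group
    x_random = solver(b_random)          # log of b_random is x + r (mod p-1)
    return (x_random - r) % (p - 1)

b = pow(g, 17, p)
print(dlog_via_random_instance(b, dlog_bruteforce))   # 17
```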

2.3 Bases of Cryptography


In 1877, Jevons published the book [42] titled "The principles of science: a treatise
on logic and scientific method", in which he observed a trapdoor function: a
situation where the "direct" operation is relatively easy, but the "inverse" operation
is significantly more difficult without knowing special information. Later this
became the basic property of every public key cryptosystem. The findings of Jevons
were specifically targeted at the factorization problem, which many decades later
is found in the well-known and established RSA [68] system.

2.3.1. Theorem. Integer Factorization Problem. This is the problem of finding
the prime factorization of a positive integer n. We write
$n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$, where the $p_i$ are pairwise distinct
primes and each $e_i \geq 1$.

A year before the release of the RSA cryptosystem, Diffie and Hellman came
up with a scheme for exchanging keys based on public and private parameters.
This system is known as the Diffie-Hellman key exchange [28], which relies on the
Discrete Logarithm Problem (DLP).

2.3.2. Theorem. Discrete Logarithm Problem (DLP). Given a prime p,
a generator g of $\mathbb{Z}_p^*$ and an element a of $\mathbb{Z}_p^*$, find the integer x with
0 ≤ x ≤ p − 2 such that $g^x \equiv a \pmod{p}$. No polynomial-time algorithm
for this problem is known.
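
The Diffie-Hellman key exchange mentioned above can be sketched in a few lines. The parameters p = 31 and g = 3 are toy values, the name `keypair` is hypothetical, and Python's `random` module is used only for brevity; a real implementation would use a large prime and a cryptographically secure random source.

```python
import random

p, g = 31, 3   # toy public parameters; g generates the group modulo p

def keypair(rng=random):
    """Return (private exponent x, public value g^x mod p)."""
    x = rng.randrange(1, p - 1)
    return x, pow(g, x, p)

a, A = keypair()   # first party
b, B = keypair()   # second party

# Each side combines its own private exponent with the other's public value.
shared_1 = pow(B, a, p)
shared_2 = pow(A, b, p)
assert shared_1 == shared_2   # both equal g^(a*b) mod p
```

An eavesdropper sees only p, g, A and B; recovering a from A is an instance of the DLP.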


Seven years after the release of the Diffie-Hellman key exchange, Elgamal
presented a cryptosystem [33] based on principles of the Diffie-Hellman key
exchange.
The main difference between schemes based on the integer factorization problem
and schemes based on the discrete logarithm problem can be described as follows.
Schemes based on integer factorization related assumptions take advantage of
the fact that integer factorization gives a conjectured trapdoor function, with the
underlying idea that it is computationally intractable to extract polynomial roots
over finite fields when the trapdoor information (i.e. the secret key) is unavailable.
The discrete logarithm, by contrast, gives a one-way function, and such schemes
rely on the fact that it is infeasible to calculate the discrete log in a finite field
setting. Most of these cryptographic systems use a Diffie-Hellman [28] variant to
securely encrypt and decrypt information. The discrete logarithm schemes can take
advantage of the homomorphic properties of the exponentiation function.
Many cryptographic systems today can be categorized as being based on the
integer factorization problem or the discrete logarithm problem. There are of
course cryptographic systems proposed that are based on other problems, but many
of these systems are broken. Examples are the knapsack-type schemes [53], which
were subsequently broken by [70], and the lattice-based cryptographic system [35],
broken by [56].
In addition there are other cryptographic systems that have not been
embraced by the research community and the industry and need more attention
from the research community. Among those are the McEliece cryptographic system
[52], which is based on Goppa codes, and cryptosystems [6] that are based on braid
groups. In the end, only systems that are based on the integer factorization problem
or the discrete logarithm problem are "trusted", and only these systems are widely
deployed and used in our everyday lives.

Chapter 3
The Paillier Cryptographic system

This chapter describes in depth the public key cryptosystem presented by Paillier
at Eurocrypt ’99 [60].

3.1 Introduction
In recent years a new direction of research has emerged that aims to find
cryptographic trapdoor functions with homomorphic properties. These developments
were known as trapdoor discrete logs and arose from the algebraic setting of
high-degree residuosity classes. They first came to light in the
Goldwasser-Micali [36] scheme, where the message space $M$ is a ring of modular
residues and ciphertexts are in the multiplicative group $G$ of invertible
elements of some particular ring of integers modulo a hard-to-factor number. The
encryption of a message $m$ is always a group element of the form
$E(m, r) = g^m r^e \in G$, where $e$ is some public integer, $g$ is a fixed
public element in $G$ and $r$ is chosen at random in some particular
multiplicative subgroup $R$ of $G$. Since $R$ is a subgroup, such schemes have
the additive homomorphic property, i.e. an encryption of $m_1 + m_2$ can be
obtained as $E(m_1 + m_2, r_1 r_2) = E(m_1, r_1) E(m_2, r_2)$.
Goldwasser and Micali based their scheme on quadratic residues and selected
$M = \mathbb{Z}_2$, $G = R = \mathbb{Z}^*_N$ where $N = pq$ is an RSA modulus,
$e = 2$ and the base $g$ is a pseudo-square modulo $N$. The semantic security
follows from the quadratic residuosity assumption.

3.1.1. Definition. Quadratic Residuosity Assumption: Let $N$ be a product of
two large primes $p$ and $q$. If $y \equiv x^2 \pmod{N}$ has a solution, i.e.
there exists a square root of $y$, then $y$ is called a quadratic residue modulo
$N$. Let $QR_N$ denote the set of all quadratic residues in $[1, N - 1]$. Then
the quadratic residuosity assumption can be described as follows. Suppose
$y \in \mathbb{Z}^*_N$ has Jacobi symbol $+1$. It is computationally infeasible
to decide whether $y \in QR_N$ without knowledge of $p$ and $q$, due to the
difficulty of factoring $N$.


The small message space, i.e. $\mathbb{Z}_2$, limits the bandwidth of the
Goldwasser-Micali scheme. The Benaloh-Fischer scheme [9] later improved the
bandwidth of the Goldwasser-Micali scheme by using higher-order residues. It is
basically the same scheme but uses $M = \mathbb{Z}_e$ for the message space,
where $e$ is a small prime dividing $\varphi(N)$ such that $e^2$ does not divide
$\varphi(N)$, and $g$ is a non-$e$-th residue modulo $N$. The semantic security
is proven under the prime residuosity assumption. Despite being secure, the
scheme is inefficient, as decryption involves a kind of exhaustive search,
implying that $e$ must be small. Naccache and Stern [55] proposed a variant of
the Benaloh-Fischer scheme which allows high bandwidth. This is achieved by
taking $e$ not as a prime but as a product of small primes $e_1, \dots, e_k$
such that $\varphi(N)$ is divisible by each $e_i$ but by none of the $e_i^2$,
and $g$ is an $e_i$-th non-residue modulo $N$ for all $i$. An exhaustive search
is still needed in order to decrypt the message.
Okamoto and Uchiyama [57] significantly extended the encryption rate by
investigating two different approaches: residuosity of smooth degree in
$\mathbb{Z}^*_{pq}$ and residuosity of prime degree $p$ in
$\mathbb{Z}^*_{p^2 q}$. Their scheme works in $\mathbb{Z}^*_{p^2 q}$ instead of
$\mathbb{Z}^*_{pq}$ for $R$ and $G$, uses $\mathbb{Z}_p$ for the message space
$M$ and chooses $g$ such that $g^{p-1} \bmod p^2$ has order $p$. The scheme
reaches bandwidths similar to the Naccache-Stern cryptosystem, but is more
efficient in decrypting and is semantically secure under the P-Subgroup
assumption.

3.1.2. Definition. P-Subgroup assumption: Let $p$ and $q$ be large primes and
$N = p^2 q$. There is no polynomial time algorithm that can determine whether a
random element $x$ in $\mathbb{Z}^*_N$ is in the subgroup of order $p - 1$ in
$\mathbb{Z}^*_N$ without knowing $p$ or $q$.

3.2 The Paillier Cryptosystem


Paillier proposed in [60] an extension of the Okamoto-Uchiyama scheme. The
main differences are the modulus $N$, which is an RSA modulus ($N = pq$), the
working groups, which are $G = \mathbb{Z}^*_{N^2}$ and $R = \mathbb{Z}^*_N$,
and the base $g$, which is an element of order divisible by $N$. This scheme
can be proven semantically secure under the decisional composite residuosity
assumption.

3.2.1. Definition. Decisional Composite Residuosity Assumption: Given $N$
without its factorization, there is no polynomial time algorithm that can decide
whether an element in $\mathbb{Z}^*_{N^2}$ is an $N$-th power of an element in
$\mathbb{Z}^*_{N^2}$.

The Decisional Composite Residuosity Assumption is denoted in [60] as CR[N];
in this thesis we keep this notation. The resulting cryptographic scheme is more
efficient than the schemes previously mentioned. In addition, there are no known
adaptive chosen-ciphertext attacks on the Paillier scheme that recover any
message.


3.2.1 Composite Residuosity


The Paillier cryptographic scheme relies heavily on the computation of composite
residuosity classes. In order to compute these classes we need the following:
Let $g \in \mathbb{Z}^*_{N^2}$ and consider the function

$$E_g : \mathbb{Z}_N \times \mathbb{Z}^*_N \to \mathbb{Z}^*_{N^2}$$

defined as follows:

$$E_g(x, y) = g^x y^N \pmod{N^2} \qquad (3.1)$$

Paillier proved the following in his paper:

3.2.2. Lemma. If the order of $g$ is a non-zero multiple of $N$, then $E_g$ is a
bijection.

Proof: Since the sets $\mathbb{Z}_N \times \mathbb{Z}^*_N$ and
$\mathbb{Z}^*_{N^2}$ have the same cardinality, we just need to prove that $E_g$
is injective. Assume we have

$$g^{x} y^{N} \equiv g^{x'} y'^{N} \pmod{N^2} \qquad (3.2)$$

If both sides of equation (3.2) are raised to the power $\lambda(N)$ we get:

$$(g^{x} y^{N})^{\lambda(N)} \equiv (g^{x'} y'^{N})^{\lambda(N)} \pmod{N^2}$$
$$g^{x \cdot \lambda(N)} y^{N \cdot \lambda(N)} \equiv g^{x' \cdot \lambda(N)} y'^{N \cdot \lambda(N)} \pmod{N^2}$$

By applying Carmichael's Theorem 1.5 it can be shown that:

$$g^{x \lambda(N)} \equiv g^{x' \lambda(N)} \pmod{N^2} \qquad (3.3)$$

Note that since $g$ has order a multiple of $N$ and $\gcd(N, \lambda(N)) = 1$,
$g^{\lambda(N)}$ has order $N$. Consequently it can be written as $(1 + zN)$ for
some $z \in \mathbb{Z}_N$ with $z \neq 0$, and equation (3.3) becomes:

$$(1 + zN)^{x} \equiv (1 + zN)^{x'} \pmod{N^2}$$

This implies that $x \equiv x' \pmod{N}$, and we can rewrite equation (3.2) to:

$$y^{N} \equiv y'^{N} \pmod{N^2}$$
$$\left(\frac{y}{y'}\right)^{N} \equiv 1 \pmod{N^2}$$

By applying Theorem 1.2.10 this is satisfied for $y \equiv y' \pmod{N}$. □
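The lemma can be sanity-checked by exhaustive enumeration on a toy modulus. The following is only an illustrative sketch: the values $N = 15$ and $g = 1 + N$ are assumptions chosen for this example (they are not parameters from this thesis), and the helper name `E` is hypothetical.

```python
from math import gcd

def E(g, x, y, N):
    """The map E_g(x, y) = g^x * y^N mod N^2 from equation (3.1)."""
    N2 = N * N
    return pow(g, x, N2) * pow(y, N, N2) % N2

N = 15        # toy modulus 3 * 5 (assumed); gcd(N, lambda(N)) = gcd(15, 4) = 1
g = N + 1     # (1 + N)^x = 1 + x*N mod N^2, so g has order exactly N

units_N = [y for y in range(1, N) if gcd(y, N) == 1]        # Z_N^*
units_N2 = {w for w in range(1, N * N) if gcd(w, N) == 1}   # Z_{N^2}^*

# Image of E_g over the whole domain Z_N x Z_N^*
images = {E(g, x, y, N) for x in range(N) for y in units_N}

# Domain and codomain have equal size, so E_g is a bijection
# exactly when every element of Z_{N^2}^* is hit.
print(N * len(units_N), len(units_N2), images == units_N2)
```

Because domain and codomain have the same cardinality, comparing the image set with $\mathbb{Z}^*_{N^2}$ is enough to confirm injectivity for this toy case.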


Classes
We denote by $B_\alpha \subset \mathbb{Z}^*_{N^2}$ the set of elements of order
$N\alpha$, with $\alpha \neq 0$, and by $B$ their disjoint union for
$\alpha = 1, \dots, N$ with $\alpha \mid \lambda(N)$.

3.2.3. Definition. Assume that $g \in B$. For an element
$w \in \mathbb{Z}^*_{N^2}$, we call the $N$-th residuosity class of $w$ with
respect to $g$ the unique integer $x \in \mathbb{Z}_N$ for which there exists
$y \in \mathbb{Z}^*_N$ such that

$$E_g(x, y) \equiv g^x \cdot y^N \equiv w \pmod{N^2}$$

In Benaloh's notation [9], the class of $w$ is denoted $[\![w]\!]_g$. In [60],
Paillier defines the $N$-th Residuosity Class Problem as follows: for a given
base $g$ and a random element $w \in \mathbb{Z}^*_{N^2}$, the $N$-th Residuosity
Class Problem is the problem of computing the class to which $w$ belongs with
respect to the base $g$. This problem is denoted as Class[N, g].

3.2.4. Theorem. Let $g \in B$. For every $w \in \mathbb{Z}^*_{N^2}$,
$[\![w]\!]_g = 0$ if and only if $w$ is an $N$-th residue modulo $N^2$.

Proof: If $[\![w]\!]_g = 0$, i.e. $w = g^0 y^N \pmod{N^2}$, then $w$ is an
$N$-th residue modulo $N^2$. Conversely, let $w \equiv y^N \pmod{N^2}$ be an
$N$-th residue modulo $N^2$. If $y$ is less than $N$ we are done, because
$w = E_g(0, y)$ and $E_g$ is a bijection. If $y > N$, it can be written as
$y = a + bN$ with $a \in \mathbb{Z}_N$, and therefore
$w = y^N = (a + bN)^N \equiv a^N \pmod{N^2}$, so $[\![w]\!]_g = 0$. □

3.2.5. Theorem. The function that associates to every
$w \in \mathbb{Z}^*_{N^2}$ its corresponding class $[\![w]\!]_g$ is a
homomorphism from $(\mathbb{Z}^*_{N^2}, \times)$ to $(\mathbb{Z}_N, +)$.

Proof: Let $w_1 \equiv g^a x^N \pmod{N^2}$ and $w_2 \equiv g^b y^N \pmod{N^2}$;
then $w_1 w_2 \equiv g^{a+b} (xy)^N \pmod{N^2}$. Since $a, b < N$, we have
$a + b < 2N$, so we can rewrite $a + b$ as $((a + b) \bmod N) + cN$, where $c$
is 0 if $a + b < N$ and 1 otherwise. We can now write
$w_1 w_2 \equiv g^{a+b} (xy)^N \equiv g^{(a+b) \bmod N} (x y g^c)^N \pmod{N^2}$.
This implies
$[\![w_1 w_2]\!]_g = [\![g^{(a+b) \bmod N} (x y g^c)^N]\!]_g \equiv a + b \pmod{N}$.
Since $[\![w_1]\!]_g + [\![w_2]\!]_g \equiv a + b \pmod{N}$ as well, this proves
the claim. □

The L-function
Consider the following set:

$$S_N = \{u < N^2 \mid u \equiv 1 \pmod{N}\}$$

The L-function is defined as follows:

$$L(u) = \frac{u - 1}{N}, \quad \forall u \in S_N \qquad (3.4)$$


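3.2.6. Lemma. For any $w \in \mathbb{Z}^*_{N^2}$,
$L(w^{\lambda(N)} \bmod N^2) = \lambda(N) [\![w]\!]_{1+N} \pmod{N}$.

Proof: Since $1 + N \in B$, there exists a unique pair $(a, b)$ in the set
$\mathbb{Z}_N \times \mathbb{Z}^*_N$ such that $w = (1 + N)^a b^N \pmod{N^2}$.
By definition, $a = [\![w]\!]_{1+N}$. Then, writing $\lambda = \lambda(N)$,

$$w^{\lambda} \equiv (1 + N)^{a\lambda} b^{N\lambda} \equiv 1 + a \lambda N \pmod{N^2}$$

so that $L(w^{\lambda} \bmod N^2) \equiv a\lambda \pmod{N}$, as claimed. □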
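The identity of the lemma can be checked numerically for a small modulus. The sketch below tabulates all classes with respect to the base $1 + N$ by brute force; the toy primes 3 and 5 and all names (`class_of`, `lemma_holds`) are assumptions made only for this illustration.

```python
from math import gcd

p, q = 3, 5                                   # toy primes (assumed)
N, N2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lambda(N) = lcm(p - 1, q - 1)

def L(u):
    """L(u) = (u - 1) / N for u = 1 (mod N)."""
    return (u - 1) // N

# Brute-force table: w = (1 + N)^a * b^N mod N^2  ->  class a
class_of = {pow(1 + N, a, N2) * pow(b, N, N2) % N2: a
            for a in range(N)
            for b in range(1, N) if gcd(b, N) == 1}

# Check L(w^lambda mod N^2) = lambda * [[w]]_{1+N} (mod N) for every w
lemma_holds = all(L(pow(w, lam, N2)) % N == lam * a % N
                  for w, a in class_of.items())
print(len(class_of), lemma_holds)
```

The table has one entry per element of $\mathbb{Z}^*_{N^2}$, which is only feasible because the modulus is tiny; for real parameters computing the class is exactly the hard problem Class[N, g].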

3.2.2 The Paillier Cryptosystem


Paillier has presented multiple closely related cryptosystems [60, 61]. In this
thesis we focus on the main and subgroup variants of these cryptosystems.
The main variant uses the RSA trapdoor, while the subgroup variant exploits
a smaller subgroup within $\mathbb{Z}^*_{N^2}$.

Main Variant

Key Generation. Let $N = pq$ be an RSA modulus, where $p$ and $q$ are large
prime integers. Let $g \in \mathbb{Z}^*_{N^2}$ and let the order of $g$ be a
multiple of $N$. Let $\lambda = \mathrm{lcm}(p - 1, q - 1)$. The public key is
$(g, N)$, and the private key is $\lambda$.

Encryption. To encrypt a message $m \in \mathbb{Z}_N$, randomly choose
$r \in \mathbb{Z}^*_N$ and compute the ciphertext $c = g^m r^N \bmod N^2$.

Decryption. The decryption of $c$ is defined by

$$m = \frac{L(c^{\lambda} \bmod N^2)}{L(g^{\lambda} \bmod N^2)} \pmod{N}$$

where the function $L(u) = \frac{u - 1}{N}$ takes inputs from
$S_N = \{u < N^2 \mid u \equiv 1 \pmod{N}\}$.

Figure 3.1: The Paillier cryptosystem

Next, we show why the Paillier main scheme works. Let $m \in \mathbb{Z}_N$ be a
message and $r \in \mathbb{Z}^*_N$ be random. The encryption of $m$ is
$c = g^m r^N \pmod{N^2}$. Decryption works as follows:

$$\frac{L(c^{\lambda} \bmod N^2)}{L(g^{\lambda} \bmod N^2)} \equiv L((g^m r^N)^{\lambda} \bmod N^2) \cdot (L(g^{\lambda} \bmod N^2))^{-1} \pmod{N}$$

Due to equation (1.5) the factor $r^{N\lambda}$ vanishes and we get
$L(g^{m\lambda} \bmod N^2) \cdot (L(g^{\lambda} \bmod N^2))^{-1} \pmod{N}$. Now,
applying the L-function results in the original plaintext
$(\lambda m)(\lambda)^{-1} = m$.
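The main variant can be illustrated with a small, insecure sketch. The toy primes 47 and 59 and the function names `keygen`, `encrypt` and `decrypt` are assumptions for this example; the validity test for $g$ (checking that $L(g^\lambda \bmod N^2)$ is invertible modulo $N$) is the check suggested in Paillier's paper, and `pow(x, -1, N)` requires Python 3.8 or later.

```python
import random
from math import gcd

def L(u, N):
    """L(u) = (u - 1) / N for u = 1 (mod N)."""
    return (u - 1) // N

def keygen(p, q):
    N, N2 = p * q, (p * q) ** 2
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    while True:
        g = random.randrange(2, N2)
        # g is usable when L(g^lam mod N^2) is invertible mod N
        if gcd(g, N) == 1 and gcd(L(pow(g, lam, N2), N), N) == 1:
            return (g, N), lam                      # public key, private key

def encrypt(pk, m):
    g, N = pk
    while True:                                     # pick r in Z_N^*
        r = random.randrange(1, N)
        if gcd(r, N) == 1:
            break
    return pow(g, m, N * N) * pow(r, N, N * N) % (N * N)

def decrypt(pk, lam, c):
    g, N = pk
    num = L(pow(c, lam, N * N), N)
    den = L(pow(g, lam, N * N), N)
    return num * pow(den, -1, N) % N

pk, sk = keygen(47, 59)          # toy primes, far too small for real use
c = encrypt(pk, 1234)
print(decrypt(pk, sk, c))
```

Encrypting the same message twice gives different ciphertexts because of the random $r$, while both decrypt to the same plaintext.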

Subgroup Variant
The subgroup variant is slightly different as it computes residues in a subgroup
of λ(n).


Key Generation. Let $N = pq$ be an RSA modulus, where $p$ and $q$ are large
prime integers. Let $\lambda = \mathrm{lcm}(p - 1, q - 1)$ and choose $\alpha$
such that it divides $\lambda$. Let $h \in \mathbb{Z}^*_{N^2}$ be an element of
maximal order $N\lambda$, and let $g \equiv h^{\lambda/\alpha} \bmod N^2$. The
public key is $(g, N)$, and the private key is $\alpha$.

Encryption. To encrypt a message $m \in \mathbb{Z}_N$, randomly choose
$r \in \mathbb{Z}^*_N$ and compute the ciphertext
$c \equiv g^{m + r \cdot N} \pmod{N^2}$.

Decryption. The decryption of $c$ is defined by

$$m \equiv \frac{L(c^{\alpha} \bmod N^2)}{L(g^{\alpha} \bmod N^2)} \pmod{N}$$

where the function $L(u) = \frac{u - 1}{N}$ takes inputs from
$S_N = \{u < N^2 \mid u \equiv 1 \pmod{N}\}$.

Figure 3.2: The subgroup variant.

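The Paillier subgroup variant works as follows. Let $m \in \mathbb{Z}_N$ be a
message, let $r \in \mathbb{Z}^*_N$ be a random nonce, and let the encryption be
$c \equiv g^{m + r \cdot N} \equiv g^m (g^N)^r \pmod{N^2}$. Because in the
subgroup variant $g \equiv h^{\lambda/\alpha} \pmod{N^2}$, we can write $c$ as
$(h^{\lambda/\alpha})^m (h^{(\lambda/\alpha)N})^r \pmod{N^2}$.
In the decryption, the ciphertext $c$ is raised to the power $\alpha$, and due to
equation (1.5) this becomes
$(h^{(\lambda/\alpha)m})^{\alpha} (h^{(\lambda/\alpha)N})^{r\alpha} \equiv h^{\lambda m} \cdot h^{\lambda r N} \equiv h^{\lambda m} \cdot 1 \pmod{N^2}$.
This results in

$$\frac{L(h^{\lambda m} \bmod N^2)}{L(h^{\lambda} \bmod N^2)} \equiv m \lambda \lambda^{-1} \equiv m \pmod{N}$$

as seen before in Paillier's main variant.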
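A toy version of the subgroup variant is sketched below. The parameters (primes 47 and 59, $\alpha = 29$) and function names are assumptions for illustration. Note that the sketch only verifies the property that decryption actually needs, namely that $h^\lambda = 1 + zN$ with $z$ invertible modulo $N$; it does not verify that $h$ has maximal order $N\lambda$.

```python
import random
from math import gcd

def L(u, N):
    return (u - 1) // N

def keygen_subgroup(p, q, alpha):
    N, N2 = p * q, (p * q) ** 2
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)
    assert lam % alpha == 0, "alpha must divide lambda"
    while True:
        h = random.randrange(2, N2)
        # Only the condition needed for decryption is checked here
        # (assumption of this sketch, weaker than "maximal order").
        if gcd(h, N) == 1 and gcd(L(pow(h, lam, N2), N), N) == 1:
            g = pow(h, lam // alpha, N2)
            return (g, N), alpha

def encrypt(pk, m):
    g, N = pk
    r = random.randrange(1, N)
    return pow(g, m + r * N, N * N)        # c = g^(m + r*N) mod N^2

def decrypt(pk, alpha, c):
    g, N = pk
    num = L(pow(c, alpha, N * N), N)
    den = L(pow(g, alpha, N * N), N)
    return num * pow(den, -1, N) % N

# lambda = lcm(46, 58) = 1334 = 2 * 23 * 29, so alpha = 29 divides lambda
pk, alpha = keygen_subgroup(47, 59, 29)
print(decrypt(pk, alpha, encrypt(pk, 2024)))
```

The decryption exponent $\alpha$ is much shorter than $\lambda$, which is where the speed-up of this variant comes from.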

3.2.3 Homomorphic Properties


Both the main and the subgroup variant of the encryption function, i.e.
$E_{main}(m) = g^m r^N \pmod{N^2}$ and
$E_{subgroup}(m) = g^{m + r \cdot N} \pmod{N^2}$, are homomorphic on
$\mathbb{Z}_N$, which leads to the following identities:

$$\forall m_1, m_2 \in \mathbb{Z}_N \text{ and } k \in \mathbb{N}:$$

$$D(E(m_1) E(m_2) \bmod N^2) = m_1 + m_2 \pmod{N}$$

$$D(E(m_1) g^{m_2} \bmod N^2) = m_1 + m_2 \pmod{N}$$

$$D(E(m_1)^{m_2} \bmod N^2) = D(E(m_2)^{m_1} \bmod N^2) = m_1 m_2 \pmod{N}$$

$$D(E(m)^k \bmod N^2) = k m \pmod{N}$$
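These identities can be exercised directly. The following sketch uses a compact toy instance of the main variant with the base fixed to $g = N + 1$; that choice, the primes 47 and 59, and the names `E` and `D` are assumptions made to keep the example short.

```python
import random
from math import gcd

p, q = 47, 59                                  # toy primes (assumed)
N, N2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)
g = N + 1                                      # assumed base, order N mod N^2
mu = pow(lam, -1, N)                           # L(g^lam mod N^2) = lam here

def E(m):
    while True:
        r = random.randrange(1, N)
        if gcd(r, N) == 1:
            return pow(g, m, N2) * pow(r, N, N2) % N2

def D(c):
    return (pow(c, lam, N2) - 1) // N * mu % N

m1, m2, k = 1200, 2000, 7
print(D(E(m1) * E(m2) % N2) == (m1 + m2) % N)          # add two ciphertexts
print(D(E(m1) * pow(g, m2, N2) % N2) == (m1 + m2) % N) # add a known constant
print(D(pow(E(m1), m2, N2)) == m1 * m2 % N)            # multiply by m2
print(D(pow(E(m1), k, N2)) == k * m1 % N)              # multiply by k
```

All four comparisons should print `True`; note that every operation on the left-hand side is performed on ciphertexts only, without access to the private key.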

These homomorphic properties mean that the Paillier scheme can at most be secure
against non-adaptive chosen-ciphertext attacks (CCA1), as the homomorphic
properties allow computation on ciphertexts without knowing their content. This
implies that the Paillier cryptosystem is malleable. In order to make a Paillier
cryptosystem with homomorphic properties secure against CCA2, additional
information, such as signatures or MACs, is needed to prevent malleability of
the ciphertext. Paillier also proposed a cryptosystem in [60] that is secure
against CCA2 but lacks the homomorphic properties.

3.3 Optimization
3.3.1 The CRT method
In the RSA cryptosystem, CRT can be applied to speed up the decryption process.
This can also be applied to the Paillier cryptosystem as suggested in [60]. Using
factors $p$ and $q$ we can define the following functions:

$$L_p(x) = \frac{x - 1}{p} \quad \text{and} \quad L_q(x) = \frac{x - 1}{q}$$

Decryption can be made faster by separately performing the computation modulo
$p^2$ and modulo $q^2$ and recombining the modular remainders afterwards using
the CRT (Theorem 1.2.1):

$$m_p = L_p(c^{p-1} \bmod p^2) \, h_p \pmod{p}$$
$$m_q = L_q(c^{q-1} \bmod q^2) \, h_q \pmod{q}$$
$$m = \mathrm{CRT}(m_p, m_q) \pmod{pq}$$

with the precomputed values $h_p$ and $h_q$:

$$h_p = L_p(g^{p-1} \bmod p^2)^{-1} \pmod{p}$$
$$h_q = L_q(g^{q-1} \bmod q^2)^{-1} \pmod{q}$$

Because all computations happen in a smaller group, operations on elements
within the group are usually faster. The CRT decryption can also be applied to
the subgroup variant, in which case $p - 1$ and $q - 1$ are replaced by
$\alpha$. In addition, computing $L(u)$ for $u \in S_N$ can be achieved with
only one multiplication modulo $2^{|N|}$ by pre-computing
$N^{-1} \pmod{2^{|N|}}$. The same trick can be used for $p$ and $q$ in CRT
decryption, provided $p$ and $q$ have the same size in bits.
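The CRT decryption can be sketched as follows and compared against the plain decryption. This is an illustration under assumptions: toy primes 47 and 59, base $g = N + 1$, and $h_p$, $h_q$ taken as the modular inverses of $L_p(g^{p-1} \bmod p^2)$ and $L_q(g^{q-1} \bmod q^2)$, as in Paillier's paper. All function names are hypothetical.

```python
import random
from math import gcd

p, q = 47, 59                                   # toy primes (assumed)
N, N2 = p * q, (p * q) ** 2
g = N + 1                                       # assumed base
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)

def L(u, d):
    """L_d(u) = (u - 1) / d; d is N, p or q."""
    return (u - 1) // d

# Values precomputed once per key
h_p = pow(L(pow(g, p - 1, p * p), p), -1, p)
h_q = pow(L(pow(g, q - 1, q * q), q), -1, q)
p_inv_q = pow(p, -1, q)
mu = pow(L(pow(g, lam, N2), N), -1, N)

def encrypt(m):
    while True:
        r = random.randrange(1, N)
        if gcd(r, N) == 1:
            return pow(g, m, N2) * pow(r, N, N2) % N2

def decrypt_plain(c):
    return L(pow(c, lam, N2), N) * mu % N

def decrypt_crt(c):
    m_p = L(pow(c, p - 1, p * p), p) * h_p % p   # message mod p
    m_q = L(pow(c, q - 1, q * q), q) * h_q % q   # message mod q
    # Garner-style recombination of (m_p, m_q) into m mod p*q
    return m_p + p * ((m_q - m_p) * p_inv_q % q)

c = encrypt(2500)
print(decrypt_plain(c), decrypt_crt(c))
```

Both functions return the same plaintext; the CRT version works with moduli $p^2$ and $q^2$, which are half the bit length of $N^2$, and with shorter exponents.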

3.3.2 Smart Choices for g


When using the Paillier main scheme with $p$, $q$ of equivalent length, a
simpler variant of the key generation steps is possible. If $g$ is set to
$N + 1$, then the computation for encrypting is just
$(1 + mN) \cdot r^N \pmod{N^2}$, as seen in Theorem 1.2.9. This also allows
$\lambda$ to be equal to $\varphi(N)$, and the precomputed decryption factor
$L(g^{\lambda} \bmod N^2)^{-1}$ then equals $\varphi(N)^{-1} \pmod{N}$.
Encrypting thus becomes basically just one exponentiation and one multiplication
modulo $N^2$. For decrypting, the only requirement is the evaluation of two
exponentiations, modulo $p^2$ and $q^2$ when the CRT method is used. Since the
exponents in these exponentiations are fixed, we can make use of addition
chains. Computing an optimal addition chain is an NP-hard problem; an extensive
survey is given by Bernstein in [13].
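The effect of choosing $g = N + 1$ can be illustrated with a short sketch. The toy primes and the function names are assumptions for this example; the sketch compares the shortcut $(1 + mN) \cdot r^N$ with the generic formula $g^m r^N$ and decrypts with $\lambda = \varphi(N)$.

```python
import random
from math import gcd

p, q = 47, 59                       # toy primes (assumed)
N, N2 = p * q, (p * q) ** 2
phi = (p - 1) * (q - 1)             # used in place of lambda
mu = pow(phi, -1, N)                # phi(N)^{-1} mod N, precomputed once

def random_unit():
    while True:
        r = random.randrange(1, N)
        if gcd(r, N) == 1:
            return r

def encrypt_simple(m, r):
    # (1 + N)^m = 1 + m*N (mod N^2): the g^m part costs one multiplication
    return (1 + m * N) * pow(r, N, N2) % N2

def encrypt_generic(m, r):
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2

def decrypt(c):
    return (pow(c, phi, N2) - 1) // N * mu % N

r = random_unit()
c = encrypt_simple(777, r)
print(c == encrypt_generic(777, r), decrypt(c))
```

With this choice of $g$, the only exponentiation left in encryption is $r^N$, which does not depend on the message and can therefore be prepared in advance.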

3.4 Security
In order to break the Paillier cryptosystem, i.e. to learn the factorization of
$N = pq$ such that the secret key $\lambda = \mathrm{lcm}(p - 1, q - 1)$ can be
calculated, an adversary has to invest both time and money in computers or chips
that can compute the factorization of $N$ within a reasonable amount of time. To
prevent this, we need to choose $N$ so large that the factorization takes an
infeasible amount of time. As the size of $N$ also affects the speed of
encryption and decryption, $N$ must be chosen big enough that it cannot be
factored by the adversary and small enough that the cryptosystem remains usable.

3.4.1 Recommended Key-Sizes


The algorithm with the best known asymptotic running time for factoring the
modulus $N$ is the number field sieve (NFS) algorithm [4]. The NFS algorithm has
been known for more than two decades.
There has been long-standing research on how much time and effort it costs to
break a system of a certain key length using ordinary computers [79, 45, 43]. As
of this writing, it is estimated that factoring a 1024-bit number on normal PCs
costs roughly 1 billion euros and a year of computation. Besides software
implementations, the NFS can be implemented in ASICs [71, 46, 32], which costs
considerably less. Current recommendations for RSA key sizes, 1536 bits or even
larger, are based directly on the implementation and speed of the NFS algorithm.
Depending on the time for which one or multiple secrets have to remain secret,
and on the value of the secret(s), the length of $N$ has to be chosen
accordingly. For this reason, this thesis adopts the ECRYPT II recommendations
[30].
The security of the parameter $\alpha$ in the Paillier subgroup variant relies
on the fact that it is untraceable, i.e. that $\alpha$ cannot be determined from
$g$ and $N$ without knowing $h$ or $\lambda$. Hence, the size of $\alpha$ can be
chosen freely, although it should be noted that the size of $\alpha$ spans the
size of the message space.


In addition, if large quantum computers are built, they will break the Paillier
cryptographic scheme and all other cryptographic systems that rely on the
problem of factorization. See [72] for details of Shor's quantum factorization
algorithm.

Chapter 4
DGK-Crypto System

4.1 Introduction
In past decades quite a number of new homomorphic cryptographic schemes have
been proposed, with multiplicative homomorphic properties [68, 33] and additive
homomorphic properties [50, 36, 9, 60]. It is often suggested that choosing
large subgroups of $\mathbb{Z}_N$ increases security. However, by reducing the
order of the subgroups it is possible to obtain similar but more efficient
schemes, with the strong RSA assumption as the underlying assumption.
For example, the Paillier schemes have message space $M = \mathbb{Z}_N$ and work
optimally when a message $m$ is of the same size as the random number $r$. If
the message $m$ is much smaller than $r$, say only a few bits, then the
computation of the exponentiation carries a lot of overhead. Such small message
spaces are often used in secure multi-party computation (SMP). One way of
reducing pre-computation is to scan the exponent and determine which values are
needed before pre-computing the auxiliary table. Another problem with the
Paillier scheme is that it uses $N^2$ as modulus when encrypting. In 1978
Rivest, Shamir, and Adleman presented a cryptosystem that exploits the order of
the multiplicative group modulo a composite $N = pq$, i.e. a message $m$ raised
to the power $k\varphi(N) + 1$ returns to its original form. In 2005, Groth
presented a cryptographic scheme [38] that exploits a hidden subgroup of
$\mathbb{Z}^*_N$. The system has two base elements with special orders: the
order of the first element is a multiple of the order of the second element.
When the product of powers of the two elements is raised to the order of the
second element, only the first element remains and the message can be extracted.
The size of the message space $M$ of the Groth cryptosystem is the size of the
subgroup.
In [25] Damgård, Geisler and Krøigaard presented their cryptosystem, named DGK,
which is heavily based on Groth's hidden subgroup scheme. The DGK cryptosystem
in its original form, described in [25] and the correction [26], relies on an
auxiliary table for decryption of the ciphertext. In this thesis we present an
implementation of the DGK system that uses this auxiliary table as well as a
decryptable version of the corrected DGK system, which is suggested in [27].
This chapter describes the DGK homomorphic cryptosystem.

4.2 DGK cryptosystem

The DGK cryptosystem is optimized for secure comparison of integers in secure
multiparty computation, i.e. it requires less computational effort than Paillier
or other additively homomorphic cryptosystems. The DGK cryptosystem is based on
the Strong RSA subgroup assumption:

4.2.1. Theorem. Strong RSA Subgroup Assumption. Let $K$ be a key generation
algorithm that produces an RSA subgroup pair $(N, g)$. The strong RSA subgroup
assumption for this key generation algorithm is that it is infeasible to find
$u \in \mathbb{Z}^*_N$, $w \in G$ and $d, e > 1$ such that
$g \equiv u w^e \pmod{N}$ and $u^d \equiv 1 \pmod{N}$.

The DGK cryptosystem actually uses two subgroups, where one subgroup is
contained in the other. The consequence of working in a subgroup of
$\mathbb{Z}^*_N$ is that the message space $M$ is also smaller (e.g. 16 bits, as
suggested in the original paper [25]). The other subgroup should contain the
message space and have a group order of around 160 bits for a key of length
1024 bits.
The DGK cryptosystem has two variants. The first one we call the Small
Message Space (SMS) variant. The second is the proper decryption variant.

4.2.1 Small Message Space variant

Security Parameters

The DGK needs three parameters k, t and l with k > t > l in order to generate
keys. The first parameter is k which is the size in bits of the RSA modulus N ,
the second parameter t is the size of the two small primes vp and vq . The last
parameter l is used for the message space size in bits.


Key generation. Construct two $t$-bit primes $v_p$ and $v_q$, and two distinct
primes $p$ and $q$ of equal bit length such that $v_p \mid p - 1$ and
$v_q \mid q - 1$. Then choose an $l$-bit prime $u$, an element
$g \in \mathbb{Z}^*_N$ of order $u v_p v_q$ and an element $h$ of order
$v_p v_q$. The public key is $(N, g, h, u)$ and the private key is
$(p, q, v_p, v_q)$. In addition, an auxiliary table is generated in which the
tuples $((g^{v_p v_q})^i, i)$ for $0 \le i < u$ are stored.

Encryption. Given a message $m$ and a randomly chosen nonce $r$, the ciphertext
is $c = g^m h^r \pmod{N}$.

Decryption. The value $c^{v_p v_q} \pmod{N}$ is looked up in the auxiliary
table; the index $i$ of the matching entry is the message.

Figure 4.1: The DGK cryptosystem

Now we show why this system works. Let $m < u$ be a message and
$r \in \mathbb{Z}_N$ a random nonce; then the encryption is
$c = g^m h^r \pmod{N}$. Decryption computes
$c^{v_p v_q} \equiv (g^m h^r)^{v_p v_q} \equiv (g^m)^{v_p v_q} (h^r)^{v_p v_q} \pmod{N}$.
Because $h$ has order $v_p v_q$, the factor $(h^r)^{v_p v_q}$ becomes 1. This
leaves $(g^{v_p v_q})^m \pmod{N}$, and as our auxiliary table holds the tuples
$\{((g^{v_p v_q})^i, i)\}$ for $0 \le i < u$, we can easily find the message
corresponding to $g^{v_p v_q m}$.
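The small message space variant can be sketched with toy parameters. Everything concrete below is an assumption for illustration: the values $u = 5$, $v_p = 7$, $v_q = 11$, $p = 71$, $q = 331$ (chosen so that $u v_p \mid p - 1$ and $u v_q \mid q - 1$), the helper names, and the way $g$ and $h$ are built from elements of prescribed order modulo $p$ and $q$.

```python
import random

def element_of_order(prime_factors, prime):
    """Random element of Z_prime^* whose order is the product of the
    given distinct primes (which must divide prime - 1)."""
    target = 1
    for f in prime_factors:
        target *= f
    assert (prime - 1) % target == 0
    while True:
        a = random.randrange(2, prime)
        x = pow(a, (prime - 1) // target, prime)   # order of x divides target
        if all(pow(x, target // f, prime) != 1 for f in prime_factors):
            return x                                # order is exactly target

def crt(xp, xq, p, q):
    """Combine residues mod p and mod q into one residue mod p*q."""
    return (xp + p * ((xq - xp) * pow(p, -1, q) % q)) % (p * q)

u, vp, vq = 5, 7, 11          # toy parameters (assumed)
p, q = 71, 331                # 70 = 2*5*7 and 330 = 2*3*5*11
N = p * q

# g has order u*vp*vq, h has order vp*vq in Z_N^*
g = crt(element_of_order([u, vp], p), element_of_order([u, vq], q), p, q)
h = crt(element_of_order([vp], p), element_of_order([vq], q), p, q)

# Auxiliary table: (g^(vp*vq))^i -> i for 0 <= i < u
table = {pow(g, vp * vq * i, N): i for i in range(u)}

def encrypt(m):
    r = random.randrange(1, N)
    return pow(g, m, N) * pow(h, r, N) % N

def decrypt(c):
    return table[pow(c, vp * vq, N)]

print([decrypt(encrypt(m)) for m in range(u)])
```

Decryption is a single exponentiation modulo $N$ followed by a table lookup, which is why the table size, and hence the message space, has to stay small.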

4.2.2 Proper Decryption Variant

The small message space variant of the DGK system can be turned into a full or
proper decryption variant that does not use an auxiliary table. The auxiliary
table basically limits the message space, as we have to store every individual
table entry and its corresponding message. Today's disk space is not sufficient
for storing the auxiliary table if the message space is chosen to be large.
Hence, in order to use the full message space we need a decryption algorithm
that does not use an auxiliary table. This can be achieved by carefully
selecting the parameters of the system.


Key generation. Two primes $p$ and $q$ are generated of the form
$p - 1 = 2 u v_p r$ and $q - 1 = 2 u v_q r$, with $v_p$ and $v_q$ prime and
$u = 2^l$, where $l$ is the size in bits of the message space and $r$ is a
random number. We set $N = pq$, and an element $g$ is generated such that it has
order $2^l$. Finally, an element $h$ is chosen such that its order is
$v_p v_q$.

Encryption. The plaintext $m$ and a random nonce $r$ are used to compute
$c \equiv g^m h^r \pmod{N}$.

Decryption. Decryption is basically done by raising $c$ to the power
$v_p v_q$; because $c^{v_p v_q}$ lies in the subgroup of order $2^l$, we can use
the Pohlig-Hellman algorithm [65] to decrypt $c$.

Figure 4.2: DGK proper decryption variant

The proper decryption variant works for the following reason. Decryption
computes
$c^{v_p v_q} \equiv (g^m h^r)^{v_p v_q} \equiv (g^m)^{v_p v_q} (h^r)^{v_p v_q} \pmod{N}$.
Because $h$ has order $v_p v_q$, the factor $h^{v_p v_q r} \pmod{N}$ becomes 1.
This leaves $(g^m)^{v_p v_q} \pmod{N}$, and because $g^{v_p v_q} \pmod{N}$ has
order $2^l$ we can apply the Pohlig-Hellman algorithm to recover the plaintext.
The Pohlig-Hellman algorithm can compute the plaintext within $O(l)$ modular
multiplications, as the order of $g^{v_p v_q}$ is $2^l$. It should be noted that
the computational penalty for finding an element of order $2^l$ can be quite
large. That is why we recommend the Paillier cryptosystem when a larger message
space is needed.
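The bit-by-bit recovery used by Pohlig-Hellman in a group of order $2^l$ can be sketched as follows. This is a generic, unoptimized illustration and not the implementation of this thesis: the function name is hypothetical, and the demo uses the prime modulus 257 with base 3 (an element of order $2^8$) as a stand-in for $g^{v_p v_q}$ modulo $N$. As written, the sketch recomputes a power in every iteration rather than reusing intermediate values.

```python
def pohlig_hellman_pow2(y, gamma, l, n):
    """Return m in [0, 2^l) with gamma^m = y (mod n),
    assuming gamma has order exactly 2^l modulo n."""
    gamma_inv = pow(gamma, -1, n)
    m = 0
    for i in range(l):
        # Remove the bits found so far: y * gamma^(-m) = gamma^(2^i * rest)
        stripped = y * pow(gamma_inv, m, n) % n
        # Raising to 2^(l-1-i) leaves (gamma^(2^(l-1)))^(bit i),
        # which is 1 when bit i is 0 and the element of order 2 otherwise.
        if pow(stripped, 1 << (l - 1 - i), n) != 1:
            m |= 1 << i
    return m

n, gamma, l = 257, 3, 8          # 3 generates Z_257^*, a group of order 2^8
y = pow(gamma, 201, n)
print(pohlig_hellman_pow2(y, gamma, l, n))
```

Each iteration decides one bit of the exponent, starting with the least significant one, so no table of precomputed ciphertexts is needed.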

4.2.3 Homomorphic Properties


The DGK cryptosystem is additively homomorphic and has the same homomorphic
properties as the Paillier cryptosystems:

$$\forall m_1, m_2 \text{ and } k \in \mathbb{Z}_N:$$

$$D(E(m_1) E(m_2) \bmod N) = (m_1 + m_2) \, v_p v_q \pmod{N}$$

$$D(E(m_1) g^{m_2} \bmod N) = (m_1 + m_2) \, v_p v_q \pmod{N}$$

$$D(E(m_1)^{m_2} \bmod N) = D(E(m_2)^{m_1} \bmod N) = (m_1 m_2) \, v_p v_q \pmod{N}$$

$$D(E(m)^k \bmod N) = (k m) \, v_p v_q \pmod{N}$$

4.3 Security and Key-sizes


The DGK system is secure under the following assumption, as presented in [25]:


For any constant $l$ and appropriate choice of $t$ as a function of $k$, the
tuple $(N, g, h, u, x)$ is computationally indistinguishable from
$(N, g, h, u, y)$, where $(N, g, h, u)$ are generated as in the above key
generation, $x$ is uniform in $G$ and $y$ is uniform in $H$.

This means that there is no polynomial time algorithm for recovering the secret
key. The security proof is given in [25]. In addition, the DGK cryptosystem is
malleable.
For an attacker to recover the secret keys $v_p$, $v_q$, such that they can
compute $g^{v_p v_q}$ to recover the plaintext, they first have to factor $N$
into $p$ and $q$ in order to find $v_p$ and $v_q$ by means of factoring $p - 1$
and $q - 1$. The second factorization (i.e. factoring $p - 1$ and $q - 1$) costs
less than factoring $N$. We recommend the same key sizes for $N$ as for the
Paillier scheme (see Section 3.4.1 for details).

Chapter 5
Simultaneous Multi-Exponentiation

5.1 Introduction
At the core of different public key cryptographic systems such as Paillier, DGK,
and other cryptosystems lies a multi-exponentiation in some commutative group
$G$, i.e. evaluating a product of exponentiations:

$$\prod_{1 \le i \le k} g_i^{e_i},$$

where each $g_i$ is an element of $G$ and each exponent $e_i \in \mathbb{Z}$.
Examples are given in Chapters 3 and 4. As keys become longer for security
reasons, the speed of these algorithms is crucial to usability. Over the past
four decades many exponentiation algorithms have been presented, usually with a
focus on single exponentiation. The cryptosystems described in the previous
chapters use a product of two exponentiations, i.e. simultaneous
multi-exponentiation. Certain techniques can be applied when computing two or
more exponentiations to speed up computations.
In this chapter we extend different exponentiation algorithms to simultaneous
multi-exponentiation algorithms, with the restriction that we only look at a
product of two exponentiations, and show how this affects the performance.
Section 5.2 discusses the pre-computation steps that can be done in order
to speed up computation. In Section 5.3 we discuss different types of
multi-exponentiation algorithms and evaluate their performance for non-fixed
bases. Finally, in Section 5.4.1 we explore fixed-base exponentiation.

5.2 Pre-computation
In order to speed up computation some algorithms make use of an auxiliary
table. An auxiliary table contains a limited number of values that speed up the
computation of an algorithm. The table is computed beforehand, in what is called
the pre-computation stage. The actual computation of the exponentiation using
such an auxiliary table is called the evaluation stage. In addition we assume
that the auxiliary table holds a number of entries parameterized by some value.
In the computation stage of the simultaneous exponentiation algorithms we
denote by $b$ the bit length of the longest exponent. Let the parameter $w$ be
the window size, which is the number of bits that are evaluated simultaneously.
Note that large window sizes make the pre-computation stage less efficient while
speeding up the evaluation stage. Finding a balance between the exponent bit
length, the computation of the auxiliary table, and the window size is a
challenging task, and it is not possible to give a general rule for selecting an
optimal $w$.
In the remainder of this thesis we make a distinction between general
multiplications and squarings, since squaring can be computed more efficiently.
For each algorithm the expected number of multiplications and squarings is given
for parameters $b$ and $w$. In addition it is assumed that the window size $w$
is much smaller than the length $b$, since otherwise the pre-computation becomes
the bottleneck and makes the method infeasible for cryptographic systems.
In this chapter we also make a distinction between fixed-base and non-fixed-base
exponentiation algorithms. Fixed-base exponentiation has the advantage that the
base is the same for every encryption, so the auxiliary table has to be
generated only once and we can allow its computation to take more time.
Non-fixed-base exponentiation does not have this advantage and has to compute a
new auxiliary table for every exponentiation.

5.3 Simultaneous multi exponentiation for non-fixed bases
5.3.1 Binary Left to Right-Method
The binary left-to-right method (ltr) appeared about 200 BC in Pingala's Hindu
book Chandah-sutra. The ltr method works by representing the exponent in its
binary form:

$$e = e_{b-1} 2^{b-1} + e_{b-2} 2^{b-2} + \dots + e_1 2 + e_0$$

Thus, the exponentiation $g^e$ can be written as:

$$g^e = g^{e_{b-1} 2^{b-1}} g^{e_{b-2} 2^{b-2}} \cdots g^{e_1 2} g^{e_0} = (\cdots((g^{e_{b-1}})^2 g^{e_{b-2}})^2 \cdots g^{e_1})^2 g^{e_0}$$

Thus, the binary representation of $e$ is evaluated from left to right, starting
at the most significant bit (MSB). This method is also known as the
square-and-multiply algorithm.


If we compute the simultaneous multi-exponentiation $g_1^{e_1} g_2^{e_2}$ in the
naive way, i.e. computing $g_1^{e_1}$ and $g_2^{e_2}$ separately and then
multiplying the results, we end up doing $2(b - 1)$ squarings and at most
$2(b - 1)$ multiplications. Rewriting the equation allows us to save $b - 1$
squarings compared to the naive way:

$$g_1^{e_1} g_2^{e_2} = \left(g_1^{e_{1,b-1} 2^{b-1}} g_1^{e_{1,b-2} 2^{b-2}} \cdots g_1^{e_{1,1} 2} g_1^{e_{1,0}}\right) \left(g_2^{e_{2,b-1} 2^{b-1}} g_2^{e_{2,b-2} 2^{b-2}} \cdots g_2^{e_{2,1} 2} g_2^{e_{2,0}}\right)$$
$$= (\cdots((g_1^{e_{1,b-1}} g_2^{e_{2,b-1}})^2 \, g_1^{e_{1,b-2}} g_2^{e_{2,b-2}})^2 \cdots g_1^{e_{1,1}} g_2^{e_{2,1}})^2 \, g_1^{e_{1,0}} g_2^{e_{2,0}}$$

The implementation of this algorithm scans the exponents starting from the MSB,
processing the most significant bit first and the least significant bit last.
Depending on whether the bits are 0 or 1, it either only squares the
intermediate product, or squares the intermediate product and multiplies it by
the corresponding base, as seen in Algorithm 1. It is also possible to scan the
bits from right to left; this requires additional storage, see [77] for details.
In tables and graphs we denote this algorithm as "ltr".

Algorithm 1 Left-to-right binary simultaneous exponentiation

Input: $g_1$, $g_2$, $e_1 = (e_{1,b-1}, \dots, e_{1,1}, e_{1,0})_2$, $e_2 = (e_{2,b-1}, \dots, e_{2,1}, e_{2,0})_2$ and $n$
Output: $g_1^{e_1} \cdot g_2^{e_2} \pmod{n}$
  g1g2 ← g1 · g2 (mod n)
  A ← 1
  for i from b − 1 down to 0
    A ← A^2 (mod n)
    if e1,i == 1 and e2,i == 1
      A ← A · g1g2 (mod n)
    else if e1,i == 1 and e2,i == 0
      A ← A · g1 (mod n)
    else if e1,i == 0 and e2,i == 1
      A ← A · g2 (mod n)
  return A

x = 232    1     1        1          0           1            0            0             0
y = 98     0     1        1          0           0            0            1             0
A          a     a^3·b    a^7·b^3    a^14·b^6    a^29·b^12    a^58·b^24    a^116·b^49    a^232·b^98
result     12    270      100        1080        205          770          235           590

Table 5.1: Example $12^{232} \cdot 35^{98} \pmod{1115}$ by means of the
left-to-right algorithm, which requires 7 squarings and 6 multiplications


Computation Efficiency
The simultaneous binary left-to-right exponentiation has a small pre-computation
stage which computes $g_1 \cdot g_2$, i.e. 1 multiplication, so that only one
multiplication is needed if $e_{1,i}$ and $e_{2,i}$ are both 1. Let the
exponents $e_1$ and $e_2$ be uniformly chosen and let both have $b$ bits. Then
the evaluation stage can be computed in $b - 1$ squarings and on average
$\frac{3}{4} b$ multiplications. In total the algorithm requires on average
$1 + \frac{3}{4} b$ multiplications and $b - 1$ squarings.
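The left-to-right simultaneous method can be written down compactly. The following is a minimal sketch of the technique (often called Shamir's trick), not the optimized implementation evaluated in this thesis; the function name is an assumption. The demo values $a = 12$, $b = 35$, $x = 232$, $y = 98$ and $n = 1115$ are the ones used in the example table.

```python
def simultaneous_ltr(g1, g2, e1, e2, n):
    """Compute g1^e1 * g2^e2 mod n, scanning both exponents from the MSB."""
    g1g2 = g1 * g2 % n                        # pre-computation: 1 multiplication
    b = max(e1.bit_length(), e2.bit_length())
    A = 1
    for i in range(b - 1, -1, -1):
        A = A * A % n                         # one squaring per bit position
        bit1, bit2 = (e1 >> i) & 1, (e2 >> i) & 1
        if bit1 and bit2:
            A = A * g1g2 % n
        elif bit1:
            A = A * g1 % n
        elif bit2:
            A = A * g2 % n                    # no multiplication when both bits are 0
    return A

print(simultaneous_ltr(12, 35, 232, 98, 1115))   # 590
```

A multiplication is skipped only when both exponent bits are 0, which happens with probability 1/4 for uniformly chosen exponents; this is where the average of $\frac{3}{4} b$ multiplications comes from.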

5.3.2 The $2^k$-ary Methods

The $2^k$-ary exponentiation algorithm computes the $e_i$-th power by writing
the exponent in the larger base $2^k$. This larger base lets the $2^k$-ary
method process several bits at a time and can be seen as moving a fixed-size
window over the bits of the exponent. Because the window evaluates multiple bits
at the same time, some pre-computation is required. The pre-computation produces
an auxiliary table which consists of the powers $1, g, g^2, \dots, g^{2^k - 1}$.
The general idea of this method was first introduced by Brauer [17] and later
generalized by Straus [76] to suit multi-exponentiation. The $2^k$-ary algorithm
can be modified to compute simultaneous products. The Paillier encryption
function uses a product of two powers; this requires two auxiliary tables, one
for each of $g_1$ and $g_2$, containing the powers of $g_i$, i.e.
$1, g_i, g_i^2, \dots, g_i^{2^k - 1}$. In tables and graphs we denote this
algorithm as "2kary".

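Algorithm 2 $2^k$-ary Method

Input: auxiliary tables $aux_{g_1}$ and $aux_{g_2}$, exponents $e_1$, $e_2$ of $b$ bits, window size $k$ and $n$
Output: $g_1^{e_1} \cdot g_2^{e_2} \pmod{n}$
  A ← 1
  for j from ⌈b/k⌉ − 1 down to 0 do
    A ← $A^{2^k}$ (mod n)
    A ← A · $aux_{g_1}[(e_{1,jk+k-1} \dots e_{1,jk+1} e_{1,jk})_2]$ · $aux_{g_2}[(e_{2,jk+k-1} \dots e_{2,jk+1} e_{2,jk})_2]$ (mod n)
  return A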

Computational Efficiency
By representing the exponents in a larger base of $k$ bits, the number of
multiplications is reduced to two per iteration, at the cost of constructing two
auxiliary tables which hold $2^k$ entries each. The computational cost for
building these tables is $2 \cdot 2^k - 4$ multiplications, as the first two
entries of each table are fixed (i.e. 1 and $g_i$). The evaluation costs are $b$
squarings and at most $2\lceil \frac{b}{k} \rceil$ multiplications.
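A straightforward version of the simultaneous $2^k$-ary method with two separate tables might look as follows. It is a sketch under assumptions: the names `build_table` and `simultaneous_2k_ary` are hypothetical, the tables are rebuilt on every call (as for non-fixed bases), and no attempt is made to skip multiplications by table entry 1.

```python
def build_table(g, k, n):
    """Auxiliary table [1, g, g^2, ..., g^(2^k - 1)] mod n."""
    table = [1] * (1 << k)
    for i in range(1, 1 << k):
        table[i] = table[i - 1] * g % n
    return table

def simultaneous_2k_ary(g1, g2, e1, e2, k, n):
    """Compute g1^e1 * g2^e2 mod n, consuming k bits of each exponent per step."""
    aux1, aux2 = build_table(g1, k, n), build_table(g2, k, n)
    b = max(e1.bit_length(), e2.bit_length())
    windows = -(-b // k)                      # ceil(b / k)
    mask = (1 << k) - 1
    A = 1
    for j in range(windows - 1, -1, -1):      # most significant window first
        for _ in range(k):
            A = A * A % n                     # A <- A^(2^k)
        A = A * aux1[(e1 >> (j * k)) & mask] % n
        A = A * aux2[(e2 >> (j * k)) & mask] % n
    return A

print(simultaneous_2k_ary(12, 35, 232, 98, 3, 1115))   # 590
```

With $k = 3$ the exponents 232 and 98 are split into the windows (3, 5, 0) and (1, 4, 2), so the loop runs three times instead of eight.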

x = 232    1 1      1 0 1                           0 0 0
y = 98     0 1      1 0 0                           0 1 0
A          a^3·b    (a^3·b)^(2^3) · a^5 · b^4       ((a^3·b)^(2^3) · a^5 · b^4)^(2^3) · 1 · b^2
A          a^3·b    a^29·b^12                       a^232·b^98
result     270      205                             590

Figure 5.1: Example of the $2^k$-ary algorithm, which requires 4 multiplications
and 6 squarings

5.3.3 $2^k$-ary matrix exponentiation

The $2^k$-ary matrix method is basically the same as the $2^k$-ary method,
except that its auxiliary table consists of $2^{2k}$ entries and can be viewed
as a square. The entries are built as follows: $aux[i][j] = g_1^i g_2^j$ with
$0 \le i, j < 2^k$. To find an entry in the auxiliary table, $k$ bits from each
exponent are used as indices. Doing this saves a multiplication for every
iteration in the main loop of the evaluation stage. In graphs and tables we
denote the algorithm as "2karym".
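$$
\begin{array}{llllll}
1 & g_1 & g_1^2 & g_1^3 & \cdots & g_1^{2^k-1} \\
g_2 & g_1 g_2 & g_1^2 g_2 & g_1^3 g_2 & \cdots & g_1^{2^k-1} g_2 \\
g_2^2 & g_1 g_2^2 & g_1^2 g_2^2 & g_1^3 g_2^2 & \cdots & g_1^{2^k-1} g_2^2 \\
g_2^3 & g_1 g_2^3 & g_1^2 g_2^3 & g_1^3 g_2^3 & \cdots & g_1^{2^k-1} g_2^3 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
g_2^{2^k-1} & g_1 g_2^{2^k-1} & g_1^2 g_2^{2^k-1} & g_1^3 g_2^{2^k-1} & \cdots & g_1^{2^k-1} g_2^{2^k-1}
\end{array}
$$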

Algorithm 3 $2^k$-ary Matrix algorithm

Input: auxiliary table $aux$, $n$, $b$ and $k$
Output: $g_1^{e_1} \cdot g_2^{e_2} \pmod{n}$
for $j = \lceil (b-1)/k \rceil$ down to 0 do
    $A \leftarrow A^{2^k}$
    $A \leftarrow A \cdot aux[e_{1,j} e_{1,j+1} \ldots e_{1,j+k-1}][e_{2,j} e_{2,j+1} \ldots e_{2,j+k-1}]$

Computational Efficiency
Computing the auxiliary table costs $2^{2k} - 3$ multiplications; three
multiplications are saved because 1, $g_1$ and $g_2$ are already known. The evaluation stage
takes $b$ squarings and at most $\lceil b/k \rceil$ multiplications. This saves $\lceil b/w \rceil$
multiplications in the evaluation stage with respect to the simultaneous $2^k$-ary
multi-exponentiation method.
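A minimal Python sketch of the matrix variant is given below. The names `build_matrix` and `simultaneous_kary_matrix` are hypothetical, and the table is filled in the most straightforward way rather than with the fewest possible multiplications.

```python
def build_matrix(g1, g2, k, n):
    """aux[i][j] = g1**i * g2**j mod n for 0 <= i, j < 2**k."""
    size = 1 << k
    aux = [[1] * size for _ in range(size)]
    for i in range(1, size):
        aux[i][0] = aux[i - 1][0] * g1 % n     # first column: powers of g1
    for i in range(size):
        for j in range(1, size):
            aux[i][j] = aux[i][j - 1] * g2 % n # extend each row with g2
    return aux


def simultaneous_kary_matrix(g1, e1, g2, e2, n, k=3):
    """Sketch: one table lookup and one multiplication per k-bit window."""
    aux = build_matrix(g1, g2, k, n)
    b = max(e1.bit_length(), e2.bit_length())
    mask = (1 << k) - 1
    A = 1
    for j in range(-(-b // k) - 1, -1, -1):
        A = pow(A, 1 << k, n)                  # k squarings
        A = A * aux[(e1 >> (j * k)) & mask][(e2 >> (j * k)) & mask] % n
    return A
```

Compared with the two-table version, the second multiplication per window disappears, at the cost of a table that grows with $2^{2k}$.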

5.3.4 Simultaneous Sliding Window method


The sliding window exponentiation method of Yen, Laih, and Lenstra [80] is an
improvement of the $2^k$-ary method. The method slices the binary
representation of $e$ into pieces using a window of length $w$ bits and processes
the pieces one by one. The strength of the algorithm comes from the fact that it
only processes odd numbers. Thus, if the number in the window is even, the window
"slides" until it becomes odd, skipping consecutive zeros in $e$.
Example: let $e$ be $3009 = (101111000001)_2$ and let the window width be 3; then the
windows are
101 111 00000 1.
The simultaneous sliding window exponentiation has two auxiliary tables with
odd entries only. This saves not only on the storage requirement but also CPU cycles
in the pre-computation stage. In tables and graphs we denote this algorithm as "ssw".

Bit   x = 232   y = 98   A                                                      result
1     1         0                                                               12
2     1         1                                                               270
3     1         1        $a^7 \cdot b^3$                                        100
4     0         0        $(a^7 \cdot b^3)^2$                                    1080
5     1         0        $((a^7 \cdot b^3)^2)^2 \cdot a$                        205
6     0         0        $(((a^7 \cdot b^3)^2)^2 \cdot a)^2$                    770
7     0         1        $((((a^7 \cdot b^3)^2)^2 \cdot a)^2)^2 \cdot b$        235
8     0         0        $(((((a^7 \cdot b^3)^2)^2 \cdot a)^2)^2 \cdot b)^2$    590

Table 5.2: Example of the Simultaneous Sliding Window method with
window width $w = 3$

Computational Efficiency
The simultaneous sliding window exponentiation algorithm is potentially very
fast, but it is difficult to state its complexity precisely. In the worst case
the algorithm performs as well as the simultaneous
$2^k$-ary exponentiation method. The computation of each auxiliary table requires
$2^{w-1} - 1$ multiplications and 2 squarings, as we first compute $g^2$ and then
iteratively compute $g^3 = g \cdot g^2, \ldots, g^r = g^{r-2} \cdot g^2$. Each base requires
storage for $2^{w-1} + 1$ entries.
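To make the window slicing and the odd-only table concrete, here is a small Python sketch for a single exponent; the simultaneous variant described above would keep one such table per base. The function names are hypothetical and the code is an illustration under these assumptions, not the thesis implementation.

```python
def sliding_windows(e, w):
    """Split e left to right into (value, length) pieces; value is odd or 0."""
    pieces = []
    i = e.bit_length() - 1
    while i >= 0:
        if not (e >> i) & 1:
            pieces.append((0, 1))              # a skipped zero bit
            i -= 1
            continue
        lo = max(i - w + 1, 0)
        while not (e >> lo) & 1:               # shrink until the window is odd
            lo += 1
        length = i - lo + 1
        pieces.append(((e >> lo) & ((1 << length) - 1), length))
        i = lo - 1
    return pieces


def sliding_window_pow(g, e, n, w=3):
    """Sketch: g**e mod n using a table of odd powers only."""
    g2 = g * g % n
    odd = {1: g % n}
    for v in range(3, 1 << w, 2):
        odd[v] = odd[v - 2] * g2 % n           # g^3 = g * g^2, g^5 = g^3 * g^2, ...
    A = 1
    for value, length in sliding_windows(e, w):
        A = pow(A, 1 << length, n)             # one squaring per consumed bit
        if value:
            A = A * odd[value] % n
    return A
```

For $e = 3009$ and $w = 3$ the non-zero pieces are 5, 7 and 1, which corresponds to the slicing 101 111 00000 1 of the example.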

5.3.5 Simultaneous Sliding Window Matrix method


The simultaneous sliding window matrix method is basically a combination of the
simultaneous sliding window method and the simultaneous $2^k$-ary matrix method.
The pre-computed table is a special matrix in which not every value is computed, as
consecutive zeros in both exponents are skipped. In addition, we save on the
computation of the auxiliary table because we do not have to evaluate windows in
which both numbers are even, i.e. we adjust our window size such that only odd-odd
or odd-even products are evaluated. Figure 5.2 shows the auxiliary table for a window
size of $w = 3$. The reason for also computing the even powers (e.g. $g_1^4$) is that
they are needed when multiplying with their odd counterpart (e.g. $g_1^4 g_1$). If we
detect only zeros for one exponent in the evaluation stage, the table can be used
as a normal $2^k$-ary window. In tables and graphs we denote this algorithm as
"sswm".

$$
\begin{array}{llllllll}
1 & g_1 & g_1^2 & g_1^3 & g_1^4 & g_1^5 & g_1^6 & g_1^7 \\
g_2 & g_1 g_2 & g_1^2 g_2 & g_1^3 g_2 & g_1^4 g_2 & g_1^5 g_2 & g_1^6 g_2 & g_1^7 g_2 \\
g_2^2 & g_1 g_2^2 & & g_1^3 g_2^2 & & g_1^5 g_2^2 & & g_1^7 g_2^2 \\
g_2^3 & g_1 g_2^3 & g_1^2 g_2^3 & g_1^3 g_2^3 & g_1^4 g_2^3 & g_1^5 g_2^3 & g_1^6 g_2^3 & g_1^7 g_2^3 \\
g_2^4 & g_1 g_2^4 & & g_1^3 g_2^4 & & g_1^5 g_2^4 & & g_1^7 g_2^4 \\
g_2^5 & g_1 g_2^5 & g_1^2 g_2^5 & g_1^3 g_2^5 & g_1^4 g_2^5 & g_1^5 g_2^5 & g_1^6 g_2^5 & g_1^7 g_2^5 \\
g_2^6 & g_1 g_2^6 & & g_1^3 g_2^6 & & g_1^5 g_2^6 & & g_1^7 g_2^6 \\
g_2^7 & g_1 g_2^7 & g_1^2 g_2^7 & g_1^3 g_2^7 & g_1^4 g_2^7 & g_1^5 g_2^7 & g_1^6 g_2^7 & g_1^7 g_2^7
\end{array}
\qquad (5.1)
$$

Figure 5.2: Auxiliary table for simultaneous sliding window matrix exponentiation
with $w = 3$

Evaluation Stage
The two exponents $e_1$ and $e_2$, with their corresponding bases $g_1$ and $g_2$, are
evaluated in a left-to-right manner; let the window width be $w$. The
loop distinguishes four different cases for the values in our two windows.
1. Both values in the windows are odd. We first raise the intermediate result to
the $2^w$-th power, $w$ being the size of the window. Then the intermediate result
is multiplied by the auxiliary table entry indexed by the two values that are
in our windows.
2. One of the values in the windows is even. We apply the same technique as when
both values are odd.
3. Both values are even and at least one of them is non-zero. The window
size is reduced by $l$ bits such that at least one of the windows becomes
odd. The intermediate result is raised to the power $2^{w-l}$ and multiplied by the
table entry indexed by both values of the smaller window.
4. Both windows are zero. The window expands by $r - 1$ bits, where $r$ is
the smallest number such that at least one of the exponent bits becomes 1. Then
the intermediate result is raised to the power $2^{w+r-1}$.
Example: let $e_1 = 353 = (101100001)_2$ and $e_2 = 385 = (110000001)_2$ and take
window size $w = 3$. Going from left to right, the first window is

$e_1$ = 101 100001
$e_2$ = 110 000001

and we get $g_1^5 g_2^6$; as 5 is odd we apply rule 2 and do a lookup in our auxiliary table.
In the next window,

$e_1$ = 101 100 001
$e_2$ = 110 000 001

we get $(100)_2 = 4$ and $(000)_2 = 0$. Because both are even, rule 3 applies
and we get $(g_1^5 g_2^6)^2 \cdot g_1 = g_1^{11} g_2^{12}$. The next window consists of zeros only and
is expanded by rule 4 (with $r = 2$), so the intermediate result is raised to the
power $2^{3+2-1} = 2^4$:

$e_1$ = 1011 0000 1
$e_2$ = 1100 0000 1

$(g_1^{11} g_2^{12})^{2^4} = g_1^{176} g_2^{192}$. Finally the window size can only be one, resulting in
$(g_1^{176} g_2^{192})^2 \cdot g_1 g_2 = g_1^{353} g_2^{385}$.

Computational Efficiency
Generating the auxiliary table takes $\left(\frac{2^w-2}{2} + 2^w\right)\frac{2^w+2}{2} - 3$ multiplications
and the table holds $\left(\frac{2^w-2}{2} + 2^w\right)\frac{2^w+2}{2}$ entries. For this algorithm it is hard to
pinpoint the exact number of operations per bit, but in the worst case it will perform
as well as the simultaneous $2^k$-ary matrix algorithm.
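As a sanity check of the entry count, the following Python sketch enumerates the index pairs of the table layout shown in Figure 5.2 and compares their number with the closed formula. The rule used to decide which cells exist (row 0 and odd rows are complete, even rows keep only column 0 and the odd columns) is read off the figure for $w = 3$ and assumed to generalize to other window sizes.

```python
def sswm_table_indices(w):
    """Index pairs (i, j) for which g1**i * g2**j is stored (layout of Figure 5.2)."""
    size = 1 << w
    return [(i, j)
            for j in range(size)               # j: power of g2 (row)
            for i in range(size)               # i: power of g1 (column)
            if j == 0 or j % 2 == 1 or i == 0 or i % 2 == 1]


def sswm_entry_formula(w):
    """Closed formula for the number of table entries."""
    return ((2**w - 2) // 2 + 2**w) * ((2**w + 2) // 2)
```

For $w = 3$ both counts give 55 entries, compared with 64 for the full $2^k$-ary matrix with $k = 3$.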

5.3.6 Unsigned Fractional Windows


The unsigned fractional window exponentiation was first introduced by Möller in
[54]. The algorithm is derived from the sliding window algorithm; it is generally
faster and better suited to devices with limited storage. The sliding window
algorithm uses an auxiliary table which holds $2^{w-1} + 1$ entries, while the device
may have space for some additional entries. For example, a sliding window
exponentiation with a window width of $w = 3$ has 5 entries while the device may have
space for 7. The unsigned fractional window exponentiation method is a solution for
this shortcoming.
The unsigned fractional windows algorithm maps an integer $e$ within a window
of width $w + 1$, $\{1, 2, \ldots, 2^{w+1}\} \to \{1, 3, \ldots, 2^w + r\}$ with $r$ being odd, by
using the following rules:

• if $e$ is even return 0

• otherwise if $0 \le e \le 2^w + r$, return $e$

• otherwise return $e - 2^w$

This results in a flexible way to use storage. In order to create a simultaneous
exponentiation variant of the unsigned fractional window method we create a variant
with two separate auxiliary tables and a variant using one auxiliary matrix. The
two-table variant holds $2^w + 2r$ entries, while the matrix variant holds
$\frac{(2^w+r-1)(2^w+r+3)}{2} + (2^w + r)\frac{2^w+r+3}{2}$ entries.
The computation for the two-table variant is basically the same as for the
simultaneous sliding window method, and the computation of the matrix variant is the
same as for the simultaneous sliding window matrix exponentiation method.
In graphs and tables we denote the two-table variant of the unsigned fractional
window algorithm as "ufrac" and the matrix variant as "ufracm".

Computational Efficiency
The storage requirement for the auxiliary tables is $2^{w+1} + 2 + 2r$ entries, while
generating the tables takes $2^{w+1} - 2 + 2r$ multiplications and one squaring. Because it
is a variant of the sliding window exponentiation algorithm, performance figures
are hard to pinpoint, but the worst case should be as expensive as the $2^k$-ary algorithm.

5.4 Fixed-base exponentiation


In the previous section we looked at algorithms where the cost of building the
auxiliary table has a large impact on the performance of the whole algorithm.
But if a base is "fixed", e.g. the algorithm uses the same base for multiple
encryptions, we can build the table once and store it on the computer. After the
initial penalty, performance becomes better.

5.4.1 Pre-computation of squares of g


If we allow pre-computation of larger tables we can speed up the evaluation
process. Pre-computing the whole message space, e.g. $g^i$ for $0 < i \le 2^M$, is
infeasible when the space is large; it is only an option for a small message space
such as $2^{16}$. Computing squares of $g$, on the other hand, is a possibility. The
exponent can be represented as a sum of powers of 2. Let $l$ be the size of the
message space $M$ and let $(e_{l-1}, \ldots, e_1, e_0)$ be the binary representation of an
integer $e$; then $e$ can be written as $\sum_{i=0}^{l-1} e_i 2^i$ with $e_i \in \{0, 1\}$. Thus, it is
possible to represent an exponentiation $g^e$ as $\prod_{i=0}^{l-1} g^{e_i 2^i}$ with $e_i \in \{0, 1\}$.
When the values $g^{2^i}$ are stored for $0 \le i < l$, it is possible to compute $g^e$ with at
most $l - 1$ multiplications.
Example: let the exponent be $e = 232 = (11101000)_2$; then $g^e$ can be computed as
$g^{128 \cdot 1} \cdot g^{64 \cdot 1} \cdot g^{32 \cdot 1} \cdot g^{16 \cdot 0} \cdot g^{8 \cdot 1} \cdot g^{4 \cdot 0} \cdot g^{2 \cdot 0} \cdot g^{1 \cdot 0} = g^{232}$. This takes only 3
multiplications. Compared with the previously described methods, this method
has the main advantage that the multiplications can be done independently, and so
it can better exploit the multi-core CPUs which are present in today's mainstream
computers. In addition, CPU level 2 caches are also growing rapidly, to 4, 6 and
sometimes 8 MB. So even for large message spaces, e.g. 4096 bits, the auxiliary
table will easily fit in today's level 2 CPU caches. Of course, computing large
tables initially takes quite some computational time. This time can only
be won back if the base is used several times for computation. More detail is
presented in the implementation discussion in Chapter 6. In graphs and tables we
denote the squares of $g$ algorithm as "sg".

Computational Efficiency
The auxiliary tables hold $2l$ entries and it takes $2l - 4$ squarings to generate both
tables. The expected computation is $l$ multiplications.
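A minimal Python sketch of the squares-of-$g$ idea for a single base is shown below; the function names are hypothetical. The table is built once for a fixed base and every later exponentiation only multiplies the entries selected by the 1 bits of the exponent.

```python
def squares_table(g, l, n):
    """table[i] = g**(2**i) mod n for 0 <= i < l, built by repeated squaring."""
    table = [g % n]
    for _ in range(1, l):
        table.append(table[-1] * table[-1] % n)
    return table


def fixed_base_pow(table, e, n):
    """Sketch: g**e mod n for e < 2**len(table), using multiplications only."""
    A = 1
    for i, entry in enumerate(table):
        if (e >> i) & 1:                       # bit i of e selects g**(2**i)
            A = A * entry % n
    return A
```

Because the selected entries are independent of each other, the product can be split over several threads, which is the property the text above points out.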

5.4.2 Fixed Base Comb method


The comb exponentiation algorithm is in fact a special case of Pippenger's
algorithm [13] and is often referred to as the Lim-Lee exponentiation method [47],
which in turn is based on the BGMW method [18]. The idea behind the comb method
is to divide the $n$-bit exponent $e$ into $h$ blocks $e_i$ of size $a = \lceil n/h \rceil$. Then the
blocks $e_i$ are divided into $v$ smaller blocks $e_{i,j}$ of size $r = \lceil a/v \rceil$. This puts the binary
representation of the exponent $e$ in a two-dimensional array, such that it can be written
as:

$$e = \sum_{i=0}^{h-1} \left( \sum_{j=0}^{v-1} e_{i,j} 2^{jr} \right) 2^{ia}$$

Using this representation we can also represent $g^e$ this way. Let $g_0 = g$ and
$g_i = g_{i-1}^{2^a} = g^{2^{ia}}$ for $0 < i < h$; then:

$$g^e = \prod_{i=0}^{h-1} g_i^{e_i} = \prod_{j=0}^{v-1} \prod_{i=0}^{h-1} \left( g_i^{2^{jr}} \right)^{e_{i,j}}$$

Now let $e_i = e_{i,a-1} \cdots e_{i,1} e_{i,0}$ be the binary representation of $e_i$ ($0 \le i < h$);
then $e_{i,j}$ is represented in binary as

$$e_{i,j} = (e_{i,jr+r-1}, \ldots, e_{i,jr+k}, \ldots, e_{i,jr+1}, e_{i,jr})_2$$

So we can write:

$$g^e = \prod_{k=0}^{r-1} \left( \prod_{j=0}^{v-1} \prod_{i=0}^{h-1} g_i^{2^{jr} e_{i,jr+k}} \right)^{2^k}$$

Storing the values of $g_i^{2^{jr} e_{i,jr+k}}$ in an auxiliary table results in a drastic speedup.


Auxiliary Table

The auxiliary table is a two-dimensional array of size $(a \times b) \times h$ and is constructed
in the following way:

$$G[0][i] = g_{h-1}^{e_{h-1}} g_{h-2}^{e_{h-2}} \cdots g_1^{e_1} g_0^{e_0}$$

$$G[j][i] = (G[j-1][i])^{2^r} = (G[0][i])^{2^{jr}}$$

Hence, index $i$ is equal to the binary representation of the value $e_{h-1} \cdots e_1 e_0$, and $g^e$
can be computed by:

$$g^e = \prod_{k=0}^{r-1} \left( \prod_{j=0}^{v-1} G[j][I_{j,k}] \right)^{2^k}$$

Evaluation Stage

Algorithm 4 Comb's Exponentiation Method

Input: auxiliary table $G[][]$, exponent $e$
    with $e_{i,j} = (e_{i,jr+r-1}, \ldots, e_{i,jr+k}, \ldots, e_{i,jr+1}, e_{i,jr})_2$ of length $r$
Output: $g^e \pmod{n}$

$A \leftarrow 1$
for $k \leftarrow r - 1$ down to 0
    $A \leftarrow A^2$
    for $j \leftarrow v - 1$ down to 0
        $A \leftarrow A \cdot G[j][I_{j,k}]$
return $A$
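The table construction and Algorithm 4 can be sketched in Python as follows, for a single base. The function names, the parameter `l` for the exponent bit length, and the handling of bit positions beyond a block (they are skipped) are assumptions made for illustration; the index $I_{j,k}$ is taken to be the number whose $i$-th bit is $e_{i,jr+k}$.

```python
def comb_table(g, n, l, h, v):
    """G[j][u] = prod_i g_i**(u_i * 2**(j*r)) mod n, with g_i = g**(2**(i*a))."""
    a = -(-l // h)                             # block size ceil(l / h)
    r = -(-a // v)                             # sub-block size ceil(a / v)
    gi = [g % n]
    for _ in range(1, h):
        gi.append(pow(gi[-1], 1 << a, n))      # g_i = g_{i-1}**(2**a)
    row0 = [1] * (1 << h)
    for u in range(1, 1 << h):
        low = u & -u                           # lowest set bit of u
        row0[u] = row0[u ^ low] * gi[low.bit_length() - 1] % n
    G = [row0]
    for _ in range(1, v):
        G.append([pow(x, 1 << r, n) for x in G[-1]])
    return G, a, r


def comb_pow(G, a, r, h, v, e, n):
    """Sketch of Algorithm 4: g**e mod n for e < 2**(h*a)."""
    A = 1
    for k in range(r - 1, -1, -1):
        A = A * A % n
        for j in range(v - 1, -1, -1):
            t = j * r + k                      # bit position inside each block
            if t >= a:
                continue                       # padding position, no bits there
            u = 0
            for i in range(h):
                u |= ((e >> (i * a + t)) & 1) << i
            A = A * G[j][u] % n
    return A
```

With `h = 4` and `v = 2` on a 32-bit exponent the loop performs 4 squarings and 8 table multiplications, against 32 squarings for the binary method; the price is a table of `v * 2**h` entries.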

Modifying the comb algorithm so that it computes a simultaneous
exponentiation amounts to replacing $A \cdot G[j][I_{j,k}]$ by a multiplication with an
additional entry from a second auxiliary table, in which all values for $g_2$ are stored.
The interesting part of this algorithm is that it can compute intermediate
results independently:

$$c_j = \prod_{k=0}^{b-1} \left( G_1[j][I_{j,k}] \cdot G_2[j][I_{j,k}] \right)^{2^k}$$

Finally, all the $c_j$ are multiplied into one result. This way the algorithm can
exploit thread-level parallelism. In tables and figures we denote the comb
exponentiation algorithm as "cmb".

Computation Efficiency
The pre-computation required for a product of two exponentiations with same-size
exponents is $2(a + b - 2)$ multiplications and $b - 1$ squarings; the space
required for storing the auxiliary tables is $2v(2^h - 1)$ entries. The computation
requires $vb$ multiplications and $v$ squarings.

Method    Storage (elements)                                              Average computation
ltr       1                                                               $(b-1)S + (\frac{3}{4}b)M$
2kary     $2^{k+1}$                                                       $(b-1)S + 2\lceil b/k \rceil M$
2karym    $(2^k)^2$                                                       $(b-1)S + \lceil b/k \rceil M$
ssw       $2^w + 2$                                                       $(b-1)S + \le 2\lceil b/w \rceil M$
sswm      $(\frac{2^w-2}{2} + 2^w)\frac{2^w+2}{2}$                        $(b-1)S + \le \lceil b/w \rceil M$
ufrac     $2^w + 2r$                                                      $(b-1)S + \le 2\lceil b/w \rceil M$
ufracm    $\frac{(2^w+r-1)(2^w+r+3)}{2} + (2^w+r)\frac{2^w+r+3}{2}$       $(b-1)S + \le \lceil b/w \rceil M$
sg        $M$                                                             $0.5b\,M$
cmb       $v(2^h - 1)$                                                    $(a + r - 2)M + vS$

Table 5.3: Comparison of different exponentiation methods for computing $g_1^{e_1} g_2^{e_2}$.
M denotes the cost of a multiplication, S denotes the cost of a squaring.

Chapter 6
Speedup, Results and Implementation

This chapter describes the implementation of the algorithms described in the
previous chapter. In addition we benchmark the encryption and decryption functions
of the Paillier cryptosystem. The systems are tested with different parameters
against different key sizes. Section 6.4 discusses the problem of timing attacks
against these implementations and shows how modifications can avoid these
problems. Finally, we give an overview of the application programming interface (API)
of the library.

6.1 Implementation

Prime Generation:
The Paillier cryptosystem as well as the DGK cryptosystem rely heavily on the
special properties of primes. Generating large prime numbers can be quite a
challenge. The most widely deployed algorithm for prime verification is the
Miller-Rabin primality test [66]. The Miller-Rabin algorithm only gives a probability
that a number is prime, which can have serious drawbacks. If Miller-Rabin declares a
composite number to be prime with high probability, the cryptosystem may
completely fail to work, or it may be seriously weakened in a way that makes recovering
the secret key easy. Even if the probability of such a failure is extremely small, it
is still present.
So, to construct proven primes our implementation uses Maurer's prime
generation algorithm [51], which uses the Pocklington primality test [64] to generate
primes. The Pocklington primality test gives a proof that a number is really a
prime, in contrast to the Miller-Rabin algorithm, which only gives a probability of
$1 - (1/4)^q$, with $q$ being the number of "witnesses" or bases tested.
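For reference, a textbook Miller-Rabin test can be sketched in Python as follows. This is an illustration of the probabilistic test discussed above, not the GMP routine used for the benchmarks; the `witnesses` parameter is a hypothetical addition that allows fixed bases to be supplied.

```python
import random


def miller_rabin(n, q=10, witnesses=None):
    """Return False if n is certainly composite, True if n is probably prime."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):             # cheap trial division first
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:                          # write n - 1 = d * 2**s with d odd
        d //= 2
        s += 1
    if witnesses is None:
        witnesses = [random.randrange(2, n - 1) for _ in range(q)]
    for a in witnesses:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue                           # this base does not expose n
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False                       # base a proves n composite
    return True
```

The composite 2047 = 23 · 89 passes the test for the single base 2 but is rejected once base 3 is added, which illustrates why the answer is only probabilistic and why more witnesses lower the error bound.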

Algorithm          512 mean       512 std. dev.   1024 mean      1024 std. dev.
Miller-Rabin 5     325645058      2847486         553169659      69327463
Miller-Rabin 10    175999995      17857889        946079386      54715946
Maurer             1770051847     756462737       1147004185     919299253
GDSA 1/4           89163917       28846268        178923001      41647527
GDSA 1/2           62921398       18585666        105310173      95310173

Table 6.1: Prime algorithms (1)

Algorithm          1536 mean      1536 std. dev.  2048 mean      2048 std. dev.
Miller-Rabin 5     1459935560     76291533        1663120660     129714853
Miller-Rabin 10    1262155120     81251638        1891359086     96596122
Maurer             23421423523    920463420       64381479468    3893758103
GDSA 1/4           498564939      365382853       1041051625     445671296
GDSA 1/2           457732074      221610033       990509089      781926682

Table 6.2: Prime algorithms (2)

To generate primes with special properties we use the GDSA [44] algorithm.
The algorithm takes a prime $p_0$ as input and generates a prime $p$ such that
$p = zp_0 + 1$. These primes are needed for both the subgroup variant of the Paillier
cryptosystem and the DGK cryptosystem. The GDSA algorithm uses the Miller-Rabin
primality test to check whether $zp_0 + 1$ is prime. Of course it is also possible for the
GDSA algorithm to use the Pocklington primality test.

Tables 6.1 and 6.2 present the timings of the Miller-Rabin algorithm provided
by GMP's mpz_probab_prime_p with 5 and 10 witnesses, of our implementation of
Maurer's algorithm, and of our implementation of GDSA, for generating primes of
512, 1024, 1536 and 2048 bits. The GDSA algorithm takes as input a prime
whose size is 1/4 or 1/2 of the target prime size.
The timings were taken on a 1.6 GHz AMD Athlon Neo X2 Dual Core Processor
L335 with 256 KB L2 cache per core and 2 GB of RAM. The system is running
Ubuntu 10.04 x86_64 with kernel 2.6.32-22-generic. Each timing was taken over
1000 runs.
In our key-generation process we generate prime numbers with the Maurer
prime generation algorithm; in cases where we need a GDSA prime $p$, we first
generate a Maurer prime $p_0$ which is fed to the GDSA algorithm. The consequence
of using Maurer's algorithm is that it takes significantly longer to generate primes
than when using the Miller-Rabin primality test, as seen in Tables 6.1 and 6.2.
It can be clearly seen that the Miller-Rabin algorithm with 5 and 10 witnesses
takes only a fraction of the CPU cycles needed by the Maurer prime
generation algorithm.

6.2 Benchmarking
6.2.1 Encryption
In this section the implementations of the algorithms of Chapter 5 are
benchmarked. As it is infeasible to benchmark every possible combination of key
length of $N$ and exponentiation parameter, the chosen key lengths
are 1024, 2048, 3072 and 4096 bits. These lengths are representative of
the interval of interesting key lengths, with the exception of 1024, which is
included for comparison and should not be used in a production setting for the
security reasons discussed in Sections 3.4.1 and 4.3.
For the algorithms in Sections 5.3.2 to 5.3.6 we use the values $2, \ldots, 8$ for
parameter $w$, as $w = 1$ is just the binary left-to-right
exponentiation method. Furthermore, for the squares of $g$ method we calculate
the full message space, and for the comb method we use the parameters {2, 1}, {4, 2}
and {8, 4}.
For the encryption of a message $m$ we distinguish three different scenarios.

1. The bases $g_1$ and $g_2$ are not fixed. This means these values are only used
once and the cost of generating the auxiliary tables is added to the total
cost of exponentiation. This applies if keys are used only once.

2. The base values $g_1$ and $g_2$ are fixed, i.e. used multiple times. This allows
us to pre-compute the auxiliary tables once and use them multiple times to
encrypt different messages. The cost of the auxiliary tables is not added
to the total cost of computation.

3. One of the two bases is fixed and the other one is not. We see such behaviour
in Paillier's main encryption scheme. The cost of generating the auxiliary
table of the non-fixed base is added to the total time of the computation,
while the cost of generating the auxiliary table of the fixed base is not.

The benchmarks shown in the next sections were performed on an AMD
Athlon Neo X2 Dual Core Processor L335 with 512 KB L2 cache and 2 GB RAM. The
system is running Ubuntu 10.04 x86_64 with kernel 2.6.32-22-generic in 64-bit
mode. For benchmarking we used cpucycles, which is part of eBACS [10], to
measure the number of CPU cycles used by the execution. The implementation
is written in C using the GMP library [37] for multi-precision arithmetic.


Benchmarks Non-fixed bases

For non-fixed bases we use the following exponentiation algorithms: binary
left-to-right, simultaneous $2^k$-ary exponentiation, simultaneous sliding window and
unsigned fractional windows (ufw), and all the matrix variants. To express the
window size, all algorithms (with the exception of the binary left-to-right algorithm)
have a number attached to their name (e.g. ssw4 is simultaneous sliding window
with a window size of 4 bits). The parameters used in the
graph are as follows:
In Figure 6.1 we see the total running time, which includes the cost
of generating the auxiliary table. It can be clearly seen that the unsigned fractional
windows matrix version has the best performance, using parameters $w = 3$ and
$m = 1$, for key lengths 2048 to 4096, while the best performing algorithm for key
length 1024 is the unsigned fractional window matrix variant with parameters
$w = 2$, $m = 1$. After that we see the same algorithm with parameters $w = 3$
and $m = 3$, followed by the simultaneous sliding window exponentiation matrix
variant with parameter $w = 3$, which performs second best for key length
4096. The matrix variants of the exponentiation algorithms have a clear
advantage over the other approaches.
It can also be clearly seen that the generation of the auxiliary table of the
$2^k$-ary matrix method with window width 8 takes up the most time. This is due to the
large number of elements that have to be generated, which takes almost $2^9$
multiplications for parameter 8. This problem is seen for all matrix variants of the
algorithms. Next we see the comb method with parameters $h = 8$ and $v = 4$ being
slowest; this is also due to the way the auxiliary table is built.

Benchmark fixed bases

In Figure 6.2 we show the different types of simultaneous multi-exponentiation
algorithms, with key sizes {1024, 2048, 3072, 4096} for $N$. It should be noted that
the actual encryption algorithm is using $N^2$. It can clearly be seen that the comb
method with parameters $h = 8$ and $v = 4$ runs fastest, closely followed by
squares of $g$ and another instance of comb running with parameters $h = 4$ and
$v = 2$. After these algorithms we see a cluster of algorithms that perform roughly
the same; these are the window algorithms, and as performance decreases the window
size of these algorithms descends to 2. While Figure 6.2 shows the timings of the
exponentiation algorithms for different key sizes, it does not take into
account the time needed to generate the auxiliary table(s).
In Appendix A we show the impact of unbalanced encryption, where $e_1 > e_2$
with $e_2$ respectively 1/2 and 1/4 the size of $|e_1|_2$. Figure A.1 shows the encryption
with $e_2$ half the size of $e_1$. Comparing this with Figure 6.2 we see that, although
all algorithms are faster overall, there are no major changes. The same holds for
Figure A.2, where $e_2$ is one fourth of the size of $e_1$.

[Figure 6.1: plot of CPU cycles (logarithmic axis, roughly $2^{23}$ to $2^{29}$) against the
key size of N in bits (1024, 2048, 3072, 4096). Legend: cmb21, cmb42, dsw2 to dsw7,
kary2 to kary8, karym2 to karym7, rtl, sg1, sg2, sswm3, ufrac21, ufrac31, ufrac33,
ufrac43.]

Figure 6.1: The total time in CPU cycles for generating auxiliary tables and
running time for different algorithms.

[Figure 6.2: plot of CPU cycles (logarithmic axis, roughly $2^{23}$ to $2^{27}$) against the
key size of N in bits (1024, 2048, 3072, 4096). Legend: cmb21, cmb42, cmb82, dsw2 to
dsw7, kary2 to kary8, karym2 to karym8, sswm3 to sswm7, ufracm31 to ufracm77, rtl,
sg1, sg2.]

Figure 6.2: The execution time in CPU cycles of different simultaneous algorithms
for different key sizes of $N$, with $e_1$ and $e_2$ of equal length (i.e. 1/2 the size
of $N$)


Benchmark fixed non-fixed bases

For benchmarking the combination of fixed and non-fixed bases, we pick the
fastest algorithm from our fixed-base benchmarks, which is the comb exponentiation
method, and combine it with the fastest non-fixed-base algorithm, unsigned
fractional windows. When combining both algorithms we save $v$ squarings of
the total computation, but incur an additional $b + a - 2$ multiplications. When
comparing this method against unsigned fractional windows matrix exponentiation,
the trade-off is pre-computation versus multiplications at running time.
To reduce the number of multiplications for the combination of comb and
unsigned fractional windows we select larger parameters for the comb method.
We choose them in such a way that the combination needs fewer multiplications in total
than the unsigned fractional windows matrix method.
In Figure 6.3 it can be seen that the combination of the comb exponentiation method
with parameters {8, 4} and unsigned fractional windows {3, 3} is slower than
the unsigned fractional window matrix method with parameters {3, 3}; this is due
to the extra computation of the comb method. Increasing the parameters of the
comb exponentiation method to {16, 8} or {32, 16} reduces the overall number of
multiplications.
In addition we show the computation of the naive exponentiation method, i.e.
computing the two exponentiations separately, with one exponentiation having a
fixed base, and multiplying the results. Clearly it can be seen that it is much slower.

6.2.2 Decryption Benchmarks


The decryption process for both the Paillier and DGK cryptosystems is basically one
exponentiation. In the case of Paillier this may also be two independent
exponentiations when the CRT method is used. The decryption benchmarks are taken
with the "regular" or single exponentiation algorithms, as the decryption uses the
ciphertext $c$, which is unknown beforehand. This means that using fixed-base
algorithms, such as squares of $g$ and comb, is infeasible
due to the high cost of building the auxiliary table.
Hence, the algorithms used for benchmarking the decryption are right-to-left
(rtl), sliding windows (sl) and unsigned fractional windows (ufw); the results can be
seen in Figure 6.4.
Figure 6.4 shows the exponentiation algorithms with a selected set of
parameters. It can be seen that the unsigned fractional window algorithm runs
best for all benchmarked key sizes. Also, the performance gain of choosing the
right parameter is very small when $w$ is between 2 and 5 with small values of $m$.
It should be noted that, for clarity, not all algorithms and parameters
are shown, only the best performing parameters and some
algorithms for comparison.

[Figure 6.3: plot of CPU cycles (left axis, roughly $2^{23}$ to $2^{26}$) and time in ms (right
axis, roughly 5 to 41) against the key size in bits (1024, 2048, 3072, 4096). Legend:
cmb3216-ufrac33, cmb84-ufrac33, cmb168-ufrac33, naive33, ufracm33.]

Figure 6.3: Execution time of fixed/non-fixed base exponentiation in CPU cycles for
a given key of size $N$


6.3 Implementation Details


In order to make the implementation as fast as possible we applied different
general techniques to speed up the computation. These techniques include:

• Branch avoidance: avoid jumps in code, which may cause wrong speculative
execution, resulting in a loss of potential CPU execution cycles.

• Contiguous memory: allocate memory such that all values are next to each
other. This gives a higher probability that the data is in cache compared to
when values are placed in a linked list.

• Make use of compiler optimizations.

• Function call avoidance: function calls cause overhead and a potential loss of
CPU cycles. In the code we try to have all computation in a single function
and make use of macros as a substitute for functions.

Additional speed is achieved by the fact that for some exponentiation
methods we can use multiple threads, i.e. use the CPU's full ability to compute.
Of course the creation of threads introduces additional overhead, but the
computational gain is much higher. This can clearly be seen in Figure 6.2 for the
functions sg1 and sg2, where sg1 is running a single thread and sg2 is running
two threads to compute a product of exponentiations.
Furthermore, speed is gained by applying the modulo operation between multiply
and squaring operations. This reduces the size of the intermediate value and cuts
the cost of the next operation.

6.4 Side Channel Attack Prevention


From the very first use of cryptography, people have tried to decrypt encrypted
messages in order to gain access to secret information. A cryptosystem can be
theoretically secure against cryptanalysis, but when implementing a cryptosystem
one faces threats other than mathematical attacks, such as side-channel
attacks (SCA). An SCA is an attack in which the adversary attempts to compromise
a cryptosystem by analyzing e.g. the time taken to execute a cryptographic
algorithm. Every operation takes time to execute, some more than others.
Cryptographic operations are built up as a list of CPU instructions. An instruction such
as add or shift may run in fewer CPU cycles than multiply or divide. Thus, timings
of cryptographic operations may differ. To prevent an adversary from
learning information by means of SCA we must implement the different operations
in such a manner that the information obtained from the side channels is
useless, i.e. it does not reveal information about the key or the message.

The field of side-channel attacks is relatively new and there is no complete
theory of side-channel analysis. However, many studies have been conducted that break
different cryptosystems [31, 58, 63, 12] using side-channel analysis, and many
remedies have been suggested for different cryptosystems, such as [14, 20]. In this
section we present countermeasures against simple SCA. As this thesis focuses on
software implementations, we take only software side channels into account and
we discard the possibility of an invasive attack such as power or electromagnetic
analysis (EMA) attacks.

Simple SCA
In a simple SCA (SSCA) the adversary obtains information from a single
exponentiation. To harden our exponentiation methods against SSCA we must make
the observable information independent of the secrets, such as messages,
nonces and private keys, in such a way that the adversary only sees a fixed
sequence of operations that cannot be linked to the bits of the processed secrets.
Looking back at Chapter 5 we can distinguish two key problems that reveal
SSCA information to the adversary:

• Arithmetic instructions: the time difference between two operations.

• Table lookups: the time needed to fetch a table entry from main
memory into the CPU's cache memory can differ and can be observed.

One solution against SSCA for arithmetic instructions is to insert dummy
instructions, such that the adversary can only observe uniform operations. The
drawback of inserting dummy operations is that simple operations may take as
long as complex ones. For example, consider the code snippet in Figure 6.5,
which is the core of the binary left-to-right exponentiation algorithm. The
intermediate result is A, the base is g and e is the exponent. It can clearly
be seen that the extra multiplication depends on our secret e. This extra
multiplication can leak SSCA information to the adversary.
To remedy this problem, the multiplication is always executed, see Figure 6.6.
The bit $e_i$ is then used to select either just the squaring of the
intermediate result (B) or the squaring and multiplication (C). The selection
of either B or C happens in a time-invariant way: the new intermediate result A
is the bitwise combination of two products (the XOR in Figure 6.6, which here
acts as a logical OR), one of which is always zero because it is multiplied by
either $e_i$ or its logical NOT $\bar{e}_i$. The selection therefore always
takes a fixed amount of time and does not leak any information to the
adversary. This technique can be applied to all the algorithms discussed in
Chapter 5, although some, such as the fixed windowing exponentiation methods,
need more dramatic changes than others.
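The difference between Figure 6.5 and Figure 6.6 can be sketched in Python as
follows. This is only an illustration of the data flow: the function names are
hypothetical, and Python's arbitrary-precision integers are not constant-time,
so the sketch shows the selection logic rather than a hardened implementation.

```python
def modexp_branchy(g, e, n, bits):
    """Figure 6.5 style: the multiplication happens only when the bit is 1."""
    a = 1
    for i in range(bits - 1, -1, -1):
        a = (a * a) % n
        if (e >> i) & 1:
            a = (a * g) % n
    return a


def modexp_uniform(g, e, n, bits):
    """Figure 6.6 style: always square and multiply, then select arithmetically."""
    a = 1
    for i in range(bits - 1, -1, -1):
        bit = (e >> i) & 1
        b = (a * a) % n          # B = SQR(A)
        c = (b * g) % n          # C = MULT(B, g)
        # Exactly one of the two products is zero, so XOR acts as a selector.
        a = ((1 - bit) * b) ^ (bit * c)
    return a
```

Both functions return the same value; only the second performs the same
sequence of operations for every exponent bit.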
To remedy SCA for lookups in auxiliary tables, every lookup must be done
uniformly, which requires that every entry in the auxiliary table passes
through the CPU's arithmetic logical unit (ALU). The method for doing this is
basically the same as selecting the value depending on $e_i$, and pseudo code
can be seen in Figure 6.7. Here $e_i$ denotes the secret index of the table
entry that is needed. For every index $i$ of the auxiliary table the difference
$(i - e_i)$ is computed. If $i$ does not match $e_i$, the difference is
non-zero, its logical NOT is zero, and the table entry is multiplied by zero.
If $i$ matches $e_i$, the logical NOT is one and the table entry is stored in
E. The entry E can then be used in a safe and secure way by the algorithm.
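A minimal Python sketch of the lookup in Figure 6.7 is given below. The function
name is hypothetical and the table entries are assumed to be non-negative
integers; as before, Python itself gives no timing guarantees, so this only
illustrates that every entry is touched regardless of the secret index.

```python
def lookup_uniform(table, secret_index):
    """Return table[secret_index] while processing every entry of the table."""
    e = 0
    for i, entry in enumerate(table):
        # match is 1 when i == secret_index and 0 otherwise,
        # mirroring the logical NOT of (i - e_i) in Figure 6.7.
        match = int(not (i - secret_index))
        e |= match * entry
    return e
```

The cost of this approach grows linearly with the number of table entries,
which is why large tables become expensive once the countermeasure is enabled.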

for (i = b−1 down to 0)
{
    if (e_i == 1)
    {
        A = SQR(A)
        A = MULT(A, g)
    } else {
        A = SQR(A)
    }
}

Figure 6.5: The core of the binary left-to-right exponentiation method, which
is prone to SSCA.

for (i = b−1 down to 0)
{
    B = SQR(A)
    C = MULT(B, g)
    A = (!e_i ∗ B) ^ (e_i ∗ C)
}

Figure 6.6: Timing invariant version of the binary left-to-right exponentiation
method, which performs exponentiation in constant time.

for (i = 0; i < l; i++)
    E = E OR (!(i − e_i) ∗ TABLE[i])

Figure 6.7: Algorithm for timing invariant table lookups

In Figure 6.8 we show the impact of both the dummy operations and the uniform
table lookup for the binary left-to-right, $2^k$-ary (matrix) and comb methods.
The exponentiation algorithms with protection against SSCA are denoted with asc
(anti side channel); for example, cmb42asc is the comb exponentiation algorithm
with parameters 4 and 2. We display only the results for fixed-base
exponentiation (the setting in which an auxiliary table is used), as this shows
the effect most clearly. It can be seen that these countermeasures have an
impact on performance. The comb exponentiation method, while still being the
fastest method, takes almost half as much time again at a key length of
3072 bits.
In addition, the figure shows a "v"-shape for the $2^k$-ary matrix method, with
the turning point at $k$ between 4 and 5. This is due to the large number of
entries in the auxiliary table.

6.5 API description


In this section we describe the application programming interface (API) of our
implementation. The implementation provides numerous functions, which include
low level functions such as prime generation and exponentiation algorithms, as
well as high level functions for simple integration of the Paillier and DGK
cryptosystems. In this section we only highlight the high level functions;
readers who are interested in the low level functions are advised to download
the source code, which will soon be published on [48].
The APIs of the Paillier and DGK cryptosystems are basically the same. All
methods relating to the Paillier system start with "paillier_" and all methods
relating to the DGK system start with "dgk_". After the cryptosystem prefix,
all methods name their type of implementation. For the Paillier scheme this is
"main_" or "sub_", and for DGK this is "sms_" for the small message space
variant and "prp_" for the proper decryption variant.
After the cryptosystem and the variant, the key size is appended to the
function name. The following key sizes are predefined: 1024, 2048, 3072 and
4096. Although the API provides predefined 1024-bit functions, we do not
recommend them for use in other systems; they should only be used for
comparison purposes.
After the key size we define the following postfix functions:

genkey() This function generates all necessary parameters, such as primes and
special primes, and returns the public and private key pair needed for the
cryptosystem.

keyinit() This function takes a public key as parameter. Depending on which
variant of the cryptosystem is used, it builds an auxiliary table for all
fixed bases, using parameters appropriate for the key size.

encrypt() This function takes two parameters, a message m and a public key. The
function generates a random number r which is used to encrypt m using
simultaneous multi-exponentiation, with predefined parameters that depend on
the size of the key and on whether there is a non-fixed base. In addition this
function uses countermeasures against SSCA. It returns the ciphertext c of the
encryption of m.

encrypt_unsafe() This function is basically the same as encrypt(), except that
it does not provide countermeasures against SSCA and is faster than encrypt().

decrypt() This function takes two parameters, a ciphertext c and a private key,
and decrypts c, returning the message m. Depending on the key size and the
variant of the cryptosystem, the function uses the optimal parameters for
decryption. In addition the function has countermeasures against SSCA.

decrypt_unsafe() This function provides the same functionality as decrypt(),
with the exception of having no countermeasures against SSCA.

Example: If we want to encrypt a message m under public key k using the Paillier
subgroup cryptosystem with a 3072-bit key, we use the function
paillier_sub_3072_encrypt(m, k). As another example, if we want to use the DGK
system as fast as possible, without caring about SSCA, with a 2048-bit key, we
would use the function dgk_sms_2048_encrypt_unsafe(m, k).
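The naming scheme can be summarised with a small helper that composes a function
name from its parts. The helper below is purely hypothetical and not part of
the library; it assumes that the parts are joined by underscores and only
encodes the prefixes, variants, key sizes and postfixes listed in this section.

```python
VARIANTS = {"paillier": ("main", "sub"), "dgk": ("sms", "prp")}
KEY_SIZES = (1024, 2048, 3072, 4096)
OPERATIONS = ("genkey", "keyinit", "encrypt", "encrypt_unsafe",
              "decrypt", "decrypt_unsafe")


def api_function_name(system, variant, key_size, operation):
    """Compose a function name following the naming scheme of this section."""
    if variant not in VARIANTS.get(system, ()):
        raise ValueError(f"unknown variant {variant!r} for {system!r}")
    if key_size not in KEY_SIZES:
        raise ValueError(f"no predefined functions for {key_size}-bit keys")
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation {operation!r}")
    return f"{system}_{variant}_{key_size}_{operation}"
```

For instance, the arguments ("dgk", "prp", 4096, "decrypt") describe the SSCA
protected decryption of the proper decryption variant of DGK with a 4096-bit
key.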

6.6 Comparison
When comparing our implementation of simultaneous multi-exponentiation, e.g.
$g_1^{e_1} g_2^{e_2} \pmod{N}$, to other implementations such as Horn's work
[39], we see quite some differences. In [39] we see two basic approaches:
1) computing the two exponentiations $g_1^{e_1}$ and $g_2^{e_2}$ in the naive
way, i.e. computing both exponentiations separately and then multiplying the
results, and 2) pre-computing the squares of $g$, also known as aggressive
caching. The second approach performs quite well for a fixed base, but the
strategy fails horribly when computing fixed-non-fixed exponentiations, as the
pre-computation for the non-fixed base has to happen every time. Combining the
two approaches to compute fixed-non-fixed bases is an option. But as the two
exponentiations are computed individually, this will be slower than our
implementation, where we combine the comb method and the ufrac method in such a
way that we save squarings, as seen in Section 6.2.1. For comparison reasons we
have included the naive exponentiation of unsigned fractional windows in
Figure 6.3.
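To illustrate why sharing squarings helps, the sketch below contrasts the naive
approach with the simple interleaved binary method (often called Shamir's
trick). This is not the comb and ufrac combination used in this work; it is a
simpler stand-in, with hypothetical function names, that shows a single chain
of squarings serving both exponents.

```python
def multi_exp_naive(g1, e1, g2, e2, n):
    """Two separate exponentiations followed by one multiplication."""
    return (pow(g1, e1, n) * pow(g2, e2, n)) % n


def multi_exp_interleaved(g1, e1, g2, e2, n):
    """Compute g1^e1 * g2^e2 mod n with a single shared chain of squarings."""
    g12 = (g1 * g2) % n                      # precomputed product of both bases
    table = {(0, 1): g2 % n, (1, 0): g1 % n, (1, 1): g12}
    a = 1
    for i in range(max(e1.bit_length(), e2.bit_length()) - 1, -1, -1):
        a = (a * a) % n                      # one squaring serves both exponents
        bits = ((e1 >> i) & 1, (e2 >> i) & 1)
        if bits != (0, 0):
            a = (a * table[bits]) % n
    return a
```

The naive version performs roughly twice as many squarings as the interleaved
version for exponents of equal length.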
When comparing our Paillier implementation to that of the Virtual Ideal
Functionality Framework (VIFF), a software package for secure multi-party
computation, version 0.7.1 (viff-51167e387cc3), in which the Paillier
cryptosystem is implemented in Python using the gmpy extension for the GMP
library, we see a huge difference in performance in favour of our work; see
Table 6.3.


Keysize in bits    VIFF version 0.7.1    This work    Factor
1024               0.275 s               0.0008 s     356×
2048               1.75 s                0.0047 s     372×
3072               5.46 s                0.013 s      420×
4096               12.5 s                0.027 s      463×

Table 6.3: Comparison between the VIFF Paillier implementation and our Paillier
implementation.

[Figure 6.4 (plot): CPU cycles (logarithmic axis) and time in ms versus keysize
in bits (1024, 2048, 3072, 4096) for the methods ltr, kary2, kary3, kary4,
kary5, kary6, kary7, ssw3, ssw4, ufrac21, ufrac31, ufrac33, ufrac35, ufrac41,
ufrac43 and ufrac45.]

Figure 6.4: Execution time of single non-fixed base exponentiation in
milliseconds and CPU cycles as a function of the keysize.


[Figure 6.8 (plot): CPU cycles (logarithmic axis) versus keysize of N in bits
(1024, 2048, 3072, 4096) for the methods karm2asc, karm3asc, karm4asc,
karm5asc, karm6asc, karm7asc, karm8asc, cmb42asc, cmb84asc, kary7asc, kary8asc,
karm2, karm3, karm7, karm8, cmb42, cmb84, kary7 and kary8.]

Figure 6.8: Execution time of different exponentiation methods with and without
protection against SSCA.

Chapter 7
Secure Multi-Party Computation

7.1 Introduction
Secure multi-party computation is a cryptographic technique allowing n parties
to jointly compute the result of a function f (x1 , x2 , ..., xn ) while ensuring that
the input xi of each party Pi is kept private, even with a number t of the parties
cheating. The only information that is allowed to be revealed is the result of the
function.
A classic example of such an application is the millionaire’s problem [78]:
A group of n millionaires wish to figure out who is the richest, but no single
millionaire wants to disclose the magnitude of his fortune in fear of humiliation.
Finding out who is the richest would normally not be possible without a trusted
party, but using secure multi-party computation the millionaires can safely com-
pute the result without a trusted party. This is a toy problem, but the technique
enables several important applications such as e-voting, secure auctions, secure
online gaming, and secure data mining.
In the 80s it was proved that secure multi-party computation could in fact be
applied to any computable function, making it an extremely general and useful
technique, at least in theory. This was first done by Yao [78] in the restricted
case of two parties, but soon followed by similar results for the general case of n
parties [5, 21]. These results were, however, mostly of theoretical interest due to
the complexity of the protocols.
Since then a large body of results has been obtained using different security
and adversary models and underlying network assumptions, along with
improvements of previously known results.
In recent years, the theory has advanced enough to allow practical
implementations of secure multi-party computation. Examples of practical
systems which support evaluation of general multi-party computation are the
FairPlay [49], VIFF [1], ShareMind [15], and SIMAP [16] systems. However, many
applications are still infeasible in practice, especially those that rely on
quick response times like online auctions. Also, in order to be practical, the
aforementioned systems tend either to be restricted to a limited number of
parties or to loosen the security model. Examples of the latter are assuming
that the corrupted parties do not deviate from the protocol (the passive
security model) or that at most a certain threshold t of parties gets corrupted
(the threshold security model).
This chapter explains in detail the secure multi-party computation protocol
presented by Orlandi. In addition we show how to modify the protocol in order
to gain computational speedup. We also describe our implementation of the
Orlandi protocol. Finally we discuss the benchmarks of these modifications.
This part is joint work with Thomas P. Jakobsen and Janus Dam Nielsen and has
appeared at Applied Cryptography and Network Security 2010, Beijing, China.

7.2 Verifiable Secret Sharing


The notion of verifiable secret sharing (VSS) was first introduced by Chor et
al. [22]. We say that a shared secret is verifiable if auxiliary information is
included that allows players to verify that their shares are consistent. More
formally, VSS ensures that even if a player is malicious there is a
well-defined secret that the other players can later reconstruct. VSS consists
of two phases, a sharing phase and a reconstruction phase.

Sharing phase In this phase each player generates some special values and some
sort of commitment to these values, and distributes them among all players.

Reconstruction phase After receiving (possibly modified) shares from the
co-players, the player can check with a certain probability whether a share has
been tampered with and discard it if necessary.

7.3 Orlandi Protocol


The Orlandi protocol is a VSS scheme, which is secure against a dishonest ma-
jority, augmented with a protocol for generating random shared multiplicative
triples, based on a homomorphic cryptosystem.
The Orlandi protocol is part of the Virtual Ideal Functionality Framework
(VIFF) and VIFF describes itself as follows:

VIFF allows you to do secure multi-party computations, in which a number of
parties (three or more at the moment) execute a cryptographic protocol to do
some joint computation. The computation could be anything, but elections and
auctions are good examples of what you would want to do with secure multi-party
computations (SMPC or simply MPC if it is implied that the protocol is secure).


This framework is part of a European FP7 project called Computer Aided
Cryptography Engineering (CACE). This section presents the Orlandi secure
multi-party computation scheme in detail. In addition we discuss the modified
Orlandi protocol that speeds up computation. The Orlandi protocol is integrated
into VIFF, which makes use of the homomorphic schemes described in Chapters
3 and 4.
We introduce the protocol in this section by giving a high-level description
of the protocol and a detailed account of the parts of the protocol which have
been the target of our optimizations. A secret $x \in \mathbb{Z}_p$ is shared
in the Orlandi protocol using additive secret sharing. Every party of the
computation holds a share $x_i$ of the secret, shares of two additively secret
shared elements $\rho_1$ and $\rho_2$ chosen uniformly at random in
$\mathbb{Z}_p$, and a public commitment $C$. The two random elements $\rho_1$
and $\rho_2$ are needed in order to compute the commitment to the secret, and
the commitment is used when reconstructing the secret to check that no party
contributed a wrong share.
The commitment is computed using a double trapdoor Pedersen commitment scheme
[62] based on the hardness of the discrete logarithm in a group $G$ of order
$p$ generated by $g$. The commitment $C$ is computed as
$C = \mathrm{Com}(x, \rho_1, \rho_2) = g^{x} h_1^{\rho_1} h_2^{\rho_2}$, where
$h_i = g^{t_i}$ for $i \in \{1, 2\}$ and $t_i$ is a trapdoor. We denote
$h_1, h_2$ as the public key of the commitment scheme. This makes a share in
the Orlandi protocol a four-tuple
($\mathbb{Z}_p \times \mathbb{Z}_p \times \mathbb{Z}_p \times C$), consisting of
the share of the secret, $x_i \in \mathbb{Z}_p$, two uniformly randomly chosen
numbers $\rho_1, \rho_2 \in \mathbb{Z}_p$, and a commitment to the secret
computed as $\mathrm{Com}(x, \rho_1, \rho_2)$. We write a share of the secret
$x$ as $[x]$.
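The commitment and its additive homomorphism can be sketched with toy
parameters as follows. The sketch assumes a small multiplicative subgroup of
integers modulo a safe prime instead of the elliptic curve group used in
practice; the constants P, Q, G and all function names are illustrative
assumptions, and the parameters are far too small to be secure.

```python
import secrets

P = 2039          # toy safe prime, P = 2*Q + 1 (assumption, insecure size)
Q = 1019          # prime order of the subgroup, plays the role of p in the text
G = 4             # generates the subgroup of order Q


def keygen():
    """Return the public key (h1, h2) and the trapdoors (t1, t2)."""
    t1 = 1 + secrets.randbelow(Q - 1)
    t2 = 1 + secrets.randbelow(Q - 1)
    return (pow(G, t1, P), pow(G, t2, P)), (t1, t2)


def commit(x, rho1, rho2, pk):
    """C = g^x * h1^rho1 * h2^rho2."""
    h1, h2 = pk
    return (pow(G, x, P) * pow(h1, rho1, P) * pow(h2, rho2, P)) % P


def add_openings(u, v):
    """Component-wise sum modulo Q of two openings (x, rho1, rho2)."""
    return tuple((a + b) % Q for a, b in zip(u, v))
```

Multiplying two commitments yields a commitment to the sum of the committed
values, which is what allows the product of the parties' public commitments to
serve as a commitment to the shared secret.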
The protocol is secure in the Common Reference String (CRS) Model [19],
and a proof of the security is sketched in Orlandi's PhD progress report [59]
under the assumption of the hardness of the discrete logarithm problem, a secure
broadcast protocol, and the semantic security of the homomorphic cryptosystem.
The security of the protocol holds, up to the security level $2^{-s}$, if $\mu$
and $d$ are chosen such that
$$s < d \log_2(M) + (d + 1)\log_2(\ln(1 + \mu)) + 2,$$
where $M$ is the number of multiplicative triples needed for a given
computation. We refer the reader to Orlandi's PhD progress report for the
intuition behind the above expression. The variables $\mu$ and $d$ are used in
the definition of the commands below.
The protocol can be divided into two parts: a preprocessing part where mul-
tiplicative triples are generated and an online part where arithmetic expressions
are evaluated. The online part provides the commands one would usually ex-
pect from a VSS scheme such as commands for sharing a given secret Input(x),
reconstructing a secret Open([x]), creating a random secret Rand(), addition, sub-
traction, and multiplication (Mul([x], [y], [a], [b], [c])) of shared numbers.
We will not explain these commands further, except for the multiplication
command. We instead refer the reader to Orlandi’s PhD progress report.


The preprocessing part is divided into a number of building blocks
(leak-tolerant multiplication, triple generation, and triple test), which are
composed into the final triple generating (random triple generation)
functionality which produces a list of triples. We will describe online
multiplication, triple generation, and random triple generation below.

Basic Multiplication We define the multiplication of the shares [x] and [y] as
[z] = Mul([x], [y], [a], [b], [c]) where we assume that the parties are given
a random triple ([a], [b], [c]) s.t. c = a · b from an honest dealer. The
multiplication is realized as follows:

1. d = Open([x] − [a]), e = Open([y] − [b]),
2. [z] = e[x] + d[y] − de + [c].

The basic multiplication is used both as a building block in the preprocessing
and for performing online multiplications. This is the main reason why
multiplicative triples are generated in the preprocessing: they can then be
used in online multiplications. It also indicates that one multiplication
requires one triple.
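The two steps of the basic multiplication can be checked with a small
simulation over additive shares. The sketch below leaves out the randomness and
commitments that are part of a full Orlandi share, uses a toy prime modulus as
a stand-in for p, and all names are illustrative assumptions.

```python
import secrets

P = 2**61 - 1     # toy prime modulus standing in for p (assumption)


def share(value, n):
    """Split value into n additive shares modulo P."""
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % P)
    return parts


def open_shares(parts):
    """Reconstruct a secret by adding all shares."""
    return sum(parts) % P


def beaver_mul(xs, ys, a_s, bs, cs):
    """Multiply shared x and y using a shared triple with c = a * b."""
    d = open_shares([(x - a) % P for x, a in zip(xs, a_s)])   # d = x - a
    e = open_shares([(y - b) % P for y, b in zip(ys, bs)])    # e = y - b
    zs = [(e * x + d * y + c) % P for x, y, c in zip(xs, ys, cs)]
    zs[0] = (zs[0] - d * e) % P        # the public constant d*e is subtracted once
    return zs
```

Expanding e·x + d·y − d·e + a·b with d = x − a and e = y − b gives x·y, so
opening the resulting shares yields the product while only the masked values d
and e are revealed.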

Triple Generation The triple generation command TripleGen() creates a
multiplicative triple which is shared among the parties. The triple generation
is realized as follows:

1. Every party $P_i$ chooses
   $(a_i, r_{i1}, r_{i2}) \in_R \mathbb{Z}_p \times \mathbb{Z}_p \times \mathbb{Z}_p$,
   computes $\alpha_i = \mathrm{Enc}_{ek_i}(a_i)$,
   $A_i = \mathrm{Com}(a_i, r_{i1}, r_{i2})$, and broadcasts them
2. Every party $P_j$ does:
   (a) choose
       $(b_j, s_{j1}, s_{j2}) \in_R \mathbb{Z}_p \times \mathbb{Z}_p \times \mathbb{Z}_p$
   (b) compute $B_j = \mathrm{Com}(b_j, s_{j1}, s_{j2})$ and broadcast it
   (c) Party $P_j$ does, for every other party $P_i$:
       i. choose $d_{i,j} \in_R \mathbb{Z}_{p^3}$
       ii. compute and send
           $\gamma_{i,j} = \alpha_i^{b_j} \, \mathrm{Enc}_{ek_i}(1; 1)^{d_{i,j}}$
           to $P_i$
3. Every party $P_i$ does:
   (a) compute
       $c_i = \sum_j \mathrm{Dec}_{sk_i}(\gamma_{i,j}) - \sum_j d_{i,j} \bmod p$
   (b) pick $(t_{i1}, t_{i2}) \in_R \mathbb{Z}_p \times \mathbb{Z}_p$, compute
       and broadcast $C_i = \mathrm{Com}(c_i, t_{i1}, t_{i2})$
4. Everyone computes
   $(A, B, C) = (\prod_i A_i, \prod_i B_i, \prod_i C_i)$
5. Every party $P_i$ outputs:
   $([a]_i, [b]_i, [c]_i) = ((a_i, r_{i1}, r_{i2}, A_i), (b_i, s_{i1}, s_{i2}, B_i), (c_i, t_{i1}, t_{i2}, C_i))$


TripleGen() generates a triple by having each party first choose random shares
of [a] and [b], including the needed randomness and the commitments. Second,
each party $P_i$ encrypts his share $a_i$ as $\mathrm{Enc}_{ek_i}(a_i)$ using
his public key $ek_i$ and a homomorphic cryptosystem. He then broadcasts the
encrypted share, the corresponding commitment, and the commitment for $b_j$.
The shares of the product [c] = [a] · [b] are computed by using the homomorphic
property of the received encrypted values to multiply the shares $a_i$ and
$b_j$. Each product is then masked with some randomness $d_{i,j}$ and sent. The
share $c_i$ is then computed by decrypting the product shares as
$\mathrm{Dec}_{sk_i}(\gamma_{i,j})$, adding them up and subtracting the
randomness, where $sk_i$ is the private key of party $i$. The computation
inside encrypted values gives rise to the requirement that the modulus of the
cryptosystem must be much larger than the modulus $p$ of the shares and the
commitment scheme. This is not an issue in practice, because the key size of a
factorization based cryptosystem is usually much bigger than the order of the
group of points on an elliptic curve, if the same level of security is to be
obtained.
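The masking in step 2c can be illustrated with a toy version of the standard
Paillier scheme with g = N + 1. The primes, the share modulus and all names
below are assumptions chosen for readability and offer no security; the sketch
only shows that the ciphertext sent to the other party decrypts to the masked
product.

```python
P1, P2 = 2**31 - 1, 2**61 - 1      # toy primes (assumption); real keys are far larger
N = P1 * P2
N2 = N * N
PHI = (P1 - 1) * (P2 - 1)
MU = pow(PHI, -1, N)               # modular inverse, needs Python 3.8+
P_SHARE = 1019                     # share modulus p, much smaller than N


def enc(m, r):
    """Paillier encryption with g = N + 1: (1 + m*N) * r^N mod N^2."""
    return ((1 + m * N) * pow(r, N, N2)) % N2


def dec(c):
    """Paillier decryption using phi(N) as the private exponent."""
    return ((pow(c, PHI, N2) - 1) // N * MU) % N


def gamma(alpha_i, b_j, d_ij):
    """Step 2c: alpha_i^b_j * Enc(1; 1)^d_ij, an encryption of a_i*b_j + d_ij."""
    return (pow(alpha_i, b_j, N2) * pow(enc(1, 1), d_ij, N2)) % N2
```

Because a_i·b_j + d_ij stays below N for these sizes, the decrypted value is
the exact integer sum, and subtracting the mask and reducing modulo p recovers
the product of the two shares.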

7.4 Orlandi Implementation


In this section we describe our contribution to the VIFF Orlandi
implementation. The main contribution is an efficient implementation of the
Paillier cryptosystem, including a Python wrapper. Our second contribution is
the implementation of step 2c of TripleGen() using a Python wrapper for the
Paillier library, and our final contribution is an implementation of step 3a as
a Python module.

7.4.1 Optimization of TripleGen()


As TripleGen() is the core protocol and the most computationally intensive
part of the whole protocol, it needs to be as fast and efficient as possible.
Within TripleGen() two operations consume most of the time: 1) the commitment
scheme (steps 1 and 2) and 2) the calculation of $\gamma_{i,j}$ (step 2c). The
VIFF project uses the commitment scheme supplied by the PrimeInk ECC library
v. 6.4.0 [7]; the main focus of this chapter is therefore to optimize step 2.
In step 2c of the TripleGen() function, the value of $\gamma_{i,j}$ is computed
by multiplying $\mathrm{Enc}(1, 1)^{d_{i,j}}$ with $\alpha_i^{b_j}$. As the
Orlandi protocol uses Paillier to encrypt the value ($m = 1$ and $r = 1$), this
can be done quite efficiently, as seen in Chapter 6. But if we look closer at
the whole product $\gamma_{i,j}$, it is possible to do even better. The
encryption of $(1, 1)$ can be determined beforehand, as $g$ and $g^N$ are known
to every party. Doing this results in one fixed-base exponentiation (i.e.
$\mathrm{Enc}(1, 1)^{d_{i,j}}$) and one non-fixed base exponentiation
$\alpha_i^{b_j}$, whose base is not known beforehand. As the performance of the
initialization is less important than that of the TripleGen() function, this
allows us to use a fixed non-fixed exponentiation, as seen in Chapter 6. The
implementation of the Python wrapper lets us use fast and efficient C code,
which performs better than Python, as seen later on. The Python wrapper for
this optimization is specially tailored, but uses the same underlying
exponentiation methods, the comb and unsigned fractional window exponentiation
methods, for calculating the result.
Additional speedup can be achieved by using the homomorphic properties of the
Paillier cryptosystem, changing step 3a from
$c_i = \sum_j \mathrm{Dec}_{sk_i}(\gamma_{i,j}) - \sum_j d_{i,j} \bmod p$ to
$c_i = \mathrm{Dec}_{sk_i}(\prod_j \gamma_{i,j}) - \sum_j d_{i,j} \bmod p$.
This minor modification results in only one exponentiation in total instead of
one per party. This optimization is included in the specially tailored wrapper
for the Orlandi protocol.
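The equivalence of the two forms of step 3a can be checked with a toy version
of standard Paillier (g = N + 1). The parameters and function names below are
illustrative assumptions and far too small to be secure; the check relies on
the sum of all plaintexts staying below N, which holds because N is much
larger than the share modulus.

```python
P1, P2 = 2**31 - 1, 2**61 - 1      # toy primes (assumption)
N = P1 * P2
N2 = N * N
PHI = (P1 - 1) * (P2 - 1)
MU = pow(PHI, -1, N)               # modular inverse, needs Python 3.8+
P_SHARE = 1019                     # share modulus p


def enc(m, r):
    """Paillier encryption with g = N + 1."""
    return ((1 + m * N) * pow(r, N, N2)) % N2


def dec(c):
    """Paillier decryption using phi(N) as the private exponent."""
    return ((pow(c, PHI, N2) - 1) // N * MU) % N


def c_share_one_decryption_per_party(gammas, masks):
    """Original step 3a: decrypt every gamma, then add the plaintexts."""
    return (sum(dec(g) for g in gammas) - sum(masks)) % P_SHARE


def c_share_single_decryption(gammas, masks):
    """Modified step 3a: multiply the ciphertexts, then decrypt once."""
    product = 1
    for g in gammas:
        product = (product * g) % N2
    return (dec(product) - sum(masks)) % P_SHARE
```

The modular multiplications that replace the extra decryptions are much cheaper
than a full exponentiation, which is where the speedup comes from.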

7.4.2 Benchmarks
Table 7.1 gives the average execution time of triple generation for two, three,
and nine parties. We have benchmarked different revisions of the VIFF Orlandi
implementation corresponding to the various optimizations we have performed.
Revision 1231 is the initial unoptimized implementation of the Orlandi
protocol, which uses the Python implementation of Paillier. Revision 1355 uses
in-lined steps 1, 2a, and 2b of TripleGen(). Revision 1370 uses our first C
implementation of Paillier, and enormous improvements can be seen compared to
the previous revision. It may also be noted that two players are slower than
three players; this is due to the implementation design. Revision 1393 uses our
first implementation of step 2c as described in TripleGen(). In revision 1393
our specially written wrapper for step 3a is used. It makes use of the
homomorphic properties such that we only have to decrypt once, and we see a
noticeable performance gain. In revision 1440 the fixed non-fixed base
exponentiation is used, which combines the comb and unsigned fractional window
exponentiation methods. For up to 6 players this gives a performance gain (see
Table 7.2); with more players this becomes slower. The reason for this is not
the module itself, but other modifications within the Orlandi Python code.
It can clearly be seen that the performance gain becomes higher as the number
of parties increases, up to almost 40 times for 9 players compared to the
initial implementation. For more details on the implementation of the whole
protocol we refer the reader to [40, 41].

7.5 Conclusion and Future work


The result of the research conducted in this thesis is an efficient
implementation of the Paillier and DGK cryptosystems, which provides a good
base for secure multi-party computation protocols that use homomorphic
encryption. In addition to the C implementation we provide Python wrappers for
the libraries, such that new schemes, systems and protocols can easily be
implemented and tested. Our implementation provides an easy API for key
generation, encryption and decryption for these cryptosystems. In addition it
provides multiple prime generation methods to construct (special) primes. Also,
access to the different (simultaneous multi-) exponentiation algorithms is
provided through the API.
Our implementation uses the GNU GMP library for multi-precision arithmetic. It
would be interesting, for future work, to use another multi-precision
arithmetic library specially targeted at fields and modular operations, and to
make use of advanced compilers such as qhasm [11].
Another interesting direction for future work would be an implementation of a
lattice-based homomorphic encryption scheme that is protected against quantum
computers [34, 74].

parties  rev. num.    1231     1355    1370    1393    1399    1400    1440
2        time       3519.6   3519.6   894.6   243.8   226.5   224.2   201.5
2        stdvar        1.0      0.8     3.2     0.9     0.7     0.7     1.2
3        time       3972.7   4012.1   376.3   155.0   168.3   170.9   135.0
3        stdvar       94.8    157.4    72.1    59.2    35.9    38.2    49.9
9        time       8937.4   8849.7   846.9   237.0   188.9   188.4   224.9
9        stdvar      460.2    281.2    27.0    36.5    20.7    29.0    29

Table 7.1: The average execution time in ms of triple generation as a function
of the number of parties.


parties    time (ms)    stdvar (ms)
2          201.5        1.2
3          135.4        49.9
4          155.3        39.5
5          180.5        36.2
6          192.4        38.3
7          204.0        29.1
8          224.9        31.3
9          235.2        29.0

Table 7.2: The average execution time in ms of triple generation for the latest
revision 1440.

Appendix A

Appendix


A.1 Unbalanced Benchmarks Fixed

[Figure A.1 (plot): CPU cycles (logarithmic axis) versus keysize in bits (1024,
2048, 3072, 4096) for the methods cmb21, cmb42, cmb82, dsw2, dsw3, dsw4, dsw5,
dsw4, dsw7, kary2 to kary8, karym2 to karym8, sswm3 to sswm7, ufracm31,
ufracm33, ufracm41, ufracm43, ufracm45, ufracm51, ufracm53, ufracm55, ufracm61,
ufracm63, ufracm65, ufracm67, ufracm71, ufracm73, ufracm75, ufracm77, rtl, sg1
and sg2.]

Figure A.1: Unbalanced Encryption with bit length of $e_2$ half the size of
$e_1$


[Figure A.2 (plot): CPU cycles (logarithmic axis) versus keysize in bits (1024,
2048, 3072, 4096) for the methods cmb21, cmb42, cmb82, dsw2, dsw3, dsw4, dsw5,
dsw4, dsw7, kary2 to kary8, karym2 to karym8, sswm3 to sswm7, ufracm31,
ufracm33, ufracm41, ufracm43, ufracm45, ufracm51, ufracm53, ufracm55, ufracm61,
ufracm63, ufracm65, ufracm67, ufracm71, ufracm73, ufracm75, ufracm77, rtl, sg1
and sg2.]

Figure A.2: Unbalanced Encryption with bit length of $e_2$ one fourth the size
of $e_1$

Bibliography

[1] VIFF - The Virtual Ideal Functionality Framework. http://viff.dk.


[2] Proceedings of the Twentieth Annual ACM Symposium on Theory of Com-
puting, 2-4 May 1988, Chicago, Illinois, USA. ACM, 1988.
[3] Advances in Cryptology - CRYPTO ’98, 18th Annual International Cryp-
tology Conference, Santa Barbara, California, USA, August 23-27, 1998,
H. Krawczyk (Editor), volume 1462 of Lecture Notes in Computer Science.
Springer, 1998.
[4] A. K. Lenstra and H. W. Lenstra, editors. The development of the number
field sieve, volume 1554 of Lecture Notes in Mathematics. Springer Berlin /
Heidelberg, 1993.
[5] M. Ben-Or, S. Goldwasser, and A. Wigderson. Completeness theorems
for non-cryptographic fault-tolerant distributed computation (extended
abstract). In STOC [2], pages 1–10, 1988.
[6] I. Anshel, M. Anshel, and D. Goldfeld. An algebraic method for public-key
cryptography. Mathematical Research Letters, 6:287–292, 1999.
[7] Cryptomatic A/S. PrimeInk ECC library v. 6.4.0.
http://www.cryptomatic.com.
[8] M. Bellare, A. Desai, D. Pointcheval, and P. Rogaway. Relations among
notions of security for public-key encryption schemes. In CRYPTO [3], pages
26–45.
[9] J.C. Benaloh. Verifiable Secret-Ballot Elections, PhD Thesis. Yale Univer-
sity, 1987.
[10] D. J. Bernstein and T. Lange. eBACS: ECRYPT benchmarking of crypto-
graphic systems. http://bench.cr.yp.to.


[11] D. J. Bernstein and P. Schwabe. New AES Software Speed Records. In
D. Roy Chowdhury, V. Rijmen, and A. Das, editors, INDOCRYPT, volume 5365 of
Lecture Notes in Computer Science, pages 322–336. Springer, 2008.

[12] D. J. Bernstein. Cache-timing attacks on AES, 2004. URL:
http://cr.yp.to/papers.html#cachetiming.

[13] D. J. Bernstein. Pippenger's exponentiation algorithm, to be incorporated
into Bernstein's High-speed cryptography book. URL:
http://cr.yp.to/papers.html#pippenger.

[14] S. Bhasin, S. Guilley, L. Sauvage, and J. Danger. Unrolling cryptographic
circuits: A simple countermeasure against side-channel attacks. In J. Pieprzyk,
editor, CT-RSA, volume 5985 of Lecture Notes in Computer Science, pages
195–207. Springer, 2010.

[15] D. Bogdanov, S. Laur, and J. Willemson. Sharemind: A framework for
fast privacy-preserving computations. In S. Jajodia and J. López, editors,
ESORICS, volume 5283 of Lecture Notes in Computer Science, pages 192–206.
Springer, 2008.

[16] P. Bogetoft, I. Damgård, T. P. Jakobsen, K. Nielsen, J. Pagter, and
T. Toft. A practical implementation of secure auctions based on multiparty
integer computation. In Giovanni Di Crescenzo and Aviel D. Rubin, editors,
Financial Cryptography, volume 4107 of Lecture Notes in Computer Science, pages
142–147. Springer, 2006.

[17] A. Brauer. On addition chains. Bulletin of the American Mathematical
Society, 45(10):736–739, 1939.

[18] E. F. Brickell, D. M. Gordon, K. S. McCurley, and D. Bruce Wilson. Fast
exponentiation with precomputation. In EUROCRYPT '92, pages 200–207, 1992.

[19] R. Canetti and M. Fischlin. Universally composable commitments. In
J. Kilian, editor, CRYPTO, volume 2139 of Lecture Notes in Computer Science,
pages 19–40. Springer, 2001.

[20] S. Chari, V. V. Diluoffo, P. A. Karger, E. R. Palmer, T. Rabin, J. R. Rao,
P. Rohatgi, H. Scherzer, M. Steiner, and D. C. Toll. Designing a side channel
resistant random number generator. In D. Gollmann, J. L. Lanet, and
J. Iguchi-Cartigny, editors, CARDIS, volume 6035 of Lecture Notes in Computer
Science, pages 49–64. Springer, 2010.

[21] D. Chaum, C. Crépeau, and I. Damgård. Multiparty unconditionally secure
protocols. In STOC [2], pages 11–19, 1988.


[22] B. Chor, S. Goldwasser, S. Micali, and B. Awerbuch. Verifiable secret
sharing and achieving simultaneity in the presence of faults. In Proceedings of
the 26th Annual Symposium on Foundations of Computer Science, pages 383–395.
IEEE Computer Society, 1985.
[23] R. Cramer and V. Shoup. A practical public key cryptosystem provably
secure against adaptive chosen ciphertext attack. In CRYPTO [3], pages
13–25, 1998.
[24] J. Daemen and V. Rijmen. The design of Rijndael. Springer-Verlag New
York, Inc. Secaucus, NJ, USA, 2002.
[25] I. Damgård, M. Geisler, and M. Krøigaard. Efficient and secure comparison
for on-line auctions. In J. Pieprzyk, H. Ghodosi, and E. Dawson, editors,
ACISP, volume 4586 of Lecture Notes in Computer Science, pages 416–430.
Springer, 2007.
[26] I. Damgård, M. Geisler, and M. Krøigaard. A correction to ’efficient and
secure comparison for on-line auctions’. IJACT, 1(4):323–324, 2009.
[27] I. Damgård, M. Geisler, and M. Krøigaard. Homomorphic encryption and
secure comparison. International Journal of Applied Cryptography, 1(1):22–31,
2008.
[28] W. Diffie and M. Hellman. New directions in cryptography. IEEE Transac-
tions on information Theory, 22(6):644–654, 1976.
[29] C. Ding, D. Pei, and A. Salomaa. Chinese remainder theorem: applications
in computing, coding, cryptography. World Scientific Publishing Co., Inc.
River Edge, NJ, USA, 1996.
[30] N. Smart (Editor). ECRYPT II Yearly Report on Algorithms and Keysizes
(2010), Revision 1.0. 2010.
[31] M. A. Elaabid and S. Guilley. Practical improvements of profiled side-channel
attacks on a hardware crypto-accelerator. In D. J. Bernstein and T. Lange,
editors, AFRICACRYPT 2010, volume 6055 of Lecture Notes in Computer
Science, pages 243–260. Springer, 2010.
[32] J. Franke, T. Kleinjung, C. Paar, J. Pelzl, C. Priplata, and C. Stahlke.
SHARK: A realizable special hardware sieving device for factoring 1024-bit
integers. In Josyula R. Rao and Berk Sunar, editors, CHES, volume 3659 of
Lecture Notes in Computer Science, pages 119–130. Springer, 2005.
[33] T. El Gamal. A public key cryptosystem and a signature scheme based on
discrete logarithms. IEEE Transactions on Information Theory, 31(4):469–
472, 1985.

[34] C. Gentry. Fully homomorphic encryption using ideal lattices. In
M. Mitzenmacher, editor, STOC, pages 169–178. ACM, 2009.

[35] O. Goldreich, S. Goldwasser, and S. Halevi. Public-key cryptosystems from
lattice reduction problems. In Burton S. Kaliski Jr., editor, CRYPTO, volume
1294 of Lecture Notes in Computer Science, pages 112–131. Springer, 1997.

[36] S. Goldwasser and S. Micali. Probabilistic encryption and how to play mental
poker keeping secret all partial information. In STOC, pages 365–377. ACM,
1982.

[37] T. Granlund. GNU MP. The GNU Multiple Precision Arithmetic Library,
1996.

[38] J. Groth. Cryptography in subgroups of Z_n^*. In J. Kilian, editor, TCC’05,
volume 3378 of Lecture Notes in Computer Science, pages 50–65. Springer,
2005.

[39] M. Horn. Design and implementation of an interface for cyclic finite groups
in Java. Technical note, April 2003.

[40] T. P. Jakobsen, M. X. Makkes, and J. Dam Nielsen. Efficient Implementation
of the Orlandi Protocol. In J. Zhou and M. Yung, editors, ACNS, volume
6123 of Lecture Notes in Computer Science, pages 255–272, 2010.

[41] T. P. Jakobsen, M. X. Makkes, and J. Dam Nielsen. Efficient Implementation
of the Orlandi Protocol Extended Version. Cryptology ePrint Archive,
Report 2010/224, 2010. http://eprint.iacr.org/.

[42] W.S. Jevons. The principles of science: a treatise on logic and scientific
method. Classworks, 1877.

[43] A. Joux, editor. Advances in Cryptology - EUROCRYPT 2009, 28th Annual
International Conference on the Theory and Applications of Cryptographic
Techniques, Cologne, Germany, April 26-30, 2009, volume 5479 of Lecture
Notes in Computer Science. Springer, 2009.

[44] M. Joye, P. Paillier, and S. Vaudenay. Efficient generation of prime numbers.
In Çetin Kaya Koç and Christof Paar, editors, CHES, volume 1965 of Lecture
Notes in Computer Science, pages 340–354. Springer, 2000.

[45] T. Kleinjung, K. Aoki, J. Franke, A.K. Lenstra, E. Thomé, J.W. Bos,
P. Gaudry, A. Kruppa, P.L. Montgomery, D.A. Osvik, et al. Factorization
of a 768-bit RSA modulus. 2010.

[46] A. K. Lenstra, E. Tromer, A. Shamir, W. Kortsmit, B. Dodson, J. Hughes,
and P. C. Leyland. Factoring estimates for a 1024-bit RSA modulus. In
Chi-Sung Laih, editor, ASIACRYPT ’03, volume 2894 of Lecture Notes in
Computer Science, pages 55–74. Springer, 2003.

[47] C.H. Lim and P.J. Lee. More flexible exponentiation with precomputation.
Lecture Notes in Computer Science, 839:95–107, 1994.

[48] M.X. Makkes. Paillier and DGK libraries including python modules.
http://www.kr85.org/.

[49] D. Malkhi, N. Nisan, B. Pinkas, and Y. Sella. Fairplay - Secure Two-Party
Computation System. In USENIX Security Symposium, pages 287–302.
USENIX, 2004.

[50] T. Matsumoto and H. Imai. Public Quadratic Polynomial-tuples for efficient
signature-verification and message-encryption. In Eurocrypt, volume 88,
pages 419–453. Springer.

[51] U.M. Maurer. Fast generation of prime numbers and secure public-key cryp-
tographic parameters. Journal of Cryptology, 8(3):123–155, 1995.

[52] R.J. McEliece. A public-key cryptosystem based on algebraic coding theory.
DSN progress report, 4244:114–116, 1978.

[53] R. Merkle and M. Hellman. Hiding information and signatures in trapdoor
knapsacks. IEEE Transactions on Information Theory, 24(5):525–530, 1978.

[54] B. Möller. Improved techniques for fast exponentiation. In P.J. Lee and C.H.
Lim, editors, ICISC, volume 2587 of Lecture Notes in Computer Science,
pages 298–312. Springer, 2002.

[55] D. Naccache and J. Stern. A new public-key cryptosystem. In W. Fumy,
editor, EUROCRYPT ’97, volume 1233 of Lecture Notes in Computer Science,
pages 27–36. Springer Berlin / Heidelberg, 1997.

[56] P. Q. Nguyen. Cryptanalysis of the Goldreich-Goldwasser-Halevi cryptosystem
from Crypto ’97. In M. J. Wiener, editor, CRYPTO, volume 1666 of Lecture
Notes in Computer Science, pages 288–304. Springer, 1999.

[57] T. Okamoto and S. Uchiyama. A new public-key cryptosystem as secure as
factoring. In K. Nyberg, editor, EUROCRYPT ’98, volume 1403 of Lecture
Notes in Computer Science, pages 308–318. Springer Berlin / Heidelberg,
1998.

[58] L.D. Olson. Side-channel attacks in ECC: A general technique for varying
the parametrization of the elliptic curve. In M. Joye and J. J. Quisquater,
editors, CHES, volume 3156 of Lecture Notes in Computer Science, pages
220–229. Springer, 2004.

[59] C. Orlandi. LEGO and Other Cryptographic Constructions - PhD Progress
Report. http://www.cs.au.dk/~orlandi/, March 2009.

[60] P. Paillier. Public-key cryptosystems based on composite degree residuosity
classes. In J. Stern, editor, EUROCRYPT ’99, volume 1592 of Lecture Notes
in Computer Science, pages 223–238. Springer, 1999.

[61] P. Paillier and D. Pointcheval. Efficient public-key cryptosystems provably
secure against active adversaries. In K.Y. Lam, E. Okamoto, and C. Xing,
editors, ASIACRYPT ’99, volume 1716 of Lecture Notes in Computer
Science, pages 165–179. Springer, 1999.

[62] T.P. Pedersen. Non-interactive and information-theoretic secure verifiable
secret sharing. In Joan Feigenbaum, editor, CRYPTO, volume 576 of Lecture
Notes in Computer Science, pages 129–140. Springer, 1991.

[63] C. Percival. Cache missing for fun and profit. BSDCan 2005, 2005.

[64] H.C. Pocklington. The determination of the prime or composite nature of
large numbers by Fermat’s theorem. In Proceedings of the Cambridge
Philosophical Society, volume 18, pages 29–30, 1914.

[65] S. Pohlig and M. Hellman. An improved algorithm for computing logarithms
over GF(p) and its cryptographic significance (Corresp.). IEEE Transactions
on Information Theory, 24(1):106–110, 1978.

[66] M.O. Rabin. Probabilistic algorithm for testing primality. Journal of Number
Theory, 12(1):128–138, 1980.

[67] R.L. Rivest. The RC5 encryption algorithm. Dr Dobb’s Journal-Software
Tools for the Professional Programmer, 20(1):146–149, 1995.

[68] R.L. Rivest, A. Shamir, and L. Adleman. On Digital Signatures and Public-
Key Cryptosystems. Laboratory for Computer Science, Massachusetts Insti-
tute of Technology, 1977.

[69] B. Schneier. The Blowfish encryption algorithm. Dr Dobb’s Journal-Software
Tools for the Professional Programmer, 19(4):38–43, 1994.

[70] A. Shamir. A polynomial-time algorithm for breaking the basic
Merkle-Hellman cryptosystem. IEEE Transactions on Information Theory,
30(5):699–704, 1984.

[71] A. Shamir and E. Tromer. On the cost of factoring RSA-1024. RSA Cryp-
toBytes, 6(2):10–19, 2003.

[72] P.W. Shor. Polynomial-time algorithms for prime factorization and discrete
logarithms on a quantum computer. SIAM Review, pages 303–332, 1999.

[73] J.H. Silverman. A friendly introduction to number theory. Prentice Hall,
2001.

[74] N.P. Smart and F. Vercauteren. Fully homomorphic encryption with rela-
tively small key and ciphertext sizes. In P.Q. Nguyen and D. Pointcheval,
editors, Public Key Cryptography, volume 6056 of Lecture Notes in Computer
Science, pages 420–443. Springer, 2010.

[75] W. Stallings. Cryptography and network security: principles and practice.
Prentice Hall, 2006.

[76] E.G. Straus. Addition chains of vectors (problem 5125). American Mathe-
matical Monthly, 70(806-808):16, 1964.

[77] H.C.A. van Tilborg, editor. Encyclopedia of Cryptography and Security.
Springer, 2005.

[78] Andrew Chi-Chih Yao. How to generate and exchange secrets. In FOCS,
pages 162–167. IEEE, 1986.

[79] Y.S. Yeh, T.Y. Huang, H.Y. Lin, and Y.H. Chang. A study on parallel RSA
factorization. Journal of computers, 4(2):112–118, 2009.

[80] S.M. Yen, C.S. Laih, and A.K. Lenstra. Multi-exponentiation (cryptographic
protocols). IEE Proceedings-Computers and Digital Techniques, 141:325,
1994.

Index

2^k-ary matrix method, 32
2^k-ary method, 32

auxiliary table, 29

Binary Left to Right Method, 30

CCA1, 10
CCA2, 10
Chinese Remainder Theorem, 2
Chosen Plaintext Attack, 10
    Adaptive, 10
    Batch
        non-adaptive, 10
Chosen-Ciphertext Attack, 10
Composite Residuosity, 15
Computational Security, 9

Decisional Composite Residuosity Assumption, 14
DGK Cryptosystem, 24
    Proper Decryption Variant, 25
    Small Message Space Variant, 24
Discrete Logarithm Problem, 11
DLP, 11

Fixed Base Comb Exponentiation, 38

Integer Factorization Problem, 11

Malleability, 11

Orlandi Protocol, 58

Paillier Cryptosystem, 13
    Main Variant, 17
    Subgroup Variant, 17

Quadratic Residuosity Assumption, 13

Random Self-Reducible, 11

Secure Multi-Party Computation, 57
Semantically Secure, 10
Simultaneous Multi-exponentiation, 30
Simultaneous Sliding Window Matrix method, 34
Simultaneous Sliding Window method, 33

Verifiable Secret Sharing, 58
VSS, 58

Abstract

In this thesis we present an efficient and secure software implementation of the
Paillier and DGK homomorphic cryptosystems, both of which depend on the
performance of simultaneous multi-exponentiation algorithms. We describe several
algorithms suitable for simultaneous multi-exponentiation and benchmark their
implementation. We also show how an implementation of these algorithms and
cryptosystems can be modified to be secure against timing attacks. In addition,
we present an implementation of the Orlandi protocol, the first implementation
of a protocol for multiparty computation on arithmetic circuits that is secure
against up to n − 1 static, active adversaries. The efficiency of this
implementation stems largely from the efficient implementation of the Paillier
cryptosystem.
