Eindhoven University of Technology: Award Date: 2010
MASTER
Makkes, M.X.
Award date:
2010
Efficient Implementation of
Homomorphic Cryptosystems
Marc X. Makkes
The investigations were partially supported by the EU's Seventh Framework
Programme (FP7), project CACE (Computer Aided Cryptography Engineering),
under contract number ICT-2008-216499.
Contents
Preface
    Outline
    Acknowledgments
1 Number Theory
    1.1 Introduction
    1.2 Mathematical Preliminaries
        1.2.1 Congruences
        1.2.2 Chinese Remainder Theorem
        1.2.3 Orders
        1.2.4 Carmichael's λ-function
2 Cryptographic Terminology
    2.1 Introduction
    2.2 Terminology
        2.2.1 Symmetric and Asymmetric Cryptography
        2.2.2 Malleable
        2.2.3 Random Self-Reducibility
    2.3 Bases of Cryptography
    3.4 Security
        3.4.1 Recommended Key-Sizes
4 DGK-Crypto System
    4.1 Introduction
    4.2 DGK cryptosystem
        4.2.1 Small Message Space variant
        4.2.2 Proper Decryption Variant
        4.2.3 Homomorphic Properties
    4.3 Security and Key-sizes
5 Simultaneous Multi-Exponentiation
    5.1 Introduction
    5.2 Pre-computation
    5.3 Simultaneous multi exponentiation for non-fixed bases
        5.3.1 Binary Left to Right-Method
        5.3.2 The 2^k-ary Methods
        5.3.3 2^k-ary matrix exponentiation
        5.3.4 Simultaneous Sliding Window method
        5.3.5 Simultaneous Sliding Window Matrix method
        5.3.6 Unsigned Fractional Windows
    5.4 Fixed-base exponentiation
        5.4.1 Pre-computation of squares of g
        5.4.2 Fixed Base Comb method
A Appendix
    A.1 Unbalanced Benchmarks Fixed
Bibliography
Index
Abstract
Preface
The field of cryptology is one of the most intriguing subjects to study. Today's
world is full of secrets that we do not want to share with each other, such as
PINs, credit card numbers, and social identification numbers. One of the goals
of cryptography is to hide information by encoding or obscuring it, such that
when it travels over public channels such as telephone lines or the internet,
only the receiving party or parties can decode or de-obfuscate the data to read
the original message.
Outline
Chapter 1 gives an informal introduction to number theory. Chapter 2 gives an
introduction to basic cryptographic terminology, as far as it is necessary to
follow the remainder of the text. Chapter 3 is concerned entirely with the
description of the Paillier asymmetric homomorphic cryptosystem, including
in-depth information on the encryption and decryption process, parameter
estimation, and algebraic optimization. In Chapter 4 we present the DGK
cryptosystem and explain it in detail. Chapter 5 is the main part of this
thesis, as it discusses different algorithms for simultaneous
multi-exponentiation, which is the core operation of both the Paillier and the
DGK cryptosystems. Chapter 6 discusses the implementation of both the Paillier
and DGK cryptosystems together with all the algorithms of Chapter 5. In
addition, the chapter shows the performance of every simultaneous
multi-exponentiation algorithm in the context of the Paillier and DGK
cryptosystems, as well as the implementation of countermeasures against simple
side-channel attacks. Finally, Chapter 7 presents collaborative work on the
implementation and modification of a secure multiparty computation protocol by
Orlandi, which relies on the homomorphic property of the Paillier cryptosystem.
Partial results of Chapters 6 and 7 were published in [40] and the extended
version in [41].
Acknowledgments
First and foremost I would like to express my gratitude to Tanja Lange, who
introduced me to the problem of optimization of cryptographic protocols and
has been an infinite source of advice. I thank her for giving me a glimpse of
research and for the many enjoyable discussions, and also for giving me the
opportunity to attend ACNS 2010 in Beijing and the SPEED-CC congress in Berlin,
and to join the EIDMA/DIAMANT Cryptography Working Group meetings in Utrecht.
Many thanks also to Daniel Bernstein for joining the discussions and
contributing advice on the subject matter and joyful anecdotes. For advice and
for interesting, useful and often fun discussions I would like to thank Peter
Schwabe.
In addition I would like to thank Janus Dam Nielsen and Thomas P. Jakobsen
for the great cooperation, for testing the Paillier and DGK code, and for the
collaborative work on the ACNS paper. For some pointers and suggestions I would
like to thank Ivan Damgård and Tomas Toft.
Thanks to Peter Schwabe, Sebastiaan de Hoogh and Ilse Groot for proofreading
this thesis. All remaining errors are mine.
Also, I would like to thank Hans Kanters for letting me use one of his beautiful
paintings for my front cover.
I would also like to thank Sandro Etalle and Berry Schoenmakers for being on
the committee.
Furthermore, I want to thank Peter Fonts and Nicole Makkes, who advised
me to enter university in the first place. Finally, I would like to thank my
family and friends for being who they are.
Amsterdam Marc X. Makkes
June, 2010.
Chapter 1
Number Theory
1.1 Introduction
In this chapter we state definitions and simple properties of the algebraic
structures we shall use constantly in the remainder of this thesis. We refer
readers who are unfamiliar with number theory to the book [73] for a friendly
introduction.
1.2.1 Congruences
For a positive integer n, two integers a and b are said to be congruent modulo n,
written:
a ≡ b (mod n)
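\begin{align}
x &\equiv a_1 \pmod{m_1} \tag{1.1}\\
x &\equiv a_2 \pmod{m_2} \tag{1.2}\\
  &\;\;\vdots \tag{1.3}\\
x &\equiv a_n \pmod{m_n} \tag{1.4}
\end{align}
Furthermore, all solutions x to this system are congruent modulo the product
$M = m_1 \cdots m_n$. For each i there is a unique $y_i$ modulo $m_i$ for which
$y_i \cdot \frac{M}{m_i} \equiv 1 \pmod{m_i}$, such that:
$$x \equiv \sum_{i=1}^{n} a_i \, y_i \, \frac{M}{m_i} \pmod{M}$$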
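The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `crt` and the sample residues are hypothetical, the moduli are assumed to be pairwise co-prime, and `pow(x, -1, m)` (Python 3.8 or later) is used to compute the modular inverse $y_i$.

```python
from math import prod


def crt(residues, moduli):
    """Solve x = a_i (mod m_i) for pairwise co-prime moduli m_i.

    Returns the unique solution x with 0 <= x < M, where M = m_1 * ... * m_n.
    """
    M = prod(moduli)
    x = 0
    for a_i, m_i in zip(residues, moduli):
        M_i = M // m_i             # product of all moduli except m_i
        y_i = pow(M_i, -1, m_i)    # inverse of M/m_i modulo m_i
        x += a_i * y_i * M_i
    return x % M


# x = 2 (mod 3), x = 3 (mod 5), x = 2 (mod 7)
solution = crt([2, 3, 2], [3, 5, 7])
print(solution)
```

Every summand $a_i y_i (M/m_i)$ is congruent to $a_i$ modulo $m_i$ and to 0 modulo every other modulus, which is why the sum satisfies all congruences at once.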
1.2.3 Orders
In this part we assume that the reader has a basic understanding of algebraic
structures such as groups, rings and fields. Orders of multiplicative groups and
elements play an important role in cryptography.
Euler φ-function
This is also known as Euler's totient function. The totient φ(n) of a positive
integer n is defined to be the number of positive integers less than or equal to n
that are co-prime to n. For all numbers we can distinguish three cases.
1. Prime numbers, where all positive integers less than the prime number p are
co-prime to p, so φ(p) = p − 1.
This function is very important in the field of number theory, as its value
is also the size of the multiplicative group of integers modulo n.
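As a small illustration of this definition, the totient can be computed by brute force for toy values of n. The helper names `units` and `phi` are hypothetical, and the approach is only meant to make the definition concrete; it is far too slow for cryptographic sizes.

```python
from math import gcd


def units(n):
    """Positive integers k <= n that are co-prime to n."""
    return [k for k in range(1, n + 1) if gcd(k, n) == 1]


def phi(n):
    """Euler's totient function, computed directly from the definition."""
    return len(units(n))


print(phi(7))     # a prime p has phi(p) = p - 1
print(units(15))  # representatives of the multiplicative group modulo 15
```

For n ≥ 2 the list returned by `units(n)` is a set of representatives of the multiplicative group of integers modulo n, so its length is the group size mentioned above.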
Next, we introduce the order of group elements and its properties. Let G be a
multiplicatively written group with neutral element 1.
$$g^e = g^{kn} = (g^n)^k = 1^k = 1$$
$$g^r = g^{e-qn} = g^e (g^n)^{-q} = 1.$$
Proof: We have
$$y = x^N \pmod{N^2}$$
Proof: In every finite cyclic group G the equation $x^d = a$, if it is solvable,
has $\gcd(d, \mathrm{ord}(G))$ different solutions. Since the group
$\mathbb{Z}^*_{N^2}$ is not a cyclic group we cannot apply this property
directly, but it can be applied to $\mathbb{Z}^*_{p^2}$ and $\mathbb{Z}^*_{q^2}$,
which have orders $\varphi(p^2) = p(p-1)$ and $\varphi(q^2) = q(q-1)$,
respectively. So from the equation $y = x^N \pmod{N^2}$, consider the equations
$$y_p \equiv x^N \pmod{p^2} \tag{1.8}$$
and
$$y_q \equiv x^N \pmod{q^2} \tag{1.9}$$
Equation (1.8) has $\gcd(N, p(p-1)) = p$ different solutions and equation (1.9)
has $\gcd(N, q(q-1)) = q$ different solutions. Using Theorem 1.2.1, these
solutions can be combined to $pq = N$ different solutions modulo $N^2$.
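The count in this proof can be checked by brute force for a toy modulus. The following sketch assumes the hypothetical parameters p = 3 and q = 5 (so that gcd(N, φ(N)) = 1) and simply enumerates all units modulo $N^2$; it illustrates the statement and is not part of the proof.

```python
from collections import Counter
from math import gcd

# Toy parameters, assumed for illustration only.
p, q = 3, 5
N = p * q
N2 = N * N

# All invertible elements modulo N^2.
unit_group = [x for x in range(1, N2) if gcd(x, N2) == 1]

# Solutions of x^N = 1 (mod N^2).
roots_of_one = [x for x in unit_group if pow(x, N, N2) == 1]

# For every value y that is an N-th power, count how many x map to it.
preimages = Counter(pow(x, N, N2) for x in unit_group)

print(len(roots_of_one))        # number of N-th roots of 1
print(set(preimages.values()))  # number of N-th roots of each N-th residue
```

Every N-th residue has the same number of N-th roots, namely N, because two roots of the same value differ by an N-th root of 1.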
Proof: Since T and $\mathbb{Z}^*_N$ have the same cardinality, $|T| = \varphi(N)$.
In addition it is easy to check that T is a multiplicative group. Note that for
every element $(1 + yN) \in T$ the following relation is satisfied:
R = {(1 + xN )w (mod N 2 ), x ∈ ZN
Chapter 2
Cryptographic Terminology
2.1 Introduction
The word cryptography comes from two Greek words: κρυπτός kryptos (hidden)
and γράφειν graphein (to write), and has a history going back more than
4000 years. Today, cryptography is the art and science of making communications
unintelligible to all except the intended recipients. The bases of public key
cryptographic systems are found in number theory. In this chapter we describe
the basic terminology of cryptography, which is needed to understand the
remainder of this thesis. Advanced readers who are already familiar with
basic cryptographic terminology are advised to skip this chapter and move on to
Chapter 3.
2.2 Terminology
In the field of cryptography many situations are described in which parties
want to communicate with each other. The basic action is that a party wants
to send a message to another party without revealing the contents of the
message, or any information about its contents, while the message is in transit.
Cryptography is used to obscure the sent message, with the following
definitions:
• The set M is known as the message space or plaintext space; all elements
of this set are considered to be messages,
• The set C is the ciphertext space; all elements of this set are called
ciphertexts,
• The key space K is the set of all possible keys; an element k from this
space is known as a key,
Probabilistic Encryption
We call a cryptosystem deterministic if every encryption of the same message m
under the same key k results in the same ciphertext c. A deterministic
cryptosystem leaks information when the same message is sent twice using the
same key. An example of such an encryption system is the RSA system, where the
message m is raised to a public exponent e, i.e. the ciphertext is
$c \equiv m^e \pmod{N}$; because e is fixed, encrypting the same message
m twice will always result in the same ciphertext c.
A non-deterministic or probabilistic cryptosystem is a system where the
encryption of a message results in a different ciphertext each time the same
message is encrypted. Decryption, on the other hand, always results in the same
message m that was encrypted. The non-determinism comes from the fact that a
random number (a nonce) is taken into the equation of the encryption function.
When decrypting, the nonce is removed due to some properties of the chosen
values, as we see later on in this thesis.
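The difference can be made concrete with a minimal sketch. All parameters below (p = 61, q = 53, e = 17) are toy values assumed for illustration and far too small to be secure. The randomized function uses the form $g^m r^N \bmod N^2$ with g = 1 + N, which is one possible instance of a nonce-based encryption function; the nonce r is passed explicitly so that its effect is visible.

```python
from math import gcd

# Toy parameters, assumed for illustration only.
p, q = 61, 53
N = p * q
e = 17  # public RSA exponent, co-prime to (p - 1)(q - 1)


def rsa_encrypt(m):
    """Textbook RSA: no randomness enters the computation."""
    return pow(m, e, N)


def randomized_encrypt(m, r):
    """Encryption of the form g^m * r^N mod N^2 with g = 1 + N and nonce r."""
    assert gcd(r, N) == 1, "the nonce must be invertible modulo N"
    N2 = N * N
    return pow(1 + N, m, N2) * pow(r, N, N2) % N2


m = 42
print(rsa_encrypt(m) == rsa_encrypt(m))                      # same ciphertext
print(randomized_encrypt(m, 2) == randomized_encrypt(m, 3))  # different ones
```

An eavesdropper who sees two equal textbook RSA ciphertexts learns that the same message was sent twice; with a fresh nonce per encryption this information is hidden.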
Computationally Secure
We call a cryptosystem computationally secure if attacking the cryptographic
system is infeasible in practice. In this security model, we do not grant
an adversary unlimited computational resources. Instead, we are concerned with
the amount of computation required to break the security of a system. We say
that a system is computationally secure if the level of computation necessary to
defeat it exceeds the computational resources of any hypothetical adversary by a
comfortable margin. The adversary is thereby allowed to use the best known
attacks against the system. Closely related are the concepts of
complexity-theoretic security and provable security.
In this thesis, we will use the term computational security to include both
notions. In complexity-theoretic security, the adversary is modelled as having
only polynomial computational power. This means that any attack involves time
and space polynomial in the size of the underlying security parameters of the
system. In the setting of provable security, defeating the system's security
is proven to be as difficult as solving a well-known problem which is thought to
be hard. Note that this does not prove the protocol to be unconditionally secure,
but only makes a statement of equivalence between the security of the protocol
and a hard-to-compute problem. In practice, these are often number-theoretic
Semantically Secure
A cryptosystem is called semantically secure if it is infeasible for a
computationally bounded adversary to derive significant information about a
plaintext when given only its ciphertext and the corresponding public
encryption key. Semantic security only considers a "passive" adversary, i.e. an
adversary who only collects ciphertexts and network traffic.
• A non-adaptive chosen-plaintext attack is one where the adversary chooses all
the plaintexts beforehand, before any plaintext is encrypted. This is also
known as a "batched" chosen-plaintext attack.
2.2.2 Malleable
An encryption algorithm is malleable if it is possible for an adversary to
transform a ciphertext into another valid ciphertext which decrypts to another
plaintext. For example, if an adversary intercepts a ciphertext c, he or she is
able to transform c by means of some function f(c) without necessarily knowing
or learning the encrypted message. The receiver is still able to decrypt the
altered message without noticing that it has been altered.
A cryptosystem may be semantically secure against chosen-plaintext attacks or
even non-adaptive chosen-ciphertext attacks (CCA1) while still being malleable.
However, security against CCA2 is equivalent to non-malleability.
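A minimal sketch of such a transformation, using textbook RSA as a well-known malleable scheme. The parameters are toy values assumed for illustration, and the function f here is "multiply the ciphertext by an encryption of 2"; the adversary never sees m or the private exponent d.

```python
# Toy textbook RSA parameters, assumed for illustration only.
p, q = 61, 53
N = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (Python 3.8+)


def encrypt(m):
    return pow(m, e, N)


def decrypt(c):
    return pow(c, d, N)


m = 42
c = encrypt(m)

# The adversary only uses public values: c, e and N.
c_forged = c * pow(2, e, N) % N

print(decrypt(c_forged))  # the receiver obtains 2*m instead of m
```

The forged ciphertext is a perfectly valid ciphertext, so the receiver has no way to detect the modification from the ciphertext alone.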
A year before the release of the RSA cryptosystem, Diffie and Hellman came
up with a scheme for exchanging keys based on public and private parameters.
This system is known as the Diffie-Hellman key exchange [28] and relies on the
hardness of the Discrete Logarithm Problem (DLP).
Chapter 3
The Paillier Cryptographic system
This chapter describes in depth the public key cryptosystem presented by Paillier
at Eurocrypt ’99 [60].
3.1 Introduction
In recent years a new direction of research emerged: finding cryptographic
trapdoor functions with homomorphic properties. These developments were known
as trapdoor discrete log and arose from the algebraic setting of high-degree
residuosity classes. They first came to light in the Goldwasser-Micali scheme
[36], where the message space is a ring M of modular residues and ciphertexts
are in the multiplicative group G of invertible elements of some particular ring
of integers modulo a hard-to-factor number. The encryption of a message m is
always a group element of the form $E(m, r) = g^m r^e \in G$, where e is some
public integer, g is a fixed public element in G, and r is chosen at random in
some particular multiplicative subgroup R of G. Since R is a subgroup, such
schemes have the additive homomorphic property, i.e. an encryption of
$m_1 + m_2$ can be obtained as follows:
$E(m_1 + m_2, r_1 r_2) = E(m_1, r_1) E(m_2, r_2)$.
Goldwasser and Micali based their scheme on quadratic residues and selected
$M = \mathbb{Z}_2$, $G = R = \mathbb{Z}^*_N$ where N = pq is an RSA modulus,
e = 2, and the base g a pseudo-square modulo N. The semantic security follows
from the quadratic residuosity assumption.
The small message space, i.e. $\mathbb{Z}_2$, limits the bandwidth of the
Goldwasser-Micali scheme. The Benaloh-Fischer scheme [9] later improved the
bandwidth of the Goldwasser-Micali scheme by using higher-order residues. It is
basically the same scheme but uses $M = \mathbb{Z}_e$ for the message space,
where e is a small prime dividing φ(N) such that $e^2$ does not divide φ(N), and
g is a non-e-th residue modulo N. The semantic security is proven under the
prime residuosity assumption. Despite being secure, the scheme is inefficient,
as the decryption involves a kind of exhaustive search, implying that e must be
small. Naccache and Stern [55] proposed a variant of the Benaloh-Fischer scheme
which allows high bandwidth. This is achieved by taking e not as a prime but as
a product of small primes $e_1, \ldots, e_k$ such that φ(N) is divisible by each
$e_i$ but by none of the $e_i^2$, and g is an $e_i$-th non-residue modulo N for
all i. The exhaustive search is still needed in order to decrypt the message.
Okamoto and Uchiyama [57] significantly extended the encryption rate by
investigating two different approaches: residuosity of smooth degree in
$\mathbb{Z}_{pq}$, and residuosity of prime degree p in $\mathbb{Z}_{p^2 q}$
instead of $\mathbb{Z}_{pq}$ for R and G. They use $\mathbb{Z}_p$ as message
space M and choose g such that the order of $g^{p-1} \pmod{p^2}$ is p. The
scheme reaches bandwidths similar to the Naccache-Stern cryptosystem, but is
more efficient in decrypting and is semantically secure under the p-subgroup
assumption.
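$$E_g : \mathbb{Z}_N \times \mathbb{Z}^*_N \to \mathbb{Z}^*_{N^2}$$
defined as follows
$$E_g(x, y) = g^x y^N \pmod{N^2} \tag{3.1}$$
Proof: Since the sets $\mathbb{Z}_N \times \mathbb{Z}^*_N$ and
$\mathbb{Z}^*_{N^2}$ have the same cardinality we just need to prove that $E_g$
is injective. Assume we have
$$g^{x} y^{N} \equiv g^{x'} {y'}^{N} \pmod{N^2} \tag{3.2}$$
If both sides of equation (3.2) are raised to the power λ(N) we get:
$$(g^{x} y^{N})^{\lambda(N)} \equiv (g^{x'} {y'}^{N})^{\lambda(N)} \pmod{N^2}$$
$$g^{x \cdot \lambda(N)} y^{N \cdot \lambda(N)} \equiv g^{x' \cdot \lambda(N)} {y'}^{N \cdot \lambda(N)} \pmod{N^2}$$
Since $y^{N \lambda(N)} \equiv 1 \pmod{N^2}$ for every $y \in \mathbb{Z}^*_N$,
the factors in y and y' vanish. Note that since g has order a multiple of N and
$\gcd(N, \lambda(N)) = 1$, $g^{\lambda(N)} $ has order N. Consequently it can be
written as $(1 + zN)$ for some $z \in \mathbb{Z}_N$ with $z \neq 0$, and the
congruence becomes:
$$(1 + zN)^{x} \equiv (1 + zN)^{x'} \pmod{N^2}$$
This implies that $x \equiv x' \pmod{N}$ and we can rewrite equation (3.2) to:
$$y^{N} \equiv {y'}^{N} \pmod{N^2}$$
$$\left(\frac{y}{y'}\right)^{N} \equiv 1 \pmod{N^2}$$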
Classes
We denote by $B_\alpha \subset \mathbb{Z}^*_{N^2}$ the set of elements of order
$N\alpha$ with $\alpha \neq 0$, and by B their disjoint union for
$\alpha = 1, \cdots, N$ with $\alpha \mid \lambda(N)$.
$$E_g(x, y) \equiv g^x \cdot y^N \equiv w \pmod{N^2}$$
3.2.5. Theorem. The function that associates to each $w \in \mathbb{Z}^*_{N^2}$
its corresponding class $[\![w]\!]_g$ is a homomorphism from
$(\mathbb{Z}^*_{N^2}, \times)$ to $(\mathbb{Z}_N, +)$.
The L-function
Consider the following set
$$S_N = \{u < N^2 \mid u \equiv 1 \pmod{N}\}$$
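Proof: Since $1 + N \in B$, there exists a unique pair (a, b) in the set
$\mathbb{Z}_N \times \mathbb{Z}^*_N$ such that $w = (1 + N)^a b^N \pmod{N^2}$.
By definition, $a = [\![w]\!]_{1+N}$. Then
\begin{align*}
w^{\lambda} &\equiv (1 + N)^{a\lambda} b^{N\lambda}\\
            &\equiv 1 + a\lambda(N) N \pmod{N^2}
\end{align*}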
Main Variant
Key Generation. Let N = pq be an RSA modulus, where p and q are large primes.
Let $g \in \mathbb{Z}^*_{N^2}$ be an element whose order is a multiple of N. Let
λ = lcm(p − 1, q − 1). The public key is (g, N), and the private key is λ.
Next, we show why the Paillier main scheme works. Let $m \in \mathbb{Z}_N$ be a
message and $r \in \mathbb{Z}^*_N$ a random nonce. The encryption of message m
is $c = g^m r^N \pmod{N^2}$. Decryption works as follows:
$$\frac{L(c^{\lambda} \bmod N^2)}{L(g^{\lambda} \bmod N^2)} \bmod N
  = L((g^m r^N)^{\lambda} \bmod N^2)\,(L(g^{\lambda} \bmod N^2))^{-1} \pmod{N}$$
and due to equation (1.5) we get
$L(g^{m\lambda} \bmod N^2)\,(L(g^{\lambda} \bmod N^2))^{-1} \pmod{N}$. Now,
applying the L-function results in the original plaintext
$(\lambda m)(\lambda)^{-1} = m$; the common factor $[\![g]\!]_{1+N}$ of numerator
and denominator cancels.
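The main variant can be sketched as follows. This is a toy illustration under stated assumptions, not the implementation discussed in this thesis: the function names are hypothetical, the primes p = 61 and q = 53 are far too small to be secure, and g = 1 + N is chosen as the simplest element whose order is a multiple of N. The L-function is taken to be L(u) = (u − 1)/N on the set $S_N$.

```python
import secrets
from math import gcd


def L(u, N):
    """L-function on S_N = {u < N^2 : u = 1 (mod N)}."""
    return (u - 1) // N


def keygen(p, q):
    N = p * q
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = 1 + N   # assumption of this sketch: simplest valid choice of g
    return (g, N), lam


def encrypt(pub, m, r=None):
    g, N = pub
    N2 = N * N
    while r is None or gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1   # random nonce in Z_N^*
    return pow(g, m, N2) * pow(r, N, N2) % N2


def decrypt(pub, lam, c):
    g, N = pub
    N2 = N * N
    numerator = L(pow(c, lam, N2), N)
    denominator = L(pow(g, lam, N2), N)
    return numerator * pow(denominator, -1, N) % N


pub, lam = keygen(61, 53)
N2 = pub[1] ** 2
c1, c2 = encrypt(pub, 1200), encrypt(pub, 2500)
print(decrypt(pub, lam, c1))               # the original message
print(decrypt(pub, lam, c1 * c2 % N2))     # sum of the messages modulo N
print(decrypt(pub, lam, pow(c1, 3, N2)))   # three times the first message
```

The last two lines exercise the additive homomorphic property: multiplying ciphertexts adds the plaintexts, and raising a ciphertext to a constant multiplies the plaintext by that constant, both modulo N.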
Subgroup Variant
The subgroup variant is slightly different, as it works in a subgroup of
$\mathbb{Z}^*_{N^2}$ whose order is determined by a divisor α of λ(N).
Key Generation. Let N = pq be an RSA modulus, where p and q are large primes.
Let λ = lcm(p − 1, q − 1) and choose α such that it divides λ. Let
$h \in \mathbb{Z}^*_{N^2}$ have maximal order Nλ, and let
$g \equiv h^{\lambda/\alpha} \pmod{N^2}$. The public key is (g, N), and the
private key is α.
The Paillier subgroup variant: Let $m \in \mathbb{Z}_N$ be a message and
$r \in \mathbb{Z}^*_N$ a random nonce. The encryption is
$c \equiv g^{m + r \cdot N} \equiv g^m (g^N)^r \pmod{N^2}$. Because we are
working in the subgroup variant with $g \equiv h^{\lambda/\alpha} \pmod{N^2}$,
we can write c as $(h^{\lambda/\alpha})^m (h^{(\lambda/\alpha)N})^r \pmod{N^2}$.
In the decryption, the ciphertext c is raised to the power α, and due to
equation (1.5) this becomes
$((h^{(\lambda/\alpha)m})^{\alpha} (h^{(\lambda/\alpha)N})^{r\alpha}) \equiv h^{\lambda m} \cdot h^{\lambda r N} \equiv h^{\lambda m} \cdot 1 \pmod{N^2}$.
This results in
$$\frac{L(h^{\lambda m} \bmod N^2)}{L(h^{\lambda} \bmod N^2)} \equiv m \lambda \lambda^{-1} \equiv m \pmod{N},$$
as seen before in Paillier's main variant.
$\forall m_1, m_2 \in \mathbb{Z}_N$ and $k \in \mathbb{N}$
These homomorphic properties mean that the Paillier scheme can at most be secure
against non-adaptive chosen-ciphertext attacks (CCA1), as the homomorphic
properties allow computation on ciphertexts without knowing their contents. This
implies that the Paillier cryptosystem is malleable. In order to make the
Paillier cryptosystem with
3.3 Optimization
3.3.1 The CRT method
In the RSA cryptosystem, the CRT can be applied to speed up the decryption
process. This can also be applied to the Paillier cryptosystem, as suggested in
[60]. Using the factors p and q we can define the following functions:
$$L_p(x) = \frac{x - 1}{p} \quad \text{and} \quad L_q(x) = \frac{x - 1}{q}$$
Decryption can be made faster by separately computing the message modulo
$p^2$ and modulo $q^2$ and then recombining the modular remainders using the
CRT (Theorem 1.2.1).
with $h_p$ and $h_q$:
$$h_p = L_p(g^{p-1} \bmod p^2)$$
$$h_q = L_q(g^{q-1} \bmod q^2)$$
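A sketch of CRT-based decryption along these lines is given below. It is an illustration under assumptions: it follows the formulation in Paillier's paper [60], where the values $L_p(g^{p-1} \bmod p^2)$ and $L_q(g^{q-1} \bmod q^2)$ are inverted modulo p and q before use; the function names, the toy primes and the choice g = 1 + N are hypothetical.

```python
def L_fn(x, n):
    """L_p or L_q, depending on the argument n: (x - 1) / n."""
    return (x - 1) // n


def decrypt_crt(c, g, p, q):
    """Paillier decryption with separate computations modulo p^2 and q^2."""
    # Precomputable values: inverses of L_p(g^(p-1)) and L_q(g^(q-1)).
    hp_inv = pow(L_fn(pow(g, p - 1, p * p), p), -1, p)
    hq_inv = pow(L_fn(pow(g, q - 1, q * q), q), -1, q)

    # Message modulo p and modulo q.
    m_p = L_fn(pow(c, p - 1, p * p), p) * hp_inv % p
    m_q = L_fn(pow(c, q - 1, q * q), q) * hq_inv % q

    # Recombine the two remainders with the CRT (Garner's formula).
    q_inv = pow(q, -1, p)
    return m_q + q * ((m_p - m_q) * q_inv % p)


# Toy parameters, assumed for illustration only.
p, q = 61, 53
N = p * q
g = 1 + N
m, r = 1234, 7
c = pow(g, m, N * N) * pow(r, N, N * N) % (N * N)
print(decrypt_crt(c, g, p, q))
```

The two exponentiations use moduli and exponents of half the size of those in the direct decryption, which is where the speed-up comes from.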
also allows λ to be equal to φ(N), and u in equation (3.4) can be set to
$\varphi(N)^{-1} \pmod{N}$.
So encrypting basically becomes just one exponentiation and one multiplication
modulo $N^2$. For decrypting, the only requirement is the evaluation of two
exponentiations, one modulo $p^2$ and one modulo $q^2$. Since the exponents in
these exponentiations are fixed we can make use of addition chains. Computing an
optimal addition chain is considered a hard problem (it is NP-hard in its
general form, for several exponents at once). An extensive survey is given by
Bernstein in [13].
3.4 Security
In order for an adversary to break the Paillier cryptosystem, i.e. to learn the
factorization of N = pq such that the secret key λ = lcm(p − 1, q − 1) can be
calculated, the adversary has to invest both time and money in computers or
chips that compute the factorization of N within a reasonable amount of time. To
prevent this, we need to choose N large enough that factoring it takes an
infeasible amount of time. As the size of N also affects the speed of encryption
and decryption, it is necessary to choose N big enough that it will not be
factored by the adversary and small enough that the cryptosystem remains usable.
In addition, if large quantum computers are built, they will break the
Paillier cryptographic scheme and all other cryptographic systems that rely on
the hardness of factorization. See [72] for details of Shor's quantum
factorization algorithm.
Chapter 4
DGK-Crypto System
4.1 Introduction
In past decades quite a few new homomorphic cryptographic schemes have been
proposed, with multiplicative homomorphic properties [68, 33] and additive
homomorphic properties [50, 36, 9, 60]. It is often suggested that choosing
large subgroups of $\mathbb{Z}_N$ increases security. But by reducing the order
of the subgroups it is possible to obtain similar but more efficient schemes,
while the underlying assumption is the strong RSA assumption.
For example, the Paillier schemes have message space $M = \mathbb{Z}_N$ and work
optimally when a message m is of the same size as the random number r. If the
message m is much smaller than r, say only a few bits, then the computation of
the exponentiation carries a lot of overhead. Such small message spaces are
often used in secure multi-party computation (SMP). One way of reducing
pre-computation is to scan the exponent and determine which values are needed in
the auxiliary table. Another problem with the Paillier scheme is that it uses
$N^2$ as modulus when encrypting. In 1978 Rivest, Shamir, and Adleman presented
a cryptosystem that exploits the order of the multiplicative group modulo a
composite N = pq, i.e. a message m raised to the power kφ(N) + 1 returns to its
original form. In 2005, Groth presented a cryptographic scheme [38] that
exploits a hidden subgroup of $\mathbb{Z}^*_N$. The system has two base elements
with special orders: the order of the first element is a multiple of the order
of the second element. When raising the product of the two elements to the order
of the second element, only the first element remains and the message can be
extracted. The message space M of the Groth cryptosystem has the size of the
subgroup.
In [25] Damgård, Geisler and Krøigaard presented their cryptosystem, named
DGK, which is heavily based on Groth's hidden subgroup scheme. The DGK
cryptosystem in its original form, as described in [25] and the correction [26],
relies on an auxiliary table for decryption of the ciphertext. In this thesis we
present an implementation of the DGK system that uses this auxiliary table as
well as a
decryptable version of the corrected DGK system, which is suggested in [27]. This
chapter describes the DGK homomorphic cryptosystem.
The DGK cryptosystem actually uses two subgroups, where one subgroup is
contained in the other. The consequence of working in a subgroup of
$\mathbb{Z}^*_N$ is that the message space M is also smaller (e.g. 16 bits, as
suggested in the original paper [25]). The other subgroup should contain the
message space and have a group order of around 160 bits for a key length of
1024 bits.
The DGK cryptosystem has two variants. The first one we call the Small
Message Space (SMS) variant. The second is the proper decryption variant.
Security Parameters
DGK needs three parameters k, t and l with k > t > l in order to generate
keys. The first parameter k is the size in bits of the RSA modulus N, the
second parameter t is the size in bits of the two small primes $v_p$ and $v_q$,
and the last parameter l is the size in bits of the message space.
Key-generation Construct two t-bit primes $v_p$ and $v_q$, and two distinct
primes p and q of equal bit length such that $v_p \mid p - 1$ and
$v_q \mid q - 1$. Then choose an l-bit prime u and an element
$g \in \mathbb{Z}^*_N$ of order $u v_p v_q$, and choose h to have order
$v_p v_q$. The public key is (N, g, h, u) and the private key is
$(p, q, v_p, v_q)$. In addition an auxiliary table is generated that stores the
tuples $((g^{v_p v_q})^i, i)$ for $0 \le i < u$.
Encryption Given a message m and a randomly chosen nonce r, the ciphertext is
$c = g^m h^r \pmod{N}$.
Now we show why this system works. Let m < u be a message and
$r \in \mathbb{Z}_N$ a random nonce; then the encryption is
$c = g^m h^r \pmod{N}$. To decrypt c we compute
$c^{v_p v_q} \equiv (g^m h^r)^{v_p v_q} \equiv (g^m)^{v_p v_q} (h^r)^{v_p v_q} \pmod{N}$.
Because h has order $v_p v_q$, the second factor becomes 1. This leaves
$(g^{v_p v_q})^m \pmod{N}$, and since our auxiliary table holds the tuples
$((g^{v_p v_q})^i, i)$ for $0 \le i < u$, we can easily look up the message m
corresponding to $g^{v_p v_q m}$.
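The table-based decryption can be sketched with toy parameters. Everything concrete below is an assumption made for illustration: the values u = 11, v_p = 5, v_q = 7, p = 331 and q = 463 are far too small to be secure, the primes are chosen so that u·v_p divides p − 1 and u·v_q divides q − 1, and the helper `element_of_order` is a hypothetical brute-force search that is only practical for toy sizes.

```python
from math import gcd

# Toy parameters, assumed for illustration only.
u, vp, vq = 11, 5, 7
p, q = 331, 463          # p - 1 = 2*3*5*11 and q - 1 = 2*3*7*11
N = p * q
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # exponent of the group Z_N^*


def element_of_order(order, prime_factors):
    """Search for an element of Z_N^* with exactly the given order."""
    cofactor = lam // order
    for x in range(2, N):
        if gcd(x, N) != 1:
            continue
        cand = pow(x, cofactor, N)   # the order of cand divides `order`
        if all(pow(cand, order // f, N) != 1 for f in prime_factors):
            return cand
    raise ValueError("no element of the requested order found")


g = element_of_order(u * vp * vq, (u, vp, vq))
h = element_of_order(vp * vq, (vp, vq))

# Auxiliary table: (g^(vp*vq))^i -> i for 0 <= i < u.
table = {pow(g, vp * vq * i, N): i for i in range(u)}


def encrypt(m, r):
    return pow(g, m, N) * pow(h, r, N) % N


def decrypt(c):
    return table[pow(c, vp * vq, N)]


print(decrypt(encrypt(9, 12345)))
print(decrypt(encrypt(3, 5) * encrypt(4, 9) % N))   # homomorphic addition
```

Raising the ciphertext to $v_p v_q$ removes the factor $h^r$ completely, so the nonce never has to be known to the decrypting party; the remaining value is one of only u possibilities, which is why a table of u entries suffices.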
The small message space variant of the DGK system can be turned into a full or
proper decryption variant that does not use an auxiliary table. The auxiliary
table limits the message space, as we have to store an entry for every possible
message. Today's disk space will not be sufficient for storing the auxiliary
table if the message space is chosen to be large. Hence, in order to use the
full message space there has to be a decryption algorithm that does not use an
auxiliary table; this can be achieved by carefully selecting the parameters of
the system.
Key-Generation Two primes p and q are generated of the form
$p - 1 = 2 u v_p r$ and $q - 1 = 2 u v_q r$, with $v_p$ and $v_q$ prime and
$u = 2^l$, where l is the size in bits of the message space and r is a random
number. We set N = pq, and an element g is generated such that it has order
$2^l$. Finally, an element h is chosen such that its order is $v_p v_q$.
$\forall m_1, m_2$ and $k \in \mathbb{Z}_N$
This means that there is no polynomial-time algorithm for recovering the secret
key. The security proof is given in [25]. In addition, the DGK cryptosystem is
malleable.
To recover the plaintext, an attacker needs the secret value $v_p v_q$ in order
to compute $(g^{v_p v_q})$. He first has to factor N into p and q, and then find
$v_p$ and $v_q$ by factoring p − 1 and q − 1. The second factorization (i.e.
factoring p − 1 and q − 1) costs less than factoring N. We recommend the
same key-sizes for N as for the Paillier scheme (see Section 3.4.1 for details).
Chapter 5
Simultaneous Multi-Exponentiation
5.1 Introduction
At the core of different public key cryptographic systems such as Paillier, DGK,
and other cryptosystems lies a multi-exponentiation in some commutative group
G, i.e. evaluating a product of exponentiations:
$$\prod_{1 \le i \le k} g_i^{e_i},$$
5.2 Pre-computation
In order to speed up computation some algorithms make use of an auxiliary
table. An auxiliary table contains a limited number of values to speed up com-
If we compute the simultaneous multi-exponentiation $g_1^{e_1} g_2^{e_2}$ in the
naive way, i.e. computing $g_1^{e_1}$ and $g_2^{e_2}$ separately and then
multiplying the results, we end up doing 2(b − 1) squarings and at most
2(b − 1) multiplications. Rewriting the equation allows us to save b − 1
squarings compared to the naive way.
The implementation of this algorithm scans the exponents starting from the MSB,
looking at the most significant bit first and the least significant bit last.
Depending on whether a bit is 0 or 1, it either squares the intermediate
product, or squares the intermediate product and multiplies it by the
corresponding base, as seen in Algorithm 1. It is also possible to scan the bits
from right to left; this requires additional storage, see [77] for details. In
tables and graphs we denote this algorithm as "ltr".
x = 232   1     1        1          0           1            0            0             0
y = 98    0     1        1          0           0            0            1             0
A         a     a^3·b    a^7·b^3    a^14·b^6    a^29·b^12    a^58·b^24    a^116·b^49    a^232·b^98
result    12    270      100        1080        205          770          235           590
Table 5.1: Example $12^{232} \cdot 35^{98} \pmod{1115}$ by means of the
left-to-right algorithm, which requires 7 squarings and 6 multiplications
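The left-to-right method for two bases can be sketched as follows. The function name and the dictionary-based lookup are choices made for this illustration and are not taken from Algorithm 1; the example reuses the values of Table 5.1 (a = 12, b = 35, modulus 1115).

```python
def simultaneous_ltr(g1, e1, g2, e2, n):
    """Compute g1^e1 * g2^e2 mod n, scanning both exponents from the MSB."""
    # Pre-computation: one multiplication, used when both bits are 1.
    lookup = {(1, 0): g1 % n, (0, 1): g2 % n, (1, 1): g1 * g2 % n}

    acc = 1 % n
    bits = max(e1.bit_length(), e2.bit_length())
    for i in range(bits - 1, -1, -1):
        acc = acc * acc % n                      # one squaring per bit
        pair = ((e1 >> i) & 1, (e2 >> i) & 1)
        if pair != (0, 0):
            acc = acc * lookup[pair] % n         # at most one multiplication
    return acc


print(simultaneous_ltr(12, 232, 35, 98, 1115))
```

Because both exponents share a single chain of squarings, only the multiplications depend on the number of bases, which is the saving described above.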
Computation Efficiency
The simultaneous binary left-to-right exponentiation has a small pre-computation
stage which computes $g_1 \cdot g_2$, i.e. 1 multiplication, so that if
$e_{1,i}$ and $e_{2,i}$ are both 1 only one multiplication is needed. Let the
exponents $e_1$ and $e_2$ be uniformly chosen b-bit numbers. Then the evaluation
stage can be computed in b − 1 squarings and, on average, $\frac{3}{4} b$
multiplications. In total the algorithm takes on average $1 + \frac{3}{4} b$
multiplications and b − 1 squarings.
Computational Efficiency
By representing the exponent in a larger base of w bits, the number of multi-
plications is reduced to two per iteration, at the price of constructing an
auxiliary table which holds 2k entries. The computational cost for
building this table is 2k − 4, as the first two entries of each table are fixed (i.e.
1 and g). The evaluation costs are b squarings and at most $2\lceil b/k \rceil$ multiplications.
x = 232   011     101                     000
y = 98    001     100                     010
A         a^3·b   (a^3·b)^(2^3)·a^5·b^4   ((a^3·b)^(2^3)·a^5·b^4)^(2^3)·1·b^2
A         a^3·b   a^29·b^12               a^232·b^98
result    270     205                     590
are built as follows: auxiliary table aux[i][j] = $g_1^i g_2^j$ with $0 \le i, j < 2^k$. To find
the entry in the auxiliary table, k bits from each exponent are used as the two
indices. Doing this saves a multiplication for every iteration in the main loop of
the evaluation stage. In graphs and tables we denote the algorithm as "2karym".
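$$
\begin{array}{cccccc}
1 & g_1 & g_1^2 & g_1^3 & \cdots & g_1^{2^k-1} \\
g_2 & g_1 g_2 & g_1^2 g_2 & g_1^3 g_2 & \cdots & g_1^{2^k-1} g_2 \\
g_2^2 & g_1 g_2^2 & g_1^2 g_2^2 & g_1^3 g_2^2 & \cdots & g_1^{2^k-1} g_2^2 \\
g_2^3 & g_1 g_2^3 & g_1^2 g_2^3 & g_1^3 g_2^3 & \cdots & g_1^{2^k-1} g_2^3 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
g_2^{2^k-1} & g_1 g_2^{2^k-1} & g_1^2 g_2^{2^k-1} & g_1^3 g_2^{2^k-1} & \cdots & g_1^{2^k-1} g_2^{2^k-1}
\end{array}
$$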
Computational Efficiency
The cost of computing the auxiliary table is $2^{2k} - 3$ multiplications; three
multiplications are saved as 1, $g_1$ and $g_2$ are already known. The evaluation stage
takes b squarings and at most $\lceil b/k \rceil$ multiplications. This saves $\lceil b/k \rceil$ multiplications
in the evaluation stage with respect to the simultaneous $2^k$-ary multi-
exponentiation method.
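A small C sketch of the matrix variant, under the same simplifying assumptions as before (64-bit words instead of GMP integers, modulus below 2^32). The window width `K`, the table dimension `T` and the function names are illustrative choices, not taken from the thesis code. The table construction here is written for clarity and does not try to reach the minimal multiplication count given above.

```c
#include <stdint.h>

#define K 2            /* window width in bits (assumed; must divide 64) */
#define T (1u << K)    /* table dimension 2^k */

/* Modular multiplication; assumes n < 2^32 so the product fits in 64 bits. */
static uint64_t mulmod(uint64_t x, uint64_t y, uint64_t n)
{
    return (x % n) * (y % n) % n;
}

/* Computes g1^e1 * g2^e2 mod n with one combined table aux[i][j] = g1^i * g2^j. */
uint64_t simul_kary_matrix(uint64_t g1, uint64_t e1,
                           uint64_t g2, uint64_t e2, uint64_t n)
{
    uint64_t aux[T][T];

    /* Pre-computation: first column holds powers of g1, rows extend by g2. */
    aux[0][0] = 1 % n;
    for (unsigned i = 1; i < T; i++)
        aux[i][0] = mulmod(aux[i - 1][0], g1, n);
    for (unsigned i = 0; i < T; i++)
        for (unsigned j = 1; j < T; j++)
            aux[i][j] = mulmod(aux[i][j - 1], g2, n);

    /* Evaluation: K bits of each exponent per iteration, MSB first. */
    uint64_t A = 1 % n;
    for (int shift = 64 - K; shift >= 0; shift -= K) {
        for (int s = 0; s < K; s++)
            A = mulmod(A, A, n);                  /* K squarings */
        unsigned i = (unsigned)((e1 >> shift) & (T - 1));
        unsigned j = (unsigned)((e2 >> shift) & (T - 1));
        if (i | j)
            A = mulmod(A, aux[i][j], n);          /* one lookup, one multiplication */
    }
    return A;
}
```

Compared with two separate per-base tables, each iteration needs a single multiplication, at the price of a table that grows with $2^{2k}$.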
x = 232   1 1 1        0             1
y = 98    0 1 1        0             0
A         a^7·b^3      (a^7·b^3)^2   ((a^7·b^3)^2)^2·a
result    12 270 100   1080          205

x         0                       0                             0
y         0                       1                             0
A         (((a^7·b^3)^2)^2·a)^2   ((((a^7·b^3)^2)^2·a)^2)^2·b   (((((a^7·b^3)^2)^2·a)^2)^2·b)^2
Table 5.2: Example by means of the Simultaneous Sliding Window method with
window width w = 3
Computational Efficiency
The simultaneous sliding window exponentiation algorithm is potentially very
fast, but it is very difficult to give an accurate figure for its complexity.
In the worst case the algorithm performs just as well as the simultaneous
$2^k$-ary exponentiation method. The computation of each auxiliary table requires
$2^{w-1} - 1$ multiplications and 2 squarings, as we can first compute $g^2$ and then
iteratively compute $g^3 = g \cdot g^2, \dots, g^r = g^{r-2} \cdot g^2$; it requires storage for
$2^{w-1} + 1$ entries for each base.
Figure 5.2: Auxiliary table for simultaneous sliding window matrix exponentiation
with w = 3
detect only zeros for one exponent in the evaluation stage then it can be used
as a normal $2^k$-ary window. In tables and graphs we denote this algorithm as
"sswm".
Evaluation Stage
The two exponents $e_1$ and $e_2$ with their corresponding bases $g_1$ and $g_2$ are
evaluated in a left-to-right manner; let the window width be w. The
loop distinguishes four different possibilities for the values in our two windows.
1. Both values in the windows are odd. We first raise the intermediate product
to the power $2^w$, where w is the size of the window. Then the intermediate
product is multiplied by the auxiliary table entry looked up with the two
values that are in our windows.
2. One of the values in the windows is even. We apply the same technique as
when both values are odd.
3. Both values are even and at least one of them is non-zero. The window
size is reduced by l bits such that at least one of the windows becomes
odd. The intermediate result is raised to the power $2^{w-l}$ and multiplied by
the table entry looked up with both values of the smaller window.
4. Both windows are zero. The window then expands by r − 1 bits, where r is
the smallest number such that at least one of the exponent bits becomes 1.
Then the intermediate result is raised to the power $2^{w+r-1}$.
Example: Let $e_1 = 353 = (101100001)_2$ and $e_2 = 385 = (110000001)_2$, and take
window size w = 3. Going from left to right, the first window is
101 100001
110 000001
and we get $g_1^5 g_2^6$; as 5 is odd we apply rule 2 and do a lookup in our auxiliary table.
In the next window we get $(100)_2 = 4$ and $(000)_2 = 0$; because both are even, rule 3
applies and we get $(g_1^5 g_2^6)^2 \cdot g_1 = g_1^{11} g_2^{12}$. The next window consists of zeros only
and gets expanded by rule 4 by 2 bits, and the intermediate result is raised to the
power $2^{3+2-1}$,
1011 0000 1
1100 0000 1
which gives $(g_1^{11} g_2^{12})^{2^4} = g_1^{176} g_2^{192}$. Finally the window size can only be one,
resulting in $(g_1^{176} g_2^{192})^2 \cdot g_1 g_2 = g_1^{353} g_2^{385}$.
Computational Efficiency
The generation of the auxiliary tables takes
$\left(\frac{2^w-2}{2} + 2^w\right)\frac{2^w+2}{2} - 3$ multiplications and
$\left(\frac{2^w-2}{2} + 2^w\right)\frac{2^w+2}{2}$ entries. For this algorithm it is hard to pinpoint
the exact number of operations per bit, but in the worst case it will perform as well
as the simultaneous $2^k$-ary matrix algorithm.
• if e is even return 0
• otherwise if $0 \le e \le 2^w + r$, return e
• otherwise return $e - 2^w$
5.4. FIXED-BASE EXPONENTIATION
Computational Efficiency
The storage requirement for the auxiliary table is $2^{w+1} + 2 + 2r$, while the computa-
tion needed to generate this table is $2^{w+1} - 2 + 2r$ multiplications and one squaring.
Because it is a variant of the sliding window exponentiation algorithm, performance
figures are hard to pinpoint, but the worst case should be no worse than the $2^k$-ary
algorithm.
has the main advantage that multiplications can be done independently, and so
can better exploit the multi-core CPUs which are present in today's mainstream
computers. In addition, CPU level 2 caches are growing rapidly to 4, 6 and
sometimes 8 MB. So, even for large message spaces, e.g. 4096 bits, the auxiliary
table will easily fit in today's level 2 CPU caches. Of course, computing large
tables initially takes quite some computational time. This time can only
be won back if the base is used several times for computation. More detail will
be presented in the implementation in Chapter 6. In graphs and tables we denote
the squares of g algorithm as "sg".
Computational Efficiency
The auxiliary tables hold 2l entries and it takes 2l − 4 squarings to generate both
tables. The expected computation is l multiplications.
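The squares of g method can be sketched as follows. This is an illustration under simplifying assumptions (64-bit words instead of GMP integers, modulus below 2^32, exponent length fixed to `BITS`); the names `sg_precompute` and `sg_eval` are hypothetical. The evaluation uses multiplications only, and the multiplications for the two bases are independent of each other, which is what makes the method attractive for threads.

```c
#include <stdint.h>

#define BITS 64   /* exponent length l (assumed for this sketch) */

/* Modular multiplication; assumes n < 2^32 so the product fits in 64 bits. */
static uint64_t mulmod(uint64_t x, uint64_t y, uint64_t n)
{
    return (x % n) * (y % n) % n;
}

/* Fills table[i] = g^(2^i) mod n using BITS - 1 squarings. */
void sg_precompute(uint64_t g, uint64_t n, uint64_t table[BITS])
{
    table[0] = g % n;
    for (int i = 1; i < BITS; i++)
        table[i] = mulmod(table[i - 1], table[i - 1], n);
}

/* Evaluates g1^e1 * g2^e2 mod n from the two tables, without any squaring. */
uint64_t sg_eval(const uint64_t t1[BITS], uint64_t e1,
                 const uint64_t t2[BITS], uint64_t e2, uint64_t n)
{
    uint64_t A = 1 % n;
    for (int i = 0; i < BITS; i++) {
        if ((e1 >> i) & 1)
            A = mulmod(A, t1[i], n);   /* bit i of e1 contributes g1^(2^i) */
        if ((e2 >> i) & 1)
            A = mulmod(A, t2[i], n);   /* bit i of e2 contributes g2^(2^i) */
    }
    return A;
}
```

For uniformly random exponents about half of the bits are set, so two l-bit exponents cost about l multiplications in total, matching the expected figure above.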
Using this representation we can also represent $g^e$ this way. Let $g_0 = g$ and
$$g_i = g_{i-1}^{2^a} = g^{2^{ia}} \quad \text{for } 0 < i < h.$$
Now let $e_i = e_{i,a-1} \cdots e_{i,1} e_{i,0}$ be the binary representation of $e_i$ ($0 \le i < h$).
So we can write:
$$g^e = \prod_{k=0}^{r-1} \left( \prod_{j=0}^{v-1} \prod_{i=0}^{h-1} g_i^{2^{jr} e_{i,jr+k}} \right)^{2^k}$$
Storing the values of $g_i^{2^{jr} e_{i,jr+k}}$ in an auxiliary table results in a drastic speedup.
Auxiliary Table
The auxiliary table is a two-dimensional array of size (a × b) × h and is constructed
in the following way:
$$G[0][i] = g_{h-1}^{e_{h-1}} g_{h-2}^{e_{h-2}} \cdots g_1^{e_1} g_0^{e_0}$$
$$G[j][i] = (G[j-1][i])^{2^r} = (G[0][i])^{2^{jr}}$$
Evaluation Stage
A ← 1
for k ← r − 1 down to 0
    A ← A^2
    for j ← v − 1 down to 0
        A ← A · G[j][I_{j,k}]
return A
$$c_i = \prod_{k=0}^{b-1} \left( G_1[j][I_{j,k}] \cdot G_2[j][I_{j,k}] \right)^{2^k}$$
Finally, all the $c_i$ are multiplied into one result. This way the algorithm can
exploit thread-level parallelism. In tables and figures we denote the comb
exponentiation algorithm as "cmb".
Computation Efficiency
The pre-computation required for a product of two exponentiations with
exponents of the same size is 2(a + b − 2) multiplications and b − 1 squarings. The
space required for storing the auxiliary table is 2v(2k − 1). The computation
requires vb multiplications and v squarings.
sg M 0.5b M
Table 5.3: Comparison of different exponentiation methods for computing $g_1^{e_1} g_2^{e_2}$.
M denotes the cost of a multiplication, S denotes the cost of a squaring.
Chapter 6
Speedup, Results and Implementation
This chapter describes the implementation of the algorithms described in the pre-
vious chapter. In addition we benchmark the encryption and decryption functions
of the Paillier cryptosystem. The systems are tested with different parameters
against different key sizes. Section 6.4 discusses the problem of timing attacks
against these implementations, and shows how modifications can avoid these prob-
lems. Finally, we give an overview of the application programming interface (API)
for the library.
6.1 Implementation
Prime Generation:
The Paillier cryptosystem as well as the DGK cryptosystem rely heavily on the
special properties of primes. Generating large prime numbers can be quite a
challenge. The most widely deployed algorithm for prime verification is the Miller-
Rabin primality test [66]. The Miller-Rabin algorithm only gives a probability
that a number is prime, which can have serious drawbacks. If Miller-Rabin accepts
a composite number as prime, the cryptosystem may completely fail to work, or it
may be seriously weakened in a way that makes recovering the secret key easy.
Even if the probability of such a failure is extremely small, it is still present.
So, to construct proven primes our implementation uses Maurer's prime gen-
eration algorithm [51], which uses the Pocklington primality test [64] to generate
primes. The Pocklington primality test gives a proof that a number is really a
prime, in contrast to the Miller-Rabin algorithm, which only gives a probability of
$1 - (1/4)^q$, with q being the number of "witnesses" or bases tested.
Algorithm         512 mean      512 std. dev.   1024 mean     1024 std. dev.
Miller-Rabin 5    325645058     2847486         553169659     69327463
Miller-Rabin 10   175999995     17857889        946079386     54715946
Maurer            1770051847    756462737       1147004185    919299253
GDSA 1/4          89163917      28846268        178923001     41647527
GDSA 1/2          62921398      18585666        105310173     95310173
To generate primes with special properties we use the GDSA [44] algorithm.
The algorithm takes a prime $p_0$ as input and generates a prime p such that
$p = z p_0 + 1$. These primes are needed for both the subgroup variant of the Paillier
cryptosystem and the DGK cryptosystem. The GDSA algorithm uses the Miller-Rabin
primality test to check whether $z p_0 + 1$ is prime. It is of course also possible for
the GDSA algorithm to use the Pocklington primality test.
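The shape of such a search, finding a prime of the form $z p_0 + 1$, can be illustrated with a toy sketch. This is not the GDSA procedure itself: it tries even z in increasing order rather than choosing z of a prescribed size, and it replaces the Miller-Rabin test by trial division, which only works for tiny numbers. The names `is_prime_small`, `prime_from_p0` and the bound `max_z` are hypothetical.

```c
#include <stdint.h>

/* Trial division; only meant for the tiny numbers of this illustration.
   A real implementation would call a probabilistic or proving test instead. */
static int is_prime_small(uint64_t n)
{
    if (n < 2)
        return 0;
    for (uint64_t d = 2; d * d <= n; d++)
        if (n % d == 0)
            return 0;
    return 1;
}

/* Returns the first prime of the form z * p0 + 1 with z = 2, 4, 6, ...,
   or 0 if none is found for z up to max_z. */
uint64_t prime_from_p0(uint64_t p0, uint64_t max_z)
{
    for (uint64_t z = 2; z <= max_z; z += 2) {   /* z even, so p is odd */
        uint64_t p = z * p0 + 1;
        if (is_prime_small(p))
            return p;
    }
    return 0;
}
```

For example, starting from $p_0 = 7$ the candidate 15 is rejected and the search returns 29 = 4 · 7 + 1, so that $p_0$ divides p − 1 by construction.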
Tables 6.1 and 6.2 present the timings of the Miller-Rabin algorithm provided
by GMP's mpz_probab_prime_p with 5 and 10 witnesses, our implementation of
Maurer's algorithm, and our implementation of GDSA, for generating primes of sizes
512, 1024, 1536, and 2048 bits. The GDSA algorithm takes as input a prime
of 1/4 or 1/2 of the target prime size.
The timings were taken on a 1.6 GHz AMD Athlon Neo X2 Dual Core Pro-
cessor L335 with 256 KB L2 cache per core and 2 GB of RAM. The system is running
Ubuntu 10.04 x86_64 with kernel 2.6.32-22-generic. The timings were taken over
1000 runs.
In our key-generation process we generate prime numbers with the Maurer
prime generation algorithm. In cases where we need a GDSA prime p, we first
generate a Maurer prime $p_0$, which is fed to the GDSA algorithm. The consequence
of using Maurer's algorithm is that it takes significantly longer to generate primes
than when using the Miller-Rabin primality test, as seen in Tables 6.1 and 6.2.
It can be clearly seen that the Miller-Rabin algorithm with 5 and 10 witnesses
takes only a fraction of the CPU cycles when compared to the Maurer prime
generation algorithm.
6.2 Benchmarking
6.2.1 Encryption
In this section the implementations of the algorithms in Chapter 5 are bench-
marked. As it is infeasible to benchmark every possible combination of key
length of N and exponentiation parameter, the chosen key lengths
are 1024, 2048, 3072, and 4096 bits. These lengths are representative and feasible
for the interval of interesting key lengths, with the exception of 1024, which is
included for comparison and should not be used in a production setting, due to
security reasons discussed in Sections 3.4.1 and 4.3.
For the algorithms in Sections 5.3.2 to 5.3.6 we use the values 2, . . . , 8 for
parameter w, as w = 1 is just the binary left-to-right
exponentiation method. Furthermore, for the squares of g method we calculate
the full message space, and for the comb method we use parameters {2, 1}, {4, 2}
and {8, 4}.
For the encryption of a message m we distinguish three different scenarios.
1. The bases $g_1$ and $g_2$ are not fixed. This means these values are only used
once and the cost of generating the auxiliary tables is added to the total
cost of exponentiation. This is the case if keys are used only once.
2. The base values $g_1$ and $g_2$ are fixed, i.e. used multiple times. This allows
us to pre-compute the auxiliary tables once and use them multiple times to
encrypt different messages. The cost of the auxiliary tables is not added
to the total cost of computation.
3. One of the two bases is fixed and the other one is not. We see such behaviour
in Paillier's main encryption scheme. The cost of generating the auxil-
iary table of the non-fixed base is added to the total time of the computation,
while the cost of generating the auxiliary table of the fixed base is not.
The benchmarks shown in the next sections were performed on an AMD
Athlon Neo X2 Dual Core Processor L335 with 512 KB L2 cache and 2 GB of RAM. The
system is running Ubuntu 10.04 x86_64 with kernel 2.6.32-22-generic in 64-bit
mode. For benchmarking we used cpucycles, which is part of eBACS [10], to
measure the number of CPU cycles used by the execution. The implementation
is written in C using the GMP library [37] for multi-precision arithmetic.
For non-fixed bases we use the following exponentiation algorithms: binary left-
to-right, simultaneous $2^k$-ary exponentiation, simultaneous sliding window,
unsigned fractional windows (ufw), and all the matrix variants. To express the
window size (with the exception of the binary left-to-right algorithm), all algorithms
have a number attached to their name (e.g. ssw4 is simulta-
neous sliding window with a window size of 4 bits). The parameters used in the
graph are as follows:
In Figure 6.1 we see the total running time, which includes the cost
of generating the auxiliary table. It can be clearly seen that the unsigned fractional
windows matrix version has the best performance, using parameters w = 3 and
m = 1 for key lengths 2048 to 4096, while the best performing algorithm for key
length 1024 is the unsigned fractional window matrix variant with parameters
w = 2, m = 1. After that we see the same algorithm with parameters w = 3
and m = 3, followed by the simultaneous sliding window exponentiation matrix
variant with parameter w = 3, which performs second best for key length
4096. The matrix variants of the exponentiation algorithms have a clear
advantage over the other approaches.
It can also be clearly seen that the generation of the auxiliary table of the $2^k$-ary
matrix method with window width 8 takes up the most time. This is due to the large
number of elements that have to be generated, which are almost 29 multiplications
for parameter 8. This problem is seen for all matrix variants of the algorithms.
Next we see the comb method with parameters h = 8 and v = 4 being the slowest;
this is also due to the way the auxiliary table is built.
[Plot: CPU cycles against key size of N in bits (1024, 2048, 3072, 4096).]
Figure 6.1: The total number of CPU cycles for generating the auxiliary tables plus
the running time, for different algorithms.
[Plot: y-axis in CPU cycles.]
Figure 6.2: The execution time in CPU cycles of different simultaneous algo-
rithms for different key sizes of N, with $e_1$ and $e_2$ of equal length (i.e. 1/2 the size
of N)
For benchmarking the combination of fixed and non-fixed bases, we pick the
fastest algorithm from our fixed-base measurements, which is the comb exponentia-
tion method, and combine it with the fastest non-fixed base algorithm, unsigned
fractional windows. When combining both algorithms we can save v squarings of
the total computation, but gain an additional b + a − 2 multiplications. When
comparing this method against unsigned fractional windows matrix exponentia-
tion, the trade-off is pre-computation versus running-time multiplications.
To reduce the number of multiplications for the combination of comb and
unsigned fractional windows, we select larger parameters for the comb method.
We choose them in such a way that the combination has fewer multiplications in
total than the unsigned fractional windows matrix method.
In Figure 6.3 it can be seen that the combination of the comb exponentiation method
with parameters {8, 4} and unsigned fractional windows {3, 3} is slower than
the unsigned fractional window matrix method with parameters {3, 3}; this is due
to the extra computation of the comb method. Increasing the parameters of the
comb exponentiation method to {16, 8} or {32, 16} will reduce the number of
overall multiplications.
In addition we show the computation of the naive exponentiation method, i.e.
computing two exponentiations separately and multiplying the results, with one
exponentiation having a fixed base. It can clearly be seen that it is much slower.
[Plot: CPU cycles and time in ms for cmb84-ufrac33, cmb168-ufrac33, cmb3216-ufrac33,
naive33 and ufracm33.]
6.3. IMPLEMENTATION DETAILS
• Branch avoidance: avoid jumps in code, which may cause wrong speculative
execution and thus a loss of potential CPU execution cycles.
• Contiguous memory: allocate memory such that all values are next to each
other. This gives a higher probability that the data is in cache compared to
when values are placed in a linked list.
• Function call avoidance: function calls cause overhead and a potential loss of
CPU cycles. In the code we try to have all computation in a single function
and make use of macros as a substitute for functions.
Additional speed is achieved because for some exponentiation
methods we can use multiple threads, i.e. use the CPU's full ability to compute.
Of course the creation of threads introduces additional overhead, but the
computational gain is much higher. This can clearly be seen in Figure 6.2 for
the functions sg1 and sg2, where sg1 is running a single thread and sg2 is running
two threads to compute a product of exponentiations.
Furthermore, speed is gained by applying the modulo operation between multiply
and squaring operations. This reduces the size of the intermediate value and cuts
the cost of the next operation.
6.4. SIDE CHANNEL ATTACK PREVENTION
The field of side channel attacks is relatively new and there is no complete
theory of side channel analysis. However, many studies have been conducted that
break different cryptosystems [31, 58, 63, 12] using side-channel analysis, and many
remedies have been suggested for different cryptosystems, such as [14, 20]. In this
section we present countermeasures against simple SCA. As this thesis focuses on
software implementations, we take only software side channels into account and
we discard the possibility of an invasive attack such as power or electromagnetic
analysis (EMA) attacks.
Simple SCA
Simple SCA (SSCA) is the setting in which the adversary obtains information from
a single exponentiation. To harden our exponentiation methods against SSCA we
must make the observable information independent of the secrets, such as messages,
nonces, and the private keys, in such a way that the adversary only sees a fixed se-
quence of operations that cannot be linked to the bits of the processed secrets.
Looking back at Chapter 5 we can distinguish two key problems that reveal
SSCA information to the adversary:
• Table lookups: The time needed to fetch a table entry from main memory
into the CPU's cache memory can differ, and this difference can be observed.
the same as selecting the value depending on $e_i$; pseudo code can be seen in
Figure 6.7. The entry of the auxiliary table is selected by a logical AND with $e_i$:
if the table index i does not match $e_i$, then $(i - e_i)$ is non-zero, its
logical NOT is zero, and this zero is multiplied by the table entry. If i matches
$e_i$, the result of the logical NOT is one, and the table entry is stored
in E. The entry E can then be used in a safe and secure way by the algorithm.
for (i in b−1 down to 0)
{
    if (e_i == 1)
    {
        A = SQR(A)
        A = MULT(A, g)
    } else {
        A = SQR(A)
    }
}
Figure 6.5: The core of the binary left-to-right exponentiation method, which is
prone to SSCA.
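The two countermeasures discussed here, a branch-free table lookup and a dummy operation for zero bits, might be combined as in the following sketch. It is an illustration only and not the thesis code: it selects the entry with bit masks rather than with the multiply-by-logical-NOT formulation described above, it works on 64-bit words with a modulus below 2^32, and the names `ct_select` and `powmod_always` are hypothetical. Note also that the `%` operator used in `mulmod` is not guaranteed to run in constant time on every CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Modular multiplication; assumes n < 2^32 so the product fits in 64 bits. */
static uint64_t mulmod(uint64_t x, uint64_t y, uint64_t n)
{
    return (x % n) * (y % n) % n;
}

/* Returns table[index] while reading every entry, so that the memory access
   pattern does not depend on the (secret) index. */
uint64_t ct_select(const uint64_t *table, size_t len, size_t index)
{
    uint64_t E = 0;
    for (size_t i = 0; i < len; i++) {
        uint64_t diff = (uint64_t)(i ^ index);
        /* is_eq is 1 when i == index and 0 otherwise, computed without a branch */
        uint64_t is_eq = 1 ^ ((diff | ((uint64_t)0 - diff)) >> 63);
        E |= table[i] & ((uint64_t)0 - is_eq);   /* mask is all ones only on a match */
    }
    return E;
}

/* Binary left-to-right exponentiation that performs one squaring and one
   multiplication for every bit, whatever the value of the bit. */
uint64_t powmod_always(uint64_t g, uint64_t e, uint64_t n)
{
    uint64_t table[2] = { 1 % n, g % n };   /* multiplying by table[0] is the dummy operation */
    uint64_t A = 1 % n;
    for (int i = 63; i >= 0; i--) {
        A = mulmod(A, A, n);
        A = mulmod(A, ct_select(table, 2, (size_t)((e >> i) & 1)), n);
    }
    return A;
}
```

Compared with Figure 6.5, the sequence of operations is now the same for every exponent, which is paid for with roughly twice as many multiplications on average.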
In Figure 6.8 we show the impact of both the dummy operations and the
table lookup for the binary left-to-right, $2^k$-ary (matrix) and comb methods. The
exponentiation algorithms with protection against SSCA are denoted with asc
(anti side channel); for example, cmb42asc is the comb exponentiation algorithm with
parameters {4, 2}. We have chosen to display only the results of the fixed-base
exponentiation, i.e. excluding the generation of the auxiliary table, as it is most
clear this way.
It can be seen that these countermeasures have an impact on the performance.
The comb exponentiation method, while still being the fastest method, takes almost
half as much time again at a key length of 3072 bits.
In addition the figure shows a "v"-shape for the $2^k$-ary matrix method with
the turning point at k between 4 and 5. This is due to the large number of
entries in the auxiliary table.
6.5. API DESCRIPTION
genkey() This function generates all necessary parameters, such as primes and special
primes, and returns the public and private key pair needed for the
cryptosystems.
keyinit() This function takes a public key as parameter. Depending on which variant
of the cryptosystems is used, it will build an auxiliary table for all
fixed bases using appropriate parameters for the key size.
encrypt() This function takes two parameters, m and a public key. The function
generates a random number r which will be used to encrypt m using simul-
taneous multi-exponentiation, with predefined parameters for the encryption
depending on the size of the key and on whether there is a non-fixed base.
encrypt_unsafe() This function is basically the same as encrypt(), except that it does
not provide countermeasures against SSCA and will be faster than
encrypt().
decrypt() The decrypt function takes two parameters, a ciphertext c and a private key,
and will decrypt c, returning the message m. Depending on the key size and
variant of the cryptosystem, the function sets the optimal parameters for
decryption. In addition the function has countermeasures against SSCA.
decrypt_unsafe() This function provides the same functionality as decrypt(), with the
exception of having no countermeasures against SSCA.
6.6 Comparison
When comparing our implementation of simultaneous multi-exponentiation, e.g.
$g_1^{e_1} g_2^{e_2} \pmod{N}$, to other implementations such as Horn's work [39], we see quite
some differences. In [39] we see two basic approaches: 1) computing the two
exponentiations $g_1^{e_1}$ and $g_2^{e_2}$ in the naive way, i.e. computing both exponentiations
separately and then multiplying the results, and 2) pre-computing squares of g, also
known as aggressive caching, which performs quite well for a fixed base.
But the latter strategy fails badly when computing fixed-non-fixed exponentiations,
as the pre-computation for the non-fixed base has to happen every time. Combining the
two approaches to compute fixed-non-fixed bases is an option, but as they are
computed individually this will be slower than our implementation, where we combine
the comb method and the ufrac method in such a way that we save squarings,
as seen in Section 6.2.1. For comparison we have included the naive
exponentiation of unsigned fractional windows in Figure 6.3.
When comparing our Paillier implementation to the Paillier implementation
of the Virtual Ideal Functionality Framework (VIFF) software package for secure
multi-party computation, version 0.7.1 (viff-51167e387cc3), in which the Paillier
cryptosystem is implemented in Python using the gmpy extension for the
GMP library, we see a huge difference in performance in favour of our work. See
Table 6.3.
Table 6.3: Comparison between VIFF Paillier implementation and our Paillier
implementation
[Plot: CPU cycles and time in ms against key size in bits (1024, 2048, 3072, 4096)
for ltr, kary2 to kary7, ssw3, ssw4 and the ufrac variants.]
[Plot: CPU cycles against key size of N in bits (1024, 2048, 3072, 4096).]
Figure 6.8: Execution time of different exponentiation methods with and without
protection against SSCA.
Chapter 7
Secure Multi-Party Computation
7.1 Introduction
Secure multi-party computation is a cryptographic technique allowing n parties
to jointly compute the result of a function f (x1 , x2 , ..., xn ) while ensuring that
the input xi of each party Pi is kept private, even with a number t of the parties
cheating. The only information that is allowed to be revealed is the result of the
function.
A classic example of such an application is the millionaire’s problem [78]:
A group of n millionaires wish to figure out who is the richest, but no single
millionaire wants to disclose the magnitude of his fortune in fear of humiliation.
Finding out who is the richest would normally not be possible without a trusted
party, but using secure multi-party computation the millionaires can safely com-
pute the result without a trusted party. This is a toy problem, but the technique
enables several important applications such as e-voting, secure auctions, secure
online gaming, and secure data mining.
In the 80s it was proved that secure multi-party computation could in fact be
applied to any computable function, making it an extremely general and useful
technique, at least in theory. This was first done by Yao [78] in the restricted
case of two parties, but soon followed by similar results for the general case of n
parties [5, 21]. These results were, however, mostly of theoretical interest due to
the complexity of the protocols.
Since then a large body of results have been obtained using different security-
and adversary- models, underlying network assumptions, and improvements of
previously known results.
In recent years, the theory has advanced enough to allow practical im-
plementations of secure multi-party computation. Examples of practical systems
which support evaluation of general multi-party computation are the FairPlay
[49], VIFF [1], ShareMind [15], and SIMAP [16] systems. However, many
applications are still infeasible in practice, especially those that rely on quick
response times like online auctions. Also, in order to be practical, the aforemen-
tioned systems tend to either be restricted to a limited number of parties or to
loosen the security model. Some examples of the latter are assuming
that the corrupted parties do not deviate from the protocol (the passive security
model) or that at most a certain threshold t of parties gets corrupted (threshold
security model).
This chapter explains in detail the secure multi-party computation protocol
presented by Orlandi. In addition we show how to modify the protocol in order
to gain computational speedup. Also, we describe our implementation of the
Orlandi protocol. Finally we discuss the benchmarks of these modifications. This
part is joint work with Thomas P. Jakobsen and Janus Dam Nielsen and has
appeared at Applied Cryptography and Network Security 2010, Beijing, China.
7.2. VERIFIABLE SECRET SHARING
Sharing phase In this phase each player generates some special values and some sort of
commitment for the values, and distributes them over all players.
Reconstruction phase After receiving modified shares from co-players, the player can check with
a certain probability whether a share has been tampered with, and discard it if necessary.
7.3. ORLANDI PROTOCOL
Basic Multiplication We define the multiplication of the shares [x] and [y] as
[z] = Mul([x], [y], [a], [b], [c]) where we assume that the parties are given
a random triple ([a], [b], [c]) s.t. c = a · b from an honest dealer. The
multiplication is realized as follows:
7.4. ORLANDI IMPLEMENTATION
TripleGen() generates a triple by having each party first choose random shares
[a] and [b], including the needed randomness and the commitments. Second, each
party encrypts his share $a_i$ as $\mathrm{Enc}_{ek_i}(a_i)$, using his public key $ek_i$ and a homomor-
phic cryptosystem. Then he broadcasts the encrypted share, the corresponding
commitment, and the commitment for $b_j$. The share of the product [c] = [a] · [b]
is computed by using the homomorphic property of the received encrypted values
to multiply the shares $[a_i]$ and $[b_j]$. The product is then masked with some ran-
domness $d_{i,j}$ and sent. $c_i$ is then computed by decrypting the product shares,
$\mathrm{Dec}_{sk_i}(\gamma_{i,j})$, adding them up and subtracting the randomness; $sk_i$ is the private
key of party i. The computation inside encrypted values gives rise to the requirement
that the modulus of the cryptosystem must be much larger than the mod-
ulus p of the shares and the commitment scheme. This is not an issue in practice,
because the key size of a factorization-based cryptosystem is usually much bigger
than the order of the group of points on an elliptic curve, if the same level of
security is to be obtained.
7.5. CONCLUSION AND FUTURE WORK
as seen in Chapter 6. The implementation of the Python wrapper lets us use fast
and efficient C code, which performs better than Python, as seen later on. The
Python wrapper for this optimization is a specially tailored one, but uses the
same underlying exponentiation methods, the comb and unsigned fractional window
exponentiation methods, for calculating the result.
Additional speedup can be achieved by using the homomorphic properties of the
Paillier cryptosystem, changing step 3a) from
$$c_i = \sum_j \mathrm{Dec}_{sk_i}(\gamma_{i,j}) - \sum_j d_{i,j} \bmod p$$
to
$$c_i = \mathrm{Dec}_{sk_i}\Big(\prod_j \gamma_{i,j}\Big) - \sum_j d_{i,j} \bmod p.$$
This minor modification results in only one
exponentiation in total instead of one per party. This optimization is included in
the specially tailored wrapper for the Orlandi protocol.
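The identity behind this optimization, that decrypting the product of Paillier ciphertexts yields the sum of the plaintexts, can be checked with a toy sketch. All parameters below are assumptions chosen for illustration and are far too small for real use (p = 11, q = 13, generator g = n + 1); the function names are hypothetical and do not correspond to the thesis API. The masking values $d_{i,j}$ and the reduction mod p of the protocol are left out.

```c
#include <stdint.h>

/* Toy parameters, insecure by design: p = 11, q = 13. */
#define N      143u       /* n = p * q */
#define N2     (N * N)    /* n^2 = 20449 */
#define LAMBDA 60u        /* lcm(p - 1, q - 1) */
#define MU     31u        /* LAMBDA^(-1) mod n, since 60 * 31 = 13 * 143 + 1 */

/* Modular multiplication; the toy moduli are far below 2^32. */
static uint64_t mulmod(uint64_t x, uint64_t y, uint64_t n)
{
    return (x % n) * (y % n) % n;
}

static uint64_t powmod(uint64_t g, uint64_t e, uint64_t n)
{
    uint64_t A = 1 % n;
    for (int i = 63; i >= 0; i--) {
        A = mulmod(A, A, n);
        if ((e >> i) & 1)
            A = mulmod(A, g, n);
    }
    return A;
}

/* Encryption with g = n + 1: c = (1 + m*n) * r^n mod n^2, r coprime to n. */
uint64_t toy_encrypt(uint64_t m, uint64_t r)
{
    return mulmod(1 + (m % N) * N, powmod(r, N, N2), N2);
}

/* Decryption: m = L(c^lambda mod n^2) * mu mod n, with L(x) = (x - 1) / n. */
uint64_t toy_decrypt(uint64_t c)
{
    uint64_t x = powmod(c, LAMBDA, N2);
    return mulmod((x - 1) / N, MU, N);
}

/* Step 3a as originally written: one decryption per received ciphertext. */
uint64_t decrypt_each_then_sum(const uint64_t *gamma, int count)
{
    uint64_t s = 0;
    for (int j = 0; j < count; j++)
        s = (s + toy_decrypt(gamma[j])) % N;
    return s;
}

/* Modified step 3a: multiply the ciphertexts first, then decrypt once. */
uint64_t multiply_then_decrypt_once(const uint64_t *gamma, int count)
{
    uint64_t prod = 1;
    for (int j = 0; j < count; j++)
        prod = mulmod(prod, gamma[j], N2);
    return toy_decrypt(prod);
}
```

Both functions return the same value, but the second performs a single exponentiation with the private exponent, and count − 1 cheap multiplications modulo $n^2$ replace the remaining decryptions.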
7.4.2 Benchmarks
Figure 6.3 gives the average execution time of triple generation for two, three,
and nine parties. We have benchmarked different revisions of the VIFF Orlandi
implementation, corresponding to the various optimizations we have performed.
Revision 1231 is the initial unoptimized implementation of the Orlandi protocol,
which uses the Python implementation of Paillier. Revision 1355 uses in-lined
steps 1, 2a, and 2b of TripleGen(). Revision 1370 uses our first C
implementation of Paillier, and an enormous improvement can be seen compared to
the previous revision. Note also that two players are slower than three
players; this is due to the implementation design. Revision 1393 uses our first
implementation of step 2c as described in TripleGen(). In revision 1393 our
specially written wrapper for step 3a is used. It makes use of the homomorphic
properties such that we only have to decrypt once. In addition, we see a
noticeable performance gain. In revision 1440 the combination of fixed and
non-fixed bases is used, which combines the comb and unsigned fractional window
exponentiation methods. For up to 6 players this gives a performance gain (see
Table 7.2); with more players it becomes slower. The reason for this is not the
module itself, but other modifications within the Orlandi Python code.
It can be clearly seen that the performance gain grows with the number of
parties, up to almost 40 times for 9 players compared to the initial
implementation. For more details on the implementation of the whole protocol we
refer the reader to [40, 41].
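The speedup factors can be recomputed from the timings in Table 7.1. The
following sketch copies the average times of revisions 1231 and 1440 from that
table; the variable names are illustrative only.

```python
# Average triple-generation times in ms, taken from Table 7.1
# (initial revision 1231 versus latest revision 1440).
times_ms = {
    2: {1231: 3519.6, 1440: 201.5},
    3: {1231: 3972.7, 1440: 135.0},
    9: {1231: 8937.4, 1440: 224.9},
}

# Speedup = time of the initial revision / time of the latest revision.
speedup = {parties: t[1231] / t[1440] for parties, t in times_ms.items()}

for parties, factor in sorted(speedup.items()):
    print(f"{parties} parties: {factor:.1f}x faster")
```

The factors are roughly 17, 29 and 40 for two, three and nine parties.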
implemented and tested. Our implementation provides an easy API for key
generation, encryption and decryption for these cryptosystems. In addition it
provides multiple prime generation methods to construct (special) primes.
Access to the different (simultaneous multi-)exponentiation algorithms is also
provided through the API.
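As an illustration of what a simultaneous multi-exponentiation computes, the
following is a minimal sketch of the basic interleaved square-and-multiply
method (often called Shamir's trick) for two bases. It is not the API of our
library; the function name and parameters are hypothetical, and the comb and
window methods used in the implementation are more elaborate.

```python
def simultaneous_exp(g1, e1, g2, e2, n):
    """Compute g1**e1 * g2**e2 mod n in one shared square-and-multiply pass."""
    # Precompute the three possible non-trivial multipliers.
    table = {
        (1, 0): g1 % n,
        (0, 1): g2 % n,
        (1, 1): g1 * g2 % n,
    }
    result = 1
    # Scan both exponents together, from the most significant bit down.
    for i in range(max(e1.bit_length(), e2.bit_length()) - 1, -1, -1):
        result = result * result % n           # one squaring for both exponents
        bits = ((e1 >> i) & 1, (e2 >> i) & 1)
        if bits != (0, 0):
            result = result * table[bits] % n  # at most one multiplication
    return result % n

# Agrees with two separate exponentiations followed by a multiplication.
print(simultaneous_exp(3, 45, 5, 11, 1009) == pow(3, 45, 1009) * pow(5, 11, 1009) % 1009)
```

The squarings are shared between both exponents, which is where the saving over
two independent exponentiations comes from.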
Our implementation uses the GNU GMP library for multi-precision arithmetic. For
future work, it would be interesting to use another multi-precision arithmetic
library specially targeted at fields and modular operations, and to make use of
advanced compilers such as qhasm [11].
Another interesting direction for future work would be an implementation of a
lattice-based homomorphic encryption scheme that is protected against quantum
computers [34, 74].
parties  rev. num.    1231    1355    1370    1393    1399    1400    1440
2        time       3519.6  3519.6   894.6   243.8   226.5   224.2   201.5
2        stdvar        1.0     0.8     3.2     0.9     0.7     0.7     1.2
3        time       3972.7  4012.1   376.3   155.0   168.3   170.9   135.0
3        stdvar       94.8   157.4    72.1    59.2    35.9    38.2    49.9
9        time       8937.4  8849.7   846.9   237.0   188.9   188.4   224.9
9        stdvar      460.2   281.2    27.0    36.5    20.7    29.0    29
Table 7.1: The average execution time in ms. of triple generation as a function
of number of parties.
Table 7.2: The average execution time in ms. of triple generation for latest
revision 1440.
Appendix A
Appendix
A.1. UNBALANCED BENCHMARKS FIXED
[Figure: CPU cycles per exponentiation method (cmb, dsw, kary, karym, sswm,
ufracm, rtl, sg1, sg2).]
Figure A.1: Unbalanced Encryption with bit length of e2 half the size of e1
[Figure: CPU cycles per exponentiation method (cmb, dsw, kary, karym, sswm,
ufracm, rtl, sg1, sg2).]
Figure A.2: Unbalanced Encryption with bit length of e2 one fourth the size of
e1
Bibliography
[36] S. Goldwasser and S. Micali. Probabilistic encryption and how to play mental
poker keeping secret all partial information. In STOC, pages 365–377. ACM,
1982.
[37] T. Granlund. GNU MP. The GNU Multiple Precision Arithmetic Library,
1996.
[39] M. Horn. Design and implementation of an interface for cyclic finite groups
in Java. Technical note, April 2003.
[42] W.S. Jevons. The principles of science: a treatise on logic and scientific
method. Classworks, 1877.
[47] C.H. Lim and P.J. Lee. More flexible exponentiation with precomputation.
Lecture Notes in Computer Science, 839:95–107, 1994.
[48] M.X. Makkes. Paillier and DGK libraries including python modules.
http://www.kr85.org/.
[51] U.M. Maurer. Fast generation of prime numbers and secure public-key
cryptographic parameters. Journal of Cryptology, 8(3):123–155, 1995.
[54] B. Möller. Improved techniques for fast exponentiation. In P.J. Lee and C.H.
Lim, editors, ICISC, volume 2587 of Lecture Notes in Computer Science,
pages 298–312. Springer, 2002.
[58] L.D. Olson. Side-channel attacks in ECC: A general technique for varying
the parametrization of the elliptic curve. In M. Joye and J. J. Quisquater,
editors, CHES, volume 3156 of Lecture Notes in Computer Science, pages
220–229. Springer, 2004.
[63] C. Percival. Cache missing for fun and profit. BSDCan 2005, 2005.
[66] M.O. Rabin. Probabilistic algorithm for testing primality. Journal of Number
Theory, 12(1):128–138, 1980.
[68] R.L. Rivest, A. Shamir, and L. Adleman. On Digital Signatures and
Public-Key Cryptosystems. Laboratory for Computer Science, Massachusetts
Institute of Technology, 1977.
[71] A. Shamir and E. Tromer. On the cost of factoring RSA-1024. RSA
CryptoBytes, 6(2):10–19, 2003.
[72] P.W. Shor. Polynomial-time algorithms for prime factorization and discrete
logarithms on a quantum computer. SIAM Review, pages 303–332, 1999.
[74] N.P. Smart and F. Vercauteren. Fully homomorphic encryption with
relatively small key and ciphertext sizes. In P.Q. Nguyen and D. Pointcheval,
editors, Public Key Cryptography, volume 6056 of Lecture Notes in Computer
Science, pages 420–443. Springer, 2010.
[76] E.G. Straus. Addition chains of vectors (problem 5125). American
Mathematical Monthly, 70(806-808):16, 1964.
[78] Andrew Chi-Chih Yao. How to generate and exchange secrets. In FOCS,
pages 162–167. IEEE, 1986.
[79] Y.S. Yeh, T.Y. Huang, H.Y. Lin, and Y.H. Chang. A study on parallel RSA
factorization. Journal of computers, 4(2):112–118, 2009.
[80] S.M. Yen, C.S. Laih, and A.K. Lenstra. Multi-exponentiation (cryptographic
protocols). IEE Proceedings-Computers and Digital Techniques, 141:325,
1994.
Index
Abstract