Power Analysis Based Side Channel Attack
Power Analysis Based Side Channel Attack
Hasindu Gamaarachchi
Harsha Ganegoda
http://www.ce.pdn.ac.lk
Department of Computer Engineering
Faculty of Engineering
University of Peradeniya
Peradeniya 20200
Sri Lanka.
Abstract
Side channel attacks break the secret key of a cryptosystem using channels such as sound,
heat, time and power consumption which are originally not intended to leak such information.
Power analysis is a branch of side channel attacks where power consumption data is used as
the side channel to attack the system. First using a device like an oscilloscope power traces are
collected when the cryptographic device is doing the cryptographic operation. Then those traces
are statistically analysesd using methods such as Correlation Power Analysis (CPA) to derive
the secret key of the system. Being possible to break Advanced Encryption Standard (AES)
in few minutes, power analysis attacks have become a serious security issue for cryptographic
devices such as smart card.
As the first phase of our project, we build a testbed for doing research on power analysis
attacks. Since power analysis is a practical type of attack in order to do any research, a testbed
is the first requirement. Our testbed includes a PIC microcontroller based cryptographic device,
power measuring circuits and a digital oscilloscope in the hardware side. In software side it
includes algorithms running on the microcontroller, oscilloscope automation scripts and analy-
sis programs. We verify the functionality of the testbed by attacking AES in time less than 10
minutes. Since building a test bed is a complicated process, having a pre-built testbed would
save the time of future researchers.
The second phase of our project is to attack the latest cryptographic algorithm called Speck
which has been released by National Security Agency (NSA) for use in embedded systems. So
far, Speck has not been attacked using power analysis. In spite it has lot of differences to AES
making impossible to directly use the power analysis approach used for AES, we introduce
novel approaches to break Speck in less than an hour. Therefore, We practically show that even
though the algorithm is very new still it is vulnerable to power analysis.
The third phase of the project is to work on countermeasures. After getting familiar with
the current state of art, we select few already introduced countermeasures and practically attack
them on our testbed to do a comparative analysis. Meanwhile, we try to form our own counter-
measures and to improve existing countermeasures. Under circuit based countermeasures, the
existing idea of implementing power line filters is practically implemented and tested. We show
that it is not safe enough. We try few of our own circuit based ideas as well, to evaluate how
good they are as countermeasures. But unfortunately none of them are good enough. Under
software based countermeasures existing methods called random instruction injection and ran-
domly shuffling Sboxes are implemented and tested. We show that those countermeasures are
good enough for their simplicity and cost. But we identify the possible threat due to the prob-
lem of generating a good seed for the pseudo random algorithm running on the microcontroller.
We address this issue by using a hardware based true random generator that amplifies a random
electrical signal and samples to generate a proper seed.
Contents
1 Introduction 1
1.1 Cryptanalysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Side channel attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Power analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Power consumption of CMOS circuits . . . . . . . . . . . . . . . . . . . . . . 2
1.5 Power model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.6 Correlation Power Analysis (CPA) . . . . . . . . . . . . . . . . . . . . . . . . 4
1.7 Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.8 The project and its objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Related work 9
2.1 Power analysis and Testbeds . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3 The testbed 13
3.1 Cryptographic device . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.1.1 AES on PIC microcontroller . . . . . . . . . . . . . . . . . . . . . . . 13
3.1.2 USB interfacing of the PIC and encountered problems . . . . . . . . . 14
3.2 Power measurement circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.2.1 Ground resistor method to measure power . . . . . . . . . . . . . . . . 18
3.2.2 VDD resistor method to measure power . . . . . . . . . . . . . . . . . 20
3.2.3 The trigger for the oscilloscope . . . . . . . . . . . . . . . . . . . . . 21
3.3 Construction of the PCB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.4 Power capturing automation . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
ii
CONTENTS CONTENTS
5 Speck implementation 31
5.1 Speck encryption algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
5.1.1 Operations in Speck . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
5.1.2 Variants of Speck . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
5.1.3 Steps in the algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 31
5.2 Difference with AES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.3 Speck for 8 bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.3.1 ADD operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.3.2 ROTATE operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.3.3 XOR operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.4 Testbed for Speck . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.5 Speck for 16 bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6 Attack on Speck 38
6.1 Challenge in attacking Speck . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
6.1.1 Difference in mixing the key . . . . . . . . . . . . . . . . . . . . . . . 38
6.1.2 Lack of sbox operations . . . . . . . . . . . . . . . . . . . . . . . . . 39
6.2 Steps to attack Speck . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
6.3 Experiences and difficulties . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
6.3.1 First attempt of the attack . . . . . . . . . . . . . . . . . . . . . . . . 42
6.3.2 Zero key issue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
6.3.3 The need of more number of traces . . . . . . . . . . . . . . . . . . . 46
6.3.4 Second attempt of the attack . . . . . . . . . . . . . . . . . . . . . . . 47
6.4 Speck vulnerability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6.5 Attack for 16 bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
7 Circuit Countermeasures 50
7.1 filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
7.1.1 Single capacitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
7.1.2 Single inductor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
7.1.3 LC filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
7.1.4 Effectiveness of tested filters . . . . . . . . . . . . . . . . . . . . . . . 54
7.2 Methods we tried . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
7.2.1 Voltage regulator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
7.2.2 Zener diode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
7.2.3 Operational amplifier . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
7.2.4 Constant current source . . . . . . . . . . . . . . . . . . . . . . . . . . 59
7.2.5 An additional microcontroller in parallel . . . . . . . . . . . . . . . . . 60
7.2.6 FTDI chip . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
7.2.7 Effectiveness of tested methods . . . . . . . . . . . . . . . . . . . . . 65
8 Software countermeasures 68
8.1 Random instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
8.1.1 Principles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
8.1.2 Experiments and results . . . . . . . . . . . . . . . . . . . . . . . . . 69
8.2 Sbox shuffling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
8.3 Improvements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
8.3.1 Setup 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
8.3.2 Setup 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
iii
CONTENTS CONTENTS
9 Conclusion 85
9.1 Testbed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
9.2 Speck . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
9.3 Phase 3 : Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
iv
List of Figures
v
LIST OF FIGURES LIST OF FIGURES
vi
LIST OF FIGURES LIST OF FIGURES
8.4 Correlation peaks for injection of (a) no random instructions (b) one random
instruction (c) Three random instructions (d) seven random instructions . . . . 72
8.5 Number of power traces required when Sbox operations are randomly shuffled . 74
8.6 Correlation peaks when SBox operations are randomly shuffled . . . . . . . . . 75
8.7 Amplification of thermal noise via a resistor through an op amp caused saturation 76
8.8 Hardware random number generator that generates random pulses . . . . . . . 76
8.9 Amplified reversed biased noise . . . . . . . . . . . . . . . . . . . . . . . . . 77
8.10 Further amplified reversed biased noise . . . . . . . . . . . . . . . . . . . . . 77
8.11 Digitized random pulses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
8.12 The process of generating an n bit random seed using digitized random pulses . 79
8.13 Histogram for random numbers generated by xoring 100 values . . . . . . . . . 80
8.14 Histogram for random numbers generated by xoring 1000 values . . . . . . . . 80
8.15 Histogram for random numbers generated by Matlab . . . . . . . . . . . . . . 80
8.16 Frequency spectrum of the signal formed by the hardware random number gen-
erator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
8.17 Frequency spectrum of the signal formed by Matlab random number generator . 81
8.18 Hardware random number generator that generates an analog waveform . . . . 81
8.19 Amplified reversed biased noise . . . . . . . . . . . . . . . . . . . . . . . . . 82
8.20 Further amplified reversed biased noise . . . . . . . . . . . . . . . . . . . . . 82
8.21 Process of generating a random seed via the analog to digital converter by sam-
pling the amplified analog noise signal . . . . . . . . . . . . . . . . . . . . . . 83
8.22 Histogram for random numbers generated by xoring 10 values . . . . . . . . . 84
8.23 Frequency spectrum of signal formed by generated random numbers . . . . . . 84
vii
List of Tables
6.1 The effect of the selection function on the difference between the the highest
and the second highest correlation coefficient . . . . . . . . . . . . . . . . . . 46
6.2 Time taken for a successful attack on AES . . . . . . . . . . . . . . . . . . . . 48
6.3 Time taken for a successful attack on Speck 8 bit implementation . . . . . . . . 48
8.1 Number of power traces and time required for different number of random in-
struction injections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
viii
Chapter 1
Introduction
1.1 Cryptanalysis
A cryptosystem is a pair of algorithms including the encryption algorithm and the decryption
algorithm that does encryption and decryption based on a key. As the security of such a system
fully depends on the secrecy of the secret key, if somehow the key is compromised then any
unauthorized party will be able to decrypt the cipher text. Analysing a cryptosystem to find a
weakness that would leak the secret key is called cryptanalysis [7]. Currently various attacks
exist which can be used to derive the key of a cryptosystem in an unauthorized fashion.
The most trivial of such attacks is the brute force attack. In a brute force attack all the
possibilities for the key are generated and are tried one by one until it matches the correct key.
Brute force attacks fall into the category of traditional attacks. As modern encryption algorithms
consist of very large keys as large as 128 bits or 256 bits, the number of permutations is huge that
this type of attack is not practically feasible due to the huge computational time required. On
the other hand another category of attacks called modern attacks exist which use mathematical
approaches. Differential cryptanalysis and linear cryptanalysis are such methods and these
methods are more effective than traditional attacks.
1
1.3. POWER ANALYSIS CHAPTER 1. INTRODUCTION
out information are called side channel attacks. If these side channels are used to attack a
cryptosystem then they are called side channel attacks.
Consider Figure 1.1. Plain text is given as input to the system and the system runs the
encryption to output cipher text. The key used in the system is unknown to the outside. But
the system leaks out information using various side channels unintendedly. These side channel
data are collected and they are analysed together with the input data to derive the secret key.
This same process can be done in an decryption operation as well where now the input would
be cipher text and output would be plain text.
2
CHAPTER 1. INTRODUCTION 1.4. POWER CONSUMPTION OF CMOS CIRCUITS
p = vi
As the voltage given to a circuit is equal to the power supply voltage and because it is
always constant, the above equation can be now written as follows. Here VDD is the voltage of
the power supply. Therefore by measuring the instantaneous current, the instantaneous power
consumption can be deduced.
p = VDD i
The power consumption of a CMOS circuits is made up of two components as static power
and dynamic power. Static power is due to the leakage current of transistors and therefore it is
dependent on the design of the circuit. On the other hand, dynamic power is due to the switching
of transistors and therefore it is dependent on the data being processed and the operation being
done. As power analysis utilizes the relationship between power consumption and the data
being processed, dynamic power is the relevant one. As the static power is mostly constant,
the variation in the total power is solely due to dynamic power and therefore the total power
consumption can be directly used for an attack.
Dynamic power is consumed when switching occurs in transistors. That is if a CMOS cell
changes from 0 to 1 or from 1 to 0, switching occurs in transistors and power is consumed.
Figure 1.3 is for a CMOS inverter. Upper graph there shows the input signal and the output
signal of the CMOS inverter while the below graph shows the instantaneous current. As you
can see there are two transitions in the input signal where at first it goes from 0 to 1 and then
from 1 to 0. At these moments where switching takes place peaks are visible in the instantaneous
current drawn by the circuit.
These clear changes in the current is due to the current drawn to charge the capacitance in
the transistors and the temporary short circuit current that is drawn during switching. In devices
such as a microcontrollers, the data bus which is long and connected to many components have
a quite big capacitance. Therefore writing to the data bus during a memory read or write causes
a significant power consumption. When attacking software based encryption using power anal-
ysis this is the point which is mostly made use of.
3
1.5. POWER MODEL CHAPTER 1. INTRODUCTION
Figure 1.3: Dynamic power consumption during switching in a CMOS inverter [3, p. 32]
As the significant power consumption in the CMOS circuit occurs during transitions, this
model works fine. Most microcontrollers usually set the data bus to 0 or 1 before writing a value
to it. In a data bus where all the bits are set to 0 first, hamming distance is just the hamming
weight of the value written to the bus afterwards( HW (v XOR 0) = HW (v) ). If it is a
data bus where all bits are initially set to 1 then the hamming distance is equal to hamming
weight of the value written subtracted by the number of bits in the bus. ( HW (v XOR 1) =
N um bits − HW (v) ). Therefore for most microcontrollers hamming weight can be directly
used as the power model.
4
CHAPTER 1. INTRODUCTION 1.6. CORRELATION POWER ANALYSIS (CPA)
data. CPA is more recent and when compared to other algorithms such as DPA has numerous
advantages such as requirement of less number of power traces [10].
As the first step of the attack, an intermediate value generated during the cryptographic
operation (encryption or decryption) which is a function of a variable data value and part of the
key must be selected. This function is known as the selection function. In the equation below I
is the intermediate value. The variable data sample which is generally plain text or cipher text
is shown by d. k is part of the key which will be called as the subkey after this point.
I = f (d, k)
Then power measurements must be done when the device is doing the cryptographic op-
eration. Here power traces must be obtained for several hundred (or thousands depending on
the system) of samples for d. These obtained traces are the real power consumption values.
Imagine that we used N number of plain text samples to get N number of power traces. Each
power trace would have M number of sampling points which correspond to power consumption
at each sampled moment in time.
After that, the intermediate values for each of the variable data samples used and the power
consumption for those intermediate values must be calculated using a power model such as
hamming distance model. This calculation must be repeated for all possible values for the
subkey considered. These calculated data are the hypothetical power consumption values.
Finally the real power consumption values and the hypothetical power consumption values
per each subkey must be compared to each other to find out which subkey has the highest
correlation. This comparison is done using the statistical method called Pearson correlation
coefficient given by the equation below [10]. In the equation, the j th sample point of the ith
power trace has been written as Wi,j . The hypothetical power consumption value for the ith
plain text (or cipher text) with respect to the appropriate subkey has been written as Hi . An
estimate for the correlation for j th sampling point for a certain subkey can be found using the
equation.
PN PN PN
N i=0 Wi,j Hi − i=0Wi,j i=0 Hi
ρ̂ = q PN PN q
)2 N N
2 PN
−( Hi2 − ( Hi )2
P
N i=0 Wi,j i=0 Wi,j i=0 i=0
Pearson correlation coefficient is a value between -1 and 1. If the value is 1 that means the
two data sets compared (hypothetical power consumption values and real power consumption
values) have the best correlation. If 0 that means there is no correlation at all. -1 means they
are correlated at best but the relationship is such that the values in the data sets are inversely
proportional. As the two data sets used here to find the correlation coefficient is the hypothet-
ical power consumption data and the real power consumption data, the correlation coefficient
depicts how much they are related. Therefore the maximum correlation coefficient would be
obtained when the hypothetical power consumption values for the samples are calculated using
the correct key.
The fact that the analysis is done separately on separate subkeys is the one that reduces the
complexity of a correlation power analysis attack to a practically feasible value. For example
consider a 128 bit key. A 128 bit key has 2128 possibilities and checking such number of pos-
sibilities is not feasible even on a super computer. This is why a brute force approach is not
feasible for most encryption algorithms out there today. Now consider a CPA attack where a
subkey is size of a byte. Here the 128 bit key is divided into 8 bit portions to make 16 bytes.
A byte which is 8 bits just have 256 possibilities and when all 16 keybytes are considered only
256x16 possibilities must be computed. This number of computations is practically feasible.
5
1.7. COUNTERMEASURES CHAPTER 1. INTRODUCTION
Countermeasures against power analysis attacks can be categorized in to two broad cate-
gories as hardware based countermeasures and software based countermeasures as shown in
Figure 1.4. Hardware based countermeasures as the names suggests is the addition of new com-
ponents or modification of existing components of the cryptosystem, in order to minimize the
leakage of secret data as power. Hardware countermeasures can be further classified into two
categories namely circuit level countermeasures and micro architecture level countermeasures
as shown in Figure 1.4. Circuit level countermeasures involve addition of components to the
circuit or modification of the circuit at a macro level. For example adding a power line filter to
the cryptographic device or connecting a noise source are circuit level countermeasures. Micro
architecture level countermeasures involve addition of components or modification of compo-
nents at logic gate level. For example, adding additional transistors to a CMOS gate such that
power consumption becomes different, is such micro architecture level countermeasure. Soft-
ware based countermeasures on the other hand do modifications to the program code that runs
on the microcontroller of the cryptographic device. For example inserting some false instruc-
tions randomly at the middle of the encryption to change the power consumption pattern is a
software based countermeasure.
Due to the massive threat imposed by power analysis attacks, a lot of research is already
conducted to form various countermeasures. Currently there are many such introduced coun-
termeasures and are elaborated in chapter 2.
6
CHAPTER 1. INTRODUCTION 1.8. THE PROJECT AND ITS OBJECTIVES
as smart cards are widely used for authentication mechanisms. Further even ATM (Automated
Teller Machine) cards and SIM (Subscriber Identity Module) cards are also smart cards. There-
fore as this type of attacks are a serious risk for the security, researchers are proposing various
countermeasures.
To find countermeasures or do any other power analysis based research on the first place a
testbed is needed. That is a system comprising of a cryptosystem, power measurement system
and necessary software to do the analysis is needed. Building such a system involves a diffi-
cult process which is time consuming. While it needs a quite good knowledge on electronics,
computer interfacing, mathematics and programming knowledge, the process needs spending
lot of time to do troubleshooting. Anyone who is interested in power analysis going through
this same process is a wastage of time. But if a testbed is already available one can directly start
on countermeasures or similar research rather than going through the troubles of building the
testbed. Moreover if such an already built system is available anyone who has a little electronic
knowledge also can use it.
The first phase of our project is to build such a testbed for power analysis based side channel
attacks. As described before, the objective is to help researchers who are interested on coun-
termeasures to save them from the testbed setup time and difficultly. And also we need this
testbed to continue the next phases of the projects for this semester as well as the next semester.
The cryptographic device of the testbed is built using a reprogrammable microcontroller such
as PIC or AVR. Therefore this device does software based encryption opposed to hardware
based encryption which in that case uses direct logic circuits or FPGA (Field Programmable
Gate Array). Then we attack AES using the created testbed to check whether our testbed works
properly.
The second phase is to do cryptanalysis on an encryption algorithm called Speck. Speck is a
very recent encryption algorithm that was released by the NSA in June 2013. This belongs to the
type of ciphers called block ciphers which AES also belongs, but the speciality is this is a very
lightweight cipher targeted mainly for embedded systems. Therefore it is expected to be famous
among embedded devices in the future and therefore it must be tested to check whether this
algorithm is vulnerable to power analysis to determine the necessity of countermeasures. Up
to now according to best of our knowledge there is no such power analysis based cryptanalysis
done on Speck. Therefore in the second phases we check whether Speck is vulnerable to power
analysis attacks. This phase also involves implementation of Speck for 8 bit microcontrollers,
due to the fact that such an implementation currently is not available.
Currently there is a lot of available countermeasures against power analysis as we explain in
chapter 2. But at one point any of them would fail. Performance and powerfulness of computers
as well as devices such as digital oscilloscopes are rapidly increasing and therefore the effec-
tiveness of old countermeasures decrease. A countermeasure which appears to take infeasible
amount of time to be attacked today might become feasible in the future with the improvement
of technology. Further, the effectiveness of a countermeasure would compromise factors such
as power, performance, device size and cost. For example adding new hardware components
will increase power consumption and the device size, which is not preferred. Integrating a
software countermeasure would reduce the performance and efficiency. Meanwhile any coun-
termeasure would definitely increase the cost of a cryptographic device. Due to all these reasons
countermeasures against power analysis is still a very open research area. Therefore the third
phase of our project is to work on countermeasures against power analysis. First we study the
current state of the art of hardware based countermeasures. Then we would practically test
some selected circuit based countermeasures on the testbed we have created and evaluate their
effectiveness. Here we limit testing of hardware based countermeasures only to circuit based
7
1.8. THE PROJECT AND ITS OBJECTIVES CHAPTER 1. INTRODUCTION
countermeasures as micro architecture level ones would need access to advanced facilities such
as fabrication. Next we study the current state of art of software countermeasures and then
some selected software countermeasures would be tested and evaluated. Finally we improve
a selected countermeasure with the goal of contributing towards the security of embedded de-
vices. Meanwhile we would also test some new ideas to verify whether they are good enough
as countermeasures or not.
The rest of the report is organized as follows. In chapter 2 we discuss about the related
work. In chapter 3 we discuss the process of building of the testbed and in chapter 4 we discuss
how AES algorithm was attacked and its results. Then in chapter 5 how the implementation of
Speck algorithm for PIC was done is explained. In chapter 6 the attack on Speck is discussed
while emphasizing the new approaches and mechanisms needed while presenting the results.
In chapter 7 we elaborate some selected circuit based countermeasures and present the results
from the tests we did for verifying their effectiveness. Several new circuit based ideas tested by
us are also presented with evaluation results on their performance. In chapter 8 we elaborate
some selected software based countermeasures and show how effective they are, based on the
test we carry out. Improvements done by us to a selected software based countermeasure is also
elaborated. Finally we conclude the report in chapter 9.
8
Chapter 2
Related work
This is a commercially available board but this being built using an FPGA does hardware
based encryption. Power analysis on hardware based encryption and software based encryption
has differences and therefore for those who do research on software based encryption this board
cannot be used. On the other hand most cryptodevices such as smart cards use microcontrollers
in their design due to the low cost and hence that is software based encryption. The crypto-
graphic device of our testbed does software based encryption and therefore it would be useful
for anyone interested in software based encryption.
Martinasek et al. [13] has created a testbed for Differential Power Analysis where the cryp-
tographic device is based on a PIC microcontroller. But the method they have used to interface
the cryptographic device to the computer is RS232. Today most computers such as laptops does
not have RS232 interfaces and therefore using it with such a computer is not straight forward.
In our setup we have interfaced the cryptographic device using Universal Serial Bus (USB) and
because almost all computers today are equipped with USB ports our device can be more easily
used. Further for measuring power they have used a current probe. A current probe is much
costly and our laboratory does not have such probes. On the other hand our power measurement
setup measures voltage rather than the current and therefore ordinary oscilloscope probes can
be used.
9
2.1. POWER ANALYSIS AND TESTBEDS CHAPTER 2. RELATED WORK
Petrvalsky et al. [14] has created the cryptographic device for their testbed using an 8051
based processor for Differential Power Analysis. They have used a USART for communication
and again the use of RS-232 causes the problems discussed in the previous work. To measure
power they use two different setups. One method is for ordinary oscilloscope probes while the
other method is for differential probes which are costly. They claim that the method that does
not use differential probes require more power traces to derive the key than the method that
use differential probes. In our work we introduce a new power measurement method that uses
ordinary probes but yet needs similar number of traces the method that used the differential
probe required. This power measurement method is not found in any publication available to
us.
Petrvalsky et al. [15] has also created a testbed based on an ARM based processor and there
as well the same USART have been used. In this setup for power measurement a differential
probe is required. This type of probes are much costly than an ordinary oscilloscope and they
are not available in our lab. Our power measurement setup as described above uses ordinary
oscilloscope probes. They have alternately suggested a method to use two ordinary oscilloscope
probes instead of the differential probe but then with another probe for the trigger a total of three
probes are needed. Oscilloscopes normally come with two probes and even if three probes were
available then the waves from the first probe has to be subtracted from the second one which
produces an unnecessary overhead. Our power measurement setup only require two probes, one
for power measurement and the other for the trigger.
Tepanek et al. [11] designed an AVR microcontroller based power analysis testbed and the
cost for the testbed including the testbed is claimed to be $2300. The upper margin of their
oscilloscope at that time is $2000 and therefore the cost for the cryptographic device with the
power measurement circuits and relevant interfacing circuits is about $300. This is a bit higher
cost for us and therefore we have come up with a setup which is less costly than that. Another
issue in this setup as similar to the previous ones is the use of serial port for interfacing.
Moreover all the publications above that discuss power analysis on software based encryp-
tion does not discuss any detailed information about the testbed they have created. They neither
include any steps or the methodology they followed in building the testbed. Therefore those
testbeds are not easily reproducible.
Another issue with the above mentioned publications is that they all use AES to do power
analysis. Apart from them there are many other works related to power analysis but they are
also mostly about AES or DES. According to best of our knowledge currently there is no work
that discuss about power analysis on the Speck algorithm. Therefore the fact that whether the
algorithm is vulnerable to power analysis attacks is yet unknown. In the second phase of our
project we do power analysis on Speck.
Brier et al. [10] introduced the Correlation Power Analysis algorithm for analysing the
power traces to derive the secret key. As explained in section 1.6 we make use of this algo-
rithm for the power analysis attack.
Correlation Power Analysis algorithm has a large time complexity and hence a generic im-
plementation that runs on the CPU takes a considerable amount of time. This makes it difficult
due to the fact that large amount of time has to be waited to get the result for an attack. Therefore
we use an implementation of Correlation Power Analysis on CUDA (Compute Unified Device
Architecture) done for a previous project during semester 5 which harness the power of thou-
sands of threads in NVIDIA GPUs to accelerate the algorithm more than 1000 times faster [16].
As that implementation was for analysing power traces for hardware based encryption, couple
of changes were done to port it for software based encryption. As this was for attacking AES
for the attack on Speck it has to be further modified to comply with that algorithm.
10
CHAPTER 2. RELATED WORK 2.2. COUNTERMEASURES
2.2 Countermeasures
Work related to circuit level hardware countermeasures are very few. Sprunk [17] in his inven-
tion uses a clock that outputs a stream of random clock pulses. When the clock is unpredictable,
the moment at which a certain instruction would run is also unpredictable. Therefore the ob-
tained power traces would be misaligned. The misalignment of power traces makes it necessary
to collect and analyse large number of power traces which makes the attack more time con-
suming. But many communication protocols such as USB and RS232 need a stable clock and
therefore usage of an unpredictable clock is a disadvantage is such situations.
Kocher et al. [9] proposed introducing noise to the power line which is also a circuit level
countermeasure. When noise is manually added using some noise generating device, it de-
creases the signal to noise ratio (SNR) and hence the number of required power traces increase.
Mangard [3, pp. 167-175] also proposed several circuit level countermeasures. One technique
is filtering the power line. When an appropriate filter is connected to the cryptographic de-
vice, the exploitable power consumption patterns are attenuated making the attack difficult.
Mangard [3, pp. 167-175] also proposed skipping clock pulses. When the clock signal is passed
through a filter which randomly skips clock pulses, the alignment of the power traces are broken
making the attack difficult. According to best of our knowledge, most of these countermeasures
are just proposed but not practically tested. Therefore we practically implement some of these
countermeasures and attack them to check how effective they are.
On the other hand there are lot of work regarding micro architecture level hardware coun-
termeasures. For example Waddle and Wagner [18] introduced a method called dual rail logic
to achieve constant power consumption for every clock cycle. This eliminate the correlation
between power traces and the data and operation being processed. But devices with micro ar-
chitecture level countermeasures are very costly. Also the power is consumed equally all the
time increasing the power consumption. Further additional logic gates increases the device size
as well. Working on micro architecture level countermeasures would need technologies such as
fabrication and hence we are not focusing on these countermeasures.
On software countermeasures as well there are a lot of works. Table 2.1 work summarizes
some selected software based countermeasures.
11
2.2. COUNTERMEASURES CHAPTER 2. RELATED WORK
In dummy instruction injection method, instructions such as NOP (No Operation) are ran-
domly inserted to the encryption code. This breaks the alignment of power traces and hence the
number of power traces required for the attack increases. But NOP instructions which do not
do any work would have a power consumption pattern that is easily distinguishable. Therefore
using techniques such as SPA the traces can be visually inspected to detect the NOPs.
The method proposed by Ambrose et al. [19] inserts real instructions instead of NOPs.
This makes visual inspection of power traces a challenge. But inserting extra instructions is
a compromise with power consumption. But random instruction insertion is quite simple to
implement on any microcontroller. Therefore this method is less costly and easily feasible but
yet provides considerable level of security when compared many other countermeasures. But
the effectiveness of the countermeasure has been only tested on a simulator. In simulators ideal
conditions are assumed when modeling the real systems. But in real systems such ideal condi-
tions do not exist and hence the effects may be different, Therefore we practically implement
the random instruction insertion countermeasure on our testbed and attack it to evaluate the
effectiveness on a real system.
The method called randomly shuffling Sbox operations introduced at [3] proposes to ran-
domly change the order at which is the Sbox operations are done. Since the number of Sboxes
in an algorithm is limited, the number of required power traces cannot be indefinitely increased
like in random instruction injection. But as no additional operations are performed unnecessary
computations and power consumption do not happen. We test this countermeasure as well, in
our testbed.
12
Chapter 3
The testbed
As discussed in the introduction to do any power analysis attack research first of all a testbed
is needed. Creating the testbed was a great challenge as any source that properly explained
the steps in a systematic fashion could not be found. It was more challenging as it required
knowledge in various areas such as microcontroller programming, oscilloscope automation,
microcontroller interfacing and electronics where they had to be explored and learnt. Further,
lot of failures occurred when building the system and there was no proper source that explained
the steps to troubleshoot and therefore most problems were solved mostly by experimentation.
In this chapter we discuss the methodology that was followed in constructing the testbed while
elaborating the difficulties encountered and how they were solved. The testbed construction has
been divided into the following sections although they are tightly combined, for the convenience
of proceeding the explanation.
13
3.1. CRYPTOGRAPHIC DEVICE CHAPTER 3. THE TESTBED
USB peripheral module which would be helpful to interface with the computer to send plain
text to the device. Since this PIC is going to be used throughout the project, the pin diagram of
the device is shown in Figure 3.2. The AES code was compiled and the hex file was burnt to
the PIC. Here the program was such that it encrypted a set of hard coded plain text using a hard
coded key and then wrote the result to the EEPROM(Electrically Erasable Programmable Read
Only Memory) of the PIC. The sole purpose was to verify the accuracy of the code. The circuit
was setup up on a breadboard and the circuit diagram used is shown in Figure 3.3. Here the
clock to the PIC has been provided via an 8MHz ceramic resonator similar to the one shown in
Figure 3.4.
Then after turning on the power supply and waiting a while for the program to run, the PIC
was removed and put back to the PIC programmer to read the EEPROM data. The cipher text
in the EEPROM was compared with an online tool [24] that did AES encryption to verify the
accuracy of the AES code we used.
14
CHAPTER 3. THE TESTBED 3.1. CRYPTOGRAPHIC DEVICE
Figure 3.3: Most basic circuit diagram for connecting the PIC
USB protocol is very complicated and it requires firmware from the microcontroller side
and driver from the computer side to work. Therefore writing from the scratch would be a
totally different project and hence a library had to be used. MikroC has a USB library but
unfortunately it was only for enumerating the device as a Human Interface Device (HID). This
would be suitable for a mouse or a joystick which are common human interface devices but for
our device which needed to transfer text this would not suitable. Even a software like Matlab
doesnt have a API (Application Program Interface) that directly writes or reads text via such an
HID interface and therefore an alternative solution had to be found.
After some investigation it was found out that the PIC C IDE by CCS had a library to enu-
merate a PIC as a Communication Class Device (CDC) device. This type of devices are exposed
as a virtual serial port on the computer and hence programming is same as programming the
serial port which is quite easy. Therefore we switched from Mikro C to PIC C. Getting familiar
with PIC C which was bit different from MikroC took some time but after a while we did nec-
essary modifications to the code in Mikro C to make it compatible with PIC C. After verifying
the accuracy of the code by an experiment similar to the one we did to Mikro C, we proceeded
for USB interfacing. PIC C had easy to use library functions to handle USB and after getting
familiar with them, the AES code was extended such that; it accepts plain text via USB, do the
encryption and send the cipher text back to the computer. Finally it was compiled and burned to
the PIC and the setup was done on a breadboard according to the circuit diagram in Figure 3.5.
Here D+ and D- in the USB port has been connected to respective pins in the PIC. Here we
have designed the device as a self-powered USB device rather than a bus powered USB device.
That is rather than providing power via the 5V VBUS pin in the USB port we give power using a
dedicated power supply. That is why the Vcc pin of the USB port in Figure 3.5 is not connected.
The reason is because USB power has lot of noise then the obtained power traces for the attack
15
3.1. CRYPTOGRAPHIC DEVICE CHAPTER 3. THE TESTBED
But unfortunately more problems occurred with respect to USB when introducing the power
measurement resistor explained in section 3.2. When the resistor is connected serially to the
device to compensate the voltage drop across the resistor, the voltage of the power supply had
to be increased and when doing this suddenly the USB connection disconnected. After that it
was no longer detected by the PC and after some investigation it was found out that the internal
voltage regulator in the PIC that provided power for USB communication had burnt. As many
users experiences in different forums said that the voltage regulator in the PIC is very sensitive
16
CHAPTER 3. THE TESTBED 3.2. POWER MEASUREMENT CIRCUIT
and burns unexpectedly, rather than buying a PIC we decided to supply a regulated 3.3V to the
pin 14 of the PIC from an external supply after setting the configuration options in PIC C to
turn off the internal voltage regulator.
Problems did not stop at that point where errors occurred during transmission of data. That
is, data sent by the computer sometimes corrupted when being read by the PIC. Also the com-
munication froze unexpectedly sometimes. After some investigation it was found out by putting
a capacitor of value 0.47µF to pin 14, the problem can be solved. The final circuit diagram is
shown in Figure 3.8.
17
3.2. POWER MEASUREMENT CIRCUIT CHAPTER 3. THE TESTBED
arise.
Figure 3.9 shows a simple setup of a microcontroller connected to a power supply. The
power consumption is of the microcontroller at an instance is given by the equation p = VDD I
where I is the instantaneous current as explained section 1.4. As VDD is constant, the current
I is directly proportional to the power p. Therefore the power can be deduced by measuring
the current. But for measuring the current a device called a current probe is needed which is
costly. Therefore a method which an oscilloscope can be used is preferred but oscilloscopes
measure voltage rather than current. We elaborate two techniques which make it possible for an
oscilloscope to be used for measuring power.
18
CHAPTER 3. THE TESTBED 3.2. POWER MEASUREMENT CIRCUIT
Power was captured using this method and the CPA algorithm explained in section 1.6 was
run to check whether the correct key could be obtained but unfortunately it was not successful.
It was found out that two factors had affected the wrongs results.
2. Wrong grounding
Mangard [3, pp. 49-51] in the example for his power measurement setup has used a 1 ohm
resistor. But when we ran the CPA algorithm (explained in section 4.1) on the traces collected
using a 1 ohm resistor we could not derive the key. But later we found out that when we used a
resistor which is 10 ohms or higher we could derive the key successfully.
Figure 3.12: How the averaged power trace becomes 0 when oscilloscope ground is incorrectly
connected
Also connecting the ground wire of the oscilloscope it must be done correctly or else the
attack would not work. Initially we connected the oscilloscope probes as shown in Figure 3.11
and obtained the power traces. As the attack didnt work we took multiple power traces for the
encryption of the same plain text and averaged it. This can be easily done using the functions
in the oscilloscope itself. When average of multiple traces are taken the most noise cancels out
and a trace that depicts an accurate power consumption can be taken [3, p. 83]. The averaged
trace looked like the one shown in Figure 3.12. Here yellow wave which is almost a zero level
is the power trace (voltage across the resistor) and blue wave which is a square pulse is the
19
3.2. POWER MEASUREMENT CIRCUIT CHAPTER 3. THE TESTBED
trigger used which is explained in subsection 3.2.3. As you can see, the power trace is almost
0 which means that something is wrong in the measurement. After investigation it was found
out that wrong placing of ground wire of the oscilloscope was the issue. In Figure 3.11 we
have connected the USB ground to the top of the resistor and the ground of the oscilloscope to
the bottom of the resistor. As both ground values are the same this puts the voltage across the
resistor to 0 and this explains the power trace in Figure 3.12. As a solution what we did was
connecting the probe as shown in Figure 3.13. Now both grounds are connected to the same
place but the power trace would be an inverted one because we are measuring a negative voltage
with respect to the oscilloscope ground. But in oscilloscopes there are functions to easily invert
a wave. After connecting in this correct fashion we got an average trace as shown in Figure 3.14.
Now the power trace shows some significant power consumption.
20
CHAPTER 3. THE TESTBED 3.2. POWER MEASUREMENT CIRCUIT
is costly must be used to measure power across A and B. As an alternative two oscilloscope
probes can be used. One oscilloscope probe can be connected to measure voltage between A
and C while another probe can be connected to B and C while making sure the ground wires
of both probes connect to C. Then by subtracting the voltage of the first probe by the voltage
of the second probe you can get the voltage between A and B. One issue with this two probe
method is the unnecessary overhead to do the subtraction of the wave forms. Another issue is
most oscilloscopes usually comes with two probes and when both probes are used like this we
will not have another probe to use as the trigger that is explained in subsection 3.2.3 In spite of
the difficulties VDD resistor method is still advantageous as elaborated in section 4.3 because it
requires less number of power traces to do the attack which in turn will reduce the attack time.
Later accidentally we figured out a new interesting method where just one probe can be used
to measure power using the VDD resistor method. That is just by connecting the oscilloscope
probe between B and C. Therefore here we measure voltage across the microcontroller rather
than across the resistor. Measuring using this method lets you do the attack using lesser number
of traces than what is needed when using ground resistor method. The comparison of number
of traces needed is done in section 4.3.
Power measurement should be taken when the device is doing the encryption but not when it
is idling or communicating with the computer. Therefore the oscilloscope must be properly
triggered to measure the exact place we want to measure. As we made the device we have
set the pin 21 of the PIC to set a signal when is starts the encryption. This signal goes to 5V
when it does the encryption and when it is not it goes to 0. The blue wave form in Figure 3.12
and Figure 3.14 are the trigger signal set by pin 21. We have set the oscilloscope to trigger on
the occasion when the trigger signal goes from 0 to 5V. In a situation where the cryptographic
device is not programmed by the attacker yet there are methods to set the trigger but at this stage
for convenience we ourselves set the trigger [3, pp. 49-51].
21
3.3. CONSTRUCTION OF THE PCB CHAPTER 3. THE TESTBED
Figure 3.16: The cryptographic device and the power measurement setup on a breadboard
Figure 3.17: The cryptographic device and the power measurement setup on a PCB
22
CHAPTER 3. THE TESTBED 3.4. POWER CAPTURING AUTOMATION
Figure 3.18: The overview of the basic steps in the automated power measurement setup
3.5 Summary
As explained above lot of difficulties arose when making the testbed and troubleshooting had
to be done. Also note that explained are the significant difficulties that we faced while many
other little to moderate problems that were encountered have been omitted. Therefore it is very
clear that creating a testbed is a time consuming process and having an already built testbed
would save a researcher interested in power analysis research from those difficulties and time
consumed.
23
Chapter 4
As you have understood when reading chapter 3, the creation of testbed and doing a test attack
are not two isolated processes. While creating the testbed we had to do several attacks to verify
whether our testbed was working properly. By doing so only we figured out errors in our setup
and rectified them. To test our testbed we used AES as the algorithm under attack. The reason
to select AES was because it is the most widely used block cipher in the world and therefore
lot of material on how to do a power analysis on AES are available. Since people had already
attacked AES we knew that if the attack does not work, the problem should be in our testbed.
Otherwise if we directly started to attack an algorithm which is not yet tested we would not be
able to determine whether the algorithm is not vulnerable for power analysis or whether our
testbed is wrong. That is why we first tested on AES rather than directly attacking the Speck
algorithm
In this chapter we first show the steps followed when attacking AES. Then we elaborate
some analysis we did about power analysis using AES as the algorithm under attack.
24
CHAPTER 4. ATTACKING AES USING CPA 4.1. STEPS OF THE ATTACK
25
4.1. STEPS OF THE ATTACK CHAPTER 4. ATTACKING AES USING CPA
results were obtained. In this table the column named Key1 shows the set of keybytes which
gave the highest correlation. The column Correlation1 is the correlation coefficients for each
of the keybytes in Key1. The key2 and Correlation2 are for the set of keybytes which has the
next highest correlation and so on. As you can see, the column Key1 matches the exact key we
have used. The correlation coefficients for those keys are quite high where all are more than
0.7. But the correlation coefficients for the key with second highest correlation and the third
highest correlation are much lesser where both of them are less than 0.3. Therefore the fact that
the correlation coefficients of Key1 and Key2 are having a large gap between them compared to
the gap between the correlation coefficients of Key2 and Key3, lets us confidently decides Key1
would give us the correct key.
For getting the correct key not even 500 traces were necessary. Even using just 200 traces,
the key could be derived correctly with a clear gap in the correlation coefficients for Key1 and
Key2. Taking 200 traces usually takes at most 10 minutes and the CPA algorithm on CUDA for
200 traces takes just less than 1 minute. Therefore AES on PIC could be broken in a time even
lesser than 15 minutes.
26
CHAPTER 4. ATTACKING AES USING CPA 4.2. NUMBER OF TRACES
For plotting these graphs first the CPA code was modified such that it printed all correlation
coefficients for all keys rather than the top most ones. Then using a shell script the CPA program
was called for different number of traces. The processing of the results and plotting them was
automated by writing a Matlab script.
27
4.3. MEASURING CIRCUIT EFFECT CHAPTER 4. ATTACKING AES USING CPA
Keybyte number 0 1 2 3 4 5 6 7
Best key match 0 0 0 0 8B 0 45 5B
Best correlation 0.413 0.384 0.479 0.501 0.406 0.621 0.378 0.382
Next correlation 0.399 0.362 0.362 0.332 0.360 0.360 0.363 0.359
Difference 0.013 0.022 0.117 0.169 0.046 0.260 0.014 0.024
Table 4.2: Results for ground resistor method
the method we invented where power is measured across the microcontroller rather than across
the resistor we could still measure power using this method. In this section we compare how the
number of power traces needed to successfully do the attack changes depending on the power
measuring method used.
First we collected 200 traces using the ground resistor method and ran the CPA code. Then
we collected another 200 traces now using the VDD resistor method. The key of the crypto-
graphic device when doing these measurements had been set to all zeros. The value of the
resistor used in both situations was 100 ohms. Table 4.2 shows the summary of the results for
the ground resistor method and Table 4.3 shows the summary of the results for the VDD resistor
method.
In both tables we only show the first 8 keybytes of the key rather than showing all 16
keybytes to save space. There the first row shows the index of the keybytes. Since we are only
Keybyte number 0 1 2 3 4 5 6 7
Best key match 0 0 0 0 0 0 0 0
Best correlation 0.720 0.754 0.736 0.777 0.708 0.850 0.745 0.742
Next correlation 0.387 0.375 0.400 0.385 0.380 0.402 0.361 0.386
Difference 0.334 0.379 0.336 0.391 0.328 0.449 0.384 0.356
Table 4.3: Results for VDD resistor method
28
CHAPTER 4. ATTACKING AES USING CPA 4.4. NOISE FILTER EFFECT
Keybyte number 0 1 2 3 4 5 6 7
100 MHz Best key match 0 0 0 0 0 0 0 24
Best correlation 0.417 0.384 0.444 0.393 0.374 0.522 0.388 0.384
Next correlation 0.379 0.376 0.385 0.386 0.372 0.374 0.376 0.379
Difference 0.038 0.008 0.059 0.007 0.002 0.147 0.012 0.005
75MHz Best key match 0 0 0 0 D 0 0 0
Best correlation 0.437 0.471 0.492 0.490 0.379 0.598 0.440 0.405
Next correlation 0.375 0.397 0.403 0.381 0.368 0.369 0.367 0.399
Difference 0.061 0.074 0.088 0.110 0.011 0.229 0.073 0.006
42MHz Best key match 0 0 0 0 0 0 0 0
Best correlation 0.52 0.55 0.50 0.58 0.45 0.64 0.46 0.52
Next correlation 0.36 0.38 0.36 0.37 0.37 0.38 0.36 0.39
Difference 0.15 0.17 0.13 0.21 0.07 0.26 0.10 0.13
21MHz Best key match 0 0 0 0 61 0 0 0
Best correlation 0.437 0.472 0.444 0.511 0.398 0.568 0.427 0.394
Next correlation 0.394 0.390 0.381 0.402 0.364 0.399 0.406 0.378
Difference 0.043 0.082 0.063 0.109 0.034 0.169 0.021 0.015
11MHz Best key match 0 0 0 0 0 0 0 0
Best correlation 0.470 0.390 0.402 0.432 0.418 0.608 0.454 0.441
Next correlation 0.391 0.378 0.396 0.365 0.374 0.382 0.386 0.370
Difference 0.079 0.012 0.006 0.067 0.044 0.226 0.068 0.071
Table 4.4: The dependence of results on the cut-off frequency of the high pass filter
considering first 8 keybytes this is from 0 to 7. The second row has the key that had highest
correlation while the third row shows the value of the correlation coefficient for that key. The
fourth row has the value of the second highest correlation coefficient. In this table we don’t show
the key with respect to that second highest correlation as it is not needed for our comparison.
The last row has the difference between the correlation coefficients in row 3 and 4. As you can
see, though the same number of traces were used the results in the two cases are different. In
VDD resistor method all derived keybytes are 0 which is correct. Here the correlation values for
those key bytes are quite high which is greater than 0.7 and the difference is also quite high.
But in the ground resistor method several keybytes have gone wrong. Even the keybytes which
are 0 does not have high correlation values as in the VDD case. Also the difference between the
correlation values here are much smaller.
Therefore we can say that the VDD method for measuring power is better than the ground
resistor method when the number of traces needed is compared, that in turn saves the time taken
for the attack.
29
4.4. NOISE FILTER EFFECT CHAPTER 4. ATTACKING AES USING CPA
least twice the maximum frequency component in the continuous signal in order to retain all
the useful information about the original signal. Therefore in this case a sample frequency of
24MHz (12MHz X 2) or more is necessary. Using a lesser value than that would reduce the ac-
curacy of the power consumption measurements of each instruction, but on the other hand using
a very high sampling frequency also would be disadvantageous as then it would only increase
the amount of noise in the measurements rather than improving the accuracy. The oscilloscope
we used (Tektronix MSO2012B) has a frequency range of 100MHz and this whole bandwidth
would not be therefore useful in our case. As the oscilloscope had a function to activate a high
pass filter known as the noise filter we applied various cut off frequencies it supported and ob-
tained power traces. Here we used the ground resistor method to measure power and took 200
power traces for each cut-off frequency 100MHZ(no filter applied), 75MHz, 42MHz, 21MHz,
and 11 MHz which were the cut-off frequencies supported by our oscilloscope. In all these
cases the key of the cryptographic device had been set to all zeros.
In Table 4.4 the results have been tabulated. As similar to the results in section 4.3 only
the first 8 keybytes have been shown and what are depicted by each row in each sub table for
different cut-off frequencies are also same as section 4.3. Here if you observe Difference row
for each cut-off frequency for correct keybytes it is visible that the largest difference between
the highest and the second highest correlation coefficients exist for 42MHz. This means that
at 42MHz the number of traces required for an attack is low. The reason why 42MHz is better
than 100MHz or 75MHz would be because using 42MHz means the noise at frequencies above
42MHz would be cut-off making the traces more clear. The reason why the lower frequencies
such as 21MHz or 11MHz provide worse results would be because those frequencies are lesser
than the Nyquist frequency and hence useful information are removed from traces.
30
Chapter 5
Speck implementation
31
5.2. DIFFERENCE WITH AES CHAPTER 5. SPECK IMPLEMENTATION
left by 3 bits and is subjected to xor with the result r. This is what happens in the first round.
This sequence of actions are repeated through the rest of the rounds. For the first round, K2 was
directly used but then for the next rounds, a modified version of the key called the round key is
used. These round keys are formed using add, rotate and xor operations carried step by step for
K1 and K2 as shown in Figure 5.1.
32
CHAPTER 5. SPECK IMPLEMENTATION 5.3. SPECK FOR 8 BIT
which the key and plain text are mixed is completely different when compared with the process
in AES. Figure 5.1 showed the overview of Speck algorithm and there as you can see, the whole
key is not mixed at once with the plain text. The key is first split into two equal parts as K1 and
K2 and similarly plain text is split into two parts as P1 and P2. P1 which is the left half of plain
text is subjected to round and add operations and the result is mixed with K2 which is the right
half of the key, by using xor operation. K1 is used in the next rounds to generate the round keys.
33
5.3. SPECK FOR 8 BIT CHAPTER 5. SPECK IMPLEMENTATION
Index 0 1 2 3 4 5 6 7
Value 0x07 0x06 0x05 0x04 0x03 0x02 0x01 0x00
this number can be shown using a byte array. There the first row indicate the index of the array
and the second row shows the value stored in that slot in hexadecimal.
Though the representation is straight forward, the challenge was to implement the basic
operations needed for carrying out the encryption. Following subsections discuss how those
operations were implemented based on the unsigned char array.
c1=(a1+b1+c0)/256
c6=(a6+b6+c5)/256 c0=(a0+b0)/256
+ +
Array A a7 .
a1 a0
+ + +
Array B b7 .
b1 b0
c7=(a7+b7+c6)/256
Array S s7=(a7+b7+c6)%256 .
s1=(a1+b1+c0)%256 S0=(a0+b0)%256
Fortunately PIC C compiler supported 16 bit integers despite the PIC is 8 bit. Therefore the
addition of two bytes, for example the addition of (a0 and b0 ) was taken to a 16 bit unsigned
variable. Then simply by taking the modulo operation of this value with 256 the value for s0
can be found (s0 = (a0 + b0 ) % 256). The carry can be found by doing an integer division with
256 (c0 = (a0 + b0 )/256. s0 can be put to the respective slot in array S and c0 should be added to
the addition of a1 and b1 when finding s1 and c1 . This process continues till the most significant
byte of the array is reached. The carry obtained for the most significant byte c7 is dropped as
anyway even in 64 bit addition this part is dropped.
34
CHAPTER 5. SPECK IMPLEMENTATION 5.3. SPECK FOR 8 BIT
Figure 5.3: An 8 bit shift right operation of a 64 bit unsigned integer using a byte array
Though the above case is very straight forward, the difficulty comes when k is not a multiple
of 8. For example say k is equal to 3 bits. Now the array cannot be shifted by whole elements
but instead each elements must be internally shifted while taking the overflown bits to the the
beginning of the next consecutive byte. Consider Figure 5.4 where the byte array is right shifted
by 3 bits. As you can see, the most significant byte (byte 7) can be just shifted by 3 bits as the
leftmost 3 bits anyway becomes 0. But the other bytes cannot be just shifted like that. For
example if the byte 6 is considered, just shifting the value to the right by 3 bits is not enough but
further it is necessary to bring the last 3 bits of the most significant byte as shown in Figure 5.4.
How the implementation was done is shown in Figure 5.5. Here we right shift the value in the
6th slot by 3 bits and apply or operation with the value in 7th slot shifted left by 5 bits. This trick
brings the overflown bits of the 7th slot to the beginning of the 6th slot.
Now consider a more difficult situation where a value has to be shifted by a number of bits
k, where k is larger than 8 and k is not a multiple of 8. For example consider a shifting by 19
number of bits. Now the method in the previous paragraph will not work as shifting a byte by
19 bits will make it all zero as the byte it self is just 8 bits. Therefore in such situations, the
solution is to first shift by 16 bits which means shifting the whole array elements by two and
then to do a shift by 3 as in the previous paragraph. Therefore generally if the shifting is to
be done by k bits k/8 number of bytes must be shifted and then k%8 number of bits must be
35
5.4. TESTBED FOR SPECK CHAPTER 5. SPECK IMPLEMENTATION
bit7
bit3
bit2
bit1
bit0
bit7
bit3
bit2
bit1
bit0
bit6
bit6
Byte array before
bit5
bit5
bit4
bit4
...
shifting
bit7
bit3
bit2
bit1
bit0
bit7
bit3
bit6
bit6
bit5
bit5
bit4
bit4
Byte array after
0
0
0
...
shifting
Byte 7 Byte 6
shifted according to the method described in the previous paragraph. Here the sign ‘%‘ refers
to modulo operation.
The left shift is the inverse of the right shift which was discussed up to now and hence it
is not elaborated here. Finally the or operation must be implemented but luckily it does not
have any problems like overflowing and therefore or operations of corresponding bytes can be
done independently using a simple loop. The implementation of or is exactly similar to the
implementation of xor operation discussed in subsection 5.3.3.
36
CHAPTER 5. SPECK IMPLEMENTATION 5.5. SPECK FOR 16 BIT
such as accepting input from a computer and sending result to the computer via the USB inter-
face were added in a similar fashion described in subsection 3.1.2. Then it was compiled and
then PIC microcontroller was programmed to get a working Speck cryptosystem.
Then finally the Speck cryptosystem was tested for the accuracy and proper functionality.
37
Chapter 6
Attack on Speck
Speck which is a very recent light weight algorithm for microcontrollers as described in chap-
ter 5 has a good possibility of becoming popular for embedded device industry in the future.
Therefore it would be a good idea to check whether this algorithm is also vulnerable for power
analysis attacks so that any countermeasures can be implemented. According to best of our
knowledge there is no work on power analysis on Speck and therefore this chapter discusses
new experiences. First in section 6.1, the differences in Speck with AES that makes it impossi-
ble to directly use the known attack method on AES is discussed. Then in section 6.2 the step
by step methodology that should be followed by someone to reproduce an attack on Speck is
described. Next section 6.3 describes the experiences we did undergo when formulating the
attack including the challenges that were faced and how they were successfully overcome. Af-
ter that in section 6.4 the effort taken for a power analysis attack on Speck is compared with
a similar attack on AES. Finally in section 6.5 we show that the attack is successful not only
on 8 bit microcontrollers but also on 16 bit microcontroller and hence how the concept can be
generalized for any bit microcontroller.
38
CHAPTER 6. ATTACK ON SPECK 6.2. STEPS TO ATTACK SPECK
but even then instead of mixing K1 directly, K1 is first subjected to various operations to derive
a round key and finally it is this round key that is mixed. This round key is not just formed
using operations on K1 alone, but also K1 is subjected to a mixing with a modified version of
K2. Therefore we cannot directly derive K1 but instead the round key must be first derived and
reverse operations must be done to this round key with the help of already found K2 to derive
original K1.
Therefore process of attacking the key in Speck must be done in two phases. First a set of
power traces must be collected that involve power consumption for right half of the key and CPA
algorithm must be run on those traces to derive that key. After that, another set of power traces
has to be taken that involves power consumption for left half of the key and CPA algorithm in
collaborated with the already found key must be run to derive this part of the key. Due to the
need of carrying out in two phases, the overall time to do an attack is definitely high for Speck
than when compared with AES.
Figure 6.1 shows the most basic steps that should be followed to attack Speck. Each block
39
6.2. STEPS TO ATTACK SPECK CHAPTER 6. ATTACK ON SPECK
represents a task that should be followed while the arrows represents the data flow. The order
of how tasks should be executed is shown using a number inside the blocks.
The first step as shown in Figure 6.1 is collecting of power traces. This set of power traces
must include power consumption of reading or writing of an intermediate value that is based on
the right half of the key K2. The point marked in Figure 6.2 as R1 which is the result that comes
after mixing K2 with transformed plain text is such a good point. We suggest that you measure
power during the xor operation M1 as marked in Figure 6.2. When power is measured across
this operation by setting the trigger explained in subsection 3.2.3 to go high at the beginning of
this M1 operation and then to go down at the end of M1 operation it will be possible to capture
the power consumption that would happen when loading the value at R1 from memory.
After the power traces are collected, as the second step, the CPA algorithm should be run by
giving those power traces as the input. Of course the plain text used for collecting power also
should be given as input though it is not shown in Figure 6.1. The CPA implementation used for
AES cannot be used here. Instead some modifications must be done such that it comply with
Speck. For example changes must be done to the selection function such that now it calculates
the intermediate value at R1. Also as we are attacking half of a 128 bit key, now the algorithm
no longer has to do calculations for 16 bytes but just 8 bytes. Once the CPA algorithm is run it
will output you the right half of the key K2.
Then the third step as shown in Figure 6.1 is the capturing of other set of power traces. This
set of traces should include power consumption of reading or writing of a value that is based
on the left half of the key K1. The point marked as R2 in Figure 6.2 is such a good point.
This is the first place in the algorithm where K1 is mixed with plain text but note that K1 is
already mixed with K2. We suggest that you measure power across the M2 operation shown in
Figure 6.2 similar to a way that was described for M1.
40
CHAPTER 6. ATTACK ON SPECK 6.3. EXPERIENCES AND DIFFICULTIES
As the fourth step, the CPA algorithm should be again run by giving the new set of power
traces as the input. The CPA implementation used at step 2 cannot be used here directly as now
the intermediate value is now R2 but not R1. Therefore that implementation should be changed
appropriately and run to get the round key at the position marked as K’ in Figure 6.2 as the
result.
The final step is to derive original left half of the key K1 using the round key. For this,
the round key at K’ must be applied with reverse operation until K1 is obtained. Figure 6.3
denotes the order of reverse operations. It is well known that the reverse operation of xor is xor
it self. Therefore by doing an xor between K’ and the left rotated K2 at A, value at C can be
found out. The xor of C with 0 is C itself and therefore value C is equal to D. Then instead
of addition subtraction must be done. By subtracting K2 from D value E would be found out.
Here care must be taken when D is less than K2 as such value for D is a result of a overflow that
happened when finding D using E and K2 at addition during the key generation. Therefore in
such situation we should subtract K2 from D plus removed carry during addition. The reverse
operation of rotate right is rotate left and by doing that K1 can be derived successfully.
Now as shown in Figure 6.1 we have both K1 and K2 where by concatenating them the
correct key can be found out.
Figure 6.3: Deriving left half key from the round key
41
6.3. EXPERIENCES AND DIFFICULTIES CHAPTER 6. ATTACK ON SPECK
42
CHAPTER 6. ATTACK ON SPECK 6.3. EXPERIENCES AND DIFFICULTIES
difference with the previous traces is that now the traces do not include power consumption
during sbox lookup. Let’s call this the second set of power traces. Now the CPA algorithm with
AddRoundKey as the selection function is expected to return the correct key because the traces
still include the moment when the result after AddRoundKey operation is written to the memory.
But after running the algorithm we figured out that we always get all zeros as the result despite
of what is set as the key in the cryptosystem.
With that result we could decide that when measuring power only for the xor operation some
problem is caused. As this was an unexplainable result it was decided to investigate further by
plotting how the correlation coefficient changed with the time (sampled time in power traces)
for each possibility of each keybyte for each selection function . We guessed that by comparing
the plots for each selection function it would be possible to get a hint regarding the reason.
As there are 256 possibilities per each keybyte where there are 16 separate keybytes, 4096
correlation versus time datasets had to be generated while running the CPA algorithm. For this,
the CPA code was modified such that it printed these necessary information to separate files.
But as this raised a necessity of retaining large amount of data in the memory we had to do some
more changes to the code such that now the analysis was done part by part while transferring
calculated portions of data from memory to disk. First the modified CPA code was run while
using the the combination of xor and sbox as the selection function. Then the modified CPA
code was run while using just xor as the selection function. For both situations, the first set of
power traces was used.
After generating the two data sets which each was several gigabytes in size, a Matlab script
was written to to draw the plots. The script was run separately on the two datasets to get two
sets of 4096 graphs. From these graphs necessary graphs were selected to do the comparison.
Figure 6.4: The correlation vs time graph for a correct key when using sbox lookup as the
selection function
Figure 6.4 is the correlation vs time graph for the correct key guess for a certain keybyte
when sbox lookup is used as the selection function (most precisely combination of xor and sbox
43
6.3. EXPERIENCES AND DIFFICULTIES CHAPTER 6. ATTACK ON SPECK
Figure 6.5: The correlation vs time graph for a wrong key when using sbox as the selection
function
look up is the selection function). The x axis represents the time. Actually this is the sample
number of the power traces. As our power traces had 100 000 sample points x axis range from 1
to 10×104 . The y axis is the correlation coefficient for the considered key guess. As you can see
there is a sudden peak in the middle. This is the place where the result from the sbox lookup is
written to the memory. Correlation coefficient goes high because the real power consumption at
this moment matches the hypothetical power consumption values that we calculated. Figure 6.5
shows a similar graph but for a wrong key guess. There we do not see any sudden peak in the
correlation values because there is no such correlation between the real and hypothetical power
values.
Figure 6.6 is the graph drawn for the correct key guess for the same keybyte as the previous
case but now xor is the selection function used. There two peaks are visible. The left peak is
when the result of the xor operation (let’s call this value R) is written to the memory. When the
xor operations of all keybytes finish, then starts the sbox lookup and R has to be again loaded
from memory. The peak in the right side is due to this loading of R, to be used for the sbox look
up. If you consider the location of this right peak with the single peak in Figure 6.4, it can be
noted that they lie very near. The reason is that because the loading of R for sbox lookup from
memory and then writing the result of sbox lookup to the memory happen just after the other.
The graphs for a wrong key guess when xor is used as the selection function still looks very
same to Figure 6.5 where there are no peaks. But for key guess 0 though it is a wrong key, still
it has some peaks as shown in Figure 6.7. This high correlation should be the reason that falsely
returned zero as the result by the CPA algorithm for the second set of power traces we used. In
this second set of power traces we only captured power for the xor operation and therefore the
right peak in Figure 6.6 would be no longer there now as those samples for the sbox lookup is
no longer there. Therefore if you consider the left peak in Figure 6.6 (correlation when writing
R to the memory) is less than the significant peak that is occurring in Figure 6.7. Therefore
0 is returned as the most possible key. If you observe the location of the significant peak in
44
CHAPTER 6. ATTACK ON SPECK 6.3. EXPERIENCES AND DIFFICULTIES
Figure 6.6: The correlation vs time graph for a correct key when using xor as the selection
function
Figure 6.7: The correlation vs time graph for the zero key guess when using xor as the selection
function
45
6.3. EXPERIENCES AND DIFFICULTIES CHAPTER 6. ATTACK ON SPECK
Keybyte number 0 1 2 3 4 5 6 7
xor 0.122 0.051 0.043 0.049 0.029 0.035 0.030 0.073
sbox 0.516 0.498 0.555 0.485 0.554 0.626 0.507 0.514
Table 6.1: The effect of the selection function on the difference between the the highest and the
second highest correlation coefficient
Figure 6.7 carefully you can see that this lies just before the left peak in Figure 6.6. As the peak
in Figure 6.6 was corresponding to the writing of result after the xor, we can deduce that the
significant peak in Figure 6.7 is the moment when plain text is loaded from memory in order to
be subjected to xor.
When a number is subjected to xor with 0 the result is the number itself. Therefore if plain
text is subjected to xor with 0 the result is the plain text itself. When we use xor as the selection
function, for key guess 0 the CPA algorithms correlates the power consumption during loading
of plain text memory to the hypothetical power consumption calculated for the xor of plain text
with key guess 0. Finally this was the reason that makes a peak for 0 key guess.
Since now the cause for the zero key issue was known we decided to do another attempt
which is going to discussed in subsection 6.3.4.
46
CHAPTER 6. ATTACK ON SPECK 6.3. EXPERIENCES AND DIFFICULTIES
values to the hypothetical power consumption values calculated by the power model. When
an sbox is there a change in one bit of the key would change lot of bits in the resultant value
and hence when calculating hypothetical power consumption values a keyguess with even one
bit change to the correct key would give a very different hypothetical power value than the
real power consumption value. Therefore correlation for the correct key would be significantly
higher than for the key guesses that are even closer to the correct key, making it possible to
derive the correct key with less number of power traces. In contrast, in an operation like xor
a change of one bit in the key would maximally change only one bit in the resultant value and
therefore it makes it harder to distinguish hypothetical power consumption values for the keys
closer to the correct key. Because of that larger number of power traces are required.
Though the original purpose of sbox and the concept of confusion itself was to make a
cipher more secure it seems that for power analysis having a sbox becomes an advantage to the
attacker.
Based on that experience got by the analysis done in the previous subsections we did a second at-
tempt to attack Speck where we became successful at the end. As explained in subsection 6.3.2
in order to get the correct key, power should be measured such that it does not include the load-
ing of the value to be subjected to xor, but only include the moment when writing the result
of xor after the operation. One solution may be to preprocess the power traces such that the
loading part is trimmed but as xor operation happen eight number of times per each keybyte
finding the exact locations is difficult.
Then we came up with a solution which used a different power measurement point that
still made it possible to use the same old selection function. The result at R1 in Figure 6.2
which is the result after the xor operation is of course loaded back from memory later when
the algorithm starts the next encryption round (2nd round). Therefore if power was measured
when this loading happen, that is during M1 operation in Figure 6.2, then the problem would be
solved. The reason is that now we do not have the loading of the value before xor which resulted
in the troublesome correlation. After new power traces were taken based on this approach the
correct key could be successfully derived after running the CPA algorithm.
Since the right half key of was found out successfully now the next step was to find the left
half key. As it was discussed in subsection 6.1.1, a round key is generated by doing various
modifications to the left half key including a mixing with the right half key. It is this round key
that is subjected to xor with modified plain text to get an intermediate value at R2 in Figure 6.2.
Therefore a new set of power traces must be collected that includes writing of this value R2.
Based on the experience with zero key issue power was measured across the operation M2 in
Figure 6.2
On the power traces obtained, CPA algorithm was run with right half key as an input to
derive the round key. But of course the final objective is to find the original key but not a round
key. Therefore the round key must be reversed by applying the reverse operations backwards.
By applying those reverse operations as elaborated in section 6.2, the original left half key could
be found out. This key reversal functionality was added into the CPA code so that it directly
printed the original key.
47
6.4. SPECK VULNERABILITY CHAPTER 6. ATTACK ON SPECK
Table 6.3: Time taken for a successful attack on Speck 8 bit implementation
48
CHAPTER 6. ATTACK ON SPECK 6.5. ATTACK FOR 16 BIT
including attacks on both AES and Speck the subkeys were always 8 bits (1 byte) in size. When
performing AES whether the microcontroller is 8 bit, 16 bit, 32 bit or whatever, the substitution
box lookups are always happening byte by byte. Therefore for any microcontroller, AES can be
attacked using CPA by attacking on each byte of the key.
But Speck on the other hand which comprises of only add, rotate and xor operations does not
have any byte wise operation. If you consider an 8 bit microcontroller these operations would
of course happen byte wise because registers are anyway 8 bits. In section 5.3 we described
how we did the implementation of Speck for an 8 bit microcontroller and there you would
have noticed that all operations were done byte wise using loops. Even if a compiler that
supports larger integers is used for an 8 bit microcontroller, the compiler would still use the
same approach in assembly level. Because of this it can be thought that CPA attack on Speck
which attacked each byte separately would have only worked only because the implementation
was 8 bit. The result of the attack is therefore to be checked in a situation when 16 bit or higher
microcontrollers are used.
For a 16 bit microcontroller a straight forward method to attack would be to attack using
CPA on subkeys which are 16 bits in size. Since registers are 16 bits all operations would
happen in terms of values which are 16 bit in size and therefore the attack will definitely work.
We tried this approach by using the 16 bit implementation of Speck done for PIC described
in section 5.5. Here the CPA code had to be modified considerably such that it now attacked
2 bytes at a time than 1 byte at a time. But this analysis took considerably larger time when
compared to a byte wise attack as now a 16 bit number which has 65536 possibilities increases
the number of computations necessary. Therefore if the microcontroller was 32 bit, the number
of possibilities is much more and the time for a CPA attack also would be high. Therefore the
approach of increasing the size of a subkey is not going to work for microcontrollers with large
registers.
The next approach was to check the feasibility of a byte wise CPA attack on a 16 bit micro-
controller. Though operations happen 16 bits at a time, still there should be some correlation
between the total power consumption for the 16 bits and the power consumption for 8 specific
bits that we try to attack at a time. When more and more power traces are taken for different
plain text samples, the power consumption for the rest of the bits except the specific 8 bits that
are attacked at a time would be treated as noise [3, pp. 138-141] automatically by the CPA
algorithm. Taking this to consideration we did a byte wise CPA analysis for a set of power
traces captured when encrypting using the 16 bit microcontroller. It was figured out that it was
still possible to recover the key successfully. But the number of power traces needed was much
higher than the number necessary for the attack on 8 bit microcontroller. But still the complete
attack was possible in several hours.
When a 32 bit microcontroller is considered it should be still possible to do a byte wise CPA
attack because now 24 other bits apart from the 8 bits we are targeting at a time for the attack
can be considered noise. But more power traces would be definitely needed for such. Therefore
it can be deduced that the more number of bits the microcontroller is, the more number of power
traces are needed. Capturing more number of power traces takes time while the execution time
for CPA algorithm also takes more time when more traces have to be analysed. But still a
time period like 1 day or even 1 week is still a feasible time for an attacker and hence the
vulnerability is there even for microcontrollers with large registers. Therefore countermeasures
are still needed despite the register size.
49
Chapter 7
50
CHAPTER 7. CIRCUIT COUNTERMEASURES 7.1. FILTERS
uses this principle to minimize the leakage. Though this idea has been proposed some time
back [3], still any work that presents the effectiveness of them is not to be found. Therefore we
decided to practically implement them and test them.
The rest of this section discusses each filter we implemented, their results and finally the
comparison of the effectiveness of them.
1000µF Microcontroller
Oscilloscope
ground
Ground
The capacitor connected as shown in Figure 7.1 is a low pass filter. The impedance of a
capacitor is given by the following equation where f is the frequency and C is the capacitance.
1
X=
2πjf C
When the frequency is smaller, the impedance is larger and hence signal components with
smaller frequencies are reluctant to pass through a capacitor. But when the frequency is larger,
impedance gets smaller and those signals pass through the capacitor. When the microcontroller
in Figure 7.1 draws varying currents, it causes noise in the power supply line. The capacitor
conducts and shorts out all the high pass signals, filtering out only the low pass ones. This
smooth out the large variations of the signal.
Though the signal is affected, still there is a visible variation and this is enough to derive
the key of the system. But this requires more number of power traces. Figure 7.3 shows how
the correlation coefficient changes with the number of power traces during an attack. We saw
51
7.1. FILTERS CHAPTER 7. CIRCUIT COUNTERMEASURES
in Figure 4.3 that when no capacitor is used even 50 power traces were enough to figure out the
correct key. But now in Figure 7.3 at least 1500 power traces are required to identify the correct
key out of the others. But the time required to collect 1500 power traces and analyse them was
only about 1.5 hours. This is still a very feasible time for an attacker.
0.35
0.3
0.25
Correlation
0.2
0.15
0.1
0.05
0
500 1000 1500 2000 2500 3000 3500 4000 4500 5000
Number of traces
Figure 7.3: Number of power traces needed when a capacitor filter is used
X = 2πjf L
Here, f is the frequency of the signal and L is the inductance. According to this equation
when the frequency increases, the impedance increases. Hence an inductor is reluctant to pass
the high frequency signals through it. As the inductor is connected serially in Figure 7.4, the
52
CHAPTER 7. CIRCUIT COUNTERMEASURES Effect of an inductor
7.1. FILTERS
VDD
Microcontroller
1mH
Oscilloscope
ground
high frequency signals in the whole branch is attenuated. That is why the variation of the signal
in Figure 7.5 is not as high as when no inductor was used.
The number of traces required to attack the system with the inductor connected can be
found from Figure 7.6. As you an see now at least 500 traces are required to distinguish the key
correctly. This is not as good as the capacitor. But yet better than nothing was used.
7.1.3 LC filter
A low pass second order LC filter was implemented as shown in Figure 7.7, using two 1000µF
capacitors and two 1 mH inductors. As this is a second order filter it has a sharp cut off slope.
Therefore the frequencies greater than the cut off frequency are much more attenuated when
compared to low oder filters considered before. The power trace in this case in shown in Fig-
ure 7.8. As you can see now the power trace is very much smoothened than all previous cases.
As expected, the number of power traces required for an successful attack increased by a
considerable amount compared to the others methods. As seen from Figure 7.9 it is only at
about 5000 power traces, the correct key starts to become significant from other keys. 5000 is
53
7.1. FILTERS CHAPTER 7. CIRCUIT COUNTERMEASURES
0.7
0.6
0.5
Correlation
0.4
0.3
0.2
0.1
0
0 200 400 600 800 1000
Number of traces
Effect of an LC filter
Figure 7.6: Number of power traces needed when an inductor filter is used
VDD
Oscilloscope
Probe tip
100 ohm
1mH 1mH
v
1000µF 100µF
Microcontroller
Oscilloscope
ground
a comparatively large number of traces but yet the time taken for an attack is about 4.5 hours,
which is feasible for an attacker.
54
CHAPTER 7. CIRCUIT COUNTERMEASURES 7.1. FILTERS
0.25
0.2
Correlation
0.15
0.1
0.05
0 1000 2000 3000 4000 5000
Number of traces
55
7.2. METHODS WE TRIED CHAPTER 7. CIRCUIT COUNTERMEASURES
VDD
100 ohm
v Oscilloscope
Probe tip
Vin Vout
Voltage
regulator
G
Microcontroller
Oscilloscope
ground
56
CHAPTER 7. CIRCUIT COUNTERMEASURES 7.2. METHODS WE TRIED
VDD
100 ohm
v Oscilloscope
Probe tip
Microcontroller
Oscilloscope
ground
Figure 7.12: Connecting a reversed biased zener diode in parallel to the microcontroller
When an attack was carried out the number of traces required increased. According to the
graph in Figure 7.14 at about 300 power traces, the key becomes identifiable. Hence this is not
that much effective as a countermeasure.
57
7.2. METHODS WE TRIED CHAPTER 7. CIRCUIT COUNTERMEASURES
0.8
0.7
0.6
Correlation
0.5
0.4
0.3
0.2
0.1
0 100 200 300 400 500
Number of traces
Figure 7.14: Number of power traces needed when a zener diode is used
58
CHAPTER 7. CIRCUIT COUNTERMEASURES 7.2. METHODS WE TRIED
inverting input. Since the operational amplifier is a non linear device, the input - output current
Effect of an operational amplifier
relationship would be non linearl as well. The causes a change in the power trace as shown in
Figure 7.16.
VDD
100 ohm
Oscilloscope
v Probe tip
+ 5V
-
With opamp
Microcontroller
Oscilloscope
ground
From Figure 7.17 it can be observed that the correct key becomes identifiable only after
about 4000 power traces. Therefore this method is somewhat effective. But yet it is not good
enough to prevent an attack.
59
7.2. METHODS WE TRIED CHAPTER 7. CIRCUIT COUNTERMEASURES
0.25
0.2
Correlation
0.15
0.1
0.05
0 1000 2000 3000 4000 5000
Number of traces
Figure 7.17: Number of power traces needed when an operational amplifier is used
current source it is not an ideal one and hence current variations in Milli Ampere range still
Effect of a current source
happens. Therefore an attack needed only about 50 traces and this method is not useful at all as
a countermeasure.
VDD
100 ohm
330
+Vcc
ohm
+
- C828
-Vcc
Oscilloscope
Probe tip
Microcontroller
1 ohm Oscilloscope
ground
60
CHAPTER 7. CIRCUIT COUNTERMEASURES 7.2. METHODS WE TRIED
as an emulation of a dual core chip which is usually powered by a single power supply. The
Effect of a zener diode
circuit diagram is shown in Figure 7.19. There, the microcontroller to the right does the real
encryption while the one to the left does some other job just to consume false power.
VDD
100 ohm
v Oscilloscope
Probe tip
Other
Micro-
Micro-
controller
controller
Oscilloscope
ground
First, we programmed the second microcontroller such that it prints to the serial port in
an infinite loop. Power traces were taken when the correct microcontroller was doing the real
encryption. The obtained correlation vs number of traces plot is shown in Figure 7.20. But
unfortunately it is quite clear that at about 200 power traces, the correct key becomes visible.
0.6
0.5
Correlation
0.4
0.3
0.2
0.1
100 200 300 400 500 600 700 800 900 1000
Number of traces
Figure 7.20: Number of power traces needed when another microcontroller that write to the
serial port infinitely is connected in parallel to the power supply
When the power traces for the other microcontroller was alone inspected via the oscillo-
scope, it was noted that the power consumption for serial printing was not that significant.
Therefore the introduced noise would not have been significant. Therefore it was programmed
to do a mock AES instead of mock serial printing. Then power traces were obtained and anal-
ysed to obtain a plot as shown in Figure 7.21.
But still it is visible in Figure 7.21 that at about 300 power traces, the correct key becomes
visible. Therefore we concluded that the dual core idea was also not successful for hiding the
61
7.2. METHODS WE TRIED CHAPTER 7. CIRCUIT COUNTERMEASURES
0.5
0.45
0.4
0.35
Correlation
0.3
0.25
0.2
0.15
0.1
0.05
100 200 300 400 500 600 700 800 900 1000
Number of traces
Figure 7.21: Number of power traces needed when another microcontroller that do a mock AES
is connected in parallel to the power supply
leakage.
62
CHAPTER 7. CIRCUIT COUNTERMEASURES 7.2. METHODS WE TRIED
Two possibilities for the failure can be either the fact Atmel chips are safer or numerous
other chips and components on the Arduino board affects the power consumption. To identify
the exact cause, the power pin of the Atmel 328P chip was folded upwards and then it was
connected to base on the board via a 10 ohm resistor as shown in Figure 7.23. A 10 ohm
resistor is selected in contrast to a 100 ohm one or otherwise the drop across the resistor would
be too large that the microcontroller would not get enough voltage. The averaged power trace
now looked as shown in Figure 7.24 where now the power trace is quite clear and an attack
should succeed. As expected, at about 2000 traces, the correct key started appearing.
Figure 7.23: Hacking the Arduino Uno board for a successful power analysis attack
With that it is clear that Atmel 328P chip is also vulnerable though it not that vulnerable as
PIC2550. Therefore the reason for the failure for direct power measurements on the Arduino
Uno would have been the extra components on the board. Since Arduino Uno had lot of com-
ponents we minimized the scope by testing a similar attack on an Arduino Nano. Arduino Nano
is also a prototyping board based on an Atmel microcontroller, but which does not have extra
63
7.2. METHODS WE TRIED CHAPTER 7. CIRCUIT COUNTERMEASURES
components and features as in Arduino Uno. But yet the averaged power traces was similar to
one in Figure 7.22, and even at 100,000 power traces the attack became unsuccessful.
Then we obtained an Arduino Mini Pro which was even more simpler than an Arduino
Nano. There, the average power trace was clear as shown in Figure 7.25. As expected the
attack became successful at about 2000 power traces.
Now obviously it should be a component that is present in the Arduino Nano but not in
Mini Pro, the cause for the failure of the attack. Schematic diagram for Arduino Nano [6] and
Arduino Mini Pro [32] was compared to see the difference. As there were few differences, in
order to test the success of an attack by eliminating one item by one item, the Arduino Nano
was implemented on a breadboard.
The schematic diagram of the Arduino Nano is shown in Figure 7.26. There the micro-
controller is an Atmel 328 chip. Since the chip on the Arduino Nano is a surface mount one
permanently soldered to the board, a removable Atmel 328 chip from an Arduino Uno was
used. In Figure 7.26 an FTDI FT232RL chip is present. It is the chip that interfaces the mi-
crocontroller’s serial port to USB port of the computer. Such a chip was obtained and was
connected on the breadboard. All other components such as voltage regulators, zener diodes,
resistors and capacitors were connected on the breadboard exactly according to the circuit dia-
gram in Figure 7.26. The Arduino Nano implemented on a breadboard looks like as shown in
Figure 7.27.
Then power traces were obtained for the Arduino Nano implementation on the breadboard.
But the average power trace was almost null similar to the real Arduino Nano and the attack
did not succeed even at 100,000 traces. Then the FTDI chip was eliminated from the circuit
and an averaged power trace was observed thinking that FTDI would have been the culprit.
The averaged power trace looked very clear similar to Figure 7.25. Therefore the attack would
definitely succeed.
These observation lead us to infer that FTDI chip is the culprit that makes the attack un-
successful on Arduino Nano. The power consumption by the FTDI chip would have been too
noisy that reduces the SNR for the power consumption signal of the microcontroller by very
much. This made us think on the possibility to use an FTDI chip connected to the power supply
in parallel to the micrcontroller as a countermeasure. Therefore it was decided to test on the
PIC2550 microcontroller which was used in earlier evaluations.
The FTDI chip was connected to the PIC2550 microcontroller as shown in Figure 7.28. Up
to now a similar FTDI chip was used for communication between the microcontroller and the
64
CHAPTER 7. CIRCUIT COUNTERMEASURES 7.2. METHODS WE TRIED
PC as it was discussed in chapter 3, but in those cases power to the FTDI chip was provided
2014 10:18:30 D:\Downloads\ArduinoNano30Eagle\Arduino Nano2.sch (Sheet: 1/1)
power separately from the 5 v supply of the PC’s USB bus. Now power to the FTDI chip is
supplied from the same power supply used to supply power to the microcontroller.
An averaged power trace was obtained which is shown in Figure 7.29. Here we can see
that the power trace is clear in opposed to null power trace observed for Arduino Nano. Power
traces were collected and the correlation vs number of traces plot was plotted and it is shown in
Figure 7.30. The key is clear at just about 200 traces!
Finally after all those efforts, even though the FTDI made an attack on an Atmel chip in-
feasible for us, introducing the same FTDI had no effect at all on PIC2550. At the moment the
reason why it happens so is unknown to us.
65
7.2. METHODS WE TRIED CHAPTER 7. CIRCUIT COUNTERMEASURES
100 ohm
v Oscilloscope
Probe tip
D+ RX Micro-
FTDI chip
D- TX controller
Oscilloscope
ground
66
CHAPTER 7. CIRCUIT COUNTERMEASURES 7.2. METHODS WE TRIED
Figure 7.29: Averaged power trace when FTDI chip is connected in parallel to PIC2550
0.6
0.5
Correlation
0.4
0.3
0.2
0.1
100 200 300 400 500 600 700 800 900 1000
Number of traces
Figure 7.30: Number of power traces required when FTDI chip is connected in parallel
67
Chapter 8
Software based countermeasures do modifications to the program code that runs on the crypto-
graphic device as it was discussed in section 1.7. From many available software based coun-
termeasures, due to reasons that were discussed under section 2.2, we selected the technique
called random instruction injection [19] to be practically tested on our testbed. An insight into
random instruction method and the results from the carried out attack is explained under sec-
tion 8.1. Randomly shuffling Sbox operations is another proposed software countermeasure as
mentioned in section 2.2. This method was also implemented and tested by us and the details
are elaborated in section 8.2. A problem with these countermeasures is with respect to the
randomness. We elaborate a low cost method that can be integrated into any cryptosystem to
provide a true randomness. Those improvements are elaborated in section 8.3.
8.1.1 Principles
In section 1.6 how a power analysis attack is done was discussed. Figure 8.1 is a pictorial
recap of the attack. Left side of Figure 8.1 shows the real power consumption values which
are the power traces collected through the oscilloscope. Four power traces are shown where
each corresponds to the power consumption pattern during encryption of a plain text sample. N
number of plain text samples are encrypted and therefore N number of such power traces are
there even not shown in Figure 8.1. The power traces are nicely aligned with respect to time.
Therefore, if a certain event like writing of a certain intermediate value during encryption is
considered, it should happen on the same time sample of every power trace as marked in the
Figure 8.1. Right side of Figure 8.1 shows the hypothetical consumption values which are the
ones calculated using the hamming weight model. There, for each combination of key guess
and plain text those values are calculated for a specific intermediate value during the encryption.
During analysis, each vertical column in real power consumption values are compared with each
vertical column in hypothetical power consumption values using Pearson correlation. When the
68
CHAPTER 8. SOFTWARE COUNTERMEASURES 8.1. RANDOM INSTRUCTIONS
column at the correct time moment is compared with the correct key as marked in Figure 8.1, it
Statistical comparison in CPA
would cause the highest correlation. This is how the correct key is identified. Note that for this
to work, the alignment of the power traces is an important requirement.
Key guess 2
Key guess 3
Key guess 1
Correct key
P1 Q1 R1
Power trace 1 Compare
Using
P2 Q2 R2
Power trace 2 statistics
P3 Q3 R3
Power trace 3
Highest
P4 Q4 R4
Power trace 4 Correlation
In random instruction injection, encryption program is written such that additional instruc-
tions are randomly inserted and executed at the middle of the encryption. Figure 8.2 shows
the same four power traces in Figure 8.1, but now with just one random instruction injected.
One random instruction means that sometimes no random instruction is injected but sometimes
one random instruction is injected. The first power trace in Figure 8.2 is same as in Figure 8.1
because in this case no extra instruction has been injected. The marked position in the trace is
the position that corresponds to the access to the specific intermediate value that is going to be
used for the attack. For the second encryption, a random instruction has been injected at the
middle. Therefore now the corresponding point has shifted to the right as marked in Figure 8.2.
Because of this phenomena, the power traces are misaligned as shown in Figure 8.2. Now
when you consider a certain vertical column in the power trace set, the specific time sample not
only has power consumption values for the intended operations but also for random operations.
Therefore the pairwise comparison is affected.
69
Random instruction injection
8.1. RANDOM INSTRUCTIONS CHAPTER 8. SOFTWARE COUNTERMEASURES
Col A Col B
Power traces
are misaligned
Figure 8.2: How power traces are misaligned when a single random instruction is inserted
by (N+1)2 times. The increase in power traces lets enough number of correct power consump-
tion points fall into the column as well as to compensate the effect on SNR ratio due to the
random power consumption points.
The AES implementation in chapter 3 was modified to inject random instruction. AES was
selected for testing as AES was more vulnerable than Speck and because the most vulnerable
is better for testing countermeasures. In AES, Sbox lookup is the point used for the attack. Our
implementation is such that N number of random instructions are only inserted between the
trigger and the first Sbox lookup. As a result the power traces would be shifted to the right from
0 to N number of slots.
The results for the first keybyte that corresponds to the first Sbox are shown in Figure 8.3.
How the correlation coefficient changes with the number of power traces for each key is shown
for different number of random instructions injected. Figure 8.3a is when no random instruc-
tions are inserted. Even before 50 traces, the correct key is significant. When one random
instruction is injected the number of power traces at which the key becomes significant has
increased up to more than 200 according to Figure 8.3b. Then according to Figure 8.3c, the
number of power traces has grown up to more than 500 traces when three random instructions
are injected. As shown in Figure 8.3d, the number of power traces further increases up to more
than 2000 power traces when 7 random instructions are injected.
Table 8.1 summarizes the approximate minimum number of power traces and also the ap-
proximate minimum time required to carry out a successful attack for different number of ran-
dom instructions. When no random instructions are injected, the attack can be done in no time.
But with the number of random instruction insertions, the required time as well as the time
increase in a quadratic fashion. At fifteen random instruction injections it requires more than
40000 power traces where the time required for the attack is more than 45 hours. If the number
of traces increase at this rate when about 100 instructions are injected, the required number of
traces would be massive. Assuming a quadratic relationship it can be predicted that 200 random
instructions would need more than 500,000 power traces which would increase the attack time
to more than 25 days. Therefore when comparing with the circuit countermeasures tried before
in chapter 7, random instruction method appears to be more effective.
An interesting observation could be made in correlation coefficient vs time graphs for the
correct keys. Figure 8.4 shows those graphs for different number of random instructions in-
jected. The x axis in these graphs denote the time with respect to the sample number of the
70
CHAPTER 8. SOFTWARE COUNTERMEASURES 8.1. RANDOM INSTRUCTIONS
0.6
1
Correct Key
0.9
0.5
0.8
0.7 0.4
Correlation
Correlation
0.6
0.5 0.3
0.4
0.3 0.2
0.2
0.1 0.1
0 50 100 150 200 0 200 400 600 800 1000
Number of traces Number of traces
(a) (b)
0.4 0.25
0.35
0.2
0.3
Correlation
Correlation
0.25
0.15
0.2
0.15
0.1
0.1
0.05 0.05
0 500 1000 1500 2000 0 1000 2000 3000 4000 5000
Number of traces Number of traces
(c) (d)
Figure 8.3: The number of power traces required for injection of (a) no random instructions (b)
one random instruction (c) Three random instructions (d) seven random instructions
Table 8.1: Number of power traces and time required for different number of random instruction
injections
power traces. According to Figure 8.4a it can be observed that the correlation peaks for zero
random instruction injections is high. But the rest of the sub figures show that when the number
of random instruction injections increases, the correlations peaks become shorter and wider. It
was discussed earlier with respect to Figure 8.2 that when the number of random instructions
increases, the number of possible slots in time that a correct power consumption point can fall
also increases. All these slots will have a certain number of correct power consumption points
useful for the attack. Therefore all these slots would cause correlation peaks. As the slots are
adjacent we see that the width of the correlation peaks becoming large with the increase in num-
71
8.1. RANDOM INSTRUCTIONS CHAPTER 8. SOFTWARE COUNTERMEASURES
ber of random instructions. The same fact causes the reduction in the height of correlation peaks
as well. If N random instructions are injected there would be N+1 slots and the probability of
a correct power consumption point falling on to a certain slot is 1/(N+1), if the randomness is
uniform. Therefore, when the number of random instructions increases, the ratio between the
correct power consumption points to wrong points for a considered time slot decreases. Wrong
power consumptions points are like noise and hence the SNR ratio also decreases. Because
of that, the correlation coefficients becomes smaller with the number of random instructions.
Hence the correlation peaks appear to become shorter. Here note that the correlation peaks be-
coming short means that the value of the correlation peaks becoming short. Because the y axis
of sub figures in Figure Figure 8.4 are not in same scale this is not directly visible unless the
scale of the y axis is observed.
0.5
1
0.3
0.6
0.2
0.4
0.1
0.2
0 0
−0.2 −0.1
−0.4 −0.2
0 2 4 6 8 10 0 1 2 3 4 5 6 7 8 9 10
Time / Sample number 4 4
x 10 x 10
(a) (b)
0.25 0.2
0.2 0.15
0.15 0.1
0.1 0.05
0.05 0
0 −0.05
−0.05 −0.1
−0.1 −0.15
0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10
4 4
x 10 x 10
(c) (d)
Figure 8.4: Correlation peaks for injection of (a) no random instructions (b) one random in-
struction (c) Three random instructions (d) seven random instructions
Though random instruction injection is a simple but effective method that can be easily
implemented on any microcontroller, its main draw back is the wastage of CPU time. More
and more random instructions increase the security but yet increase the CPU time as well. This
reduces the efficiency of the cryptosystem. Further power is wasted for useless computations.
The AES implementation we used for PIC was around 700 instructions. Therefore adding like
100 extra instructions itself would increase the time and power consumed by 1/7 times. But not
only them, but also considerable amount of instruction would be added for generating random
numbers. This will further increase time and power consumption.
72
CHAPTER 8. SOFTWARE COUNTERMEASURES 8.2. SBOX SHUFFLING
73
8.3. IMPROVEMENTS CHAPTER 8. SOFTWARE COUNTERMEASURES
only for the time range during which Sbox lookups are done and the results are written to the
memory. If Sboxes were not shuffled, only one correlation peak should be visible with respect
to the point at which the result of the Sbox operation is written. But here 16 peaks are visible.
Since all 16 Sbox lookups are shuffled randomly, a given Sbox operation can fall on to one of
16 possible time moments. When large number of power traces are taken and if the randomness
is uniformly distributed, each time slot would have approximately the same number of the spe-
cific Sbox that is currently under attack. These cause approximately equal correlation peaks.
But note that the height of correlation peaks here are lesser than 0.04, where in Figure 6.4 when
no Sbox shuffling was done, the peaks were larger as 0.8. This reduction of height of correlation
peaks is due to the low SNR caused by the shuffling.
0.05
0.045
0.04
0.035
Correlation
0.03
0.025
0.02
0.015
0.01
1 1.5 2 2.5 3 3.5 4 4.5 5
Number of traces 4
x 10
Figure 8.5: Number of power traces required when Sbox operations are randomly shuffled
74
CHAPTER 8. SOFTWARE COUNTERMEASURES 8.3. IMPROVEMENTS
0.04
0.03
0.01
−0.01
−0.02
0 2 4 6 8 10
Time / Sample number 4
x 10
Figure 8.6: Correlation peaks when SBox operations are randomly shuffled
challenge for an attacker to figure out the seed. But embedded cryptographic devices are made
of microcontrollers and these microcontrollers usually do not have clocks that keep track of the
current time (real time clocks). Therefore some other method for generating a random seed is
required. We introduce a technique to improve the security of random instruction injection by
providing the seed for the pseudo random generator using a true hardware random generator.
Natural phenomena such as thermal noise across a resistor, nuclear decay of atoms, signal
produced by base of a reversed biased transistor and photoelectric effect are unpredictable using
theories in physics [34]. Therefore such processes are considered true random. Hardware that
uses such processes to generate random numbers are known as true random generators [33].
Amplifying thermal noise across a resistor or the signal produced by base of a reversed biased
transistor provides an easy way to generate a random signal [34]. We first attempted amplifying
thermal noise across a one mega ohm resistor via an operational amplifier, but the 60 Hz noise
from the power lines made the operational amplifier go to saturation causing waveform similar
to the one shown in Figure 8.7. But amplifying the signal produced by base of a reversed biased
transistor, gave good results and hence it was used to implement the true random generators
discussed in the following subsections.
8.3.1 Setup 1
A circuit for a random number generator based on noise generated by a reverse biased transistor
is found in [35]. We implemented that method on a breadboard to check how good it is as a
hardware random number generator.
Figure 8.8 is the circuit diagram of the circuit that we implemented. The collector of the
transistor Q1 is not connected. The base-emitter junction of Q1 is reversed biased. This causes a
noise signal on the base of the Q1 transistor. Transistor Q2 which is in the active region amplifies
the noise signal.The amplified noise signal observed at position marked as A in Figure 8.8 is
given in Figure 8.9. Then the transistor Q3 in Figure 8.8 further amplifies the signal. The signal
observed at position B in Figure 8.8 is shown in Figure 8.10. The signal is an analog signal
75
8.3. IMPROVEMENTS CHAPTER 8. SOFTWARE COUNTERMEASURES
Figure 8.7: Amplification of thermal noise via a resistor through an op amp caused saturation
+15V
4.7K
4.7K 1M
C
C1 B
Q3 74F14N
A
0.1uf 2222A
Q1 Q2
C828 C828 4.7K
0V
Noise generator and
Second amplifier Digitizer
first amplifier
Figure 8.8: Hardware random number generator that generates random pulses
and it is finally sent through a schmitt trigger to create a digital signal with random pulses. The
signal observed at the position marked as C is as shown in Figure 8.11. The problem here is
that, though pulses appear randomly, value 0 is much more present than value 1. This issue is
later solved using software. For Q1 and Q2, C328A transistors and for Q3, 2222A transistor
was used by us. The selection was due to availability reasons and note that any transistor with
a proper gain would work. 15V have been provided to reverse bias the base emitter junction of
Q1 to a level that produces enough noise. Any voltage value between 12V to 20V is suitable
for the components used here. As the Schmitt trigger, we have used the IC 74F14N but even a
not gate such as 74LS04 would be suitable.
The point C in Figure 8.8 is connected to a digital pin of the microcontroller of the cryp-
tosystem. The digital pin is configured as an input pin. When the program running on the
cryptosystem requests for a random seed, the signal on the digital pin is sampled repeatedly. As
the signal on the pin is a random bit pulse as in Figure 8.11, one sample gives a bit. Therefore, if
an n bit random seed is required, n samples have to be taken. By shifting each collected bit ap-
propriately and then doing an or operation among them, an n bit random number can be formed.
But the wave form in Figure 8.11 is zero biased as it was mentioned earlier. Therefore most of
76
CHAPTER 8. SOFTWARE COUNTERMEASURES 8.3. IMPROVEMENTS
the bits in a generated random number would be zero. This would make the random generator
biased to generate certain values only, which is not suitable. To remove the bias, several random
numbers generated in the above fashion were xored [36].
The above process is pictorially elaborated in Figure 8.12. Here the final outcome is an n
bit random seed. The value on the digital pin of the microcontroller is sampled n number of
times to obtain bit 0, bit 1, ... bit n-1 shown in Figure 8.12. By shifting them as shown in
Figure 8.12 and then doing a bitwise or, the random value 1 is generated. But due to the 0 bias
such generated values would not be uniformly distributed. Therefore the process is repeated
M number of times to get M number of random values. Those are subjected to bitwise xor to
obtain an unbiased n bit random seed as shown in Figure 8.12.
The effectiveness of a hardware random generator must be definitely practically tested. 16
bit unsigned integers were formed in the above described method. Hundred such numbers were
xored together to form a single random number. Large number of such random numbers (20000)
were collected and the frequency distribution (histogram) was drawn to obtain Figure 8.13. The
x axis represents the numerical value of the random numbers and y axis is the frequency count.
Since the range of 16 bit unsigned integers span from 0 to 65535 the axis has that range. The
histogram contains 100 bins in the x axis. It is quite visible that the histogram is biased to
certain values. The reason is that the number of xors done was not enough to remove the zero
bias.
77
8.3. IMPROVEMENTS CHAPTER 8. SOFTWARE COUNTERMEASURES
When the number of values xored was increased 1000, a histogram as shown in Figure 8.14
was obtained. Now an almost uniform distribution is seen which is good enough for a random
number generator. In order to verify, a histogram was drawn by generating 20000 random
numbers using the random number generator in Matlab. That histogram is shown in Figure 8.15.
By comparing the histogram of our random number generator with the histogram for Matlab, it
can be inferred that it is good enough.
Another factor to consider when evaluating the effectiveness of a random number generator
is whether the sequence generated is periodic. A sequence with a uniform distribution of values
is not random if the sequence is periodic. Periodicity of an can be evaluated by taking the
Fourier transform [37]. A periodic signal would have a discrete frequency spectrum opposed to
a continuous one.
The random number sequence we are trying to evaluate is a discrete signal. For such discrete
signals, Fast Fourier Transform (FFT) [37] can be used. Therefore, using Matlab, the frequency
spectrum for the 20000 random numbers collected before was obtained. The obtained frequency
spectrum is shown in Figure 8.16. The x is the frequency in terms of bins. The y axis is the
amplitude of the frequency components. It is clearly visible that the spectrum contains all
the frequency components. Except the DC value (0 frequency) rest have similar amplitudes
as well. The higher amplitude for the DC frequency component is because all the random
generated values are positive values and hence the signal is a DC shifted signal. As all frequency
components are there in the signal, we can infer that the signal is not periodic. In order to verify,
a similar test was done for the random generator in Matlab and the obtained frequency spectrum
is shown in Figure 8.17. Frequency spectrums are almost similar, establishing the fact that our
random number generator is good enough.
8.3.2 Setup 2
The randomness for the generated seed by the setup in subsection 8.3.1 is good enough, but col-
lecting and doing xor operations for 1000 values, where each value requires multiple number
of samples, takes some time (about 1 second). Since generating the seed is not that frequent
1 second delay is not a much issue, but the process can be made much faster by doing mod-
ifications to the random number generator setup. The circuit diagram for the new setup after
modifications is given in Figure 8.18
The noise generator and the first amplifier is exactly same as in the previous setup in Fig-
78
CHAPTER 8. SOFTWARE COUNTERMEASURES 8.3. IMPROVEMENTS
Random value 1
bit 0 <<0
bit 1 <<1 bit
bit 2 <<2 wise
OR
n bit integer
bit n-1 <<n-1
Random value 2
bit 0 <<0
bit 1 <<1 bit
bit 2 <<2 wise
OR
n bit integer bit n bit
n bit integer
bit n-1 <<n-1
n bit integer wise random
seed
XOR
n bit integer
Random value M
n bit integer
bit 0 <<0
bit 1 <<1 bit
bit 2 <<2 wise
OR
bit n-1 <<n-1
Figure 8.12: The process of generating an n bit random seed using digitized random pulses
ure 8.18. The difference is that the second amplifier is now an operational amplifier instead
of a transistor. An operation amplifier has a better gain than a transistor. Then the digitizer
found in the earlier setup is not present here. The amplified noise signal from the first amplifier
is passed through the AC coupling capacitor C1 to remove the DC offset. The signal at place
marked as A in Figure 8.18, observed via the oscilloscope is shown in Figure 8.19. This sig-
nal is fed to the operational amplifier configured as an inverting amplifier with a large gain of
87000 times. The amplified signal by the operation amplifier at place marked as B is as shown
in Figure 8.20. This signal is expected to be an amplified version of the signal in Figure 8.19
but the deformation is due to the bandwidth limitation of the operational amplifier where higher
frequency components have been attenuated. Anyway the signal is a random analog signal. The
output of the setup (place marked b) is connected to an analog input pin of the microcontroller
of the cryptosystem. This is why the negative values of the signals have been clipped to zero in
Figure 8.20. This flattening causes a zero bias in the random signal which has to be unbiased in
software.
When a random seed is requested by the program, the voltage value on the analog input pin
is converted via the analog to digital converter (ADC) of the microcontroller. The inbuilt ADC
of general microcontrollers are limited to about 10 bit resolution. If the seed is required to be a
number with large number of bits, several 10 bit numbers can be taken and then can be shifted
and subjected to or operation to form large values. To remove the zero bias mentioned earlier,
several such generated values can be subjected to xor.
79
8.3. IMPROVEMENTS CHAPTER 8. SOFTWARE COUNTERMEASURES
900
800
700
600
Frequency
500
400
300
200
100
0
0 1 2 3 4 5 6
Random number value 4
x 10
Figure 8.13: Histogram for random numbers generated by xoring 100 values
250
200
Frequency
150
100
50
0
0 1 2 3 4 5 6
Random number value 4
x 10
Figure 8.14: Histogram for random numbers generated by xoring 1000 values
250
200
Frequency
150
100
50
0
0 1 2 3 4 5 6
Random number value 4
x 10
80
CHAPTER 8. SOFTWARE COUNTERMEASURES 8.3. IMPROVEMENTS
7
x 10
5
Amplitude
2
0
0 0.5 1 1.5 2
Frequency 4
x 10
Figure 8.16: Frequency spectrum of the signal formed by the hardware random number gener-
ator
7
x 10
5
3
Amplitude
0
0 0.5 1 1.5 2
Frequency 4
x 10
Figure 8.17: Frequency spectrum of the signal formed by Matlab random number generator
+20V 87 k
4.7k
0.1 µF +15V
1
A
C1 -
UA741 B
Q1 Q2 1M +
C828 C828 -15V
Figure 8.18: Hardware random number generator that generates an analog waveform
81
8.3. IMPROVEMENTS CHAPTER 8. SOFTWARE COUNTERMEASURES
82
CHAPTER 8. SOFTWARE COUNTERMEASURES 8.3. IMPROVEMENTS
Random value 1
k bit integer 0 <<0
k bit integer 1 <<k bit
k bit integer 2 <<2k wise
Random value 2
k bit integer 0 <<0
k bit integer 1 <<k bit
k bit integer 2 <<2k wise
OR
Lower order n bits bit n bit
n bit integer
k bit integer
(n/k)*k <<(n/k)*k
Lower order n bits wise random
seed
XOR
Lower order n bits
Lower order n bits
Random value M
k bit integer 0 <<0
k bit integer 1 <<k bit
k bit integer 2 <<2k wise
OR
k bit integer
(n/k)*k <<(n/k)*k
Figure 8.21: Process of generating a random seed via the analog to digital converter by sampling
the amplified analog noise signal
The zero bias of the waveform in Figure 8.20 is much low when compared with the wave-
form in the earlier setup (Figure 8.11). Therefore now collecting and xoring 10 values which
had to be 1000 in the earlier case was enough. This makes the setup such efficient than before.
The randomness was verified by collecting a large number of random numbers and then
drawing a histogram and a frequency spectrum as it was done for the previous case. Obtained
histogram is given in Figure 8.22 and the frequency spectrum is given in Figure 8.23. Based on
the arguments elaborated in subsection 8.3.1 it can be inferred that the random number generator
is good enough.
83
8.3. IMPROVEMENTS CHAPTER 8. SOFTWARE COUNTERMEASURES
250
Frequency 200
150
100
50
0
0 1 2 3 4 5 6
Random number value 4
x 10
7
x 10
5
4.5
3.5
3
Amplitude
2.5
1.5
0.5
0
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2
Frequency 4
x 10
84
Chapter 9
Conclusion
Power analysis attacks are a type of side channel attacks that has become a popular method of
attacking an encryption algorithm due to the fact that time taken is significantly less than what
is needed for other types of attacks. Power analysis involves measuring the power consumption
of a cryptographic device during its operation and then analysing it using statistical techniques
such as Correlation Power Analysis in order to derive the key.
85
9.3. PHASE 3 : COUNTERMEASURES CHAPTER 9. CONCLUSION
As the Speck algorithm had lot of differences with AES, the power analysis method for AES
could not be directly used here and hence new approached had to be found. As currently no
one had attempted a power analysis attack on Speck those approaches had to be newly formed.
Though the first attempts of the attacks on Speck using power analysis failed, with the help of
observations found by attacking AES algorithm, the mechanisms and methods were identified
to realize the attack. Finally using the mechanisms and methods we identified, on our testbed
Speck could be broken in less than 1 hour.
With those findings we conclude that though Speck is a very recent algorithm introduced
specially for microcontroller based embedded systems where embedded systems are anyway
the target of power analysis attacks, still it is vulnerable to power analysis attacks. Therefore
just using Speck in embedded systems without any countermeasures against power analysis
thinking that a recent algorithm would be less vulnerable to such attacks is dangerous.
86
CHAPTER 9. CONCLUSION 9.3. PHASE 3 : COUNTERMEASURES
regarding the randomness. Usually a pseudo random algorithm would be used for generating
the randomness but in microcontrollers usually there is nothing like a real time clock to generate
a suitable seed for the algorithm. Hence we implemented a method that uses an amplified noise
signal formed across a reverse biased transistor to generate a true random seed. Two setups have
been introduced by us, where one set up requires a digital pin of the microcontroller to sample
a digitized noise signal, while the other method requires an analogue pin to sample an analogue
noise signal. Each has its own advantages and disadvantages with respect to the components
required and the time needed to generate a seed. The randomness of the seeds generated by both
methods were verified by collecting several thousands of samples and then plotting frequency
distributions and Fourier spectrum.
87
9.3. PHASE 3 : COUNTERMEASURES CHAPTER 9. CONCLUSION
88
Bibliography
[3] S. Mangard et al., Power Analysis Attacks: Revealing the Secrets of Smart Cards (Ad-
vances in Information Security). Secaucus, NJ, USA: Springer-Verlag New York, Inc.,
2007.
[5] Wikipedia. (2014, Apr.) 3 rounds of speck block cipher with key schedule. [Online].
Available: http://en.wikipedia.org/wiki/Speck (cipher)
[7] B. A. Forouzan, Cryptography and Network Security, 1st ed. New York, NY, USA:
McGraw-Hill, Inc., 2008.
[8] E. Prouff, Ed., Constructive Side-Channel Analysis and Secure Design: 5th International
Workshop, COSADE 2014, Paris, France, April 13-15, 2014. Revised Selected ... Com-
puter Science / Security and Cryptology), 2014th ed. Springer, 8 2014.
[9] P. Kocher, J. Jaffe, and B. Jun, “Differential power analysis,” in Advances in Cryptology-
CRYPTO99. Springer, 1999, pp. 388–397.
[10] E. Brier et al., “Correlation power analysis with a leakage model,” in Cryptographic Hard-
ware and Embedded Systems-CHES 2004. Springer, 2004, pp. 16–29.
[11] F. Tepanek et al., “Differential power analysis under constrained budget: Low cost educa-
tion of hackers,” in 2013 Euromicro Conference on Digital System Design (DSD). IEEE,
2013, pp. 645–648.
[12] Y. Hori et al., “Sasebo-giii: A hardware security evaluation board equipped with a 28-nm
fpga,” in 2012 IEEE 1st Global Conference on Consumer Electronics (GCCE), Oct 2012,
pp. 657–660.
[13] Z. Martinasek et al., “General scheme of differential power analysis,” in 2013 36th Inter-
national Conference on Telecommunications and Signal Processing (TSP). IEEE, 2013,
pp. 358–362.
89
BIBLIOGRAPHY BIBLIOGRAPHY
[14] M. Petrvalsky et al., “Differential power analysis of advanced encryption standard on ac-
celerated 8051 processor,” in Radioelektronika (RADIOELEKTRONIKA), 2013 23rd In-
ternational Conference. IEEE, 2013, pp. 334–339.
[15] ——, “Differential power analysis attack on arm based aes implementation without ex-
plicit synchronization,” in Radioelektronika (RADIOELEKTRONIKA), 2014 24th Interna-
tional Conference. IEEE, 2014, pp. 1–4.
[16] H. Gamaarachchi et al., “Accelerating correlation power analysis using graphics process-
ing units (gpus),” in 2014 7th International Conference on Information and Automation
for Sustainability (ICIAfS). IEEE, 2014, pp. 1–6.
[17] E. Sprunk, “Clock frequency modulation for secure microprocessors,” Apr. 4 1995, uS
Patent 5,404,402.
[18] J. Waddle and D. Wagner, “Fault attacks on dual-rail encoded systems,” in Computer
Security Applications Conference, 21st Annual. IEEE, 2005, pp. 10–pp.
[19] J. A. Ambrose, R. G. Ragel, and S. Parameswaran, “Rijid: random code injection to mask
power analysis based side channel attacks,” in Proceedings of the 44th annual Design
Automation Conference. ACM, 2007, pp. 489–492.
[20] L. Goubin and J. Patarin, “Des and differential power analysis the duplication method,” in
Cryptographic Hardware and Embedded Systems. Springer, 1999, pp. 158–172.
[23] F. Finfe. (2014, Aug.) Advanced encryption standard (aes) on embedded system. [Online].
Available: http://www.on4jx.net/microcontroller/AESonEmbedded.php
[24] OnlineDomainTools. (2015, Apr.) Aes - symmetric ciphers online. [Online]. Available:
http://aes.online-domain-tools.com
[25] C. Pitu et al., “A practical approach to power trace measurement for differential power
analysis based attacks.”
[27] R. Beaulieu et al., “The simon and speck families of lightweight block ciphers,” 2013.
[28] vias. (2010, Dec.) What is a filter. [Online]. Available: http://www.vias.org/feee/filters 02.
html
90
BIBLIOGRAPHY BIBLIOGRAPHY
[30] FTDI. (2015, Sep.) Ft232r - usb uart ic. [Online]. Available: http://www.ftdichip.com/
Products/ICs/FT232R.htm
[32] ——. (2015, Sep.) Arduino pro mini. [Online]. Available: https://www.arduino.cc/en/
uploads/.../Arduino-Pro-Mini-schematic.pdf
[35] A. Logue. (2002, May) Hardware random number generator. [Online]. Available:
http://www.cryogenius.com/hardware/rng/
[37] J. G. Proakis and D. K. Manolakis, Digital Signal Processing (4th Edition). Upper Saddle
River, NJ, USA: Prentice-Hall, Inc., 2006.
91