We introduce a deep learning ensemble (NNBits) as a tool for bit-profiling and evaluation of cryptographic (pseudo) random bit sequences.
(This work has been submitted and is currently under review)
```bash
# clone repository
git clone https://github.com/Crypto-TII/nnbits
# change working directory
cd nnbits/
# install requirements
pip install -r requirements.txt
# create dataset directory
mkdir 'speck_32_64'
```
```python
#### Create the dataset ######
import numpy as np
from avalanche_data_generator.speck_32_64 import speck_k64_p32_o32_r22 as data_generator

number_of_samples = 300_000
dataset = data_generator.generate_avalanche_dataset(int(number_of_samples))
np.save("speck_32_64/round6_sequences300k.npy", dataset[6])
```
Run the ensemble analysis:

```bash
python -m nnbits.run --savepath 'demo_speck32_round7'
```

The most likely problem to occur is that you need to adapt the GPU and CPU settings in the configuration file `demo_speck32_round7/config.cfg` as explained in How to set GPU parameters.
Afterwards, run the demo analysis script:

```bash
python demo_speck32_round7/demo_analysis.py
```

You should find an image like the following one in the `demo_speck32_round7` folder as `result.png`:
Two demo notebooks are included in this repository. Please clone the repository and install the requirements by running:
```bash
git clone https://github.com/Crypto-TII/nnbits
cd nnbits
pip install -r requirements.txt
```

In conda you can install Jupyter Lab via `conda install -c conda-forge jupyterlab` and launch it via `jupyter lab`.
The most likely problem to occur during the demo execution is that you need to adapt the GPU and CPU settings in the configuration file demo_speck32_round7/config.cfg as explained in How to set GPU parameters.
The output gives the following information:
```
====================================
speck_32_64/round0_sequences300k.npy
||         time          | NN finished | pred. bits || best bit | acc (%) | n pred | p value ||
===================================================================================================================
|| 2022-05-19_12h33m59s  |    0/100    |   0/1024   ||   nan    |   nan   |  nan   |   nan   ||
|| 2022-05-19_12h34m41s  |    1/100    |   63/1024  ||   143    | 100.000 |   1    |    0    ||
|| 2022-05-19_12h34m41s  |    3/100    |  122/1024  ||   237    | 100.000 |   1    |    0    ||
...
|| 2022-05-19_12h34m42s  |   16/100    |  762/1024  ||   511    | 100.000 |   1    |    0    ||
p-value is below limit ==> stop analysis.
```
At the top, the output shows the `*.npy` file which is being analyzed by the ensemble.
The tabular output gives the following information in real-time during the training of the ensemble:
- The `time` column gives a timestamp for the row; the rest of the row indicates the ensemble training status.
- `NN finished` is the number of neural networks which have already finished their training.
- `pred. bits` indicates how many bits of the total unit length have already appeared at the output of the finished networks. For example, the avalanche unit of Speck 32 has a length of `1024` bits, and in the last timestep shown above `762/1024` of those bits had been predicted by one of the neural networks.
- `best bit` is the bit which can be predicted with the highest accuracy.
- `acc` is the mean test accuracy of the `best bit`.
- `n pred` is how many neural networks have already predicted the `best bit`.
- `p value` is the p-value for the observation of `acc` (see the illustrative sketch below).
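Purely as an illustration of what such a p-value expresses, a one-sided binomial test of an observed bit accuracy against random guessing (50 %) could look like the following sketch; the sample sizes and the use of `scipy.stats.binomtest` are assumptions, not NNBits code:

```python
# Illustrative only: a one-sided binomial test of an observed bit accuracy
# against the null hypothesis of random guessing (p = 0.5).
# This is not necessarily the exact test implemented in NNBits.
from scipy.stats import binomtest

n_test_samples = 30_000          # hypothetical number of test sequences
observed_accuracy = 0.52         # hypothetical mean accuracy of the best bit
n_correct = int(observed_accuracy * n_test_samples)

result = binomtest(n_correct, n_test_samples, p=0.5, alternative='greater')
print(result.pvalue)             # a small p-value => accuracy unlikely under randomness
```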
If you execute the code on a new machine, on a new dataset, or with a new model, the parameters most likely to change are the ones which define how many actors work in parallel on each GPU:
```
# hardware settings <------------ adjust according to your GPU hardware (check with nvidia-smi)
N_GPUS = 1              # how many GPUs do you have available?
N_ACTORS_PER_GPU = 4    # divide the GPU memory by ~3800 MiB for training a generalized Gohr's network on the avalanche dataset of Speck32/64
GPU_PER_ACTOR = 0.25    # <= 1/N_ACTORS_PER_GPU
CPU_PER_ACTOR = 5       # depends on your CPU cores, << N_CPU_CORES / N_ACTORS
```

You can find useful information about GPU usage by running `watch -n 0.5 nvidia-smi` while running the code.
The snapshot below shows that the memory of GPU 0 is almost full (39354MiB / 40536MiB). This means `N_ACTORS_PER_GPU` has to be reduced. The GPU fraction used by each actor (`GPU_PER_ACTOR`) has to be modified accordingly.
```
Sun May 22 10:41:14 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:01:00.0 Off |                    0 |
| N/A   38C    P0    53W / 275W |  39354MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
```
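As a rough rule of thumb based on the ~3800 MiB per actor mentioned in the configuration comment above, you can estimate an upper bound for `N_ACTORS_PER_GPU` from the available GPU memory. The helper below is purely illustrative and not part of NNBits:

```python
# Illustrative helper (not part of NNBits): estimate actor settings from GPU memory.
# Assumes ~3800 MiB per actor, as noted for the generalized Gohr network on Speck32/64.
def suggest_gpu_settings(gpu_memory_mib=40536, mib_per_actor=3800, n_gpus=1):
    n_actors_per_gpu = max(1, gpu_memory_mib // mib_per_actor)
    return {
        'N_GPUS': n_gpus,
        'N_ACTORS_PER_GPU': n_actors_per_gpu,
        'GPU_PER_ACTOR': round(1 / n_actors_per_gpu, 2),
    }

print(suggest_gpu_settings())  # e.g. {'N_GPUS': 1, 'N_ACTORS_PER_GPU': 10, 'GPU_PER_ACTOR': 0.1}
```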
The data path contains a single `*.npy` file with `X` sequences, each of length 1024 bits for SPECK 32/64, for example:
```python
>>> filename = '/home/anna/NBEATSD4/data_5rounds_1000000_samples.npy'
>>> data = np.load(filename)
>>> print(data.shape)
(1000000, 1024)   # 1'000'000 rows with n_bits=1024 in each row
>>> print(data[0])
array([0, 0, 0, ..., 1, 0, 1], dtype=uint8)
```

Often machine learning data is saved in the format of `X.npy`, `Y.npy`, `X_val.npy`, `Y_val.npy`. The following routine produces a dataset of the expected format for NNBits:
```python
import numpy as np

# load training and validation data
X = np.load('X.npy')
Y = np.load('Y.npy')
X_val = np.load('X_val.npy')
Y_val = np.load('Y_val.npy')
# combine the data: concatenate Y as a column to X
train = np.c_[X, Y]
val = np.c_[X_val, Y_val]
# combine the data: concatenate rows
final = np.r_[train, val]
# save the final dataset
np.save('nnbits_dataset.npy', final)
```

- Add your TensorFlow model `my_model.py` to the folder `models/`.
- Add your TensorFlow model to the initialization file `models/__init__.py` by adding a line `from .my_model import create_model_routine as my_model_id`.
- Call NNBits and set the configuration parameter `'NEURAL_NETWORK_MODEL': 'my_model_id'` (see the sketch below).
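For orientation, a minimal `models/my_model.py` could look like the sketch below. The function name `create_model_routine` follows the import line above, but the exact signature and compile settings NNBits expects are assumptions here; please check the existing files in `models/` before adapting it:

```python
# models/my_model.py -- illustrative sketch only; check the existing files in
# `models/` for the exact signature NNBits expects from a model-creation routine.
import tensorflow as tf

def create_model_routine(input_length=1024, output_length=64):
    """Build a small fully connected classifier over the input bits (assumed signature)."""
    inputs = tf.keras.Input(shape=(input_length,))
    x = tf.keras.layers.Dense(256, activation='relu')(inputs)
    x = tf.keras.layers.Dense(256, activation='relu')(x)
    outputs = tf.keras.layers.Dense(output_length, activation='sigmoid')(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model
```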
An ensemble of deep neural networks is trained and tested on a *.npy file which contains sequences of potential random data.
- Each ensemble member is a neural network with a unique bit selection: the selection defines some bits of the sequence as inputs and the remaining bits as outputs of the neural network. The output bits are hidden (set to zero) at the input of the neural network, and the network is trained to predict them. The number of selections, and therefore the number of ensemble members, is defined in the `*.cfg` configuration file (see the sketch after this list).
- Each ensemble member is trained on the training data as defined in the `*.cfg` file.
- Each ensemble member is tested on the test data as defined in the `*.cfg` file.
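As an illustration of the bit-selection idea (the actual logic lives in `nnbits/selections.py` and is steered by the configuration file), the sketch below shows one way to split toy sequences into network inputs and prediction targets; the names and sizes are illustrative:

```python
# Illustrative sketch of a bit selection -- the real logic is in nnbits/selections.py.
import numpy as np

n_bits = 1024                                   # length of one avalanche unit for Speck 32/64
rng = np.random.default_rng(0)

# one possible selection: 64 bit positions become prediction targets,
# the remaining positions stay visible at the network input
output_positions = rng.choice(n_bits, size=64, replace=False)
input_positions = np.setdiff1d(np.arange(n_bits), output_positions)

sequences = rng.integers(0, 2, size=(1000, n_bits), dtype=np.uint8)  # toy bit sequences
x = sequences.copy()
x[:, output_positions] = 0          # hide the bits to be predicted from the network input
y = sequences[:, output_positions]  # prediction targets

print(x.shape, y.shape)             # (1000, 1024) (1000, 64)
```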
This repository contains the following files:
```
nnbits
|_ README.md                <- the file which generates the current view
|_ demo.ipynb               <- demo notebook
|_ nnbits
   |_ run.py                <- run the ensemble distinguisher (see `demo.ipynb`)
   |_ selections.py         <- generates bit selections, see [Methodology](#methodology)
   |_ metric.py             <- defines a bit-by-bit accuracy as metric
   |_ network.py            <- handles routines for the deep-learning models in folder `models`
   |_ models                <- contains the following deep learning models
      |_ gohr_generalized.py   <- a generalized version of Gohr's network
      |_ resnet50.py           <- ResNet50 implementation
      |_ vgg16.py              <- VGG-16 implementation
      |_ nbeats.py             <- N-BEATS network
   |_ trainingtracker.py    <- keeps track of the ensemble training progress
   |_ filemanager.py        <- keeps track of filenames
```
Running these commands will create a folder located at `save_path` with the following structure:
```
save_path
|_ cfg   <- *.cfg ensemble configuration file
|_ h5    <- *.h5 neural network model files which contain the weights of each neural network
|_ hist  <- *.pkl files which contain the training history of each ensemble member
|_ pred  <- *.npy files with the predictions of each ensemble member (generated by running test_ensemble.py)
```
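After a run has finished, these results can be inspected with standard NumPy and pickle tools. The file names in the sketch below are placeholders; the actual names are handled by `nnbits/filemanager.py`:

```python
# Illustrative only -- the actual file names are managed by nnbits/filemanager.py.
import pickle
import numpy as np

save_path = 'demo_speck32_round7'

# training history of one ensemble member (placeholder file name)
with open(f'{save_path}/hist/member_0.pkl', 'rb') as f:
    history = pickle.load(f)

# predictions of one ensemble member (placeholder file name)
predictions = np.load(f'{save_path}/pred/member_0.npy')
print(predictions.shape)
```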
If you use this code in your work, please cite the following paper:

```
@inproceedings{
}
```