LiPar is a lightweight parallel learning model for practical in-vehicle network intrusion detection. It achieves strong detection performance, high running efficiency, and a small model size, so it adapts well to the in-vehicle environment in practice and protects the security of the in-vehicle CAN bus.
You can find the details in our paper LiPar: A Lightweight Parallel Learning Model for Practical In-Vehicle Network Intrusion Detection (arXiv:2311.08000).
The dataset we used is the Car-Hacking Dataset (you can find the details of this dataset via the link). We also uploaded the data files in `./OriginalDataset/`. Due to the upload file size limit, we compressed each data file into a `.rar` file. You can recover the original data by unzipping these files.
The code for data processing is uploaded in `./DataProcessing/`. Here are the steps of our data processing:
- Data preprocessing and data cleaning: `data_process.py` is used to process the attack datasets, including `DoS_Attack_dataset`, `Fuzzy_Attack_dataset`, `Spoofing_the_RPM_gauge_dataset`, and `Spoofing_the_drive_gear_dataset`. `data_process_normal.py` is only used to process the `normal` dataset. You can change `file_path` and the new file name in the `df.to_csv` call to preprocess the datasets one by one; this generates five preprocessed data files in total. The generated datasets can be found in the compressed file `./DataProcessing/PreprocessedData.rar`. A hedged sketch of this step follows the list.
- Image data generating: `img_generator_seq.py` is used to process the one-dimensional data sequentially into RGB image data. The attack datasets contain both attack messages and normal messages, so if an image is composed entirely of normal messages, we label it as a normal image; otherwise, we label it as an attack image. Each attack dataset therefore generates two sets of images. You can set different directory addresses in `image_path_attack` and `image_path_normal` to store the generated normal and attack images. Within each newly generated set, the images are named from 1 to n (where n is the total number of images in the set) in sequence. When you finish processing one data file, change the filename and path in `file_path` to process the next; the files we used here are the preprocessed data files obtained in the previous step. The `normal` dataset, naturally, generates only one set of normal images. In the end, you will obtain 9 sets of images in different directories. See the image-generation sketch after this list.
- Dataset partitioning: The directory we used to store all images is `./data_sequential_img/train/`. We then divide all the image data into a training set, a validation set, and a test set. `split_trainset.py` is used to move 30% of the total image data into the validation and test sets, under the directory `./data_sequential_img/val/`. Furthermore, `split_testset.py` is used to move $\frac{1}{3}$ of the images in the `val` set into the test set `./data_sequential_img/test/`. The final ratio of images in the training, validation, and test sets is 7:2:1. You can also change the paths and directory names to anything you want by modifying `Train_Dir`, `Val_Dir`, and `Test_Dir` in the code; a sketch of this split logic also follows the list.
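To make the first step concrete, here is a minimal sketch of the kind of cleaning `data_process.py` performs. The column layout, the hex-to-integer conversion, and the output path are illustrative assumptions, not the exact logic of the script.

```python
import pandas as pd

# Assumed column layout for the raw Car-Hacking CSV files;
# adjust the names to match your copy of the dataset.
columns = ["Timestamp", "CAN_ID", "DLC"] + [f"DATA{i}" for i in range(8)] + ["Flag"]

file_path = "./OriginalDataset/DoS_Attack_dataset.csv"  # change per dataset
df = pd.read_csv(file_path, header=None, names=columns)

# Example cleaning steps (assumed): drop incomplete rows and
# convert the hex-encoded ID and data bytes to integers.
df = df.dropna()
df["CAN_ID"] = df["CAN_ID"].apply(lambda x: int(x, 16))
for i in range(8):
    df[f"DATA{i}"] = df[f"DATA{i}"].apply(lambda x: int(x, 16))

# Write the preprocessed file; change the name for each dataset.
df.to_csv("./PreprocessedData/DoS_preprocessed.csv", index=False)
```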
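For the image-generation step, the sketch below packs sequential message bytes into RGB images and applies the labeling rule described above. The 9-bytes-per-message layout, the 24x24 image size, and the one-byte truncation of the CAN ID are assumptions kept only to make the sketch short; see `img_generator_seq.py` for the exact scheme.

```python
import os
import numpy as np
import pandas as pd
from PIL import Image

file_path = "./PreprocessedData/DoS_preprocessed.csv"
image_path_attack = "./data_sequential_img/train/DoS_attack/"
image_path_normal = "./data_sequential_img/train/DoS_normal/"
os.makedirs(image_path_attack, exist_ok=True)
os.makedirs(image_path_normal, exist_ok=True)

df = pd.read_csv(file_path)
cols = ["CAN_ID"] + [f"DATA{i}" for i in range(8)]  # 9 byte values per message (assumed)
side = 24                                           # assumed image width/height
msgs_per_img = (side * side * 3) // len(cols)       # messages needed to fill one RGB image
counters = {"attack": 0, "normal": 0}

for start in range(0, len(df) - msgs_per_img + 1, msgs_per_img):
    chunk = df.iloc[start:start + msgs_per_img]
    # Truncate values to one byte (a simplification; 11-bit IDs exceed 255).
    flat = (chunk[cols].to_numpy() % 256).astype(np.uint8).flatten()
    img = Image.fromarray(flat.reshape(side, side, 3), mode="RGB")
    # The dataset marks injected messages with 'T' and normal ones with 'R':
    # an image is "normal" only if every message in it is normal.
    label = "normal" if (chunk["Flag"] == "R").all() else "attack"
    counters[label] += 1
    out_dir = image_path_normal if label == "normal" else image_path_attack
    img.save(f"{out_dir}{counters[label]}.png")  # images named 1..n in sequence
```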
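And for the partitioning step, here is a sketch of the 7:2:1 split logic under the directory layout above: move 30% of each class to `val`, then a third of `val` to `test`. The fixed random seed is an assumption for reproducibility; `split_trainset.py` and `split_testset.py` may differ in detail.

```python
import os
import random
import shutil

Train_Dir = "./data_sequential_img/train/"
Val_Dir = "./data_sequential_img/val/"
Test_Dir = "./data_sequential_img/test/"

random.seed(0)  # assumed: fix the seed so the split is reproducible

for cls in os.listdir(Train_Dir):
    images = os.listdir(os.path.join(Train_Dir, cls))
    # Move 30% of each class into the validation directory...
    val_imgs = random.sample(images, int(len(images) * 0.3))
    os.makedirs(os.path.join(Val_Dir, cls), exist_ok=True)
    for name in val_imgs:
        shutil.move(os.path.join(Train_Dir, cls, name),
                    os.path.join(Val_Dir, cls, name))
    # ...then move one third of those into the test directory,
    # giving a 7:2:1 train/val/test ratio overall.
    test_imgs = random.sample(val_imgs, len(val_imgs) // 3)
    os.makedirs(os.path.join(Test_Dir, cls), exist_ok=True)
    for name in test_imgs:
        shutil.move(os.path.join(Val_Dir, cls, name),
                    os.path.join(Test_Dir, cls, name))
```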
For learning-based models, the model must first be trained to obtain its optimal weight and parameter values. The obtained weights and parameters are then loaded into the model for testing, so as to verify its final detection performance.
Therefore, we provide two programs for each model, one for the training process and one for the testing process. Files like `train.py`, `train_LSTM.py`, or `train_CANet.py` are for model training, and files like `predict.py`, `predict_LSTM.py`, or `predict_CANet.py` are for model testing.
Take STParNet as an example: all of the files for model construction, training, and testing are stored under the same directory `./STParNet/`. Before training, you should import the model you want to train with code like `from par_DW_LSTM4 import ParDWLSTM`. Here, `par_DW_LSTM4` is the name of the Python file that constructs the model, and `ParDWLSTM` is the main class of the model. You should also set the paths of the input images: `image_path` refers to the directory where all the images are located, `root` in `train_dataset` refers to the directory of the training set, and `root` in `validate_dataset` refers to the directory of the validation set. Besides, `save_path` refers to the path for saving the file of optimal weights and parameters, which is in `.pth` format.
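The outline below shows how these pieces fit together in a typical PyTorch training script. Apart from `par_DW_LSTM4`, `ParDWLSTM`, `image_path`, `train_dataset`, `validate_dataset`, and `save_path`, which are named above, the transform, hyperparameters, and constructor arguments are assumptions; `train.py` may differ in detail.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from par_DW_LSTM4 import ParDWLSTM  # model file and main class from this repo

image_path = "./data_sequential_img/"
save_path = "./ParDWLSTM.pth"       # where the optimal weights are saved

transform = transforms.ToTensor()   # assumed; train.py may also resize/normalize
train_dataset = datasets.ImageFolder(root=image_path + "train/", transform=transform)
validate_dataset = datasets.ImageFolder(root=image_path + "val/", transform=transform)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(validate_dataset, batch_size=32)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ParDWLSTM(num_classes=len(train_dataset.classes)).to(device)  # ctor args assumed
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

best_acc = 0.0
for epoch in range(30):  # epoch count assumed
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images.to(device)), labels.to(device))
        loss.backward()
        optimizer.step()
    # Keep the weights that do best on the validation set.
    model.eval()
    correct = 0
    with torch.no_grad():
        for images, labels in val_loader:
            preds = model(images.to(device)).argmax(dim=1)
            correct += (preds == labels.to(device)).sum().item()
    acc = correct / len(validate_dataset)
    if acc > best_acc:
        best_acc = acc
        torch.save(model.state_dict(), save_path)
```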
Similarly, when testing a model, you may need to change the data path used for testing at `image_path` and the path for loading the trained model parameter file at `model_weight_path` in `predict.py`.
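As a minimal sketch of what a `predict` script does with `model_weight_path`: load the saved weights and classify an image. The sample image path, constructor arguments, and transform are assumptions for illustration.

```python
import torch
from torchvision import transforms
from PIL import Image
from par_DW_LSTM4 import ParDWLSTM

image_path = "./data_sequential_img/test/DoS_attack/1.png"  # sample test image (path assumed)
model_weight_path = "./ParDWLSTM.pth"

model = ParDWLSTM(num_classes=2)  # ctor args assumed; must match training
model.load_state_dict(torch.load(model_weight_path, map_location="cpu"))
model.eval()

img = transforms.ToTensor()(Image.open(image_path)).unsqueeze(0)  # add batch dim
with torch.no_grad():
    pred = model(img).argmax(dim=1).item()
print(f"predicted class index: {pred}")
```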
The code for model size testing is uploaded in `./MemoryTest/`. Running each program reports the memory consumption of each part of the corresponding model.
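For reference, one common way to estimate a PyTorch model's in-memory size is to sum the byte sizes of its parameters and buffers, as sketched below; the scripts in `./MemoryTest/` may measure memory differently or break it down per module.

```python
import torch
from par_DW_LSTM4 import ParDWLSTM  # any model class can be measured the same way

model = ParDWLSTM(num_classes=2)  # ctor args assumed

# Size in bytes = element count times element width, for parameters and buffers.
param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
buffer_bytes = sum(b.numel() * b.element_size() for b in model.buffers())
total_mb = (param_bytes + buffer_bytes) / 1024 ** 2
print(f"model size: {total_mb:.3f} MB")
```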
The code of the resource adaptation algorithm is uploaded in `./ResouceAdaptationModel/`. The specific parameters in the code need to be set according to your actual requirements.