A framework for predicting driver maneuver behaviors.
This repository presents a PyTorch implementation of the Driver Intent Prediction model from our paper "KaAI-DD: Holistic Driving Dataset for Predicting Driver Gaze and Intention".
Our framework builds on the foundational work of "Driver Intention Anticipation Based on In-Cabin and Driving Scene Monitoring", with modifications and enhancements to benchmark performance using the KaAI dataset, which we developed. Below, we explain how to test our dataset using this framework.
Architecture of our driver intent prediction model
| Model | Dataset | Accuracy (%) |
|---|---|---|
| Baseline | Brain4Cars (In-Cabin) | 77.40 |
| Baseline | Brain4Cars (In-Cabin + Out-Cabin) | 83.98 |
| Baseline | Our Dataset (In-Cabin) | 83.65 |
| Baseline | Our Dataset (In-Cabin + Out-Cabin) | 85.85 |
| Enhanced | Our Dataset (In-Cabin + Out-Cabin + Gaze) | 86.85 |
We have prepared the KaAI 5s Dataset, which can be downloaded from this link. The dataset includes the following components:
- `road_camera`: Videos recorded by a camera facing the road.
- `face_camera`: Videos recorded by a camera facing the driver.
- `Gaze & CAN`: Data containing the driver's gaze and vehicle CAN signals.
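A quick way to confirm that the downloaded dataset contains these components is sketched below. This is only a convenience check, not part of the framework; the dataset root path is a placeholder and the folder names simply follow the list above, so adjust them if your download is organized differently.

```python
# Optional sanity check that the downloaded KaAI 5s Dataset has the expected
# components. "path/to/kaai_5s" is a placeholder; folder names follow the README
# and may differ slightly in the actual release.
from pathlib import Path

dataset_root = Path("path/to/kaai_5s")
for component in ("road_camera", "face_camera", "Gaze&CAN"):
    folder = dataset_root / component
    status = "ok" if folder.is_dir() else "MISSING"
    print(f"{component}: {status}")
```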
To use our dataset with this framework:
- Place the `road_camera`, `face_camera`, and `Gaze&CAN` folders inside the `annotation_kaai` directory.
- Split the dataset using 5-fold cross-validation by running the `n_fold_Brain4cars.py` script in the `datasets/annotation_kaai` directory; a sketch of such a split is shown after this list.
  - You can also use the pre-generated `.csv` files available in the `datasets/annotation_kaai` directory to skip this step.
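If you prefer to regenerate the fold files yourself, the sketch below shows one way a stratified 5-fold split could be produced with scikit-learn. It is only an illustration of the idea, not the actual `n_fold_Brain4cars.py` script: the directory layout, maneuver class names, and CSV columns are assumptions.

```python
# Minimal sketch of a 5-fold split (NOT the actual n_fold_Brain4cars.py).
# Assumes videos are grouped in per-maneuver subfolders; the folder layout,
# class names, and CSV columns here are hypothetical.
import csv
from pathlib import Path
from sklearn.model_selection import StratifiedKFold

samples, labels = [], []
for label_dir in Path("datasets/annotation_kaai/face_camera").iterdir():
    if label_dir.is_dir():            # e.g. lturn/, rturn/, lchange/, rchange/, end_action/
        for video in label_dir.iterdir():
            samples.append(str(video))
            labels.append(label_dir.name)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(samples, labels)):
    for split_name, idx in (("train", train_idx), ("test", test_idx)):
        out = Path(f"datasets/annotation_kaai/fold{fold}_{split_name}.csv")
        with out.open("w", newline="") as f:
            writer = csv.writer(f)
            for i in idx:
                writer.writerow([samples[i], labels[i]])
```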
The 3D-ResNet50 network, along with its pretrained model, is adapted from the work in 3D ResNets. We express our gratitude to the original authors, and have modified the implementation to better suit our specific dataset and task.
Before running the `run-3DResnet.sh` script, set the following paths:
- `root_path`: Path to this project.
- `annotation_path`: Path to the annotation directory in this project.
- `video_path`: Path to the image frames of driver videos.
- `pretrain_path`: Path to the pretrained 3D ResNet50 model.
Important Notes:
- `n_fold`: The fold number, ranging from 0 to 4.
- `sample_duration`: Length of input videos (16 frames).
- `end_second`: The time before the maneuver from which frames are input (ranging from 1 to 5 seconds).
For more details on other arguments, refer to `opt.py`.
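For orientation, the sketch below shows how options like these are typically declared with `argparse`. It is an assumed illustration of the general shape of `opt.py`, not its actual contents; the argument names follow the lists above, while the defaults and help strings are guesses.

```python
# Hypothetical sketch of how opt.py-style options could be declared.
# Argument names mirror the README; defaults and help text are assumptions.
import argparse

def parse_opts():
    parser = argparse.ArgumentParser()
    parser.add_argument("--root_path", type=str, help="Path to this project")
    parser.add_argument("--annotation_path", type=str, help="Path to the annotation directory")
    parser.add_argument("--video_path", type=str, help="Path to the image frames of driver videos")
    parser.add_argument("--pretrain_path", type=str, help="Path to the pretrained 3D ResNet50 model")
    parser.add_argument("--n_fold", type=int, default=0, choices=range(5), help="Fold number (0-4)")
    parser.add_argument("--sample_duration", type=int, default=16, help="Frames per input clip")
    parser.add_argument("--end_second", type=int, default=5, help="Seconds before the maneuver used as input (1-5)")
    return parser.parse_args()

if __name__ == "__main__":
    print(parse_opts())
```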
The model trained using our script is available here. The model name is `save_best_3DResNet50.pth`.
We utilized FlowNet 2.0 to extract the optical flow of all outside images, which we then used in our ConvLSTM network. The optical flow images can also be found here.
Our ConvLSTM network is adapted and extended from the work in this repo. We express our appreciation to the creators of these foundational projects.
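For readers unfamiliar with ConvLSTM, the sketch below shows a minimal PyTorch ConvLSTM cell, in which convolutions replace the fully connected gates of a standard LSTM so the hidden and cell states keep their spatial layout. It is a generic illustration, not the cell used in this repository; the kernel size, hidden dimension, and input resolution are assumptions.

```python
# Generic ConvLSTM cell sketch (NOT the exact cell used in this repo).
# Convolutional gates replace the matrix multiplications of a regular LSTM,
# so hidden state and cell state keep their spatial layout (B, C, H, W).
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        padding = kernel_size // 2
        # One convolution produces all four gates (input, forget, output, candidate).
        self.gates = nn.Conv2d(in_channels + hidden_channels,
                               4 * hidden_channels, kernel_size, padding=padding)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c + i * g
        h = o * torch.tanh(c)
        return h, c

# Example: run a clip of 5 optical-flow frames through the cell.
cell = ConvLSTMCell(in_channels=2, hidden_channels=32)   # 2 channels for (u, v) flow
x = torch.randn(1, 5, 2, 64, 64)                         # (batch, time, channels, H, W)
h = torch.zeros(1, 32, 64, 64)
c = torch.zeros(1, 32, 64, 64)
for t in range(x.size(1)):
    h, c = cell(x[:, t], (h, c))
```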
Before running the `run-ConvLSTM.sh` script, set the following paths:
- `root_path`: Path to this project.
- `annotation_path`: Path to the annotation directory in this project.
- `video_path`: Path to the optical flow image frames.
Important Notes:
- `n_fold`: The fold number, ranging from 0 to 4.
- `sample_duration`: Length of input videos (5 frames).
- `interval`: Interval between frames in the input clip (between 5 and 30).
- `end_second`: The time before the maneuver from which frames are input (ranging from 1 to 5 seconds).
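As a concrete illustration of how these options interact, the sketch below computes which frame indices such a clip might cover. This is only one plausible reading of the parameters and has not been verified against the training code; the 30 fps assumption and the convention that the clip ends `end_second` seconds before the maneuver are guesses.

```python
# Hypothetical illustration of how sample_duration, interval and end_second
# could translate into frame indices. Assumes 30 fps video and that the last
# frame of the clip sits end_second seconds before the maneuver; neither
# assumption is taken from the actual code.
FPS = 30

def clip_indices(maneuver_frame, sample_duration=5, interval=10, end_second=1):
    """Return sample_duration frame indices, spaced `interval` frames apart,
    ending end_second seconds before the maneuver frame."""
    last = maneuver_frame - end_second * FPS
    return [last - i * interval for i in reversed(range(sample_duration))]

# A maneuver at frame 450 (t = 15 s), clip ending 1 s earlier:
print(clip_indices(maneuver_frame=450))   # [380, 390, 400, 410, 420]
```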