
OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic

arxiv paper 🤗 HuggingFace models 🤗 HuggingFace datasets

Songyan Zhang1*, Wenhui Huang2*, Zhan Chen1, Chua Jiahao Collister1, Qihang Huang1, Chen Lv1†

Nanyang Technological University1, Harvard University2

*Equal Contributions, †Corresponding Author


An overview of the framework of our OpenREAD.

✨Capabilities

An overview of the capabilities of our proposed OpenREAD, a vision-language model tailored for autonomous driving via reinforcement learning with GRPO. Besides trajectory planning, OpenREAD also provides reasoning-enhanced responses for open-ended scenario understanding, action analysis, etc.

🦙 Data & Model Zoo

Our OpenREAD is built upon Qwen3-VL-8B and fine-tuned on a mixture of datasets including LingoQA, OmniDrive, and NuScenes. OpenREAD is now available on Hugging Face. Enjoy playing with it!
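
For a quick start, the snippet below sketches single-image inference with the Hugging Face transformers API. The repo id and image path are placeholders rather than the released checkpoint name, and it uses the generic image-text-to-text interface instead of any OpenREAD-specific wrapper:

```python
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

# Placeholder repo id: substitute the actual OpenREAD checkpoint
# from the Hugging Face link above.
model_id = "wyddmw/OpenREAD"

model = AutoModelForImageTextToText.from_pretrained(
    model_id, dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# A single front-camera frame plus an open-ended driving question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/front_camera.jpg"},
            {"type": "text", "text": "Describe the scene and analyze the ego vehicle's next action."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens.
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```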

To facilitate learning of the reasoning capability at the cold-start stage, we construct large-scale CoT annotations on the LingoQA and NuScenes datasets as shown above. We further extend the number of annotations for LingoQA from 7K to 11K. All the CoT annotations are available here.

🛠️ Install

  1. Clone this repository and navigate to the OpenREAD folder:
git clone https://github.com/wyddmw/OpenREAD
cd OpenREAD
  2. Install the ms-swift package:
conda create -n openread python=3.10 -y
conda activate openread
pip install -e .
  3. Install Flash-Attention:
pip install flash_attn==2.8.3 --no-build-isolation

If this version is not compatible with your device and environment, please refer to the source code and install a suitable version.

  4. Install Qwen3-VL dependencies:
pip install "transformers==4.57" "qwen_vl_utils==0.0.14"

🪜 Training & Evaluation

Datasets

The datasets used to train OpenREAD are LingoQA, OmniDrive, and NuScenes.

Please download our pre-processed LiDAR-BEV images for the NuScenes dataset. For trajectory evaluation, we use the GT cache introduced in GPT-Driver; please download the GT cache from Google Drive. The datasets are organized in the following structure:

data
├── LingoQA
│   ├── action
│   │   └── images
│   ├── evaluation
│   │   ├── images
│   │   └── val.parquet
│   ├── scenery
│   │   └── images
│   ├── training_data.json
│   └── evaluation_data.json
└── nuscenes
    ├── samples
    │   ├── CAM_FRONT
    │   └── LIDAR_BEV
    ├── gt
    │   ├── vad_gt_seg.pkl
    │   └── gt_traj_mask.pkl
    └── traj_val_bev_ego_status.json

It is recommended to symlink your dataset root to data, e.g.:

ln -s /path/to/your/dataset_root ./data
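
After linking, a quick check like the one below can confirm the layout matches the tree above. The listed paths are taken directly from that tree; this helper is not part of the repo:

```python
from pathlib import Path

# Sanity-check the expected dataset layout (paths follow the
# directory tree shown above).
root = Path("data")
expected = [
    "LingoQA/training_data.json",
    "LingoQA/evaluation/val.parquet",
    "nuscenes/samples/CAM_FRONT",
    "nuscenes/samples/LIDAR_BEV",
    "nuscenes/gt/vad_gt_seg.pkl",
    "nuscenes/gt/gt_traj_mask.pkl",
    "nuscenes/traj_val_bev_ego_status.json",
]
missing = [p for p in expected if not (root / p).exists()]
print("All expected paths found." if not missing else f"Missing: {missing}")
```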

Evaluation on the LingoQA Dataset

Before running the evaluation script, please first download the pretrained Lingo-Judge model. Then set the paths of the LingoQA dataset and the Lingo-Judge pretrained model in eval/LingoQA/eval_lingo.sh.

sh eval/LingoQA/eval_lingo.sh

The predictions and the Lingo-Judge, CIDEr, METEOR, and BLEU metrics will be saved to eval/LingoQA/lingoqa_results_OpenREAD.json.
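
If you want to inspect the saved metrics programmatically, something like the following works; the exact JSON schema is produced by the eval script, so the snippet only summarizes top-level entries without assuming their structure:

```python
import json

with open("eval/LingoQA/lingoqa_results_OpenREAD.json") as f:
    results = json.load(f)

# Schema is defined by eval_lingo.sh; assuming a top-level dict,
# print scalar metrics directly and summarize nested entries.
if isinstance(results, dict):
    for key, value in results.items():
        summary = value if isinstance(value, (int, float, str)) else type(value).__name__
        print(f"{key}: {summary}")
else:
    print(f"Loaded {type(results).__name__} with {len(results)} entries")
```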

Evaluation on the NuScenes Trajectory Benchmark

We also provide scripts to evaluate trajectory prediction quality on the NuScenes validation set using both STP-3 and UniAD metrics. Update the trained model path, the eval_file path, the training mode, and the inference output path in eval/Trajectory/infer_trajs_dist.sh, then run trajectory inference:

bash eval/Trajectory/infer_trajs_dist.sh

This script generates trajectory prediction JSON files under the directory specified by the inference output path. Next, update the trajectory inference output path inside eval/Trajectory/eval_trajs.py, then compute both STP-3 and UniAD metrics by running:

python eval/Trajectory/eval_trajs.py
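
For orientation, both metric suites reduce to the L2 displacement between predicted and ground-truth waypoints at 1 s/2 s/3 s horizons; the main difference is whether the error is averaged over all waypoints up to a horizon or read at the horizon waypoint itself. Below is a minimal, illustrative sketch assuming the usual NuScenes setup of 2 Hz waypoints over 3 s (eval_trajs.py remains the reference implementation):

```python
import numpy as np

def l2_metrics(pred, gt):
    """Open-loop L2 errors for (N, 6, 2) BEV trajectories in meters.

    Illustrative only, not the repo's API. 'avg' averages the error
    over all waypoints up to each horizon; 'at' takes the error at
    the horizon waypoint itself.
    """
    err = np.linalg.norm(pred - gt, axis=-1)   # (N, 6) per-waypoint L2
    out = {}
    for sec, idx in [(1, 2), (2, 4), (3, 6)]:  # 2 Hz -> 2 waypoints per second
        out[f"L2_avg@{sec}s"] = err[:, :idx].mean()
        out[f"L2_at@{sec}s"] = err[:, idx - 1].mean()
    return out
```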

🔨 TODO LIST

  • [✓] Release Hugging Face model, inference, and eval scripts.
  • [✓] Release CoT data.
  • [ ] Release training code.

Acknowledgment

We appreciate the awesome open-source projects ms-swift, OmniDrive, and GPT-Driver.

✏️ Citation

If you find OpenREAD useful in your research or applications, please consider giving it a star ⭐ and citing it with the following BibTeX:

@article{zhang2025openread,
  title={OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic},
  author={Zhang, Songyan and Huang, Wenhui and Chen, Zhan and Collister, Chua Jiahao and Huang, Qihang and Lv, Chen},
  journal={arXiv preprint arXiv:2512.01830},
  year={2025}
}
@article{zhang2024wisead,
  title={WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model},
  author={Zhang, Songyan and Huang, Wenhui and Gao, Zihui and Chen, Hao and Lv, Chen},
  journal={arXiv preprint arXiv:2412.09951},
  year={2024}
}
