
OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic

arxiv paper 🤗 HuggingFace models 🤗 HuggingFace datasets

Songyan Zhang1*, Wenhui Huang2*, Zhan Chen1, Chua Jiahao Collister1, Qihang Huang1, Chen Lv1†

Nanyang Technological University1, Harvard University2

*Equal Contributions, †Corresponding Author


An overview of the framework of our OpenREAD.

✨Capabilities

An overview of the capabilities of our proposed OpenREAD, a vision-language model tailored for autonomous driving via reinforcement learning with GRPO. Besides trajectory planning, OpenREAD also provides reasoning-enhanced responses for open-ended scenario understanding, action analysis, etc.

🦙 Data & Model Zoo

Our OpenREAD is built upon Qwen3-VL-8B and fine-tuned on a mixture of datasets including LingoQA, OmniDrive, and NuScenes. OpenREAD is now available on Hugging Face. Enjoy playing with it!
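
For a quick start, the snippet below sketches single-image inference with the Hugging Face transformers API. The repo id and image path are placeholders rather than the released checkpoint name, and it uses the generic image-text-to-text interface instead of any OpenREAD-specific wrapper:

```python
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

# Placeholder repo id: substitute the actual OpenREAD checkpoint
# from the Hugging Face link above.
model_id = "wyddmw/OpenREAD"

model = AutoModelForImageTextToText.from_pretrained(
    model_id, dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# A single front-camera frame plus an open-ended driving question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/front_camera.jpg"},
            {"type": "text", "text": "Describe the scene and analyze the ego vehicle's next action."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens.
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```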

To facilitate learning of the reasoning capability at the cold-start stage, we construct large-scale CoT annotations on the LingoQA and NuScenes datasets as shown above. We further extend the number of annotations for LingoQA from 7K to 11K. All the CoT annotations are available here.

🛠️ Install

  1. Clone this repository and navigate to the OpenREAD folder:
git clone https://github.com/wyddmw/OpenREAD
cd OpenREAD
  2. Install the ms-swift package:
conda create -n openread python=3.10 -y
conda activate openread
pip install -e .
  3. Install Flash-Attention:
pip install flash_attn==2.8.3 --no-build-isolation

If this version is not compatible with your device and environment, please refer to the source code and install a suitable version.

  4. Install Qwen3-VL dependencies:
pip install "transformers==4.57" "qwen_vl_utils==0.0.14"

🪜 Training & Evaluation

Datasets

The datasets used to train OpenREAD are LingoQA, OmniDrive, and NuScenes.

Please download our pre-processed LiDAR-BEV images for the NuScenes dataset. For trajectory evaluation, we use the GT cache introduced in GPT-Driver; please download the GT cache from Google Drive. The datasets are organized in the following structure:

data
├── LingoQA
│   ├── action
│   │   └── images
│   ├── evaluation
│   │   ├── images
│   │   └── val.parquet
│   ├── scenery
│   │   └── images
│   ├── training_data.json
│   └── evaluation_data.json
└── nuscenes
    ├── samples
    │   ├── CAM_FRONT
    │   └── LIDAR_BEV
    ├── gt
    │   ├── vad_gt_seg.pkl
    │   └── gt_traj_mask.pkl
    └── traj_val_bev_ego_status.json

It is recommended to symlink your dataset root to data, e.g.:

ln -s /path/to/your/dataset_root ./data
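
After linking, a quick check like the one below can confirm the layout matches the tree above. The listed paths are taken directly from that tree; this helper is not part of the repo:

```python
from pathlib import Path

# Sanity-check the expected dataset layout (paths follow the
# directory tree shown above).
root = Path("data")
expected = [
    "LingoQA/training_data.json",
    "LingoQA/evaluation/val.parquet",
    "nuscenes/samples/CAM_FRONT",
    "nuscenes/samples/LIDAR_BEV",
    "nuscenes/gt/vad_gt_seg.pkl",
    "nuscenes/gt/gt_traj_mask.pkl",
    "nuscenes/traj_val_bev_ego_status.json",
]
missing = [p for p in expected if not (root / p).exists()]
print("All expected paths found." if not missing else f"Missing: {missing}")
```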

Evaluation on the LingoQA Dataset

Before running the evaluation script, please first download the pretrained Lingo-Judge model. Then set the paths of the LingoQA dataset and the Lingo-Judge pretrained model in eval/LingoQA/eval_lingo.sh.

sh eval/LingoQA/eval_lingo.sh

The predictions and the Lingo-Judge, CIDEr, METEOR, and BLEU metrics will be saved to eval/LingoQA/lingoqa_results_OpenREAD.json.
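
If you want to inspect the saved metrics programmatically, something like the following works; the exact JSON schema is produced by the eval script, so the snippet only summarizes top-level entries without assuming their structure:

```python
import json

with open("eval/LingoQA/lingoqa_results_OpenREAD.json") as f:
    results = json.load(f)

# Schema is defined by eval_lingo.sh; assuming a top-level dict,
# print scalar metrics directly and summarize nested entries.
if isinstance(results, dict):
    for key, value in results.items():
        summary = value if isinstance(value, (int, float, str)) else type(value).__name__
        print(f"{key}: {summary}")
else:
    print(f"Loaded {type(results).__name__} with {len(results)} entries")
```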

Evaluation on the NuScenes Trajectory Benchmark

We also provide scripts to evaluate trajectory prediction quality on the NuScenes validation set using both STP-3 and UniAD metrics. Update the trained model path, the eval_file path, the training mode, and the inference output path in eval/Trajectory/infer_trajs_dist.sh, then run trajectory inference:

bash eval/Trajectory/infer_trajs_dist.sh

This script generates trajectory prediction JSON files under the directory specified by the inference output path. Next, update the trajectory inference output path inside eval/Trajectory/eval_trajs.py, then compute both STP-3 and UniAD metrics by running:

python eval/Trajectory/eval_trajs.py
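
For orientation, both metric suites reduce to the L2 displacement between predicted and ground-truth waypoints at 1 s/2 s/3 s horizons; the main difference is whether the error is averaged over all waypoints up to a horizon or read at the horizon waypoint itself. Below is a minimal, illustrative sketch assuming the usual NuScenes setup of 2 Hz waypoints over 3 s (eval_trajs.py remains the reference implementation):

```python
import numpy as np

def l2_metrics(pred, gt):
    """Open-loop L2 errors for (N, 6, 2) BEV trajectories in meters.

    Illustrative only, not the repo's API. 'avg' averages the error
    over all waypoints up to each horizon; 'at' takes the error at
    the horizon waypoint itself.
    """
    err = np.linalg.norm(pred - gt, axis=-1)   # (N, 6) per-waypoint L2
    out = {}
    for sec, idx in [(1, 2), (2, 4), (3, 6)]:  # 2 Hz -> 2 waypoints per second
        out[f"L2_avg@{sec}s"] = err[:, :idx].mean()
        out[f"L2_at@{sec}s"] = err[:, idx - 1].mean()
    return out
```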

🔨 TODO LIST

  • [✓] Release Hugging Face model, inference, and eval scripts.
  • [✓] Release CoT data.
  • [ ] Release training code.

Acknowledgment

We appreciate the awesome open-source projects ms-swift, OmniDrive, and GPT-Driver.

✏️ Citation

If you find OpenREAD useful in your research or applications, please consider giving it a star ⭐ and citing it with the following BibTeX:

@article{zhang2025openread,
  title={OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic},
  author={Zhang, Songyan and Huang, Wenhui and Chen, Zhan and Collister, Chua Jiahao and Huang, Qihang and Lv, Chen},
  journal={arXiv preprint arXiv:2512.01830},
  year={2025}
}
@article{zhang2024wisead,
  title={WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model},
  author={Zhang, Songyan and Huang, Wenhui and Gao, Zihui and Chen, Hao and Lv, Chen},
  journal={arXiv preprint arXiv:2412.09951},
  year={2024}
}
