[CVPR'25, Highlight] SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment
SimLingo is a Vision-Language-Action (VLA) model that achieves state-of-the-art driving performance on the CARLA Leaderboard and Bench2Drive, while simultaneously providing language capabilities such as VQA, commentary, and instruction following.
This repository is based on Carla Garage and includes the PDM-lite expert, data collection code, language label generation, dreaming data generation, training of the base and final model, and evaluation of closed-loop driving and the language capabilities.
[2025/06/25] We released the SimLingo model checkpoints and inference code.
[2025/05/26] We released the full dataset on Hugging Face.
[2025/05/08] Initial code release.
[2025/04/28] SimLingo is accepted to CVPR as a highlight paper.
Clone the repository, setup CARLA 0.9.15, and build the conda environment:
git clone [email protected]:RenzKa/simlingo.git
cd simlingo
chmod +x setup_carla.sh
./setup_carla.sh
# Create base environment
conda env create -f environment.yaml
conda activate simlingo
# Install PyTorch separately to ensure correct CUDA version
pip install torch==2.2.0
# Install flash-attn separately
pip install flash-attn==2.7.0.post2

Before running the code, you will need to add the following paths to PYTHONPATH on your system:
export CARLA_ROOT=/path/to/CARLA/root
export WORK_DIR=/path/to/simlingo
export PYTHONPATH=$PYTHONPATH:${CARLA_ROOT}/PythonAPI/carla
export SCENARIO_RUNNER_ROOT=${WORK_DIR}/scenario_runner
export LEADERBOARD_ROOT=${WORK_DIR}/leaderboard
export PYTHONPATH="${CARLA_ROOT}/PythonAPI/carla/":"${SCENARIO_RUNNER_ROOT}":"${LEADERBOARD_ROOT}":${PYTHONPATH}The main structure of this repository is taken from Carla Garage. Please check it out for more detailed information.
CARLA: We have the leaderboard_autopilot and scenario_runner_autopilot folders for running data collection. The leaderboard and scenario_runner folders are currently mostly unused (they just contain the route files for evaluation) but can be used to run evaluation on the CARLA eval routes, longest6_v2, or other benchmarks (see Carla Garage). The Bench2Drive folder (with its own leaderboard and scenario_runner folders) is used to run closed-loop evaluation on the Bench2Drive benchmark. The team_code folder contains all files needed to run closed-loop agents in CARLA (for the expert, simlingo, and simlingo_base).
Training: simlingo_base_training and simlingo_training contain all files to run training. simlingo_training also contains the files to start the language evaluation.
Dataset: Our dataset is stored in a folder called database.
You can find our dataset here: https://huggingface.co/datasets/RenzKa/simlingo The uploaded data contains the driving dataset, VQA, Commentary, and Dreamer labels.
# Clone the repository
git clone https://huggingface.co/datasets/RenzKa/simlingo
# Navigate to the directory
cd simlingo
# Pull the LFS files
git lfs pull

# Download individual files (replace with actual file URLs from Hugging Face)
wget https://huggingface.co/datasets/RenzKa/simlingo/resolve/main/[filename].tar.gz

# Create output directory
mkdir -p database/simlingo
# Extract all archives to the same directory
for file in *.tar.gz; do
    echo "Extracting $file to database/simlingo/..."
    tar -xzf "$file" -C database/simlingo/
done

If you download our dataset from Hugging Face, you don't need to follow any of the steps in this section. If you only want to perform closed-loop driving evaluation, there is no need to download our dataset.
This repository uses the open-source expert PDM-Lite from the DriveLM paper to generate the driving dataset. Most of the data collection code is taken from Carla Garage. However, we changed some hyperparameters and used the data_agent from DriveLM, which saves the auxiliary information during data collection that is needed to generate the VQA and commentary data.
Generate driving data: To re-generate the data, we provide a script for a SLURM cluster that parallelizes data collection across many GPUs (2080 Ti in our case). First, adjust the paths and related settings in lines 213-230 of collect_dataset_slurm.py. You can specify the SLURM partition in partition.txt and change it at runtime. max_num_jobs.txt specifies how many parallel SLURM jobs are submitted; this can also be changed at runtime. The data collection is started via sbatch 0_run_collect_dataset_slurm.sh, which calls collect_dataset_slurm.py.
Increase the number in max_num_jobs.txt once your setup works.
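As a minimal sketch of this workflow (the partition name below is a placeholder for your cluster):

# Both files are read at runtime, so they can be changed while collection is running
echo "YOUR_PARTITION" > partition.txt   # placeholder: your SLURM partition
echo "1" > max_num_jobs.txt             # start with a single job until the setup works

# Submit the driver job, which calls collect_dataset_slurm.py
sbatch 0_run_collect_dataset_slurm.sh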
Dataset cleaning: After the dataset is collected, you can use dataset_generation/delete_failed_runs.py and dataset_generation/delete_infraction_routes.py to delete routes where the expert failed or where CARLA crashed and the route had to be restarted.
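For example (a sketch; check both scripts for the dataset paths or arguments they expect before running):

# Remove routes where the expert failed or that had to be restarted after a crash
python dataset_generation/delete_failed_runs.py
python dataset_generation/delete_infraction_routes.py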
Route files: The routes for data collection are stored in data/simlingo. Note: These route files differ from the ones used in Carla Garage. To generate our route files, you can use the following script, which derives our modified route files from the original CARLA route files: bash dataset_generation/split_route_files.sh
This splits the long training and validation route files provided by CARLA into short routes with at most 1 or 3 scenarios, and balances and upsamples the scenarios.
PDM-Lite uses modified versions of the CARLA leaderboard and scenario runner that expose additional information about the scenarios and make data collection easier. They can be found in the leaderboard_autopilot and scenario_runner_autopilot folders.
The dataset provided in this repository is not perfect. At some point while improving the model, you will likely need to collect an improved version.
Our bucket file is included in the released dataset. Check out our Huggingface repo.
If you want to generate your own buckets you can use the script dataset_generation/data_buckets/carla_get_buckets.py.
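For example (a sketch; the dataset paths the script expects are assumptions, check the script first):

# Recompute the sampling buckets for your own dataset
python dataset_generation/data_buckets/carla_get_buckets.py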
VQA (DriveLM): We use the script (with minor modifications) from DriveLM to generate VQA labels. You can run dataset_generation/language_labels/drivelm/carla_vqa_generator_main.py to generate the VQA labels for your dataset. We used ChatGPT to augment the questions and answers. We provide the augmented templates, which we load during training, in the folder data/augmented_templates/drivelm_train_augmented_v2. An example script to generate those augmented sentences can be found here: dataset_generation/get_augmentations/gpt_augment_vqa.py. Note: To generate the VQA labels, we save extensive auxiliary information about the simulator state during data collection. If you use a different dataset, this labelling script will likely not work.
Commentary: In this work we provide a new script to generate commentary labels. To generate commentary labels for your dataset, run dataset_generation/language_labels/commentary/carla_commentary_generator_main.py. We used ChatGPT to augment the questions and answers. We provide the augmented templates, which we load during training, in the file data/augmented_templates/commentary_augmented.json. Unfortunately, because of how the project evolved, the augmentations were first done manually at the sub-sentence level and later merged. If helpful, we provide the sub-sentence level augmentations here and the script to merge them into the final ones here. Note: To generate the commentary labels, we save auxiliary information about the simulator state during data collection. If you use a different dataset, this labelling script will likely not work.
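Putting both together, label generation for a freshly collected dataset looks roughly like this (paths inside the scripts may need adjusting to your setup):

# Generate VQA labels (DriveLM-style)
python dataset_generation/language_labels/drivelm/carla_vqa_generator_main.py
# Generate commentary labels
python dataset_generation/language_labels/commentary/carla_commentary_generator_main.py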
File structure:
"image": # Path to RGB image
"commentary": # Commentary string (not augmented)
"commentary_template": # Commentary with placeholders for changing parts (e.g., object description, location). This is used to retrieve the augmentations.
"cause_object_visible_in_image": # Whether the object that causes the expert actions is visible in the front view image. Could be used to filter samples where the commentary describes an action based on an object not visible.
"cause_object": # Dictionary with attributes of the object causing the expert action.
"cause_object_string": # Language description of the cause object (e.g., dark green car that is to the front)
"scenario_name": # Name of the active CARLA scenario
"placeholder": # Dictionary to be able to replace the placeholders in commentary_template.To improve the alignment of language and actions, we propose Action Dreaming for which we provide a dataset with multiple different future trajectories given a language instruction. The language instructions cover a wide range of modes (e.g., speed changes, lane changes, object-centric navigation, crashes) with a label indicating whether the execution is allowed and safe or not.
To generate the labels, run dataset_generation/dreamer_data/dreamer_generator.py.
File structure:
"category": # e.g. "target_speed", "stop", "faster", "crash", ...
"waypoints": # Dreaming waypoints
"route": # Dreaming path
"rgb_path": # Path to RGB image in dataset
"allowed": # Flag if execution is allowed.
"mode": # category
"info": # more information, e.g., about current, target, and final speed
"route_reasoning": # Language description about the route.
"dreamer_instruction": # Language instruction.
"instructions_templates": # Instruction with placeholders for changing parts (e.g., object description, location). This is used to retrieve the augmentations.
"templates_placeholders": # Dictionary to be able to replace the placeholders in commentary_template.
"dreamer_answer_safety": # Answer when safety mode is activated.
"safe_to_execute": # Flag if the instruction is safe to execute.We provide code for the smaller model SimLingo-Base (previously CarLLaVA - without language capabilities) in the folder simlingo_base_training and for the full model SimLingo in simlingo_training. For the config managment we use hydra. The config parameters are defined in the config.py file and can be adjusted in the .yaml files inside the config folder. Note: You should double check if the paths to the dataset is correct.
We provide a SLURM script to start training: train_simlingo_seed1.sh. This can easily be converted to a bash script to start training locally. The entry file for training is simlingo_training/train.py.
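For a local run without SLURM, a minimal sketch looks like this (check simlingo_training/config.py and the .yaml files for the actual parameter names before overriding anything):

# Start training locally; hydra config values can be overridden as key=value pairs
python simlingo_training/train.py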
With the default config, the training logs to Wandb. Login is required. We also include a visualization callback that plots ground truth and predicted waypoints during training.
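If you have not logged in to Wandb on the machine yet:

# One-time login; paste your API key when prompted
wandb login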
The model checkpoints can be downloaded from Hugging Face: https://huggingface.co/RenzKa/simlingo. If you only want to perform closed-loop driving evaluation, there is no need to download our dataset.
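For example (Git LFS must be installed; the target directory is arbitrary and just avoids clashing with the code folder):

# Download the released checkpoints from Hugging Face
git clone https://huggingface.co/RenzKa/simlingo simlingo_checkpoints
cd simlingo_checkpoints
git lfs pull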
Bench2Drive is a CARLA benchmark proposed by the paper Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving. It consists of 220 very short (~150 m) routes split across all towns, with one safety-critical scenario in each route. Since it uses all towns for training, methods have seen the test towns during training, so it can be considered a 'training' benchmark (reminiscent of level-4 driving). The benchmark also comes with a training dataset generated by the Think2Drive expert, but we use the open-source expert PDM-Lite, which achieves better results and can be adapted to collect the labels needed to produce VQA, commentary, and Dreamer data. The benchmark and additional instructions can be found in the Bench2Drive folder.
Start eval: Evaluation on a SLURM cluster can be run with start_eval_simlingo.py. The config dictionary needs to be adjusted with the correct names and paths. Most things that need to be changed are marked with TODO tags in start_eval_simlingo.py.
Get results: The script Bench2Drive/tools/merge_route_json.py can be used to obtain the final metrics after the evaluation is done. Make sure that all 220 routes are evaluated.
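In short (a sketch; any arguments merge_route_json.py expects are assumptions, check the script):

# Launch the closed-loop evaluation on the SLURM cluster
python start_eval_simlingo.py
# Aggregate the per-route result jsons into the final metrics once all 220 routes are done
python Bench2Drive/tools/merge_route_json.py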
The Bench2Drive folder is based on version 0.0.3 of the Bench2Drive repository. Please cite the Bench2Drive paper when using the benchmark.
NOTE: Files might get cleaned up at some point in the future (maybe not, depending on my time). Since the dataset and model are a reproduction and not the original ones from the paper, the numbers deviate slightly. However, the conclusions drawn in the paper still hold. We will update the numbers shortly.
Entry point for the language evaluation is simlingo_training/eval.py. Please change the variable eval_mode to QA, commentary or Dreaming.
Afterwards, to obtain the metrics you can run simlingo_training/eval_metrics.py. For this you first need to specify an OpenAI key here: simlingo_training/utils/gpt_eval.py
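The full language evaluation then looks roughly like this (set eval_mode and the OpenAI key first, as described above):

# Run inference for the selected eval_mode (QA, commentary, or Dreaming)
python simlingo_training/eval.py
# Compute the metrics (uses the OpenAI key set in simlingo_training/utils/gpt_eval.py)
python simlingo_training/eval_metrics.py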
If you find this repository useful, please consider giving us a star 🌟. Please cite the following papers for the respective components of the repo:
SimLingo:
@InProceedings{Renz2025cvpr,
title={SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment},
author={Renz, Katrin and Chen, Long and Arani, Elahe and Sinavski, Oleg},
booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2025}
}

PDM-Lite expert:
@inproceedings{Sima2024ECCV,
title={DriveLM: Driving with Graph Visual Question Answering},
author={Chonghao Sima and Katrin Renz and Kashyap Chitta and Li Chen and Hanxue Zhang and Chengen Xie and Jens Beißwenger and Ping Luo and Andreas Geiger and Hongyang Li},
booktitle={Proc. of the European Conf. on Computer Vision (ECCV)},
year={2024}
}

Bench2Drive benchmark:
@inproceedings{Jia2024NeurIPS,
title={Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving},
author={Xiaosong Jia and Zhenjie Yang and Qifeng Li and Zhiyuan Zhang and Junchi Yan},
booktitle={NeurIPS 2024 Datasets and Benchmarks Track},
year={2024}
}

- tuPlan garage | CARLA garage | Survey on E2EAD
- DriveLM | PlanT | KING | TransFuser | NEAT