NuRisk

A Visual Question Answering Dataset for Agent-Level Risk Assessment in Autonomous Driving (ICRA 2026)

This repository contains the end-to-end pipeline that produces the NuRisk VQA dataset — from raw CommonRoad scenarios to a parquet-backed visual question answering dataset ready for VLM fine-tuning.

Pipeline overview

flowchart LR
    A[1083 CommonRoad<br/>XML scenarios] --> B[Stage 1<br/>Frenetix<br/>Motion Planner]
    B --> C[Stage 2<br/>Trajectory<br/>extraction]
    C --> D[Stage 3<br/>Safety<br/>metrics]
    D --> E[Stage 4<br/>Close-agent ID<br/>+ risk scoring]
    E --> F[Stage 5<br/>VQA generation<br/>+ HF upload]
    F --> G[Stage 6<br/>VLM<br/>fine-tuning]

Stage	Folder / external repo	Output
1	`TUM-AVS/Frenetix-Motion-Planner` (external)	Per-scenario trajectory logs + BEV PNGs
2	`Trajectory_collection/`	Ego + dynamic-obstacle CSVs with lanelet info
3	`Safety_metrics_collection/`	`relative_metrics.csv` (TTC, relative pos/vel per agent)
4	`CloesdID_identification/` + `Riskscore_calculation/`	`risk_scores_output_enhanced.json` (0–5 risk scale per agent)
5	`finetune_preprocess/`	Parquet VQA dataset pushed to `Yuan-avs/Nurisk`
6	`training/` (launcher + DeepSpeed configs) wrapping `2U1/Qwen-VL-Series-Finetune`	LoRA-fine-tuned Qwen2.5-VL checkpoint (`NuRisk-VLM-Agent`)

Setup

git clone https://github.com/TUM-AVS/NuRisk.git
cd NuRisk
pip install -r requirements.txt

Then edit config.py to point the path constants (XML_SCENARIOS_DIR, LOGS_DIR, VLM_DATASET_DIR, …) at your local data layout. A few individual scripts still carry hard-coded /home/yuan/... paths — search & replace before running them, or override via their CLI flags.

Or run everything in a container — there's a pre-configured CUDA + PyTorch + DeepSpeed image:

docker compose up -d nurisk-train
docker compose exec nurisk-train bash

See DOCKER_SETUP.md for GPU driver + container details.

Fetch the published VQA dataset (parquet + images, ~250 GB once decoded):

bash download_dataset.sh ./Nurisk_dataset
# or, in Python:
# from datasets import load_dataset; load_dataset("Yuan-avs/Nurisk")

Stage 1 — Scenario simulation with Frenetix

The full data-preparation pipeline (stages 1–4) is run on a curated collision-rich set of 1083 CommonRoad XML scenarios (the Frenetix dataset/scenarios_collision/ pool). After downstream cleaning in stage 5, 888 scenarios appear in the published Yuan-avs/Nurisk fine-tuning dataset. All 1083 IDs are listed in scenarios.txt — the 888 that survived into the final dataset are marked with *.

The XMLs themselves are not redistributed here. All scenarios used by this pipeline are publicly available in the official CommonRoad scenario database at https://commonroad.in.tum.de/scenarios/ — search by the IDs in scenarios.txt, download the XMLs, and point XML_SCENARIOS_DIR in config.py at the directory you place them in.
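
To pick out the 888 retained scenarios before downloading, the ID list can be split with a few lines of Python. This is a minimal sketch, not code from this repository: it assumes scenarios.txt holds one ID per line and that the `*` marker is attached to the start or end of the line, and `split_scenario_ids` is a hypothetical helper name.

```python
def split_scenario_ids(lines):
    """Split scenarios.txt-style lines into (all_ids, kept_ids).

    Assumption: one scenario ID per line; IDs that survived into the
    final dataset carry a '*' at the start or end of the line.
    """
    all_ids, kept_ids = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        kept = line.startswith("*") or line.endswith("*")
        scenario_id = line.strip("*").strip()  # drop the marker and padding
        all_ids.append(scenario_id)
        if kept:
            kept_ids.append(scenario_id)
    return all_ids, kept_ids
```

Usage: `all_ids, kept_ids = split_scenario_ids(open("scenarios.txt", encoding="utf-8"))`. If the marker is placed differently in the file, adjust the `kept` check accordingly.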

Run the Frenetix Motion Planner on every scenario.

Frenetix produces, per scenario:

logs.csv — ego state across time
BEV PNG snapshots per timestep (used in stage 5 as VQA image input)

Refer to the Frenetix README for installation and execution. The remainder of this pipeline assumes the per-scenario Frenetix log directories live under one root (configurable via LOGS_DIR in config.py).

Stage 2 — Trajectory extraction

cd Trajectory_collection
python main_multi.py

Input: Frenetix logs + raw CommonRoad XML scenario files

Output (per scenario):

ego_trajectory_positions_with_lanelets.csv
dynamic_obstacles_with_lanelets.csv
dynamic_obstacles.csv
ego_trajectory.csv
<scenario_name>.json (CommonRoad metadata)

Parallelism is set via the num_cpus constant in the script.

Stage 3 — Safety metrics

cd Safety_metrics_collection
python safety_multi.py

For every (ego, agent, timestep) tuple this computes:

Longitudinal / lateral relative distance
Relative velocity (front / rear / left / right)
Time-to-collision (TTC)

Output (per scenario): relative_metrics.csv
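
For orientation, the TTC quantity listed above is commonly computed as the remaining gap divided by the closing speed. The sketch below shows that constant-velocity form along a single axis. It is an illustration under stated assumptions, not the formula used in safety_multi.py: the function name, the sign convention (positive closing speed means the gap is shrinking), and the handling of non-approaching agents are all assumptions.

```python
import math

def time_to_collision(gap_m, closing_speed_mps):
    """Constant-velocity TTC along one axis (illustrative only).

    gap_m: remaining free distance between ego and agent, in metres.
    closing_speed_mps: rate at which the gap shrinks, in m/s
        (positive means the two are approaching each other).
    """
    if gap_m <= 0.0:
        return 0.0  # bodies already touch or overlap
    if closing_speed_mps <= 0.0:
        return math.inf  # gap is constant or growing: no collision predicted
    return gap_m / closing_speed_mps
```

Returning infinity for a non-closing gap keeps the value directly comparable against TTC thresholds, since it never falls below any finite bound.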

Stage 4 — Close-agent identification and risk scoring

4a. Filter to close obstacles

cd CloesdID_identification
python extract_close_metrics_multi.py

Output: close_relative_metrics.csv (per scenario)

4b. Compute risk scores with explanations

cd Riskscore_calculation
python risk_score_calculator_multi_enhanced.py

Output (per scenario):

risk_scores_output_enhanced.json — full per-agent reasoning trace
risk_scores_close_relative_metrics_enhanced.csv
risk_score_summary_enhanced.csv

Risk scale (set in config.py):

Score	Label	Min. distance	TTC
0	Collision	< 0.3 m	< 0.15 s
1	Extreme	< 0.8 m	< 0.65 s
2	High	< 1.3 m	< 1.15 s
3	Medium	< 3.0 m	< 3.0 s
4	Low	< 5.0 m	< 5.0 s
5	Negligible	≥ 5.0 m	≥ 5.0 s

Lateral vs. longitudinal weighting and ego dimensions (EGO_LENGTH = 4.508 m, EGO_WIDTH = 1.610 m) are also defined in config.py.
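
The thresholds in the table above can be read as a lookup from (minimum distance, TTC) to a score. The sketch below is one possible reading, not the logic in risk_score_calculator_multi_enhanced.py: it assumes the distance and TTC columns are scored independently and the more severe (lower) score wins, and it ignores the lateral vs. longitudinal weighting. `RISK_LEVELS`, `score_from_threshold`, and `risk_score` are hypothetical names.

```python
# (score, label, upper bound on min. distance [m], upper bound on TTC [s])
RISK_LEVELS = [
    (0, "Collision", 0.3, 0.15),
    (1, "Extreme", 0.8, 0.65),
    (2, "High", 1.3, 1.15),
    (3, "Medium", 3.0, 3.0),
    (4, "Low", 5.0, 5.0),
]
NEGLIGIBLE = 5  # at or above every bound in the table

def score_from_threshold(value, column):
    """Return the first score whose upper bound in `column` exceeds value."""
    for level in RISK_LEVELS:
        if value < level[column]:
            return level[0]
    return NEGLIGIBLE

def risk_score(min_distance_m, ttc_s):
    """Combine both criteria; assumption: the more severe score wins."""
    distance_score = score_from_threshold(min_distance_m, 2)
    ttc_score = score_from_threshold(ttc_s, 3)
    return min(distance_score, ttc_score)
```

Under these assumptions, an agent 6 m away with a TTC of 0.5 s scores 1 (Extreme), because the TTC criterion dominates.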

Stage 5 — VQA dataset creation and HF upload

cd finetune_preprocess
python create_groundtruth_dataset.py             # 1. link BEV images ↔ risk scores
python prepare_sequential_dataset.py             # 2. stack 5 consecutive BEV frames per sample
python create_sequential_groundtruth_dataset.py  # 3. attach risk labels to the 5-frame stacks
python create_qwen_finetune_dataset.py           # 4. emit LLaVA-style conversation pairs
python prepare_dataset_splits.py                 # 5. train / validation split
python convert_to_vqa_parquet_fixed.py           # 6. → parquet (image / question / answer)
python upload_vqa_to_hf.py --repo-name <user>/<repo>  # 7. push to the Hugging Face Hub

The final artifact is the public dataset Yuan-avs/Nurisk — 60 train + 7 validation parquet shards with three columns per row: image (5-frame BEV stack), question (driving-risk prompt referencing a specific agent), answer (structured JSON with risk score, level, direction, reasoning).
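
The 5-frame stacking in step 2 can be pictured as a sliding window over a scenario's time-ordered BEV frames. The sketch below is illustrative and not taken from prepare_sequential_dataset.py: the helper name, the stride of 1 (overlapping windows), and the choice to drop sequences shorter than five frames are assumptions.

```python
def five_frame_windows(frame_paths, window=5, stride=1):
    """Group time-ordered frame paths into consecutive fixed-size windows.

    Assumptions: frames are already sorted by timestep, windows overlap
    with the given stride, and incomplete trailing windows are dropped.
    """
    frames = list(frame_paths)
    last_start = len(frames) - window
    return [frames[i:i + window] for i in range(0, last_start + 1, stride)]
```

With seven frames and the default stride this yields three windows (timesteps 0-4, 1-5, 2-6); a scenario with fewer than five frames yields none.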

See finetune_preprocess/README.md for a more detailed view of each sub-step.

Stage 6 — VLM fine-tuning

We fine-tune Qwen2.5-VL on the NuRisk dataset using 2U1/Qwen-VL-Series-Finetune. The launcher and DeepSpeed configs are bundled in training/; the upstream training code is referenced as a sibling clone (see training/README.md).

# 1. Fetch the dataset
bash download_dataset.sh ./Nurisk_dataset
# 2. Clone the upstream training code next to this repo
git clone https://github.com/2U1/Qwen-VL-Series-Finetune.git ../Qwen-VL-Series-Finetune
# 3. Register the local dataset in the upstream data_config.json, then launch:
bash training/train_risk_analysis.sh

Typical LoRA configuration used in the paper:

Hyper-parameter	Value
Base model	`Qwen/Qwen2.5-VL-7B-Instruct`
LoRA rank	64 (best) / 256 (high-capacity variant)
LoRA target modules	`q_proj`, `k_proj`, `v_proj`, `o_proj`
Precision	bf16
Sequence length	2048 prompt / 1024 completion
DeepSpeed	ZeRO-3 (`training/scripts/zero3.json`)

The resulting checkpoint is referred to in the paper as NuRisk-VLM-Agent.

Quick start (full pipeline)

After editing config.py:

python run_pipeline.py --stage all

Or run a single stage: --stage {trajectory, safety, close, risk, vml}.

Extending to other driving datasets

The same pipeline (stages 2–5) generalises to other sources of ego/agent trajectories + BEV imagery. To target a real-world driving dataset such as nuScenes or Waymo Open, simply swap stage 1 for that dataset's trajectory + BEV producer and keep stages 2–5 as-is — the risk scoring and VQA generation logic are dataset-agnostic.

Citation

@article{gao2025nurisk,
  title   = {NuRisk: A Visual Question Answering Dataset for Agent-Level Risk Assessment in Autonomous Driving},
  author  = {Gao, Yuan and Piccinini, Mattia and Brusnicki, Roberto and Zhang, Yuchen and Betz, Johannes},
  journal = {arXiv preprint arXiv:2509.25944},
  year    = {2025},
  url     = {https://arxiv.org/abs/2509.25944}
}

License

Released under the MIT License.
