NuRisk

A Visual Question Answering Dataset for Agent-Level Risk Assessment in Autonomous Driving (ICRA 2026)


This repository contains the end-to-end pipeline that produces the NuRisk VQA dataset — from raw CommonRoad scenarios to a parquet-backed visual question answering dataset ready for VLM fine-tuning.

[Figure: NuRisk framework]


Pipeline overview

flowchart LR
    A[1083 CommonRoad<br/>XML scenarios] --> B[Stage 1<br/>Frenetix<br/>Motion Planner]
    B --> C[Stage 2<br/>Trajectory<br/>extraction]
    C --> D[Stage 3<br/>Safety<br/>metrics]
    D --> E[Stage 4<br/>Close-agent ID<br/>+ risk scoring]
    E --> F[Stage 5<br/>VQA generation<br/>+ HF upload]
    F --> G[Stage 6<br/>VLM<br/>fine-tuning]
| Stage | Folder / external repo | Output |
|---|---|---|
| 1 | TUM-AVS/Frenetix-Motion-Planner (external) | Per-scenario trajectory logs + BEV PNGs |
| 2 | Trajectory_collection/ | Ego + dynamic-obstacle CSVs with lanelet info |
| 3 | Safety_metrics_collection/ | relative_metrics.csv (TTC, relative pos/vel per agent) |
| 4 | CloesdID_identification/ + Riskscore_calculation/ | risk_scores_output_enhanced.json (0–5 risk scale per agent) |
| 5 | finetune_preprocess/ | Parquet VQA dataset pushed to Yuan-avs/Nurisk |
| 6 | training/ (launcher + DeepSpeed configs) wrapping 2U1/Qwen-VL-Series-Finetune | LoRA-fine-tuned Qwen2.5-VL checkpoint (NuRisk-VLM-Agent) |

Setup

git clone https://github.com/TUM-AVS/NuRisk.git
cd NuRisk
pip install -r requirements.txt

Then edit config.py to point the path constants (XML_SCENARIOS_DIR, LOGS_DIR, VLM_DATASET_DIR, …) at your local data layout. A few individual scripts still carry hard-coded /home/yuan/... paths — search & replace before running them, or override via their CLI flags.
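
The hard-coded `/home/yuan/...` paths mentioned above can be located before running anything. The following is a minimal sketch, not part of the repository: the function name `find_hard_coded_paths` is hypothetical, and it only reports matches in `*.py` files so that you can decide how to replace them.

```python
from pathlib import Path

# Prefix named in the README; change it if your checkout uses another one.
HARD_CODED_PREFIX = "/home/yuan/"


def find_hard_coded_paths(repo_root, prefix=HARD_CODED_PREFIX):
    """Return (file, line_number, line) for each Python line containing prefix."""
    hits = []
    for py_file in sorted(Path(repo_root).rglob("*.py")):
        # errors="replace" keeps the scan going on files with odd encodings.
        text = py_file.read_text(encoding="utf-8", errors="replace")
        for line_number, line in enumerate(text.splitlines(), start=1):
            if prefix in line:
                hits.append((str(py_file), line_number, line.strip()))
    return hits
```

Calling `find_hard_coded_paths(".")` from the repository root lists every script that still needs a path fix or a CLI override.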

Or run everything in a container — there's a pre-configured CUDA + PyTorch + DeepSpeed image:

docker compose up -d nurisk-train
docker compose exec nurisk-train bash

See DOCKER_SETUP.md for GPU driver + container details.

Fetch the published VQA dataset (parquet + images, ~250 GB once decoded):

bash download_dataset.sh ./Nurisk_dataset
# or, in Python:
# from datasets import load_dataset; load_dataset("Yuan-avs/Nurisk")

Stage 1 — Scenario simulation with Frenetix

The full data-preparation pipeline (stages 1–4) is run on a curated collision-rich set of 1083 CommonRoad XML scenarios (the Frenetix dataset/scenarios_collision/ pool). After downstream cleaning in stage 5, 888 scenarios appear in the published Yuan-avs/Nurisk fine-tuning dataset. All 1083 IDs are listed in scenarios.txt — the 888 that survived into the final dataset are marked with *.

The XMLs themselves are not redistributed here. All scenarios used by this pipeline are publicly available in the official CommonRoad scenario database at https://commonroad.in.tum.de/scenarios/ — search by the IDs in scenarios.txt, download the XMLs, and point XML_SCENARIOS_DIR in config.py at the directory you place them in.
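
To work with only the scenarios that reached the published dataset, the starred entries in scenarios.txt can be separated from the rest. This is a sketch under an assumption: the exact file layout is not described here, so the code assumes one scenario ID per line with the `*` marker directly before or after the ID. The function name `parse_scenario_list` is hypothetical.

```python
def parse_scenario_list(lines):
    """Split scenario IDs into (all_ids, published_ids).

    Assumes one ID per line, with '*' adjacent to the ID marking the
    scenarios that survived into the published dataset.
    """
    all_ids, published_ids = [], []
    for raw in lines:
        entry = raw.strip()
        if not entry:
            continue  # skip blank lines
        scenario_id = entry.replace("*", "").strip()
        all_ids.append(scenario_id)
        if "*" in entry:
            published_ids.append(scenario_id)
    return all_ids, published_ids
```

With the real file, `parse_scenario_list(open("scenarios.txt", encoding="utf-8"))` should yield 1083 IDs in total and 888 published ones if the layout assumption holds; a different count is a sign that the file uses another convention.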

Run the Frenetix Motion Planner on every scenario.

Frenetix produces, per scenario:

  • logs.csv — ego state across time
  • BEV PNG snapshots per timestep (used in stage 5 as VQA image input)

Refer to the Frenetix README for installation and execution. The remainder of this pipeline assumes the per-scenario Frenetix log directories live under one root (configurable via LOGS_DIR in config.py).


Stage 2 — Trajectory extraction

cd Trajectory_collection
python main_multi.py

Input: Frenetix logs + raw CommonRoad XML scenario files

Output (per scenario):

  • ego_trajectory_positions_with_lanelets.csv
  • dynamic_obstacles_with_lanelets.csv
  • dynamic_obstacles.csv
  • ego_trajectory.csv
  • <scenario_name>.json (CommonRoad metadata)

Parallelism is set via the num_cpus constant in the script.


Stage 3 — Safety metrics

cd Safety_metrics_collection
python safety_multi.py

For every (ego, agent, timestep) tuple this computes:

  • Longitudinal / lateral relative distance
  • Relative velocity (front / rear / left / right)
  • Time-to-collision (TTC)

Output (per scenario): relative_metrics.csv
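
As an illustration of the quantities listed above, the sketch below projects the ego-to-agent offset onto the ego heading and computes a constant-velocity TTC. It is not the code in safety_multi.py: the function names, the sign conventions, and the constant-velocity assumption are all choices made for this example, and the actual script may account for vehicle dimensions or lanelet geometry.

```python
import math


def relative_offsets(ego_xy, ego_heading, agent_xy):
    """Project the ego-to-agent vector into the ego frame.

    Returns (longitudinal, lateral) in metres. Positive longitudinal means
    the agent is ahead of the ego; positive lateral means it is to the left.
    ego_heading is in radians, measured counter-clockwise from the x-axis.
    """
    dx = agent_xy[0] - ego_xy[0]
    dy = agent_xy[1] - ego_xy[1]
    cos_h, sin_h = math.cos(ego_heading), math.sin(ego_heading)
    longitudinal = dx * cos_h + dy * sin_h
    lateral = -dx * sin_h + dy * cos_h
    return longitudinal, lateral


def time_to_collision(gap, closing_speed):
    """Constant-velocity TTC in seconds.

    gap is the remaining distance in metres, closing_speed the rate at which
    that gap shrinks in m/s. Returns inf when the gap is not shrinking.
    """
    if closing_speed <= 0.0:
        return math.inf
    return max(gap, 0.0) / closing_speed
```

For example, an agent 10 m ahead that the ego approaches at 5 m/s gives `time_to_collision(10.0, 5.0) == 2.0`, while an agent pulling away gives `math.inf`.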


Stage 4 — Close-agent identification and risk scoring

4a. Filter to close obstacles

cd CloesdID_identification
python extract_close_metrics_multi.py

Output: close_relative_metrics.csv (per scenario)

4b. Compute risk scores with explanations

cd Riskscore_calculation
python risk_score_calculator_multi_enhanced.py

Output (per scenario):

  • risk_scores_output_enhanced.json — full per-agent reasoning trace
  • risk_scores_close_relative_metrics_enhanced.csv
  • risk_score_summary_enhanced.csv

Risk scale (set in config.py):

| Score | Label | Min. distance | TTC |
|---|---|---|---|
| 0 | Collision | < 0.3 m | < 0.15 s |
| 1 | Extreme | < 0.8 m | < 0.65 s |
| 2 | High | < 1.3 m | < 1.15 s |
| 3 | Medium | < 3.0 m | < 3.0 s |
| 4 | Low | < 5.0 m | < 5.0 s |
| 5 | Negligible | ≥ 5.0 m | ≥ 5.0 s |

Lateral vs. longitudinal weighting and ego dimensions (EGO_LENGTH = 4.508 m, EGO_WIDTH = 1.610 m) are also defined in config.py.
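
The thresholds in the risk scale can be read as a lookup. The sketch below scores distance and TTC separately against the table and keeps the more severe (lower) of the two scores. That combination rule is an assumption made for illustration: the actual calculator also applies the lateral vs. longitudinal weighting from config.py, which is not reproduced here, and the names `score_from_bounds` and `risk_score` are hypothetical.

```python
# Upper bounds for scores 0..4, taken from the risk-scale table.
# Values at or above the last bound score 5 (Negligible).
DISTANCE_BOUNDS_M = [0.3, 0.8, 1.3, 3.0, 5.0]
TTC_BOUNDS_S = [0.15, 0.65, 1.15, 3.0, 5.0]
LABELS = ["Collision", "Extreme", "High", "Medium", "Low", "Negligible"]


def score_from_bounds(value, bounds):
    """Return the first score whose upper bound exceeds value."""
    for score, bound in enumerate(bounds):
        if value < bound:
            return score
    return len(bounds)


def risk_score(min_distance, ttc):
    """Assumed rule: take the more severe of the distance and TTC scores."""
    score = min(
        score_from_bounds(min_distance, DISTANCE_BOUNDS_M),
        score_from_bounds(ttc, TTC_BOUNDS_S),
    )
    return score, LABELS[score]
```

Under this rule, an agent 4.0 m away with a TTC of 2.0 s scores 4 on distance and 3 on TTC, so `risk_score(4.0, 2.0)` returns `(3, "Medium")`.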


Stage 5 — VQA dataset creation and HF upload

cd finetune_preprocess
python create_groundtruth_dataset.py             # 1. link BEV images ↔ risk scores
python prepare_sequential_dataset.py             # 2. stack 5 consecutive BEV frames per sample
python create_sequential_groundtruth_dataset.py  # 3. attach risk labels to the 5-frame stacks
python create_qwen_finetune_dataset.py           # 4. emit LLaVA-style conversation pairs
python prepare_dataset_splits.py                 # 5. train / validation split
python convert_to_vqa_parquet_fixed.py           # 6. → parquet (image / question / answer)
python upload_vqa_to_hf.py --repo-name <user>/<repo>  # 7. push to the Hugging Face Hub
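
Step 2 groups consecutive BEV frames into 5-frame samples. A minimal sketch of that grouping follows; it is not taken from prepare_sequential_dataset.py. The stride of 1 (overlapping windows) and the name `sliding_windows` are assumptions, and the real script may use a different stride or skip incomplete sequences differently.

```python
def sliding_windows(frames, window=5, stride=1):
    """Return every run of `window` consecutive items from `frames`.

    Sequences shorter than `window` produce no samples.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [
        frames[start:start + window]
        for start in range(0, len(frames) - window + 1, stride)
    ]
```

A scenario with 7 timesteps yields 3 overlapping samples at stride 1, and a single sample at stride 5.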

The final artifact is the public dataset Yuan-avs/Nurisk — 60 train + 7 validation parquet shards with three columns per row: image (5-frame BEV stack), question (driving-risk prompt referencing a specific agent), answer (structured JSON with risk score, level, direction, reasoning).
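
To make the row layout concrete, the sketch below builds the question and answer fields for one agent. Everything specific in it is hypothetical: the prompt wording, the JSON key names (`risk_score`, `risk_level`, `direction`, `reasoning`), and the function name are placeholders chosen to mirror the description above, not the published schema. Inspect a real row of the dataset for the exact format.

```python
import json


def make_vqa_row(agent_id, score, level, direction, reasoning):
    """Build illustrative question/answer fields for one agent.

    The image column (5-frame BEV stack) is omitted; key names and prompt
    wording are placeholders, not the dataset's actual schema.
    """
    question = (
        f"Assess the collision risk that agent {agent_id} poses "
        "to the ego vehicle."
    )
    answer = json.dumps(
        {
            "risk_score": score,
            "risk_level": level,
            "direction": direction,
            "reasoning": reasoning,
        }
    )
    return {"question": question, "answer": answer}
```

Storing the answer as a JSON string keeps the parquet column a plain string while still letting evaluation code recover the structured fields with `json.loads`.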

See finetune_preprocess/README.md for a more detailed view of each sub-step.


Stage 6 — VLM fine-tuning

We fine-tune Qwen2.5-VL on the NuRisk dataset using 2U1/Qwen-VL-Series-Finetune. The launcher and DeepSpeed configs are bundled in training/; the upstream training code is referenced as a sibling clone (see training/README.md).

# 1. Fetch the dataset
bash download_dataset.sh ./Nurisk_dataset
# 2. Clone the upstream training code next to this repo
git clone https://github.com/2U1/Qwen-VL-Series-Finetune.git ../Qwen-VL-Series-Finetune
# 3. Register the local dataset in the upstream data_config.json, then launch:
bash training/train_risk_analysis.sh

Typical LoRA configuration used in the paper:

| Hyper-parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-VL-7B-Instruct |
| LoRA rank | 64 (best) / 256 (high-capacity variant) |
| LoRA target modules | q_proj, k_proj, v_proj, o_proj |
| Precision | bf16 |
| Sequence length | 2048 prompt / 1024 completion |
| DeepSpeed | ZeRO-3 (training/scripts/zero3.json) |

The resulting checkpoint is referred to in the paper as NuRisk-VLM-Agent.


Quick start (full pipeline)

After editing config.py:

python run_pipeline.py --stage all

Or run a single stage: --stage {trajectory, safety, close, risk, vml}.


Extending to other driving datasets

The same pipeline (stages 2–5) generalises to other sources of ego/agent trajectories + BEV imagery. To target a real-world driving dataset such as nuScenes or Waymo Open, simply swap stage 1 for that dataset's trajectory + BEV producer and keep stages 2–5 as-is — the risk scoring and VQA generation logic are dataset-agnostic.


Citation

@article{gao2025nurisk,
  title   = {NuRisk: A Visual Question Answering Dataset for Agent-Level Risk Assessment in Autonomous Driving},
  author  = {Gao, Yuan and Piccinini, Mattia and Brusnicki, Roberto and Zhang, Yuchen and Betz, Johannes},
  journal = {arXiv preprint arXiv:2509.25944},
  year    = {2025},
  url     = {https://arxiv.org/abs/2509.25944}
}

License

Released under the MIT License.
