The official repository for SurgRAW, a Chain-of-Thought-driven, multi-agent framework delivering transparent and interpretable insights for robotic-assisted surgery.
Note: The full codebase and MCQ dataset will be released soon.
SurgRAW employs specialized prompts and a hierarchical orchestration system across five core surgical intelligence tasks:
- Instrument Recognition
- Action Recognition
- Action Prediction
- Patient Data Extraction
- Outcome Assessment
- Chain-of-Thought Agents – Task-specific prompts guide VLM agents through structured reasoning, reducing hallucinations and improving explainability.
- Hierarchical Orchestration – A Department Coordinator routes queries to visual-semantic or cognitive-inference agents, mirroring real surgical workflows (a minimal routing sketch follows this list).
- Panel Discussion – An Action Evaluator cross-checks visual-semantic predictions using a knowledge graph and rubric-based evaluation for logical consistency.
- Retrieval-Augmented Generation (RAG) – Cognitive-inference tasks are grounded in external medical knowledge for reliable, domain-specific responses.
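To make the orchestration concrete, below is a minimal Python sketch of coordinator-style routing. All names (`department_coordinator`, the agent and retrieval stubs) are hypothetical placeholders, not the released SurgRAW API:

```python
# Illustrative sketch only: every name here is a hypothetical placeholder,
# not the released SurgRAW API. It shows the coordinator idea: route each
# query to the visual-semantic or cognitive-inference branch, then
# cross-check visual predictions in a panel-discussion step.

VISUAL_SEMANTIC = {"instrument_recognition", "action_recognition", "action_prediction"}
COGNITIVE_INFERENCE = {"patient_data_extraction", "outcome_assessment"}


def visual_semantic_agent(frame_path: str, question: str) -> str:
    # Placeholder for a CoT-prompted VLM call on the surgical frame.
    return f"[visual-semantic answer to '{question}' for {frame_path}]"


def action_evaluator(draft_answer: str, question: str) -> str:
    # Placeholder for the panel-discussion step (knowledge-graph and rubric check).
    return draft_answer


def retrieve_medical_knowledge(question: str) -> str:
    # Placeholder retrieval step over an external medical knowledge source.
    return "retrieved medical context"


def cognitive_inference_agent(frame_path: str, question: str, context: str) -> str:
    # Placeholder for a RAG-grounded reasoning call.
    return f"[cognitive-inference answer to '{question}' grounded in: {context}]"


def department_coordinator(task: str, frame_path: str, question: str) -> str:
    """Route a query to the agent family responsible for its task category."""
    if task in VISUAL_SEMANTIC:
        draft = visual_semantic_agent(frame_path, question)
        return action_evaluator(draft, question)  # logical-consistency check
    if task in COGNITIVE_INFERENCE:
        context = retrieve_medical_knowledge(question)
        return cognitive_inference_agent(frame_path, question, context)
    raise ValueError(f"Unknown task category: {task}")


print(department_coordinator("instrument_recognition", "frames/frame_0001.png",
                             "Which instrument is visible?"))
```

In the full framework, each placeholder corresponds to a CoT-prompted VLM agent, the Action Evaluator, or the RAG retrieval step described in the bullets above.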
We evaluate SurgRAW on SurgCoTBench — the first reasoning-based dataset covering the entire surgical workflow.
- 12 robotic procedures
- 2,277 frames
- 14,176 vision–query pairs
- 5 task categories aligned with the SurgRAW framework
Release Plan: SurgCoTBench and the corresponding Chain-of-Thought prompts will be made available with our paper.
You may also use SurgCoTBench or any dataset that includes the following columns in its .xlsx file:
`image_path`, `question`, `ground_truth`
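As a minimal sketch (not part of the released code), such a dataset can be loaded and validated with pandas before running the pipeline; the file path below is hypothetical, and reading `.xlsx` files requires `openpyxl`:

```python
import pandas as pd

# Load the .xlsx dataset (requires openpyxl) and verify the expected columns.
df = pd.read_excel("data/my_dataset.xlsx")  # hypothetical path

required = {"image_path", "question", "ground_truth"}
missing = required - set(df.columns)
if missing:
    raise ValueError(f"Dataset is missing columns: {missing}")

# Each row pairs a surgical frame with a query for the pipeline.
for _, row in df.iterrows():
    print(row["image_path"], row["question"])
```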
This repository currently showcases:
- The SurgRAW agentic framework architecture
- Collaboration metrics
Dataset and full CoT prompt releases will follow publication. Collaborations are warmly welcomed.
Follow these steps to set up the SurgRAW environment:
# 1️⃣ Create a new conda environment
conda create -n SurgRAW python=3.12 -y
# 2️⃣ Activate the environment
conda activate SurgRAW
# 3️⃣ Install required Python packages
pip install -r requirements.txt
Ensure `requirements.txt` is in the project root.
For GPU support, install the PyTorch wheels that match your CUDA version, following the official PyTorch installation instructions.
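To verify the install, a quick sanity check using standard PyTorch calls (nothing SurgRAW-specific) confirms whether the CUDA wheel can see your GPU:

```python
# Verify the installed PyTorch build and GPU visibility.
import torch

print(torch.__version__)           # installed PyTorch version
print(torch.cuda.is_available())   # True only if the CUDA wheel matches your driver
```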
Run the orchestration pipeline on your `.xlsx` dataset using the provided script (which calls `final_orchestrator` under the hood).
python run_orchestration.py --xlsx_file /path/to/your/input.xlsx --log_dir /path/to/save/logs
Arguments
- `--xlsx_file` – Path to the Excel file with columns: `image_path`, `COT_Process`, `question_mcq`, `ground_truth` (optional)
- `--log_dir` – Directory where per-row logs (`*.txt`) will be written
Example
python run_orchestration.py --xlsx_file data/SurgCoTBench_sample.xlsx --log_dir logs/
Each row produces a dedicated log file named like:
`<image_name>_<COT_FileNamingConvention>_SurgCOT.txt`
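For reference, here is a small illustrative sketch of how such a log name can be assembled from a dataset row; the actual naming logic lives in `run_orchestration.py`:

```python
from pathlib import Path

def log_filename(image_path: str, cot_naming: str) -> str:
    """Build a per-row log name of the form <image_name>_<COT_FileNamingConvention>_SurgCOT.txt."""
    image_name = Path(image_path).stem            # e.g. "frame_0042" from "frames/frame_0042.png"
    return f"{image_name}_{cot_naming}_SurgCOT.txt"

# Example with hypothetical values:
print(log_filename("frames/frame_0042.png", "InstrumentRecognition"))
# -> frame_0042_InstrumentRecognition_SurgCOT.txt
```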
If you find this work useful, please cite our paper:
@article{low2025surgraw,
title={SurgRAW: Multi-agent workflow with chain-of-thought reasoning for surgical intelligence},
author={Low, Chang Han and Wang, Ziyue and Zhang, Tianyi and Zeng, Zhitao and Zhuo, Zhu and Mazomenos, Evangelos B and Jin, Yueming},
journal={arXiv preprint arXiv:2503.10265},
year={2025}
}
Have fun with our work!