Stars
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Implementation for the NEJM AI original article "Artificial Intelligence Identifies Factors Associated with Blood Loss and Surgical Experience in Cholecystectomy".
[CVPR 2025] MicroVQA eval and 🤖RefineBot code for "MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research"
[CVPR 2025] BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature
[MedIA'25] Learning multi-modal representations by watching hundreds of surgical video lectures
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
Downstream-Dino-V2: A GitHub repository featuring an easy-to-use implementation of the DINOv2 model by Facebook for downstream tasks such as Classification, Semantic Segmentation and Monocular dept…
MedLSAM: Localize and Segment Anything Model for 3D Medical Images
Official repository for the ICCV2023 paper "Kick Back & Relax: Learning to Reconstruct the World by Watching SlowTV"
Code for "Deconstructing Monocular Depth Reconstruction: The Design Decisions that Matter" (https://arxiv.org/abs/2208.01489)
[CVPR 2021] Self-supervised depth estimation from short sequences
TRI-ML Monocular Depth Estimation Repository
Official code repository for "Using deep learning to identify the recurrent laryngeal nerve during thyroidectomy", Scientific Reports 2021.
TensorFlow port of Image-to-Image Translation with Conditional Adversarial Nets https://phillipi.github.io/pix2pix/
Visualize a camera's pose from its extrinsic parameters by plotting a pyramid model in 3D space
VR-Caps: A Virtual Environment for Active Capsule Endoscopy
Official repo for the work titled "SharinGAN: Combining Synthetic and Real Data for Unsupervised Geometry Estimation"
Extremely stupid LabVIEW game (shameless ripoff of the flash helicopter game from the early 2000s)
PyTorch implementation for 3D Bounding Box Estimation Using Deep Learning and Geometry
EndoSLAM Dataset and an Unsupervised Monocular Visual Odometry and Depth Estimation Approach for Endoscopic Videos: Endo-SfMLearner
Collection of generative models, e.g. GAN, VAE in PyTorch and TensorFlow.
A technical report on convolution arithmetic in the context of deep learning