- CSE PhD @ UC San Diego
- San Diego, CA, USA
- https://soumitri2001.github.io
- @soumitri2001
- https://scholar.google.com/citations?hl=en&user=AyMx6O4AAAAJ
Stars
Curate, Annotate, and Manage Your Data in LightlyStudio.
All-in-one training for vision models (YOLO, ViTs, RT-DETR, DINOv3): pretraining, fine-tuning, distillation.
Collection of Unsupervised Learning Methods for Vision-Language Models (VLMs)
Optical illusions using Stable Diffusion
RapidFire AI: Rapid AI Customization from RAG to Fine-Tuning
The official implementation of SAGANet: Video Object Segmentation-Aware Audio Generation (GCPR 2025) (Oral)
This repository contains demos I made with the Transformers library by HuggingFace.
[NeurIPS 2025 D&B] Open-source Multi-agent Poster Generation from Papers
Using Diffusion Models to Segment/Reconstruct Organs from Medical Images [AAAI Most influential Paper]
A curated list of prompt-based papers in computer vision and vision-language learning.
Collection of Composed Image Retrieval (CIR) papers.
Collection of awesome parameter-efficient fine-tuning resources.
A Unified Parameter-Efficient Transfer Learning Benchmark for Computer Vision Tasks
Tool for robust segmentation of >100 important anatomical structures in CT and MR images
A curated list of resources on implicit neural representations.
Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen
[ECCV 2024 Oral] Adaptive Correspondence Scoring for Unsupervised Medical Image Registration
Official codebase repository for the ECCV 2024 paper titled "UDA-Bench: Revisiting Common Assumptions in Unsupervised Domain Adaptation Using a Standardized Framework"
An AI focused photo manipulation tool based on Gradio
CUDA-accelerated rasterization of Gaussian splatting
This repository includes the code to download the curated HuggingFace papers into a single Markdown-formatted file
[ECCV 2024] Soft Prompt Generation for Domain Generalization
This repository contains a curated list of research papers and resources focusing on saliency and scanpath prediction, human attention, and human visual search.
The official repo of the Comics Survey: "A missing piece in Vision and Language: A Survey on Comics Understanding"
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
3D Slicer extension for Segment Anything Model (SAM) developed by Meta
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.org/abs/2404.12390 [ECCV 2024]