- Tübingen, Germany
- https://soumyasj.github.io/
- @soumyasj2222
Stars
[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning
verl: Volcano Engine Reinforcement Learning for LLMs
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
A frontier collection and survey of vision-language model papers and model GitHub repositories. Continuously updated.
VisualOverload is a VQA benchmark for image understanding in dense, high-resolution scenes.
The official repo for the paper "LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods".
A playbook for systematically maximizing the performance of deep learning models.
This repository contains demos I made with the Transformers library by HuggingFace.
This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
This repository collects all relevant resources about interpretability in LLMs
Interpretability for sequence generation models 🐛 🔍
Best practices & guides on how to write distributed pytorch training code
A paper list of recent works on token compression for ViT and VLM
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
DocEnTr: An end-to-end document image enhancement transformer - ICPR 2022
Paper list about multimodal and large language models, only used to record papers I read in the daily arxiv for personal needs.
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
✨✨Latest Advances on Multimodal Large Language Models
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]