[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning

Python 883 65 Updated Sep 26, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 14,994 2,402 Updated Nov 1, 2025

Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]

Python 726 38 Updated Sep 19, 2025

A frontier collection and survey of vision-language model papers and model GitHub repositories. Continuously updated.

420 21 Updated Oct 31, 2025

VisualOverload is a VQA benchmark for image understanding in dense, high-resolution scenes.

Python 14 Updated Oct 6, 2025

The official repo for the paper "LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods".

479 22 Updated Jul 29, 2025
Jupyter Notebook 31 6 Updated Jun 25, 2025
Python 4,355 413 Updated Sep 14, 2025

A playbook for systematically maximizing the performance of deep learning models.

29,330 2,400 Updated Jun 18, 2024
Jupyter Notebook 31 Updated Feb 8, 2024

[ICCV25] Official Implementation of LeGrad

Python 82 8 Updated Oct 14, 2024

This repository contains demos I made with the Transformers library by HuggingFace.

Jupyter Notebook 11,320 1,685 Updated Jul 2, 2025

This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.

497 38 Updated Mar 18, 2025

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

Python 580 44 Updated Jun 7, 2024

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 23,870 2,657 Updated Aug 12, 2024

[ICCV 2025] Dynamic-VLM

Python 26 Updated Dec 16, 2024

This repository collects all relevant resources about interpretability in LLMs

377 26 Updated Nov 1, 2024

Interpretability for sequence generation models 🐛 🔍

Python 444 38 Updated Oct 29, 2025

Best practices & guides on how to write distributed PyTorch training code

Python 526 50 Updated Oct 22, 2025

A paper list of recent works on token compression for ViT and VLM

716 30 Updated Oct 21, 2025

Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!

Jupyter Notebook 1,732 95 Updated Oct 7, 2025

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

Python 3,287 527 Updated Oct 31, 2025
Python 355 12 Updated Jan 27, 2024
Python 26 1 Updated Oct 14, 2024

DocEnTr: An end-to-end document image enhancement transformer - ICPR 2022

Jupyter Notebook 176 36 Updated Jan 17, 2025

Paper list about multimodal and large language models, used only to record the papers I read from the daily arXiv listings for personal needs.

741 42 Updated Oct 20, 2025

[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.

Python 1,874 144 Updated Dec 30, 2024

✨✨Latest Advances on Multimodal Large Language Models

16,584 1,069 Updated Nov 1, 2025

👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]

Python 636 45 Updated Feb 29, 2024