zaiquanyang

A paper list of Awesome Latent Space.

258 10 Updated Dec 26, 2025

Code for paper "SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models"

Python 45 4 Updated Oct 29, 2025

SGLang is a fast serving framework for large language models and multi-modality models.

Python 22,018 3,879 Updated Dec 27, 2025

[AAAI26] LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs

Python 49 5 Updated Dec 7, 2025

dInfer: An Efficient Inference Framework for Diffusion Language Models

Python 374 35 Updated Dec 23, 2025

My learning notes for ML SYS.

Python 4,824 310 Updated Dec 24, 2025

Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give it a star 🌟 if you find it useful.

Python 198 7 Updated Oct 12, 2025

The official implementation of paper “VChain: Chain-of-Visual-Thought for Reasoning in Video Generation”

110 1 Updated Oct 7, 2025

LLaVA-Next for STVG

Python 15 2 Updated Dec 5, 2025

🚀 LLM-I: Transform LLMs into natural interleaved multimodal creators! ✨ Tool-use framework supporting image search, generation, code execution & editing

Python 37 1 Updated Oct 20, 2025

Fully Open Framework for Democratized Multimodal Training

Python 665 53 Updated Dec 27, 2025

Official implementation of "Diffusion Language Models Know the Answer Before Decoding"

Python 42 Updated Sep 8, 2025

Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"

Python 581 27 Updated Jul 30, 2025

Awesome List for Agentic RL

HTML 665 27 Updated Dec 9, 2025

[AAAI 2026] ✨ TSPO: Temporal Sampling Policy Optimization for Long-form Video Language Understanding

Python 109 8 Updated Nov 12, 2025

Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning"

Python 108 6 Updated Dec 7, 2025

📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥

1,858 78 Updated Dec 6, 2025
Python 205 24 Updated Jul 25, 2025

[NeurIPS 2025] Efficient Reasoning Vision Language Models

Python 440 29 Updated Sep 18, 2025

Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning

Python 1,107 78 Updated Nov 25, 2025

A MemAgent framework that can be extrapolated to 3.5M, along with a training framework for RL training of any agent workflow.

Python 843 58 Updated Jul 31, 2025

Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)

Python 680 25 Updated Sep 24, 2025

[NeurIPS 2025] OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding

Python 69 1 Updated Sep 29, 2025
Python 4 Updated Jul 11, 2025

[ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis.

Python 171 10 Updated Sep 26, 2025

Deep Video Discovery (DVD) is a deep-research style question answering agent designed for understanding extra-long videos.

Python 323 7 Updated Nov 3, 2025

verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in-Group Policy Optimization for LLM Agent Training"

Python 1,324 117 Updated Dec 11, 2025

This is the official implementation of ICCV 2025 "Flash-VStream: Efficient Real-Time Understanding for Long Video Streams"

Python 256 18 Updated Oct 15, 2025

StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding

Python 140 6 Updated May 16, 2025

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Python 2,145 135 Updated Dec 15, 2025