View TempleX98's full-sized avatar


Official Repo for Rolling Forcing: Autoregressive Long Video Diffusion in Real Time

Python 181 4 Updated Oct 23, 2025

ULMEvalKit: One-Stop Eval ToolKit for Image Generation

Python 45 1 Updated Oct 22, 2025

TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis

Python 106 4 Updated Sep 5, 2025

Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)

Python 2,763 195 Updated Sep 12, 2025

Text-audio foundation model from Boson AI

Python 7,534 555 Updated Sep 15, 2025

Awesome lists about framework figures in papers

902 23 Updated Aug 27, 2025

Code for PhysDreamer

Python 590 26 Updated Feb 10, 2025

The official SpeakerVid-5M data curation code.

Python 48 3 Updated Jul 23, 2025

(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models

Python 999 53 Updated Aug 7, 2025

[NeurIPS 2025] T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

Python 410 24 Updated Sep 18, 2025

Scalable and memory-optimized training of diffusion models

Python 1,291 137 Updated Jun 4, 2025

Solve Visual Understanding with Reinforced VLMs

Python 5,659 364 Updated Oct 21, 2025

Fully open reproduction of DeepSeek-R1

Python 25,585 2,399 Updated Sep 8, 2025

Official repository for LTX-Video

Python 8,606 781 Updated Oct 25, 2025

[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation

Python 4,333 506 Updated Aug 11, 2025

StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation

Python 10,486 807 Updated Dec 4, 2024

[ICML 2025] EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

Jupyter Notebook 68 Updated Jul 16, 2025

Efficient Triton Kernels for LLM Training

Python 5,783 421 Updated Oct 28, 2025

[Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning

Python 389 20 Updated Dec 22, 2024

[ICLR 2025] The First Multimodal Search Engine Pipeline and Benchmark for LMMs

Python 479 31 Updated Jan 23, 2025

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Python 3,238 407 Updated Oct 29, 2025

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 15,721 1,231 Updated Oct 27, 2025

[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context

Python 167 4 Updated Sep 25, 2024

[NeurIPS 2024] 💫CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Python 166 8 Updated Nov 18, 2024
Python 8,653 514 Updated Oct 9, 2024
Python 796 46 Updated Jul 8, 2024

[ICCV 2023] GeoMIM: towards better 3d knowledge transfer via masked image modeling for multi-view 3d understanding

Python 49 Updated Aug 28, 2023

The code repository for "RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object Detection" (ACM MM'21)

Python 11 1 Updated Oct 23, 2021

[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of the Open World

Python 499 17 Updated Aug 9, 2024

[ICCV 2023] Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction

Python 193 14 Updated Aug 24, 2023