sshan-zhao

MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models

Python · 69 stars · 2 forks · Updated Oct 21, 2025

Automatic Video Generation from Scientific Papers

Python · 1,250 stars · 160 forks · Updated Oct 20, 2025

Home page for the Microsoft Phi-Ground tech report

Python · 22 stars · Updated Sep 8, 2025

[ICCV 2025] MMReason, MLLMs, step by step, reasoning benchmark, AGI

14 stars · 1 fork · Updated Jul 15, 2025

EmoCapCLIP: Learning Transferable Facial Emotion Representations from Large-Scale Semantically Rich Captions

20 stars · Updated Jul 29, 2025

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

Python · 7,609 stars · 613 forks · Updated Oct 22, 2025

Python · 6 stars · Updated Jul 15, 2025

Building a comprehensive and handy list of papers for GUI agents

Python · 532 stars · 29 forks · Updated Oct 21, 2025

💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.

940 stars · 53 forks · Updated Aug 17, 2025

Agentic ADK is an Agent application development framework launched by Alibaba International AI Business, based on Google-ADK and Ali-LangEngine.

Java · 564 stars · 107 forks · Updated Oct 20, 2025

A unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerful framework.

Python · 428 stars · 14 forks · Updated Aug 4, 2025

Let's make video diffusion practical!

Python · 16,009 stars · 1,526 forks · Updated Oct 16, 2025

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Python · 2,061 stars · 117 forks · Updated Jul 29, 2024

The RealEstate10K dataset from https://google.github.io/realestate10k

Python · 48 stars · 5 forks · Updated Dec 23, 2020

Python · 118 stars · 7 forks · Updated Feb 28, 2025

[CVPR 2025 (Oral)] Open implementation of "RandAR"

Python · 197 stars · 6 forks · Updated Jul 14, 2025

[CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

Python · 1,475 stars · 79 forks · Updated Jun 24, 2025

[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…

Jupyter Notebook · 8,447 stars · 541 forks · Updated May 18, 2025

📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.

726 stars · 41 forks · Updated Oct 10, 2025

Official implementation of "Single Image Iterative Subject-driven Generation and Editing".

Python · 101 stars · 5 forks · Updated May 30, 2025

[ICCV 2025] FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing

Jupyter Notebook · 63 stars · 1 fork · Updated Sep 3, 2025

Stable Virtual Camera: Generative View Synthesis with Diffusion Models

Python · 1,479 stars · 105 forks · Updated Jun 5, 2025

Python · 7,993 stars · 561 forks · Updated Oct 23, 2025

This repository contains the official implementation of the research paper "FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization" (ICCV 2023).

Python · 1,962 stars · 118 forks · Updated Nov 30, 2023

The ultimate training toolkit for finetuning diffusion models

Python · 6,626 stars · 789 forks · Updated Oct 23, 2025

HunyuanVideo: A Systematic Framework For Large Video Generation Model

Python · 11,191 stars · 1,108 forks · Updated Aug 27, 2025

[ICLR 2025 Oral] TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio-Motion Embedding and Diffusion Interpolation

Python · 1,115 stars · 145 forks · Updated Aug 24, 2025

[ECCV 2024 Oral] PetFace: A Large-Scale Dataset and Benchmark for Animal Identification https://arxiv.org/abs/2407.13555

Python · 77 stars · 4 forks · Updated Jul 27, 2025

Inference and training library for high-quality TTS models.

Python · 5,451 stars · 580 forks · Updated Dec 10, 2024

This is a simple ComfyUI custom TTS node based on Parler_tts.

Python · 46 stars · 5 forks · Updated Jul 2, 2025