Stars
SkyReels-V2: Infinite-length Film Generative model
Enjoy the magic of Diffusion models!
The official implementation of CVPR'25 Oral paper "Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise"
A web-based collaborative LaTeX editor
Official repo for [NeurIPS 2025 Spotlight] "GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution"
[ICCV'25] When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning
This project is the official implementation of 'DreamOmni2: Multimodal Instruction-based Editing and Generation'
[Nature Machine Intelligence 2025] This repository is the official implementation of the paper "A semantic-enhanced multi-modal remote sensing foundation model for Earth observation".
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers
Code of Paper OmniFuse: Composite Degradation-Robust Image Fusion with Language-Driven Semantics.
This is the official code of the NeurIPS 2024 paper "Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model"
Official Code of Text-IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image Fusion (CVPR2024)
A curated list of recent diffusion models for video generation, editing, and various other applications.
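This is my open-source work from studying Tsinghua University's course 70240403-200, Big Data Machine Learning. It includes implementations of past Assignments, notes on and interpretations of the Lectures, and implementations of the upcoming Project. Fellow students are welcome to study and discuss together to gain a better understanding of the material. The documentation can be read online: https://thu-coursework-machine-learning-for-big-data-docs.vercel.app/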
SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model
Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
Venus Collective Communication Library, supported by SII and Infrawaves.
ArtiMuse: Fine-Grained Image Aesthetics Assessment with Joint Scoring and Expert-Level Understanding(书生 · 妙析多模态美学理解大模型)
[ICCV 2025] MM-IFEngine: Towards Multimodal Instruction Following
Vision Manus: Your versatile Visual AI assistant
Aims to implement dual-port and multi-QP solutions in the DeepEP IBRC transport
UnrealZoo / unrealzoo-gym
Forked from zfw1226/gym-unrealcv
[ICCV 2025 Highlights] Large-scale photo-realistic virtual worlds for embodied AI
[GRSM] Project Page for "GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing"
【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models
[ICCV2025] PyTorch implementation of "Perceive, Understand and Restore: Real-World Image Super-Resolution with Autoregressive Multimodal Generative Models"