Stars
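[AIGC Hands-On Beginner Notes: AIGC Skyscraper] Covers large language models (LLMs), efficient fine-tuning of large models (SFT), retrieval-augmented generation (RAG), agents (Agent), automatic PPT generation, role-playing, text-to-image (Stable Diffusion), optical character recognition (OCR), automatic speech recognition (ASR), text-to-speech (TTS), portrait segmentation (SA), multimodal models (VLM), AI face swapping (Face Swapping), text-to-video (VD), image-to…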
rCM: SOTA Diffusion Distillation & Few-Step Video Generation
CoTracker is a model for tracking any point (pixel) on a video.
A unified inference and post-training framework for accelerated video generation.
(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis
The ultimate training toolkit for finetuning diffusion models
[ICCV 2025] 🔥🔥 UNO: A Universal Customization Method for Both Single and Multi-Subject Conditioning
📹 A more flexible framework that generates videos at any resolution and creates videos from images.
Scaling Diffusion Transformers with Mixture of Experts
[CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
[CVPR 2024 Highlight] MIGC and [TPAMI 2024] MIGC++ (Official Implementation)
[ICLR 2025 Oral] Official code for "LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias"
[CVPR'24 Highlight & Best Demo Award] Gaussian Splatting SLAM
[ICCV'25 Best Paper Finalist] ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
[CVPR 2025 Highlight] VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
An introductory LLM tutorial for developers: the Chinese edition of Andrew Ng's large-model course series
The official implementation of "MagicColor: Multi-Instance Sketch Colorization"
Code release for https://kovenyu.com/WonderWorld/
Implementation of "Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation" from the CVPR 2024 Workshop on Human Motion Generation.
Implementation of the paper "CPTR: Full Transformer Network for Image Captioning"
PyTorch implementation of image captioning using a transformer-based model.
[NeurIPS 2023] MotionGPT: Human Motion as a Foreign Language, a unified motion-language generation model using LLMs
Official implementation of "MoMask: Generative Masked Modeling of 3D Human Motions" (CVPR 2024)
[EMNLP 2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"
AI-based tool for removing hard-coded subtitles and text-like watermarks from images and videos, producing cleaned image/video files at the original resolution. No third-party API is required; everything runs locally.
Just 1 minute of voice data is enough to train a good TTS model (few-shot voice cloning)!
Image inpainting tool powered by SOTA AI models. Remove unwanted objects, defects, or people from your pictures, or erase and replace anything in them (powered by Stable Diffusion).
[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
(IJCV 2024) Code for "AniClipart: Clipart Animation with Text-to-Video Priors"