tkpham3105
Better, Stronger, Faster: Tackling the Trilemma in MLLM-based Segmentation with Simultaneous Textual Mask Prediction

Python 32 Updated Jan 6, 2026

(ICCV 2025) "Principal Components" Enable A New Language of Images

Jupyter Notebook 78 6 Updated Jul 28, 2025

Official code for the CVPR 2025 paper "Navigation World Models".

Python 518 46 Updated Nov 24, 2025

Download AudioSet data quickly with youtube-dl, ffmpeg, and Python multiprocessing

Python 44 Updated Aug 1, 2024
Jupyter Notebook 144 8 Updated Jun 20, 2025

This is a repository to collect training-free algorithms for visual generation and manipulation

200 7 Updated Jan 13, 2026

Visualization of DiT self attention features

Python 235 10 Updated Aug 12, 2024

Official PyTorch implementation of "DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video" (ECCV 2024)

Python 548 53 Updated Nov 23, 2024

[ICCV 2025] Official implementation of the paper: REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers

Python 446 21 Updated Dec 6, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Python 15,209 2,338 Updated Dec 15, 2025
Python 14 3 Updated Mar 24, 2025

OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340

Jupyter Notebook 4,301 370 Updated Dec 4, 2025

[CVPR 2024] Code release for "InstanceDiffusion: Instance-level Control for Image Generation"

Python 607 32 Updated Jun 17, 2025

Let's finetune video generation models!

Python 533 29 Updated Sep 15, 2025

Fine-tune Stable Audio Open with DiT ControlNet.

Python 249 9 Updated May 16, 2025

Improved Implementation for Training GLIGEN: Open-Set Grounded Text-to-Image Generation

Python 46 8 Updated Jun 1, 2024

Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)

Python 104 9 Updated Sep 15, 2025

Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities

Python 150 13 Updated Dec 5, 2024

Get up and running with OpenAI GLM-4.7, DeepSeek, gpt-oss, Qwen, Gemma and other models.

Go 160,669 14,292 Updated Jan 26, 2026

[ICLR 2024] LLM-grounded Video Diffusion Models (LVD): official implementation for the LVD paper

Python 164 8 Updated May 7, 2024

[ECCV 2024 Oral] Audio-Synchronized Visual Animation

Python 57 1 Updated Sep 12, 2024

Lumina-T2X is a unified framework for Text to Any Modality Generation

Python 2,248 94 Updated Feb 16, 2025

[ACM MM 2024] Training-free Cross-domain Image Composition via Adaptive Latent Manipulation and Energy-guided Optimization

Python 20 1 Updated Dec 15, 2024

[NIPS2023] Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

Jupyter Notebook 297 18 Updated Mar 14, 2024

Text-to-Audio/Music Generation

Python 2,573 205 Updated Sep 29, 2024

[CSUR] A Survey on Video Diffusion Models

2,263 113 Updated Jun 27, 2025

[TMLR 2025] Latte: Latent Diffusion Transformer for Video Generation.

Python 1,908 190 Updated Oct 30, 2025

[CVPR 2024] Official repository for "MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model"

Python 10,898 1,100 Updated Aug 29, 2025

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Python 1,894 95 Updated Oct 31, 2024