Starred repositories
A recreation of Neuro-Sama originally created in 7 days.
Cook up amazing multimodal AI applications effortlessly with MiniCPM-o
A powerful data recovery utility for Linux with many advanced features based on Scott Dwyer's HDDSuperClone.
Official Implementation of LaViDa: A Large Diffusion Language Model for Multimodal Understanding
[ArXiv 2025] DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models
Adaptive FNO transformer - official PyTorch implementation
Large Kernel Vision Mamba UNet for Medical Image Segmentation
Official code for ICCV 2025 paper, X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation
GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image Generation.
[CVPR 2025] DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention
Dungeon procedural generator similar to whatabou's "One Page Dungeon"
[ICCV 2025] Enhancing spatial understanding in text-to-image diffusion models
[NeurIPS 2024] The official code of "U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers"
deepbeepmeep / Wan2GP
Forked from Wan-Video/Wan2.1
A fast AI Video Generator for the GPU Poor. Supports Wan 2.1/2.2, Qwen Image, Hunyuan Video, LTX Video and Flux.
Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.
Swiftly get tons of images from indexed tars on Huggingface
Hydra is a framework for elegantly configuring complex applications
Reflection Removal through Efficient Adaptation of Diffusion Transformers
Reference PyTorch implementation and models for DINOv3
Official PyTorch Implementation of "SVG-T2I: Scaling up Text-to-Image Latent Diffusion Model Without Variational Autoencoder".
[NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
A high-throughput and memory-efficient inference and serving engine for LLMs
verl: Volcano Engine Reinforcement Learning for LLMs
MoH: Multi-Head Attention as Mixture-of-Head Attention
D2 is a modern diagram scripting language that turns text to diagrams.