🤗 Transformers: the model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal tasks, for both inference and training.

Python 154,885 31,689 Updated Jan 9, 2026

huggingface / pytorch-image-models

The largest collection of PyTorch image encoders / backbones, including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…

Python 36,170 5,101 Updated Jan 9, 2026

hiyouga / LlamaFactory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 65,391 7,946 Updated Jan 9, 2026

google-research / vision_transformer

Jupyter Notebook 12,206 1,432 Updated Jan 9, 2026

ModelTC / LightX2V

Light Image Video Generation Inference Framework

Python 1,748 134 Updated Jan 9, 2026

ALEEEHU / World-Simulator

Simulating the Real World: Survey & Resources, which contains our survey "Simulating the Real World: A Unified Survey of Multimodal Generative Models" and Awesome-Text2X-Resources. Watch this repos…

321 13 Updated Jan 9, 2026

wangkai930418 / awesome-diffusion-categorized

A collection of diffusion model papers, categorized by subarea

2,108 97 Updated Jan 9, 2026

ccfddl / ccf-deadlines

⏰ Collaboratively track worldwide conference deadlines (website, Python CLI, WeChat applet) / If you find it useful, please star this project, thanks~

Rust 8,469 557 Updated Jan 9, 2026

helblazer811 / Diffusion-Explorer

Interactive visualizations of the geometric intuition behind diffusion models.

JavaScript 944 44 Updated Jan 8, 2026

zhengxuJosh / Awesome-RAG-Vision

Awesome-RAG-Vision: a curated list of advanced retrieval augmented generation (RAG) for Computer Vision

296 8 Updated Jan 8, 2026

scikit-image / scikit-image

Image processing in Python

Python 6,420 2,350 Updated Jan 8, 2026

wang-xinyu / tensorrtx

Implementation of popular deep learning networks with TensorRT network definition API

C++ 7,629 1,863 Updated Jan 8, 2026

lucidrains / vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch

Python 24,840 3,483 Updated Jan 8, 2026

Junjue-Wang / EarthVL

EarthVL: A Progressive Earth Vision-Language Understanding and Generation Framework

26 Updated Jan 8, 2026

Junjue-Wang / EarthVQA

[AAAI 2024] EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering

Python 148 4 Updated Jan 8, 2026

ChunmingHe / awesome-concealed-object-segmentation

323 14 Updated Jan 8, 2026

Sen Lei Shaosifan
