Stars
Scaling Spatial Intelligence with Multimodal Foundation Models
[AAAI2026] X-SAM: From Segment Anything to Any Segmentation
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of performing real-time speech generation.
Minimal library code to deploy XGBoost models in C++.
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
SkyReels-V2: Infinite-length Film Generative model
[ECCV 2022] ByteTrack: Multi-Object Tracking by Associating Every Detection Box
State-of-the-art 2D and 3D Face Analysis Project
This repository contains some utilities for using the H3DS dataset
The official PyTorch implementation of Towards Fast, Accurate and Stable 3D Dense Face Alignment, ECCV 2020.
Learning Disentangled Avatars with Hybrid 3D Representations. (Face, Body, Hair and Clothing)
[CVPR2023] A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images.
MoFaNeRF: Morphable Facial Neural Radiance Field (ECCV2022)
FaceScape (PAMI2023 & CVPR2020)
A Chinese NLP preprocessing and parsing toolkit: accurate, efficient, and easy to use. www.jionlp.com
Count the MACs / FLOPs of your PyTorch model.
The official repo for [NeurIPS'21] "ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias" and [IJCV'22] "ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Ima…
The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation" and [TPAMI'23] "ViTPose++: Vision Transformer for Generic Body Pose Estimation"
This is an official implementation of facial landmark detection for our TPAMI paper "Deep High-Resolution Representation Learning for Visual Recognition". https://arxiv.org/abs/1908.07919
Official implementation of AnimateDiff.
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
(MICCAI 2022) PyTorch implementation of Denoising of 3D MR images using a voxel-wise hybrid residual MLP-CNN model to improve small lesion diagnostic confidence.