Stars
RF-DETR is a real-time object detection and segmentation model architecture developed by Roboflow, SOTA on COCO and designed for fine-tuning.
[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
The code for our newly accepted paper in Pattern Recognition 2020: "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection."
DualAnoDiff: Dual-Interrelated Diffusion Model for Few-Shot Anomaly Image Generation
[CVPR 2024 Highlight 🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
[EMNLP 2024 🔥] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
LAVIS - A One-stop Library for Language-Vision Intelligence
A state-of-the-art-level open visual language model | multimodal pretrained model
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…
(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis
PyTorch code and model checkpoints for Score identity Distillation (SiD) and its adversarial version (SiDA)
The official implementation of "One-for-More: Continual Diffusion Model for Anomaly Detection" (CVPR 2025)
PatchCore - easier implementation of this image-level anomaly detector in Python
Official repo for consistency models.
[ICML 2025] This is the official PyTorch implementation of "OmniBal: Towards Fast Instruction-Tuning for Vision-Language Models via Omniverse Computation Balance".
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
[CVPR 2025] FLAIR: VLM with Fine-grained Language-informed Image Representations
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
This project makes available the code and data from our NAACL paper: "Capturing Row and Column Semantics in Transformer Based Question Answering over Tables"
This repository contains source code for the TaBERT model, a pre-trained language model for learning joint representations of natural language utterances and (semi-)structured tables for semantic p…
Source code for "Large Self-Supervised Models Bridge the Gap in Domain Adaptive Object Detection"
A large annotated semantic parsing corpus for developing natural language interfaces.
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
[CVPR2025] AnomalyNCD: Towards Novel Anomaly Class Discovery in Industrial Scenarios. Paper is available at https://arxiv.org/abs/2410.14379
PyTorch code and models for the DINOv2 self-supervised learning method.
Reference PyTorch implementation and models for DINOv3
The collection of diffusion models for anomaly detection, a survey paper submitted to IJCAI 2025.
Official implementation of "RealNet: A Feature Selection Network with Realistic Synthetic Anomaly for Anomaly Detection (CVPR 2024)"