CV
YOLOv6: a single-stage object detection framework dedicated to industrial applications.
LAVIS - A One-stop Library for Language-Vision Intelligence
ImageBind: One Embedding Space to Bind Them All
Text-to-Image generation. The repo for the NeurIPS 2021 paper "CogView: Mastering Text-to-Image Generation via Transformers".
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
[BIONLP@ACL 2024] XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models.
Data annotation toolbox that supports image, audio, and video data.
✨✨Latest Advances on Multimodal Large Language Models
Multimodal chatbot with computer vision capabilities integrated, our 1st-gen LMM
Chinese and English bilingual multimodal conversational language model
Benchmarking toolkit for patch-based histopathology image classification.
[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint), based on the CPM foundation models
Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)
A Large-Scale In-the-wild Dataset for Plant Disease Segmentation