Stars
[IGARSS 2025 Oral] A Simple Aerial Detection Baseline of Multimodal Language Models.
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…
Janus-Series: Unified Multimodal Understanding and Generation Models
A curated list for vision-and-language navigation. ACL 2022 paper "Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions"
Codebase for Automated Creation of Digital Cousins for Robust Policy Learning
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
The paper list of the review on LLMs in medicine - "Large Language Models Illuminate a Progressive Pathway to Artificial Healthcare Assistant: A Review".
From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Label Studio is a multi-type data labeling and annotation tool with standardized output format
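RAGOnMedicalKG: combines LLM-based RAG with a knowledge graph (KG) to deliver demo-level question answering, intended to illustrate a basic approach.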
An LLM-powered repository agent designed to assist developers and teams in generating documentation and understanding repositories quickly.
A Next-Generation Training Engine Built for Ultra-Large MoE Models
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
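A tutorial and implementation of a disease-centered medical knowledge graph and a QA system built on it. Covers knowledge graph construction and KG-based automatic question answering: a disease-centered medical-domain knowledge graph of moderate scale, used to provide automatic question answering and analysis services.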
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
Implementation of Nougat Neural Optical Understanding for Academic Documents
Medical NLP Competition, dataset, large models, paper
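A collection of public resources for Chinese medical NLP: terminologies/corpora/word embeddings/pretrained models/knowledge graphs/named entity recognition/QA/information extraction/models/papers/etc.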
Open-source coding assistant for Visual Studio Code. Connect to LLMs from OpenAI or Google.
Simple samples for TensorRT programming