Stars
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
MMICL, a state-of-the-art VLM with in-context learning (ICL) ability, from PKU
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Monash FIT5171 Assignment 1 Test Plan and Unit/Integration Testing on Airline Reservation System
An open-source framework for training large multimodal models.
A practical interactive interface for LLMs such as GPT/GLM, specially optimized for paper reading/polishing/writing. Modular design; supports custom shortcut buttons & function plugins; supports analysis & self-translation of Python, C++, and other projects; PDF/LaTeX paper translation & summarization; supports querying multiple LLMs in parallel; supports local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, m…
CerberusX / Sign-Language-Transformer
Forked from neccam/slt
Sign Language Transformers (CVPR'20)
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
Automatic login to the Southeast University (SEU) information portal and automatic daily health check-in, with a bonus GPA calculator. One-click deployment via GitHub Actions for automated check-in.