Stars
Posters for all CVPR 2024 award papers (Highlight and Oral)
[CVPR 2025 Highlight] Official repository of MapDR dataset proposed in paper "Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map"
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)
[TPAMI2025&CVPR2024] Official PyTorch Implementation of SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation.
Code for ALBEF: a new vision-language pre-training method
This is a collection of our NAS and Vision Transformer work.
Visualizer for neural network, deep learning and machine learning models
[NeurIPS 2023] MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory
On-Device Training Under 256KB Memory [NeurIPS'22]
[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 2…
《Machine Learning Systems: Design and Implementation》- Chinese Version
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android App:[MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…
A playbook for systematically maximizing the performance of deep learning models.
"让爷康康" is a mobile AI app that detects poor sitting posture and gives spoken reminders
A toolbox of vision models and algorithms based on MindSpore