Starred repositories
Bytebot is a self-hosted AI desktop agent that automates computer tasks through natural language commands, operating within a containerized Linux desktop environment.
Label, clean and enrich text datasets with LLMs.
A modular active learning framework for Python
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android App: [MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…
A gallery that showcases on-device ML/GenAI use cases and allows people to try and use models locally.
You like pytorch? You like micrograd? You love tinygrad! ❤️
An Android ExoPlayer wrapper to simplify Audio and Video implementations
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
UI Library for Design Engineers. Animated components and effects you can copy and paste into your apps. Free. Open Source.
Autoware - the world's leading open-source software project for autonomous driving
Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen2.5, Qwen3, Llama, and more!
A native, user-mode, multi-process, graphical debugger.
Instant neural graphics primitives: lightning fast NeRF and more
File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.
Make beautiful isometric infrastructure diagrams
An open-source AI agent that brings the power of Gemini directly into your terminal.
An overhauled fork of the original Custom UI Editor for Microsoft Office, built with WPF
OmniGen2: Exploration to Advanced Multimodal Generation.
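An advanced chat application that demonstrates a unique conversational paradigm: the user's query is first debated and refined by two distinct AI personas before a final, synthesized answer is provided. The project uses the Google Gemini API to drive a logical AI (Cognito) and a skeptical AI (Muse), which collaborate to produce more robust, accurate, and rigorously vetted responses.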
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
LlamaIndex is the leading framework for building LLM-powered agents over your data.
Trained models with a fast variant of the "best" LSTM models, plus legacy models
Best (most accurate) trained LSTM models.
A high-performance runtime framework for modern robotics.
[ACL 2025] Graph-guided agentic framework for code localization https://arxiv.org/abs/2503.09089