Truly open-source AI avatar (digital human) toolkit for offline video generation and digital human cloning.
Multi-modal Generative Media Skills for AI Agents (Claude Code, Cursor, Gemini CLI). High-quality image, video, and audio generation powered by muapi.ai.
Mano-P: Open-source GUI-VLA agent for edge devices. #1 on OSWorld (specialized, 58.2%). Runs locally on Apple M4 Mac mini/MacBook; no data leaves your device. Mano-P is an open-source GUI-VLA project that runs inference locally on a Mac mini/MacBook or via a compute stick, enabling purely vision-driven, cross-platform GUI automation. Data is processed entirely on-device, with support for planning and executing complex multi-step tasks.
OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.
Resources, examples & tutorials for multimodal AI, RAG and agents using vector search and LLMs
XiaoyaoSearch: Understands your words, reads your images, and finds any local file with AI. Making search as easy as chatting.
GPT-image-2 and seedance2 workflows and prompt templates to produce high-quality AI videos.
[ICML 2026] ByteDance's All-in-One Video Generation Model for Human-Object Interaction Video Generation
EVA OS: A real-time multimodal AIOS for next-generation hardware, enabling your devices to be "alive" and as intelligent as a real brain.
This GitHub repository contains the complete code for building Business-Ready Generative AI Systems (GenAISys) from scratch. It guides you through architecting and implementing advanced AI controllers, intelligent agents, and dynamic RAG frameworks. The projects demonstrate practical applications across various domains.
InferrLM - On-device AI for iOS & Android
Snappy's unique approach unifies vision-language late interaction with structured OCR for region-level knowledge retrieval. Like the project? Drop a star! ⭐
Server-side video workflows for agents: ingest, understand, search, edit, stream.
A proactive AI agent for secure, traceable, human-in-the-loop task execution over long-running workflows.
🎭 Real-time voice-controlled 3D avatar with multimodal AI - speak naturally and watch your AI companion respond with perfect lip-sync
A modern multimodal knowledge graph with type-specific metadata across biomedical domains.
Hub for researchers exploring VLMs and Multimodal Learning:)
A web app that dynamically generates playable 'Spot the Difference' games from a single text prompt using a multimodal pipeline with Google's Gemini and Imagen models.
AI-powered tool to turn long videos into short, viral-ready clips. Combines transcription, speaker diarization, scene detection & 9:16 resizing. Perfect for creators & smart automation.
[Nature Machine Intelligence] ImmunoStruct enables multimodal deep learning for immunogenicity prediction