multimodal-ai

Here are 547 public repositories matching this topic...

duixcom / Duix-Avatar

🚀 Truly open-source AI avatar (digital human) toolkit for offline video generation and digital human cloning.

cloning video-generation digital-human cloning-tool ai-avatar ai-avatars video-synthesis multimodal-ai

Updated Apr 21, 2026
C

SamurAIGPT / Generative-Media-Skills

Multi-modal Generative Media Skills for AI Agents (Claude Code, Cursor, Gemini CLI). High-quality image, video, and audio generation powered by muapi.ai.

Updated May 19, 2026
Shell

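Mininglamp-AI / Mano-P

Mano-P: Open-source GUI-VLA agent for edge devices. #1 on OSWorld (specialized, 58.2%). Runs locally on Apple M4 Mac mini/MacBook; no data leaves your device. Mano-P is an open-source GUI-VLA project that runs inference locally on a Mac mini/MacBook or via a compute stick, enabling purely vision-driven, cross-platform GUI automation. All data is processed locally, and complex multi-step task planning and execution are supported.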

desktop-automation mano gui-automation edge-computing on-device-ai local-inference vision-language-action multimodal-ai gui-grounding osworld computer-use-agents visual-language-model mano-p

Updated May 22, 2026

waybarrios / vllm-mlx

OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.

Updated May 21, 2026
Python

lancedb / vectordb-recipes

Resources, examples & tutorials for multimodal AI, RAG and agents using vector search and LLMs

machine-learning ai deep-learning embeddings openai gpt agents fine-tuning multimodal rag vector-database llms langchain llama-index lancedb gpt-4-vision multimodal-ai

Updated Apr 24, 2026
Jupyter Notebook

dtsola / xiaoyaosearch

XiaoyaoSearch (小遥搜索): Understands your words, reads your images, and finds any local file with AI. Making search as easy as chatting.

productivity mcp natural-language local-search semantic-search file-search ai-search document-search multimodal-ai agent-skills

Updated May 22, 2026
Python

EvoLinkAI / GPT-Image-2-Seedance2-Workflow

GPT-image-2 and seedance2 workflows and prompt templates to produce high-quality AI videos.

Updated May 7, 2026
Python

Correr-Zhou / OmniShow

[ICML 2026] ByteDance's All-in-One Video Generation Model for Human-Object Interaction Video Generation

computer-vision deep-learning large-models icml dit video-generation multimodal-deep-learning diffusion-models aigc multimodal-ai visual-generation mmdit icml-2026

Updated May 19, 2026
Python

AutoArk / EVA-OS

EVA OS: A real-time multimodal AIOS for next-generation hardware, enabling your devices to be “alive” and as intelligent as a real brain.

real-time robotics webrtc smart-devices voice-assistant aios multimodal-ai

Updated Mar 17, 2026
TypeScript

Denis2054 / Building-Business-Ready-Generative-AI-Systems

This GitHub repository contains the complete code for building Business-Ready Generative AI Systems (GenAISys) from scratch. It guides you through architecting and implementing advanced AI controllers, intelligent agents, and dynamic RAG frameworks. The projects demonstrate practical applications across various domains.

multi-agent-systems ai-agents rag human-centered-ai llms chain-of-thought enterprise-ai agentic-ai ai-architecture multimodal-ai deepseek-r1 context-engineering generative-ai-systems

Updated Feb 11, 2026
Jupyter Notebook

sbhjt-gr / InferrLM

InferrLM - On-device AI for iOS & Android

embeddings gemini http-server openai document-processing rag edge-ai on-device-ai local-inference anthropic llamacpp llama-cpp local-llm gguf multimodal-ai

Updated May 5, 2026
TypeScript

athrael-soju / Snappy

🐊 Snappy's unique approach unifies vision-language late interaction with structured OCR for region-level knowledge retrieval. Like the project? Drop a star! ⭐

python docker typescript computer-vision nextjs document-retrieval rag fastapi vector-search document-understanding pdf-search vector-database vision-ai qdrant colpali multimodal-ai multivector-search deepseek-ocr visual-retrieval

Updated Feb 9, 2026
Python

video-db / skills

Server-side video workflows for agents: ingest, understand, search, edit, stream.

ai skills amp opencode video-processing perception codex vlm claude realtime-video multimodal-ai videodb claude-code

Updated May 12, 2026
Python

placet-io / facio

A proactive AI agent for secure, traceable, human-in-the-loop task execution over long-running workflows.

agent mcp opencode hermes harness human-in-the-loop multimodal approval-workflow ai-agent hitl agentic-workflow agentic-ai multimodal-ai claude-code traceable-ai auditable-ai openclaw openclaw-alternative proactive-agent

Updated May 22, 2026
Python

kiranbaby14 / TalkMateAI

🎭 Real-time voice-controlled 3D avatar with multimodal AI - speak naturally and watch your AI companion respond with perfect lip-sync

websocket nextjs vlm fastapi huggingface whisper-ai flash-attention-2 multimodal-ai kokoro-tts smolvlm

Updated Jul 5, 2025
TypeScript

mims-harvard / OptimusKG

A modern multimodal knowledge graph with type-specific metadata across biomedical domains.

python neo4j ontology knowledge-graph biomedical multimodal-data heterogeneous-graphs multimodal-ai graph-ai

Updated May 19, 2026
Python

thubZ09 / vision-language-model-research

Hub for researchers exploring VLMs and Multimodal Learning :)

nlp machine-learning research computer-vision deep-learning multimodal-learning multimodal-deep-learning vision-language multimodal-large-language-models vlms multimodal-ai

Updated Mar 24, 2026

seehiong / prompt-to-puzzle

A web app that dynamically generates playable 'Spot the Difference' games from a single text prompt using a multimodal pipeline with Google's Gemini and Imagen models.

react game typescript computer-vision html5-canvas puzzle-game generative-art text-to-image hackathon-project appwrite google-cloud-run generative-ai google-gemini google-ai-studio spot-the-difference multimodal-ai google-imagen

Updated Sep 13, 2025
TypeScript

alperensumeroglu / ai-clips-maker

AI-powered tool to turn long videos into short, viral-ready clips. Combines transcription, speaker diarization, scene detection & 9:16 resizing — perfect for creators & smart automation.

Updated Apr 2, 2025
Python

KrishnaswamyLab / ImmunoStruct

[Nature Machine Intelligence] ImmunoStruct enables multimodal deep learning for immunogenicity prediction

Updated Mar 5, 2026
Python
