Stars
Tien Kung-Lab: Direct IsaacLab Workflow for Legged Robots
An open framework for blind navigation based on ESP32
Easily compute CLIP embeddings and build a CLIP retrieval system with them
The most accurate document search and store for building AI apps
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
Kimi K2 is the large language model series developed by the Moonshot AI team
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
Place a phone call from an AI agent with a single API call. Or call the bot directly from the configured phone number!
AI call center based on large language models and FreeSWITCH.
Based on models such as SparkTTS and OrpheusTTS, provides high-quality Chinese speech synthesis and voice cloning services.
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal is…
Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
GIM: Learning Generalizable Image Matcher From Internet Videos (ICLR 2024 Spotlight)
Code for "LoFTR: Detector-Free Local Feature Matching with Transformers", CVPR 2021, T-PAMI 2022
[ICLR 2025] Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation
[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
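A code-driven low-code builder: develop low-code apps on your own codebase.
⚡️HivisionIDPhotos: a lightweight and efficient AI tool for making ID photos.
text2vec, text to vector. A text vector representation tool that converts text into vector matrices and implements text representation and text similarity models such as Word2Vec, RankBM25, Sentence-BERT, and CoSENT, ready to use out of the box.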
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
Official implementation of the paper "DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models"
Instant voice cloning by MIT and MyShell. Audio foundation model.
🔊 Text-Prompted Generative Audio Model
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Fay is an agent framework that helps digital humans (2.5D, 3D, mobile, PC, web) or large language models (OpenAI-compatible, DeepSeek) connect to business systems.