Stars
Visualizer for neural network, deep learning and machine learning models
gengyuchao / GLM-ASR
Forked from zai-org/GLM-ASR
GLM-ASR-Nano: A robust, open-source speech recognition model with 1.5B parameters
An Open Phone Agent Model & Framework. Unlocking the AI Phone for Everyone
Multilingual Voice Understanding Model
Spec-driven development for AI coding assistants.
Voice-to-text app for macOS to transcribe what you say to text almost instantly
🔥 Today's Hot List API: an API that aggregates trending data, with support for RSS mode and Vercel deployment | Frontend page: https://github.com/imsyy/DailyHot
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
The AI Browser Automation Framework
MineContext is your proactive context-aware AI partner (Context-Engineering + ChatGPT Pulse)
Voice-to-text dictation app with local Whisper models and OpenAI API. Privacy-first, cross-platform, global hotkey activated.
A codebase for data crawling and preprocessing for TTS and ASR systems training.
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
A feature-rich command-line audio/video downloader
MiMo-Audio: Audio Language Models are Few-Shot Learners
The Postgres development platform. Supabase gives you a dedicated Postgres database to build your web, mobile, and AI applications.
Production First and Production Ready End-to-End Speech Recognition Toolkit
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Examples and guides for using the OpenAI API
Lightweight coding agent that runs in your terminal
AgentCPM-GUI: An on-device GUI agent for operating Android apps, enhancing reasoning ability with reinforcement fine-tuning for efficient task execution.
MCP-Universe is a comprehensive framework designed for developing, testing, and benchmarking AI agents
LEAKED SYSTEM PROMPTS FOR CHATGPT, GEMINI, GROK, CLAUDE, PERPLEXITY, CURSOR, DEVIN, REPLIT, AND MORE! - AI SYSTEMS TRANSPARENCY FOR ALL! 👐
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Build resilient language agents as graphs.