Stars
MCP server for video/audio processing via FFmpeg - convert, compress, trim, extract audio, add subtitles
A polyglot document intelligence framework with a Rust core. Extract text, metadata, and structured information from PDFs, Office documents, images, and 50+ formats. Available for Rust, Python, Rub…
Transcription, forced alignment, and audio indexing with OpenAI's Whisper
An open-source AI agent that lives in your terminal.
Undetected Python version of the Playwright testing and automation library.
Long-form streaming TTS system for multi-speaker dialogue generation
Qwen-TTS offers a robust voice synthesis service built on FastAPI, with bilingual and dialect support. Explore seamless audio generation on GitHub! 🚀🌟
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
Undetected version of the Playwright testing and automation library.
AI agents can now use real Android and iOS apps, just like a human.
A Python module to bypass Cloudflare's anti-bot page.
Proxy server to bypass Cloudflare protection
A self-contained, lightweight workflow engine with a built-in Web UI. Define workflows in a simple, declarative YAML format. Execute them anywhere, compose complex pipelines, and distribute tasks. …
Get started building full-stack agents using Gemini 2.5 and LangGraph
⚛️ A feature-rich notifications library for React Native.
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
ACE-Step: A Step Towards Music Generation Foundation Model
A TTS model capable of generating ultra-realistic dialogue in one pass.
A SOTA open-source image editing model that aims to match the performance of closed-source models such as GPT-4o and Gemini 2 Flash.
Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, load balancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthr…
Model Context Protocol Server for Mobile Automation and Scraping (iOS, Android, Emulators, Simulators and Real Devices)
Let's make video diffusion practical!
[NeurIPS 2025] OmniSVG is the first family of end-to-end multimodal SVG generators that leverage pre-trained Vision-Language Models (VLMs), capable of generating complex and detailed SVGs, from sim…