- Changsha, China
- https://longcw.github.io
Stars
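🎦 Micam is an unofficial RTSP bridge service built specifically for Xiaomi cameras. It relays a Xiaomi camera's video stream locally to an RTSP server, so the camera can be used with NVR and smart-home systems such as HomeAssistant, Go2rtc, Frigate, Scrypted, and Homekit. The project deploys quickly with Docker Compose. It is based on Xiaomi's official Miloco and integrates Go2rtc to serve the RTSP stream. It runs without a GPU…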
A Fully Self-Hosted Solution for Full-Duplex Voice Interaction
Unlimited-length talking video generation that supports image-to-video and video-to-video generation
[ICCV 2025] Official PyTorch Implementation of FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait.
OpenAI Agents adapter for Livekit
A tool for Container Debloating that removes bloat and improves performance.
LiveKit Agent integrated with MCP server of Home Assistant
Turns any OpenAI voice agent into a lively visual agent with bitHuman SDK
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model with CPU ONNX and NVIDIA GPU PyTorch support, handling, and auto-stitching
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
A debugging and profiling tool that can trace and visualize python code execution
coredumpy saves your crash site for post-mortem debugging
A lightweight, powerful framework for multi-agent workflows
Voice activity detector (VAD) for the browser with a simple API
The complete stack for AI Engineers: framework, runtime and control plane.
[CVPR2025] We present StableAnimator, the first end-to-end ID-preserving video diffusion framework, which synthesizes high-quality videos without any post-processing, conditioned on a reference ima…
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
A powerful framework for building realtime voice AI agents 🤖🎙️📹
Human: AI-powered 3D Face Detection & Rotation Tracking, Face Description & Recognition, Body Pose Tracking, 3D Hand & Finger Tracking, Iris Analysis, Age & Gender & Emotion Prediction, Gaze Tracki…
Playground Web UI using segment-anything-2 models from Meta.
Fast and accurate automatic speech recognition (ASR) for edge devices
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Instant voice cloning by MIT and MyShell. Audio foundation model.