Stars
Text-audio foundation model from Boson AI
AI app store powered by 24/7 desktop history. open source | 100% local | dev friendly | 24/7 screen, mic recording
Beautiful and accessible drag and drop for lists with React. Star to support our work!
A Conversational Speech Generation Model
coss.com is the new holding company of cal.com, the pioneers of open-source scheduling infrastructure; cal.com continues to be the 'Google Search' of our Alphabet.
A simple screen parsing tool towards pure vision based GUI agent
TypeScript style guide, formatter, and linter.
Fast and simple Node.js version manager, built in Rust
Scan for React performance issues and eliminate slow renders in your app
Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation
Mediapipe faceLandmarker demo project
[CVPR 2025] Official repository for "Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders"
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching
Human: AI-powered 3D Face Detection & Rotation Tracking, Face Description & Recognition, Body Pose Tracking, 3D Hand & Finger Tracking, Iris Analysis, Age & Gender & Emotion Prediction, Gaze Tracking…
Stable diffusion for real-time music generation
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
ML-powered speech recognition directly in your browser
An enterprise-class UI design language and React UI library
Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25 Spotlight).
Write PIXI apps using React declarative style
[ICCV 2025] Official impl. of "MV-Adapter: Multi-view Consistent Image Generation Made Easy"
A powerful tool that translates ComfyUI workflows into executable Python code.
[ECCV 2024] IDM-VTON: Improving Diffusion Models for Authentic Virtual Try-on in the Wild
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.