Stars
Truly universal encoding detector in pure Python.
Open-Source Chrome extension for AI-powered web automation. Run multi-agent workflows using your own LLM API key. Alternative to OpenAI Operator.
Noise suppression using deep filtering
Easy to use stem (e.g. instrumental/vocals) separation from CLI or as a Python package, using a variety of amazing pre-trained models (primarily from UVR)
A novel media player that allows you to navigate by speaker
⚡ Accelerate speaker diarization with Senko, processing 1 hour of audio in just 5 seconds on powerful hardware—boost your audio analysis efficiency.
LLM story writer with a focus on high-quality long output based on a user-provided prompt.
A ComfyUI custom node integration for multi-engine multi-language Text-to-Speech and Voice Conversion. Supports: RVC, IndexTTS-2, Chatterbox (classic and multilingual 23-lang), F5-TTS, Higgs Audio …
Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), predefined voices, voice cloning, and large audiobook-scale…
Modified version of Chatterbox that accepts text files as input and has no character restrictions. I use it to make audiobooks, especially for my kids.
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal is…
Video translation and dubbing tool powered by LLMs. The video translator offers 100 language translations and one-click full-process deployment. The video translation output is optimized for platfo…
Intelligent multilingual AI video dubbing and translation tool - Linly-Dubbing: "AI-empowered, language without borders"
A Flask-based web app that uses OpenAI's Whisper model to transcribe audio and video files. Supports various file formats and generates timestamped .srt files.
A small wrapper package around whisper-timestamped. Create force-aligned transcription TextGrids from raw audio!
Gradio WebUI for whisper, faster-whisper, whisper-timestamped. Supports YouTube Downloader, Vocal Remover and Transcription.
A robust audio transcription tool using OpenAI's Whisper API. Handles files of any length by automatically splitting them into chunks, with progress tracking and timestamped output.
Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer(RVC), zero-shot Voice Cloning (E2, F5-TTS), YouTub…
🎬 Clipify: Instantly transform long videos into engaging, social media-ready clips with cutting-edge AI technology.
[CVPR-2025] The official code of HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation
Phoneme Recognition using pre-trained models Wav2vec2, HuBERT and WavLM. Throughout this project, we compared specifically three different self-supervised models, Wav2vec (2019, 2020), HuBERT (2021…
An open-source RAG-based tool for chatting with your documents.
The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra
Tired of boring PDFs? Want to inject some chaotic energy into your documents? PDF2BRAINROT is here to help! This script takes your standard PDF files and transforms them into dynamic, attention-gra…
Audio Reactivity Nodes for ComfyUI 🔊 Create AI generated audio-driven animations. Compatible with IPAdapter, ControlNets, AnimateDiff...