- gerzz.inc
- shanghai
- dubbing-ai.com dubbingai.io
Stars
SpikeMamba presents a novel integration of spiking neural networks (SNNs) with the Mamba state space model architecture, investigating the potential for biologically inspired temporal dynamics in l…
Resources to develop programming and software development skills
Collection of extracted System Prompts from popular chatbots like ChatGPT, Claude & Gemini
[EMNLP 2025 Demo] PDF scientific paper translation with preserved formats - AI-powered full-text bilingual translation of PDF documents that fully preserves the layout; supports Google/DeepL/Ollama/OpenAI and other services; provides CLI/GUI/MCP/Docker/Zotero
Official implementation: "AudioSet-R: A Refined AudioSet with Multi-Stage LLM Label Reannotation"
[ICML 2025] PyTorch Implementation of "OmniAudio: Generating Spatial Audio from 360-Degree Video"
Towards Fine-grained Audio Captioning with Multimodal Contextual Cues
This is a Phi Family of SLMs book for getting started with Phi models. Phi is a family of open-source AI models developed by Microsoft. Phi models are the most capable and cost-effective small langua…
[ICASSP 2025] "FLowHigh: Towards efficient and high-quality audio super-resolution with single-step flow matching"
Repository for the ACL 2023 paper: Unbalanced Optimal Transport for Unbalanced Word Alignment
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
Implementing DeepSeek R1's GRPO algorithm from scratch
SkyReels-V2: Infinite-length Film Generative model
A list of publicly available room impulse response datasets and scripts to download them.
The dataset comprises a 5,000-hour speech corpus in Akan, Ewe, Dagbani, Daagare, and Ikposo. Each language includes 1,000 hours of audio speech from indigenous speakers of the language. Of which 10…
A free, licensed, and industrial animation dataset
Python tool for converting files and office documents to Markdown.
[TACL 2024] MAPS enables LLMs🤖 to mimic the human😁 translation process.
Official implementation of paper: Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-to-Speech
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation
A Chinese Expressive Long-dialogue Speech Dataset with Scripts
A piano music dataset with Audio, Symbolic and Text labels
A neural network layer API and library for sequence modeling, designed for easy creation of sequence models that can be executed layerwise (training) and stepwise (sampling).