- Zhipu AI
- Beijing
Stars
GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
An Open Phone Agent Model & Framework. Unlocking the AI Phone for Everyone
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
TASU: A New Style of Alignment of Speech LLM with only Text Training Data, zero-shot on ASR and Other SU tasks
[EMNLP 2025 Demo] PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents that fully preserves layout; supports services such as Google/DeepL/Ollama/OpenAI and provides CLI/GUI/MCP/Docker/Zotero
FLOPs counter for neural networks in the PyTorch framework
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
MiMo-Audio: Audio Language Models are Few-Shot Learners
MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
Agent OS is a system for better planning and executing software development tasks with your AI agents.
Production-ready Claude subagents collection with 100+ specialized AI agents for full-stack development, DevOps, data science, and business operations.
Allosaurus is a pretrained universal phone recognizer for more than 2000 languages
A book about Text-to-Speech (TTS) in Chinese.
Recommends new arXiv papers of interest daily, based on your Zotero library.
MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting zero-shot multi-speaker voice cloning, and long-form speech…
Text-audio foundation model from Boson AI
The TTSDS benchmark evaluates synthetic speech quality by considering prosody, speaker identity, and intelligibility, comparing these factors with real speech and noise datasets.
Implementing DeepSeek R1's GRPO algorithm from scratch
verl: Volcano Engine Reinforcement Learning for LLMs
[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
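Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing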