[EMNLP 2025 Demo] PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents with the layout fully preserved; supports Google/DeepL/Ollama/OpenAI and other services; provides CLI/GUI/MCP/Docker/Zotero

Python 30,702 2,749 Updated Nov 25, 2025

sovrasov / flops-counter.pytorch

Flops counter for neural networks in pytorch framework

Python 2,958 308 Updated Aug 20, 2025

QwenLM / Qwen3-Omni

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,140 192 Updated Oct 9, 2025

XiaomiMiMo / MiMo-Audio

MiMo-Audio: Audio Language Models are Few-Shot Learners

Python 905 87 Updated Sep 20, 2025

XiaomiMiMo / MiMo

MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining

Python 1,805 75 Updated Jun 5, 2025

index-tts / index-tts

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

Python 16,870 2,029 Updated Dec 2, 2025

xiquan-li / MeanAudio

MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows

Python 116 11 Updated Sep 2, 2025

buildermethods / agent-os

Agent OS is a system for better planning and executing software development tasks with your AI agents.

Shell 2,923 534 Updated Dec 11, 2025

VoltAgent / awesome-claude-code-subagents

Production-ready Claude subagents collection with 100+ specialized AI agents for full-stack development, DevOps, data science, and business operations.

6,076 650 Updated Dec 17, 2025

xinjli / allosaurus

Allosaurus is a pretrained universal phone recognizer for more than 2000 languages

Python 685 97 Updated Apr 26, 2024

cnlinxi / book-text-to-speech

A book about Text-to-Speech (TTS) in Chinese.

TeX 613 80 Updated Apr 19, 2022

TideDra / zotero-arxiv-daily

Recommend new arXiv papers of your interest daily according to your Zotero library.

Python 4,267 3,773 Updated Dec 17, 2025

OpenMOSS / MOSS-TTSD

MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting zero-shot multi-speaker voice cloning, and long-form speech…

Python 1,060 95 Updated Dec 8, 2025

boson-ai / higgs-audio

Text-audio foundation model from Boson AI

Python 7,754 577 Updated Sep 15, 2025

ttsds / ttsds

The TTSDS benchmark evaluates synthetic speech quality by considering prosody, speaker identity, and intelligibility, comparing these factors with real speech and noise datasets.

Python 72 5 Updated Sep 29, 2025

policy-gradient / GRPO-Zero

Implementing DeepSeek R1's GRPO algorithm from scratch

Python 1,716 81 Updated Apr 18, 2025

volcengine / verl

verl: Volcano Engine Reinforcement Learning for LLMs

Python 17,649 2,859 Updated Dec 20, 2025

hkchengrex / MMAudio

[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Python 2,017 237 Updated Nov 30, 2025

mapull / chinese-dictionary

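Chinese pinyin lexicon: Chinese character pinyin dictionary, word dictionary, idiom dictionary, and a dictionary database of common and polyphonic characters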

701 159 Updated Feb 4, 2025

zai-org / GLM-V

GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Python 2,060 139 Updated Dec 18, 2025

hankcs / HanLP

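Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and simplified-traditional Chinese conversion, natural language processing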

Python 36,001 10,882 Updated Nov 15, 2025

Tele-AI / TeleSpeech-ASR

Python 815 74 Updated Jun 7, 2024

Cocii Cocii

Stars

zai-org / GLM-TTS

zai-org / Open-AutoGLM

FunAudioLLM / CosyVoice

stepfun-ai / Step-Audio-R1

character-ai / Ovi

PigeonDan1 / ps-slm

loks666 / get_jobs

ddlBoJack / Speech-Resources

PDFMathTranslate / PDFMathTranslate