jpWang
  • South China University of Technology
  • Guangzhou, China

Organizations

@SCUT-DLVCLab


🎯 Read research papers faster with AI. Resophy is an HTML-based AI paper reader with: 🤖 AI Translation & Analysis — instantly understand structure, contributions, and results 🚀 Daily arXiv Recommen…

Python 96 3 Updated Dec 18, 2025

Scalable toolkit for efficient model reinforcement

Python 1,163 201 Updated Dec 23, 2025

Official implementation of URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding (AAAI 2026 Oral).

31 Updated Nov 14, 2025

Tongyi Deep Research, the Leading Open-source Deep Research Agent

Python 17,697 1,355 Updated Dec 17, 2025

This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & V…

1,227 71 Updated Mar 9, 2025

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Python 1,443 122 Updated Dec 22, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,154 193 Updated Oct 9, 2025

Ongoing research training transformer models at scale

Python 14,673 3,404 Updated Dec 23, 2025

ICASSP2026 HumDial Challenge

Python 28 3 Updated Dec 13, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 17,709 2,868 Updated Dec 23, 2025

MiMo-Audio: Audio Language Models are Few-Shot Learners

Python 908 87 Updated Sep 20, 2025

MCP for xiaohongshu.com

Go 7,598 1,192 Updated Dec 21, 2025

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 16,343 3,243 Updated Dec 22, 2025

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python 1,276 92 Updated Sep 22, 2025

OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.

Python 461 30 Updated Nov 23, 2025

Official PyTorch code for Deep Audio-Signal Holistic Embeddings

Python 171 12 Updated Nov 7, 2025

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 2,449 332 Updated Dec 22, 2025

Open-source framework for conversational voice AI agents

Python 9,370 1,098 Updated Dec 22, 2025

Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

Python 744 104 Updated Dec 2, 2025

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr…

C++ 9,415 1,040 Updated Dec 22, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 2 Updated Nov 22, 2025

Python library for audio and music analysis

Python 8,090 1,023 Updated Sep 16, 2025

This repository includes the code to reproduce our paper "Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation".

Python 154 36 Updated Sep 26, 2023

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 14,095 1,462 Updated Dec 19, 2025

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit

Python 1,123 169 Updated Dec 17, 2025

High-quality and streaming Speech-to-Speech interactive agent in a single file. A streaming, full-duplex voice interaction prototype agent implemented in just one file!

Python 481 51 Updated Dec 15, 2025

Efficient audio understanding with general audio captions

Python 391 39 Updated Nov 3, 2025

A Python library for audio data augmentation. Useful for making audio ML models work well in the real world, not just in the lab.

Python 2,196 207 Updated Sep 26, 2025

Production First and Production Ready End-to-End Speech Recognition Toolkit

Python 4,964 1,170 Updated Dec 19, 2025