Organizations

@TJUCocktailParty


[ACM MM Award] AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset

Python 163 11 Updated Aug 3, 2025

Official implementation of the paper "ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification"

Python 12 Updated Mar 14, 2025

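A lightweight, flexible, and easy-to-use Python tool for generating and exporting JianYing (剪映) drafts, for building fully automated video editing and remixing pipelines. The CapCut version of this project is being developed at https://github.com/GuanYixuan/pyCapCut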

Python 2,095 431 Updated Sep 12, 2025

Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion

Python 2,002 224 Updated Sep 2, 2025

🚀 AI voice cloning: Clone a voice in 5 seconds to generate arbitrary speech in real-time

Python 36,709 5,272 Updated Nov 15, 2024

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 2,737 152 Updated Oct 9, 2025

MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining

Python 1,602 67 Updated Jun 5, 2025

HighRateMOS is the first non-intrusive MOS prediction model that explicitly models sampling rates, achieving first place in five out of eight metrics in AudioMOS Challenge 2025 Track3.

10 Updated Sep 15, 2025
Python 3 Updated Jun 30, 2025
Python 96 13 Updated Apr 29, 2025

This repo contains my attempt to create a Speaker Recognition and Verification system using SideKit-1.3.1

Python 113 33 Updated May 22, 2019

An implementation of local windowed attention for language modeling

Python 483 51 Updated Jul 16, 2025

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".

Python 15,337 2,188 Updated Jul 24, 2024

Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT

Python 222 13 Updated Aug 20, 2024

A PyTorch implementation of the Transformer model in "Attention is All You Need".

Python 9,462 2,052 Updated Apr 16, 2024

Mamba SSM architecture

Python 16,197 1,474 Updated Oct 10, 2025

[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Python 173 4 Updated Jun 6, 2025

[ACM CCS'24] SafeEar: Content Privacy-Preserving Audio Deepfake Detection

Python 165 18 Updated Mar 24, 2025

AudioTrust: Benchmarking the Multi-faceted Trustworthiness of Audio Large Language Models

Shell 208 22 Updated Sep 30, 2025

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Python 974 73 Updated Dec 23, 2024

A novel human-interaction method for real-time speech extraction on headphones.

Python 585 65 Updated Jun 5, 2024

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

Python 8,355 735 Updated Aug 13, 2024

📚 A from-scratch tutorial on the principles and practice of large language models

Jupyter Notebook 20,470 1,785 Updated Oct 17, 2025

Simple project webpage template. Originally used in Colorful Image Colorization. ECCV, 2016.

HTML 478 164 Updated Oct 20, 2020

Spark-TTS Inference Code

Python 10,634 1,130 Updated Apr 9, 2025

The most cited deep learning papers

TeX 26,015 4,463 Updated Jan 18, 2024

SoftVC VITS Singing Voice Conversion

Python 27,705 5,064 Updated Nov 11, 2023
Python 1,264 375 Updated Oct 5, 2025

Audio-JEPA is an adaptation of the Joint-Embedding Predictive Architecture (JEPA) for self-supervised audio representation learning. Built upon the I-JEPA paradigm, it uses a Vision Transformer (Vi…

Python 29 Updated Jun 17, 2025

Zero-shot voice conversion and singing voice conversion, with real-time support

Python 3,339 390 Updated Apr 20, 2025