- The Hong Kong Polytechnic University
- Hong Kong SAR, China
- [email protected]
Stars
[ACM MM Award] AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset
Official implementation of the paper "ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification"
A lightweight, flexible, and easy-to-use Python tool for generating and exporting JianYing (剪映) draft files, enabling fully automated video editing/remix pipelines. The CapCut version of this project is under development at https://github.com/GuanYixuan/pyCapCut
Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
🚀 AI voice mimicry: Clone a voice in 5 seconds to generate arbitrary speech content in real time
Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining
HighRateMOS is the first non-intrusive MOS prediction model that explicitly models sampling rates, achieving first place in five out of eight metrics in the AudioMOS Challenge 2025 Track 3.
This repo contains my attempt to create a Speaker Recognition and Verification system using SideKit-1.3.1
An implementation of local windowed attention for language modeling
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT
A PyTorch implementation of the Transformer model in "Attention is All You Need".
[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
[ACM CCS'24] SafeEar: Content Privacy-Preserving Audio Deepfake Detection
AudioTrust: Benchmarking the Multi-faceted Trustworthiness of Audio Large Language Models
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
A novel human-interaction method for real-time speech extraction on headphones.
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Simple project webpage template. Originally used in Colorful Image Colorization. ECCV, 2016.
The most cited deep learning papers
SoftVC VITS Singing Voice Conversion
Audio-JEPA is an adaptation of the Joint-Embedding Predictive Architecture (JEPA) for self-supervised audio representation learning. Built upon the I-JEPA paradigm, it uses a Vision Transformer (Vi…
zero-shot voice conversion & singing voice conversion, with real-time support