Stars
We Speech Toolkit: an LLM-based speech toolkit for speech understanding, generation, and interaction
🤗 smolagents: a barebones library for agents that think in code.
DeepTalk: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE
✨✨[NeurIPS 2025] VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
Fast and memory-efficient exact attention
[SLT2024] DQ-Whisper: Joint Distillation and Quantization for Efficient Multilingual Speech Recognition
[ICASSP2024] One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models
[ICASSP2023] Joint Discriminator and Transfer Based Fast Domain Adaptation for End-to-End Speech Recognition
LUCY: Linguistic Understanding and Control Yielding Early Stage of Her
[MICCAI 2024] Can LLMs' Tuning Methods Work in Medical Multimodal Domain?
Fantastic Data Engineering for Large Language Models
✨✨Latest Advances on Multimodal Large Language Models
Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.
Stable Diffusion web UI