📄 Paper: arXiv:2505.13880
🗣️ Conference: Accepted to Interspeech 2025 🎉
Welcome to the official repository of U-SAM, an audio language model designed for unified speech, audio, and music understanding. U-SAM leverages powerful audio representations and language modeling to bridge multiple modalities in audio-centric tasks.
- 🧠 Unified architecture for speech, general audio, and music tasks
- 🔊 Supports a wide range of audio-language applications
- 🔧 Easily extensible and scalable
📚 Detailed documentation and pretrained models are coming soon. Stay tuned! 🔥
For questions or collaboration inquiries, feel free to open an issue or reach out via the email addresses listed in the paper.
If you find U-SAM useful in your research or work, please consider citing our paper:
```bibtex
@misc{wang2025usamaudiolanguagemodel,
      title         = {U-SAM: An Audio Language Model for Unified Speech, Audio, and Music Understanding},
      author        = {Ziqian Wang and Xianjun Xia and Xinfa Zhu and Lei Xie},
      year          = {2025},
      eprint        = {2505.13880},
      archivePrefix = {arXiv},
      primaryClass  = {eess.AS},
      url           = {https://arxiv.org/abs/2505.13880}
}
```