📄 Paper: arXiv:2505.13880
🗣️ Conference: Accepted to Interspeech 2025 🎉
Welcome to the official repository of U-SAM, an audio language model designed for unified speech, audio, and music understanding. U-SAM leverages powerful audio representations and language modeling to bridge multiple modalities in audio-centric tasks.
- 🧠 Unified architecture for speech, general audio, and music tasks
- 🔊 Supports a wide range of audio-language applications
- 🔧 Easily extensible and scalable
📚 Detailed documentation and pretrained models are coming soon. Stay tuned! 🔥
For questions or collaboration inquiries, feel free to open an issue or reach out via the email addresses listed in the paper.
If you find U-SAM useful in your research or work, please consider citing our paper:
```bibtex
@misc{wang2025usamaudiolanguagemodel,
      title         = {U-SAM: An Audio Language Model for Unified Speech, Audio, and Music Understanding},
      author        = {Ziqian Wang and Xianjun Xia and Xinfa Zhu and Lei Xie},
      year          = {2025},
      eprint        = {2505.13880},
      archivePrefix = {arXiv},
      primaryClass  = {eess.AS},
      url           = {https://arxiv.org/abs/2505.13880}
}
```