
# PAL: Probing Audio Encoders via LLMs

This repository contains the code and other resources for the paper *PAL: Probing Audio Encoders via LLMs - A Study of Information Transfer from Audio Encoders to LLMs*.

[Project Page](https://ta012.github.io/PAL/) | [arXiv Paper](https://arxiv.org/abs/2506.10423)


## Abstract

Integration of audio perception into large language models (LLMs) is an emerging research area for enabling machine-listening applications, yet efficient transfer of rich audio semantics from audio encoders to LLMs remains underexplored. The most widely used integration paradigm projects the audio encoder output tokens into the LLM input space (e.g., via an MLP or a Q-Former), then *prepends or inserts* them into the text token sequence. We refer to this generic scheme as *Prepend to the LLM's input token space* (PLITS) integration. We propose an efficient alternative, **L**ightweight **A**udio **L**LM Integration (**LAL**). LAL introduces audio representations solely via the attention mechanism within different layers of the LLM, bypassing its feedforward module. LAL encodes rich audio semantics at an appropriate level of abstraction for integration into different blocks of LLMs. Our design significantly reduces computational overhead compared to existing integration approaches. Observing with Whisper that the speech encoder benefits from PLITS integration, we propose an audio-encoder-aware approach for efficiently **P**robing **A**udio encoders via **L**LM (**PAL**), which employs PLITS integration for Whisper and LAL for general audio encoders. Under an identical training curriculum, **LAL** consistently maintains performance or outperforms existing integration approaches across multiple base LLMs and tasks. For general audio tasks, LAL's improvement is up to 30% over a strong PLITS baseline while reducing memory usage by up to 64.1% and increasing throughput by up to 247.5%. Furthermore, our general audio-music-speech LLM, **PAL**, performs on par with a fully PLITS-integration-based system, but with substantially improved computational and memory efficiency. Project page: https://ta012.github.io/PAL/
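The released code is not yet available (see Status below), but the core LAL idea described in the abstract, audio entering the LLM only as extra keys and values in attention while bypassing the feedforward module, can be sketched roughly as follows. This is a minimal illustrative sketch under our own assumptions, not the paper's implementation: the class name `LALAttentionBlock`, the projections `audio_k_proj`/`audio_v_proj`, and all dimensions are hypothetical, and causal masking is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LALAttentionBlock(nn.Module):
    """Illustrative LAL-style block (hypothetical): audio features enter
    only as extra keys/values in attention and bypass the feedforward."""

    def __init__(self, d_model: int, n_heads: int, d_audio: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Assumed lightweight projections from the audio encoder's feature
        # space into the LLM's key/value space (names are hypothetical).
        self.audio_k_proj = nn.Linear(d_audio, d_model)
        self.audio_v_proj = nn.Linear(d_audio, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def _heads(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        return x.view(b, t, self.n_heads, self.d_head).transpose(1, 2)

    def forward(self, text: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # text:  (batch, n_text, d_model)  -- LLM hidden states
        # audio: (batch, n_audio, d_audio) -- audio encoder outputs
        h = self.norm1(text)
        q = self._heads(self.q_proj(h))  # queries come from text only
        # Audio contributes only keys/values, so it influences the block
        # purely through the attention mechanism.
        k = torch.cat([self._heads(self.k_proj(h)),
                       self._heads(self.audio_k_proj(audio))], dim=2)
        v = torch.cat([self._heads(self.v_proj(h)),
                       self._heads(self.audio_v_proj(audio))], dim=2)
        attn = F.scaled_dot_product_attention(q, k, v)
        text = text + self.out_proj(attn.transpose(1, 2).reshape(text.shape))
        # The feedforward runs on text positions only; audio tokens never
        # pass through it, which is where the compute savings come from.
        return text + self.ffn(self.norm2(text))


# Quick shape check with arbitrary dimensions.
block = LALAttentionBlock(d_model=512, n_heads=8, d_audio=768)
out = block(torch.randn(2, 16, 512), torch.randn(2, 50, 768))
print(out.shape)  # torch.Size([2, 16, 512])
```

Because audio tokens contribute only keys and values, they never occupy positions in the text sequence, so the feedforward module and per-token output computation scale with the text length alone; this is consistent with the memory and throughput gains the abstract reports, though the paper's actual architecture may differ.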

## Status

The code and models will be made available here soon.

## Citation

If you find our work useful, please consider citing our paper:

```bibtex
@misc{alex2025palprobingaudioencoders,
      title={PAL: Probing Audio Encoders via LLMs -- A Study of Information Transfer from Audio Encoders to LLMs},
      author={Tony Alex and Wish Suharitdamrong and Sara Atito and Armin Mustafa and Philip J. B. Jackson and Imran Razzak and Muhammad Awais},
      year={2025},
      eprint={2506.10423},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2506.10423},
}
```
