Fine-tuning multimodal LLMs to be world-class DJs 🎵
🚧 This is a work in progress! A release will be made once it's ready.
- Song Structure Analysis – Identify sections like intro, verse, chorus, drop, and outro
- BPM Estimation – Estimate the tempo or rhythmic feel of a track
- Key and Chord Detection – Detect musical key and chord progressions
- Genre Classification – Classify the track into one or more genres
- Mood and Energy Analysis – Tag tracks with emotional and intensity labels
- Cue Point Recommendation – Suggest where to start or end playback for mixing
- Instrumental and Vocal Presence Detection – Identify if a track has vocals or is instrumental
- Loop Region Suggestion – Find sections that can be looped smoothly
- Drop Detection – Locate the most impactful or climactic moment
A novel annotated dataset of music licensed under Creative Commons is introduced. The annotations are provided as metadata for each audio file, containing information such as song sections, BPM, key, chord progression, genre, mood, energy, cue points, instrumental and vocal sections, loopable regions, and beat drops.
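For illustration only, a single annotation record for one audio file could look something like the Python dictionary below. The field names, value formats, and numbers are assumptions made for this sketch, not the released schema.

```python
# Illustrative shape of one annotation record. Field names, units, and values
# are assumptions for this sketch, not the dataset's actual schema.
example_annotation = {
    "file": "dataset/music/12345_0.mp3",  # hypothetical <upload_id>_<file_index>.mp3
    "bpm": 124.0,
    "key": "A minor",
    "chords": ["Am", "F", "C", "G"],
    "genres": ["house", "electronic"],
    "mood": ["uplifting", "energetic"],
    "energy": 0.8,  # e.g. normalized to 0-1
    "sections": [
        {"label": "intro", "start": 0.0, "end": 15.2},  # seconds
        {"label": "drop", "start": 61.4, "end": 92.1},
    ],
    "cue_points": [15.2, 61.4],
    "vocal_sections": [{"start": 15.2, "end": 45.0}],
    "loop_regions": [{"start": 30.7, "end": 46.1}],
    "drops": [61.4],
}
```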
To fetch, review, and select music from ccMixter, the following scripts have been created (they must be run in this order):
- `uv run dataset/fetch_ccmixter.py` uses ccMixter's API to fetch the list of all uploads with a CC BY license, saving the data as JSONL to `dataset/ccmixter_data.jsonl`. This script must be run first.
- `uv run dataset/select_ccmixter.py` provides a Terminal User Interface (TUI) to navigate, view, listen to, and select uploads to be included in the dataset. The selected upload IDs are saved one per line to `dataset/selected_uploads.txt`.
- `uv run dataset/download_ccmixter.py` downloads the selected uploads, saving them to `dataset/music/<upload_id>_<file_index>.mp3`. It currently only downloads the first file of each upload, if it's in MP3 format.
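As a minimal sketch of how the selection and download steps fit together (this is not the actual `download_ccmixter.py` implementation, and the helper names are made up), the selected IDs can be read one per line and mapped to the `dataset/music/<upload_id>_<file_index>.mp3` naming scheme described above:

```python
# Sketch only: mirrors the file layout described above; helper names are
# hypothetical and this is not the real download_ccmixter.py.
from pathlib import Path

MUSIC_DIR = Path("dataset/music")

def read_selected_uploads(path: str = "dataset/selected_uploads.txt") -> list[str]:
    """Read the selected upload IDs, one per line, skipping blanks."""
    return [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]

def output_path(upload_id: str, file_index: int = 0) -> Path:
    """Build the target path dataset/music/<upload_id>_<file_index>.mp3."""
    return MUSIC_DIR / f"{upload_id}_{file_index}.mp3"
```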
The provided dataset can be used to fine-tune any multimodal LLM suited to audio understanding, i.e. one capable of processing text and audio inputs simultaneously.
The project is currently based on Qwen3-Omni. Support for more advanced and smaller multimodal models is planned.
The baseline or fine-tuned LLM should be run via Gradio, which provides an API for evaluations and the demo app. This requires a GPU with sufficient VRAM (e.g. an NVIDIA H100).
Qwen3-Omni has an official Hugging Face Space; as an example, the Gradio API address of that Space is https://qwen-qwen3-omni-demo.hf.space/. In practice, however, the model should be run on a local machine with a suitable GPU or via a cloud GPU provider.
Once Gradio is running, inference can be performed with the `inference/infer.py` script:
```
uv run inference/infer.py \
  --client https://qwen-qwen3-omni-demo.hf.space/ \
  --text "Estimate the BPM (beats per minute) of this track. Provide your answer as a single numerical value representing the tempo." \
  --audio ~/Music/Test.mp3
```

Example output:

```
120
```
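The same Gradio deployment can also be queried directly from Python with the `gradio_client` package. The sketch below assumes a text-plus-audio endpoint: the real `api_name`, argument order, and file path depend on the deployed app (check its "Use via API" page), so treat this as a starting point rather than the project's inference code.

```python
# Hedged sketch: calls a Gradio API directly. The api_name and argument order
# are placeholders; consult the deployed app's "Use via API" page for the real ones.
from gradio_client import Client, handle_file

client = Client("https://qwen-qwen3-omni-demo.hf.space/")

result = client.predict(
    "Estimate the BPM (beats per minute) of this track. "
    "Provide your answer as a single numerical value representing the tempo.",
    handle_file("/path/to/Test.mp3"),  # hypothetical local audio file
    api_name="/predict",               # placeholder endpoint name
)
print(result)
```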
| Task | Baseline Accuracy | Fine-Tuned Accuracy |
|---|---|---|
| Song Structure Analysis | ||
| BPM Estimation | ||
| Key and Chord Detection | ||
| Genre Classification | ||
| Mood and Energy Analysis | ||
| Cue Point Recommendation | ||
| Instrumental and Vocal Presence Detection | ||
| Loop Region Suggestion | ||
| Drop Detection |
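Per-task accuracy definitions are still to be filled in. As one illustration of what such a metric could look like, BPM estimation might be scored as the fraction of tracks whose predicted tempo falls within a tolerance of the annotated value; the tolerance and scoring rule below are assumptions, not the project's evaluation protocol.

```python
# Illustrative metric only: the tolerance and scoring rule are assumptions.
def bpm_accuracy(predicted: list[float], reference: list[float], tol: float = 2.0) -> float:
    """Fraction of tracks whose predicted BPM is within `tol` BPM of the annotation."""
    if not reference:
        return 0.0
    hits = sum(abs(p - r) <= tol for p, r in zip(predicted, reference))
    return hits / len(reference)
```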
A project by Mohammad Tomaraei.
```bibtex
@misc{tomaraei2025,
  title  = {DJ LLM: Fine-tuning multimodal LLMs to be world-class DJs},
  author = {Mohammad Tomaraei},
  year   = {2025},
  url    = {https://github.com/themreza/DJ-LLM},
}
```

- Qwen3-Omni is a large language model (LLM) developed by the Qwen team at Alibaba Cloud
- ms-swift is a fine-tuning framework developed by the ModelScope community
- evalscope is an LLM evaluation framework developed by the ModelScope community
- The music files used in the dataset are licensed under Creative Commons (please see `dataset/ATTRIBUTION.csv` for a complete list of attributions)
- The DJ LLM logo was generated with Microsoft Copilot and animated with OpenAI's Sora 2 Pro