DJ LLM

Fine-tuning multimodal LLMs to be world-class DJs 🎡

🚧 This is a work in progress! A release will be made once it's ready.

Tasks

  • Song Structure Analysis β€” Identify sections like intro, verse, chorus, drop, and outro
  • BPM Estimation β€” Estimate the tempo or rhythmic feel of a track
  • Key and Chord Detection β€” Detect musical key and chord progressions
  • Genre Classification β€” Classify the track into one or more genres
  • Mood and Energy Analysis β€” Tag tracks with emotional and intensity labels
  • Cue Point Recommendation β€” Suggest where to start or end playback for mixing
  • Instrumental and Vocal Presence Detection β€” Identify if a track has vocals or is instrumental
  • Loop Region Suggestion β€” Find sections that can be looped smoothly
  • Drop Detection β€” Locate the most impactful or climactic moment

Dataset

A novel annotated dataset of music licensed under Creative Commons is introduced. The annotations are provided as metadata for each audio file, containing information such as song sections, BPM, key, chord progression, genre, mood, energy, cue points, instrumental and vocal sections, loopable regions, and beat drops.
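The exact annotation schema has not been published yet. As an illustration only, a per-track metadata record covering the fields above could look like the following Python dictionary (all field names and values here are hypothetical, not the project's actual schema):

# Hypothetical annotation record for one audio file (illustrative only;
# the real schema used by DJ LLM may use different field names and units).
annotation = {
    "file": "dataset/music/12345_0.mp3",               # audio file path
    "sections": [                                       # song structure
        {"label": "intro", "start": 0.0, "end": 15.2},
        {"label": "drop", "start": 60.1, "end": 75.4},
    ],
    "bpm": 128,                                         # tempo
    "key": "A minor",
    "chords": ["Am", "F", "C", "G"],                    # chord progression
    "genres": ["house"],
    "mood": "energetic",
    "energy": 0.8,                                      # relative intensity
    "cue_points": [15.2, 60.1],                         # seconds
    "has_vocals": True,                                 # vocal vs. instrumental
    "loop_regions": [{"start": 30.0, "end": 37.5}],
    "drops": [60.1],                                    # beat drop timestamps
}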

Preparation

To facilitate the process of fetching, reviewing, and selecting music from ccMixter, the following scripts have been created (they need to be run in this order):

  1. uv run dataset/fetch_ccmixter.py uses ccMixter's API to fetch the list of all uploads with a CC BY license, saving the data as JSONL to dataset/ccmixter_data.jsonl.

  2. uv run dataset/select_ccmixter.py provides a Terminal User Interface (TUI) to navigate, view, listen to, and select uploads to be included in the dataset. The selected upload IDs are saved one per line to dataset/selected_uploads.txt.

  3. uv run dataset/download_ccmixter.py downloads the selected uploads, saving them to dataset/music/<upload_id>_<file_index>.mp3. It currently downloads only the first file of each upload, and only if that file is in MP3 format (a rough sketch of this logic is shown after the list).
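Taken together, the last two steps amount to filtering the fetched metadata by the selected upload IDs and saving the first MP3 of each selected upload. Below is a minimal sketch of that logic, assuming hypothetical JSONL field names ("upload_id", "files", "format", "download_url") and the requests package rather than whatever the actual script uses:

# Rough sketch of the selection + download step (not the actual script).
# The JSONL field names below are assumptions; the real keys may differ.
import json
from pathlib import Path

import requests  # assumed HTTP client

selected = set(Path("dataset/selected_uploads.txt").read_text().split())
Path("dataset/music").mkdir(parents=True, exist_ok=True)

with open("dataset/ccmixter_data.jsonl") as f:
    for line in f:
        upload = json.loads(line)
        if str(upload["upload_id"]) not in selected:
            continue
        first = upload["files"][0]            # only the first file is taken
        if first.get("format") != "mp3":      # and only if it is an MP3
            continue
        target = Path("dataset/music") / f"{upload['upload_id']}_0.mp3"
        target.write_bytes(requests.get(first["download_url"]).content)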

LLMs

The provided dataset can be used to fine-tune any multimodal LLM suited to audio understanding, i.e. one capable of processing text and audio inputs simultaneously.

The project is currently based on Qwen3-Omni. Support for more advanced and smaller multimodal models is planned.

Inference

The baseline or fine-tuned LLM should be run via Gradio, which provides an API used by both the evaluations and the demo app. This requires a GPU with sufficient VRAM (e.g. an Nvidia H100).

Qwen3-Omni has an official Hugging Face Space; as an example, its Gradio API address is https://qwen-qwen3-omni-demo.hf.space/. In practice, however, the model should be run on a local machine with a suitable GPU or via a cloud GPU provider.

Once Gradio is running, inference can be performed using the inference/infer.py script (the final line below is the model's response):

uv run inference/infer.py \
  --client https://qwen-qwen3-omni-demo.hf.space/ \
  --text "Estimate the BPM (beats per minute) of this track. Provide your answer as a single numerical value representing the tempo." \
  --audio ~/Music/Test.mp3
120
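Under the hood, a script like inference/infer.py would typically talk to the Gradio app through the gradio_client package. The sketch below shows what such a call could look like; the endpoint name, argument order, and input format are assumptions for illustration, since they depend on how the specific Gradio app defines its API, and the actual script may differ:

# Minimal sketch of calling a Gradio API with gradio_client (illustrative).
# The api_name and argument order are assumptions, not the app's real API.
from gradio_client import Client, handle_file

client = Client("https://qwen-qwen3-omni-demo.hf.space/")
result = client.predict(
    handle_file("Test.mp3"),                       # audio input
    "Estimate the BPM (beats per minute) of this track. "
    "Provide your answer as a single numerical value representing the tempo.",
    api_name="/predict",                           # hypothetical endpoint name
)
print(result)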

Fine-tuning

Evaluation

Running

Results

Task                                      | Baseline Accuracy | Fine-Tuned Accuracy
Song Structure Analysis                   |                   |
BPM Estimation                            |                   |
Key and Chord Detection                   |                   |
Genre Classification                      |                   |
Mood and Energy Analysis                  |                   |
Cue Point Recommendation                  |                   |
Instrumental and Vocal Presence Detection |                   |
Loop Region Suggestion                    |                   |
Drop Detection                            |                   |

Demo

Author

A project by Mohammad Tomaraei.

Citation

@misc{tomaraei2025,
      title = {DJ LLM: Fine-tuning multimodal LLMs to be world-class DJs},
      author = {Mohammad Tomaraei},
      year = {2025},
      url = {https://github.com/themreza/DJ-LLM},
}

Credits

  • Qwen3-Omni is a large language model (LLM) developed by the Qwen team at Alibaba Cloud
  • ms-swift is a fine-tuning framework developed by the ModelScope community
  • evalscope is an LLM evaluation framework developed by the ModelScope community
  • The music files used in the dataset are licensed under Creative Commons (please see dataset/ATTRIBUTION.csv for a complete list of attributions)
  • The DJ LLM logo was generated with Microsoft Copilot and animated with OpenAI's Sora 2 Pro