XiaomiMiMo/MiMo-Audio-Training

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
MiMo-Audio-Training Toolkit
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━



Introduction

Welcome to the MiMo-Audio-Training toolkit! It is designed for fine-tuning the XiaomiMiMo/MiMo-Audio-7B-Instruct model and serves as a reference implementation for researchers and developers who want to adapt MiMo-Audio to their own custom tasks.

Supported Tasks

The MiMo-Audio-Training toolkit supports the following tasks:

  • Tasks:

    • SFT:

      • ASR
      • TTS / InstructTTS
      • Audio Understanding and Reasoning
      • Spoken Dialogue

Getting Started

To get started with the MiMo-Audio-Training toolkit, follow the instructions below to set up the environment and install the required dependencies.

Prerequisites (Linux)

  • Python 3.12
  • CUDA >= 12.0
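
Before installing, it can help to confirm that the environment matches the prerequisites above. The following is a minimal sketch, not part of the toolkit: the helper names are hypothetical, and it assumes PyTorch is already installed when detecting CUDA. Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which may differ from the system toolkit or driver version.

```python
import sys


def parse_cuda_version(text):
    """Turn a version string such as '12.4' into (12, 4); None stays None."""
    if text is None:
        return None
    parts = text.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def meets_prerequisites(python_version, cuda_version):
    """Check the README prerequisites: Python 3.12 and CUDA >= 12.0."""
    python_ok = tuple(python_version[:2]) == (3, 12)
    cuda_ok = cuda_version is not None and tuple(cuda_version[:2]) >= (12, 0)
    return python_ok and cuda_ok


def detect_cuda_version():
    """Return the CUDA version PyTorch was built with, or None if unavailable."""
    try:
        import torch  # assumption: PyTorch is installed in this environment
    except ImportError:
        return None
    return parse_cuda_version(torch.version.cuda)


if __name__ == "__main__":
    ok = meets_prerequisites(sys.version_info, detect_cuda_version())
    print("Prerequisites satisfied" if ok else "Prerequisites NOT satisfied")
```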

Installation:

git clone --recurse-submodules https://github.com/XiaomiMiMo/MiMo-Audio-Training
cd MiMo-Audio-Training
pip install -r requirements.txt
pip install flash-attn==2.7.4.post1
pip install -e .

Note

If the compilation of flash-attn takes too long, you can download the precompiled wheel and install it manually:

pip install /path/to/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp312-cp312-linux_x86_64.whl
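
The wheel filename encodes the build it targets: CUDA major version (`cu12`), PyTorch version (`torch2.6`), C++11 ABI setting, Python tag (`cp312`), and platform. A precompiled wheel only installs cleanly if these match your environment. The sketch below parses those fields so they can be compared before downloading; the function name and regular expression are illustrative assumptions based on the filename shown above, not an official flash-attn utility.

```python
import re
import sys

# Pattern inferred from the wheel filename shown in the note above.
WHEEL_PATTERN = re.compile(
    r"flash_attn-(?P<version>[^+]+)\+cu(?P<cuda>\d+)torch(?P<torch>[\d.]+)"
    r"cxx11abi(?P<cxx11abi>TRUE|FALSE)-(?P<python>cp\d+)-(?P<abi>cp\d+)"
    r"-(?P<platform>.+)\.whl$"
)


def parse_flash_attn_wheel(filename):
    """Extract build tags from a flash-attn wheel filename (hypothetical helper)."""
    match = WHEEL_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"Unrecognized wheel filename: {filename}")
    return match.groupdict()


def current_python_tag():
    """Return the CPython tag of the running interpreter, e.g. 'cp312'."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


if __name__ == "__main__":
    name = "flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp312-cp312-linux_x86_64.whl"
    tags = parse_flash_attn_wheel(name)
    print(tags)
    print("Python tag matches:", tags["python"] == current_python_tag())
```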

Training Process:

Download the fine-tuning dataset and pre-process the data into the format described in instruct_template.md.

Training

We provide multiple training scripts under the scripts directory, supporting both single-GPU and multi-GPU training setups.

cd MiMo-Audio-Training
bash scripts/train_multiGPU_torchrun.sh
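
As the script name suggests, the multi-GPU setup is launched through `torchrun`, which passes each worker process its position in the job via the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables. The sketch below shows how a training entry point can read them, falling back to single-process defaults. It is a generic illustration under that assumption, not code taken from this repository, and `read_torchrun_env` is a hypothetical helper.

```python
import os


def read_torchrun_env(env=None):
    """Read the process layout that torchrun exports to each worker.

    Falls back to a single-process layout when the variables are absent,
    e.g. when the script is started directly with `python`.
    """
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", 0)),              # global index of this process
        "local_rank": int(env.get("LOCAL_RANK", 0)),  # index on the current node
        "world_size": int(env.get("WORLD_SIZE", 1)),  # total number of processes
    }


if __name__ == "__main__":
    layout = read_torchrun_env()
    # Only the first process should log or write checkpoints.
    if layout["rank"] == 0:
        print(f"Running with {layout['world_size']} process(es)")
```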

Generation and Evaluation

Run inference with generate.py.

Evaluate the SFT model with 🌐MiMo-Audio-Eval.

Citation

@misc{coreteam2025mimoaudio,
      title={MiMo-Audio: Audio Language Models are Few-Shot Learners}, 
      author={LLM-Core-Team Xiaomi},
      year={2025},
      url={https://github.com/XiaomiMiMo/MiMo-Audio}, 
}

Contact

Please contact us at [email protected] or open an issue if you have any questions.
