
Quartet: Native FP4 Training Can Be Optimal for Large Language Models

This is the official code for the Quartet FP4 training paper: arXiv:2505.14669 (https://arxiv.org/abs/2505.14669).

[UPDATE 28.09.25]: Quartet has been accepted to NeurIPS 2025!

[UPDATE 28.09.25]: Check out our latest work on MXFP4/NVFP4 for PTQ.

This work was presented in the GPU MODE lecture series; the recording is available on YouTube.

Quickstart

Create a conda environment and install dependencies (we recommend Python 3.11):

conda create -n env python=3.11
conda activate env

Install the requirements (we recommend installing torch from the channel matching your CUDA setup and compiling fast_hadamard_transform from source):

pip install -r requirements.txt
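
Optionally, run a quick sanity check that torch sees the GPU and that the fast_hadamard_transform extension built correctly. This is a sketch, assuming the upstream Dao-AILab fast_hadamard_transform package API (hadamard_transform(x, scale)); it needs a CUDA GPU and a power-of-two last dimension:

# Sanity check (sketch): assumes the Dao-AILab fast_hadamard_transform API,
# i.e. hadamard_transform(x, scale); requires a CUDA GPU.
import torch
from fast_hadamard_transform import hadamard_transform

assert torch.cuda.is_available(), "FP4 training targets CUDA GPUs"
x = torch.randn(8, 256, device="cuda", dtype=torch.bfloat16)
# Orthonormal Hadamard transform along the last (power-of-two) dimension.
y = hadamard_transform(x, scale=256 ** -0.5)
print(y.shape)  # torch.Size([8, 256])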

Run an end-to-end MXFP4 pseudo-quantization pre-training with:

bash main_setup.sh

The above command trains a 30M-parameter Llama-style model on 3B tokens.
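
Here, pseudo-quantization means a quantize-dequantize (fake-quant) pass in the MXFP4 format: blocks of 32 values share a power-of-two scale, and each value is rounded to the signed E2M1 grid {0, 0.5, 1, 1.5, 2, 3, 4, 6}. The sketch below only illustrates the number format; it is not the repository's implementation, and the round-to-nearest and scale rule are simplifying assumptions:

# Illustrative MXFP4 fake-quant (not the repo's code; round-to-nearest and
# the power-of-two scale rule below are simplifying assumptions).
import torch

E2M1 = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # FP4 magnitudes

def mxfp4_fake_quant(x: torch.Tensor, block: int = 32) -> torch.Tensor:
    xb = x.reshape(-1, block)
    amax = xb.abs().amax(dim=-1, keepdim=True).clamp_min(1e-12)
    # Shared power-of-two scale per 32-element block; E2M1's max exponent is 2.
    scale = torch.exp2(torch.floor(torch.log2(amax)) - 2.0)
    scaled = (xb / scale).clamp(-6.0, 6.0)
    idx = (scaled.abs().unsqueeze(-1) - E2M1).abs().argmin(dim=-1)  # nearest FP4
    return (E2M1[idx] * scaled.sign() * scale).reshape(x.shape)

w = torch.randn(128, 128)
print((w - mxfp4_fake_quant(w)).abs().mean())  # mean quantization error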

MXFP4 Kernels

Quartet kernels are released as part of the QuTLASS library and the FP-Quant training/inference add-on for transformers and vLLM.

Cite This Work

@misc{castro2025quartetnativefp4training,
      title={Quartet: Native FP4 Training Can Be Optimal for Large Language Models}, 
      author={Roberto L. Castro and Andrei Panferov and Soroush Tabesh and Oliver Sieberling and Jiale Chen and Mahdi Nikdan and Saleh Ashkboos and Dan Alistarh},
      year={2025},
      eprint={2505.14669},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.14669}, 
}
