This is the official code for the Quartet FP4 training paper.
[UPDATE 28.09.25]: Quartet has been accepted to NeurIPS 2025!
[UPDATE 28.09.25]: Check out our latest work on MXFP4/NVFP4 for PTQ.
This work was presented in the GPU MODE lecture series.
Create a conda environment and install dependencies (we recommend Python 3.11):
conda create -n env python=3.11
conda activate env

Install the requirements (we recommend installing torch from specific channels and compiling fast_hadamard_transform from source):
pip install -r requirements.txt

Run a pseudo-quantization end-to-end MXFP4 pre-training with:
bash main_setup.sh

The above command trains a 30M-parameter Llama-style model on 3B tokens.
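As background on what pseudo-quantization means here, below is a minimal sketch of an MXFP4 quantize-dequantize step: FP4 (E2M1) elements sharing one power-of-two (E8M0) scale per 32-element block, as in the OCP Microscaling format. This is an illustrative approximation, not the repository's implementation; the names `mxfp4_pseudo_quant` and `FP4_GRID` are hypothetical.

```python
import torch

# Representable magnitudes of the FP4 E2M1 format (OCP Microscaling spec)
FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_pseudo_quant(x: torch.Tensor, block: int = 32) -> torch.Tensor:
    """Quantize-dequantize x to MXFP4: FP4 (E2M1) elements sharing one
    power-of-two (E8M0) scale per `block` contiguous values.
    Assumes x.numel() is a multiple of `block` (illustrative sketch)."""
    xb = x.reshape(-1, block)
    amax = xb.abs().amax(dim=-1, keepdim=True).clamp_min(1e-12)
    # Shared block scale: 2^(floor(log2(max)) - 2), so that the largest
    # FP4 magnitude (6.0 = 1.5 * 2^2) covers the block maximum
    scale = torch.exp2(torch.floor(torch.log2(amax)) - 2.0)
    scaled = (xb / scale).clamp(-6.0, 6.0)  # saturate to the FP4 range
    grid = FP4_GRID.to(device=x.device, dtype=x.dtype)
    # Snap each element to the nearest representable FP4 magnitude, keep sign
    idx = (scaled.abs().unsqueeze(-1) - grid).abs().argmin(dim=-1)
    q = grid[idx] * scaled.sign()
    return (q * scale).reshape(x.shape)
```

In pseudo-quantized training, weights and/or activations pass through such a quantize-dequantize step while the actual matmuls still run in high precision, which simulates FP4 numerics without requiring FP4 hardware kernels.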
Quartet kernels are released as part of the QuTLASS library and the FP-Quant training/inference add-on for transformers and vLLM.
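For the inference side, a minimal sketch of loading a transformers model through the FP-Quant integration might look like the following. The `FPQuantConfig` class name, its defaults, and the model id are assumptions for illustration; consult the FP-Quant documentation for the actual API.

```python
# Hypothetical sketch; the exact FP-Quant config class and arguments may differ.
from transformers import AutoModelForCausalLM, FPQuantConfig  # FPQuantConfig assumed

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",            # placeholder model id
    quantization_config=FPQuantConfig(),  # assumed to default to MXFP4
    device_map="cuda",
)
```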
@misc{castro2025quartetnativefp4training,
  title={Quartet: Native FP4 Training Can Be Optimal for Large Language Models},
  author={Roberto L. Castro and Andrei Panferov and Soroush Tabesh and Oliver Sieberling and Jiale Chen and Mahdi Nikdan and Saleh Ashkboos and Dan Alistarh},
  year={2025},
  eprint={2505.14669},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2505.14669},
}