Subham Sekhar Sahoo*1, Zhihan Yang*2, Yash Akhauri†1, Johnna Liu†1, Deepansha Singh†1, Zhoujun Cheng†3, Zhengzhong Liu3, Eric Xing3, John Thickstun2, Arash Vahdat4
1Cornell Tech 2Cornell University 3MBZUAI 4NVIDIA
*Joint first authors †Joint second authors
Pre-print 2025
We propose Esoteric Language Models (Eso-LMs), a new framework for language modeling that fuses the autoregressive (AR) and masked diffusion model (MDM) paradigms and outperforms the previous hybrid approach, BD3-LMs. Our model uses a revised attention mechanism to support both paradigms and is trained with a hybrid loss, a combination of the AR and MDM objectives, that lets it interpolate smoothly between the two paradigms in terms of perplexity and sample quality. Further, ours is the first approach to enable KV caching for MDMs while preserving parallel generation, achieving up to 65× faster inference than standard MDMs and 4× faster inference than KV-cached semi-autoregressive baselines.
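At a high level (this is a simplified schematic, not the exact objective derived in the paper), the hybrid loss splits each training sequence into two random subsets of positions and scores one subset with the MDM objective and the other with the AR objective:

$$\mathcal{L}_{\text{hybrid}}(x) \;=\; \mathcal{L}_{\text{MDM}}\!\left(x_{\mathcal{M}}\right) \;+\; \mathcal{L}_{\text{AR}}\!\left(x_{\setminus \mathcal{M}}\right),$$

where $\mathcal{M}$ is a randomly sampled subset of positions whose expected size is governed by the interpolation hyperparameter (exposed as alpha_0 in the scripts below); sweeping it moves the model between the pure-MDM and pure-AR regimes.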
In this repository, we release both variants of Eso-LMs: Eso-LM (A) and Eso-LM (B).
main.py: Routines for training, evaluation, and generating samples
trainer_base.py: Base classes for AR and all kinds of discrete diffusion in algo.py
algo.py: Classes for AR, MDLM, and EsoLM
dataloader.py: Dataloaders
utils.py: LR scheduler, logging, fsspec handling, etc.
models/: Denoising network architectures; supports DiT, EsoLM-DiT, and the AR transformer
configs/: Config files for algorithms, datasets, denoising networks, noise schedules, and LR schedules
scripts/: Shell scripts for training, evaluation, and generating samples
To get started, create a conda environment containing the required dependencies.
conda create -n esolm python=3.9
conda activate esolm
pip install -r requirements.txt
Create the following directories to store saved models and slurm logs:
mkdir outputs
mkdir watch_folder
Run training as a batch job:
sbatch scripts/esolm/train_owt_esolmb_alpha0_1.sh
Modify DATA_DIR and CHECKPOINT_DIR accordingly within the bash script.
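Inside that script, DATA_DIR and CHECKPOINT_DIR are expected to point at local directories; a minimal sketch of the edit (the paths below are placeholders, not defaults shipped with the repository):

DATA_DIR=/path/to/openwebtext_cache      # placeholder: where the OpenWebText data is stored/cached
CHECKPOINT_DIR=/path/to/checkpoints      # placeholder: where training checkpoints are written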
Logging is done with wandb. Set the entity and project fields in configs/config.yaml to your own.
Download our Eso-LM (B) checkpoints trained on OpenWebText from this Google Drive folder.
Run evaluation as a batch job:
sbatch scripts/esolm/eval_owt_esolmb.sh \
--alpha_0 1 \
--batch_split 1 \
--ckpt_path folder/esolmb-alpha0-1-250k.ckpt
By default, this bash script occupies 4 GPUs.
The values of alpha_0 and batch_split used for evaluation must be the same as the ones used for training.
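For instance, a run trained with a different alpha_0 would be evaluated with that same value; the command below is a hypothetical illustration (the alpha_0 value and checkpoint filename are placeholders, not files we release):

# hypothetical: evaluating a run trained with alpha_0 = 0.25 and batch_split = 1
sbatch scripts/esolm/eval_owt_esolmb.sh \
  --alpha_0 0.25 \
  --batch_split 1 \
  --ckpt_path folder/esolmb-alpha0-0.25-250k.ckpt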
Download our Eso-LM (B) checkpoints trained on OpenWebText from this Google Drive folder.
Run sampling as a batch job (generate 8 samples):
sbatch scripts/esolm/gen_ppl_owt_esolmb.sh \
--alpha_0 1 \
--T 1024 \
--batch_size 8 \
--num_batches 1 \
--ckpt_path folder/esolmb-alpha0-1-250k.ckpt
By default, this bash script occupies a single GPU and uses a fixed seed.
The value of alpha_0 used for sampling can be different from the one used for training.
Adjust batch_size (must fit on your GPU) and num_batches to generate the desired total number of samples.
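For example, to generate 64 samples in total you could run something like the following (batch_size 16 is only an illustration; pick whatever fits in your GPU memory):

sbatch scripts/esolm/gen_ppl_owt_esolmb.sh \
  --alpha_0 1 \
  --T 1024 \
  --batch_size 16 \
  --num_batches 4 \
  --ckpt_path folder/esolmb-alpha0-1-250k.ckpt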
This repository was built off of DUO.
@misc{sahoo2025esotericlanguagemodels,
title={Esoteric Language Models},
author={Subham Sekhar Sahoo and Zhihan Yang and Yash Akhauri and Johnna Liu and Deepansha Singh and Zhoujun Cheng and Zhengzhong Liu and Eric Xing and John Thickstun and Arash Vahdat},
year={2025},
eprint={2506.01928},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2506.01928},
}