Note
DPad interpolates between semi-autoregressive and block diffusion models. Our next step is to explore this unique position further, aiming to create a model that achieves the training efficiency of Block Diffusion while preserving the crucial long-term planning capabilities of semi-autoregressive methods. This research could lead to a new class of models that are both powerful and scalable.
Advancing this research requires significant computational resources. We are actively seeking academic or industry partners with available GPU capacity who are interested in pioneering more efficient language models. If you'd like to collaborate, please reach out to us!
Email:
LLaDA-1.5 on GSM8K (1024 tokens)
Efficiency: DPad-enhanced dLLMs achieve up to a 61.39× speedup over vanilla dLLM baselines.
Accuracy: DPad-enhanced dLLMs achieve up to a +26.46% improvement over vanilla dLLM baselines.
(Evaluation conducted on NVIDIA A100-PCIe-80GB GPUs).
Diffusion Scratchpad (DPad) is a novel training-free inference paradigm that overcomes a key efficiency bottleneck in Diffusion Language Models (dLLMs): the high computational cost of full suffix attention. By intelligently pruning redundant suffix tokens, DPad achieves:
- Up to a staggering 61.39x acceleration over vanilla dLLM baselines on long-sequence benchmarks (GSM8K, 1319 samples).
- A significant improvement in strict-match accuracy on reasoning tasks by enhancing in-context learning.
- Comparable or better generation quality on standard reasoning and coding benchmarks.
- Seamless integration with existing optimizations like parallel decoding and prefix caching for multiplicative speedups.
This repository provides the code to reproduce our evaluation results.
Demo for LLaDA-1.5 on GSM8K (50 samples, 1024 tokens, 1-shot): demo2.mp4
(Latency: Inference Time for 50 samples. F: Flexible-Match Accuracy, S: Strict-Match Accuracy)
- Aug 19, 2025: We've released our paper on arXiv!
- How It Works
- Key Features & Modifications
- Performance Highlights
- Scaling with Long Sequences & Other Optimizations
- Usage Guide
- Future Works
- Acknowledgements
- Citation
DPad overcomes the high computational overhead of dLLMs, which predict all future suffix tokens at each step but retain only a small fraction of them.
1. The "Scratchpad" Insight: We identify that suffix tokens function as an information reservoir (a "scratchpad") that collects signals from already decoded prefix tokens to guide generation. However, most of these suffix tokens are redundant, and their importance decays sharply with distance.
2. The Diffusion Lottery Tickets (DLT) Hypothesis: We find that even pruning high-attention "spike" tokens in the distant suffix has little effect on accuracy, as the model dynamically shifts its attention to nearby tokens. This suggests that a sparse subset of suffix tokens is sufficient. DPad acts as a training-free lottery ticket search, finding an efficient "winning ticket" for generation on the fly.
3. Suffix Dropout Mechanisms: DPad introduces two simple, training-free strategies to eliminate this redundancy before attention computation:
- Sliding Window: Maintains a fixed-length suffix window, preventing computation from scaling with the full sequence length.
- Distance-Decay Dropout: Progressively prunes distant suffix tokens using a Gaussian sampling strategy, focusing computation on the most relevant nearby tokens.
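As a rough illustration of these two mechanisms, here is a minimal PyTorch sketch. The function name, window size, and Gaussian width below are illustrative assumptions for exposition only; the actual logic lives in `sampler.py` and may differ in details.

```python
import torch

def sample_suffix_tokens(block_end: int, seq_len: int, window: int = 256, sigma: float = 64.0):
    """Illustrative suffix-dropout sketch (hypothetical helper, not the repo's sampler.py API).

    Combines the two DPad strategies:
      1. Sliding window: only suffix positions within `window` tokens of the
         current block are considered, so cost does not grow with seq_len.
      2. Distance-decay dropout: each windowed suffix token is kept with a
         probability that decays with distance following a Gaussian profile.
    Returns the original positions of the suffix tokens kept for attention.
    """
    # Fixed-length suffix window starting right after the current block.
    suffix = torch.arange(block_end, min(block_end + window, seq_len))

    # Distance of each candidate suffix token from the current decoding block.
    dist = (suffix - block_end).float()

    # Gaussian keep-probability: nearby tokens are almost always retained,
    # distant ones are pruned with increasing probability.
    keep_prob = torch.exp(-0.5 * (dist / sigma) ** 2)
    kept = suffix[torch.rand(suffix.shape) < keep_prob]

    return kept  # original position ids are preserved for positional embeddings


# Example: decoding the block that ends at position 128 in a 1024-token sequence.
kept_suffix = sample_suffix_tokens(block_end=128, seq_len=1024)
```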
Overview of DPad vs. other generation methods:
(a) Autoregressive models generate one token at a time.
(b) Standard dLLMs attend to all suffix tokens, incurring high computational costs.
(c) DPad restricts attention to a small, nearby set of suffix tokens, eliminating redundant computation while preserving fidelity.
This repository is built upon the Fast-dLLM codebase and incorporates the following key features and modifications to implement the DPad methodology:
- Simplified Command-Line Interface: To simplify experiments, the original complex commands have been wrapped into a user-friendly `run.py` script. You can now run evaluations and generation with simple, intuitive arguments.
- Dynamic Suffix Sampling (DPad Core): The core of DPad is implemented in `sampler.py` and integrated into the main generation pipelines (`llada/generate.py` for LLaDA and `dream/model/generation_utils_block.py` for Dream). This module applies distance-decay dropout within the sliding window before decoding each block, efficiently pruning redundant suffix tokens.
- Expanded Model Support: We have extended support to include the full semi-autoregressive mode for the `Dream-Base` model, enabling comprehensive evaluation across different dLLM architectures.
- Adaptive Positional Embeddings (RoPE): We have modified the RoPE implementation to correctly handle the non-contiguous token sequences that result from suffix dropout. This ensures each token retains its original positional information, maintaining the integrity of the model's spatial awareness.
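To make the RoPE handling concrete, here is a simplified "rotate-half" rotary embedding sketch: the key point is that angles are computed from each kept token's original position id rather than its compacted index after pruning. Names and conventions below are assumptions for exposition, not the repo's actual implementation.

```python
import torch

def rope_with_original_positions(x: torch.Tensor, positions: torch.Tensor, base: float = 10000.0):
    """Apply rotary position embeddings using original position ids (illustrative sketch).

    x:         (num_tokens, dim) query or key vectors for the kept tokens
    positions: (num_tokens,) original sequence positions, possibly
               non-contiguous after suffix dropout, e.g. [0, 1, ..., 127, 130, 141]
    """
    dim = x.shape[-1]
    half = dim // 2
    inv_freq = 1.0 / (base ** (torch.arange(half, dtype=torch.float32) / half))

    # Angles come from the *original* position ids, not from the compacted
    # index after pruning, so each token keeps its true positional signal.
    angles = positions.to(torch.float32)[:, None] * inv_freq[None, :]   # (T, half)
    cos, sin = angles.cos(), angles.sin()

    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x2 * cos + x1 * sin], dim=-1)


# Example: three kept suffix tokens at original positions 130, 141, and 163.
q = torch.randn(3, 64)
q_rot = rope_with_original_positions(q, torch.tensor([130, 141, 163]))
```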
DPad delivers substantial speedups while maintaining or improving scores. Below is a comprehensive summary of performance on LLaDA-Instruct, LLaDA-1.5, and Dream-Base, comparing our method against the original vanilla baseline and the optimized parallel decoding variant (Fast-dLLM).
Performance on LLaDA-Instruct

| Benchmark | Metric | Vanilla | +DPad | +Parallel (Fast-dLLM) | +Parallel+DPad (Ours) |
|---|---|---|---|---|---|
| GSM8K (4-shot) | Latency (s) ↓ | 27.48 | 18.35 (1.50x) | 8.55 (3.21x) | 6.64 (4.14x) |
| | Flexible Acc. ↑ | 78.39 | 78.54 | 78.54 | 79.76 |
| | Strict Acc. ↑ | 37.38 | 63.84 | 38.67 | 64.97 |
| MATH (4-shot) | Latency (s) ↓ | 25.40 | 21.61 (1.18x) | 9.91 (2.56x) | 9.20 (2.76x) |
| | Flexible Acc. ↑ | 33.58 | 33.42 | 33.40 | 33.30 |
| | Strict Acc. ↑ | 8.42 | 28.04 | 8.76 | 27.98 |
| HumanEval (0-shot) | Latency (s) ↓ | 34.67 | 27.41 (1.26x) | 11.48 (3.02x) | 9.14 (3.79x) |
| | Acc. ↑ | 43.90 | 47.56 | 43.29 | 46.34 |
| MBPP (3-shot) | Latency (s) ↓ | 62.11 | 15.89 (3.91x) | 14.26 (4.36x) | 6.02 (10.32x) |
| | Acc. ↑ | 15.00 | 40.40 | 15.00 | 39.40 |
Performance on LLaDA-1.5

| Benchmark | Metric | Vanilla | +DPad | +Parallel (Fast-dLLM) | +Parallel+DPad (Ours) |
|---|---|---|---|---|---|
| GSM8K (4-shot) | Latency (s) ↓ | 27.61 | 18.26 (1.51x) | 8.06 (3.42x) | 6.23 (4.43x) |
| | Flexible Acc. ↑ | 80.59 | 80.14 | 80.82 | 80.89 |
| | Strict Acc. ↑ | 61.87 | 78.47 | 62.62 | 78.92 |
| MATH (4-shot) | Latency (s) ↓ | 25.12 | 20.63 (1.22x) | 9.48 (2.65x) | 8.57 (2.93x) |
| | Flexible Acc. ↑ | 33.52 | 34.08 | 33.60 | 32.92 |
| | Strict Acc. ↑ | 32.72 | 37.00 | 32.92 | 35.96 |
| HumanEval (0-shot) | Latency (s) ↓ | 34.80 | 11.55 (3.01x) | 11.16 (3.12x) | 5.26 (6.61x) |
| | Acc. ↑ | 40.85 | 44.51 | 39.63 | 39.63 |
| MBPP (3-shot) | Latency (s) ↓ | 62.34 | 14.95 (4.17x) | 5.47 (11.39x) | 4.41 (14.14x) |
| | Acc. ↑ | 38.20 | 39.80 | 38.60 | 41.60 |
Performance on Dream-Base

| Benchmark | Metric | Vanilla | +DPad | +Parallel (Fast-dLLM) | +Parallel+DPad (Ours) |
|---|---|---|---|---|---|
| GSM8K (4-shot) | Latency (s) ↓ | 22.30 | 10.27 (2.17x) | 13.84 (1.61x) | 5.24 (4.25x) |
| | Flexible Acc. ↑ | 75.06 | 75.28 | 75.51 | 74.83 |
| | Strict Acc. ↑ | 74.37 | 75.06 | 74.83 | 74.75 |
| MATH (4-shot) | Latency (s) ↓ | 21.01 | 16.64 (1.26x) | 8.82 (2.38x) | 7.72 (2.72x) |
| | Flexible Acc. ↑ | 34.06 | 34.14 | 35.12 | 34.44 |
| | Strict Acc. ↑ | 37.76 | 37.64 | 38.62 | 38.32 |
| HumanEval (0-shot) | Latency (s) ↓ | 28.49 | 8.20 (3.47x) | 14.15 (2.01x) | 4.06 (7.01x) |
| | Acc. ↑ | 51.22 | 51.22 | 53.05 | 52.44 |
| MBPP (3-shot) | Latency (s) ↓ | 49.15 | 41.36 (1.19x) | 12.38 (3.97x) | 9.86 (4.98x) |
| | Acc. ↑ | 52.40 | 52.60 | 55.40 | 54.80 |
The true power of DPad emerges in long-sequence settings, where the cost of suffix attention becomes a dominant bottleneck. DPad's performance gains grow substantially with sequence length.
Furthermore, DPad is complementary to other dLLM optimizations. It targets the redundant computation of KV-tokens, while parallel decoding mitigates sequential dependencies. When combined, these approaches yield multiplicative speedups.
Figures: LLaDA-1.5 on GSM8K (1024 tokens); Dream on HumanEval (1024/2048 tokens).
First, clone the repository and set up the environment.
```bash
# Clone the repository
git clone https://github.com/Crys-Chen/DPad.git
cd DPad

# Create and activate a conda environment
conda create -n dpad python=3.10
conda activate dpad

# Install dependencies
pip install -r requirements.txt
```

All evaluation scripts are located in `llada/scripts` and `dream/scripts`.
```bash
cd llada
bash ./scripts/main_instruct.sh
bash ./scripts/main_1.5.sh
bash ./scripts/long_seq.sh
```

Results will be saved in `llada/output`.
```bash
cd dream
bash ./scripts/main_base.sh
# Dream-Instruct is coming soon
bash ./scripts/long_seq.sh
```

Results will be saved in `dream/output`.
The `HumanEval` benchmark requires a post-processing step to sanitize the generated code and calculate the final `pass@1` score. After the evaluation script finishes, run the following command:

```bash
python postprocess_code.py {path/to/your/samples_humaneval_xxx.jsonl}
```

Replace the path with the actual path to your generated samples file, which can be found in the specified `output_path`.
To run the demo yourself, please follow these steps:
1. Set Up the Custom Evaluation Task
The demo requires a specific task configuration for the lm-evaluation-harness. You'll need to copy the provided YAML file into your Python environment's lm_eval package.
- Copy from: `./demo/gsm8k-split.yaml`
- Copy to: `.../site-packages/lm_eval/tasks/gsm8k/`
For example, if you're using Conda, the command would look something like this:
```bash
cp ./demo/gsm8k-split.yaml /path/to/conda/envs/YOUR_ENV_NAME/lib/pythonX.X/site-packages/lm_eval/tasks/gsm8k/
```

2. Configure Dataset Paths
Next, you must edit the gsm8k-split.yaml file you just copied. Open it and modify lines 5-9 to point to the absolute path of the demo dataset on your machine.
```yaml
# Edit this file: .../site-packages/lm_eval/tasks/gsm8k/gsm8k-split.yaml
dataset_kwargs:
  data_files:
    train: "/path/to/your/repo/dataset/gsm8k-50.arrow"
    validation: "/path/to/your/repo/dataset/gsm8k-50.arrow"
    test: "/path/to/your/repo/dataset/gsm8k-50.arrow"
```

Make sure to replace `/path/to/your/repo/` with the correct path to this repository's root directory.
3. Run the Demo Script
Once configured, simply execute the demo script from the root of the repository:
```bash
bash ./demo/demo.sh
```

- Integrate Suffix Dropout into Training: Our next step is to incorporate the distance-decay dropout strategy directly into Supervised Fine-Tuning (SFT) by modifying the training objective (see paper). Aligning training with our efficient inference procedure would keep the posterior consistent between training and inference, close the training-inference distribution gap, and prevent the model from spending capacity on redundant suffix tokens.
- Call for Collaboration: Advancing this research requires significant computational resources. We are actively seeking academic or industry partners with available GPU capacity who are interested in this work. If you would like to collaborate on developing the next generation of highly efficient diffusion models, please contact us to explore this opportunity.
This codebase directly builds on Fast-dLLM and is inspired by dLLM-Cache, with the foundations laid by the original LLaDA and Dream models. We thank their authors for making their work public. We are also grateful for the powerful open-source tools from the Hugging Face team that made this research possible.
If you find our work useful for your research, please consider citing our paper:
```bibtex
@misc{chen2025dpadefficientdiffusionlanguage,
      title={DPad: Efficient Diffusion Language Models with Suffix Dropout},
      author={Xinhua Chen and Sitao Huang and Cong Guo and Chiyue Wei and Yintao He and Jianyi Zhang and Hai "Helen" Li and Yiran Chen},
      year={2025},
      eprint={2508.14148},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.14148},
}
```