PC-Sampler: Position-Aware Calibration of Decoding Bias in Masked Diffusion Models

β€’ πŸ“– Introduction β€’ πŸŽ‰ News β€’ ✨ PC-Sampler Pipeline β€’ ⚑️ Evaluation

β€’ πŸ“ˆ Decoding Trajectory β€’ πŸ’» Algorithm β€’ πŸ“§ Contact

πŸ“– Introduction

PC-Sampler is a novel decoding strategy for Masked Diffusion Models (MDMs) that unifies global trajectory planning with content-aware informativeness maximization. It addresses the key limitations of traditional uncertainty-based samplers: a lack of global trajectory control and a bias toward "trivial tokens." By using a position-aware weighting mechanism and a calibrated confidence score, PC-Sampler guides the decoding path and prevents the premature selection of unimportant tokens, significantly improving generation quality.

πŸŽ‰ News

  • 2025-09-12: Added enhanced support for decoding with LLaDA, integrating a variety of recent semi- and non-autoregressive sampling strategies, including ReMDM, Fast-dLLM, Semi-AR, and margin-, entropy-, and confidence-based samplers.
  • 2025-08-19: Released our paper on arXiv and our code on GitHub.

✨ PC-Sampler Pipeline

Figure: overview of the PC-Sampler pipeline.

PC-Sampler is a decoding strategy designed for advanced Masked Diffusion Models (MDMs) such as LLaDA and Dream. These models are powerful non-autoregressive alternatives for sequence generation, enabling flexible decoding through the iterative denoising of masked tokens.
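
To make the iterative denoising concrete, the following minimal sketch shows the generic MDM decoding loop that samplers plug into. This is our illustration, assuming a Hugging Face-style model interface; names such as mdm_decode and select_fn are hypothetical, not the repository's API.

import torch

@torch.no_grad()
def mdm_decode(model, prompt_ids, gen_length, steps, mask_id, select_fn):
    # Start from the prompt followed by gen_length [MASK] tokens.
    x = torch.cat([prompt_ids, torch.full((gen_length,), mask_id, dtype=torch.long)])
    per_step = max(1, gen_length // steps)
    for _ in range(steps):
        masked = (x == mask_id).nonzero(as_tuple=True)[0]
        if masked.numel() == 0:
            break
        logits = model(x.unsqueeze(0)).logits[0]   # (seq_len, vocab_size); assumes HF-style output
        # The sampler decides which masked positions to reveal this step.
        x = select_fn(logits, x, masked, per_step)
    return x

Each decoding strategy in this repository corresponds to a different choice of select_fn: confidence-, entropy-, and margin-based samplers rank masked positions by local uncertainty, while PC-Sampler uses the position-aware calibrated score described in the Algorithm section below.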

βš™οΈ Setup

git clone https://github.com/NEUIR/PC-Sampler.git
conda create --name pc_sampler python=3.10
conda activate pc_sampler
cd PC-Sampler
pip install -r requirements.txt

πŸ“ƒ Evaluation

Our method, along with all baseline methods, can be applied to mathematical reasoning, code generation, and question-answering datasets.

Eval Case

This is an example of evaluating on the HumanEval dataset with PC-Sampler. You can change the --task and --mode arguments to evaluate other datasets and decoding methods.

cd scripts
python eval.py \
    --task 'humaneval' \
    --model_name 'GSAI-ML/LLaDA-8B-Instruct' \
    --device 'cuda:5' \
    --gen_length 256 \
    --steps 256 \
    --block_length 256 \
    --mode pc_sampler \
    --lambd 0.25 \
    --alpha 10 \
    --data_path ../data/humaneval.jsonl \
    --result_path results/humaneval_pc_sampler

The following are the evaluation bash scripts for all decoding methods:

  • Semi-Autoregressive: cd scripts && bash eval_semi_ar.sh
  • Entropy: cd scripts && bash eval_entropy.sh
  • EB-Sampler: cd scripts && bash eval_eb_sampler.sh
  • Fast-dLLM: cd scripts && bash eval_fast_dllm.sh
  • Margin: cd scripts && bash eval_margin.sh
  • PC-Sampler: cd scripts && bash eval_pc_sampler.sh
  • ReMDM: cd scripts && bash eval_remdm.sh
  • Linear_Position: cd scripts && bash eval_linear_position.sh

Evaluation of Decoding Methods

All decoding methods are evaluated on the same set of datasets: HumanEval, MBPP, GSM8K, MATH-500, GPQA, Countdown, and Sudoku. Evaluation results are saved in the results folder.

Evaluation Tools

  • For the GSM8K and GPQA datasets, we use lm-eval for evaluation.
  • For the remaining datasets, we use a custom evaluation script. Please refer to scripts/eval.py for more details.

Consistency Note

All methods are evaluated using the same set of evaluation scripts (including both lm-eval and our custom script) to ensure consistent assessment.

Plotting Heatmaps

We provide a script to generate heatmaps for the decoding trajectories of different decoding methods. The script is located in scripts/heatmap.sh.

cd scripts
bash heatmap.sh
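
For reference, a decode-order heatmap of this kind can be rendered with a few lines of matplotlib. This is an illustrative sketch on dummy data, not the implementation in scripts/heatmap.sh; it assumes you have logged, for each token position, the step at which it was revealed.

import numpy as np
import matplotlib.pyplot as plt

# reveal_step[j, i] = decoding step at which position i of sample j was revealed
# (random dummy data here; a real run would record this during decoding)
reveal_step = np.stack([np.random.permutation(256) for _ in range(8)])

plt.imshow(reveal_step, aspect="auto", cmap="viridis")
plt.colorbar(label="decoding step")
plt.xlabel("token position")
plt.ylabel("sample")
plt.title("Decoding trajectory")
plt.savefig("trajectory_heatmap.png", dpi=200)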

Results

The heatmap results are saved in the heatmap_results folder.

πŸ“ˆ Decoding Trajectory

The choice of decoding strategy significantly impacts the generation order of Masked Diffusion Models (MDMs). A critical limitation of existing uncertainty-based methods is their tendency to exhibit a "U-shaped" trajectory, where tokens at sequence boundaries are prioritized early in decoding, followed by convergence toward the center. This pattern arises from the greedy nature of these samplers and the model's propensity to assign high confidence to structurally predictable boundary tokens.

In contrast, our PC-Sampler introduces explicit trajectory control through position-aware weighting, enabling adaptive generation order tailored to task requirements. Below, we visualize the decoding trajectories on the GSM8K dataset for four representative sampling strategies:

πŸ” Trajectory Visualizations on GSM8K

Figure: decoding-trajectory heatmaps on GSM8K for four sampling strategies: confidence-based, entropy-based, margin-based, and PC-Sampler.

πŸ”‘ Key Observations

  • U-shaped Bias in Uncertainty-based Methods: Confidence-, entropy-, and margin-based samplers consistently exhibit the characteristic U-shaped pattern, with early decoding of tokens at both sequence boundaries. This behavior limits their ability to capture global dependencies required for complex reasoning tasks like mathematical problem-solving.

  • Controlled Trajectory with PC-Sampler: Our method eliminates the U-shaped bias by regulating the decoding path through exponential positional weighting. This enables a more natural progression that aligns with the logical flow of reasoning tasks, as demonstrated by the sequential trajectory on the GSM8K dataset.

The adaptive trajectory control of PC-Sampler directly contributes to its superior performance on GSM8K (82.2% accuracy) compared to uncertainty-based alternatives, highlighting the importance of aligning decoding order with task-specific structural demands.

πŸ’» Algorithm

Method Overview

PC-Sampler (Position-Aware Confidence-Calibrated Sampling) is a novel decoding strategy for Masked Diffusion Models (MDMs) that addresses key limitations of existing uncertainty-based sampling methods. It unifies global trajectory planning with content-aware informativeness maximization through two core components:

  1. Position-Aware Weighting Mechanism: Regulates the decoding path using an exponential decay function to enable flexible control over the generation order, adapting to task-specific structural demands.

  2. Calibrated Confidence Score: Suppresses premature selection of trivial tokens (e.g., punctuation, filler words) by incorporating frequency-based adjustment from a reference corpus, promoting semantically rich content generation.
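
Combining the two components, the selection score for each masked position $i$ takes the form below (notation as in the algorithm workflow that follows; $|p_0|$ is the prompt length, and the minus sign makes the frequency term a surprisal, so high-frequency trivial tokens score low):

$$\text{score}^{(i)} = \underbrace{e^{-\lambda (i - |p_0|)}}_{\text{position-aware weight}} \cdot \underbrace{\min\!\left(\hat{p}^{(i)} \cdot \left(-\log p_{\mathcal{D}'}(\hat{x}_0^{(i)})\right),\ \alpha\right)}_{\text{calibrated confidence}}$$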

Extensive experiments across seven benchmarks demonstrate that PC-Sampler consistently outperforms existing MDM decoding strategies by more than 10% on average, narrowing the performance gap with state-of-the-art autoregressive models.

Algorithm Workflow

The complete workflow of PC-Sampler is summarized in the following algorithm:

Require: Predictor $p_\theta$, prompt $p_0$, answer length $L$, steps $T$, hyperparameters $\lambda, \alpha$, reference corpus $\mathcal{D}'$

  1. $p_{\mathcal{D}'} \gets \text{FreqDist}(\mathcal{D}')$
  2. $x \gets \text{Concat}(p_0, \text{[MASK]} \times L)$
  3. for $t = 1$ to $T$ do
    • $\mathcal{M}_t \gets \{i \mid x^i = \text{[MASK]}\}$ // Get mask indices
    • if $\mathcal{M}_t = \emptyset$ then
      • break
    • $\hat{x}_0, \hat{p} \gets p_{\theta}(\cdot \mid x)$ // Predictions and per-position confidences
    • for each position $i \in \mathcal{M}_t$ do
      • $\mathcal{C}^{(i)} \gets \hat{p}^{(i)} \cdot \left(-\log p_{\mathcal{D}'}(\hat{x}_0^{(i)})\right)$
      • $\mathcal{C}^{(i)} \gets \min(\mathcal{C}^{(i)}, \alpha)$ // Clip salience score
      • $w^{(i)} \gets e^{-\lambda \cdot (i - |p_0|)}$
      • $\text{score}^{(i)} \gets w^{(i)} \cdot \mathcal{C}^{(i)}$
    • $n_t \gets \text{NumToReveal}(t, T, |\mathcal{M}_t|)$
    • $\mathcal{S}_t \gets \text{TopK}(\text{score}, n_t)$ // Select best tokens
    • for each index $j \in \mathcal{S}_t$ do
      • $x^j \gets \hat{x}_0^j$ // Reveal selected token
  4. return $x$
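
As a concrete illustration, here is a minimal PyTorch sketch of a single PC-Sampler reveal step, following the pseudocode above. The function name pc_sampler_step and its signature are ours for illustration, not the repository's API; see scripts/eval.py for the actual implementation.

import torch

def pc_sampler_step(logits, x, mask_id, prompt_len, log_freq,
                    lambd=0.25, alpha=10.0, n_reveal=1):
    # logits:   (seq_len, vocab_size) model outputs for the current sequence
    # x:        (seq_len,) current token ids, mask_id at unrevealed positions
    # log_freq: (vocab_size,) background log-frequencies log p_D'(v)
    probs = logits.softmax(dim=-1)
    conf, x0_hat = probs.max(dim=-1)                  # confidence and argmax prediction per position
    masked = (x == mask_id).nonzero(as_tuple=True)[0]
    if masked.numel() == 0:
        return x

    # Calibrated confidence: confidence times informativeness (-log freq), clipped at alpha.
    calibrated = (conf[masked] * (-log_freq[x0_hat[masked]])).clamp(max=alpha)

    # Position-aware weight: exponential decay with distance from the prompt.
    weights = torch.exp(-lambd * (masked - prompt_len).float())

    scores = weights * calibrated
    top = scores.topk(min(n_reveal, masked.numel())).indices
    x = x.clone()
    x[masked[top]] = x0_hat[masked[top]]              # reveal the highest-scoring predictions
    return x

With lambd = 0 the positional weight is uniform and the method reduces to a purely confidence-calibrated sampler; larger lambd enforces an increasingly left-to-right order.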

Hyperparameters

  • $\lambda$ (lambda_val): Controls positional bias strength. Typical values range from 0 (no positional bias) to 1.0 (strong left-to-right bias). Recommended: 0 for Sudoku, 0.25 for most tasks, 0.5 for Countdown.

  • $\alpha$: Clipping threshold for confidence scores. Recommended value: 10 (provides stable results across tasks).

  • Background frequency distribution ($p_{\mathcal{D}'}$): Constructed from a comprehensive corpus combining general text, mathematical reasoning problems, and evaluation datasets (see /data/baseline).
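
For illustration, a background distribution of this kind could be estimated as follows. This is a hedged sketch with add-one smoothing; build_log_freq is our name, not the repository's, and the actual corpus lives under /data/baseline.

from collections import Counter
import math

def build_log_freq(token_id_sequences, vocab_size, smoothing=1.0):
    # token_id_sequences: iterable of lists of token ids from the reference corpus
    counts = Counter()
    total = 0
    for ids in token_id_sequences:
        counts.update(ids)
        total += len(ids)
    denom = total + smoothing * vocab_size
    # log p_D'(v) with add-one (Laplace) smoothing so unseen tokens get finite log-probability
    return [math.log((counts.get(v, 0) + smoothing) / denom) for v in range(vocab_size)]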

πŸ“§ Contact

If you have questions, suggestions, or bug reports, please email:
