This repository provides the implementation code for our paper accepted to NeurIPS 2025: Inference-time Alignment in Continuous Space. In this paper, we propose SEA, a simple inference-time alignment algorithm that reformulates alignment as an iterative optimization procedure on an energy function over logits in the continuous space defined by the optimal RLHF policy, enabling deep and effective alignment. Despite its simplicity, SEA achieves strong performance on extensive benchmarks such as AdvBench and TruthfulQA, consistently and significantly outperforming state-of-the-art baselines across various base models.
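To give a feel for what "iterative optimization of an energy function over logits" can look like, here is a minimal toy sketch. It is not the repository's implementation. It assumes a four-token vocabulary, hand-picked rewards, and an energy equal to the negative expected log of the unnormalized optimal RLHF policy, pi_ref(y) * exp(r(y) / beta), minimized with plain gradient descent. All function names, parameters, and values below are hypothetical; the update rule used by SEA itself may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scores(ref_logprobs, rewards, beta):
    """Per-token score: log pi_ref(i) + r(i) / beta (log of the unnormalized optimal policy)."""
    return [lp + r / beta for lp, r in zip(ref_logprobs, rewards)]


def energy(logits, ref_logprobs, rewards, beta):
    """Toy energy (an assumption): negative expected score under softmax(logits)."""
    probs = softmax(logits)
    return -sum(p * s for p, s in zip(probs, scores(ref_logprobs, rewards, beta)))


def energy_grad(logits, ref_logprobs, rewards, beta):
    """Analytic gradient: dE/dz_j = -p_j * (s_j - sum_i p_i * s_i)."""
    probs = softmax(logits)
    s = scores(ref_logprobs, rewards, beta)
    mean_score = sum(p * si for p, si in zip(probs, s))
    return [-p * (si - mean_score) for p, si in zip(probs, s)]


def optimize_logits(init_logits, ref_logprobs, rewards, beta=0.5,
                    step_size=0.2, num_steps=500):
    """Plain gradient descent on the toy energy, starting from init_logits."""
    logits = list(init_logits)
    for _ in range(num_steps):
        grad = energy_grad(logits, ref_logprobs, rewards, beta)
        logits = [z - step_size * g for z, g in zip(logits, grad)]
    return logits


# Hypothetical reference distribution and rewards for a 4-token vocabulary.
ref_probs = [0.5, 0.3, 0.15, 0.05]
ref_logprobs = [math.log(p) for p in ref_probs]
rewards = [-1.0, 0.2, 1.0, 0.0]

init_logits = list(ref_logprobs)  # start from the reference model's logits
final_logits = optimize_logits(init_logits, ref_logprobs, rewards)

print("initial energy:", energy(init_logits, ref_logprobs, rewards, 0.5))
print("final energy:  ", energy(final_logits, ref_logprobs, rewards, 0.5))
print("final probs:   ", softmax(final_logits))
```

In this toy setting the logits start at the reference distribution and drift toward the token whose combined reference log-probability and scaled reward is highest. Smaller values of `beta` weight the reward more heavily relative to the reference model.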
Create a Python virtual environment, e.g., with Conda:
conda create -n sea python=3.10 && conda activate sea
First, install PyTorch 2.1.2 from the PyTorch Installation Page.
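After this step, a quick sanity check can confirm that the active environment matches the versions named above. The helper below is a hypothetical convenience and not part of this repository. It uses only the Python standard library and assumes PyTorch was installed under the distribution name `torch`.

```python
import sys
from importlib import metadata


def check_environment(expected_python=(3, 10), expected_torch="2.1.2"):
    """Return a list of mismatch messages; an empty list means every check passed."""
    problems = []

    # Compare only the major and minor version of the running interpreter.
    current_python = tuple(sys.version_info[:2])
    if current_python != tuple(expected_python):
        problems.append(
            f"Python {current_python[0]}.{current_python[1]} found, "
            f"expected {expected_python[0]}.{expected_python[1]}"
        )

    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed in this environment")
    else:
        # Wheels may carry a local suffix such as "+cu121"; compare the base version.
        if torch_version.split("+")[0] != expected_torch:
            problems.append(f"PyTorch {torch_version} found, expected {expected_torch}")

    return problems


for problem in check_environment():
    print("WARNING:", problem)
```

If nothing is printed, the interpreter and PyTorch versions match the ones listed in this README.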
Then, install the following packages:
pip install -r requirements.txt
See scripts, for example:
bash scripts/adv-llama3.2-1b-base.sh # Default Accelerate Port & GPU id
bash scripts/adv-llama3.2-1b-base.sh 29520 # Specified Accelerate Port & Default GPU id
bash scripts/adv-llama3.2-1b-base.sh 29520 "2,4" # Specified Accelerate Port & GPU id
For evaluation, see scripts, for example:
bash scripts/eval.sh
See outputs