$\gamma$-PO: Robust Preference Optimization via Dynamic Target Margins


🎉🎉🎉 Thrilled that our work has been accepted to ACL 2025! 🎉🎉🎉

This repository provides the official implementation of $\gamma$-PO, a novel dynamic target margin preference optimization algorithm for aligning large language models (LLMs) with human preferences. $\gamma$-PO adaptively adjusts target margins at the pairwise level to prioritize high-confidence preference pairs while suppressing noise from ambiguous pairs.

Key Features

  • Dynamic Margin Adjustment: Instance-specific margin calibration to handle ambiguous preference pairs.
  • Plug-and-Play Design: Compatible with variants of Direct Preference Optimization (DPO) that rely on reward margins.
  • State-of-the-Art Performance: Achieves an average 4.4% improvement over baselines on AlpacaEval2 and Arena-Hard benchmarks.
  • Efficiency: Minimal code changes required with negligible impact on training efficiency.

Table of Contents

  • Introduction
  • Quick Start
  • Data Preparation
  • Training
  • Citation

Introduction

$\gamma$-PO addresses the limitations of existing preference optimization methods by introducing adaptive target margins. This approach strategically prioritizes high-confidence pairs (those demonstrating higher reward margins) while reducing the influence of ambiguous pairs. Through dynamic margin scaling, $\gamma$-PO enhances the alignment of LLMs with human preferences, particularly in the presence of noisy data.
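To make the mechanism concrete, here is a minimal PyTorch sketch of a SimPO-style loss with a dynamic per-pair target margin. This is an illustrative assumption, not the official implementation: the names beta, gamma0, and tau mirror the hyperparameters encoded in the training config below (beta10, gm0.4, tau10), and the softmax weighting over batch margins is one plausible way to realize the dynamic schedule.

import torch
import torch.nn.functional as F

def gamma_po_loss(chosen_logps, rejected_logps, beta=10.0, gamma0=0.4, tau=10.0):
    # Reward margin per preference pair (length-normalized log-probs, as in SimPO).
    margins = chosen_logps - rejected_logps               # shape: (batch,)
    # Dynamic target margin: a temperature-tau softmax over the batch margins
    # assigns a larger target to high-confidence pairs and a smaller one to
    # ambiguous pairs; detach() keeps gradients out of the margin schedule.
    weights = F.softmax(margins.detach() / tau, dim=0)    # sums to 1
    gammas = gamma0 * margins.numel() * weights           # batch mean equals gamma0
    # Bradley-Terry logistic loss with the per-pair margin subtracted.
    return -F.logsigmoid(beta * margins - gammas).mean()

# Toy batch: per-response average log-probabilities for chosen/rejected.
chosen = torch.tensor([-0.8, -1.2, -1.9])
rejected = torch.tensor([-1.5, -1.3, -2.0])
print(gamma_po_loss(chosen, rejected))

Because the softmax weights sum to one, the batch-average target margin stays at gamma0; the dynamic schedule redistributes a fixed margin budget rather than changing its overall scale, and as tau grows the margins flatten back toward a uniform (plain SimPO-style) target.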

Quick Start

Our codebase is built upon the alignment-handbook repo. The following steps will guide you through the installation process.

First, create a Python virtual environment using e.g. Conda:

conda create -n handbook python=3.10 && conda activate handbook

Next, install PyTorch v2.2.2. Since this is hardware-dependent, we direct you to the PyTorch Installation Page.

You can then install the remaining package dependencies of alignment-handbook as follows:

git clone https://github.com/huggingface/alignment-handbook.git
cd ./alignment-handbook/
python -m pip install .

You will also need Flash Attention 2 installed, which can be done by running:

python -m pip install flash-attn --no-build-isolation

Finally, clone this repository and install its dependencies:

git clone https://github.com/sunjie279/gammaPO.git
cd gammaPO
pip install -r requirements.txt

Data Preparation

Download the UltraFeedback Binarized dataset and prepare preference pairs:

# Download dataset
huggingface-cli download HuggingFaceH4/ultrafeedback_binarized --repo-type dataset --local-dir data/
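
As a quick sanity check after the download, you can inspect a preference pair with the datasets library. This is a small sketch assuming the published schema of HuggingFaceH4/ultrafeedback_binarized, where the train_prefs split stores chosen/rejected responses as chat-message lists:

from datasets import load_dataset

# Load the preference-pair split (served from the Hub cache).
ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

pair = ds[0]
print(pair["prompt"][:120])
# The last message in each list is the assistant reply being compared.
print("chosen:  ", pair["chosen"][-1]["content"][:120])
print("rejected:", pair["rejected"][-1]["content"][:120])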

Training

Train $\gamma$-PO using the prepared dataset. The config filename encodes the run's hyperparameters: Llama-3-8B-Instruct with $\beta=10$, target margin $\gamma=0.4$, temperature $\tau=10$, and learning rate 1e-6.

ACCELERATE_LOG_LEVEL=info accelerate launch \
    --config_file accelerate_configs/deepspeed_zero3.yaml \
    gammaPO/training_configs/llama-3-8b-it-gmsimpo-beta10-gm0.4-tau10-lr1e-6.yaml

Citation

If you find this repository useful, please cite our ACL 2025 paper:

@inproceedings{sun2025gammaPO,
    title = {$\gamma$-PO: Robust Preference Optimization via Dynamic Target Margins},
    author = {Sun, Jie and Wu, Junkang and Wu, Jiancan and Zhu, Zhibo and Lu, Xingyu and Zhou, Jun and Ma, Lintao and Wang, Xiang},
    booktitle = {Findings of the Association for Computational Linguistics: ACL 2025},
    year = {2025},
    address = {Vienna, Austria}
}
