
cdpo

Cory's Direct Preference Optimization replication work

The goal is to replicate the dialogue task results (i.e., Figure 3) from the original DPO paper.

Using the Repository

Source code can be found in src.

scripts/eval_win_rate.py can be used to evaluate response win rates, using ChatGPT as an automated judge.
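
Win-rate evaluation of this kind typically shows the judge a prompt and two candidate responses and asks which is better. The sketch below illustrates the idea with the openai Python client; the judge model, prompt wording, and helper names here are assumptions for illustration, not this repository's actual implementation.

```python
# Minimal sketch of ChatGPT-as-judge win-rate evaluation.
# Model name, prompt wording, and function names are assumptions,
# not the repository's actual code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge_pair(prompt: str, response_a: str, response_b: str) -> str:
    """Ask ChatGPT which of two responses to `prompt` is better; returns 'A' or 'B'."""
    query = (
        "Which response to the following prompt is better? "
        "Answer with a single letter, A or B.\n\n"
        f"Prompt: {prompt}\n\nResponse A: {response_a}\n\nResponse B: {response_b}"
    )
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed judge model
        messages=[{"role": "user", "content": query}],
        temperature=0,
    )
    return completion.choices[0].message.content.strip()[0].upper()

def win_rate(triples) -> float:
    """Fraction of (prompt, policy_response, baseline_response) triples the policy wins.
    In practice the A/B order should be randomized to avoid judge position bias."""
    wins = sum(judge_pair(p, a, b) == "A" for p, a, b in triples)
    return wins / len(triples)
```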

scripts/train_and_eval.py can be used to run the full pipeline of supervised fine-tuning (SFT) followed by DPO.
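
The DPO stage trains the SFT model directly on preference pairs, with no separate reward model: it increases the policy's log-probability margin on the preferred response relative to a frozen reference copy of the SFT model. Below is a minimal PyTorch sketch of the objective from the DPO paper; the function and argument names are mine, not this repository's.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO objective: -log sigmoid of the implicit reward margin.

    Each argument is the summed per-token log-probability of a full response
    (chosen = human-preferred, rejected = dispreferred) under either the policy
    being trained or the frozen SFT reference model.
    """
    # Implicit rewards: how much more likely each response is under the
    # policy than under the reference, scaled by the KL-penalty strength beta.
    chosen_reward = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the probability that the chosen response out-ranks the rejected one.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```

Here beta controls how far the policy is allowed to drift from the reference model: larger values penalize deviation more strongly.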

Example configuration files can be found in the config directory.

Installation

Use an editable install to work with the code locally:

python -m pip install -e .
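
From there, a typical workflow points the pipeline script at one of the example configs. The exact command-line interface is defined by the scripts themselves, so the invocation below is only a hypothetical illustration:

python scripts/train_and_eval.py <path-to-config>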

Results Summary

The performance of supervised fine-tuning and direct preference optimization for various model sizes is summarized here:

(Figure: final_results — SFT and DPO performance across model sizes)

To save time and compute costs, smaller models were trained and evaluated on reduced-size training and test sets. The GPT2-large model likely needs additional regularization. Nevertheless, the ability of DPO to improve chatbot performance has been replicated.

A more thorough write-up of the results can be found here.
