Cory's Direct Preference Optimization replication work
The goal is to replicate the dialogue-task results (i.e., Figure 3) from the original DPO paper.
Source code can be found in the src directory.
scripts/eval_win_rate.py can be used to evaluate model responses, using ChatGPT as the judge.
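For reference, the sketch below illustrates the kind of prompt-based judging a win-rate evaluation like this typically performs. It is not the actual eval_win_rate.py; the prompt wording, model name, and function names are illustrative only, and it assumes the openai>=1.0 Python client with an OPENAI_API_KEY set in the environment.

```python
# Illustrative sketch of a ChatGPT-based win-rate comparison (not the actual
# eval_win_rate.py). Assumes openai>=1.0 and OPENAI_API_KEY in the environment;
# prompt wording and model choice are placeholders.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = (
    "Which response is the better follow-up to the dialogue history?\n\n"
    "History:\n{history}\n\n"
    "Response A:\n{a}\n\nResponse B:\n{b}\n\n"
    "Answer with a single letter, A or B."
)

def judge(history: str, response_a: str, response_b: str) -> str:
    """Ask ChatGPT to pick the better of two responses; returns 'A' or 'B'."""
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            history=history, a=response_a, b=response_b)}],
        temperature=0,
    )
    return completion.choices[0].message.content.strip()[:1].upper()

def win_rate(pairs: list[tuple[str, str, str]]) -> float:
    """Fraction of (history, model_response, baseline_response) triples the model wins."""
    wins = sum(judge(h, m, b) == "A" for h, m, b in pairs)
    return wins / len(pairs)
```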
scripts/train_and_eval.py can be used to run the full pipeline of supervised fine-tuning (SFT) followed by DPO.
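For context, the objective that DPO optimizes after the SFT stage can be written in a few lines of PyTorch. The sketch below is a minimal illustration of the loss from the DPO paper, not the project's actual training code; tensor names and the beta default are placeholders.

```python
# Minimal PyTorch sketch of the DPO objective applied after SFT; variable names
# are illustrative, not the identifiers used in src.
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log pi_theta(y_w | x), shape (batch,)
    policy_rejected_logps: torch.Tensor,  # log pi_theta(y_l | x)
    ref_chosen_logps: torch.Tensor,       # log pi_ref(y_w | x) from the frozen SFT model
    ref_rejected_logps: torch.Tensor,     # log pi_ref(y_l | x)
    beta: float = 0.1,
) -> torch.Tensor:
    """Negative log-sigmoid of the scaled difference of implicit rewards."""
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_rewards - rejected_rewards)).mean()
```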
Example configuration files can be found in the config directory.
Use an editable install to work with the code locally:
```bash
python -m pip install -e .
```
The performance of supervised fine-tuning and direct preference optimization for various model sizes is summarized here:
To save time and compute costs, smaller models were trained and evaluated on reduced-size training and test sets. Additional regularization is likely needed for the GPT2-large model. Nevertheless, the ability of DPO to improve chatbot performance has been replicated.
A more thorough write-up of the results can be found here.
