From Supervision to Exploration: What Does Protein Language Model Learn During Reinforcement Learning?
This repository contains the implementation of the RL-PLM framework introduced in our paper:
From Supervision to Exploration: What Does Protein Language Model Learn During Reinforcement Learning? (arXiv preprint).
RL-PLM provides a unified platform for analyzing how reinforcement learning enhances and reshapes Protein Language Models (PLMs) across four biological design systems.
The framework supports multiple RL algorithms (DPO, PPO, and GRPO) and integrates them with both autoregressive and masked protein language models.
Figure 1. Conceptual analogy of RL for protein design—task difficulty, policy capacity, and reward accuracy jointly determine learning efficacy.
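To make concrete how an RL-style objective attaches to a PLM, below is a minimal sketch of a DPO-style update for an autoregressive protein LM. This is an illustration under stated assumptions, not this repository's implementation: `policy`, `reference`, `preferred`, and `rejected` are placeholder names, and both models are assumed to return per-token logits over the amino-acid vocabulary (as Hugging Face causal LMs do).

```python
# Minimal DPO sketch for an autoregressive protein LM (illustrative only; not
# the exact loss used in this repo). Assumes `model(tokens)` returns an object
# with `.logits` of shape [batch, length, vocab].
import torch
import torch.nn.functional as F

def sequence_logprob(model, tokens):
    """Sum of next-token log-probabilities for each sequence in `tokens` ([B, L])."""
    logits = model(tokens).logits[:, :-1, :]      # predictions for positions 1..L-1
    logp = F.log_softmax(logits, dim=-1)
    target = tokens[:, 1:].unsqueeze(-1)          # targets are the tokens shifted by one
    return logp.gather(-1, target).squeeze(-1).sum(dim=-1)

def dpo_loss(policy, reference, preferred, rejected, beta=0.1):
    """DPO objective: raise the policy's margin on preferred vs. rejected sequences."""
    with torch.no_grad():                         # the reference model stays frozen
        ref_margin = sequence_logprob(reference, preferred) - sequence_logprob(reference, rejected)
    pol_margin = sequence_logprob(policy, preferred) - sequence_logprob(policy, rejected)
    return -F.logsigmoid(beta * (pol_margin - ref_margin)).mean()
```

In the PPO and GRPO settings, this pairwise term is replaced by reward-driven policy-gradient updates on sampled sequences, with task-specific reward models supplying the signal.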
```
RL-PLM/
│
├── amp_design/          # Antimicrobial peptide (AMP) design via RL
├── kinase_mutation/     # Kinase optimization via sequence mutation
├── antibody_mutation/   # Antibody–antigen binding optimization
├── inverse_folding/     # Structure-conditioned sequence generation
├── personalization/     # 🆕 Personalized RLHF module
└── examples/            # 🆕 Example scripts for personalization
```
Figure 2. Four biological systems implemented in RL-PLM.
All datasets and pretrained checkpoints can be downloaded here:
RL_PLM_data (Google Drive)
Unzip and place the contents under your BASE_PATH before running any experiment.
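As a quick sanity check after unzipping, you can confirm that the expected task folders are present. The snippet below is a hypothetical helper; the folder names are assumptions based on the repository layout above, not a documented archive structure, so adjust them to match what you downloaded.

```python
# Hypothetical sanity check; folder names are assumptions based on the repo layout above.
from pathlib import Path

BASE_PATH = Path("/path/to/BASE_PATH")  # wherever you placed the unzipped RL_PLM_data contents

for task in ("amp_design", "kinase_mutation", "antibody_mutation", "inverse_folding"):
    status = "found" if (BASE_PATH / task).exists() else "MISSING"
    print(f"{task:20s} {status}")
```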
We now support personalized RLHF for protein design! Train models that adapt to different user preferences and objectives.
- 🎯 Synthetic User Preferences: Define virtual users with different property trade-offs (see the sketch after this list)
- 📊 Pairwise Preference Learning: Train reward models from preference comparisons
- 🔄 Multi-User Support: Train one model for multiple user profiles
- 🧬 Task-Specific Integration: Ready-to-use for AMP, antibody, and kinase design
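As a rough illustration of the first two points, a synthetic user can be represented as a weight vector over sequence properties, and pairwise labels can be derived by comparing the weighted scores of two candidate sequences. The names and dummy scorers below are hypothetical placeholders, not the personalization module's API.

```python
# Illustrative sketch of synthetic users and pairwise preference labels.
# `score_properties` is a toy heuristic; in practice it would call task-specific
# predictors (e.g., an AMP activity classifier and a toxicity model).
import numpy as np

def score_properties(seq: str) -> np.ndarray:
    """Return [activity, non-toxicity, stability] scores (dummy heuristics here)."""
    frac_k = seq.count("K") / len(seq)                             # crude cationicity proxy
    frac_hydrophobic = sum(seq.count(a) for a in "AILMFWV") / len(seq)
    return np.array([frac_k, 1.0 - frac_hydrophobic, 0.5])

def preference_label(user_weights: np.ndarray, seq_a: str, seq_b: str) -> int:
    """1 if the synthetic user prefers seq_a over seq_b, else 0."""
    return int(user_weights @ score_properties(seq_a) > user_weights @ score_properties(seq_b))

# Two virtual users with different property trade-offs.
user_activity_first = np.array([0.7, 0.2, 0.1])
user_safety_first = np.array([0.2, 0.7, 0.1])

# Toy peptide strings for illustration only.
print(preference_label(user_activity_first, "GIGKFLKKAKKF", "AILMFWVAILMF"))
```

Labels produced this way can then feed a standard pairwise (Bradley–Terry-style) reward-model loss, for example with one reward model or head per user profile.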
```bash
# Train personalized reward models for AMP design
python amp_design/personalized_grpo.py \
    --classifier-checkpoint amp_design/best_new_4.pth \
    --output-dir personalized_rewards/amp

# Run complete example
python examples/personalized_amp_example.py
```

- 📘 Personalization Module README - Detailed API documentation
- 📗 Integration Guide - How to add personalization to your RL training
- 📙 Example Script - Complete end-to-end example
If you find this repository useful, please cite:
```bibtex
@article{cao2025supervision,
  title={From Supervision to Exploration: What Does Protein Language Model Learn During Reinforcement Learning?},
  author={Cao, Hanqun and Zhang, Hongrui and Xu, Junde and Zhang, Zhou and Shen, Lingdong and Sun, Minghao and Liu, Ge and Xu, Jinbo and Li, Wu-Jun and Ni, Jinren and others},
  journal={arXiv preprint arXiv:2510.01571},
  year={2025}
}
```