holarissun

🎯

Focusing

Hao Sun holarissun

PhD in Reinforcement Learning, LLM Alignment, RLHF

117 followers · 37 following

University of Cambridge
https://holarissun.github.io/
@HolarisSun

holarissun.github.io Public

GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes

JavaScript MIT License Updated Oct 20, 2025
InverseRLmeetsLLMs Public

10 1 Updated Jul 27, 2025
SqueezingOpenReview Public

Python 1 Updated May 23, 2025
embedding-based-llm-alignment Public

Codebase for Paper Reusing Embeddings: Reproducible Reward Model Research in Large Language Model Alignment without GPUs

Python 22 2 MIT License Updated Apr 24, 2025
Inverse-RLignment Public

inverse reinforcement learning for LLM alignment

MIT License Updated Apr 15, 2025
RewardModelingBeyondBradleyTerry Public

official implementation of ICLR'2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and Alternatives

reward inverse-reinforcement-learning large-language-models rlhf reward-models largelanguagemodels reward-modeling

Python 70 4 MIT License Updated Apr 2, 2025
HandsOnTransformers Public

attention is all you need!

Python 5 1 MIT License Updated Apr 1, 2025
Data-Centric-OPE Public

Python 1 Updated Sep 20, 2024
Prompt-OIRL Public

code for paper Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning

inverse-reinforcement-learning irl offline-rl large-language-models llm prompt-engineering rlhf

Python 43 6 MIT License Updated Mar 20, 2024
Prompt4ReasoningPapers Public
Forked from zjunlp/Prompt4ReasoningPapers

Repository for the ACL2023 paper "Reasoning with Language Model Prompting: A Survey".

MIT License Updated Mar 4, 2024
PanelGPT Public

We introduce new zero-shot prompting magic words that improve the reasoning ability of language models: panel discussion!

gpt prompts prompt-tuning large-language-models prompt-engineering llms chain-of-thought

Python 172 12 Updated Feb 20, 2024
DOMIAS Public

Python 6 6 MIT License Updated Feb 11, 2024
Accountable-Offline-RL Public

Code for NeurIPS 2023 paper Accountability in Offline Reinforcement Learning: Explaining Decisions with a Corpus of Examples

reinforcement-learning behavioral-cloning deep-rl xai interpretable-machine-learning offline-rl offlinerl

Python 5 1 Updated Nov 28, 2023
LeetCodeSolution Public

logs for my leetcoding fall 2023

1 Updated Nov 3, 2023
RewardShifting Public

Code for NeurIPS 2022 paper Exploiting Reward Shifting in Value-Based Deep RL

reinforcement-learning ensemble ensemble-learning rnd deep-q-network reward-design reward-shaping

Python 29 4 Updated Oct 29, 2023
tianshou Public
Forked from thu-ml/tianshou

An elegant PyTorch deep reinforcement learning library.

Python MIT License Updated Oct 13, 2023
Prompt-Engineering-Guide Public
Forked from dair-ai/Prompt-Engineering-Guide

🐙 Guides, papers, lecture, notebooks and resources for prompt engineering

MDX 1 MIT License Updated Oct 12, 2023
BenchmarkPromptsWithResponses Public

Every prompt engineering paper should provide not only on-average performance of the prompting strategy, but should also release the responses to facilitate future research and avoid repeatedly cal…

1 Updated Sep 13, 2023
GPTChatAPI Public

Usage example of the GPT API in chatbot applications.

Python Updated Apr 21, 2023
Causal-RL Public

Jupyter Notebook Updated Nov 11, 2022
2Groza.github.io Public

HTML Updated Sep 18, 2022
MPhil_Thesis Public

Updated Sep 18, 2022
hindsight-experience-replay Public
Forked from TianhongDai/hindsight-experience-replay

This is the pytorch implementation of Hindsight Experience Replay (HER) - Experiment on all fetch robotic environments.

Jupyter Notebook MIT License Updated Aug 7, 2022
DAUC Public

Code for Latent Density Models for Uncertainty Categorization

Python 3 Updated Jun 8, 2022
MOPA Public

Jupyter Notebook 2 Updated May 18, 2022
Action-Refined-Temporal-Difference Public

Jupyter Notebook 2 Updated Jan 2, 2022
decisionforce.github.io Public
Forked from decisionforce/decisionforce.github.io

HTML Updated Jul 16, 2021
cuhkrlcourse.github.io Public
Forked from StaminaTang/cuhkrlcourse.github.io

CUHK Reinforcement Learning Course

HTML Updated Nov 24, 2020
NPSCO Public

Code for Novel Policy Seeking with Constrained Optimization

Python 2 Updated Aug 11, 2020
Exploiting-Exploitation Public

Python 2 Updated Jul 24, 2020
