huggingface / trl Public

generated from fastai/nbdev_template

Fork 2.4k
Star 17k

Code
Issues 539
Pull requests 94
Discussions
Actions
Projects
Security
Insights

Pull requests: huggingface/trl

Labels 34 Milestones 0

94 Open 2,463 Closed

Pull requests list

Compile entropy from logits.

#4858 opened Jan 18, 2026 by pramodith

1 of 5 tasks

fix(DeepSeek OPSM): passing correct (vLLM) logprobs

#4857 opened Jan 18, 2026 by casinca

3 of 5 tasks

Update CITATION.cff

#4856 opened Jan 17, 2026 by qgallouedec

feat(rewards): add conditioned_reward based on GDPO

#4855 opened Jan 17, 2026 by jlcanta

Enhance GRPO documentation with scaling notes

#4849 opened Jan 17, 2026 by javadtaghia

5 tasks

NeMo-Gym Integration

#4848 opened Jan 17, 2026 by cmunley1 • Draft

Add retry strategy to vLLM Client for increased robustness

#4845 opened Jan 16, 2026 by apalmas-saifh

2 of 5 tasks

Update OpenEnv dependency to new version for hf jobs scripts

#4843 opened Jan 16, 2026 by sergiopaniego

5 tasks

make dpo compatible with fsdp2

#4838 opened Jan 16, 2026 by flutist

4 of 5 tasks

feat: Support log_completion for swanlab backend

#4826 opened Jan 14, 2026 by ZiyiTsang

2 of 5 tasks

Add support for training with multiple OpenEnv environments

#4824 opened Jan 13, 2026 by lewtun • Draft

5 tasks

Test distributed training for RewardTrainer, RLOOTrainer and GRPOTrainer

#4823 opened Jan 13, 2026 by qgallouedec

[GRPO] Add parquet logging for completions with individual rewards

#4818 opened Jan 13, 2026 by qgallouedec

Add Entropy Adaptive Fine Tuning to SFT Trainer

#4802 opened Jan 10, 2026 by electroglyph

forward_masked_logits in SFTTrainer

#4794 opened Jan 8, 2026 by qgallouedec • Draft

5 tasks

Refactor KTO [3/N]: Extract dataset processing to _prepare_dataset method

#4788 opened Jan 8, 2026 by albertvillanova

Refactor KTO [2/N]: Improve config validation in KTOConfig

#4787 opened Jan 8, 2026 by albertvillanova

Add reward shaping to PPOTrainer

#4774 opened Jan 5, 2026 by derivative2002

5 tasks

make dpo compatible with qwen3vl

#4773 opened Jan 4, 2026 by flutist

feat(sft): add generation-based evaluation support to SFTTrainer

#4768 opened Jan 2, 2026 by CodersAcademy006

Extend CLI to orpo trainer

#4757 opened Dec 27, 2025 by murilo-cunha

3 of 5 tasks

fix: handle None eval_dataset in example code

#4756 opened Dec 27, 2025 by ciaoyizhen

1 of 4 tasks

perf: avoid output_hidden_states when only last_hidden_state is used

#4755 opened Dec 27, 2025 by ciaoyizhen

2 of 5 tasks

vllm parameter passthrough for stop sequences

#4754 opened Dec 26, 2025 by kdubovikov

Clarify Accelerate usage in SFTTrainer documentation

#4744 opened Dec 23, 2025 by Likhita-17

1 task done
