Refactor vLLM generation [1/N]: Extract vLLM generation #4700
Conversation
trl/generation/vllm_generation.py
Outdated
# TODO: improve
# Calculate max_model_len: use vllm_max_model_length if available, otherwise compute from prompt+completion
if hasattr(args, "vllm_max_model_length") and args.vllm_max_model_length is not None:
    max_model_len = args.vllm_max_model_length
elif (
    hasattr(self.trainer, "max_prompt_length")
    and self.trainer.max_prompt_length is not None
    and hasattr(self.trainer, "max_completion_length")
    and self.trainer.max_completion_length is not None
):
    max_model_len = self.trainer.max_prompt_length + self.trainer.max_completion_length
else:
    max_model_len = None
This will not be necessary once we merge:
I have removed it after the merge.
trl/generation/vllm_generation.py
Outdated
# TODO: improve
# Add logprobs_mode only for GRPO (not used in RLOO)
if "grpo" in type(self.trainer).__name__.lower():
    llm_kwargs["logprobs_mode"] = "processed_logprobs"
I am also thinking about how to improve this.
After reading the motivation of the PR that introduced this, I think we should pass logprobs_mode in the RLOOTrainer as well:
I am reverting this condition.
Done.
trl/generation/vllm_generation.py
Outdated
        Args:
            trainer: Reference to parent trainer for accessing config, model, accelerator, etc.
        """
        self.trainer = trainer
I like the idea of having a class dedicated to generation for vLLM now that the amount of code related to generation is starting to saturate the online trainers.
However, I'm a bit annoyed by this back-reference. It's not an ideal design choice in my opinion. In this state, I even have trouble seeing how to properly test this.
Conceptually, the generator doesn't need a trainer to generate. It only needs the generation parameters, probably the accelerator, and a method to update the weights.
Can we think of an alternative design, something like:
class VLLMGeneration:
    def __init__(self, model_id, accelerator, mode="server", ...):
        ...

    def sync_weights(self, model):
        ...

    def generate(self, prompts, temperature, ...):
        ...
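For illustration, an online trainer could then drive generation through an interface like that sketch, roughly as follows (names and values here are assumptions, not the actual TRL API):

# Hypothetical usage of the interface sketched above; `accelerator`, `model`, and
# `prompts` are assumed to come from the trainer's context.
generator = VLLMGeneration(
    model_id="Qwen/Qwen2.5-0.5B-Instruct",  # illustrative model id
    accelerator=accelerator,
    mode="colocate",
)
generator.sync_weights(model)  # push the latest policy weights to vLLM
completions = generator.generate(prompts, temperature=0.7)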
Thanks for your insightful review! 🤗
Yes, I totally agree with you that the back-reference to the trainer is not ideal.
Indeed, this PR is a preliminary step in the complete refactoring and I was planning to change that in a later PR. This was my original plan:
- PR 1 (current): Extract the vLLM logic into a separate class (prove the concept, remove duplication)
- PR 2: Refine the interface to use explicit parameters instead of trainer reference
- PR 3: Add a proper protocol/interface
However, I could fix this now in this PR if you think it is better.
I have seen that the trainer instance is used by the rollout_func. This will need further refactoring to separate (but coordinate) rollout_func and vllm functionalities. 🤔
I have seen that the trainer instance is used by the rollout_func. This will need further refactoring to separate (but coordinate) rollout_func and vllm functionalities. 🤔
Yes, indeed. The design of rollout_func is really not ideal. We implemented it that way initially because we had to move quickly for the release of OpenEnv, but in my opinion it's really something we should completely rethink and refactor.
Indeed, this PR is a preliminary step in the complete refactoring and I was planning to change that in a later PR. This was my original plan:
I'm fine with a multi-stage PR for this. It will probably be easier to review this way, and easier for you to write.
Additionally, I see that the trainer instance is also used by the profiling_context function. This will need further refactoring as well.
I created a dedicated PR to decouple profiling from the trainer:
As that PR was not approved, I have finally addressed both issues here:
This PR needs the following PR to be merged first, though:
I have integrated PR #4717.
trl/generation/vllm_generation.py
Outdated
def __init__(
    self,
    model,
    args,
What do you think of removing args and instead having

- enable_sleep_mode
- gpu_memory_utilization
- group_port
- guided_decoding_regex
- max_model_length
- mode
- model_impl
- server_base_url
- server_host
- server_timeout
- tensor_parallel_size

as arguments of this function?
You may want to add max_num_seqs as well.
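For illustration, the explicit-parameter constructor being suggested could look roughly like this (parameter names follow the list above; the defaults are illustrative assumptions, not TRL's actual values):

class VLLMGeneration:
    # Sketch of the suggested signature; defaults are illustrative assumptions.
    def __init__(
        self,
        model,
        mode="server",
        enable_sleep_mode=False,
        gpu_memory_utilization=0.9,
        group_port=51216,
        guided_decoding_regex=None,
        max_model_length=None,
        max_num_seqs=None,
        model_impl="auto",
        server_base_url=None,
        server_host="0.0.0.0",
        server_timeout=240.0,
        tensor_parallel_size=1,
    ):
        ...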
Thanks, @qgallouedec.
I had something similar in mind in my original plan:
- PR 2: Refine the interface
  - I had in mind to create something like a VLLMGenerationConfig class with all present (and open for future) vLLM parameters.
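For illustration, such a config class could be a small dataclass grouping the vLLM parameters (a sketch only; VLLMGenerationConfig does not exist in TRL, and the fields and defaults shown are assumptions):

from dataclasses import dataclass
from typing import Optional


@dataclass
class VLLMGenerationConfig:
    # Sketch only: groups the vLLM-related parameters discussed above.
    mode: str = "server"
    tensor_parallel_size: int = 1
    gpu_memory_utilization: float = 0.9
    max_model_length: Optional[int] = None
    max_num_seqs: Optional[int] = None
    enable_sleep_mode: bool = False
    # ... remaining parameters (server_host, server_timeout, guided_decoding_regex, etc.)

# The generator would then take a single object, e.g.:
# generator = VLLMGeneration(model_id, accelerator, config=VLLMGenerationConfig(mode="colocate"))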
If you prefer, I can do all the refactoring in a single PR?
@qgallouedec I think I addressed your suggestion above.
Not necessarily for this PR, but since we are adding a generation submodule, we could have vllm_client in this submodule as well.
Oh, thanks a lot for your detailed review!!! 🤗
trl/generation/vllm_generation.py
Outdated
# Build LLM initialization kwargs
llm_kwargs = {
    "model": model.name_or_path,
    "tensor_parallel_size": self.tensor_parallel_size,
    "gpu_memory_utilization": self.gpu_memory_utilization,
    "max_num_seqs": self.max_num_seqs,
    "max_model_len": self.max_model_length,
    "distributed_executor_backend": "external_launcher",
    # Feed identical seed for tp groups to ensure sampling results are the same across workers
    "seed": accelerator.process_index // self.tensor_parallel_size,
    # Latest vLLM v1 memory profiler is misled by the high default value (i.e., 32768) - thinking there's not enough memory
    "max_num_batched_tokens": 4096,
    "model_impl": self.model_impl,
    "enable_sleep_mode": self.enable_sleep_mode,
    # Important so temperature scaling/logit tweaking affects the TIS log probs
    "logprobs_mode": "processed_logprobs",
    "quantization": quantization,
}
self.llm = LLM(**llm_kwargs)
nit: instead of

llm_kwargs = {
    "model": model.name_or_path,
    ...
}
self.llm = LLM(**llm_kwargs)

we can also simply do

self.llm = LLM(
    model=model.name_or_path,
    ...,
)
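Applied to the snippet above, that nit would give roughly the following (same arguments as the llm_kwargs dict, just passed directly to LLM):

self.llm = LLM(
    model=model.name_or_path,
    tensor_parallel_size=self.tensor_parallel_size,
    gpu_memory_utilization=self.gpu_memory_utilization,
    max_num_seqs=self.max_num_seqs,
    max_model_len=self.max_model_length,
    distributed_executor_backend="external_launcher",
    # Identical seed within each TP group so sampling matches across workers
    seed=accelerator.process_index // self.tensor_parallel_size,
    # Keep max_num_batched_tokens low so the vLLM v1 memory profiler isn't misled
    max_num_batched_tokens=4096,
    model_impl=self.model_impl,
    enable_sleep_mode=self.enable_sleep_mode,
    # Important so temperature scaling/logit tweaking affects the TIS log probs
    logprobs_mode="processed_logprobs",
    quantization=quantization,
)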
qgallouedec
left a comment
This looks good to me! This will simplify the trainers a lot.
Refactor vLLM generation [1/N]: Extract vLLM generation.
This PR introduces a new initialization module for vLLM generation in TRL trainers. The main change is the addition of conditional support for the VLLMGeneration backend, which will only be imported and exposed if the vllm dependency is available.
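A minimal sketch of that conditional export, assuming TRL's is_vllm_available() helper and a trl/generation/__init__.py module (the exact layout in the PR may differ):

# trl/generation/__init__.py (sketch)
from ..import_utils import is_vllm_available

__all__ = []

if is_vllm_available():
    from .vllm_generation import VLLMGeneration

    __all__.append("VLLMGeneration")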