💬 Add chat to vLLM client and server, update trainer calls #4450

Merged

qgallouedec merged 7 commits into main from chat-endpoint-vllm-server on Nov 5, 2025

Conversation

@qgallouedec (Member)

Slow tests pass locally, and GRPO training works as well:

from datasets import load_dataset
from trl import GRPOTrainer, GRPOConfig

dataset = load_dataset("trl-lib/ultrafeedback-prompt", split="train")

# Dummy reward function: count the number of unique characters in the completions
def reward_num_unique_chars(completions, **kwargs):
    return [len(set(c[0]["content"])) for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen3-0.6B",
    reward_funcs=reward_num_unique_chars,
    train_dataset=dataset,
    args=GRPOConfig(
        use_vllm=True,
    ),
)
trainer.train()

@qgallouedec qgallouedec requested review from kashif and lewtun November 4, 2025 20:58
        ]
        return {"prompt_ids": prompt_ids, "completion_ids": completion_ids, "logprobs": logprobs}

    class ChatRequest(BaseModel):
@qgallouedec (Member Author):

Exactly the same as generate, except:

  • images are within the messages (so we drop images from args)
  • chat_template_kwargs argument added
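
For context, a rough sketch of what such a request model could look like (not the exact PR code; field names other than messages and chat_template_kwargs are assumptions, mirroring the request model used by generate):

from pydantic import BaseModel

class ChatRequest(BaseModel):
    # Each element is one conversation: a list of {"role": ..., "content": ...} dicts.
    # Images, if any, travel inside the message content, so no separate `images` field.
    messages: list[list[dict]]
    # Extra kwargs forwarded to the chat template (e.g. enable_thinking)
    chat_template_kwargs: dict | None = None
    # Sampling parameters, assumed to mirror the generate request
    n: int = 1
    temperature: float = 1.0
    max_tokens: int = 16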

-        # FIXME: this endpoint doesn't exist in vllm_client
         output = self.vllm_client.chat(
-            prompts=ordered_set_of_prompts,
+            messages=ordered_set_of_prompts,
@qgallouedec (Member Author):

I use "messages" instead of "prompt" to align with vLLM

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@albertvillanova (Member) left a comment:

Thanks, solid implementation! Just some comments and minor suggestions below.

@lewtun (Member) left a comment:

LGTM, with a suggestion to double-check that models like Llama are not getting a double BOS token.

        for seq in completion_ids:
            assert all(isinstance(tok, int) for tok in seq)

    def test_chat(self):
Member:
It would be good to check that the issues with double BOS tokens getting inserted have been fully resolved (e.g. for a Llama model): vllm-project/vllm#9519

@edbeeching ran into this during https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute and it has a subtle but negative impact on the generations.
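
For reference, one quick way to check for the double-BOS issue (a sketch, not part of this PR; the model name is just an example) is to render the chat template and then tokenize the result again:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
messages = [{"role": "user", "content": "Hello!"}]
# The chat template already inserts the BOS token for Llama-style models...
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# ...so tokenizing the rendered text with add_special_tokens=True can prepend a second one.
ids = tokenizer(text, add_special_tokens=True).input_ids
assert ids[:2].count(tokenizer.bos_token_id) <= 1, "double BOS detected"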

qgallouedec merged commit a3af2f1 into main on Nov 5, 2025 (11 checks passed)
qgallouedec deleted the chat-endpoint-vllm-server branch on Nov 5, 2025 at 18:51
        for seq in completion_ids:
            assert all(isinstance(tok, int) for tok in seq)

    def test_chat(self):
Member:

@qgallouedec I'm sorry, but I'm not able to run this test.
Could you please give me a hint about the environment requirements so I can run it?
Thanks! 🤗

Collaborator:

We might have to mock the response?
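
Something like this could work if we go the mocking route (a sketch, assuming the client posts through requests.Session, that the response keys mirror generate, and that self.client is the VLLMClient fixture the existing tests set up):

from unittest.mock import MagicMock, patch

def test_chat_mocked(self):
    # Fake server response with the same keys as the generate endpoint
    fake_response = MagicMock()
    fake_response.status_code = 200
    fake_response.json.return_value = {
        "prompt_ids": [[1, 2, 3]],
        "completion_ids": [[4, 5, 6]],
        "logprobs": [[-0.1, -0.2, -0.3]],
    }
    # Intercept the HTTP call so no GPU or running server is needed
    with patch("requests.Session.post", return_value=fake_response):
        output = self.client.chat(messages=[[{"role": "user", "content": "Hi!"}]])
    for seq in output["completion_ids"]:
        assert all(isinstance(tok, int) for tok in seq)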
