
Support GRPO #3146


Open

wants to merge 8 commits into main

Conversation

@tastelikefeet commented Feb 16, 2025

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it receive feedback more easily. If you do not understand some items, don't worry: just open the pull request and ask the maintainers for help.

Motivation

Compatible with GRPO:

  1. Able to select the GPU devices used by the turbomind engine
  2. Able to reload model weights

Modification

  1. turbomind.py: support reloading weights and disable the row-major weight-conversion optimization
  2. loader.py: support loading weights from a state_dict
  3. messages.py: add a devices argument

Please check modelscope/ms-swift#3126 for test results. A brief usage sketch of the new devices argument is given below.
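A minimal usage sketch, assuming this PR is applied. The devices field is the one added to TurbomindEngineConfig here; the model name is only an example, and the commented-out reload call is a placeholder name for the reload path added in turbomind.py / loader.py, not a confirmed public API.

from lmdeploy import pipeline, TurbomindEngineConfig

# Pin the turbomind engine to GPU 1 so a GRPO trainer can keep GPU 0,
# without changing CUDA_VISIBLE_DEVICES for the whole process.
backend_config = TurbomindEngineConfig(devices=[1])
pipe = pipeline('Qwen/Qwen2.5-7B-Instruct', backend_config=backend_config)
print(pipe('What is GRPO?'))

# After a training step, the updated weights would be pushed back into the
# engine; the method name below is a placeholder for the new reload path.
# pipe.engine.load_state_dict(policy_model.state_dict())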

BC-breaking (Optional)

Does the modification introduce changes that break the backward-compatibility of the downstream repositories?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  3. If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

@lvhan028 (Collaborator)

Hi @tastelikefeet, thanks for your contribution. Could you resolve the conflicts in lmdeploy/turbomind/turbomind.py?

* main: (90 commits)
  Fix cogvlm and phi3vision (InternLM#3137)
  support release pipeline (InternLM#3069)
  [ci] fix some fail in daily testcase (InternLM#3134)
  Fix internvl2.5 error after eviction (InternLM#3122)
  fix UT of deepseek chat template (InternLM#3125)
  Update benchmark script and user guide (InternLM#3110)
  bump version to v0.7.0.post3 (InternLM#3115)
  fix postional argument (InternLM#3086)
  remove logitswarper (InternLM#3109)
  [Fix] fix the URL judgment problem in Windows (InternLM#3103)
  fix user guide about cogvlm deployment (InternLM#3088)
  add option max-concurrent-requests for api_server(InternLM#2961)
  bump version to v0.7.0.post2 (InternLM#3094)
  Fix xcomposer2d5 (InternLM#3087)
  Add system role to deepseek chat template (InternLM#3031)
  Update tokenizer (InternLM#3061)
  Add deepseek-r1 chat template (InternLM#3072)
  bump version to v0.7.0.post1 (InternLM#3076)
  More arguments in api_client, update docstrings (InternLM#3077)
  fix sliding window mgr (InternLM#3068)
  ...

# Conflicts:
#	lmdeploy/turbomind/turbomind.py
@tastelikefeet (Author)

Hi @tastelikefeet, thanks for your contribution. Could you resolve the conflicts in lmdeploy/turbomind/turbomind.py?

Solved

@lvhan028 (Collaborator)

There are linting errors, which can be fixed by:

pip install pre-commit
cd lmdeploy
pre-commit install
pre-commit run --all-files

@@ -211,6 +211,7 @@ class TurbomindEngineConfig:
     max_prefill_token_num: int = 8192
     num_tokens_per_iter: int = 0
     max_prefill_iters: int = 1
+    devices: List[int] = field(default_factory=lambda: [0])
Collaborator

Can we specify the CUDA devices via the env var CUDA_VISIBLE_DEVICES?

Author

No. trl specifies which GPU the model is loaded on:
https://github.com/huggingface/trl/blob/main/trl/trainer/grpo_trainer.py#L416
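To make that concrete, the placement pattern such a trainer relies on looks roughly like the following (an illustrative sketch, not the actual trl code): training processes occupy the first GPUs and the rollout engine is pinned to the next one, so the engine needs to accept an explicit device id rather than a process-wide CUDA_VISIBLE_DEVICES.

import torch

# Illustrative sketch (not trl's code): with N training processes on
# cuda:0..N-1, the rollout engine is pinned to cuda:N in the same job,
# so hiding GPUs via CUDA_VISIBLE_DEVICES would also hide them from the
# trainer.
num_training_processes = 2
rollout_device = torch.device(f"cuda:{num_training_processes}")  # -> cuda:2
print(rollout_device)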

@lvhan028 (Collaborator) Feb 19, 2025

https://github.com/vllm-project/vllm/blob/d0a7a2769d92619afdcdc3b91c78098eaa9e38c0/vllm/engine/arg_utils.py#L718
According to vllm's EngineArgs definition, the value of device can be one of the following:

DEVICE_OPTIONS = [
    "auto",
    "cuda",
    "neuron",
    "cpu",
    "openvino",
    "tpu",
    "xpu",
    "hpu",
]

I haven't found a case where the vllm engine is built with specific device ids. Could you please provide an example?

@lvhan028 (Collaborator)

Tensor parallelism cases hang, for instance:

lmdeploy serve api_server meta-llama/Meta-Llama-3-8B-Instruct --tp 2

model_dict[key] = value
else:
model_dict = model_path
for key, value in model_dict.items():
Collaborator

Where is model_dict stored? Is it in CPU memory or GPU memory?

@tastelikefeet (Author) Feb 19, 2025

@tastelikefeet (Author)

pre-commit install

Done

@lvhan028 (Collaborator)

I've noticed modelscope/ms-swift#3126. We'll study this feature and process this PR ASAP.

…oad_state_dict

* commit 'f6f7a5d707e3ccbc69af10babf1c9afcaf72a402':
  fix deepseekv2 has no attribute use_mla error (InternLM#3188)
  fix blocked fp8 moe (InternLM#3181)
  [Feature] support deepseek-vl2 for pytorch engine (InternLM#3149)
  make turbomind support gpu embedding inputs (InternLM#3177)
  fix temperature=0 (InternLM#3176)
  Update qwen2.py (InternLM#3174)
  Fix tool call prompt for InternLM and Qwen (InternLM#3156)
  Use pad_token_id as image_token_id for vl models (InternLM#3158)
  fix default temperature value (InternLM#3166)
  fix min length penalty (InternLM#3150)
  update cuda runtime package dependencies (InternLM#3142)
  fix typing (InternLM#3153)
  support deepseekv2 for maca backend. (InternLM#2918)
  fix the issue that stop_token may be less than defined in model.py (InternLM#3148)
  [fix] fix vl gradio, use pipeline api and remove interactive chat (InternLM#3136)
  [feature] add dlinfer w8a8 support. (InternLM#2988)
  Use aiohttp inside proxy server && add --disable-cache-status argument (InternLM#3020)
  support eos_token list in turbomind (InternLM#3044)
@lvhan028 (Collaborator)

@irexyc will work on this feature.

@irexyc (Collaborator) commented Feb 28, 2025

@tastelikefeet
May I ask: in your use case, is support for device_ids a hard requirement, or would setting CUDA_VISIBLE_DEVICES be enough?

@tastelikefeet (Author) commented Feb 28, 2025

@tastelikefeet May I ask: in your use case, is support for device_ids a hard requirement, or would setting CUDA_VISIBLE_DEVICES be enough?

There are currently two main directions for GRPO training setups:

  1. The trl approach targets smaller clusters; it is simpler to debug and use and sees heavier usage, but it requires the inference engine to support device selection.
  2. The veRL approach targets larger clusters and is harder to debug, but it does not require the inference engine to support device selection.

We currently follow the trl approach, but we are refactoring it to gain stronger scalability while keeping its ease of use and speed.
In practice, vLLM is widely used in GRPO-style training mainly because it:

  • supports many models
  • supports load_weights, or an SPMD mode
  • makes it easy to set devices

In our existing solution, to get the GRPO speedup quickly, we hand-hacked some code and got single-node multi-instance working with both vLLM and LMDeploy. Our future directions are:

  • support for larger models (MP/PP)
  • a more scalable architecture

So personally I think the better direction is to consider support for GRPO and other RL methods holistically, rather than only the needs of our own framework.
For reference~
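For context, the shape of the trl-style loop described above, seen from the inference engine's side, is roughly the following. This is a pure illustration with placeholder names; RolloutEngine, load_weights and generate are not LMDeploy, vLLM, or trl APIs.

class RolloutEngine:
    """Stand-in for a colocated inference engine (placeholder, not a real API)."""

    def load_weights(self, state_dict):
        # Needs a fast in-place weight reload so rollouts stay on-policy.
        ...

    def generate(self, prompts, n=8):
        # Returns a group of n completions per prompt; GRPO normalizes
        # rewards within each group to form the advantages.
        return [[f'{p} -> sample {i}' for i in range(n)] for p in prompts]


def grpo_rollout_step(policy_state_dict, engine, prompts):
    engine.load_weights(policy_state_dict)   # 1. sync trainer -> engine
    groups = engine.generate(prompts, n=8)   # 2. sample rollouts per prompt
    # 3. reward each completion, normalize within its group, and run the
    #    policy-gradient update in the trainer (omitted here).
    return groups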

@tastelikefeet (Author)

Actually, I submitted this PR to illustrate a requirement I have seen; since it is hacked code, it naturally has more issues overall. LMDeploy is an excellent framework, and we hope to build some great products for developers together with you.

@irexyc (Collaborator) commented Feb 28, 2025

@tastelikefeet

Internally we also have some requirements for weight updates and for offloading the inference engine. Since we found that building an empty model accounts for only a small share of the time, for the single-node multi-instance case our initial plan is to implement this by destroying and rebuilding the pipeline / server; we have written two demos, one for the pipeline and one for the server. If you use it through the pipeline API, it may indeed affect training, because CUDA_VISIBLE_DEVICES has to be set.

https://aicarrier.feishu.cn/wiki/VmDlwlqB9iGGOAkSOoucxEVwnPb
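A rough sketch of that destroy-and-rebuild idea, assuming an LMDeploy pipeline and assuming that dropping the old pipeline object plus emptying the CUDA cache is enough to release its GPU memory (the actual demos are in the linked doc; checkpoint_dir stands for wherever the trainer saved the updated weights).

import gc
import torch
from lmdeploy import pipeline, TurbomindEngineConfig


def rebuild_pipeline(checkpoint_dir, tp=1):
    # Assumption: the previous pipeline has already been released by the
    # caller (e.g. `del pipe`), so its GPU memory is free before the new
    # engine starts.
    gc.collect()
    torch.cuda.empty_cache()
    return pipeline(checkpoint_dir, backend_config=TurbomindEngineConfig(tp=tp))


# After each training save:
# del pipe                                   # drop the old engine first
# pipe = rebuild_pipeline('/path/to/grpo_ckpt')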

@tastelikefeet (Author)

@tastelikefeet

Internally we also have some requirements for weight updates and for offloading the inference engine. Since we found that building an empty model accounts for only a small share of the time, for the single-node multi-instance case our initial plan is to implement this by destroying and rebuilding the pipeline / server; we have written two demos, one for the pipeline and one for the server. If you use it through the pipeline API, it may indeed affect training, because CUDA_VISIBLE_DEVICES has to be set.

https://aicarrier.feishu.cn/wiki/VmDlwlqB9iGGOAkSOoucxEVwnPb

The problem with CUDA_VISIBLE_DEVICES is that it can only be set once per process.
As for reloading weights, I added a hacked version earlier, as shown in this PR, and it feels quite fast; the only problem is that the row-to-column (row-major) optimization has to be commented out.
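A small illustration of the "once per process" point, assuming a machine with two GPUs and using PyTorch only to trigger CUDA initialization (caching details vary across PyTorch versions, but the CUDA runtime fixes the visible device set when it initializes).

import os
import torch

os.environ['CUDA_VISIBLE_DEVICES'] = '0'
torch.cuda.init()                    # the CUDA runtime reads the variable here
print(torch.cuda.device_count())     # -> 1

# Changing the variable later in the same process has no effect: the runtime
# has already enumerated its devices, so a colocated inference engine cannot
# be moved to another GPU this way.
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'
print(torch.cuda.device_count())     # still 1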
