Support GRPO #3146
Conversation
Hi, @tastelikefeet. Thanks for your contribution.
* main: (90 commits)
  Fix cogvlm and phi3vision (InternLM#3137)
  support release pipeline (InternLM#3069)
  [ci] fix some fail in daily testcase (InternLM#3134)
  Fix internvl2.5 error after eviction (InternLM#3122)
  fix UT of deepseek chat template (InternLM#3125)
  Update benchmark script and user guide (InternLM#3110)
  bump version to v0.7.0.post3 (InternLM#3115)
  fix postional argument (InternLM#3086)
  remove logitswarper (InternLM#3109)
  [Fix] fix the URL judgment problem in Windows (InternLM#3103)
  fix user guide about cogvlm deployment (InternLM#3088)
  add option max-concurrent-requests for api_server (InternLM#2961)
  bump version to v0.7.0.post2 (InternLM#3094)
  Fix xcomposer2d5 (InternLM#3087)
  Add system role to deepseek chat template (InternLM#3031)
  Update tokenizer (InternLM#3061)
  Add deepseek-r1 chat template (InternLM#3072)
  bump version to v0.7.0.post1 (InternLM#3076)
  More arguments in api_client, update docstrings (InternLM#3077)
  fix sliding window mgr (InternLM#3068)
  ...

# Conflicts:
#   lmdeploy/turbomind/turbomind.py
Solved.
There are linting errors which can be solved by
@@ -211,6 +211,7 @@ class TurbomindEngineConfig:
     max_prefill_token_num: int = 8192
     num_tokens_per_iter: int = 0
     max_prefill_iters: int = 1
+    devices: List[int] = field(default_factory=lambda: [0])
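As a usage sketch of the new option (note: the `devices` field exists only in this PR, not in released lmdeploy, and the GPU ids below are arbitrary examples), a caller could pin the engine to specific GPUs without touching environment variables:

```python
# Sketch only: `devices` is the field added by this PR; GPU ids are
# arbitrary examples.
from lmdeploy import TurbomindEngineConfig

# Pin the engine to GPUs 2 and 3 explicitly, instead of relying on
# CUDA_VISIBLE_DEVICES, which the training process may already occupy.
engine_config = TurbomindEngineConfig(tp=2, devices=[2, 3])
```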
Can we specify the CUDA devices via the env var CUDA_VISIBLE_DEVICES?
No, trl specifies which GPU to load the model on:
https://github.com/huggingface/trl/blob/main/trl/trainer/grpo_trainer.py#L416
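To make the point concrete, here is a rough illustration of that pattern (not trl's actual code; the function name and the device layout are assumptions for this sketch): the trainer computes an explicit GPU index for generation and passes it to the engine programmatically, so an env-var-only interface is not enough.

```python
# Illustration only, not trl's implementation (see the link above).
# Assumption: training ranks occupy cuda:0 .. cuda:N-1, so the next GPU
# is reserved for the generation engine.
import torch

def pick_rollout_device(num_training_processes: int) -> torch.device:
    return torch.device(f"cuda:{num_training_processes}")

# With 7 training processes, the rollout engine would get cuda:7.
print(pick_rollout_device(7))
```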
https://github.com/vllm-project/vllm/blob/d0a7a2769d92619afdcdc3b91c78098eaa9e38c0/vllm/engine/arg_utils.py#L718
According to vllm's EngineArgs definition, the value of device can be one of the following:
DEVICE_OPTIONS = [
"auto",
"cuda",
"neuron",
"cpu",
"openvino",
"tpu",
"xpu",
"hpu",
]
I haven't found a case where the vllm engine is built with specific device ids. Could you please provide an example?
Tensor parallelism cases hang, for instance:
        model_dict[key] = value
else:
    model_dict = model_path
for key, value in model_dict.items():
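To clarify what this hunk appears to do, here is my reconstruction of the surrounding logic (the branch condition and the torch.load call are guesses, not the PR's exact code): model_path may be either a checkpoint path or an in-memory mapping of parameter name to tensor handed over by the trainer.

```python
# Reconstruction with assumptions marked; the actual PR code may differ.
import torch

def normalize_weights(model_path):
    if isinstance(model_path, str):
        # Assumption: loading from disk lands the tensors in CPU memory.
        model_dict = torch.load(model_path, map_location='cpu')
    else:
        # Already a dict of name -> tensor from the trainer; each tensor
        # keeps whatever device it was created on.
        model_dict = model_path
    for key, value in model_dict.items():
        # `value.device` tells whether the tensor currently lives on the
        # CPU or on a GPU, which is what the question below asks about.
        _ = value.device
    return model_dict
```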
Where is model_dict stored? Is it in CPU memory or GPU memory?
Done.
I've noticed modelscope/ms-swift#3126. We'll study this feature and process this PR ASAP.
…oad_state_dict
* commit 'f6f7a5d707e3ccbc69af10babf1c9afcaf72a402':
  fix deepseekv2 has no attribute use_mla error (InternLM#3188)
  fix blocked fp8 moe (InternLM#3181)
  [Feature] support deepseek-vl2 for pytorch engine (InternLM#3149)
  make turbomind support gpu embedding inputs (InternLM#3177)
  fix temperature=0 (InternLM#3176)
  Update qwen2.py (InternLM#3174)
  Fix tool call prompt for InternLM and Qwen (InternLM#3156)
  Use pad_token_id as image_token_id for vl models (InternLM#3158)
  fix default temperature value (InternLM#3166)
  fix min length penalty (InternLM#3150)
  update cuda runtime package dependencies (InternLM#3142)
  fix typing (InternLM#3153)
  support deepseekv2 for maca backend. (InternLM#2918)
  fix the issue that stop_token may be less than defined in model.py (InternLM#3148)
  [fix] fix vl gradio, use pipeline api and remove interactive chat (InternLM#3136)
  [feature] add dlinfer w8a8 support. (InternLM#2988)
  Use aiohttp inside proxy server && add --disable-cache-status argument (InternLM#3020)
  support eos_token list in turbomind (InternLM#3044)
@irexyc will work on this feature.
@tastelikefeet
The current GRPO solutions mainly go in two directions:
I submitted this PR mainly to present a requirement I have seen. Since it is, after all, hacky code, it has quite a few issues overall. LMDeploy is an excellent framework, and we also hope to build some excellent products for developers together with you.
Internally we also have some requirements for parameter updates and for offloading the inference engine. Since we found that building an empty model accounts for a relatively small share of the time, for the single-node multi-instance case our preliminary plan is to implement this by destroying and rebuilding the pipeline / server, and we have written two demos, one for the pipeline and one for the server. If the pipeline approach is used, setting CUDA_VISIBLE_DEVICES may indeed affect training. https://aicarrier.feishu.cn/wiki/VmDlwlqB9iGGOAkSOoucxEVwnPb
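A rough sketch of the destroy-and-rebuild idea described above (assumptions for this sketch: the public pipeline API, that del plus gc plus empty_cache releases enough GPU memory, and that the trainer has already saved the new checkpoint to disk; the real demos are in the linked doc and may differ):

```python
# Sketch of "destroy and rebuild the pipeline after a weight update".
import gc
import os

import torch
from lmdeploy import pipeline, TurbomindEngineConfig

# Must be set before any CUDA context is created in this process; as noted
# below, it only takes effect once per process.
os.environ.setdefault('CUDA_VISIBLE_DEVICES', '7')

def rebuild_pipeline(old_pipe, checkpoint_dir: str):
    if old_pipe is not None:
        # Tear down the previous engine instance and free its memory.
        del old_pipe
        gc.collect()
        torch.cuda.empty_cache()
    # Rebuild from the freshly saved checkpoint.
    return pipeline(checkpoint_dir, backend_config=TurbomindEngineConfig(tp=1))
```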
The problem with CUDA_VISIBLE_DEVICES is that it can only be set once per process.
Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it receive feedback more easily. If you do not understand some items, don't worry; just make the pull request and seek help from the maintainers.
Motivation
Compatible with GRPO.
Modification
Please check this PR for test results: modelscope/ms-swift#3126
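For readers of this description, a rough end-to-end sketch of the intended usage (hypothetical wiring; the model name and GPU id are placeholders, the `devices` field is the one added by this PR, and the real integration lives in the ms-swift PR above):

```python
# Hypothetical wiring of a GRPO trainer on top of lmdeploy; the real
# integration is in modelscope/ms-swift#3126.
from lmdeploy import GenerationConfig, TurbomindEngineConfig, pipeline

# The trainer keeps one inference pipeline on a GPU it does not train on.
pipe = pipeline('Qwen/Qwen2.5-7B-Instruct',
                backend_config=TurbomindEngineConfig(tp=1, devices=[7]))

def generate_rollouts(prompts):
    # Sampling settings are illustrative only.
    gen_config = GenerationConfig(max_new_tokens=512, temperature=1.0, top_p=0.9)
    return pipe(prompts, gen_config=gen_config)
```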
BC-breaking (Optional)
Does the modification introduce changes that break the backward-compatibility of the downstream repositories?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.
Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.
Checklist