LMDeploy Distserve #3304
…roxy.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,7 +10,7 @@ | |
pip install lmdeploy[all] >= 0.7.0 | ||
|
||
# Transfer Engine | ||
-pip install dlslime==0.0.1.post1 | ||
+pip install dlslime==0.0.1.post2 | ||
``` | ||
|
||
## Quick Start | ||
|
@@ -27,11 +27,12 @@ CUDA_VISIBLE_DEVICES=2,3 lmdeploy serve api_server internlm/internlm2_5-7b-chat | |
### 2. Launch Router Service | ||
|
||
``` shell | ||
-python -m lmdeploy.disagg.router \ | ||
-    --host 0.0.0.0 \ | ||
-    --port 5000 \ | ||
-    --prefill-endpoint http://prefill-host:port1 http://prefill-host:port2 \ | ||
-    --decode-endpoint http://decode-host:port3 http://decode-host:port4 | ||
+lmdeploy serve proxy | ||
+    --server-name 10.130.8.139 | ||
Review comment: We'd better not specify a real IP in the user guide.
Review comment: The default
+    --server-port 5000 | ||
+    --routing-strategy "min_expected_latency" | ||
+    --serving-strategy DistServe | ||
+    --log-level INFO | ||
``` | ||
|
||
## API Usage | ||
|
@@ -56,3 +57,15 @@ ibv_devinfo # Check device capabilities | |
|
||
### Check NVSHMEM configuration: | ||
Make sure to verify NVSHMEM installation. | ||
Review comment: Could you kindly provide the checking method or related URL links?
||
|
||
## Fault tolerance | ||
### CacheFree Issue | ||
When the Decode Engine completes migration, it sends a FreeCache request to the Prefill Engine. However, if the connection fails or the Decode Engine encounters an exception, Cache Free may fail, leading to memory leaks. Future improvements may include: | ||
|
||
- Exception monitoring in the Proxy to automatically release unreferenced memory. | ||
- Adding a timeout mechanism to force cache release if a response is delayed. | ||
| ||
### ConnectionPool Issue | ||
Currently, if the Proxy disconnects, the connection pool must be warmed up again. A future enhancement could involve: | ||
|
||
A dedicated connection pool management server (e.g., using Raft-based tools like ETCD, as mentioned in Mooncake) to improve connection discovery and avoid repeated warmups. |
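The timeout idea listed under "CacheFree Issue" could be sketched roughly as follows. This is a minimal illustration, not code from this PR: the class `CacheLeaseTable`, its methods, and the `session_id` key are all hypothetical names, and the real Prefill Engine would still need to release the underlying KV cache blocks for each expired entry.

```python
import time
from typing import Callable, Dict, List


class CacheLeaseTable:
    """Hypothetical tracker for prefill-side caches awaiting a FreeCache request."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock  # injectable so the timeout can be tested
        self._deadlines: Dict[str, float] = {}

    def register(self, session_id: str) -> None:
        # Called when prefill finishes and the cache is held for migration.
        self._deadlines[session_id] = self._clock() + self.timeout_s

    def free(self, session_id: str) -> bool:
        # Normal path: the Decode Engine's FreeCache request arrived.
        # Returns False if the entry was unknown or already reaped.
        return self._deadlines.pop(session_id, None) is not None

    def reap_expired(self) -> List[str]:
        # Fallback path: drop every entry whose deadline has passed and
        # return the ids so the caller can force-release their memory.
        now = self._clock()
        expired = [sid for sid, deadline in self._deadlines.items()
                   if deadline <= now]
        for sid in expired:
            del self._deadlines[sid]
        return expired
```

A periodic task on the prefill side could call `reap_expired()` and release the returned sessions, so a lost FreeCache request costs at most `timeout_s` seconds of held memory instead of a permanent leak.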