feat: improve pymodel bert perf #260

JackTan25 · 2025-10-21T08:42:24Z

1. Support a copy kernel for the new prefill CUDA graph framework.
2. Improve the CPU performance of the CUDA graph framework.
3. Test and check:
   3.1 In the long-text scenario, pymodel bert outperforms the cpp engine.
   3.2 In the short-text scenario, CUDA graph pymodel bert improves performance by up to 20%, but it still does not beat the cpp engine, because the pymodel preparation work takes 0.2~0.3 ms longer than in the cpp engine. We will improve this in the next PR.
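For reviewers unfamiliar with why a CUDA graph prefill path needs a copy step: a captured graph replays against fixed buffer addresses and shapes, so variable-length inputs are usually copied into a preallocated, padded buffer before each replay. The sketch below illustrates that idea on the CPU in plain Python. It is not the kernel from this PR; the function name `copy_into_static_buffer`, the padding value, and the `[max_batch][max_len]` layout are all assumptions made for illustration.

```python
from typing import List


def copy_into_static_buffer(
    sequences: List[List[int]],
    static_buffer: List[List[int]],
    pad_id: int = 0,
) -> List[int]:
    """Copy variable-length sequences into a preallocated padded buffer, in place.

    Hypothetical helper: the buffer rows are reused (never reallocated), which
    mirrors the fixed-address requirement of a captured CUDA graph.
    Returns the real length of every row so later steps can ignore padding.
    """
    max_batch = len(static_buffer)
    max_len = len(static_buffer[0]) if static_buffer else 0
    if len(sequences) > max_batch:
        raise ValueError("batch is larger than the captured batch size")

    lengths: List[int] = []
    for row, seq in enumerate(sequences):
        if len(seq) > max_len:
            raise ValueError("sequence is longer than the captured length")
        # Slice assignment keeps the same list object for the row.
        static_buffer[row][: len(seq)] = seq
        static_buffer[row][len(seq):] = [pad_id] * (max_len - len(seq))
        lengths.append(len(seq))

    # Rows beyond the current batch are cleared so stale data is not replayed.
    for row in range(len(sequences), max_batch):
        static_buffer[row][:] = [pad_id] * max_len
        lengths.append(0)
    return lengths
```

On a GPU the same copy would run as a kernel over device buffers; doing it in a single launch is one way to keep the per-request preparation cost low, which is the overhead item 3.2 points to.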

JackTan25 requested a review from LLLLKKKK as a code owner October 21, 2025 08:42

JackTan25 changed the title ~~test ci for new cuda graph prefill, don't review~~ feat: improve pymodel bert perf Oct 23, 2025

JackTan25 added 8 commits November 4, 2025 14:47

feat: support cuda graph copy kernel and pass basic correctness test, …

68c9043

…copy kernel perf 109us once after warm up

feat: support copy kernel for bert model

42c7172

feat: support new prefill cuda graph but test failed and out of memor…

eb6d5f7

…y, need do reuse

feat: refactor py bert model and improve perf 0.6ms

0012975

fix: pass prefill cuda graph

93eceab

fix: remove log

ef585f0

feat: improve pymodel perf

b942c1c

fix: fix test

f76c022

JackTan25 force-pushed the feature/bert_py_model_perf_improve branch from ed6b953 to f76c022 Compare November 4, 2025 06:55
