


@JackTan25 (Collaborator) commented on Oct 21, 2025

  1. Support a copy kernel in the new prefill CUDA graph framework.
  2. Improve the CPU-side performance of the CUDA graph framework.
  3. Tests and checks:
    3.1 In long-text scenarios, the pymodel BERT outperforms the C++ engine.
    3.2 In short-text scenarios, CUDA graphs improve pymodel BERT performance by up to 20%, but it still does not beat the C++ engine, because pymodel preparation takes 0.2 to 0.3 ms longer than the C++ engine's. We will address this in a follow-up PR.
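A replayed CUDA graph always reads its inputs from the same device addresses it was captured with, so a prefill path built on CUDA graphs typically copies each request's inputs into preallocated static buffers before replay. The sketch below illustrates that general pattern only; it is not this PR's implementation. The bucket sizes, `STATIC_INPUTS`, `select_bucket`, and `copy_into_static` are hypothetical, and plain Python lists stand in for GPU buffers and the real device-side copy kernel.

```python
from bisect import bisect_left
from typing import Dict, List, Sequence

# Hypothetical token counts for which prefill graphs were captured (sorted).
CAPTURED_SIZES: List[int] = [16, 32, 64, 128]

# One preallocated static input buffer per captured size. A captured graph
# reads from fixed memory, so new inputs are copied in rather than reallocated.
STATIC_INPUTS: Dict[int, List[int]] = {size: [0] * size for size in CAPTURED_SIZES}


def select_bucket(num_tokens: int) -> int:
    """Return the smallest captured size that can hold num_tokens."""
    idx = bisect_left(CAPTURED_SIZES, num_tokens)
    if idx == len(CAPTURED_SIZES):
        raise ValueError(f"{num_tokens} tokens exceeds the largest captured size")
    return CAPTURED_SIZES[idx]


def copy_into_static(token_ids: Sequence[int], pad_id: int = 0) -> int:
    """Copy token_ids into the chosen bucket's static buffer and pad the tail.

    The buffer object itself is never replaced, mimicking the fixed
    addresses a replayed graph depends on. Returns the bucket size used.
    """
    size = select_bucket(len(token_ids))
    buf = STATIC_INPUTS[size]
    n = len(token_ids)
    buf[:n] = token_ids              # in-place copy of the real tokens
    buf[n:] = [pad_id] * (size - n)  # overwrite stale data from earlier requests
    return size
```

Padding up to the nearest captured size trades some wasted compute for being able to reuse a small set of captured graphs; the per-request copy and bucket selection are part of the CPU-side preparation cost that the short-text comparison above refers to.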

@JackTan25 JackTan25 requested a review from LLLLKKKK as a code owner October 21, 2025 08:42
@JackTan25 JackTan25 changed the title from "test ci for new cuda graph prefill, don't review" to "feat: improve pymodel bert perf" Oct 23, 2025
@JackTan25 JackTan25 force-pushed the feature/bert_py_model_perf_improve branch from ed6b953 to f76c022 November 4, 2025 06:55