Conversation

@zhaoan12-prc
Collaborator

No description provided.

Copilot AI review requested due to automatic review settings November 3, 2025 06:15
Copilot AI left a comment

Pull Request Overview

This PR adds fused FP8 per-token quantization to ROCm layer normalization (RMSNorm) and to the ROCm GEMM path. For MoE models, quantization is disabled in the post-layernorm. FP8 GEMM operations now accept input buffers that are already quantized and no longer quantize them a second time.

  • Added new test file for FP8 per-token, per-channel (PTPC) A8W8 GEMM operations
  • Extended ROCm layer normalization to support fused FP8 per-token quantization for RMSNorm
  • Modified GEMM operations to skip redundant quantization when inputs are already quantized
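
As a rough illustration of what "fused FP8 per-token quantization for RMSNorm" means numerically, the sketch below normalizes each token row and then derives one scale per row. It is not the PR's ROCm kernel: the function names are hypothetical, the values are only scaled and clamped rather than rounded onto the FP8 grid, and the maximum of 448.0 assumes the OCP E4M3 format (some ROCm targets use an E4M3 variant whose maximum is 240.0).

```python
import math

# Assumption: 448.0 is the largest finite value of the OCP FP8 E4M3 format.
# Some ROCm targets use an E4M3 variant with a smaller maximum (240.0).
FP8_E4M3_MAX = 448.0


def rmsnorm_per_token_quant(rows, weight, eps=1e-6, fp8_max=FP8_E4M3_MAX):
    """Apply RMSNorm to each token (row), then scale it into the FP8 range.

    Returns (quantized_rows, scales). Values are only scaled and clamped
    here; a real kernel would also round them onto the FP8 grid.
    """
    quantized, scales = [], []
    for row in rows:
        # RMSNorm: x / sqrt(mean(x^2) + eps) * weight
        mean_sq = sum(x * x for x in row) / len(row)
        inv_rms = 1.0 / math.sqrt(mean_sq + eps)
        normed = [x * inv_rms * w for x, w in zip(row, weight)]

        # Per-token scale: one scale per row, chosen so that the largest
        # magnitude in the row maps to fp8_max.
        amax = max(abs(v) for v in normed)
        scale = amax / fp8_max if amax > 0.0 else 1.0
        q = [max(-fp8_max, min(fp8_max, v / scale)) for v in normed]

        quantized.append(q)
        scales.append(scale)
    return quantized, scales


def dequantize(quantized, scales):
    """Recover approximate activations: each row times its own scale."""
    return [[v * s for v in row] for row, s in zip(quantized, scales)]
```

Producing the scales in the same pass as the normalization is what lets a downstream GEMM consume the quantized rows directly, without a separate quantization step.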

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Show a summary per file
| File | Description |
| --- | --- |
| tests/gemm/rocm_ptpc_a8w8_gemm_op_test.py | New test file for FP8 A8W8 GEMM with Chinese error messages and comprehensive tensor swizzling/shuffling utilities |
| tests/gemm/rocm_pertensor_int8_gemm_op_test.py | Removed trailing empty line |
| tests/gemm/gemm_op_test.cc | Added blank line after class constructor declaration |
| tests/BUILD | Added build configuration for new PTPC A8W8 GEMM test |
| rtp_llm/cpp/models/GptModel.cc | Added conditional logic to disable quantization for MoE models in post-layernorm |
| rtp_llm/cpp/devices/rocm_impl/ROCmLayernorm.cc | Refactored buffer allocation logic and added FP8 per-token quantization support for RMSNorm |
| rtp_llm/cpp/devices/rocm_impl/ROCmGemmOp.cc | Added logic to skip quantization when input is already QBuffer and extended FP8 GEMM dispatch |


@zhaoan12-prc zhaoan12-prc requested a review from Copilot November 3, 2025 06:23
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.



Copilot AI review requested due to automatic review settings November 3, 2025 07:14
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@zhaoan12-prc zhaoan12-prc force-pushed the feature/rmsnorm_fuse_quant_and_unitest branch from 4d57ed4 to bdaaa24 Compare November 3, 2025 07:45
Copilot AI review requested due to automatic review settings November 3, 2025 12:35
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

tests/gemm/rocm_ptpc_a8w8_gemm_op_test.py:1

  • Help text contains Chinese characters '计算使用的数据类型' ("data type used for computation"). Comments and documentation should be in English for consistency with the rest of the codebase.
# SPDX-License-Identifier: MIT


@zhaoan12-prc zhaoan12-prc force-pushed the feature/rmsnorm_fuse_quant_and_unitest branch from 8057e6b to b1cad8a Compare November 3, 2025 12:55
Copilot AI review requested due to automatic review settings November 4, 2025 02:30
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.



f"相对误差={torch.abs((a[idx_tuple] - b[idx_tuple]) / b[idx_tuple])}\n"
)
if len(mismatch_indices) > 10:
error_msg += f"...(共 {len(mismatch_indices)} 处不匹配)"
Copilot AI Nov 4, 2025

Error messages contain Chinese text which may not be accessible to all developers. Consider using English for error messages to maintain consistency with the rest of the codebase and ensure international accessibility.

Suggested change
error_msg += f"...(共 {len(mismatch_indices)} 处不匹配)"
error_msg += f"...(total {len(mismatch_indices)} mismatches)"
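
For context, an English-only version of this kind of mismatch report could look like the sketch below. It is a simplified stand-in, not the test file's code: `format_mismatches` is a hypothetical helper, it works on flat Python lists rather than torch tensors, and the tolerance rule (`atol + rtol * |b|`) is an assumption.

```python
def format_mismatches(a, b, rtol=1e-2, atol=1e-2, max_shown=10):
    """Return an English report of the positions where a and b differ.

    a and b are flat lists of floats of equal length. An element counts as
    a mismatch when |a - b| > atol + rtol * |b|. At most max_shown entries
    are listed; a summary line is appended when there are more.
    """
    mismatches = [
        i for i, (x, y) in enumerate(zip(a, b))
        if abs(x - y) > atol + rtol * abs(y)
    ]
    lines = []
    for i in mismatches[:max_shown]:
        abs_err = abs(a[i] - b[i])
        # Guard against division by zero in the reference value.
        rel_err = abs_err / abs(b[i]) if b[i] != 0 else float("inf")
        lines.append(
            f"index={i}: a={a[i]}, b={b[i]}, "
            f"abs_err={abs_err:.6g}, rel_err={rel_err:.6g}"
        )
    if len(mismatches) > max_shown:
        lines.append(f"...(total {len(mismatches)} mismatches)")
    return "\n".join(lines)
```

Capping the listing at `max_shown` keeps the failure output readable when a whole tensor is wrong, while the final line still reports the full count.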

auto res_tensor = rmsnorm2d(input_tensor, weight_tensor, static_cast<double>(eps), 0);
copy({*norm_output, *torchTensor2Buffer(res_tensor)});
}
if (params.qscheme == QScheme::Qfp8PerToken /* Do fuse fp8 pertoken*/) {
Copilot AI Nov 4, 2025

Grammar correction: 'Do fuse' should be 'Fused' or 'Fuses' in comment.

Suggested change
if (params.qscheme == QScheme::Qfp8PerToken /* Do fuse fp8 pertoken*/) {
if (params.qscheme == QScheme::Qfp8PerToken /* Fused fp8 per-token */) {

autil::StringUtil::toString(arguments.Dshape).c_str(),
params.D->debugString().c_str());
params.D->debugString().c_str());
} else if (params.A.type() == DataType::TYPE_QFP8_E4M3 && params.B.type() == DataType::TYPE_QFP8_E4M3 /* if fused fp8 pertoken & rmsnorm */) {
Copilot AI Nov 4, 2025

Grammar correction: 'if fused' should be 'fused' (remove 'if') in comment.

Suggested change
} else if (params.A.type() == DataType::TYPE_QFP8_E4M3 && params.B.type() == DataType::TYPE_QFP8_E4M3 /* if fused fp8 pertoken & rmsnorm */) {
} else if (params.A.type() == DataType::TYPE_QFP8_E4M3 && params.B.type() == DataType::TYPE_QFP8_E4M3 /* fused fp8 pertoken & rmsnorm */) {
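
The branch under review handles the case where the activation arrives already quantized from the fused RMSNorm. A minimal sketch of that "skip redundant quantization" dispatch follows, under stated assumptions: `QuantizedBuffer`, `quantize_per_token`, and `prepare_gemm_input` are hypothetical names that stand in for the QBuffer check in ROCmGemmOp.cc, and 448.0 assumes the OCP E4M3 maximum.

```python
from dataclasses import dataclass


@dataclass
class QuantizedBuffer:
    """Hypothetical stand-in for a QBuffer: scaled values plus row scales."""
    values: list
    scales: list


def quantize_per_token(rows, fp8_max=448.0):
    """Scale each row so that its largest magnitude maps to fp8_max."""
    values, scales = [], []
    for row in rows:
        amax = max(abs(v) for v in row)
        scale = amax / fp8_max if amax > 0.0 else 1.0
        values.append([v / scale for v in row])
        scales.append(scale)
    return QuantizedBuffer(values, scales)


def prepare_gemm_input(a):
    """Quantize the GEMM input only if an upstream op has not done so."""
    if isinstance(a, QuantizedBuffer):
        # Already quantized (for example by a fused RMSNorm): reuse as-is.
        return a
    return quantize_per_token(a)
```

Returning the incoming buffer unchanged avoids both the extra kernel launch and the accuracy loss of quantizing values that were already scaled once.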

@zhaoan12-prc zhaoan12-prc force-pushed the feature/rmsnorm_fuse_quant_and_unitest branch from 1573862 to 34a8e0e Compare November 5, 2025 10:25
Copilot AI review requested due to automatic review settings November 6, 2025 07:16
@zhaoan12-prc zhaoan12-prc force-pushed the feature/rmsnorm_fuse_quant_and_unitest branch from 34a8e0e to d9f1d42 Compare November 6, 2025 07:16
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.



} else if (params.A.type() == DataType::TYPE_INT8 || params.A.type() == DataType::TYPE_QINT8){
DDtype = DataType::TYPE_FP16;
} else if (params.A.type() == DataType::TYPE_FP8_E4M3 || params.A.type() == DataType::TYPE_QFP8_E4M3){
// TO DO: When A is TYPE_FP8_E4M3, choose output dtype according to env "ACT_TYPE".
Copilot AI Nov 6, 2025

Comment uses 'TO DO' instead of standard 'TODO' format.

Suggested change
// TO DO: When A is TYPE_FP8_E4M3, choose output dtype according to env "ACT_TYPE".
// TODO: When A is TYPE_FP8_E4M3, choose output dtype according to env "ACT_TYPE".
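
The TODO itself is not implemented in this PR. One way such a lookup might work is sketched below, purely as an illustration: the accepted values ("fp16", "bf16"), the fp16 default, and the `TYPE_BF16` name are assumptions, and `output_dtype_for_fp8` is a hypothetical helper rather than anything in the codebase.

```python
import os

# Assumption: these are the values ACT_TYPE may take. The source only names
# the environment variable, not its accepted values.
_ACT_TYPE_TO_DTYPE = {
    "fp16": "TYPE_FP16",
    "bf16": "TYPE_BF16",
}


def output_dtype_for_fp8(env=None):
    """Pick the GEMM output dtype for FP8 inputs from ACT_TYPE.

    env defaults to os.environ; passing a dict makes the function testable.
    Falls back to fp16 when ACT_TYPE is unset.
    """
    env = os.environ if env is None else env
    act_type = env.get("ACT_TYPE", "fp16").strip().lower()
    if act_type not in _ACT_TYPE_TO_DTYPE:
        raise ValueError(f"unsupported ACT_TYPE: {act_type!r}")
    return _ACT_TYPE_TO_DTYPE[act_type]
```

Rejecting unknown values explicitly, instead of silently falling back, makes a misconfigured environment fail early.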

arguments.DDtype,
autil::StringUtil::toString(arguments.Dshape).c_str(),
params.D->debugString().c_str());
params.D->debugString().c_str());
Copilot AI Nov 6, 2025

This line appears to have only trailing whitespace removed compared to the original, but the trailing spaces on line 380 should be fully cleaned up for consistency.

Suggested change
params.D->debugString().c_str());
params.D->debugString().c_str());

scale_N);
}
return std::move(output);

Copilot AI Nov 6, 2025

[nitpick] Missing blank line after this return statement before the closing brace for consistency with surrounding code patterns (see line 504 which has similar structure).

Suggested change

