feat: rmsnorm fuse quant and unitest #312
Conversation
Pull Request Overview
This PR adds support for fused FP8 per-token quantization across the ROCm GEMM and layer-normalization operations. For MoE models, it conditionally disables quantization in the post-layernorm step, and it extends the FP8 quantization path in the GEMM operations to accept input buffers that are already quantized; a hedged reference sketch of the fused semantics follows the change summary below.
- Added new test file for FP8 per-token, per-channel (PTPC) A8W8 GEMM operations
- Extended ROCm layer normalization to support fused FP8 per-token quantization for RMSNorm
- Modified GEMM operations to skip redundant quantization when inputs are already quantized
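The following minimal PyTorch sketch illustrates the unfused reference semantics that a fused RMSNorm + FP8 per-token quantization kernel would be expected to match numerically; the function names (`rmsnorm_ref`, `fp8_quant_per_token`) are illustrative assumptions, not the PR's actual API.

```python
# Hypothetical reference for RMSNorm followed by FP8 per-token quantization.
# Names here are illustrative only, not taken from rtp_llm.
import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3fn

def rmsnorm_ref(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Plain RMSNorm: scale each row by the reciprocal RMS of its elements.
    variance = x.float().pow(2).mean(dim=-1, keepdim=True)
    return (x.float() * torch.rsqrt(variance + eps) * weight.float()).to(x.dtype)

def fp8_quant_per_token(x: torch.Tensor):
    # Per-token quantization: one scale per row (token), derived from the
    # row's absolute maximum so each row fully uses the FP8 dynamic range.
    scales = x.abs().amax(dim=-1, keepdim=True).float().clamp(min=1e-12) / FP8_MAX
    q = (x.float() / scales).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return q, scales

# Unfused reference a fused kernel should match numerically:
x = torch.randn(4, 128, dtype=torch.float16)
w = torch.ones(128, dtype=torch.float16)
q, scales = fp8_quant_per_token(rmsnorm_ref(x, w))
```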
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tests/gemm/rocm_ptpc_a8w8_gemm_op_test.py | New test file for FP8 A8W8 GEMM with Chinese error messages and comprehensive tensor swizzling/shuffling utilities |
| tests/gemm/rocm_pertensor_int8_gemm_op_test.py | Removed trailing empty line |
| tests/gemm/gemm_op_test.cc | Added blank line after class constructor declaration |
| tests/BUILD | Added build configuration for new PTPC A8W8 GEMM test |
| rtp_llm/cpp/models/GptModel.cc | Added conditional logic to disable quantization for MoE models in post-layernorm |
| rtp_llm/cpp/devices/rocm_impl/ROCmLayernorm.cc | Refactored buffer allocation logic and added FP8 per-token quantization support for RMSNorm |
| rtp_llm/cpp/devices/rocm_impl/ROCmGemmOp.cc | Added logic to skip quantization when the input is already a QBuffer and extended the FP8 GEMM dispatch (see the sketch after this table) |
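As a rough illustration of the skip-requantization idea in ROCmGemmOp.cc, the sketch below models a quantized buffer as a plain data-plus-scales pair; `QBufferSketch` and `prepare_gemm_input` are stand-ins for explanation, not rtp_llm's actual types.

```python
# Illustrative only: if an activation already arrives quantized (e.g. from a
# fused RMSNorm + quant kernel), the GEMM reuses its payload and scales
# instead of quantizing a second time.
from dataclasses import dataclass
import torch

@dataclass
class QBufferSketch:            # stand-in for rtp_llm's QBuffer
    data: torch.Tensor          # FP8 payload
    scales: torch.Tensor        # per-token scales

def prepare_gemm_input(a, quantize_fn):
    if isinstance(a, QBufferSketch):
        return a                # already quantized upstream: skip re-quantization
    q, s = quantize_fn(a)       # otherwise quantize here, inside the GEMM op
    return QBufferSketch(q, s)
```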
Pull Request Overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
Force-pushed from 4d57ed4 to bdaaa24
Pull Request Overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
tests/gemm/rocm_ptpc_a8w8_gemm_op_test.py:1
- Help text contains Chinese characters '计算使用的数据类型' ("data type used for computation"). Comments and documentation should be in English for consistency with the rest of the codebase.
```python
# SPDX-License-Identifier: MIT
```
Force-pushed from 8057e6b to b1cad8a
Pull Request Overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
| f"相对误差={torch.abs((a[idx_tuple] - b[idx_tuple]) / b[idx_tuple])}\n" | ||
| ) | ||
| if len(mismatch_indices) > 10: | ||
| error_msg += f"...(共 {len(mismatch_indices)} 处不匹配)" |
Copilot AI commented on Nov 4, 2025
Error messages contain Chinese text which may not be accessible to all developers. Consider using English for error messages to maintain consistency with the rest of the codebase and ensure international accessibility.
```diff
-error_msg += f"...(共 {len(mismatch_indices)} 处不匹配)"
+error_msg += f"...(total {len(mismatch_indices)} mismatches)"
```
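Along the lines of that suggestion, here is a hedged sketch of an English-language mismatch report; the helper name and tolerances are illustrative, not taken from the test file.

```python
# Illustrative comparison helper: reports per-element relative error for the
# first few mismatches and summarizes the rest in English.
import torch

def report_mismatches(a, b, rtol=1e-2, atol=1e-3, limit=10) -> str:
    mismatch = ~torch.isclose(a, b, rtol=rtol, atol=atol)
    indices = mismatch.nonzero(as_tuple=False)
    lines = []
    for row in indices[:limit]:
        t = tuple(row.tolist())
        rel = torch.abs((a[t] - b[t]) / b[t]).item()
        lines.append(f"index={t}: a={a[t].item()}, b={b[t].item()}, relative error={rel}")
    if len(indices) > limit:
        lines.append(f"...(total {len(indices)} mismatches)")
    return "\n".join(lines)
```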
```cpp
    auto res_tensor = rmsnorm2d(input_tensor, weight_tensor, static_cast<double>(eps), 0);
    copy({*norm_output, *torchTensor2Buffer(res_tensor)});
}
if (params.qscheme == QScheme::Qfp8PerToken /* Do fuse fp8 pertoken*/) {
```
Copilot AI commented on Nov 4, 2025
Grammar correction: 'Do fuse' should be 'Fused' or 'Fuses' in comment.
```diff
-if (params.qscheme == QScheme::Qfp8PerToken /* Do fuse fp8 pertoken*/) {
+if (params.qscheme == QScheme::Qfp8PerToken /* Fused fp8 per-token */) {
```
```cpp
              autil::StringUtil::toString(arguments.Dshape).c_str(),
              params.D->debugString().c_str());
              params.D->debugString().c_str());
} else if (params.A.type() == DataType::TYPE_QFP8_E4M3 && params.B.type() == DataType::TYPE_QFP8_E4M3 /* if fused fp8 pertoken & rmsnorm */) {
```
Copilot AI commented on Nov 4, 2025
Grammar correction: 'if fused' should be 'fused' (remove 'if') in comment.
```diff
-} else if (params.A.type() == DataType::TYPE_QFP8_E4M3 && params.B.type() == DataType::TYPE_QFP8_E4M3 /* if fused fp8 pertoken & rmsnorm */) {
+} else if (params.A.type() == DataType::TYPE_QFP8_E4M3 && params.B.type() == DataType::TYPE_QFP8_E4M3 /* fused fp8 pertoken & rmsnorm */) {
```
Force-pushed from 1573862 to 34a8e0e
Force-pushed from 34a8e0e to d9f1d42
Pull Request Overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
```cpp
} else if (params.A.type() == DataType::TYPE_INT8 || params.A.type() == DataType::TYPE_QINT8){
    DDtype = DataType::TYPE_FP16;
} else if (params.A.type() == DataType::TYPE_FP8_E4M3 || params.A.type() == DataType::TYPE_QFP8_E4M3){
    // TO DO: When A is TYPE_FP8_E4M3, choose output dtype according to env "ACT_TYPE".
```
Copilot AI commented on Nov 6, 2025
Comment uses 'TO DO' instead of standard 'TODO' format.
```diff
-// TO DO: When A is TYPE_FP8_E4M3, choose output dtype according to env "ACT_TYPE".
+// TODO: When A is TYPE_FP8_E4M3, choose output dtype according to env "ACT_TYPE".
```
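For what that TODO might eventually look like, here is a hedged sketch of env-driven output-dtype selection; the "ACT_TYPE" variable name comes from the comment, while the value mapping and default below are assumptions.

```python
# Illustrative only: map an ACT_TYPE environment variable to a torch dtype
# for the FP8 GEMM output; defaults to bf16 when unset or unrecognized.
import os
import torch

_ACT_DTYPES = {"fp16": torch.float16, "bf16": torch.bfloat16, "fp32": torch.float32}

def output_dtype_for_fp8_gemm() -> torch.dtype:
    return _ACT_DTYPES.get(os.environ.get("ACT_TYPE", "bf16").lower(), torch.bfloat16)
```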
```cpp
              arguments.DDtype,
              autil::StringUtil::toString(arguments.Dshape).c_str(),
              params.D->debugString().c_str());
              params.D->debugString().c_str());
```
Copilot AI commented on Nov 6, 2025
This line appears to have only trailing whitespace removed compared to the original, but the trailing spaces on line 380 should be fully cleaned up for consistency.
```diff
-params.D->debugString().c_str());
+params.D->debugString().c_str());
```
(The suggested change removes trailing whitespace only, so both lines render identically.)
```cpp
                  scale_N);
    }
    return std::move(output);

```
Copilot AI commented on Nov 6, 2025
[nitpick] Missing blank line after this return statement before the closing brace for consistency with surrounding code patterns (see line 504 which has similar structure).