Performance improving for MKL-DNN Quantized FullyConnected #14528
pengzhao-intel left a comment:
It's good to use an enum instead of hard-coded numbers in the code.
LGTM.
enum FullyConnectedOpOutputs {kOut};
}  // fullc

namespace quantized_fullc {
just to align with fullc :)
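The enum suggestion can be illustrated with a small sketch. Only the namespace names `fullc` and `quantized_fullc` and the member `kOut` come from the diff; the members `kOutMin` and `kOutMax`, the enum name `QuantizedFCOutputs`, and both helper functions are assumptions made for illustration, not the actual MXNet declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical index enums. Only `fullc`, `quantized_fullc`, and `kOut`
// appear in the diff; the remaining names are assumed for illustration.
namespace fullc {
enum FullyConnectedOpOutputs { kOut };
}  // namespace fullc

namespace quantized_fullc {
enum QuantizedFCOutputs { kOut, kOutMin, kOutMax };
}  // namespace quantized_fullc

// Hard-coded version: the reader has to know that slot 2 holds the max.
inline float ReadOutMaxHardcoded(const std::vector<float>& outputs) {
  assert(outputs.size() > 2);
  return outputs[2];
}

// Enum version: the index is named, and its value is defined in one place.
inline float ReadOutMax(const std::vector<float>& outputs) {
  assert(outputs.size() > static_cast<std::size_t>(quantized_fullc::kOutMax));
  return outputs[quantized_fullc::kOutMax];
}
```

Both helpers read the same slot; the enum version simply keeps working if the output order is ever changed in the enum definition.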
MKLDNNFCForwardFullFeature(full_param_, ctx, fwd_.get(), new_inputs, new_req, out_data);

if (mkldnn_param.quantized && !mkldnn_param.enable_float_output) {
Please add a comment explaining why.
I think it's straightforward here: OutMin and OutMax are only valid when the op is quantized and is not generating fp32 output.
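The condition discussed here can be sketched in isolation. The struct `MKLDNNFCParamSketch` and both helpers are hypothetical; only the field names `quantized` and `enable_float_output` come from the diff, and the three-output layout (data, min, max) is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the op parameters; only the two field names
// come from the diff.
struct MKLDNNFCParamSketch {
  bool quantized;
  bool enable_float_output;
};

// OutMin/OutMax describe the range of a quantized output, so they are only
// meaningful when the op is quantized and does not emit fp32 directly.
inline bool HasMinMaxOutputs(const MKLDNNFCParamSketch& p) {
  return p.quantized && !p.enable_float_output;
}

// Assumed layout: one data output, plus min and max when they are valid.
inline std::size_t NumOutputs(const MKLDNNFCParamSketch& p) {
  return HasMinMaxOutputs(p) ? 3 : 1;
}
```

Naming the predicate in this way is one option for documenting the branch without a separate comment.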
@mxnet-label-bot update [MKLDNN, Performance, pr-awaiting-testing]

@anirudh2290 @ZhennanQin @xinyu-intel to review

  const float min_data =
-     in_data[num_inputs + quantized_fc_enum::kDataMin].data().dptr<float>()[0];
+     in_data[num_inputs + quantized_fullc::kDataMin].data().dptr<float>()[0];
This usage is quite strange. Why not define a complete set of inputs that includes the original inputs?
The original inputs might not include bias, which changes the index of every min/max input. Offsetting from num_inputs just simplifies the ordering for the quantized op.
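The indexing argument can be made concrete with a short sketch. The namespace `quantized_fullc_sketch`, the enum members other than `kDataMin`, their order, and both helper functions are assumptions for illustration; the assumed input layout is the original inputs (data, weight, optional bias) followed by the min/max values.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical enum: `kDataMin` appears in the diff, the other members and
// their order are assumptions.
namespace quantized_fullc_sketch {
enum InputMinMax { kDataMin, kDataMax, kWeightMin, kWeightMax, kBiasMin, kBiasMax };
}  // namespace quantized_fullc_sketch

// Number of original inputs: data and weight, plus bias when it is present.
inline std::size_t NumBaseInputs(bool no_bias) { return no_bias ? 2 : 3; }

// The min/max inputs follow the original inputs, so their absolute position
// depends on whether bias exists. The enum only stores the relative offset.
inline std::size_t MinMaxInputIndex(bool no_bias,
                                    quantized_fullc_sketch::InputMinMax which) {
  return NumBaseInputs(no_bias) + static_cast<std::size_t>(which);
}
```

Under these assumptions, `kDataMin` resolves to index 3 when bias is present and to index 2 when it is absent, which is why a single enum covering all inputs would need two different sets of values.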
Thanks for your contribution. Merging now.
* Cached bias to Quantized FullyConnected based on Subgraph to improve performance
* retrigger CI
* retrigger CI
Description
This patch mainly improves the performance of the MKL-DNN quantized FullyConnected operator.
@pengzhao-intel @TaoLv