[MKLDNN] add quantized sum#14614
Conversation
|
@anirudh2290 @ZhennanQin @TaoLv @ciyongch to review :) |
|
@mxnet-label-bot Add [Quantization, MKLDNN] |
| } | ||
|
|
||
| NNVM_REGISTER_OP(_contrib_quantized_sum) | ||
| .describe(R"code(Adds arguments element-wise. |
| // dataA && dataB are uint8 | ||
| if (out_data[quantized_sum_enum::kDataA].dtype() == mshadow::kInt8) { | ||
| output_data_range = kInt8Range; | ||
| output_data_type = mkldnn::memory::s8; |
| float B_scale = GetScale(in_data[quantized_sum_enum::kDataB], dataB_min, dataB_max); | ||
| // rescaled_mem is for reorder mkldnn memory | ||
| std::shared_ptr<mkldnn::memory> rescaled_mem; | ||
| // output default set as int32 |
There was a problem hiding this comment.
Int32 by default. Do we have any other choice?
There was a problem hiding this comment.
when fusion with requantize op, out put is int8/uint8
| auto s8_pd = (dataA_int8 == true) | ||
| ? dataA_mem->get_primitive_desc() | ||
| : dataB_mem->get_primitive_desc(); | ||
| rescaled_mem = std::make_shared<mkldnn::memory>(s8_pd); |
There was a problem hiding this comment.
reorder ( line 134 ) is done in this if() field, so need allocate memory first.
There was a problem hiding this comment.
Conventionally, we don't want to allocate memory implicitly inside MKL-DNN API. Besides, seems this allocation will happen every iteration which is performance problematic.
There was a problem hiding this comment.
mkldnn sum doesn't support int8 + uint8, so need to reorder them to the same data type first.
There was a problem hiding this comment.
change them to TmpMemMgr::Get()->Alloc
| output_min = 0 - output_max; | ||
| } | ||
|
|
||
| std::vector<float> scales; |
There was a problem hiding this comment.
How many scales do we have? Is it possible to reserve space for them?
There was a problem hiding this comment.
two, scale 0 for dataA, scale 1 for dataB. OK will reserve first
There was a problem hiding this comment.
Suggest:
// scale 0 is for data A, scale 1 is for data B
std::vector<float> scales(2);|
|
||
| auto dataA_mem = in_data[quantized_sum_enum::kDataA].GetMKLDNNData(); | ||
| auto dataB_mem = in_data[quantized_sum_enum::kDataB].GetMKLDNNData(); | ||
| bool dataA_int8 = (in_data[quantized_sum_enum::kDataA].dtype() == mshadow::kInt8) ? true : false; |
There was a problem hiding this comment.
OK。 add const for const variable
| * KIND, either express or implied. See the License for the | ||
| * specific language governing permissions and limitations | ||
| * under the License. | ||
| */ |
There was a problem hiding this comment.
Please add doc info in the header of the new files, including Copyright/brief/author...
|
|
||
| auto dataA_mem = in_data[quantized_sum_enum::kDataA].GetMKLDNNData(); | ||
| auto dataB_mem = in_data[quantized_sum_enum::kDataB].GetMKLDNNData(); | ||
| const bool dataA_int8 = (in_data[quantized_sum_enum::kDataA].dtype() == mshadow::kInt8) |
There was a problem hiding this comment.
is_dataA_int8 could be better for understanding..
|
|
||
| DMLC_REGISTER_PARAMETER(RequantizeSumParam); | ||
|
|
||
| static float GetScale(const NDArray& data, float min, float max) { |
| if (out_data[quantized_sum_enum::kDataA].dtype() == mshadow::kInt8) { | ||
| output_data_range = kInt8Range; | ||
| output_data_type = mkldnn::memory::s8; | ||
| } else if (out_data[quantized_sum_enum::kDataA].dtype() == mshadow::kUint8) { |
There was a problem hiding this comment.
L74 & L77, is it kOut but not kDataA?
| out_data_scale = output_data_range/MaxAbs(output_min, output_max); | ||
| } else { | ||
| output_max = dataA_absmax + dataB_absmax; | ||
| output_min = 0 - output_max; |
| if (in_type->at(i) == mshadow::kInt8) { | ||
| TYPE_ASSIGN_CHECK(*in_type, i, mshadow::kInt8); | ||
| } else { | ||
| TYPE_ASSIGN_CHECK(*in_type, i, mshadow::kUint8); |
There was a problem hiding this comment.
CHECK(in_type->at(i) == mshadow::kInt8 || in_type->at(i) == mshadow::kUint8);
| #include "mkldnn_conv_property.h" | ||
| #include "mkldnn_fc_property.h" | ||
| #include "mkldnn_conv_post_quantize_property.h" | ||
| #include "mkldnn_post_quantize_property.h" |
There was a problem hiding this comment.
did you remove the conv part?
There was a problem hiding this comment.
merge them in one file
|
|
||
| @with_seed() | ||
| def test_quantized_sum(): | ||
| def check_quantized_sum(data_shape, qtype): |
There was a problem hiding this comment.
Please also add test case in test_subgraph.py
|
Please retrigger the CI |
|
@TaoLv @ciyongch @ZhennanQin please help review again :) |
TaoLv
left a comment
There was a problem hiding this comment.
Why the operator is not called quantized_elemwise_add? sum is another operator which is used to accumulate elements of an array.
| #include <utility> | ||
| #include <vector> | ||
| #include <string> | ||
| #include "../../tensor/elemwise_unary_op.h" |
There was a problem hiding this comment.
Make sure these headers are used.
There was a problem hiding this comment.
remove unnecessary head file
|
|
||
| struct RequantizeSumParam : public dmlc::Parameter<RequantizeSumParam> { | ||
| dmlc::optional<float> min_calib_range; // min float value calculated from calibration dataset | ||
| dmlc::optional<float> max_calib_range; // max float value calculated from calibration dataset |
There was a problem hiding this comment.
Remove comments. I think these two parameters are already described in L43 and L48.
| auto dataA_mem = in_data[quantized_sum_enum::kDataA].GetMKLDNNData(); | ||
| auto dataB_mem = in_data[quantized_sum_enum::kDataB].GetMKLDNNData(); | ||
| const bool is_dataA_int8 = (in_data[quantized_sum_enum::kDataA].dtype() == mshadow::kInt8) | ||
| ? true : false; |
There was a problem hiding this comment.
const bool is_dataA_int8 = (in_data[quantized_sum_enum::kDataA].dtype() == mshadow::kInt8);| } else if (out_data[quantized_sum_enum::kOut].dtype() == mshadow::kUint8) { | ||
| output_data_range = kUint8Range; | ||
| output_data_type = mkldnn::memory::u8; | ||
| } |
| output_min = 0 - output_max; | ||
| } | ||
|
|
||
| std::vector<float> scales; |
There was a problem hiding this comment.
Suggest:
// scale 0 is for data A, scale 1 is for data B
std::vector<float> scales(2);| if (is_dataA_int8 == true) { | ||
| u8_reorder_scale = out_data_scale/B_scale; | ||
| scales.push_back(out_data_scale/A_scale); | ||
| scales.push_back(1); |
There was a problem hiding this comment.
scales[0] = out_data_scale / A_scale;
scales[1] = 1.0f;| } | ||
| mkldnn::memory::format i_fmt = static_cast<mkldnn::memory::format>( | ||
| in_pds[quantized_sum_enum::kDataA].desc().data.format); | ||
| auto output_desc = memory::desc(i_dims, output_data_type, i_fmt); |
| NNVM_REGISTER_OP(_contrib_quantized_sum) | ||
| .set_attr<FInferStorageType>("FInferStorageType", SumStorageType) | ||
| .set_attr<FComputeEx>("FComputeEx<cpu>", MKLDNNQuantizedSumForward) | ||
| .set_attr<FResourceRequest>("FResourceRequest", [](const NodeAttrs& n) { |
| } | ||
|
|
||
| NNVM_REGISTER_OP(_contrib_quantized_sum) | ||
| .describe(R"code(elem_add operator for input dataA and input dataB data type of int8, |
There was a problem hiding this comment.
change to elemwise_add
has changed from quantized_sum to quantized_elemwise_add |
|
@TaoLv @ciyongch @ZhennanQin please help review the change again. |
TaoLv
left a comment
There was a problem hiding this comment.
Some minor comments. Please fix parameter indents after function names are changed,
| } | ||
|
|
||
| static void MKLDNNQuantizedElemwiseAddForward(const nnvm::NodeAttrs& attrs, const OpContext& ctx, | ||
| const std::vector<NDArray>& in_data, |
| // A, B, A_min, A_max, B_min, B_max | ||
| CHECK_EQ(in_data.size(), 6U); | ||
| // C, C_min, C_max | ||
| CHECK_EQ(out_data.size(), 3U); |
There was a problem hiding this comment.
Please add some descriptive message for these two checks.
There was a problem hiding this comment.
I meant the error message if the check is failed~
| if (params.max_calib_range.has_value() && params.min_calib_range.has_value()) { | ||
| output_min = params.min_calib_range.value(); | ||
| output_max = params.max_calib_range.value(); | ||
| out_data_scale = output_data_range/MaxAbs(output_min, output_max); |
| float u8_reorder_scale = 0; | ||
| if (params.max_calib_range.has_value() && params.min_calib_range.has_value()) { | ||
| if (is_dataA_int8 == true) { | ||
| u8_reorder_scale = out_data_scale/B_scale; |
| namespace op { | ||
|
|
||
| static bool ElemwiseAddShape(const nnvm::NodeAttrs& attrs, mxnet::ShapeVector* in_shape, | ||
| mxnet::ShapeVector* out_shape) { |
|
|
||
| fp32_rslt = output.asnumpy() | ||
| int8_rslt = qoutput.asnumpy()*max_val/0x7fffffff | ||
| assert_almost_equal(int8_rslt, int8_rslt, atol = 1) |
|
Please rebase and retrigger the CI |
| // A, B, A_min, A_max, B_min, B_max | ||
| CHECK_EQ(in_data.size(), 6U); | ||
| // C, C_min, C_max | ||
| CHECK_EQ(out_data.size(), 3U); |
There was a problem hiding this comment.
I meant the error message if the check is failed~
| scales[0] = out_data_scale / A_scale; | ||
| scales[1] = out_data_scale / B_scale; | ||
| } else { | ||
| scales[0] = dataA_absmax*output_data_range / ((dataA_absmax + dataB_absmax)*dataA_range); |
There was a problem hiding this comment.
nit: please also add spaces around operation *.
| .set_attr<FCompute>("FCompute<cpu>", QuantizedElemwiseAddForward) | ||
| .set_attr<FNeedRequantize>("FNeedRequantize", [](const NodeAttrs& attrs) { return true; }) | ||
| .add_argument("lhs", "NDArray-or-Symbol", "first input") | ||
| .add_argument("rhs", "NDArray-or-Symbol", "4th input") |
There was a problem hiding this comment.
Does 4th input mean the order of parameter list when users call this operator? Seems it does not align with the order of FListInputNames.
|
CI is not passed yet~ Please take a look. Thank you. @rongzha1 |
| .set_attr<FNeedRequantize>("FNeedRequantize", [](const NodeAttrs& attrs) { return true; }) | ||
| .add_argument("lhs", "NDArray-or-Symbol", "first input") | ||
| .add_argument("rhs", "NDArray-or-Symbol", "second input") | ||
| .add_argument("lhs_min", "NDArray-or-Symbol", "second input") |
| .set_attr<FInferStorageType>("FInferStorageType", ElemwiseAddStorageType) | ||
| .set_attr<FComputeEx>("FComputeEx<cpu>", MKLDNNQuantizedElemwiseAddForward) | ||
| .set_attr<bool>("TIsMKLDNN", true) | ||
| .set_attr_parser(ParamParser<RequantizeElemwiseAddParam>) |
There was a problem hiding this comment.
It's quantize in the operator name but requantize in the param name. Is it intentional?
There was a problem hiding this comment.
Yes. this is for fusion with requantized
ZhennanQin
left a comment
There was a problem hiding this comment.
Overall LGTM. Just minor comment.
| } | ||
| // C | ||
| int dtype = mshadow::kInt32; | ||
| #if MXNET_USE_MKLDNN == 1 |
There was a problem hiding this comment.
This isn't a feature of mkldnn. Consider to remove this macro.
TaoLv
left a comment
There was a problem hiding this comment.
Thanks for addressing the comments. Now it's approved.
|
Finally, the CI pass. Thanks for the contribution :) Merging now. |
* add quantized sum * fix gpu compiler error and cpu testcase fail * add default forward function for quantized_sum * skip quantized_sum for gpu ctx * fix comments * fix indetation and comments * retrigger CI * alloc memeory through TmpMemMgr * fix comments Apr.12 * change sum to elemwise_add * change Sum to ElemwiseAdd * fix indents * retrigger CI * trigger CI * fix indentation and typo * trigger CI * fix typo * fix typo * remove USE_MKLDNN macro for requantize params * rename param same as its op * trigger CI * trigger CI * trigger CI
* add quantized sum * fix gpu compiler error and cpu testcase fail * add default forward function for quantized_sum * skip quantized_sum for gpu ctx * fix comments * fix indetation and comments * retrigger CI * alloc memeory through TmpMemMgr * fix comments Apr.12 * change sum to elemwise_add * change Sum to ElemwiseAdd * fix indents * retrigger CI * trigger CI * fix indentation and typo * trigger CI * fix typo * fix typo * remove USE_MKLDNN macro for requantize params * rename param same as its op * trigger CI * trigger CI * trigger CI
Description
add quantized sum impl for mkldnn, support int8&&uint8
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
testcase is added in : tests/python/quantization/test_quantization.py test_quantized_sum()