@opencv-alalek opencv-alalek commented Jan 24, 2024

relates #24909
relates #24917
relates #24892

Performance changes:

  • 12700K (1 thread) + Intel iGPU

|Name of Test|noOCL|convertFp16|convertTo BASE|convertTo PATCH|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.152|3.127|3.136|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|3.996|3.007|2.671|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|3.101|3.056|2.854|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|3.298|2.072|2.061|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.652|2.723|2.721|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.268|2.662|2.947|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|2.601|2.603|2.528|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|3.215|1.999|1.988|

The patched version is not worse than either convertFp16 or the convertTo baseline, except for MatUMat FP32->FP16, where the baseline uses the CPU code path plus a mapped dst buffer.
There are still gaps against the noOpenCL (CPU-only) mode due to T-API implementation issues (unnecessary synchronization).

  • 12700K + AMD dGPU

|Name of Test|noOCL|convertFp16 dGPU|convertTo BASE dGPU|convertTo PATCH dGPU|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.133|3.172|3.087|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|1.713|9.559|1.729|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|6.515|6.309|4.452|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|0.242|23.597|0.170|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.641|2.713|2.689|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.076|6.483|4.191|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|9.042|16.481|1.834|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|0.229|15.730|0.176|

The convertTo baseline cannot compile the OpenCL kernel for FP16 properly; this is now FIXED.
The dGPU has much more compute power, so the UMat-to-UMat results are roughly x16-17 better than a single CPU core.
The patched version is not worse than convertFp16 or the convertTo baseline.
There are still gaps against the noOpenCL (CPU-only) mode due to T-API implementation issues (unnecessary synchronization) and the required memory transfers.

force_builders=Linux OpenCL,Linux AVX2,Win64 OpenCL

@opencv-pushbot opencv-pushbot force-pushed the gitee/alalek/core_convertfp16_replacement branch from 99ba03c to b9b3860 Compare January 24, 2024 13:56
@opencv-alalek opencv-alalek marked this pull request as ready for review January 24, 2024 21:57
@asmorkalov asmorkalov left a comment

👍

  DUMP_CONFIG_PROPERTY("cv_ocl_current_maxMemAllocSize", device.maxMemAllocSize());

- const char* doubleSupportStr = device.doubleFPConfig() > 0 ? "Yes" : "No";
+ const char* doubleSupportStr = device.hasFP64() ? "Yes" : "No";

Member

So what is going to happen with doubleFPConfig and halfFPConfig? Are they being deprecated as well?

Contributor Author

No, they are still needed if we want to compute with proper inf/NaN support.

Member

I checked doubleFPConfig across the whole OpenCV project, and it is basically used like this:

bool doubleSupport = ocl::Device::getDefault().doubleFPConfig() > 0;

> they are still needed if we want to compute with proper inf/nans support

Did I miss anything here? Or is it in user code instead?

Contributor Author

That usage is not correct.

Member

What is the correct way? Is all of this code wrong?

Contributor Author

All of them are subject to revision.

Member

Okay.

@asmorkalov asmorkalov self-assigned this Jan 26, 2024
@asmorkalov asmorkalov merged commit 40533db into opencv:4.x Jan 26, 2024
asmorkalov pushed a commit that referenced this pull request Jan 26, 2024
…16s_usage

DNN: avoid CV_16S usage for FP16 #24892

**Merge after**: #24918

TODO:
- [x] measure performance changes
- [x] optimize convertTo for OpenCL: #24918

12700K iGPU:

|Name of Test|0|1|1 vs 0 (x-factor)|
|---|:-:|:-:|:-:|
|AlexNet::DNNTestNetwork::OCV/OCL_FP16|7.441|7.480|0.99|
|CRNN::DNNTestNetwork::OCV/OCL_FP16|10.776|10.736|1.00|
|DenseNet_121::DNNTestNetwork::OCV/OCL_FP16|52.762|52.833|1.00|
|EAST_text_detection::DNNTestNetwork::OCV/OCL_FP16|60.694|60.721|1.00|
|EfficientNet::DNNTestNetwork::OCV/OCL_FP16|33.373|33.173|1.01|
|FastNeuralStyle_eccv16::DNNTestNetwork::OCV/OCL_FP16|81.840|81.724|1.00|
|GoogLeNet::DNNTestNetwork::OCV/OCL_FP16|20.965|20.927|1.00|
|Inception_5h::DNNTestNetwork::OCV/OCL_FP16|22.204|22.173|1.00|
|Inception_v2_SSD_TensorFlow::DNNTestNetwork::OCV/OCL_FP16|47.115|47.460|0.99|
|MPHand::DNNTestNetwork::OCV/OCL_FP16|6.760|6.670|1.01|
|MPPalm::DNNTestNetwork::OCV/OCL_FP16|10.188|10.171|1.00|
|MPPose::DNNTestNetwork::OCV/OCL_FP16|12.510|12.561|1.00|
|MobileNet_SSD_Caffe::DNNTestNetwork::OCV/OCL_FP16|17.290|17.072|1.01|
|MobileNet_SSD_v1_TensorFlow::DNNTestNetwork::OCV/OCL_FP16|19.473|19.306|1.01|
|MobileNet_SSD_v2_TensorFlow::DNNTestNetwork::OCV/OCL_FP16|22.874|23.404|0.98|
|OpenFace::DNNTestNetwork::OCV/OCL_FP16|9.568|9.517|1.01|
|OpenPose_pose_mpi_faster_4_stages::DNNTestNetwork::OCV/OCL_FP16|539.899|539.845|1.00|
|PPHumanSeg::DNNTestNetwork::OCV/OCL_FP16|18.015|18.769|0.96|
|PPOCRv3::DNNTestNetwork::OCV/OCL_FP16|63.122|63.540|0.99|
|ResNet_50::DNNTestNetwork::OCV/OCL_FP16|34.947|34.925|1.00|
|SFace::DNNTestNetwork::OCV/OCL_FP16|10.249|10.206|1.00|
|SSD::DNNTestNetwork::OCV/OCL_FP16|213.068|213.108|1.00|
|SqueezeNet_v1_1::DNNTestNetwork::OCV/OCL_FP16|4.867|4.878|1.00|
|VIT_B_32::DNNTestNetwork::OCV/OCL_FP16|200.563|190.788|1.05|
|VitTrack::DNNTestNetwork::OCV/OCL_FP16|7.528|7.173|1.05|
|YOLOX::DNNTestNetwork::OCV/OCL_FP16|132.858|132.701|1.00|
|YOLOv3::DNNTestNetwork::OCV/OCL_FP16|209.559|208.809|1.00|
|YOLOv4::DNNTestNetwork::OCV/OCL_FP16|221.357|220.924|1.00|
|YOLOv4_tiny::DNNTestNetwork::OCV/OCL_FP16|24.446|24.382|1.00|
|YOLOv5::DNNTestNetwork::OCV/OCL_FP16|43.922|44.080|1.00|
|YOLOv8::DNNTestNetwork::OCV/OCL_FP16|64.159|63.842|1.00|
|YuNet::DNNTestNetwork::OCV/OCL_FP16|10.177|10.231|0.99|
|opencv_face_detector::DNNTestNetwork::OCV/OCL_FP16|15.121|15.445|0.98|

Co-authored-by: Alexander Alekhin <[email protected]>
This was referenced Feb 3, 2024