@opencv-alalek opencv-alalek commented Jan 19, 2024

Merge after: #24918

TODO:

12700K iGPU:

|Name of Test|0|1|1 vs 0 (x-factor)|
|---|:-:|:-:|:-:|
|AlexNet::DNNTestNetwork::OCV/OCL_FP16|7.441|7.480|0.99|
|CRNN::DNNTestNetwork::OCV/OCL_FP16|10.776|10.736|1.00|
|DenseNet_121::DNNTestNetwork::OCV/OCL_FP16|52.762|52.833|1.00|
|EAST_text_detection::DNNTestNetwork::OCV/OCL_FP16|60.694|60.721|1.00|
|EfficientNet::DNNTestNetwork::OCV/OCL_FP16|33.373|33.173|1.01|
|FastNeuralStyle_eccv16::DNNTestNetwork::OCV/OCL_FP16|81.840|81.724|1.00|
|GoogLeNet::DNNTestNetwork::OCV/OCL_FP16|20.965|20.927|1.00|
|Inception_5h::DNNTestNetwork::OCV/OCL_FP16|22.204|22.173|1.00|
|Inception_v2_SSD_TensorFlow::DNNTestNetwork::OCV/OCL_FP16|47.115|47.460|0.99|
|MPHand::DNNTestNetwork::OCV/OCL_FP16|6.760|6.670|1.01|
|MPPalm::DNNTestNetwork::OCV/OCL_FP16|10.188|10.171|1.00|
|MPPose::DNNTestNetwork::OCV/OCL_FP16|12.510|12.561|1.00|
|MobileNet_SSD_Caffe::DNNTestNetwork::OCV/OCL_FP16|17.290|17.072|1.01|
|MobileNet_SSD_v1_TensorFlow::DNNTestNetwork::OCV/OCL_FP16|19.473|19.306|1.01|
|MobileNet_SSD_v2_TensorFlow::DNNTestNetwork::OCV/OCL_FP16|22.874|23.404|0.98|
|OpenFace::DNNTestNetwork::OCV/OCL_FP16|9.568|9.517|1.01|
|OpenPose_pose_mpi_faster_4_stages::DNNTestNetwork::OCV/OCL_FP16|539.899|539.845|1.00|
|PPHumanSeg::DNNTestNetwork::OCV/OCL_FP16|18.015|18.769|0.96|
|PPOCRv3::DNNTestNetwork::OCV/OCL_FP16|63.122|63.540|0.99|
|ResNet_50::DNNTestNetwork::OCV/OCL_FP16|34.947|34.925|1.00|
|SFace::DNNTestNetwork::OCV/OCL_FP16|10.249|10.206|1.00|
|SSD::DNNTestNetwork::OCV/OCL_FP16|213.068|213.108|1.00|
|SqueezeNet_v1_1::DNNTestNetwork::OCV/OCL_FP16|4.867|4.878|1.00|
|VIT_B_32::DNNTestNetwork::OCV/OCL_FP16|200.563|190.788|1.05|
|VitTrack::DNNTestNetwork::OCV/OCL_FP16|7.528|7.173|1.05|
|YOLOX::DNNTestNetwork::OCV/OCL_FP16|132.858|132.701|1.00|
|YOLOv3::DNNTestNetwork::OCV/OCL_FP16|209.559|208.809|1.00|
|YOLOv4::DNNTestNetwork::OCV/OCL_FP16|221.357|220.924|1.00|
|YOLOv4_tiny::DNNTestNetwork::OCV/OCL_FP16|24.446|24.382|1.00|
|YOLOv5::DNNTestNetwork::OCV/OCL_FP16|43.922|44.080|1.00|
|YOLOv8::DNNTestNetwork::OCV/OCL_FP16|64.159|63.842|1.00|
|YuNet::DNNTestNetwork::OCV/OCL_FP16|10.177|10.231|0.99|
|opencv_face_detector::DNNTestNetwork::OCV/OCL_FP16|15.121|15.445|0.98|

- `--gtest_filter=*Net*`
- `summary.py -f FP16`
force_builders=Linux OpenCL
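
For readers checking the last column of the table above, here is a minimal sketch of how the x-factor can be reproduced. It assumes the column is the time of run 0 divided by the time of run 1 (so values above 1.00 mean run 1 is faster), which is consistent with the rows shown. The helper name `x_factor` is hypothetical and is not part of `summary.py`.

```python
def x_factor(time_0: float, time_1: float) -> float:
    """Ratio of run 0 time to run 1 time (assumed definition of the x-factor)."""
    return time_0 / time_1


# Timings copied from three rows of the iGPU table.
rows = {
    "VIT_B_32": (200.563, 190.788),
    "PPHumanSeg": (18.015, 18.769),
    "ResNet_50": (34.947, 34.925),
}

for name, (t0, t1) in rows.items():
    # Two decimals, matching the precision used in the table.
    print(f"{name}: {x_factor(t0, t1):.2f}")
```

Under that assumption, the largest gain in the table is about 5% (VIT_B_32, VitTrack) and the largest loss about 4% (PPHumanSeg); most networks are unchanged.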

@opencv-alalek opencv-alalek added this to the 4.10.0 milestone Jan 19, 2024
@opencv-pushbot opencv-pushbot force-pushed the gitee/alalek/dnn_avoid_16s_usage branch from efaa09b to e6c3e86 Compare January 20, 2024 05:00
@opencv-pushbot opencv-pushbot force-pushed the gitee/alalek/dnn_avoid_16s_usage branch from e6c3e86 to a87974c Compare January 24, 2024 22:04
asmorkalov pushed a commit that referenced this pull request Jan 26, 2024
…rtfp16_replacement

core(OpenCL): optimize convertTo() with CV_16F (convertFp16() replacement) #24918

relates #24909
relates #24917
relates #24892

Performance changes:

- [x] 12700K (1 thread) + Intel iGPU

|Name of Test|noOCL|convertFp16|convertTo BASE|convertTo PATCH|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.152|3.127|3.136|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|3.996|3.007|2.671|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|3.101|3.056|2.854|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|3.298|2.072|2.061|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.652|2.723|2.721|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.268|2.662|2.947|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|2.601|2.603|2.528|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|3.215|1.999|1.988|

The patched version is no slower than convertFp16 or the convertTo baseline, except for MatUMat 32->16, where the baseline uses CPU code plus a mapped dst buffer.
There are still gaps against the noOpenCL (CPU-only) mode due to T-API implementation issues (unnecessary synchronization).
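
The ConvertFP32FP16 and ConvertFP16FP32 tests above time conversions between single and half precision. As a rough illustration of what such a conversion does to values, the sketch below round-trips numbers through IEEE 754 binary16 using only the Python standard library. It does not exercise OpenCV's `convertTo()` or `convertFp16()`, and since Python floats are double precision this is a 64-bit to 16-bit conversion rather than 32-bit to 16-bit. The helper name `roundtrip_fp16` is hypothetical.

```python
import struct


def roundtrip_fp16(value: float) -> float:
    """Pack a Python float as IEEE 754 binary16 ('e' format) and unpack it again."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


for v in (1.0, 0.1, 2049.0):
    # 1.0 is exactly representable; 0.1 and 2049.0 are rounded to the nearest half-precision value.
    print(v, "->", roundtrip_fp16(v))
```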


- [x] 12700K + AMD dGPU

|Name of Test|noOCL|convertFp16 dGPU|convertTo BASE dGPU|convertTo PATCH dGPU|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.133|3.172|3.087|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|1.713|9.559|1.729|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|6.515|6.309|4.452|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|0.242|23.597|0.170|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.641|2.713|2.689|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.076|6.483|4.191|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|9.042|16.481|1.834|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|0.229|15.730|0.176|

The convertTo baseline could not compile the OpenCL kernel for FP16 properly; this is now fixed.
The dGPU has much more compute power, so the UMat-to-UMat results are roughly x16-17 better than a single CPU core.
The patched version is no slower than convertFp16 or the convertTo baseline.
There are still gaps against the noOpenCL (CPU-only) mode due to T-API implementation issues (unnecessary synchronization) and the required memory transfers.
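
As a quick check of the dGPU speedup figure, the sketch below divides the noOCL timing by the patched dGPU timing for the two UMat-to-UMat rows of the table above. It assumes that these are the rows the x16-17 remark refers to; the variable names are illustrative only.

```python
# (noOCL, convertTo PATCH dGPU) timings copied from the AMD dGPU table.
umat_to_umat = {
    "ConvertFP16FP32UMatUMat": (3.016, 0.170),
    "ConvertFP32FP16UMatUMat": (2.704, 0.176),
}

# Speedup of the patched dGPU path over the single-core CPU path.
speedups = {name: cpu / gpu for name, (cpu, gpu) in umat_to_umat.items()}

for name, s in speedups.items():
    print(f"{name}: x{s:.1f}")
```

The two ratios come out at roughly x15 and x18, in line with the approximate x16-17 quoted in the text.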

Co-authored-by: Alexander Alekhin <[email protected]>
@asmorkalov asmorkalov self-requested a review January 26, 2024 10:14
@asmorkalov asmorkalov self-assigned this Jan 26, 2024
@asmorkalov asmorkalov left a comment

👍

@opencv-pushbot opencv-pushbot force-pushed the gitee/alalek/dnn_avoid_16s_usage branch from a87974c to de68623 Compare January 26, 2024 11:30
@opencv-alalek opencv-alalek marked this pull request as ready for review January 26, 2024 11:32
@asmorkalov asmorkalov merged commit efc9837 into opencv:4.x Jan 26, 2024