5.x merge 4.x #24981
Merged
Resolved issue number opencv#22177
…_convertfp16_replacement

core(OpenCL): optimize convertTo() with CV_16F (convertFp16() replacement) opencv#24918

relates opencv#24909
relates opencv#24917
relates opencv#24892

Performance changes:

- [x] 12700K (1 thread) + Intel iGPU

|Name of Test|noOCL|convertFp16|convertTo BASE|convertTo PATCH|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.152|3.127|3.136|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|3.996|3.007|2.671|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|3.101|3.056|2.854|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|3.298|2.072|2.061|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.652|2.723|2.721|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.268|2.662|2.947|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|2.601|2.603|2.528|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|3.215|1.999|1.988|

Patched version is not worse than convertFp16 and convertTo baseline (except MatUMat 32->16, baseline uses CPU code+dst buffer map). There are still gaps against noOpenCL(CPU only) mode due to T-API implementation issues (unnecessary synchronization).

- [x] 12700K + AMD dGPU

|Name of Test|noOCL|convertFp16 dGPU|convertTo BASE dGPU|convertTo PATCH dGPU|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.133|3.172|3.087|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|1.713|9.559|1.729|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|6.515|6.309|4.452|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|0.242|23.597|0.170|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.641|2.713|2.689|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.076|6.483|4.191|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|9.042|16.481|1.834|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|0.229|15.730|0.176|

convertTo-baseline can't compile OpenCL kernel for FP16 properly - FIXED. dGPU has much more power, so results are x16-17 better than single cpu core. Patched version is not worse than convertFp16 and convertTo baseline. There are still gaps against noOpenCL(CPU only) mode due to T-API implementation issues (unnecessary synchronization) and required memory transfers.

Co-authored-by: Alexander Alekhin <[email protected]>
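For readers unfamiliar with what an FP32 to FP16 conversion loses, the round trip can be illustrated without OpenCV. This is only a sketch using Python's standard `struct` module (format code `e` is IEEE 754 binary16); it is not the `convertTo()` code path benchmarked above, and the helper name `roundtrip_fp16` is made up for illustration.

```python
import struct


def roundtrip_fp16(value: float) -> float:
    """Store a value as IEEE 754 half precision and read it back.

    Hypothetical helper for illustration; not part of OpenCV.
    """
    packed = struct.pack("<e", value)       # 2 bytes, little-endian binary16
    return struct.unpack("<e", packed)[0]   # widened back to a Python float


if __name__ == "__main__":
    for x in (1.0, 0.5, 0.1, 1000.1):
        y = roundtrip_fp16(x)
        print(f"{x!r} -> {y!r} (abs error {abs(x - y):.3g})")
```

Values that are exactly representable in half precision (such as 1.0 and 0.5) survive unchanged, while others are rounded to the nearest representable value.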
…nings

Handle warnings in loongson-related code opencv#24925

See https://github.com/fengyuentau/opencv/actions/runs/7665377694/job/20891162958#step:14:16

Warnings need to be handled before we add the loongson server to our CI.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
…avoid_16s_usage

DNN: avoid CV_16S usage for FP16 opencv#24892

**Merge after**: opencv#24918

TODO:
- [x] measure performance changes
- [x] optimize convertTo for OpenCL: opencv#24918

12700K iGPU:

|Name of Test|0|1|1 vs 0 (x-factor)|
|---|:-:|:-:|:-:|
|AlexNet::DNNTestNetwork::OCV/OCL_FP16|7.441|7.480|0.99|
|CRNN::DNNTestNetwork::OCV/OCL_FP16|10.776|10.736|1.00|
|DenseNet_121::DNNTestNetwork::OCV/OCL_FP16|52.762|52.833|1.00|
|EAST_text_detection::DNNTestNetwork::OCV/OCL_FP16|60.694|60.721|1.00|
|EfficientNet::DNNTestNetwork::OCV/OCL_FP16|33.373|33.173|1.01|
|FastNeuralStyle_eccv16::DNNTestNetwork::OCV/OCL_FP16|81.840|81.724|1.00|
|GoogLeNet::DNNTestNetwork::OCV/OCL_FP16|20.965|20.927|1.00|
|Inception_5h::DNNTestNetwork::OCV/OCL_FP16|22.204|22.173|1.00|
|Inception_v2_SSD_TensorFlow::DNNTestNetwork::OCV/OCL_FP16|47.115|47.460|0.99|
|MPHand::DNNTestNetwork::OCV/OCL_FP16|6.760|6.670|1.01|
|MPPalm::DNNTestNetwork::OCV/OCL_FP16|10.188|10.171|1.00|
|MPPose::DNNTestNetwork::OCV/OCL_FP16|12.510|12.561|1.00|
|MobileNet_SSD_Caffe::DNNTestNetwork::OCV/OCL_FP16|17.290|17.072|1.01|
|MobileNet_SSD_v1_TensorFlow::DNNTestNetwork::OCV/OCL_FP16|19.473|19.306|1.01|
|MobileNet_SSD_v2_TensorFlow::DNNTestNetwork::OCV/OCL_FP16|22.874|23.404|0.98|
|OpenFace::DNNTestNetwork::OCV/OCL_FP16|9.568|9.517|1.01|
|OpenPose_pose_mpi_faster_4_stages::DNNTestNetwork::OCV/OCL_FP16|539.899|539.845|1.00|
|PPHumanSeg::DNNTestNetwork::OCV/OCL_FP16|18.015|18.769|0.96|
|PPOCRv3::DNNTestNetwork::OCV/OCL_FP16|63.122|63.540|0.99|
|ResNet_50::DNNTestNetwork::OCV/OCL_FP16|34.947|34.925|1.00|
|SFace::DNNTestNetwork::OCV/OCL_FP16|10.249|10.206|1.00|
|SSD::DNNTestNetwork::OCV/OCL_FP16|213.068|213.108|1.00|
|SqueezeNet_v1_1::DNNTestNetwork::OCV/OCL_FP16|4.867|4.878|1.00|
|VIT_B_32::DNNTestNetwork::OCV/OCL_FP16|200.563|190.788|1.05|
|VitTrack::DNNTestNetwork::OCV/OCL_FP16|7.528|7.173|1.05|
|YOLOX::DNNTestNetwork::OCV/OCL_FP16|132.858|132.701|1.00|
|YOLOv3::DNNTestNetwork::OCV/OCL_FP16|209.559|208.809|1.00|
|YOLOv4::DNNTestNetwork::OCV/OCL_FP16|221.357|220.924|1.00|
|YOLOv4_tiny::DNNTestNetwork::OCV/OCL_FP16|24.446|24.382|1.00|
|YOLOv5::DNNTestNetwork::OCV/OCL_FP16|43.922|44.080|1.00|
|YOLOv8::DNNTestNetwork::OCV/OCL_FP16|64.159|63.842|1.00|
|YuNet::DNNTestNetwork::OCV/OCL_FP16|10.177|10.231|0.99|
|opencv_face_detector::DNNTestNetwork::OCV/OCL_FP16|15.121|15.445|0.98|

Co-authored-by: Alexander Alekhin <[email protected]>
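The x-factor column appears to be the ratio of column `0` to column `1`, rounded to two decimals, so values above 1.00 would mean column `1` is faster (assuming both columns are timings where lower is better; the table itself does not state the unit). A quick check on three rows copied from the table, with `x_factor` as a hypothetical helper:

```python
def x_factor(time_0: float, time_1: float) -> float:
    """Ratio of the column-0 timing to the column-1 timing, rounded to 2 decimals.

    Hypothetical helper; assumes both inputs are timings in the same unit.
    """
    return round(time_0 / time_1, 2)


# (test name, column 0, column 1, reported x-factor), copied from the table
ROWS = [
    ("AlexNet", 7.441, 7.480, 0.99),
    ("PPHumanSeg", 18.015, 18.769, 0.96),
    ("VIT_B_32", 200.563, 190.788, 1.05),
]

if __name__ == "__main__":
    for name, t0, t1, reported in ROWS:
        print(f"{name}: computed {x_factor(t0, t1):.2f}, reported {reported:.2f}")
```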
RISC-V: fix mul 8/16 bit for RVV 0.7
RISC-V: fix scale64f performance for RVV 0.7
Do not release user-provided buffer, if image decoder failed
Add python bindings for Rect2f and Point3i
Raft support added in this sample code opencv#24913

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake

fix: opencv#24424

Update DNN Optical Flow sample with RAFT model

Both RAFT and FlowNet v2 are implemented; the user chooses which one to use to estimate the optical flow.

Co-authored-by: Uday Sharma <[email protected]>
Vulkan backend for NaryEltwiseLayer in DNN module opencv#24768

We improve the Vulkan backend for ``NaryEltwiseLayer`` in the DNN module by:

- adding a basic framework for the Vulkan backend in ``NaryEltwiseLayer``
- adding a compute shader for binary forwarding (an imitation of what has been done in the native OpenCV backend, including broadcasting and eltwise operation)
- fixing a typo:
  - wrong info output in ``context.cpp``

Currently, our implementation (like all layers supporting the Vulkan backend) runs quite slowly on discrete GPUs, mainly due to the IO cost of the function ``copyToHost``. We are going to fix that by:

- finding the best ``VkMemoryProperty`` for various discrete GPUs
- preventing ``copyToHost`` in middle layers during forwarding (i.e., keeping data in GPU memory)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake

Co-authored-by: IskXCr <[email protected]>
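The broadcasting mentioned in this description can be illustrated by the shape rule alone. The sketch below assumes NumPy/ONNX-style multidirectional broadcasting, which is an assumption about what the shader imitates; `broadcast_shape` is a hypothetical helper, not an OpenCV or Vulkan API.

```python
from typing import Sequence, Tuple


def broadcast_shape(a: Sequence[int], b: Sequence[int]) -> Tuple[int, ...]:
    """Output shape of a binary elementwise op under NumPy-style broadcasting.

    Hypothetical helper for illustration only.
    """
    rank = max(len(a), len(b))
    # Left-pad the shorter shape with 1s so both have the same rank.
    a = (1,) * (rank - len(a)) + tuple(a)
    b = (1,) * (rank - len(b)) + tuple(b)
    out = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b or dim_b == 1:
            out.append(dim_a)
        elif dim_a == 1:
            out.append(dim_b)
        else:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
    return tuple(out)


if __name__ == "__main__":
    print(broadcast_shape((2, 3, 1), (3, 4)))
```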
…cutor

G-API: Implement concurrent executor opencv#24845

## Overview

This PR introduces a new G-API executor called `GThreadedExecutor`, which can be selected when the `GComputation` is compiled in `serial` mode (a.k.a. `GComputation::compile(...)`).

### ThreadPool

`cv::gapi::own::ThreadPool` has been introduced to abstract the usage of threads in `GThreadedExecutor`. `ThreadPool` is implemented using `own::concurrent_bounded_queue`. `ThreadPool` has only a single method, `schedule`, which pushes a task into the queue for further execution.

**Important**: if a `Task` executed in `ThreadPool` throws an exception, this is `UB`.

### GThreadedExecutor

The `GThreadedExecutor` is mostly a copy-paste of `GExecutor`; should we extend `GExecutor` instead?

#### Implementation details

1. Build the dependency graph for `Island` nodes.
2. Store the tasks that don't have dependencies in a separate `vector` in order to run them first.
3. In `GThreadedExecutor::run()`, schedule the tasks that don't have dependencies; these in turn schedule their dependents. Then wait for completion.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
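The three implementation steps can be sketched roughly as follows. This is a minimal Python illustration of the scheduling idea, not the G-API code: the names `ThreadPool`, `run_graph`, `deps` and `actions` are assumptions made for the example, the queue here is unbounded (unlike `own::concurrent_bounded_queue`), and the graph is assumed to be acyclic.

```python
import queue
import threading


class ThreadPool:
    """Tiny pool with a single public method for submitting work: schedule()."""

    def __init__(self, num_workers):
        self._tasks = queue.Queue()
        self._workers = [threading.Thread(target=self._worker, daemon=True)
                         for _ in range(num_workers)]
        for worker in self._workers:
            worker.start()

    def schedule(self, task):
        self._tasks.put(task)

    def _worker(self):
        while True:
            task = self._tasks.get()
            if task is None:        # sentinel: stop this worker
                return
            task()                  # a raising task kills the worker (cf. the UB note)

    def shutdown(self):
        for _ in self._workers:
            self._tasks.put(None)
        for worker in self._workers:
            worker.join()


def run_graph(deps, actions, num_workers=4):
    """deps maps node -> set of nodes it depends on; actions maps node -> callable."""
    # Step 1: build the dependency graph (dependents and unmet-dependency counts).
    unmet = {node: len(parents) for node, parents in deps.items()}
    dependents = {node: [] for node in deps}
    for node, parents in deps.items():
        for parent in parents:
            dependents[parent].append(node)
    # Step 2: collect the tasks without dependencies so they run first.
    initial = [node for node, count in unmet.items() if count == 0]

    lock = threading.Lock()
    all_done = threading.Event()
    pending = [len(deps)]
    pool = ThreadPool(num_workers)

    def make_task(node):
        def task():
            actions[node]()
            ready = []
            with lock:
                for child in dependents[node]:
                    unmet[child] -= 1
                    if unmet[child] == 0:
                        ready.append(child)
                pending[0] -= 1
                finished = pending[0] == 0
            for child in ready:     # a finished task schedules its dependents
                pool.schedule(make_task(child))
            if finished:
                all_done.set()
        return task

    # Step 3: schedule the initial tasks and wait for completion.
    for node in initial:
        pool.schedule(make_task(node))
    if deps:
        all_done.wait()
    pool.shutdown()
```

In this sketch a task that raises would leave `run_graph` waiting forever, which is one way to see why exceptions inside pool tasks have to be ruled out by contract.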
Documentation for Yolo usage in Opencv opencv#24898

This PR introduces documentation for using the YOLO detection model family in OpenCV. It should not be merged before opencv#24691, as the sample will need to be changed.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Build warning fix for Charuco tests
Modified Java tests to run on Android opencv#24910

To run the tests you need to:

1. Build OpenCV using Android pipeline. For example:
   `cmake -DBUILD_TEST=ON -DANDROID=ON -DANDROID_ABI=arm64-v8a -DCMAKE_TOOLCHAIN_FILE=/usr/lib/android-sdk/ndk/25.1.8937393/build/cmake/android.toolchain.cmake -DANDROID_NDK=/usr/lib/android-sdk/ndk/25.1.8937393 -DANDROID_SDK=/usr/lib/android-sdk ../opencv`
   `make`
2. Connect Android Phone
3. Run tests:
   `cd android_tests`
   `./gradlew tests_module:connectedAndroidTest`

Related CI pipeline: opencv/ci-gha-workflow#138

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
…phone Added job to test with real hardware
…ning_fix Build warning fix in Tutorial4-OpenCL.
QR codes Structured Append decoding mode opencv#24548

### Pull Request Readiness Checklist

resolves opencv#23245

Merge after opencv#24299

Current proposal is to use `detectAndDecodeMulti` or `decodeMulti` for structured append mode decoding. The 0-th QR code in a sequence gets the full message, while the rest of the codes correspond to empty strings.

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
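The proposed output convention can be illustrated with a small sketch. It only models the layout of the returned strings as described above (full message on the 0-th code of the sequence, empty strings elsewhere); the function `layout_structured_append` and its `(sequence_index, text)` input are hypothetical and do not correspond to the OpenCV implementation or API.

```python
from typing import List, Tuple


def layout_structured_append(parts: List[Tuple[int, str]]) -> List[str]:
    """Model the result list for one structured-append sequence.

    parts: (sequence_index, partial_text) pairs in detection order.
    Hypothetical helper; illustrates the output convention only.
    """
    # Concatenate the partial texts in sequence order to get the full message.
    full_message = "".join(text for _, text in sorted(parts))
    # The 0-th code of the sequence carries the full message, the rest are empty.
    return [full_message if index == 0 else "" for index, _ in parts]


if __name__ == "__main__":
    detected = [(1, "world"), (0, "hello "), (2, "!")]
    print(layout_structured_append(detected))
```

A caller that only wants the message could then pick the non-empty entries from the result list.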
Add CMake policy CMP0071 for AUTOMOC and AUTOUIC
…system Enable file system on Emscripten
Added offline option for Android builds opencv#24956

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
…ardDetector-findQuadNeighbor

Fix bug in ChessBoardDetector::findQuadNeighbors opencv#24779

### Pull Request Readiness Checklist

The indices of `corners` and `neighbors` denote relative position, not filling order. For example, `quad->count = 2` does not mean that `quad->neighbors[0]` and `quad->neighbors[1]` are the filled slots, so we should iterate over all four `neighbors`.

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
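The difference between the two iteration strategies can be shown with a toy model. This is a Python sketch, not the OpenCV C++ code: the `Quad` class and both helper functions are hypothetical and only mirror the `count` and `neighbors` fields mentioned in the description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Quad:
    """Toy stand-in for a chessboard quad: four positional neighbor slots."""
    count: int = 0
    neighbors: List[Optional["Quad"]] = field(default_factory=lambda: [None] * 4)


def neighbors_first_count_slots(quad: Quad) -> List[Quad]:
    """Buggy pattern: assumes the first `count` slots are the filled ones."""
    return [n for n in quad.neighbors[:quad.count] if n is not None]


def neighbors_all_slots(quad: Quad) -> List[Quad]:
    """Fixed pattern: check all four slots, since the index encodes position."""
    return [n for n in quad.neighbors if n is not None]


if __name__ == "__main__":
    quad, a, b = Quad(), Quad(), Quad()
    quad.neighbors[1] = a      # neighbor at relative position 1
    quad.neighbors[3] = b      # neighbor at relative position 3
    quad.count = 2
    print(len(neighbors_first_count_slots(quad)), len(neighbors_all_slots(quad)))
```

With neighbors stored at positions 1 and 3, the first pattern sees only one of the two neighbors, while the second sees both.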
…mess Fix proto and weights mess in dnn performance tests
Allow multiple flags with OPENCV_GRADLE_VERBOSE_OPTIONS opencv#24969

### Pull Request Readiness Checklist

Merge with opencv/ci-gha-workflow#144

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
@zihaomu @fengyuentau Could you take a look at the Vulkan NaryEltwiseLayer part? I'm not sure if I merged everything correctly.
fengyuentau
approved these changes
Feb 8, 2024
I don't see problems in the NaryEltwise Vulkan backend part. Also, the tests are passing, so it should be alright.
opencv-alalek
approved these changes
Feb 8, 2024
This was referenced Feb 12, 2024
cb6507b to
3a55f50
Compare
OpenCV Contrib: opencv/opencv_contrib#3634
OpenCV Extra: opencv/opencv_extra#1147
#24548 from dkurt:qrcode_struct_append_decode
#24768 from Haosonn:pre-pr-2
#24779 from MaximSmolskiy:fix-bug-in-ChessBoardDetector-findQuadNeighbor
#24832 from AryanNanda17:Aryan#22177
#24845 from TolyaTalamanov:at/concurrent-executor
#24892 from opencv-pushbot:gitee/alalek/dnn_avoid_16s_usage
#24898 from Abdurrahheem:ash/yolo_ducumentation
#24910 from alexlyulkov:al/android-tests
#24913 from usyntest:optical-flow-sample-raft
#24918 from opencv-pushbot:gitee/alalek/core_convertfp16_replacement
#24919 from asmorkalov:as/python_Rect2f_Point3i
#24925 from fengyuentau:loongarch_handle_warnings
#24929 from asmorkalov:as/imdecode_user_buffer
#24931 from mshabunin:fix-rvv07-mul
#24934 from GengGode:fix
#24936 from mshabunin:fix-rvv07-scale64f
#24942 from asmorkalov:as/android_warning_fix
#24945 from asmorkalov:as/android_sample_warning_fix
#24947 from asmorkalov:as/android_test_with_phone
#24949 from hoodmane:emscripten-enable-file-system
#24956 from asmorkalov:as/android_build_offline
#24968 from fengyuentau:fix_nary_ocl
#24969 from asmorkalov:as/android_offline
#24973 from asmorkalov:as/fix_weigths_proto_mess
Previous "Merge 4.x": #24912