@Junyan721113
Contributor

@Junyan721113 Junyan721113 commented Mar 6, 2024

Summary

Previous context

From PR #24556:

As a test outside of this PR, a 3rdparty component called ndsrvp was created, containing one piece of the non-dnn code (integral_SIMD), and it works very well.
All the non-dnn code in this PR has been removed, so this PR can now focus on dnn optimizations.
The HAL mechanism is well suited to RVP optimizations; all the non-dnn code is expected to move into ndsrvp soon.

Progress

Part 1 (This PR)

  • Core
      • Element-wise add and subtract
      • Element-wise minimum or maximum
      • Element-wise absolute difference
      • Bitwise logical operations
      • Element-wise compare
  • ImgProc
      • Integral
      • Threshold
      • WarpAffine
      • WarpPerspective
  • Features2D

Part 2 (Next PR)

Rough estimate; the to-do list may change.

  • Core
  • ImgProc
      • smaller remap HAL interface
      • AdaptiveThreshold
      • BoxFilter
      • Canny
      • Convert
      • Filter
      • GaussianBlur
      • MedianBlur
      • Morph
      • Pyrdown
      • Resize
      • Scharr
      • SepFilter
      • Sobel
  • Features2D
      • FAST

Performance Tests

The optimizations do not include floating-point operations.

Absolute Difference

Geometric mean (ms)

| Name of Test | opencv perf core Absdiff (1) | opencv perf core Absdiff (2) | (1) vs (2) (x-factor) |
|---|---|---|---|
| Absdiff::OCL_AbsDiffFixture::(640x480, 8UC1) | 23.104 | 5.972 | 3.87 |
| Absdiff::OCL_AbsDiffFixture::(640x480, 32FC1) | 39.500 | 40.830 | 0.97 |
| Absdiff::OCL_AbsDiffFixture::(640x480, 8UC3) | 69.155 | 15.051 | 4.59 |
| Absdiff::OCL_AbsDiffFixture::(640x480, 32FC3) | 118.715 | 120.509 | 0.99 |
| Absdiff::OCL_AbsDiffFixture::(640x480, 8UC4) | 93.001 | 19.770 | 4.70 |
| Absdiff::OCL_AbsDiffFixture::(640x480, 32FC4) | 161.136 | 160.791 | 1.00 |
| Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC1) | 69.211 | 15.140 | 4.57 |
| Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC1) | 118.762 | 119.263 | 1.00 |
| Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC3) | 212.414 | 44.692 | 4.75 |
| Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC3) | 367.512 | 366.569 | 1.00 |
| Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC4) | 285.337 | 59.708 | 4.78 |
| Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC4) | 490.395 | 491.118 | 1.00 |
| Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC1) | 158.827 | 33.462 | 4.75 |
| Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC1) | 273.503 | 273.668 | 1.00 |
| Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC3) | 484.175 | 100.520 | 4.82 |
| Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC3) | 828.758 | 829.689 | 1.00 |
| Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC4) | 648.592 | 137.195 | 4.73 |
| Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC4) | 1116.755 | 1109.587 | 1.01 |
| Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC1) | 648.715 | 134.875 | 4.81 |
| Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC1) | 1115.939 | 1113.818 | 1.00 |
| Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC3) | 1944.791 | 413.420 | 4.70 |
| Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC3) | 3354.193 | 3324.672 | 1.01 |
| Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC4) | 2594.585 | 553.486 | 4.69 |
| Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC4) | 4473.543 | 4438.453 | 1.01 |
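
For readers unfamiliar with the operation being timed, here is a minimal scalar sketch of 8-bit element-wise absolute difference over strided rows. It is not the NDSRVP code: the name absdiff8u_ref is hypothetical, the parameter layout (byte steps, width, height) only follows the general shape of OpenCV's HAL callbacks, and the return value 0 is assumed to mirror CV_HAL_ERROR_OK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference, not the NDSRVP implementation.
// Steps are row strides in bytes; width and height are in elements.
int absdiff8u_ref(const std::uint8_t* src1, std::size_t step1,
                  const std::uint8_t* src2, std::size_t step2,
                  std::uint8_t* dst, std::size_t step,
                  int width, int height)
{
    assert(src1 && src2 && dst && width >= 0 && height >= 0);
    for (int y = 0; y < height; ++y) {
        const std::uint8_t* a = src1 + y * step1;
        const std::uint8_t* b = src2 + y * step2;
        std::uint8_t* d = dst + y * step;
        for (int x = 0; x < width; ++x) {
            // Subtract the smaller from the larger so the result never wraps.
            d[x] = static_cast<std::uint8_t>(a[x] > b[x] ? a[x] - b[x]
                                                         : b[x] - a[x]);
        }
    }
    return 0; // assumed to mirror CV_HAL_ERROR_OK
}
```

A SIMD version would process several of these 8-bit lanes per instruction, which is where integer speedups of the kind shown for the 8U rows would be expected to come from.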

Bitwise Operation

Geometric mean (ms)

| Name of Test | opencv perf core Bit (1) | opencv perf core Bit (2) | (1) vs (2) (x-factor) |
|---|---|---|---|
| Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC1) | 22.542 | 4.971 | 4.53 |
| Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC1) | 90.210 | 19.917 | 4.53 |
| Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC3) | 68.429 | 15.037 | 4.55 |
| Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC3) | 280.168 | 59.239 | 4.73 |
| Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC4) | 90.565 | 19.735 | 4.59 |
| Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC4) | 374.695 | 79.257 | 4.73 |
| Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC1) | 67.824 | 14.873 | 4.56 |
| Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC1) | 279.514 | 59.232 | 4.72 |
| Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC3) | 208.337 | 44.234 | 4.71 |
| Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC3) | 851.211 | 182.522 | 4.66 |
| Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC4) | 279.529 | 59.095 | 4.73 |
| Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC4) | 1132.065 | 244.877 | 4.62 |
| Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC1) | 155.685 | 33.078 | 4.71 |
| Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC1) | 635.253 | 137.482 | 4.62 |
| Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC3) | 474.494 | 100.166 | 4.74 |
| Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC3) | 1907.340 | 412.841 | 4.62 |
| Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC4) | 635.538 | 134.544 | 4.72 |
| Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC4) | 2552.666 | 556.397 | 4.59 |
| Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC1) | 634.736 | 136.355 | 4.66 |
| Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC1) | 2548.283 | 561.827 | 4.54 |
| Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC3) | 1911.454 | 421.571 | 4.53 |
| Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC3) | 7663.803 | 1677.289 | 4.57 |
| Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC4) | 2543.983 | 562.780 | 4.52 |
| Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC4) | 10211.693 | 2237.393 | 4.56 |
| Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC1) | 22.341 | 4.811 | 4.64 |
| Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC1) | 89.975 | 19.288 | 4.66 |
| Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC3) | 67.237 | 14.643 | 4.59 |
| Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC3) | 276.324 | 58.609 | 4.71 |
| Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC4) | 89.587 | 19.554 | 4.58 |
| Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC4) | 370.986 | 77.136 | 4.81 |
| Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC1) | 67.227 | 14.541 | 4.62 |
| Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC1) | 276.357 | 58.076 | 4.76 |
| Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC3) | 206.752 | 43.376 | 4.77 |
| Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC3) | 841.638 | 177.787 | 4.73 |
| Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC4) | 276.773 | 57.784 | 4.79 |
| Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC4) | 1127.740 | 237.472 | 4.75 |
| Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC1) | 153.808 | 32.531 | 4.73 |
| Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC1) | 627.765 | 129.990 | 4.83 |
| Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC3) | 469.799 | 98.249 | 4.78 |
| Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC3) | 1893.591 | 403.694 | 4.69 |
| Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC4) | 627.724 | 129.962 | 4.83 |
| Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC4) | 2529.967 | 540.744 | 4.68 |
| Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC1) | 628.089 | 130.277 | 4.82 |
| Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC1) | 2521.817 | 540.146 | 4.67 |
| Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC3) | 1905.004 | 404.704 | 4.71 |
| Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC3) | 7567.971 | 1627.898 | 4.65 |
| Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC4) | 2531.476 | 540.181 | 4.69 |
| Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC4) | 10075.594 | 2181.654 | 4.62 |
| Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC1) | 22.566 | 5.076 | 4.45 |
| Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC1) | 90.391 | 19.928 | 4.54 |
| Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC3) | 67.758 | 14.740 | 4.60 |
| Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC3) | 279.253 | 59.844 | 4.67 |
| Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC4) | 90.296 | 19.802 | 4.56 |
| Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC4) | 373.972 | 79.815 | 4.69 |
| Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC1) | 67.815 | 14.865 | 4.56 |
| Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC1) | 279.398 | 60.054 | 4.65 |
| Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC3) | 208.643 | 45.043 | 4.63 |
| Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC3) | 850.042 | 180.985 | 4.70 |
| Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC4) | 279.363 | 60.385 | 4.63 |
| Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC4) | 1134.858 | 243.062 | 4.67 |
| Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC1) | 155.212 | 33.155 | 4.68 |
| Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC1) | 634.985 | 134.911 | 4.71 |
| Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC3) | 474.648 | 100.407 | 4.73 |
| Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC3) | 1912.049 | 414.184 | 4.62 |
| Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC4) | 635.252 | 132.587 | 4.79 |
| Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC4) | 2544.471 | 560.737 | 4.54 |
| Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC1) | 634.574 | 134.966 | 4.70 |
| Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC1) | 2545.129 | 561.498 | 4.53 |
| Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC3) | 1910.900 | 419.365 | 4.56 |
| Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC3) | 7662.603 | 1685.812 | 4.55 |
| Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC4) | 2548.971 | 560.787 | 4.55 |
| Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC4) | 10201.407 | 2237.552 | 4.56 |
| Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC1) | 22.718 | 4.961 | 4.58 |
| Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC1) | 91.496 | 19.831 | 4.61 |
| Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC3) | 67.910 | 15.151 | 4.48 |
| Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC3) | 279.612 | 59.792 | 4.68 |
| Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC4) | 91.073 | 19.853 | 4.59 |
| Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC4) | 374.641 | 79.155 | 4.73 |
| Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC1) | 67.704 | 15.008 | 4.51 |
| Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC1) | 279.229 | 60.088 | 4.65 |
| Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC3) | 208.156 | 44.426 | 4.69 |
| Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC3) | 849.501 | 180.848 | 4.70 |
| Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC4) | 279.642 | 59.728 | 4.68 |
| Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC4) | 1129.826 | 242.880 | 4.65 |
| Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC1) | 155.585 | 33.354 | 4.66 |
| Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC1) | 634.090 | 134.995 | 4.70 |
| Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC3) | 474.931 | 99.598 | 4.77 |
| Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC3) | 1910.519 | 413.138 | 4.62 |
| Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC4) | 635.026 | 135.155 | 4.70 |
| Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC4) | 2560.167 | 560.838 | 4.56 |
| Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC1) | 634.893 | 134.883 | 4.71 |
| Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC1) | 2548.166 | 560.831 | 4.54 |
| Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC3) | 1911.392 | 419.816 | 4.55 |
| Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC3) | 7646.634 | 1677.988 | 4.56 |
| Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC4) | 2560.637 | 560.805 | 4.57 |
| Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC4) | 10227.044 | 2249.458 | 4.55 |
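
Unlike absolute difference, the 32F rows here speed up about as much as the 8U rows. That fits the fact that a bitwise operation works on the raw bytes of each element and involves no floating-point arithmetic. The following is a minimal sketch of that idea, not the PR's code: and_bytes_ref is a hypothetical name, and a plain 64-bit integer stands in for a packed register.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch: AND two byte rows, eight bytes per step.
// A row of w float elements is simply passed as 4 * w bytes.
void and_bytes_ref(const std::uint8_t* a, const std::uint8_t* b,
                   std::uint8_t* dst, std::size_t nbytes)
{
    assert(a && b && dst);
    std::size_t i = 0;
    for (; i + 8 <= nbytes; i += 8) {
        std::uint64_t va, vb;
        std::memcpy(&va, a + i, 8);  // memcpy sidesteps alignment and aliasing issues
        std::memcpy(&vb, b + i, 8);
        va &= vb;
        std::memcpy(dst + i, &va, 8);
    }
    for (; i < nbytes; ++i)          // remaining tail bytes, one at a time
        dst[i] = static_cast<std::uint8_t>(a[i] & b[i]);
}
```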

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@asmorkalov
Contributor

@Junyan721113 Could you add more details on hardware configuration and how to reproduce the result?

@Junyan721113
Contributor Author

@Junyan721113 Could you add more details on hardware configuration and how to reproduce the result?

No problem, here are the details for accuracy and performance tests.

RISC-V P Extension v0.5.2

Env

export RISCV=/opt/andes
export PATH=$PATH:/opt/andes/bin

Toolchain

Prebuilt Releases: Andes-Development-Kit

Suggested Version: v5_1_1

nds-gnu-toolchain

./build_linux_toolchain.sh

TARGET=riscv64-linux
PREFIX=/opt/andes
ARCH=rv64imafdcxandes
ABI=lp64d
CPU=andes-25-series
XLEN=64
BUILD=`pwd`/build-nds64le-linux-glibc-v5d

Qemu

qemu

shell ./build

../configure --prefix=/opt/andes --target-list=riscv32-linux-user,riscv64-linux-user --disable-werror --static

Board

The development board used for performance tests is TinkerV with Andes AX45.

Upload the installed toolchain's sysroot (/opt/andes/sysroot), or the prebuilt release above, to the board.

/etc/ld.so.conf

include /etc/ld.so.conf.d/*.conf
/path/to/the/sysroot/library

shell

ldconfig -v

After that, the sysroot library path should appear in the output.

OpenCV Test

shell ./build

cmake -D CMAKE_BUILD_TYPE=Debug -D CMAKE_INSTALL_PREFIX=/opt/andes -D BUILD_SHARED_LIBS=OFF -D CMAKE_TOOLCHAIN_FILE=../platforms/linux/riscv64-andes-gcc.toolchain.cmake ..

Qemu

shell ./build/bin

qemu-riscv64 -cpu andes-ax25 -L /opt/andes/sysroot opencv_test_core

Board

Upload the test binary to the board and run it directly; it should run properly.

@Junyan721113
Contributor Author

Since the to-do list of this PR might be too long, would it be better to split it into smaller PRs?

@mshabunin
Contributor

@Junyan721113, you can more or less finalize the current state (HAL integration, implementation of several core functions) and extend the list of supported functions in future PRs.

@Junyan721113
Contributor Author

Junyan721113 commented Apr 22, 2024

Considering the relation between HAL functions, this PR might be ready for review now.
The optimizations mainly cover the following functions:

  • Core
      • Element-wise add and subtract
      • Element-wise minimum or maximum
      • Element-wise absolute difference
      • Bitwise logical operations
      • Element-wise compare
  • ImgProc
      • Integral
      • Threshold
      • WarpAffine
      • WarpPerspective
  • Features2D

The remaining HAL functions are related to convolution and are therefore left for another PR.

@Junyan721113 Junyan721113 force-pushed the rvp_3rdparty branch 2 times, most recently from e41c22b to 73130d0 Compare April 22, 2024 21:18
@Junyan721113 Junyan721113 changed the title from "3rdparty: NDSRVP - A New 3rdparty Library with Optimizations Based on RISC-V P Extension v0.5.2" to "3rdparty: NDSRVP - A New 3rdparty Library with Optimizations Based on RISC-V P Extension v0.5.2 - Part 1: Basic Functions" Apr 22, 2024
@Junyan721113 Junyan721113 marked this pull request as ready for review April 22, 2024 21:19
@Junyan721113
Contributor Author

Junyan721113 commented Apr 22, 2024

Besides, I've noticed that some optimizations could be better if several of the functions they depend on were also exposed through the HAL interface, such as:

  • AutoBuffer required by resize and Pyrdown
    These functions do not have to be exposed through HAL for the optimizations, since they could be implemented separately. However, given the limitations of the RISC-V P extension, they may not benefit from RVP optimization and could simply be reused.

  • remap() required by warpAffine and warpPerspective
    Although it can be reused in warpAffine and warpPerspective, the remap functions that involve no floating-point operations can be optimized by RVP. However, if remap() were to be optimized, its implementation (such as static RemapNNFunc nn_tab[2][8]) is so tightly coupled that every function would have to be reimplemented with RVP. Would it be possible to expose different types of remap functions as separate HAL interfaces, for example cv_hal_remapNN8u and cv_hal_remapNN16s? Currently there is only one related interface, cv_hal_remap32f.

Meanwhile, I wonder how the HAL interface will change in the upcoming OpenCV 5.0. The changes may affect the next PR related to this 3rdparty library.

@asmorkalov
Contributor

@Junyan721113 Thanks a lot for the contribution!

  • AutoBuffer may be achieved by a simple combination of new and malloca. Not sure if we need to expose it.
  • Remap was added to HAL interface a week ago: New HAL API for remap #25399. You are welcome to contribute RISC-V implementation.
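
One possible reading of the AutoBuffer point above is a small helper that keeps short buffers inline and falls back to the heap for larger ones. The sketch below is only an illustration under that assumption: SmallBuffer and its members are hypothetical names, a fixed inline array stands in for stack allocation, and it is not OpenCV's AutoBuffer class.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an AutoBuffer-like helper; not OpenCV's class.
template <typename T, std::size_t FixedSize = 256>
class SmallBuffer {
public:
    // Use the inline array when n fits, otherwise allocate with new[].
    explicit SmallBuffer(std::size_t n)
        : size_(n), ptr_(n <= FixedSize ? fixed_ : new T[n]) {}
    ~SmallBuffer() { if (ptr_ != fixed_) delete[] ptr_; }

    // Copying is disabled to keep ownership of the heap block unambiguous.
    SmallBuffer(const SmallBuffer&) = delete;
    SmallBuffer& operator=(const SmallBuffer&) = delete;

    T& operator[](std::size_t i) { assert(i < size_); return ptr_[i]; }
    T* data() { return ptr_; }
    std::size_t size() const { return size_; }
    bool on_heap() const { return ptr_ != fixed_; }

private:
    T fixed_[FixedSize];  // inline storage, declared first so ptr_ can point at it
    std::size_t size_;
    T* ptr_;
};
```

A 3rdparty HAL library could carry a helper like this privately, which is consistent with the doubt expressed above about whether anything needs to be exposed.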

@asmorkalov asmorkalov added this to the 4.10.0 milestone Apr 23, 2024
@Junyan721113
Contributor Author

@Junyan721113 Thanks a lot for the contribution!

  • AutoBuffer may be achieved by a simple combination of new and malloca. Not sure if we need to expose it.
  • Remap was added to HAL interface a week ago: New HAL API for remap #25399. You are welcome to contribute RISC-V implementation.

Thank you! This helps me a lot.

@Junyan721113
Contributor Author

@Junyan721113 Thanks a lot for the contribution!

  • AutoBuffer may be achieved by a simple combination of new and malloca. Not sure if we need to expose it.
  • Remap was added to HAL interface a week ago: New HAL API for remap #25399. You are welcome to contribute RISC-V implementation.

The mentioned PR adds cv_hal_remap32f. How about also adding cv_hal_remap8u, cv_hal_remap8s, cv_hal_remap16u and cv_hal_remap16s? A float32 interface might not be helpful for RVP.

@Junyan721113, you can more or less finalize the current state (HAL integration, implementation of several core functions) and extend the list of supported functions in future PRs.

Meanwhile, the to-do list of "Part 1" is finished; other new features will go into "Part 2". This PR is ready for review now.

@asmorkalov
Contributor

32f means that mapx and mapy are floats rather than fixed point. The source and destination may be of any type supported by OpenCV. Sorry for the confusion.

@mshabunin
Contributor

Currently there are several warnings regarding strict aliasing in the new HAL library (warpAffine and warpPerspective). Are they serious issues or not? Can we somehow avoid these constructions (maybe with some reinterpret intrinsics)?

/work/opencv/3rdparty/ndsrvp/src/warpAffine.cpp: In member function 'virtual void cv::NdsrvpWarpAffineInvoker::operator()(const cv::Range&) const':
/opencv/3rdparty/ndsrvp/src/warpAffine.cpp:58:76: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   58 |                             *(uint16x4_t*)(xy + x1 * 2) = __nds__v_pkbb16(*(uint16x4_t*)&vY, *(uint16x4_t*)&vX);
      |                                                                            ^~~~~~~~~~~~~~~~
/opencv/3rdparty/ndsrvp/src/warpAffine.cpp:58:95: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   58 |                             *(uint16x4_t*)(xy + x1 * 2) = __nds__v_pkbb16(*(uint16x4_t*)&vY, *(uint16x4_t*)&vX);
      |                                                                                               ^~~~~~~~~~~~~~~~
/opencv/3rdparty/ndsrvp/src/warpAffine.cpp:82:76: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   82 |                             *(uint16x4_t*)(xy + x1 * 2) = __nds__v_pkbb16(*(uint16x4_t*)&vy, *(uint16x4_t*)&vx);
      |                                                                            ^~~~~~~~~~~~~~~~
/opencv/3rdparty/ndsrvp/src/warpAffine.cpp:82:95: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   82 |                             *(uint16x4_t*)(xy + x1 * 2) = __nds__v_pkbb16(*(uint16x4_t*)&vy, *(uint16x4_t*)&vx);
      |                                                                                               ^~~~~~~~~~~~~~~~

@Junyan721113
Contributor Author

It was a mistake. They've been replaced with safer explicit type conversions.
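
For context, one common way to avoid dereferencing a type-punned pointer is to copy the bytes with memcpy. The sketch below illustrates that general technique and is not necessarily the fix applied in this PR: u16x4 is a hypothetical stand-in for the toolchain's uint16x4_t, and bit_cast_sketch and store_u16x4 are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for a 64-bit vector of four 16-bit lanes.
struct u16x4 { std::uint16_t lane[4]; };

// Reinterpret the bytes of `from` as a To without violating strict aliasing.
template <typename To, typename From>
To bit_cast_sketch(const From& from)
{
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value &&
                  std::is_trivially_copyable<From>::value,
                  "trivially copyable types only");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

// Store four lanes into a 16-bit buffer via memcpy instead of *(u16x4*)p = v.
inline void store_u16x4(std::int16_t* p, const u16x4& v)
{
    assert(p != nullptr);
    std::memcpy(p, &v, sizeof(v));
}
```

Compilers typically turn fixed-size memcpy calls like these into plain register moves or a single store, so this form usually costs nothing at runtime.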

@Junyan721113
Contributor Author

Strict-aliasing warnings have been fixed. Are there any other suggested changes?

@Junyan721113 Junyan721113 force-pushed the rvp_3rdparty branch 2 times, most recently from f5ba8f1 to e69d8b6 Compare May 22, 2024 07:48
Contributor

@mshabunin mshabunin left a comment

Would it be possible to expose different types of remap functions as separate HAL interfaces, for example cv_hal_remapNN8u and cv_hal_remapNN16s? Currently there is only one related interface, cv_hal_remap32f.

You can implement only one case and return NOT_IMPLEMENTED for others.
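
The suggestion above can be sketched as a callback that handles a single element type and declines everything else. This is an illustration under assumptions, not the real cv_hal_remap32f signature: remap_nn_sketch, the Depth enum and the parameter list are hypothetical, and the two constants are local stand-ins mirroring OpenCV's CV_HAL_ERROR_OK (0) and CV_HAL_ERROR_NOT_IMPLEMENTED (1).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Local stand-ins mirroring CV_HAL_ERROR_OK / CV_HAL_ERROR_NOT_IMPLEMENTED.
constexpr int HAL_OK = 0;
constexpr int HAL_NOT_IMPLEMENTED = 1;

// Hypothetical element-type tags, used by this sketch only.
enum class Depth { U8, S16, F32 };

// Hypothetical callback: nearest-neighbour remap for 8-bit single-channel data.
// mapx/mapy are float coordinate maps; map_stride is in elements, steps in bytes.
int remap_nn_sketch(Depth depth,
                    const std::uint8_t* src, std::size_t src_step, int src_w, int src_h,
                    std::uint8_t* dst, std::size_t dst_step, int dst_w, int dst_h,
                    const float* mapx, const float* mapy, std::size_t map_stride)
{
    if (depth != Depth::U8)
        return HAL_NOT_IMPLEMENTED;  // decline; the caller uses its generic path
    assert(src && dst && mapx && mapy);
    for (int y = 0; y < dst_h; ++y) {
        for (int x = 0; x < dst_w; ++x) {
            const int sx = static_cast<int>(std::lround(mapx[y * map_stride + x]));
            const int sy = static_cast<int>(std::lround(mapy[y * map_stride + x]));
            const bool inside = sx >= 0 && sx < src_w && sy >= 0 && sy < src_h;
            // Out-of-range coordinates get a constant border value of 0.
            dst[y * dst_step + x] =
                inside ? src[sy * src_step + sx] : static_cast<std::uint8_t>(0);
        }
    }
    return HAL_OK;
}
```

With this shape a single HAL entry point is enough: the float maps are common to every case, and only the source and destination types decide whether the optimized path runs.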

Overall looks good to me. We need to fix the cmp operation before merging. Refactoring and some cleanup (warpAffine/warpPerspective/headers) can be postponed to the next PR.

@Junyan721113
Contributor Author

Looks like it doesn't work - tests are failing now 🙁. Commenting out the cmp functions fixes them.

I've made a logical mistake. Fixed.
