3rdparty: NDSRVP - A New 3rdparty Library with Optimizations Based on RISC-V P Extension v0.5.2 - Part 1: Basic Functions #25167

Junyan721113 · 2024-03-06T10:07:09Z

Summary

Previous context

From PR #24556:

As you wrote, the P-extension differs from RVV thus can not be easily implemented via Universal Intrinsics mechanism, but there is another HAL mechanism for lower-level CPU optimizations which is used by the Carotene library on ARM platforms. I suggest moving all non-dnn code to similar third-party component. For example, FAST algorithm should allow such optimization-shortcut: see https://github.com/opencv/opencv/blob/4.x/modules/features2d/src/hal_replacement.hpp
Reference documentation is here:

https://docs.opencv.org/4.x/d1/d1b/group__core__hal__interface.html

https://docs.opencv.org/4.x/dd/d8b/group__imgproc__hal__interface.html

https://docs.opencv.org/4.x/db/d47/group__features2d__hal__interface.html

Carotene library is turned on here:

opencv/CMakeLists.txt

Lines 906 to 911 in 8bbf08f

if(WITH_CAROTENE)
  ocv_debug_message(STATUS "Enable carotene acceleration")
  if(NOT ";${OpenCV_HAL};" MATCHES ";carotene;")
    set(OpenCV_HAL "carotene;${OpenCV_HAL}")
  endif()
endif()

As a test outside of this PR, a 3rdparty component called ndsrvp was created, containing one piece of the non-dnn code (integral_SIMD), and it works very well.
All the non-dnn code has been removed from this PR, so it can now focus on dnn optimizations.
The HAL mechanism is well suited to RVP optimizations, and all the non-dnn code is expected to move into ndsrvp soon.
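
For readers unfamiliar with the HAL mechanism referred to above, here is a minimal C++ sketch of the idea: an accelerated backend gets the first chance to handle a call and reports whether it did, and the generic code runs otherwise. In OpenCV itself the hook is a macro such as cv_hal_absdiff8u and the status codes are CV_HAL_ERROR_OK and CV_HAL_ERROR_NOT_IMPLEMENTED. The names HAL_OK, HAL_NOT_IMPLEMENTED, accel_absdiff8u, absdiff_rows and absdiff8u below are hypothetical stand-ins, and the "backend" is plain C++ rather than RVP intrinsics.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for CV_HAL_ERROR_OK / CV_HAL_ERROR_NOT_IMPLEMENTED.
constexpr int HAL_OK = 0;
constexpr int HAL_NOT_IMPLEMENTED = 1;

// Shared scalar loop: per-pixel absolute difference of two 8-bit images.
// The step arguments are row strides in bytes.
static void absdiff_rows(const std::uint8_t* src1, std::size_t step1,
                         const std::uint8_t* src2, std::size_t step2,
                         std::uint8_t* dst, std::size_t dstep,
                         int width, int height)
{
    for (int y = 0; y < height; ++y) {
        const std::uint8_t* a = src1 + y * step1;
        const std::uint8_t* b = src2 + y * step2;
        std::uint8_t* d = dst + y * dstep;
        for (int x = 0; x < width; ++x)
            d[x] = static_cast<std::uint8_t>(a[x] > b[x] ? a[x] - b[x] : b[x] - a[x]);
    }
}

// Hypothetical accelerated backend. A real one would use packed-SIMD
// intrinsics; here it declines rows narrower than 8 pixels purely to
// demonstrate the "not implemented" path.
int accel_absdiff8u(const std::uint8_t* src1, std::size_t step1,
                    const std::uint8_t* src2, std::size_t step2,
                    std::uint8_t* dst, std::size_t dstep,
                    int width, int height)
{
    if (width < 8)
        return HAL_NOT_IMPLEMENTED;
    absdiff_rows(src1, step1, src2, step2, dst, dstep, width, height);
    return HAL_OK;
}

// Dispatcher: try the backend first, fall back to the generic loop.
// Returns true when the backend handled the call.
bool absdiff8u(const std::uint8_t* src1, std::size_t step1,
               const std::uint8_t* src2, std::size_t step2,
               std::uint8_t* dst, std::size_t dstep,
               int width, int height)
{
    assert(width >= 0 && height >= 0);
    if (accel_absdiff8u(src1, step1, src2, step2, dst, dstep, width, height) == HAL_OK)
        return true;
    absdiff_rows(src1, step1, src2, step2, dst, dstep, width, height);
    return false;
}
```

Because the backend can decline any call, a library built this way only needs to cover the cases it can actually speed up.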

Progress

Part 1 (This PR)

Part 2 (Next PR)

Rough Estimate. Todo List May Change.

Core
ImgProc
  smaller remap HAL interface
  AdaptiveThreshold
  BoxFilter
  Canny
  Convert
  Filter
  GaussianBlur
  MedianBlur
  Morph
  Pyrdown
  Resize
  Scharr
  SepFilter
  Sobel
Features2D
  FAST

Performance Tests

The optimization does not contain floating point operations.

Absolute Difference

Geometric mean (ms)

Name of Test	opencv perf core Absdiff (1st log)	opencv perf core Absdiff (2nd log)	2nd vs 1st (x-factor = 1st / 2nd)
Absdiff::OCL_AbsDiffFixture::(640x480, 8UC1)	23.104	5.972	3.87
Absdiff::OCL_AbsDiffFixture::(640x480, 32FC1)	39.500	40.830	0.97
Absdiff::OCL_AbsDiffFixture::(640x480, 8UC3)	69.155	15.051	4.59
Absdiff::OCL_AbsDiffFixture::(640x480, 32FC3)	118.715	120.509	0.99
Absdiff::OCL_AbsDiffFixture::(640x480, 8UC4)	93.001	19.770	4.70
Absdiff::OCL_AbsDiffFixture::(640x480, 32FC4)	161.136	160.791	1.00
Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC1)	69.211	15.140	4.57
Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC1)	118.762	119.263	1.00
Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC3)	212.414	44.692	4.75
Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC3)	367.512	366.569	1.00
Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC4)	285.337	59.708	4.78
Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC4)	490.395	491.118	1.00
Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC1)	158.827	33.462	4.75
Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC1)	273.503	273.668	1.00
Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC3)	484.175	100.520	4.82
Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC3)	828.758	829.689	1.00
Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC4)	648.592	137.195	4.73
Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC4)	1116.755	1109.587	1.01
Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC1)	648.715	134.875	4.81
Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC1)	1115.939	1113.818	1.00
Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC3)	1944.791	413.420	4.70
Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC3)	3354.193	3324.672	1.01
Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC4)	2594.585	553.486	4.69
Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC4)	4473.543	4438.453	1.01

Bitwise Operation

Geometric mean (ms)

Name of Test	opencv perf core Bit (1st log)	opencv perf core Bit (2nd log)	2nd vs 1st (x-factor = 1st / 2nd)
Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC1)	22.542	4.971	4.53
Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC1)	90.210	19.917	4.53
Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC3)	68.429	15.037	4.55
Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC3)	280.168	59.239	4.73
Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC4)	90.565	19.735	4.59
Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC4)	374.695	79.257	4.73
Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC1)	67.824	14.873	4.56
Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC1)	279.514	59.232	4.72
Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC3)	208.337	44.234	4.71
Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC3)	851.211	182.522	4.66
Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC4)	279.529	59.095	4.73
Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC4)	1132.065	244.877	4.62
Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC1)	155.685	33.078	4.71
Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC1)	635.253	137.482	4.62
Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC3)	474.494	100.166	4.74
Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC3)	1907.340	412.841	4.62
Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC4)	635.538	134.544	4.72
Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC4)	2552.666	556.397	4.59
Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC1)	634.736	136.355	4.66
Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC1)	2548.283	561.827	4.54
Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC3)	1911.454	421.571	4.53
Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC3)	7663.803	1677.289	4.57
Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC4)	2543.983	562.780	4.52
Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC4)	10211.693	2237.393	4.56
Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC1)	22.341	4.811	4.64
Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC1)	89.975	19.288	4.66
Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC3)	67.237	14.643	4.59
Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC3)	276.324	58.609	4.71
Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC4)	89.587	19.554	4.58
Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC4)	370.986	77.136	4.81
Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC1)	67.227	14.541	4.62
Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC1)	276.357	58.076	4.76
Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC3)	206.752	43.376	4.77
Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC3)	841.638	177.787	4.73
Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC4)	276.773	57.784	4.79
Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC4)	1127.740	237.472	4.75
Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC1)	153.808	32.531	4.73
Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC1)	627.765	129.990	4.83
Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC3)	469.799	98.249	4.78
Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC3)	1893.591	403.694	4.69
Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC4)	627.724	129.962	4.83
Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC4)	2529.967	540.744	4.68
Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC1)	628.089	130.277	4.82
Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC1)	2521.817	540.146	4.67
Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC3)	1905.004	404.704	4.71
Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC3)	7567.971	1627.898	4.65
Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC4)	2531.476	540.181	4.69
Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC4)	10075.594	2181.654	4.62
Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC1)	22.566	5.076	4.45
Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC1)	90.391	19.928	4.54
Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC3)	67.758	14.740	4.60
Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC3)	279.253	59.844	4.67
Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC4)	90.296	19.802	4.56
Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC4)	373.972	79.815	4.69
Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC1)	67.815	14.865	4.56
Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC1)	279.398	60.054	4.65
Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC3)	208.643	45.043	4.63
Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC3)	850.042	180.985	4.70
Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC4)	279.363	60.385	4.63
Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC4)	1134.858	243.062	4.67
Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC1)	155.212	33.155	4.68
Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC1)	634.985	134.911	4.71
Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC3)	474.648	100.407	4.73
Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC3)	1912.049	414.184	4.62
Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC4)	635.252	132.587	4.79
Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC4)	2544.471	560.737	4.54
Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC1)	634.574	134.966	4.70
Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC1)	2545.129	561.498	4.53
Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC3)	1910.900	419.365	4.56
Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC3)	7662.603	1685.812	4.55
Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC4)	2548.971	560.787	4.55
Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC4)	10201.407	2237.552	4.56
Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC1)	22.718	4.961	4.58
Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC1)	91.496	19.831	4.61
Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC3)	67.910	15.151	4.48
Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC3)	279.612	59.792	4.68
Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC4)	91.073	19.853	4.59
Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC4)	374.641	79.155	4.73
Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC1)	67.704	15.008	4.51
Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC1)	279.229	60.088	4.65
Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC3)	208.156	44.426	4.69
Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC3)	849.501	180.848	4.70
Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC4)	279.642	59.728	4.68
Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC4)	1129.826	242.880	4.65
Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC1)	155.585	33.354	4.66
Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC1)	634.090	134.995	4.70
Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC3)	474.931	99.598	4.77
Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC3)	1910.519	413.138	4.62
Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC4)	635.026	135.155	4.70
Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC4)	2560.167	560.838	4.56
Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC1)	634.893	134.883	4.71
Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC1)	2548.166	560.831	4.54
Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC3)	1911.392	419.816	4.55
Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC3)	7646.634	1677.988	4.56
Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC4)	2560.637	560.805	4.57
Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC4)	10227.044	2249.458	4.55

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

asmorkalov · 2024-03-06T10:49:59Z

@Junyan721113 Could you add more details on hardware configuration and how to reproduce the result?

Junyan721113 · 2024-03-06T15:56:01Z

No problem, here are the details for accuracy and performance tests.

RISC-V P Extension v0.5.2

Env

export RISCV=/opt/andes
export PATH=$PATH:/opt/andes/bin

Toolchain

Prebuilt Releases: Andes-Development-Kit

Suggested Version: v5_1_1

nds-gnu-toolchain

./build_linux_toolchain.sh

TARGET=riscv64-linux
PREFIX=/opt/andes
ARCH=rv64imafdcxandes
ABI=lp64d
CPU=andes-25-series
XLEN=64
BUILD=`pwd`/build-nds64le-linux-glibc-v5d

Qemu

qemu

shell ./build

../configure --prefix=/opt/andes --target-list=riscv32-linux-user,riscv64-linux-user --disable-werror --static

Board

The development board used for performance tests is TinkerV with Andes AX45.

Upload the installed toolchain's sysroot at /opt/andes/sysroot, or the prebuilt releases above.

/etc/ld.so.conf

include /etc/ld.so.conf.d/*.conf
/path/to/the/sysroot/library

shell

ldconfig -v

After that the sysroot library should appear in the result.

OpenCV Test

shell ./build

cmake -D CMAKE_BUILD_TYPE=Debug -D CMAKE_INSTALL_PREFIX=/opt/andes -D BUILD_SHARED_LIBS=OFF -D CMAKE_TOOLCHAIN_FILE=../platforms/linux/riscv64-andes-gcc.toolchain.cmake ..

Qemu

shell ./build/bin

qemu-riscv64 -cpu andes-ax25 -L /opt/andes/sysroot opencv_test_core

Board

Upload the test binary to the board and run it directly; it should work as-is.

Junyan721113 · 2024-03-20T07:44:50Z

Considering the Todo List of this PR might be too long, would it be better to divide this PR into smaller ones?

mshabunin · 2024-03-20T08:31:46Z

@Junyan721113 , you can finalize current state more or less (HAL integration, several core functions implementation). And extend supported functions list in future PRs.

Junyan721113 · 2024-04-22T20:44:22Z

Considering the relation between HAL functions, this PR might be ready for review now.
The optimizations mainly cover the following functions:

The remaining HAL functions are related to convolution and are therefore left for another PR.

Junyan721113 · 2024-04-22T21:23:01Z

Besides, I've noticed that some optimizations could be better if several functions they depend on were also exposed as HAL interfaces, such as:

AutoBuffer, required by resize and Pyrdown
These helpers do not have to be exposed through HAL for the optimizations, since they could be implemented separately. However, given the limitations of the RISC-V P extension, RVP may not be able to optimize them, so they could simply be reused.
remap(), required by warpAffine and warpPerspective
Although remap() can be reused as-is by warpAffine and warpPerspective, the remap variants without floating-point operations could be optimized with RVP. However, the implementation of remap() (for example static RemapNNFunc nn_tab[2][8]) is so tightly coupled that optimizing it would mean reimplementing every function with RVP. Would it be possible to expose the different remap variants as separate HAL interfaces, for example cv_hal_remapNN8u and cv_hal_remapNN16s? Currently there is only one related interface, cv_hal_remap32f.

Meanwhile, I wonder how the HAL interface will change in the upcoming OpenCV 5.0. The changes may affect the next PR for this 3rdparty library.

asmorkalov · 2024-04-23T05:57:15Z

@Junyan721113 Thanks a lot for the contribution!

AutoBuffer may be achieved by a simple combination of new and malloca. Not sure if we need to expose it.
Remap was added to HAL interface a week ago: New HAL API for remap #25399. You are welcome to contribute RISC-V implementation.
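
The AutoBuffer suggestion above can be sketched roughly as follows: a buffer that lives inside the object for small sizes and falls back to new[] for larger ones. This is an illustration only, not cv::AutoBuffer and not code from this PR. SmallBuffer and its members are hypothetical names, and the sketch uses a fixed in-object array instead of a malloca-style stack allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper: keeps up to N elements in an in-object array and
// switches to a heap allocation for larger requests.
template <typename T, std::size_t N>
class SmallBuffer {
public:
    explicit SmallBuffer(std::size_t count)
        : size_(count),
          heap_(count > N ? new T[count]() : nullptr),
          ptr_(heap_ ? heap_.get() : fixed_) {}

    // Copying would leave ptr_ pointing into another object, so forbid it.
    SmallBuffer(const SmallBuffer&) = delete;
    SmallBuffer& operator=(const SmallBuffer&) = delete;

    T& operator[](std::size_t i)
    {
        assert(i < size_);
        return ptr_[i];
    }

    T* data() { return ptr_; }
    std::size_t size() const { return size_; }
    bool on_heap() const { return heap_ != nullptr; }

private:
    std::size_t size_;
    T fixed_[N] = {};            // used when count <= N
    std::unique_ptr<T[]> heap_;  // used when count > N, freed automatically
    T* ptr_;                     // points at whichever storage is active
};
```

A row-sized scratch buffer for something like resize would then be declared as `SmallBuffer<int, 1024> row(width);`, paying for a heap allocation only on unusually wide images.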

Junyan721113 · 2024-04-23T16:36:49Z

Thank you! This helps me a lot.

Junyan721113 · 2024-04-24T07:45:14Z

The mentioned PR (#25399) contains cv_hal_remap32f. How about adding cv_hal_remap8u, cv_hal_remap8s, cv_hal_remap16u and cv_hal_remap16s as well? A float32 interface might not be helpful for RVP.

Meanwhile, the to-do list of "Part 1" is finished, other new features will be in "Part 2". This PR is ready for review now.

asmorkalov · 2024-04-24T08:47:53Z

32f means that mapx and mapy are floats, not fixed point. Source and destination may be any OpenCV-supported type. Sorry for the confusion.

mshabunin · 2024-05-02T19:41:30Z

Currently there are several warnings regarding strict aliasing in the new HAL library (warpAffine and warpPerspective). Are they serious issues or not? Can we somehow avoid these constructions (maybe with some reinterpret intrinsics)?

/work/opencv/3rdparty/ndsrvp/src/warpAffine.cpp: In member function 'virtual void cv::NdsrvpWarpAffineInvoker::operator()(const cv::Range&) const':
/opencv/3rdparty/ndsrvp/src/warpAffine.cpp:58:76: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   58 |                             *(uint16x4_t*)(xy + x1 * 2) = __nds__v_pkbb16(*(uint16x4_t*)&vY, *(uint16x4_t*)&vX);
      |                                                                            ^~~~~~~~~~~~~~~~
/opencv/3rdparty/ndsrvp/src/warpAffine.cpp:58:95: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   58 |                             *(uint16x4_t*)(xy + x1 * 2) = __nds__v_pkbb16(*(uint16x4_t*)&vY, *(uint16x4_t*)&vX);
      |                                                                                               ^~~~~~~~~~~~~~~~
/opencv/3rdparty/ndsrvp/src/warpAffine.cpp:82:76: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   82 |                             *(uint16x4_t*)(xy + x1 * 2) = __nds__v_pkbb16(*(uint16x4_t*)&vy, *(uint16x4_t*)&vx);
      |                                                                            ^~~~~~~~~~~~~~~~
/opencv/3rdparty/ndsrvp/src/warpAffine.cpp:82:95: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   82 |                             *(uint16x4_t*)(xy + x1 * 2) = __nds__v_pkbb16(*(uint16x4_t*)&vy, *(uint16x4_t*)&vx);
      |                                                                                               ^~~~~~~~~~~~~~~~

Junyan721113 · 2024-05-08T08:06:33Z

It was a mistake. They've been replaced with safer explicit type conversions.
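
One common way to avoid this class of warning without changing behaviour is to copy the bits with std::memcpy instead of dereferencing a cast pointer. The sketch below is an assumption about what such a fix can look like, not the code that was actually committed. U16x4, as_u16x4, as_u64 and store_u16x4 are hypothetical names standing in for the toolchain's packed types.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical packed type standing in for a 4 x 16-bit SIMD register value.
struct U16x4 {
    std::uint16_t lane[4];
};

static_assert(sizeof(U16x4) == sizeof(std::uint64_t), "size mismatch");

// Reinterpret the bits of a 64-bit integer as four 16-bit lanes.
// Copying the bytes is well-defined, unlike *(U16x4*)&bits, and compilers
// typically lower a fixed-size copy like this to a register move.
inline U16x4 as_u16x4(std::uint64_t bits)
{
    U16x4 out;
    std::memcpy(&out, &bits, sizeof(out));
    return out;
}

// The inverse conversion, again by copying bytes.
inline std::uint64_t as_u64(const U16x4& v)
{
    std::uint64_t bits;
    std::memcpy(&bits, &v, sizeof(bits));
    return bits;
}

// Store four lanes into an int16_t buffer without casting the pointer
// to a different type; dst must have room for four elements.
inline void store_u16x4(std::int16_t* dst, const U16x4& v)
{
    assert(dst != nullptr);
    std::memcpy(dst, &v, sizeof(v));
}
```

The lane order seen through as_u16x4 depends on the target's endianness, so code built this way should only rely on the round trip, not on which lane holds the low bits.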

Junyan721113 · 2024-05-15T07:56:28Z

Strict-aliasing warnings have been fixed. Are there any other suggested changes?

mshabunin

Maybe it is possible to open different types of remap functions as different HAL interfaces, called cv_hal_remapNN8u cv_hal_remapNN16s for example? Currently there is only one related interface called cv_hal_remap32f.

You can implement only one case and return NOT_IMPLEMENTED for others.

Overall looks good to me. We need to fix cmp operation before merge. Refactoring and some cleanup (warpAffine/warpPerspective/headers) can be postponed to the next PR.

Junyan721113 · 2024-05-27T22:43:04Z

Looks like it doesn't work - tests are failing now 🙁 . Commenting cmp functions fixes them.

I've made a logical mistake. Fixed.

asmorkalov added optimization category: 3rdparty platform: riscv labels Mar 6, 2024

asmorkalov requested a review from mshabunin March 7, 2024 06:08

Junyan721113 force-pushed the rvp_3rdparty branch from ec66a3e to 9d1a0fb Compare March 13, 2024 07:10

Junyan721113 force-pushed the rvp_3rdparty branch 2 times, most recently from e41c22b to 73130d0 Compare April 22, 2024 21:18

Junyan721113 changed the title ~~3rdparty: NDSRVP - A New 3rdparty Library with Optimizations Based on RISC-V P Extension v0.5.2~~ 3rdparty: NDSRVP - A New 3rdparty Library with Optimizations Based on RISC-V P Extension v0.5.2 - Part 1: Basic Functions Apr 22, 2024

Junyan721113 marked this pull request as ready for review April 22, 2024 21:19

asmorkalov added this to the 4.10.0 milestone Apr 23, 2024

Junyan721113 force-pushed the rvp_3rdparty branch from 73130d0 to d2b1f6a Compare May 8, 2024 07:49

mshabunin reviewed May 15, 2024

Junyan721113 force-pushed the rvp_3rdparty branch 2 times, most recently from f5ba8f1 to e69d8b6 Compare May 22, 2024 07:48

mshabunin reviewed May 24, 2024

platforms/linux/riscv64-andes-gcc.toolchain.cmake

3rdparty/ndsrvp/include/core.hpp

feat: RVP 3rdparty Optimizations based on HAL

f77b05a

Junyan721113 force-pushed the rvp_3rdparty branch from e69d8b6 to f77b05a Compare May 27, 2024 22:39

mshabunin approved these changes May 28, 2024

asmorkalov assigned mshabunin May 28, 2024

asmorkalov merged commit d9421ac into opencv:4.x May 28, 2024

mshabunin mentioned this pull request Jun 14, 2024

Merge 4.x -> 5.x #25745

Merged

Junyan721113 mentioned this pull request Jun 19, 2024

3rdparty: NDSRVP - Part 1.5: New Interfaces #25786

Merged

9 tasks

Junyan721113 mentioned this pull request Aug 30, 2024

3rdparty: NDSRVP - Part 2: Filter #26088

Merged

6 tasks
