


@mshabunin
Contributor

In the RVV 0.7.1 implementation, multiplication performed unnecessary operations: it unpacked the widened values and then packed them back again. This part has been removed. On a LicheePi 4A with the Xuantie 2.8.0 toolchain, performance improved from 5.82 ms to 1.06 ms (1920x1080, CV_8UC1, BinaryOpTest.multiply/20). Accuracy tests for core and imgproc pass, with the same failures as before the fix.

vuint16m2_t res = vwmulu_vv_u16m2(a, b, 16);

// the following calls are not needed: they unpack the values and pack them back again
vuint16m1_t c = vget_v_u16m2_u16m1(res, 0);
vuint16m1_t d = vget_v_u16m2_u16m1(res, 1);
vuint16m2_t im = vundefined_u16m2();
im = vset_v_u16m1_u16m2(im, 0, c);
im = vset_v_u16m1_u16m2(im, 1, d);
// end: we can pass 'res' directly to 'vnclipu'

vuint8m1_t fin = vnclipu_wx_u8m1(im, 0, 16);
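To see why the removed lines are redundant, here is a minimal scalar C model. It is not RVV code. It assumes a hypothetical VLEN of 128 bits, so a u16m1 register holds 8 lanes and the u16m2 group holds 16, which matches vl = 16 in the snippet. The helper names (`widening_mul`, `split_and_rejoin`, `narrow_clip`, `check_paths_agree`) are hypothetical stand-ins for `vwmulu_vv_u16m2`, the `vget`/`vset` pair, and `vnclipu_wx_u8m1` with a shift of 0. Splitting the widened product into its two m1 halves and writing them back yields the same 16 lanes, so clipping `res` directly gives identical output.

```c
#include <stdint.h>
#include <string.h>

/* Assumed VLEN = 128: u16m2 holds 16 lanes, each u16m1 half holds 8. */
#define VL 16
#define HALF (VL / 2)

/* Stand-in for vwmulu_vv_u16m2: widening u8 x u8 -> u16 multiply. */
static void widening_mul(const uint8_t *a, const uint8_t *b, uint16_t *res) {
    for (int i = 0; i < VL; ++i)
        res[i] = (uint16_t)(a[i] * b[i]); /* max 255*255 = 65025 fits in u16 */
}

/* Stand-in for the removed vget/vset sequence: split into halves, reassemble. */
static void split_and_rejoin(const uint16_t *res, uint16_t *im) {
    uint16_t c[HALF], d[HALF];
    memcpy(c, res, sizeof c);        /* like vget_v_u16m2_u16m1(res, 0) */
    memcpy(d, res + HALF, sizeof d); /* like vget_v_u16m2_u16m1(res, 1) */
    memcpy(im, c, sizeof c);         /* like vset_v_u16m1_u16m2(im, 0, c) */
    memcpy(im + HALF, d, sizeof d);  /* like vset_v_u16m1_u16m2(im, 1, d) */
}

/* Stand-in for vnclipu_wx_u8m1 with shift 0: unsigned saturating narrow.
   With a zero shift, the rounding mode has no effect. */
static void narrow_clip(const uint16_t *src, uint8_t *dst) {
    for (int i = 0; i < VL; ++i)
        dst[i] = src[i] > UINT8_MAX ? UINT8_MAX : (uint8_t)src[i];
}

/* Returns 1 if the old path (with the round trip) and the new path agree. */
static int check_paths_agree(const uint8_t *a, const uint8_t *b) {
    uint16_t res[VL], im[VL];
    uint8_t old_path[VL], new_path[VL];
    widening_mul(a, b, res);
    split_and_rejoin(res, im);
    narrow_clip(im, old_path);  /* before the fix */
    narrow_clip(res, new_path); /* after the fix: 'res' passed directly */
    return memcmp(old_path, new_path, VL) == 0;
}
```

The round trip only copies lanes. It never changes them, so it can only cost time. Dropping it leaves the saturating 8-bit result unchanged.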

Note: the scalable RVV implementation already uses the shortened code.


@asmorkalov left a comment


👍

@asmorkalov merged commit 8ed0319 into opencv:4.x on Jan 29, 2024
@mshabunin deleted the fix-rvv07-mul branch on January 29, 2024 09:53
This was referenced Feb 3, 2024
@dkurt added this to the 4.10.0 milestone on Apr 8, 2024

3 participants