Speed up HLS2RGB. #22441

vrabaud · 2022-08-29T12:37:11Z

In case of huge (and probably invalid) input, make sure we do not
rely only on the while loops for truncation.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch

alalek

Speedup of processing of invalid input is out of scope.
Slowdown regressions of processing of valid input should be avoided.

What are the performance numbers?

vrabaud · 2022-08-30T09:07:05Z

Right, it is out of scope, but it is an easy CPU bomb: in my case, a 1x5 image took 3 minutes to process because it had entries like 1e40.

All those while loops do is compute the remainder of a division by 6. We might as well use the standard library function for that (fmodf) and turn the while loops into conditions. The SIMD implementation does that too.
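A minimal sketch of why the while loops are an easy CPU bomb: the number of iterations grows with |h| / 6, and for large enough |h| the subtraction `h - 6.f` rounds back to `h`, so the loop can no longer make progress at all. `std::fmod` performs the same reduction in a single call. `whileLoopIterations` and `reduceWithFmod` are hypothetical helpers for illustration, not OpenCV functions.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not OpenCV code): counts the iterations the original
// "while (h < 0) h += 6; while (h >= 6) h -= 6;" reduction would need.
// Returns -1 if a step is absorbed by rounding (the loop could never finish)
// or if more than maxIter iterations would be required.
long whileLoopIterations(float h, long maxIter)
{
    assert(maxIter > 0);
    long n = 0;
    while (h < 0.f) {
        float next = h + 6.f;
        if (next == h || ++n > maxIter) return -1;  // +6 lost to rounding, or too slow
        h = next;
    }
    while (h >= 6.f) {
        float next = h - 6.f;
        if (next == h || ++n > maxIter) return -1;  // -6 lost to rounding, or too slow
        h = next;
    }
    return n;
}

// The same reduction with one library call, in bounded time.
float reduceWithFmod(float h)
{
    float r = std::fmod(h, 6.f);   // exact, in (-6, 6), same sign as h
    return r < 0.f ? r + 6.f : r;  // shift negatives into [0, 6]
}
```

The `[0, 6]` bound in the last comment is deliberate: a tiny negative remainder plus 6 can round to exactly 6.0f, so code that uses the result as a sector index still needs a final guard.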

vpisarev · 2022-08-31T17:28:43Z

@vrabaud, thank you for the patch! I agree with both you and @alalek: there should be protection against attacks or unintentional misuse, but it should preferably not slow down the normal use cases. fmodf means a function call, and it can be quite heavy, as it may include checks for corner cases etc.

May I suggest modifying the patch as follows:

sector = cvFloor(h);
h -= sector; 
sector %= 6;
sector += sector < 0 ? 6 : 0;

/* ///////// these instructions are not needed anymore,
   we guarantee that sector is within [0, 6) //////////
CV_DbgAssert( 0 <= h && h < 6 );
sector = cvFloor(h);
h -= sector;
*/

I found that the generated code is very good, much better than what we had before. sector %= 6 is computed efficiently without divisions.
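Pulled out into a standalone function, the suggested computation can be sketched as follows. `hueSector` is a hypothetical name, and `(int)std::floor(h)` stands in for `cvFloor()`, so, like the patch, it assumes the floored value fits in an int. The key part is the last two lines: C++ `%` keeps the sign of the dividend, so after the conditional `+ 6` the sector is always in [0, 6) and safe to use as an index.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical standalone version of the suggested sector computation.
// (int)std::floor(h) stands in for cvFloor(h); the floored value is
// assumed to fit in an int.
int hueSector(float h, float* frac)
{
    assert(std::isfinite(h));
    int sector = (int)std::floor(h);
    *frac = h - (float)sector;     // fractional part, computed before reduction
    sector %= 6;                   // in (-6, 6): '%' keeps the dividend's sign
    sector += sector < 0 ? 6 : 0;  // now guaranteed to be in [0, 6)
    return sector;
}
```

For example, h = -0.5 floors to -1, giving a fractional part of 0.5 and sector -1 % 6 = -1, which the final line maps to 5.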

vrabaud · 2022-09-01T08:13:52Z

Thanks @vpisarev, I applied your patch and also added a fix so that NaN does not trigger out-of-bounds reads.

vrabaud · 2022-09-01T08:29:07Z

We actually have another problem for large h values: cvFloor() raises SIGILL.
We could do:

                h *= hscale;
                if (cvIsNaN(h)) {
                    // Avoid calling cvFloor() on NaN.
                    b = g = r = h;
                } else {
                    if (h > std::numeric_limits<int>::max() ||
                        h < std::numeric_limits<int>::min()) {
                        // outside int range: fall back to the standard C++ floor (std::floor)
                    } else {
                        // inside int range: cvFloor() is safe
                        sector = cvFloor(h);
                    }
                }
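A self-contained sketch of the range check above, written as a hypothetical helper `floorToInt` rather than OpenCV code. Two details matter: `(float)INT_MAX` rounds up to 2^31, so the upper bound has to be tested with `>=` against 2^31 (a plain `h > std::numeric_limits<int>::max()` compares against 2^31 as a float and lets exactly 2^31 through), and NaN fails every comparison, so it has to be rejected explicitly before the cast.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical guarded floor: the float->int conversion only happens when
// the result is representable, because converting an out-of-range float to
// int is undefined behaviour in C++ (sanitizers may trap on it).
bool floorToInt(float x, int* out)
{
    assert(out != nullptr);
    if (std::isnan(x))
        return false;                  // NaN would pass both range checks below
    float f = std::floor(x);
    // (float)INT_MAX rounds up to 2^31, so reject f >= 2^31;
    // -2^31 itself is a valid int and is accepted.
    if (f >= 2147483648.0f || f < -2147483648.0f)
        return false;                  // also rejects +/-Inf
    *out = (int)f;
    return true;
}
```

A caller would take the `cvFloor()` path only when this returns true and handle the remaining cases (huge values, Inf, NaN) separately.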

vpisarev · 2022-09-02T08:25:09Z

@vrabaud, this problem with cvFloor() is quite serious actually. What is the platform (Intel, ARM, ???), OS and the compiler where this problem is reproduced? What are the particular values that you pass to cvFloor()?

asmorkalov · 2022-09-12T12:16:12Z

@vrabaud friendly reminder.

vrabaud · 2022-09-15T13:08:15Z

OK, I updated my PR:

Unified the sector computation, because HSV has the same issue.
The SIGILL came from the fuzzer compilation flag: a float larger than std::numeric_limits<int>::max() was cast to int in cvFloor (which uses a C-style cast, not __builtin_floorf).
fmodf has to be applied first in case the float is larger than the maximum int.
The NaN case is handled separately, so that it cannot produce out-of-bounds sector values (which would be a security issue).
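A sketch of the "fmodf first" ordering described above, as a hypothetical helper `hueSectorFmodFirst` (not the code that was merged). Reducing modulo 6 before flooring keeps the float-to-int conversion within [0, 6) for any finite input, including values far beyond INT_MAX. It adds one guard the list does not mention: a tiny negative remainder plus 6 can round to exactly 6.0f, so the value is folded back into [0, 6). NaN and infinities are left to the caller, matching the separate NaN handling.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper illustrating "fmod first, then floor".
// Precondition: h is finite (fmod(inf, 6) would be NaN).
int hueSectorFmodFirst(float h, float* frac)
{
    assert(std::isfinite(h));
    h = std::fmod(h, 6.f);      // exact result in (-6, 6), sign of the input
    if (h < 0.f)
        h += 6.f;               // [0, 6]; a tiny negative h can round up to 6
    if (h >= 6.f)
        h -= 6.f;               // [0, 6)
    float fl = std::floor(h);   // one of 0, 1, ..., 5
    *frac = h - fl;             // in [0, 1)
    return (int)fl;             // the cast only ever sees 0..5
}
```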

We could add a second path for h within int range, as @vpisarev suggested, but it would just replace an fmod with a %, which is not much of a gain.

vpisarev · 2022-09-19T12:40:10Z

@vrabaud, @asmorkalov, the solution is robust, but I don't like that it seriously affects the speed in normal cases compared to the solution I suggested. Also, the solution is very local to HLS2RGB; it does not solve the problem of SIGILL when cvFloor() is given NaN, +/-Inf or another very big value. I suggest solving the problem once and for all:

We declare in the documentation that cvFloor() (as well as cvCeil() and cvRound()) produces some platform-dependent integer value when it is given a NaN or an argument outside of INT_MIN..INT_MAX.
We make sure that if result of cvFloor() (cvCeil, cvRound) is used to compute address or index in LUT, it does not go outside of the proper value range.
We make sure that implementation of cvFloor() (cvCeil, cvRound) is very efficient and that it does not throw any exception when the argument is NaN or is outside of INT_MIN..INT_MAX.

First, we need to check whether __builtin_floorf (__builtin_ceilf, __builtin_lrintf) raise an exception. If they don't, let's use them for the cvFloor/cvCeil/cvRound implementation; this is what the current 4.x branch does. If they do, let's check whether (int)x raises an exception. If it doesn't, let's change the implementation of cvFloor() and cvCeil() back to what it was before the latest patches:

int cvFloorf/*_exception_less*/(float x) {
    int i = (int)x;
    return i - (i > x);
}

int cvCeilf/*_exception_less*/(float x) {
    int i = (int)x;
    return i + (i < x);
}
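The two cast-based functions above can be checked against `std::floor`/`std::ceil` with a small harness. The copies below are renamed (`truncFloor`, `truncCeil` and `matchesStd` are hypothetical names) so they do not suggest OpenCV's exported API. They are only valid when the truncated value fits in an int: for anything else, `(int)x` is undefined behaviour in C++, which is the conversion sanitizers report.

```cpp
#include <cassert>
#include <cmath>

// Renamed standalone copies of the cast-based floor/ceil from the comment.
// (int)x truncates toward zero; the comparison then corrects negative
// non-integers (floor) or positive non-integers (ceil) by one.
static int truncFloor(float x) { int i = (int)x; return i - (i > x); }
static int truncCeil(float x)  { int i = (int)x; return i + (i < x); }

// Hypothetical check: compares both against std::floor/std::ceil on a grid
// of in-range values, including negatives, where plain truncation is wrong.
bool matchesStd(float lo, float hi, float step)
{
    assert(step > 0.f);
    for (float x = lo; x <= hi; x += step) {
        if (truncFloor(x) != (int)std::floor(x) ||
            truncCeil(x) != (int)std::ceil(x))
            return false;
    }
    return true;
}
```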


vrabaud · 2023-01-19T19:42:13Z

Sorry for the delay. I applied @vpisarev's suggestion and removed the NaN handling (the exception was only due to the sanitizer trapping on a narrowing float-to-int cast).

vpisarev · 2023-02-20T20:19:59Z

@alalek, should we finally merge it?

alalek

Hopefully there are no significant perf regressions for normal input.

modules/imgproc/src/color_hsv.simd.hpp


asmorkalov requested a review from vpisarev August 29, 2022 13:18

asmorkalov added optimization category: imgproc labels Aug 29, 2022

alalek requested changes Aug 29, 2022


vrabaud force-pushed the hls_while branch from eb6de4d to f5aa8ed August 31, 2022 21:47

vpisarev self-assigned this Sep 2, 2022

vrabaud force-pushed the hls_while branch from 9a137d3 to f6c5ec1 September 15, 2022 12:59

vrabaud added 6 commits January 19, 2023 20:33

Speed up HLS2RGB.

dd5889e

In case of huge (and probably invalid) input, make sure we do not rely only on the while loops for truncation.

Remove while loops.

3fc7791

Fix NaN issue with HLS

e10db54

Apply vpisarev's suggestion.

9c4d183

Unify sector computation.

3abcfe1

Remove NaN handling

ba8416d

vrabaud force-pushed the hls_while branch from f6c5ec1 to ba8416d January 19, 2023 19:34

vrabaud requested a review from alalek January 19, 2023 20:03

vpisarev approved these changes Feb 20, 2023


alalek approved these changes Feb 20, 2023


modules/imgproc/src/color_hsv.simd.hpp

asmorkalov merged commit 8ad8ec6 into opencv:3.4 Mar 7, 2023

asmorkalov mentioned this pull request Apr 20, 2023

(4.x) Merge 3.4 #23523

Merged

asmorkalov mentioned this pull request May 31, 2023

(5.x) Merge 4.x #23718

Merged

vrabaud deleted the hls_while branch July 7, 2023 13:43
