Speed up HLS2RGB. #22441
Speedup of processing of invalid input is out of scope.
Slowdown regressions of processing of valid input should be avoided.
What are the performance numbers?
Right, it is out of scope, but it is an easy CPU bomb: in my case, a 1x5 image took 3 minutes to process because it had entries like … All those while loops do is compute the remainder of a division by 6, so we might as well use the standard fmodf function for that (and turn the while loops into conditions). The SIMD implementation does that too.
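To make the point above concrete, here is a minimal C++ sketch (not the actual OpenCV code; `wrapHueLoop` and `wrapHueFmod` are hypothetical names) that contrasts loop-based wrapping of a hue into [0, 6) with a constant-time `std::fmod` version. The loop version needs roughly |h|/6 iterations, and for +/-Inf or magnitudes so large that subtracting 6 no longer changes the float, it never terminates.

```cpp
#include <cassert>
#include <cmath>

// Loop-based wrapping: the iteration count grows linearly with |h|.
inline float wrapHueLoop(float h)
{
    while (h < 0.f)  h += 6.f;
    while (h >= 6.f) h -= 6.f;
    return h;
}

// Constant-time alternative: std::fmod returns the remainder with the
// sign of h, i.e. a value in (-6, 6); one conditional fixes negatives.
inline float wrapHueFmod(float h)
{
    h = std::fmod(h, 6.f);
    if (h < 0.f)
        h += 6.f;
    // A tiny negative remainder (e.g. -1e-8f) rounds to exactly 6.f after
    // the addition, so map it back into [0, 6).
    if (h >= 6.f)
        h -= 6.f;
    return h;
}
```

Both versions agree on ordinary input (7.5 wraps to 1.5, -0.5 to 5.5), but only the `fmod` version does a fixed amount of work for inputs such as 1e9.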
@vrabaud, thank you for the patch! I agree with both you and @alalek: there should be protection against attacks or unintentional misuse, but it should preferably not slow down the normal use cases. fmodf means a function call, and it can be quite heavy, as it may include checks for corner cases, etc. May I suggest modifying the patch as follows:
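The suggested patch is not reproduced in this thread. As a hypothetical sketch of the idea (keep the common case cheap and pay for fmodf only on far out-of-range input), one could write the following; `wrapHueFast` and the [-6, 12) threshold are illustrative assumptions, not the code that was actually suggested.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch: valid hues in [0, 6) only pay for two comparisons.
inline float wrapHueFast(float h)
{
    if (h < 0.f || h >= 6.f)           // rarely taken for valid input
    {
        if (h < -6.f || h >= 12.f)     // far out of range: one fmod call
            h = std::fmod(h, 6.f);     // now in (-6, 6)
        if (h < 0.f)  h += 6.f;        // within one period below the range
        if (h >= 6.f) h -= 6.f;        // within one period above, or rounding
    }
    return h;                          // NaN falls through unchanged
}
```

Values within one period of [0, 6) are fixed by a single addition or subtraction; only inputs further out reach `std::fmod`, and +/-Inf becomes NaN instead of looping forever.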
I found that the generated code is very good, much better than what we had before.
Thanks @vpisarev, I applied your patch and also added a fix so that NaN does not trigger out-of-bounds reads.
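A minimal sketch of the kind of guard described here, assuming the wrapped hue is then split into an integer sector (used as a table index) and a fractional part; `hueSector` is a hypothetical name, not OpenCV's code. Every comparison with NaN is false, so testing `h >= 0.f && h < 6.f` rejects NaN before it can become an index.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: returns a sector index in [0, 5] and writes the
// fractional part of h to *frac. Out-of-range input (including NaN) is
// mapped to sector 0 so that a 6-entry table is never read out of bounds.
inline int hueSector(float h, float* frac)
{
    if (!(h >= 0.f && h < 6.f)) {      // also catches NaN
        *frac = 0.f;
        return 0;
    }
    int sector = static_cast<int>(h);  // h is in [0, 6), so truncation == floor
    *frac = h - static_cast<float>(sector);
    return sector;
}
```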
We actually have another problem for big h values: cvFloor raises SIGILL.
@vrabaud, this problem with cvFloor() is quite serious actually. What is the platform (Intel, ARM, ???), OS and compiler where this problem is reproduced? What are the particular values that you pass to cvFloor()?
@vrabaud, friendly reminder.
Ok, I updated my PR.
We could have a second path if h is smaller than max int, as @vpisarev mentioned, but it would just replace an fmod with an …
@vrabaud, @asmorkalov, the solution is robust, but I don't like that it seriously affects the speed in normal cases compared to the solution that I suggested. Also, the solution is very local to HLS2RGB; it does not solve the problem of SIGILL when cvFloor() is given NaN, +/-Inf or another very big value. I suggest solving the problem once and for all:
First, we need to see whether __builtin_floorf (__builtin_ceilf, __builtin_lrintf) produce exceptions. If not, let's use them for the cvFloor/cvCeil/cvRound implementation; this is what the current 4.x branch does. Otherwise, let's check if …
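For comparison, here is a sketch of a different technique from the builtin-based one proposed above: it range-checks the floored value before converting it to `int`. In C++, a float-to-int conversion whose value does not fit in the destination type (NaN and +/-Inf included) is undefined behavior, which is what UndefinedBehaviorSanitizer's float-cast-overflow check reports. `floorToIntChecked` and its `fallback` parameter are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: floor(x) as int when representable, else `fallback`.
inline int floorToIntChecked(float x, int fallback)
{
    float f = std::floor(x);
    // -2^31 (INT_MIN) is exactly representable as a float, and 2^31 is the
    // first float above INT_MAX, so the safe range is [-2^31, 2^31).
    if (f >= -2147483648.f && f < 2147483648.f)
        return static_cast<int>(f);
    return fallback;   // NaN fails both comparisons and lands here too
}
```

Valid values such as 2.7f and -2.5f floor to 2 and -3, while 1e20f or NaN return the fallback instead of invoking undefined behavior. The extra comparisons cost something per call, which is the trade-off the builtin approach aims to avoid.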
In case of huge (and probably invalid) input, make sure we do not rely only on the while loops for truncation.
Sorry for the delay. I applied @vpisarev's suggestion and removed the NaN handling case (the throwing was just due to the sanitizer trapping on a narrowing float-to-int cast).
@alalek, should we finally merge it?
Hopefully there are no significant perf regressions for normal input.
In case of huge (and probably invalid) input, make sure we do not
rely only on the while loops for truncation.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch