@chacha21 chacha21 commented Nov 30, 2023

Implements #24603

Currently, remap() is applied as dst(x, y) <- src(mapX(x, y), mapY(x, y)). This means that the maps must be filled with absolute coordinates.

However, if one wants to remap something according to a displacement field ("warp"), the operation should be dst(x, y) <- src(x+displacementX(x, y), y+displacementY(x, y)).

It is trivial to build a mapping from a displacement field, but doing so is an undesirable overhead for CPU and memory.

This PR implements the feature as an experimental option, through the optional flag WARP_RELATIVE_MAP that can be ORed with the interpolation mode.

Since the xy maps might be const, there is no attempt to add the coordinate offset to those maps; the addition is postponed to the very last coordinate computation before fetching src. Interestingly, this leaves cv::convertMaps() unchanged, since the fractional part of the interpolation does not depend on the integer coordinate offset.
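For readers who want to see the two conventions side by side, here is a minimal sketch in plain C++ (no OpenCV). It is not the code of this PR: `Image` and `remapNearest` are hypothetical names, only nearest-neighbour lookup with a constant border is modelled, and rounding uses `std::lround`, which may differ from OpenCV's own rounding.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal single-channel float image, row-major, no padding (illustrative type).
struct Image {
    int width, height;
    std::vector<float> data;
    Image(int w, int h, float fill = 0.f)
        : width(w), height(h), data(static_cast<std::size_t>(w) * h, fill) {}
    float& at(int x, int y) { return data[static_cast<std::size_t>(y) * width + x]; }
    float at(int x, int y) const { return data[static_cast<std::size_t>(y) * width + x]; }
};

// Nearest-neighbour remap with a constant border value.
// relative == false: the maps hold absolute source coordinates.
// relative == true:  the maps hold displacements; the pixel coordinate (x, y)
//                    is added only at the last moment, so the maps stay const.
Image remapNearest(const Image& src, const Image& mapX, const Image& mapY,
                   bool relative, float border = 0.f)
{
    assert(mapX.width == mapY.width && mapX.height == mapY.height);
    Image dst(mapX.width, mapX.height, border);
    for (int y = 0; y < dst.height; ++y) {
        for (int x = 0; x < dst.width; ++x) {
            const float fx = mapX.at(x, y) + (relative ? static_cast<float>(x) : 0.f);
            const float fy = mapY.at(x, y) + (relative ? static_cast<float>(y) : 0.f);
            const int sx = static_cast<int>(std::lround(fx));
            const int sy = static_cast<int>(std::lround(fy));
            if (sx >= 0 && sx < src.width && sy >= 0 && sy < src.height)
                dst.at(x, y) = src.at(sx, sy);  // otherwise the border value is kept
        }
    }
    return dst;
}
```

Calling `remapNearest` with a displacement field and `relative = true` should give the same result as calling it with `relative = false` on maps where `x` and `y` have been added beforehand; the relative variant simply avoids building that second pair of maps.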

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

restored a disabled parallel_for that was used for debugging
added a check to avoid openvx when WARP_RELATIVE_MAP is used, since there is no implementation
locally on my machine, there is no longer any performance regression when calling cv::remap() without WARP_RELATIVE_MAP
using WARP_RELATIVE_MAP always performs better than manually preprocessing the maps from displacement fields to absolute coordinates
added missing Neon in place bin_op implementation
@asmorkalov asmorkalov added this to the 4.10.0 milestone Dec 1, 2023
This reverts commit 32ebbf1.
operator += is not as widely supported as on SSE
avoid operator += for wide intrinsics
use v_add instead of operator+
@asmorkalov

@vpisarev Friendly reminder.

@asmorkalov asmorkalov left a comment

Thanks a lot for the great job! The proposed option is really very useful. I apologize for the long delay caused by the release and some 5.0 preparation activities. General proposals:

  • Extend the accuracy tests. The current test does not cover all touched cases, and the test scenario looks very simple. Since the absolute->relative displacement conversion is very simple, the tests can be extended easily.
  • Add several corner cases, e.g. an absolute coordinate close to the limit of the type range.
  • There are no performance tests. I propose to patch the existing one, as in the first item.

int borderType, const Scalar& _borderValue, const Point& _offset )
{
Size ssize = _src.size(), dsize = _dst.size();
const Point offset = _offset;

no need to create local variable.

@chacha21 chacha21 Feb 5, 2024

What about performance? Don't you think that keeping a reference will prevent the compiler from optimizing access to offset.x|y in the inner loops?

good question. we need performance test for it ;)

I ran the performance test in the PR and do not see any visible effect from the additional constant inside the implementation. I propose to remove it.

Comment on lines 445 to 452
const ushort* FXY, const void* _wtab, int width ) const
const ushort* FXY, const void* _wtab, int width, const Point& _offset ) const
{
int cn = _src.channels(), x = 0, sstep = (int)_src.step;
Point rel_offset = _offset;

no need to create local variable.

typedef typename CastOp::rtype T;
typedef typename CastOp::type1 WT;
Size ssize = _src.size(), dsize = _dst.size();
const Point offset = _offset;

no need for a local variable

typedef typename CastOp::rtype T;
typedef typename CastOp::type1 WT;
Size ssize = _src.size(), dsize = _dst.size();
const Point offset = _offset;

no need for a local variable

Comment on lines +1332 to +1336
cv::Mat mapRelativeX32F(size, CV_32FC1);
mapRelativeX32F.setTo(cv::Scalar::all(-0.33));

cv::Mat mapRelativeY32F(size, CV_32FC1);
mapRelativeY32F.setTo(cv::Scalar::all(-0.33));

I propose to use different values for x and y to highlight x<->y swaps in code and other related offset issues.
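To illustrate why distinct values matter, here is a small standalone sketch (hypothetical helper names, not part of the test suite). It compares a correct relative lookup with a deliberately buggy one that swaps the two displacements: with equal x and y displacements the two are indistinguishable, with different values they are not.

```cpp
#include <cassert>
#include <utility>

// Correct relative lookup: the source coordinate is (x + dx, y + dy).
std::pair<float, float> sourceCoord(int x, int y, float dx, float dy)
{
    assert(x >= 0 && y >= 0);  // pixel coordinates are non-negative
    return { x + dx, y + dy };
}

// Deliberately buggy variant: dx and dy are swapped.
std::pair<float, float> sourceCoordSwapped(int x, int y, float dx, float dy)
{
    assert(x >= 0 && y >= 0);
    return { x + dy, y + dx };
}

// True if a test using (dx, dy) at pixel (x, y) would notice the swap.
bool detectsSwap(int x, int y, float dx, float dy)
{
    return sourceCoord(x, y, dx, dy) != sourceCoordSwapped(x, y, dx, dy);
}
```

With both displacements set to -0.33, `detectsSwap` returns false for every pixel, so a swap bug would pass the test unnoticed; choosing, say, -0.33 for x and 0.25 for y exposes it.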

@asmorkalov asmorkalov self-assigned this Feb 5, 2024
- fixed doc to add INTER_NEAREST_EXACT as not supported
- use a build constant instead of an argument for the OpenCL kernel
- add comment to explain why some probably useless local variables are used
- extend tests (the covered test cases were previously just copied from the original remap tests)

chacha21 commented Feb 6, 2024

A few comments after the last commit:

  • Do we keep the "WARP_RELATIVE_MAP" flag? Is there a better strategy to enable that code? Is the name OK? (I have not mentioned it in the doc yet.)
  • Regarding the corner case "absolute coordinate is close to type range": using some saturate_add would be great, but AFAIK only OpenCL provides such an operator so far.
  • About the "probably useless local variables": their usage will be determined by perf tests, but...
  • ...I don't know how to write and run perf tests. I have never been able to do that on my development machine with Windows 7 and a broken Python.
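As a rough idea of what a scalar saturating add could look like for a 16-bit map element, here is a sketch in standard C++. The function name `saturatingAddCoord` is hypothetical and this is not what the PR does; it only shows the widen, add, clamp pattern for the "close to type range" corner case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Adds a pixel coordinate to a 16-bit relative offset and clamps the result
// to the int16 range instead of letting it wrap around.
std::int16_t saturatingAddCoord(std::int16_t offset, int coord)
{
    assert(coord >= 0);  // pixel coordinates are non-negative
    // Widen first so that the addition itself cannot overflow.
    const long long sum = static_cast<long long>(offset) + coord;
    const long long lo = std::numeric_limits<std::int16_t>::min();
    const long long hi = std::numeric_limits<std::int16_t>::max();
    return static_cast<std::int16_t>(std::min(hi, std::max(lo, sum)));
}
```

A vectorized kernel would need the equivalent saturating instruction per lane; this scalar form is only meant to pin down the expected behaviour for a test.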

@asmorkalov

@chacha21 Thanks a lot! I added a performance test for the new case. Also, please take a look at the CI issues, e.g.

/Users/opencv-cn/GHA-OCV-1/_work/opencv/opencv/opencv/modules/imgproc/src/imgwarp.cpp:1353:10: warning: private field 'isRelative' is not used [-Wunused-private-field]

@asmorkalov

do we keep the "WARP_RELATIVE_MAP" flag ? Is there a better strategy to enable that code ? Is the name OK ? (I have not mentioned it in the doc yet) - Looks good to me.

- fixed typo in comment
- removed dead code
- added WARP_RELATIVE_MAP to doc

vpisarev commented Feb 9, 2024

@chacha21, thank you for the contribution!

I like how the OpenCL part is implemented. It's conditionally compiled code, so it does not affect the performance of the standard case. But I don't like that in the CPU version there are extra conditions inside the innermost loops, as well as extra registers needed to hold the pixel grid coordinates. In subsequent versions of OpenCV we would like to optimize remap further, not slow it down. We want to keep it clean and avoid any unnecessary overhead.

Here is the proposed solution. If the "relative" flag is set, a tile of the map(s) should be copied to a temporary buffer (probably a stack-allocated buffer) and augmented there prior to calling the remap kernels, instead of doing it in the remap kernels themselves. Since the offsets for x and y are integers, this method is compatible with both the floating-point and the fixed-point representations of the maps.
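The point about integer offsets and fixed-point maps can be checked with a small standalone sketch. The names `FixedCoord`, `toFixed` and `addPixelOffset` are hypothetical, and the default of 5 fractional bits is an assumption chosen for illustration, not a statement about OpenCV's internal layout.

```cpp
#include <cassert>
#include <cmath>

// A coordinate split into a whole-pixel part and a sub-pixel part.
struct FixedCoord {
    int integer;   // whole-pixel part
    int fraction;  // sub-pixel part, in [0, 2^fracBits)
};

// Quantizes v to fracBits fractional bits and splits it.
FixedCoord toFixed(float v, int fracBits = 5)
{
    assert(fracBits > 0 && fracBits < 16);
    const int scale = 1 << fracBits;
    const int q = static_cast<int>(std::lround(v * scale));
    // Floor division, so the fraction stays non-negative for negative v.
    const int integer = (q >= 0) ? q / scale : -((-q + scale - 1) / scale);
    return { integer, q - integer * scale };
}

// Adding a whole-pixel offset only touches the integer part.
FixedCoord addPixelOffset(FixedCoord c, int pixel)
{
    return { c.integer + pixel, c.fraction };
}
```

For a displacement of -0.375 at pixel column 7, converting first and then adding 7 to the integer part gives the same pair as converting 6.625 directly; the fractional part is identical in both cases.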

@opencv-alalek

But I don't like that in CPU version there are extra conditions inside the innermost loops

There is no such problem because there is template<bool isRelative> in the most critical paths.


to optimize remap further, not to slow it down

Just need to provide a performance report for such modifications.
All PRs which provide an optimization or modification of the implementation should have that report.

chacha21 commented Feb 9, 2024

But I don't like that in CPU version there are extra conditions inside the innermost loops

There is no such problem because there is template<bool isRelative> in the most critical paths.

Exactly. My first commit on this PR used a local bool isRelative, and I switched to the template version afterwards to preserve performance.
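For readers unfamiliar with the pattern, here is a simplified sketch of dispatching a runtime flag to a `template<bool isRelative>` kernel. The functions `resolveRow` and `resolveRowDispatch` are hypothetical and far simpler than the real remap kernels; the sketch only shows where the runtime check sits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inner loop. isRelative is a compile-time constant, so the compiler can
// drop the unused branch in each instantiation.
template <bool isRelative>
void resolveRow(const float* mapRow, float* out, std::size_t width)
{
    for (std::size_t x = 0; x < width; ++x)
        out[x] = isRelative ? mapRow[x] + static_cast<float>(x) : mapRow[x];
}

// The runtime flag is tested once per row, outside the inner loop.
void resolveRowDispatch(const std::vector<float>& mapRow, std::vector<float>& out,
                        bool relative)
{
    assert(out.size() == mapRow.size());
    if (relative)
        resolveRow<true>(mapRow.data(), out.data(), mapRow.size());
    else
        resolveRow<false>(mapRow.data(), out.data(), mapRow.size());
}
```

The trade-off is that every templated kernel is compiled twice, which increases code size even though the non-relative path keeps its original inner loop.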

to optimize remap further, not to slow it down

Just need to provide a performance report for such modifications. All PRs which provide an optimization or modification of the implementation should have that report.

I will try to run the test and report ASAP (not familiar at all with the procedure)

vpisarev commented Feb 9, 2024

@chacha21, yes, I'm probably wrong about performance - I still see some unconditional things like vector registers holding the pixel coordinates. Maybe the compiler will optimize them out, but maybe not. Besides, it basically duplicates all the remap kernels for a very rarely used feature. I'd still suggest doing it externally by copying each tile of the maps into a temporary buffer and augmenting it there. The kernels will then stay unchanged.

@vpisarev vpisarev left a comment

please, do mapx & mapy augmentation as a separate preprocessing step (probably, tile-by-tile, to achieve cache and thread locality), not inside interpolation kernels
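A rough sketch of what such a tile-by-tile preprocessing step could look like, in plain C++ with flat `std::vector<float>` maps. The names `TileKernel` and `forEachAugmentedTile` are hypothetical; the kernel callback stands in for an unchanged remap kernel that only ever sees absolute coordinates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Callback receiving one tile of absolute coordinates (row-major, tileW x tileH).
using TileKernel = std::function<void(int x0, int y0, int tileW, int tileH,
                                      const std::vector<float>& absX,
                                      const std::vector<float>& absY)>;

// Walks the relative maps tile by tile, writes absolute coordinates into a
// small reusable buffer, and hands each tile to the kernel.
void forEachAugmentedTile(const std::vector<float>& relX, const std::vector<float>& relY,
                          int width, int height, int maxTileW, int maxTileH,
                          const TileKernel& kernel)
{
    assert(width > 0 && height > 0 && maxTileW > 0 && maxTileH > 0);
    assert(relX.size() == static_cast<std::size_t>(width) * height);
    assert(relY.size() == relX.size());
    std::vector<float> absX, absY;  // reused for every tile
    for (int y0 = 0; y0 < height; y0 += maxTileH) {
        const int tileH = std::min(maxTileH, height - y0);
        for (int x0 = 0; x0 < width; x0 += maxTileW) {
            const int tileW = std::min(maxTileW, width - x0);
            absX.resize(static_cast<std::size_t>(tileW) * tileH);
            absY.resize(absX.size());
            for (int ty = 0; ty < tileH; ++ty) {
                for (int tx = 0; tx < tileW; ++tx) {
                    const std::size_t s = static_cast<std::size_t>(y0 + ty) * width + (x0 + tx);
                    const std::size_t d = static_cast<std::size_t>(ty) * tileW + tx;
                    absX[d] = relX[s] + static_cast<float>(x0 + tx);
                    absY[d] = relY[s] + static_cast<float>(y0 + ty);
                }
            }
            kernel(x0, y0, tileW, tileH, absX, absY);
        }
    }
}
```

Whether the extra copy per tile is cheaper than adding the offset inside the kernels is exactly the kind of question a performance report would have to answer; this sketch says nothing about that.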

chacha21 commented Feb 10, 2024

@vpisarev
I understand, but I am not sure that tiling the maps to create "augmented" copies will be better.
As far as I understand, the best place would be an alternative RemapInvoker (perhaps a template<bool isRelative> RemapInvoker), to get rid of the template in the remap kernels.
The tiling would then occur in the virtual void RemapInvoker::operator(), something like this:

Current:

        int x, y, x1, y1;
        const int buf_size = 1 << 14;
        int brows0 = std::min(128, dst->rows), map_depth = m1->depth();
        int bcols0 = std::min(buf_size/brows0, dst->cols);
        brows0 = std::min(buf_size/bcols0, dst->rows);

        Mat _bufxy(brows0, bcols0, CV_16SC2), _bufa;
        if( !nnfunc )
            _bufa.create(brows0, bcols0, CV_16UC1);

Modified:

        int x, y, x1, y1;
        const int buf_size = 1 << 14;
        int brows0 = std::min(128, dst->rows), map_depth = m1->depth();
        int bcols0 = std::min(buf_size/brows0, dst->cols);
        brows0 = std::min(buf_size/bcols0, dst->rows);

        Mat m1AugmentedTile(brows0 , bcols0, m1->type());
        Mat m2AugmentedTile(m2->empty() ? 0 : brows0 , m2->empty() ? 0 : bcols0, m2->type());
        fillByAddingRelativeOffset(m1AugmentedTile, m1);
        fillByAddingRelativeOffset(m2AugmentedTile, m2);
        //then in the code below, use m1AugmentedTile and m2AugmentedTile instead of m1, m2

        Mat _bufxy(brows0, bcols0, CV_16SC2), _bufa;
        if( !nnfunc )
            _bufa.create(brows0, bcols0, CV_16UC1);

As a first step, I just tried to measure the overhead of allocating m1AugmentedTile and m2AugmentedTile, with their content copied from m1 and m2 without modification.
brows0 x bcols0 seems a little large for an AutoBuffer, so I relied on a Mat.

And the timing is not good.

Original implementation:
1000 x cv::remap((1280x1024)) => 1282.797557ms
1000 x cv::remap((1280x1024)+WARP_RELATIVE_MAP) => 1490.316975ms (~+16%; you're right, it is not negligible)

When allocating m1AugmentedTile/m2AugmentedTile in RemapInvoker::operator():
1000 x cv::remap((1280x1024)) => 1934.893129ms

The overhead of just allocating and copying m1AugmentedTile/m2AugmentedTile is already larger than that of the current proposal for WARP_RELATIVE_MAP.

So I have to admit that WARP_RELATIVE_MAP is not free. But my idea was that the cost of applying relative offsets on the fly is still lower than creating absolute maps from relative maps before calling remap(). I think that still holds.

What do you think? Should I try a stack allocation for m1AugmentedTile/m2AugmentedTile, or is it a bad idea?

@asmorkalov asmorkalov left a comment

👍

the purpose of the variable was to bring locality but did not show measurable performance improvement
@vpisarev vpisarev self-requested a review February 27, 2024 17:21
@vpisarev

ok, since @asmorkalov could not reproduce any regressions on his machines, let's merge it in!

@asmorkalov asmorkalov merged commit 5e5a035 into opencv:4.x Feb 28, 2024
@asmorkalov asmorkalov mentioned this pull request Feb 28, 2024
klatism pushed a commit to klatism/opencv that referenced this pull request May 17, 2024
First proposal of cv::remap with relative displacement field (opencv#24603) opencv#24621

asmorkalov pushed a commit to opencv/opencv_contrib that referenced this pull request Jul 24, 2024
first proposal of cv::remap with relative displacement field

Relates to [#24621](opencv/opencv#24621), [#24603](opencv/opencv#24603)

CUDA implementation of the feature

savuor pushed a commit to savuor/opencv that referenced this pull request Nov 8, 2024
savuor pushed a commit to savuor/opencv that referenced this pull request Nov 21, 2024
4 participants