@cudawarped cudawarped commented Jan 10, 2023

This fixes issue #3412, where incorrect results were returned for GpuMats with a single row. The problem was caused by an error in #3378.

Unfortunately, because GpuMats with a single row or column are allocated with cudaMalloc rather than cudaMallocPitch, and 2D texture objects require pitched memory, this PR introduces extra staging memory when the row or column dimension is 1. This is a quick fix that only adds overhead for single-dimension GpuMats, whose use should not be that common.
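As a rough illustration of the staging step described above, here is a host-only C++ sketch. It does not touch the CUDA runtime or reproduce the code in this PR. `MatView`, `needsStaging` and `stageToPitched` are hypothetical names, and the pitch alignment is passed in as a plain parameter (on a real device it would come from the device properties, and the copy would normally go through cudaMallocPitch and cudaMemcpy2D).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a GpuMat header: only the fields this sketch needs.
struct MatView {
    const unsigned char* data;
    int rows;
    int cols;
    std::size_t elemSize; // bytes per element
    std::size_t step;     // bytes between the starts of consecutive rows
};

// Round n up to the next multiple of alignment.
std::size_t alignUp(std::size_t n, std::size_t alignment) {
    return (n + alignment - 1) / alignment * alignment;
}

// Per the PR text, only single-row or single-column matrices are
// allocated without a pitch, so only those need a staging copy.
bool needsStaging(const MatView& m) {
    return m.rows == 1 || m.cols == 1;
}

// Copy m row by row into a buffer whose row stride is a multiple of
// pitchAlignment. The chosen stride is returned through pitchOut.
std::vector<unsigned char> stageToPitched(const MatView& m,
                                          std::size_t pitchAlignment,
                                          std::size_t& pitchOut) {
    const std::size_t rowBytes = static_cast<std::size_t>(m.cols) * m.elemSize;
    pitchOut = alignUp(rowBytes, pitchAlignment);
    std::vector<unsigned char> staged(pitchOut * static_cast<std::size_t>(m.rows), 0);
    for (int y = 0; y < m.rows; ++y) {
        std::memcpy(staged.data() + static_cast<std::size_t>(y) * pitchOut,
                    m.data + static_cast<std::size_t>(y) * m.step,
                    rowBytes);
    }
    return staged;
}
```

For a single-column matrix the staged copy is much larger than the source, since each one-element row is padded out to a full pitch. That is the extra overhead mentioned above.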

A "better" fix would be to rework the TexturePtr class so that

__device__ __forceinline__ R operator ()(index_type y, index_type x) const;

statically dispatches a call to tex1Dfetch when either the x or y dimension is 1, but this will be a more significant change.
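The static dispatch idea could look roughly like the following host-only C++ sketch. It is not the TexturePtr class from OpenCV. `TexturePtrSketch`, `Linear1D` and `Pitched2D` are hypothetical names, plain array reads stand in for tex1Dfetch and tex2D, and the layout is assumed to be chosen by the caller as a template argument.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tags naming the two fetch strategies.
struct Linear1D {};  // continuous memory, stands in for tex1Dfetch
struct Pitched2D {}; // pitched memory, stands in for tex2D

template <typename T, typename Layout>
class TexturePtrSketch {
public:
    // stepElems is the row stride in elements (only used by Pitched2D).
    TexturePtrSketch(const T* data, int cols, std::size_t stepElems)
        : data_(data), cols_(cols), step_(stepElems) {}

    // Same call signature for both layouts. The overload is selected
    // at compile time from the Layout tag, so there is no runtime branch.
    T operator()(int y, int x) const { return fetch(y, x, Layout{}); }

private:
    // Single linear index: valid when the data has no row padding.
    T fetch(int y, int x, Linear1D) const {
        return data_[static_cast<std::size_t>(y) * cols_ + x];
    }

    // Row stride may exceed cols because of pitch padding.
    T fetch(int y, int x, Pitched2D) const {
        return data_[static_cast<std::size_t>(y) * step_ + x];
    }

    const T* data_;
    int cols_;
    std::size_t step_;
};
```

Because the matrix dimensions are only known at runtime, the caller would still need one branch on the host to pick which instantiation to launch, which is part of why this is the larger change.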

Depends on opencv/opencv#23126

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
force_builders=Custom
buildworker:Custom=linux-1
docker_image:Custom=ubuntu-cuda:16.04

@cudawarped cudawarped marked this pull request as draft January 11, 2023 07:14
@cudawarped cudawarped marked this pull request as ready for review January 11, 2023 14:52
Development

Successfully merging this pull request may close these issues.

CUDA Remap gives incorrect result or crashes for 1 row/column src Mats
