🐛 Fix CUDA for old GPUs without FP16 support #25880
modules/dnn/src/registry.cpp
Outdated
    if (cuda4dnn::doesDeviceSupportFP16())
        backends.push_back(std::make_pair(DNN_BACKEND_CUDA, DNN_TARGET_CUDA_FP16));
This works well in a single-GPU configuration. However, the current GPU may be old and lack FP16 support while a second GPU does support it. That is a common setup: one GPU is used for rendering and the other for compute.
It is fine to check the CURRENT CUDA device.
Let's leave it to the caller to select the device properly via cudaSetDevice before these calls.
BTW, the OpenCL backend has similar problems, but it looks like they are handled elsewhere.
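To make the trade-off in this thread concrete, here is a minimal sketch in plain C++ with no CUDA dependency. `Device`, `supportsFP16`, and `availableTargets` are hypothetical names for illustration, not OpenCV API, and the 5.3 compute-capability threshold is an assumption taken from the `CUDA_ARCH_BIN < 53` condition in the PR description. In real code the "current" device is whatever the caller selected via cudaSetDevice; here it is just an index.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a CUDA device; only the compute capability matters here.
struct Device {
    int major;
    int minor;
};

// Assumption: FP16 needs compute capability 5.3 or newer.
bool supportsFP16(const Device& d) {
    return 10 * d.major + d.minor >= 53;
}

// Lists the targets for the device at index `current` only.
// Other devices in the system are deliberately ignored, which mirrors
// the "check the current device" approach discussed above.
std::vector<std::string> availableTargets(const std::vector<Device>& devices, int current) {
    assert(current >= 0 && current < static_cast<int>(devices.size()));
    std::vector<std::string> targets;
    targets.push_back("DNN_TARGET_CUDA");
    if (supportsFP16(devices[current]))
        targets.push_back("DNN_TARGET_CUDA_FP16");
    return targets;
}
```

With an old rendering GPU at index 0 and a newer compute GPU at index 1, the FP16 target shows up only when the caller has selected index 1 first, which is why the device selection has to happen before the target list is queried.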
I pushed a fix for the issue and extended the check to target management. @opencv-alalek could you take a look again?
@Jamim Could you pull the branch from GitHub and test the last commit with your GPU?
> Could you pull the branch from GitHub and test the last commit with your GPU?
Sorry, I'm not at home now. I'll be able to do so in the evening.
Jamim left a comment
Hello @asmorkalov,
I've tested your changes and they work well on my system.
Also, I have a couple of minor suggestions. Please take a look.
Co-authored-by: Aliaksei Urbanski <[email protected]>
Fixes #21461
This is a build-time solution that reflects https://github.com/opencv/opencv/blob/4.10.0/modules/dnn/src/cuda4dnn/init.hpp#L68-L82. We shouldn't add an invalid target while building with CUDA_ARCH_BIN < 53 (please see this discussion).
This is a run-time solution that basically reverts these lines.
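The build-time idea above can be sketched in plain C++ as follows. This is an illustration under stated assumptions, not the code from this PR: `SKETCH_MIN_CUDA_ARCH` is a hypothetical macro standing in for the lowest architecture listed in `CUDA_ARCH_BIN`, and `archSupportsFP16`, `kFP16CompiledIn`, and `targetsForArch` are made-up names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the lowest architecture in CUDA_ARCH_BIN,
// e.g. 50 for a build that targets compute capability 5.0.
#ifndef SKETCH_MIN_CUDA_ARCH
#define SKETCH_MIN_CUDA_ARCH 50
#endif

// Assumption: FP16 kernels need architecture 53 or newer.
constexpr bool archSupportsFP16(int arch) {
    return arch >= 53;
}

// Evaluated at compile time, so a build for an older architecture
// never advertises the FP16 target in the first place.
constexpr bool kFP16CompiledIn = archSupportsFP16(SKETCH_MIN_CUDA_ARCH);

// Same decision as a regular function, parameterized for illustration.
std::vector<std::string> targetsForArch(int minArch) {
    assert(minArch > 0);
    std::vector<std::string> targets{"DNN_TARGET_CUDA"};
    if (archSupportsFP16(minArch))
        targets.push_back("DNN_TARGET_CUDA_FP16");
    return targets;
}
```

A build-time guard like this only describes what was compiled in. A run-time check of the selected device is still needed when the binary targets newer architectures but runs on an older GPU.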
I've debugged these changes, coupled with other fixes, on Gentoo Linux, and the related tests passed on my laptop with a GeForce GTX 960M.
Alternative solution:
Best regards!
Pull Request Readiness Checklist
n/a There is accuracy test, performance test and test data in opencv_extra repository, if applicable
n/a The feature is well documented and sample code can be built with the project CMake