-
-
Notifications
You must be signed in to change notification settings - Fork 56.4k
Yolo Object Detection API #24691
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Yolo Object Detection API #24691
Conversation
|
I have already start one pr #24267 for yolox |
This is a unified API for all yolo detectors |
wrong it's like my pr but for all yolo models |
Sorry, meant API for yolo detectors |
I made this PR because of this |
| CV_DEPRECATED_EXTERNAL // avoid using in C++ code (need to fix bindings first) | ||
| YoloObjectDetector(); | ||
|
|
||
| CV_WRAP void detect(InputArray frame, CV_OUT std::vector<int>& classIds, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not have this here.
User uses DetectionModel::detect() from the base class instead. Inheritance is designed for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type of box rectangle is different from detection model's detect method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How user and bindings should work with that?
Inheritance is not just a word.
Why DetectionModel doesn't need that method overload?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have just checked the bindings. Call to detect method in python returns floating points for boxes as expected.
So, I do not understand, what is the issue?
I have overridden the detect method since I changed box from Rect to Rect2d.
Do you want me to keep it as in the ancestor class? If yes, why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason for object detection use Rect2d? All subsequent actions with detected object require conversion to Rect2i anyway (ROI subtraction, drawing).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that regard you are right, there is no point. I was coming from the point of view "what base model outputs, opencv should return it as well". You suggest to change it to Rect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway we apply some postprocessing so output is not an origin. So yes, integer type is enough for such a high level API and I cannot image the case when float/double precision is required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we still have this here?
User uses DetectionModel::detect() from the base class instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I overload detect in Imlp class. Without definition in the base class in fails to build. Could can check on your side
| CV_OUT std::vector<float>& confidences, CV_OUT std::vector<Rect2d>& boxes, | ||
| float confThreshold = 0.5f, float nmsThreshold = 0.0f); | ||
|
|
||
| CV_WRAP void postProccess( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better to move it outside of this class with appropriate names. Like NMSBoxes()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made it static
|
@dkurt @akarsakov @opencv-alalek some points on which I need you opinion
|
modules/dnn/src/model.cpp
Outdated
| ImagePaddingMode paddingMode; | ||
| YoloVersion yoloVersion; | ||
| float padValue; | ||
| float Height, Width; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| float Height, Width; | |
| Size frameSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when I use Size instead of floats model predictions start drifting for some unknown reason
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Size2f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably because of integer division. Please rename at least to lower case height and width
|
modules/dnn/src/model.cpp
Outdated
| padValue = padValue_; | ||
| } | ||
|
|
||
| void boxRescale(CV_OUT std::vector<Rect>& boxes){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use blobRectToImageRect by @LaurentBerger?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be done in a separate PR
| CV_DEPRECATED_EXTERNAL // avoid using in C++ code (need to fix bindings first) | ||
| YoloObjectDetector(); | ||
|
|
||
| CV_WRAP void detect(InputArray frame, CV_OUT std::vector<int>& classIds, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we still have this here?
User uses DetectionModel::detect() from the base class instead.
modules/dnn/src/model.cpp
Outdated
| ImagePaddingMode paddingMode; | ||
| YoloVersion yoloVersion; | ||
| float padValue; | ||
| float Height, Width; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Size2f
modules/dnn/src/model.cpp
Outdated
| std::vector<Mat> detections; | ||
| impl.dynamicCast<YOLODetectionModel_Impl>()->processFrame(frame, detections); | ||
|
|
||
| postProccess( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No business logic is allowed in PImpl methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where should this logic exist if not in overridden method of child class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obviously in implementation class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then why does DetectionModel have the same "business logic" implemented in the main class? Take a look here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be refactored too (moved to Impl part).
Check TextDetectionModel => TextDetectionModel_EAST / TextDetectionModel_DB
Other minor fixes from PR
| for(auto & detection : detections){ | ||
| cv::transposeND(detection, {0, 2, 1}, detection); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move to the loop below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| } | ||
|
|
||
| void YOLODetectionModel::postProccess( | ||
| std::vector<Mat>& detections, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| std::vector<Mat>& detections, | |
| const std::vector<Mat>& detections, |
| } else { | ||
| CV_Error(Error::StsNotImplemented, "Unsupported padding mode"); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code style issue
| net.forward(outs, outNames); | ||
| } | ||
|
|
||
| void setPaddingValue(const float padValue_){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const can be avoided
| YOLODetectionModel::YOLODetectionModel(const String& model, const String& config) | ||
| { | ||
| impl = makePtr<YOLODetectionModel_Impl>(); | ||
| impl->initNet(readNet(model, config)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So let's use readNetFromDarknet
| for (auto preds : detections) | ||
| { | ||
| if (!darknet) | ||
| preds = preds.reshape(1, preds.size[1]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This step is valid for every model. So can we remove the if condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not valid. For yolov5 and larger model not valid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please share the shapes for them?
|
|
||
| if (!darknet && detections[0].size[1] < detections[0].size[2]) { | ||
| yolov8 = true; // Set the correct flag based on tensor shape | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
darknet flag can be removed completely considering https://github.com/opencv/opencv/pull/24691/files#r1434763871 and the following change:
if (detections[0].dims == 3 && detections[0].size[1] < detections[0].size[2]) {
yolov8 = true; // Set the correct flag based on tensor shape
}150b264 to
2899387
Compare
Documentation for Yolo usage in Opencv #24898 This PR introduces documentation for the usage of yolo detection model family in open CV. This is not to be merge before #24691, as the sample will need to be changed. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake
This PR introduces a unified API for yolo object detection family.
Note: This is a preliminary PR, it is not to merged but is proposed to get feedback
Issue to fix:
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.