@Abdurrahheem
Contributor

This PR introduces a unified API for the YOLO family of object detectors.

Note: This is a preliminary PR. It is not meant to be merged yet; it is proposed to gather feedback.

Issues to fix:

  • Implement a sample for educational purposes
  • Add support for rescaling output boxes to the original image size
  • Add support for Darknet YOLO detectors (currently only ONNX models are supported)
  • Add support for image preprocessing for each YOLO version (not sure if we need this in the API explicitly)
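
The box-rescaling item above can be sketched roughly as follows. This is not the PR's implementation: it assumes a letterbox-style resize (one scale factor for both axes, equal padding on opposite sides), and the `Box` struct and `rescaleBox` function are hypothetical stand-ins that avoid any dependency on OpenCV types.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical plain box type (x, y is the top-left corner); stands in for cv::Rect2d.
struct Box { double x, y, width, height; };

// Map a box from network-input coordinates back to original-image coordinates.
// Assumption: the image was resized with a single scale factor and centered,
// so the padding is split equally between left/right and top/bottom.
Box rescaleBox(const Box& b, int netW, int netH, int imgW, int imgH)
{
    assert(netW > 0 && netH > 0 && imgW > 0 && imgH > 0);

    // Scale applied to the original image; computed in floating point.
    const double scale = std::min(static_cast<double>(netW) / imgW,
                                  static_cast<double>(netH) / imgH);

    // Padding added on the left and on the top of the resized image.
    const double padX = (netW - imgW * scale) / 2.0;
    const double padY = (netH - imgH * scale) / 2.0;

    Box out;
    out.x = (b.x - padX) / scale;   // undo padding, then undo scaling
    out.y = (b.y - padY) / scale;
    out.width = b.width / scale;    // sizes are only affected by scaling
    out.height = b.height / scale;
    return out;
}
```

For a 1280x720 image fed to a 640x640 network, the scale is 0.5 and the vertical padding is 140 pixels, so a box at (100, 200) in network coordinates maps to (200, 120) in the original image.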

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@LaurentBerger
Contributor

I have already started a PR, #24267, for YOLOX.

@Abdurrahheem
Contributor Author

Abdurrahheem commented Dec 12, 2023

> I have already started a PR, #24267, for YOLOX.

This is a unified API for all YOLO detectors.

@LaurentBerger
Contributor

LaurentBerger commented Dec 13, 2023

> > I have already started a PR, #24267, for YOLOX.
>
> This is a unified test for all YOLO detectors

Wrong. It is like my PR, but for all YOLO models. @fengyuentau

@Abdurrahheem
Contributor Author

> > > I have already started a PR, #24267, for YOLOX.
> >
> > This is a unified test for all YOLO detectors
>
> Wrong. It is like my PR, but for all YOLO models. @fengyuentau

Sorry, I meant an API for YOLO detectors.

@LaurentBerger
Contributor

> > > > I have already started a PR, #24267, for YOLOX.
> > >
> > > This is a unified test for all YOLO detectors
> >
> > Wrong. It is like my PR, but for all YOLO models. @fengyuentau
>
> Sorry, I meant an API for YOLO detectors.

I made this PR because of this

CV_DEPRECATED_EXTERNAL // avoid using in C++ code (need to fix bindings first)
YoloObjectDetector();

CV_WRAP void detect(InputArray frame, CV_OUT std::vector<int>& classIds,

Contributor

We should not have this here.
Users should call DetectionModel::detect() from the base class instead. Inheritance is designed for that.

Contributor Author

The box rectangle type differs from the one used by DetectionModel's detect method.

Contributor

How should users and bindings work with that?
Inheritance is not just a word.
Why doesn't DetectionModel need that method overload?

Contributor Author

I have just checked the bindings. A call to the detect method in Python returns floating-point values for the boxes, as expected.
So I do not understand what the issue is.
I have overridden the detect method because I changed the box type from Rect to Rect2d.
Do you want me to keep it as in the ancestor class? If yes, why?

Member

Is there any reason for object detection to use Rect2d? All subsequent actions with a detected object require conversion to Rect2i anyway (ROI subtraction, drawing).

Contributor Author

In that regard you are right, there is no point. I was coming from the point of view that "whatever the base model outputs, OpenCV should return as well". Do you suggest changing it to Rect?

Member

We apply some postprocessing anyway, so the output is not the original model output. So yes, an integer type is enough for such a high-level API, and I cannot imagine a case where float/double precision is required.
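
To illustrate the conversion discussed above, here is a minimal sketch of turning a floating-point box into an integer one. `BoxF`, `BoxI`, and `toIntBox` are hypothetical names standing in for `Rect2d` and `Rect`; rounding the two corners separately is one possible choice, not necessarily what OpenCV does when converting between rectangle types.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for cv::Rect2d and cv::Rect.
struct BoxF { double x, y, width, height; };
struct BoxI { int x, y, width, height; };

// Round both corners to the nearest pixel and rebuild the size from them,
// so the right and bottom edges are rounded consistently with the origin.
BoxI toIntBox(const BoxF& b)
{
    assert(b.width >= 0.0 && b.height >= 0.0);
    const int x0 = static_cast<int>(std::lround(b.x));
    const int y0 = static_cast<int>(std::lround(b.y));
    const int x1 = static_cast<int>(std::lround(b.x + b.width));
    const int y1 = static_cast<int>(std::lround(b.y + b.height));
    return BoxI{x0, y0, x1 - x0, y1 - y0};
}
```

The error introduced by this step is at most about half a pixel per edge, which is consistent with the point above that drawing and ROI operations work on integer coordinates anyway.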

Contributor

Why do we still have this here?

User uses DetectionModel::detect() from the base class instead.

Contributor Author

I overload detect in the Impl class. Without a definition in the base class, it fails to build. Could you check on your side?

CV_OUT std::vector<float>& confidences, CV_OUT std::vector<Rect2d>& boxes,
float confThreshold = 0.5f, float nmsThreshold = 0.0f);

CV_WRAP void postProccess(

Contributor

It is better to move it outside of this class and give it an appropriate name, like NMSBoxes().

Contributor Author

Made it static
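
As a rough illustration of what a free-standing postprocessing helper in the spirit of NMSBoxes() might look like, the sketch below implements plain greedy non-maximum suppression. It is not the PR's code and not OpenCV's implementation; `Box`, `iou`, and `nmsBoxesSketch` are hypothetical names, and the thresholds follow the usual convention (discard low scores, then suppress boxes that overlap a kept box too much).

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical plain box type; stands in for cv::Rect.
struct Box { float x, y, width, height; };

// Intersection over union of two axis-aligned boxes.
float iou(const Box& a, const Box& b)
{
    const float x0 = std::max(a.x, b.x);
    const float y0 = std::max(a.y, b.y);
    const float x1 = std::min(a.x + a.width, b.x + b.width);
    const float y1 = std::min(a.y + a.height, b.y + b.height);
    const float inter = std::max(0.0f, x1 - x0) * std::max(0.0f, y1 - y0);
    const float uni = a.width * a.height + b.width * b.height - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: returns indices of kept boxes, highest score first.
std::vector<int> nmsBoxesSketch(const std::vector<Box>& boxes,
                                const std::vector<float>& scores,
                                float scoreThreshold, float nmsThreshold)
{
    assert(boxes.size() == scores.size());

    // Visit candidates in order of decreasing score.
    std::vector<int> order(boxes.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return scores[a] > scores[b]; });

    std::vector<int> keep;
    for (int idx : order)
    {
        if (scores[idx] < scoreThreshold)
            break;  // sorted order: every remaining score is lower still

        bool suppressed = false;
        for (int k : keep)
        {
            if (iou(boxes[idx], boxes[k]) > nmsThreshold)
            {
                suppressed = true;
                break;
            }
        }
        if (!suppressed)
            keep.push_back(idx);
    }
    return keep;
}
```

Because the function takes only boxes, scores, and thresholds, it does not need access to any detector state, which is the property that makes it a candidate for living outside the class.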

@Abdurrahheem
Contributor Author

@dkurt @akarsakov @opencv-alalek some points on which I need your opinion:

  1. Do we need image preprocessing as a part of the API?
  2. Do we need image rescaling as a part of the API?

ImagePaddingMode paddingMode;
YoloVersion yoloVersion;
float padValue;
float Height, Width;

Member

Suggested change
- float Height, Width;
+ Size frameSize;

Contributor Author

When I use Size instead of floats, the model predictions start drifting for some unknown reason.

Contributor

Size2f

Member

@dkurt Dec 19, 2023

Probably because of integer division. Please at least rename them to lowercase height and width.
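
The integer-division guess can be illustrated with a small sketch. The PR code that computes the scale is not shown in this thread, so the two functions below are hypothetical; they only demonstrate how a scale factor computed from integer dimensions loses its fractional part unless one operand is converted to floating point first.

```cpp
#include <cassert>

// Hypothetical helper: both operands are int, so the division truncates
// before the result is converted to double.
double scaleWithIntegerDivision(int netWidth, int imgWidth)
{
    assert(imgWidth > 0);
    return netWidth / imgWidth;
}

// Hypothetical helper: converting one operand first keeps the fraction.
double scaleWithFloatDivision(int netWidth, int imgWidth)
{
    assert(imgWidth > 0);
    return static_cast<double>(netWidth) / imgWidth;
}
```

With a 640-pixel-wide network input and a 1280-pixel-wide image, the first version yields 0 while the second yields 0.5. Storing the size as floats, or as Size2f as suggested above, avoids the truncation.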

@asmorkalov
Contributor

 modules/dnn/include/opencv2/dnn/dnn.hpp:1655: trailing whitespace.
+     {   

@asmorkalov asmorkalov marked this pull request as ready for review December 17, 2023 10:13
padValue = padValue_;
}

void boxRescale(CV_OUT std::vector<Rect>& boxes){

Member

Can be done in a separate PR

std::vector<Mat> detections;
impl.dynamicCast<YOLODetectionModel_Impl>()->processFrame(frame, detections);

postProccess(

Contributor

@opencv-alalek Dec 18, 2023

No business logic is allowed in PImpl methods.

Contributor Author

Where should this logic live, if not in the overridden method of the child class?

Contributor

Obviously in implementation class.

Contributor Author

@Abdurrahheem Dec 22, 2023

Then why does DetectionModel have the same "business logic" implemented in the main class? Take a look here

Contributor

It should be refactored too (moved to Impl part).

Check TextDetectionModel => TextDetectionModel_EAST / TextDetectionModel_DB
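
The structure being asked for here can be sketched as a generic PImpl class in which the public method only forwards and the logic sits in the implementation struct. All names below are hypothetical and unrelated to the actual OpenCV classes; the sketch uses `std::unique_ptr` as an assumption for brevity, whereas the code quoted in this thread holds its implementation through `impl` and `makePtr`.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical public class: it exposes the API and owns an opaque Impl.
class DetectorSketch
{
public:
    DetectorSketch();
    ~DetectorSketch();

    // Public entry point: forwards to the implementation, no logic here.
    std::vector<int> detect(const std::vector<float>& scores,
                            float confThreshold) const;

private:
    struct Impl;
    std::unique_ptr<Impl> impl;
};

// The implementation struct holds the actual logic.
struct DetectorSketch::Impl
{
    // Return the indices of all scores at or above the threshold.
    std::vector<int> detect(const std::vector<float>& scores,
                            float confThreshold) const
    {
        assert(confThreshold >= 0.0f);
        std::vector<int> ids;
        for (int i = 0; i < static_cast<int>(scores.size()); ++i)
            if (scores[i] >= confThreshold)
                ids.push_back(i);
        return ids;
    }
};

// Defined after Impl is complete so unique_ptr can create and destroy it.
DetectorSketch::DetectorSketch() : impl(new Impl()) {}
DetectorSketch::~DetectorSketch() = default;

std::vector<int> DetectorSketch::detect(const std::vector<float>& scores,
                                        float confThreshold) const
{
    return impl->detect(scores, confThreshold);
}
```

Model-specific variants would then differ only in which implementation they install, presumably the pattern the TextDetectionModel classes named above are being pointed to for.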

for(auto & detection : detections){
cv::transposeND(detection, {0, 2, 1}, detection);
}
}

Member

Move to the loop below

Contributor Author

done

}

void YOLODetectionModel::postProccess(
std::vector<Mat>& detections,

Member

Suggested change
- std::vector<Mat>& detections,
+ const std::vector<Mat>& detections,

} else {
CV_Error(Error::StsNotImplemented, "Unsupported padding mode");
}
}

Member

Code style issue

net.forward(outs, outNames);
}

void setPaddingValue(const float padValue_){

Member

const can be avoided

YOLODetectionModel::YOLODetectionModel(const String& model, const String& config)
{
impl = makePtr<YOLODetectionModel_Impl>();
impl->initNet(readNet(model, config));

Member

So let's use readNetFromDarknet

for (auto preds : detections)
{
if (!darknet)
preds = preds.reshape(1, preds.size[1]);

Member

This step is valid for every model. So can we remove the if condition?

Contributor Author

Not valid. It does not hold for yolov5 and larger models.

Member

Can you please share the shapes for them?


if (!darknet && detections[0].size[1] < detections[0].size[2]) {
yolov8 = true; // Set the correct flag based on tensor shape
}

Member

@dkurt Dec 22, 2023

The darknet flag can be removed completely, considering https://github.com/opencv/opencv/pull/24691/files#r1434763871 and the following change:

    if (detections[0].dims == 3 && detections[0].size[1] < detections[0].size[2]) {
        yolov8 = true;  // Set the correct flag based on tensor shape
    }
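
The shape-based check suggested above can be sketched without OpenCV types as follows. The example shapes are assumptions based on commonly exported models ([1, 84, 8400] for a YOLOv8-style output, [1, 25200, 85] for a YOLOv5-style one) and are not taken from this thread; `needsTranspose` and `transpose2D` are hypothetical helpers, the latter standing in for the `cv::transposeND` call quoted earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True when a 3-D output looks like [batch, attributes, boxes], meaning it
// has fewer rows than columns and must be transposed before row-wise parsing.
bool needsTranspose(const std::vector<int>& shape)
{
    return shape.size() == 3 && shape[1] < shape[2];
}

// Transpose a row-major [rows x cols] matrix into a [cols x rows] one.
std::vector<float> transpose2D(const std::vector<float>& src, int rows, int cols)
{
    assert(rows >= 0 && cols >= 0);
    assert(src.size() == static_cast<std::size_t>(rows) * cols);

    std::vector<float> dst(src.size());
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            dst[static_cast<std::size_t>(c) * rows + r] =
                src[static_cast<std::size_t>(r) * cols + c];
    return dst;
}
```

Checking the number of dimensions first is what allows the separate darknet flag to go away: a 2-D output never enters the transposed branch, so no extra flag is needed to exclude it.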

@asmorkalov asmorkalov modified the milestones: 4.9.0, 4.10.0 Dec 22, 2023
asmorkalov pushed a commit that referenced this pull request Jan 31, 2024
Documentation for Yolo usage in Opencv #24898

This PR introduces documentation for using the YOLO detection model family in OpenCV. It is not to be merged before #24691, as the sample will need to be changed.


### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
@asmorkalov asmorkalov modified the milestones: 4.10.0, 4.11.0 Apr 27, 2024
@asmorkalov asmorkalov marked this pull request as draft September 13, 2024 11:03
@asmorkalov asmorkalov modified the milestones: 4.11.0, 5.0-release Dec 17, 2024
@asmorkalov asmorkalov closed this Oct 17, 2025