
EE50013 Autonomous Navigation

Assignment: Object Detection using Webcam with ROS


Achyut Morang – [email protected]
March 16, 2025

1 Introduction
Object detection plays a fundamental role in modern robotics and autonomous systems, enabling machines to perceive and understand their environment. With advancements in deep learning, real-time object detection has become more efficient and accurate, making it crucial for applications such as autonomous vehicles, industrial automation, and surveillance [3, 1, 2]. This project focuses on integrating a real-time object detection system into the ROS 2 framework using YOLOv8.
The motivation for this experiment stems from the increasing need for robust and efficient
perception systems in autonomous navigation. Traditional object detection methods often
suffer from high computational costs or limited generalization to real-world scenarios. YOLO
(You Only Look Once) is known for its ability to perform object detection with high accuracy
and speed, making it ideal for real-time robotic applications [3, 2]. By implementing YOLOv8
in ROS 2, this project aims to explore its feasibility in a robotic vision pipeline.
The system is designed to capture live images from a webcam, publish them to a ROS
topic, and process them for object detection. The detected objects are visualized with
bounding boxes and confidence scores, and detection results are stored in a structured format
for further analysis. This implementation demonstrates how deep learning-based object
detection can be effectively integrated into a real-time ROS 2-based perception system, which
is crucial for smart mobility applications and autonomous decision-making in robots [4, 5].

2 System Design
2.1 Development Environment
The project was implemented in a virtualized Ubuntu environment running on macOS using
UTM as the virtualization software. Since ROS 2 is not natively supported on macOS, the
VM setup was necessary to ensure compatibility with ROS 2 Humble Hawksbill. A bridged
network configuration was used to enable seamless file transfers and remote execution.

ROS 2 was chosen as the middleware due to its modernized publisher-subscriber architecture, support for real-time applications, and integration with robotic frameworks. The YOLOv8 model was selected for its balance between accuracy and efficiency in object detection tasks.

2.2 Object Detection Pipeline


The system followed a modular architecture, enabling image streaming, object detection,
and result visualization. The major components were:

• Image Publisher: Captures webcam frames and publishes them to the /image_raw topic.

• YOLO Detector: Subscribes to /image_raw, processes frames using YOLOv8, and publishes detected objects to /detection_results.

• Result Logger: Stores detection results in a structured JSON format for further
analysis.

• Bag File Recorder: Logs both raw images and detection results for offline analysis.

2.3 ROS 2 Topics and Message Flow


The system was built using a publisher-subscriber model in ROS 2. The two main topics
used for communication were:

• /image_raw: Published by the webcam node to stream real-time images.

• /detection_results: Published by the YOLO detector node with the object detection outputs.

Since custom message generation in ROS 2 was unsuccessful, a workaround was implemented by using the standard std_msgs/String message type for detection results.
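
As a rough illustration of this workaround (the field names below are assumptions made for illustration, not taken from the project code), each detection can be serialized to JSON and wrapped in a std_msgs/String message:

```python
# Hypothetical sketch of the String-based workaround: one detection is
# serialized to JSON and published on /detection_results as std_msgs/String.
import json
from std_msgs.msg import String

def make_detection_msg(class_name, confidence, box_xyxy):
    """Pack a single detection into a std_msgs/String message (assumed schema)."""
    msg = String()
    msg.data = json.dumps({
        "class": class_name,            # e.g. "person"
        "confidence": round(confidence, 2),
        "bbox_xyxy": box_xyxy,          # [x1, y1, x2, y2] in pixels
    })
    return msg
```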

2.4 Challenges Faced


Several challenges were encountered during development, requiring iterative debugging and
improvements:

• ROS 2 Message Generation Issues: Custom messages were not successfully built
due to CMake and package.xml errors, leading to the use of standard message types.

• Library Dependencies: Compatibility issues arose with OpenCV, PyTorch, and Ultralytics, requiring multiple troubleshooting steps.

• Webcam Detection Issues: The correct device mapping had to be manually specified due to multiple video input sources.

• Storage and Transfer Constraints: The ROS 2 bag files were large (over 400MB)
and had to be compressed for efficient storage and submission.

• Visualization Issues: OpenCV’s GUI functions required additional configuration when running inside a virtual machine.

2.5 Final Architecture


The final implementation followed a structured design:

• Publisher: image_publisher → Publishes images to /image_raw.

• Subscriber: yolo_detector → Processes images and publishes results to /detection_results.

• Logger: Saves detections to detections.json.

• Bag File Recorder: Captures and stores image and detection data for offline review.

This design ensures modularity, real-time processing, and efficient data logging.

3 Implementation Steps
3.1 Setting Up the ROS 2 Workspace
A ROS 2 workspace was initialized with the standard structure. The object detection
package was created inside the src directory. The necessary ROS 2 dependencies were
configured in package.xml and CMakeLists.txt to support Python-based execution.

3.2 Writing the Image Publisher Node


The image_publisher.py script was developed to:

• Capture real-time images from the webcam.

• Convert them into ROS 2-compatible messages.

• Publish them on the /image_raw topic.

The node was tested independently to ensure correct image streaming.
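
A minimal sketch of such a node is given below; the frame rate, device index, and image encoding are assumptions made for illustration rather than the exact values used in image_publisher.py:

```python
# Minimal sketch of an image publisher node, assuming rclpy, OpenCV, and
# cv_bridge are installed; device index 0 is a placeholder for the webcam.
import cv2
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

class ImagePublisher(Node):
    def __init__(self):
        super().__init__('image_publisher')
        self.pub = self.create_publisher(Image, '/image_raw', 10)
        self.cap = cv2.VideoCapture(0)      # adjust index if several video devices exist
        self.bridge = CvBridge()
        self.timer = self.create_timer(0.1, self.publish_frame)  # ~10 Hz

    def publish_frame(self):
        ok, frame = self.cap.read()
        if not ok:
            self.get_logger().warning('Failed to read frame from webcam')
            return
        # Convert the OpenCV BGR frame into a ROS 2 sensor_msgs/Image and publish it
        self.pub.publish(self.bridge.cv2_to_imgmsg(frame, encoding='bgr8'))

def main():
    rclpy.init()
    node = ImagePublisher()
    try:
        rclpy.spin(node)
    finally:
        node.cap.release()
        node.destroy_node()
        rclpy.shutdown()

if __name__ == '__main__':
    main()
```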

3.3 Implementing the YOLOv8 Detector Node
The yolo_detector.py script was designed to:

• Subscribe to the /image_raw topic to receive images.
• Process each frame using the YOLOv8 model.
• Extract bounding boxes, confidence scores, and class names.
• Publish the results to the /detection_results topic.
• Log the results in detections.json for further analysis.

Additionally, the detection confidence threshold was lowered so that objects with low confidence scores were still reported, ensuring that even uncertain detections were logged.
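
The sketch below outlines one possible structure for such a node. The 0.25 threshold follows the report; the model variant (yolov8n.pt) and the line-delimited logging format are assumptions made for illustration:

```python
# Rough sketch of the detector node: subscribes to /image_raw, runs YOLOv8,
# republishes detections as a JSON string, and appends them to detections.json.
import json
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from std_msgs.msg import String
from cv_bridge import CvBridge
from ultralytics import YOLO

class YoloDetector(Node):
    def __init__(self):
        super().__init__('yolo_detector')
        self.model = YOLO('yolov8n.pt')                     # assumed model variant
        self.bridge = CvBridge()
        self.pub = self.create_publisher(String, '/detection_results', 10)
        self.create_subscription(Image, '/image_raw', self.on_image, 10)

    def on_image(self, msg):
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')
        result = self.model(frame, conf=0.25, verbose=False)[0]  # low threshold keeps uncertain detections
        detections = []
        for box in result.boxes:
            detections.append({
                'class': result.names[int(box.cls)],
                'confidence': float(box.conf),
                'bbox_xyxy': [float(v) for v in box.xyxy[0]],
            })
        self.pub.publish(String(data=json.dumps(detections)))
        # Append one JSON array per frame to detections.json (assumed logging scheme)
        with open('detections.json', 'a') as f:
            f.write(json.dumps(detections) + '\n')

def main():
    rclpy.init()
    rclpy.spin(YoloDetector())

if __name__ == '__main__':
    main()
```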

3.4 Configuring ROS 2 Topics


The ROS 2 topics were verified to ensure proper message flow. The two key topics used
were:

• /image_raw: Streaming real-time frames from the webcam.

• /detection_results: JSON-formatted object detection results.

3.5 Running and Testing the System


The object detection system was executed and tested in real time. Debugging involved:

• Ensuring ROS 2 was sourced properly before running nodes.
• Checking the availability of expected ROS 2 topics.
• Verifying frame processing and detection accuracy.
• Analyzing logged detection results.

3.6 Recording and Managing the ROS 2 Bag File


For post-processing and offline analysis, a ROS 2 bag file was recorded using the ros2 bag record command. Unlike ROS 1, which stores bag data in a single .bag file, ROS 2 bags use an SQLite-based .db3 format by default, requiring rosbag2-aware tools for playback. The following topics were logged (a short inspection sketch follows the list):

• /image_raw – Stores all captured webcam images.

• /detection_results – Logs object detection outputs.
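
As one way to sanity-check the recording without full playback, the .db3 file can be queried directly. The sketch below assumes the default rosbag2 SQLite schema (topics and messages tables) and uses a placeholder bag path:

```python
# Hedged sketch: count messages per topic directly from the rosbag2 .db3 file.
# Assumes the default SQLite storage schema; the bag path is a placeholder.
import sqlite3

def count_messages(bag_db_path='rosbag2_recording/rosbag2_recording_0.db3'):
    conn = sqlite3.connect(bag_db_path)
    rows = conn.execute(
        "SELECT t.name, COUNT(m.id) "
        "FROM topics t LEFT JOIN messages m ON m.topic_id = t.id "
        "GROUP BY t.name"
    ).fetchall()
    conn.close()
    for name, count in rows:
        print(f'{name}: {count} messages')

if __name__ == '__main__':
    count_messages()
```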

3.7 Logging and Storing Detection Results
The detection results were stored in detections.json. The logging mechanism was modified
to:

• Append new detections instead of overwriting previous ones.

• Maintain consistency in formatting for easy visualization.

• Store bounding box coordinates, object class, and confidence scores.

The recorded JSON data was later analyzed to extract meaningful insights, including
object occurrence frequency, confidence distribution, and trends across multiple runs.
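
A small analysis script along these lines could produce the frequency and confidence statistics; the exact layout of detections.json is not reproduced in this report, so a line-delimited list of detection dicts (as in the earlier sketches) is assumed:

```python
# Illustrative analysis of the logged detections, assuming one JSON array of
# detection dicts per line in detections.json.
import json
from collections import Counter
from statistics import mean

classes, confidences = [], []
with open('detections.json') as f:
    for line in f:
        for det in json.loads(line):
            classes.append(det['class'])
            confidences.append(det['confidence'])

print('Most common classes:', Counter(classes).most_common(5))
print(f'Mean confidence: {mean(confidences):.2f}, '
      f'min: {min(confidences):.2f}, max: {max(confidences):.2f}')
```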

Figure 1: Frequency distribution of detected object classes.

4 Results and Observations


The implemented object detection system successfully captured live webcam images, processed them using YOLOv8, and logged detections in both ROS 2 topics and structured JSON files. The experiment provided insights into real-time object detection, confidence score distribution, and the effectiveness of ROS 2 for modular robotic applications.

4.1 Key Findings


The key observations from the experiment are summarized as follows:

• The majority of detections were persons, with a total of 679 instances recorded,
highlighting the model’s strong performance in detecting humans in the scene.

• Other frequently detected objects included cell phones (66 times), bottles (60
times), remotes (38 times), and vases (26 times), reflecting common objects in
the test environment.

• Less frequently detected objects included refrigerators (8 times), laptops (2 times), chairs (1 time), and a single instance of a banana, demonstrating the variability in the dataset.

• The confidence score statistics revealed that:

– The average confidence score across all detections was 0.83, indicating a generally reliable detection performance.
– The highest confidence recorded was 0.98, while the lowest was 0.25.
– 75% of the detected objects had confidence scores above 0.93, ensuring
high reliability in most predictions.
– A minimum confidence threshold of 0.25 was used to log even lower-confidence
detections, which provided insight into borderline predictions.

Figure 2: Confidence score distribution of detected objects.

4.2 Detection Analysis
A frequency analysis of detected objects showed that certain classes were identified far more
often than others. Figure 1 illustrates the distribution of detected objects, with dominant
categories being people and commonly used items. This pattern was influenced by the
camera placement, the scene composition, and the presence of dynamic elements such as
moving persons.

Figure 3: Sample detection: A cell phone identified with bounding box and confidence score.

An important observation was the confidence score distribution of detections. Figure 2 shows that most detections had confidence scores above 0.60, indicating that the model was generally confident in its predictions. However, a small percentage of detections had confidence below 0.30, suggesting occasional false positives or ambiguous detections. Adjusting the confidence threshold allows a trade-off between precision and recall.

4.3 Sample Detections


The system effectively detected various objects in real time. Figure 3 illustrates a sample detection in which a cell phone was successfully identified. The bounding boxes and confidence scores are clearly displayed, demonstrating the robustness of the YOLOv8 model.

4.4 Performance and System Behavior


The system was tested in real time, achieving an average inference time of approximately 100 ms per frame. Performance was influenced by factors such as hardware limitations in the virtualized environment, input frame resolution, and the number of objects present in each frame.
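
For reference, per-frame inference time can be estimated with a simple timing loop; the sketch below uses a dummy frame and the yolov8n.pt variant as placeholders rather than the report's actual setup:

```python
# Hedged sketch of how per-frame inference time can be measured offline.
import time
import numpy as np
from ultralytics import YOLO

model = YOLO('yolov8n.pt')                          # assumed model variant
frame = np.zeros((480, 640, 3), dtype=np.uint8)     # stand-in for a webcam frame

times = []
for _ in range(20):
    t0 = time.perf_counter()
    model(frame, conf=0.25, verbose=False)
    times.append((time.perf_counter() - t0) * 1000.0)

print(f'Average inference time: {sum(times) / len(times):.1f} ms per frame')
```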

For offline analysis, a ROS 2 bag file was recorded, capturing both raw images and detection results. Unlike ROS 1, where bag files are stored as a single binary, ROS 2 bag files use an SQLite-based .db3 format. The recorded topics included /image_raw for webcam frames and /detection_results for object metadata. The bag file was compressed and archived as detection_data_ros2bag.zip, submitted alongside detections.json and a demonstration video demo_detection.mp4.

5 Conclusion
This experiment demonstrated the feasibility of using ROS 2 and YOLOv8 for real-time object detection. The structured logging in JSON allowed for post-processing and visualization, aiding in a quantitative assessment of system performance. The findings suggest that fine-tuning confidence thresholds and applying computational optimizations can further improve the system. Future work could explore multi-camera setups, object tracking, and integration with robotic motion planning to enhance real-world applicability.

References
[1] Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. “YOLOv4: Optimal Speed and Accuracy of Object Detection”. In: arXiv preprint arXiv:2004.10934 (2020).
[2] Glenn Jocher, Ayush Chaurasia, Jing Qiu, et al. YOLOv8: Cutting-Edge, Real-Time Object Detection and Segmentation. Available at https://github.com/ultralytics/ultralytics. 2023.
[3] Joseph Redmon and Ali Farhadi. “YOLO9000: Better, Faster, Stronger”. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), pp. 7263–7271.
[4] Open Robotics. ROS 2: Robot Operating System. Available at https://docs.ros.org/en/rolling/. 2023.
[5] Mujahed Talha et al. “ROS 2: An Overview of the Next Generation Robotics Middleware”. In: arXiv preprint arXiv:2101.00689 (2021).
