Robotics-Module 3
CS(AI), AI&DS
Lekshmi R
Muthoot Institute of Technology and Science
Robot Vision
• https://www.wiredworkers.io/blog/robot-vision-how-does-it-work-and-what-can-you-do-with-it/?srsltid=AfmBOoozdE3YYKb3sFTIRDyXfbaZYEkDrSxyu_-hDgkxPstj_8Eca0m
• Robot Vision involves using a combination of camera hardware and computer algorithms to
allow robots to process visual data from the world.
• For example, your system could have a 2D camera which detects an object for the robot to
pick up.
• A more complex example might be to use a 3D stereo camera to guide a robot to mount
wheels onto a moving vehicle.
• Without Robot Vision, your robot is essentially blind.
• This is not a problem for many robotic tasks, but for some applications Robot Vision is
useful or even essential.
• Robot Vision must incorporate aspects of robotics into its techniques and algorithms, such
as kinematics, reference frame calibration and the robot's ability to physically affect the
environment.
• Visual Servoing is a perfect example of a technique which can only be termed Robot
Vision, not Computer Vision.
• It involves controlling the motion of a robot by using the feedback of the robot's position as
detected by a vision sensor.
• Seven Stages of Robot Vision
• Robot vision systems process images step-by-step to interpret and interact with the
environment. Here are the seven key stages in robot vision:
1. Image Acquisition
2. Pre-processing
3. Segmentation
4. Feature Extraction
5. Image Recognition
6. Interpretation and Decision Making
7. Execution and Control
1. Image Acquisition
• Image acquisition is the first step in digital image processing.
• It involves capturing visual data from the environment using specialized sensors, which
are then processed for further analysis in robotics, AI, and machine vision applications.
• Image sensing refers to the process of detecting light or electromagnetic waves using a
sensor, while image acquisition involves converting this sensed data into a digital image
that can be processed by a computer or robotic system.
• A self-driving car’s camera captures images of the road, which are then processed to
detect pedestrians, traffic signs, and other vehicles.
• Key components of an image sensing system include:
1. Sensors (Image Sensors)
These are the core components that detect light and convert it into electrical signals.
The two main types of sensors are CCD (charge-coupled device) and CMOS (complementary metal-oxide-semiconductor).
2. Lenses
Lenses focus light onto the image sensor to create a sharp image.
• Fixed-focus lenses (used in industrial applications)
• Zoom lenses (adjustable field of view)
• Wide-angle lenses (capturing large areas)
• Telephoto lenses (focusing on distant objects)
• Example: A drone camera uses a wide-angle lens for surveying large areas.
3. Illumination (Lighting Systems)
• Proper lighting is critical for clear image acquisition. Different illumination techniques are
used depending on the application.
• Common Lighting Techniques:
• LEDs: Cost-effective, used in industrial settings.
• Infrared (IR) lighting: Used in night vision.
• Laser illumination: Used in 3D scanning and depth sensing.
Example: In automated inspection systems, backlighting is used to highlight object
edges.
4. Frame Grabbers & Digital Interfaces
• These are hardware components used to transfer image data from sensors to computers.
Frame Grabber: Captures frames in high-speed applications.
Interfaces:
• USB 3.0 (low-cost)
• GigE (for industrial applications)
• Camera Link (high-speed processing)
2. Preprocessing
• Image preprocessing is an essential step in computer vision and image analysis.
• It enhances image quality by reducing noise, improving contrast, and highlighting
important features.
• Some common preprocessing techniques include:
a. Filtering (Noise Reduction)
❖ Filtering smooths an image to reduce unwanted noise while preserving important
details.
❖ Example: Gaussian blur uses a Gaussian function to create a smoothing filter.
❖ Reduces high-frequency noise (random variations in brightness or color).
❖ Commonly used before edge detection to prevent false edges
❖ Application: Used in denoising, feature extraction, medical imaging, and facial recognition.
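As a minimal illustration of the filtering step, the sketch below applies a Gaussian blur with OpenCV in Python; the file names and the 5x5 kernel size are assumptions chosen for the example, not values from the slides.

```python
import cv2

# Load the input image (file name is only an example)
image = cv2.imread("scene.jpg")

# 5x5 Gaussian kernel; sigma = 0 lets OpenCV derive it from the kernel size
blurred = cv2.GaussianBlur(image, (5, 5), 0)

# The smoothed image can now be passed to later stages such as edge detection
cv2.imwrite("scene_blurred.jpg", blurred)
```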
b. Edge Detection
❖ Edge detection highlights boundaries between objects in an image.
❖ Common Methods:
Sobel Operator: Detects edges by calculating intensity gradients in horizontal and
vertical directions.
Canny Edge Detection: A multi-step algorithm that smooths the image, computes intensity
gradients, suppresses non-maximum responses, and uses hysteresis thresholding to keep
weak edges only where they connect to strong edges.
❖ Application: Used in object detection, image segmentation, and face recognition.
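A short sketch of both methods using OpenCV; the file name and the Canny thresholds are illustrative assumptions.

```python
import cv2

# Work on a grayscale image (file name is only an example)
gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

# Sobel: intensity gradients in the horizontal (x) and vertical (y) directions
grad_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
grad_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
magnitude = cv2.magnitude(grad_x, grad_y)   # combined edge strength

# Canny: smoothing, gradients, non-maximum suppression, hysteresis thresholding
edges = cv2.Canny(gray, 100, 200)           # thresholds are illustrative

cv2.imwrite("scene_edges.jpg", edges)
```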
c. Histogram Equalization (Contrast Enhancement)
❖ This technique improves contrast by redistributing the pixel intensity values across the image.
❖ Analyzes the image histogram (distribution of pixel intensities).
❖ Spreads out frequently occurring intensity values to enhance darker or lighter areas.
❖ Makes details in low-contrast regions more visible.
❖ Application: Used in medical imaging (X-rays, MRIs), satellite imagery
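A minimal sketch of histogram equalization using OpenCV's equalizeHist; the X-ray file name is an assumption.

```python
import cv2

# Histogram equalization works on single-channel (grayscale) images
gray = cv2.imread("xray.jpg", cv2.IMREAD_GRAYSCALE)   # file name is only an example

# Redistribute pixel intensities so the histogram covers the full 0-255 range
equalized = cv2.equalizeHist(gray)

cv2.imwrite("xray_equalized.jpg", equalized)
```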
3. Segmentation
• Segmentation is the process of dividing an image into meaningful regions to simplify
analysis.
• It helps in object recognition, scene understanding, and feature extraction.
• Different segmentation methods are used depending on the image characteristics and
the application.
a. Thresholding (Binary Segmentation)
• Thresholding converts an image into two regions: foreground (object) and
background.
• A threshold value is chosen; pixels above the threshold are set to white (object) and
pixels below it are set to black (background).
• Example: Separating handwritten text from a white paper in OCR (Optical Character
Recognition).
• Limitation: Works best when there is a clear intensity difference between the object
and background.
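A minimal thresholding sketch in OpenCV, showing both a fixed threshold and Otsu's automatic threshold; the file names and the value 127 are assumptions.

```python
import cv2

# Grayscale scan of handwritten text on white paper (file name is only an example)
gray = cv2.imread("page.jpg", cv2.IMREAD_GRAYSCALE)

# Fixed threshold: pixels above 127 become white (255), the rest black (0)
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

# Otsu's method picks the threshold automatically from the image histogram
_, binary_otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

cv2.imwrite("page_binary.jpg", binary_otsu)
```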
b. Clustering (K-means Segmentation): K-means clustering groups pixels with similar
colors or intensities into K different clusters.
❖ The algorithm initializes K cluster centers randomly.
❖ It assigns each pixel to the nearest cluster based on color/intensity similarity.
❖ The cluster centers are updated iteratively until convergence.
❖ Example: Segmenting an image into different regions like sky, trees, and water in landscape
photography.
• Advantage: Works well for multi-region segmentation.
• Limitation: Can be computationally expensive for high-resolution images.
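A minimal K-means segmentation sketch using OpenCV's kmeans on the pixel colours; the file name and K = 3 are assumptions chosen to mirror the sky/trees/water example.

```python
import cv2
import numpy as np

image = cv2.imread("landscape.jpg")                 # file name is only an example
pixels = image.reshape(-1, 3).astype(np.float32)    # one row per pixel (B, G, R)

K = 3                                               # e.g., sky, trees, water
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, K, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)

# Replace every pixel by the colour of its cluster centre to visualise the regions
segmented = centers.astype(np.uint8)[labels.flatten()].reshape(image.shape)
cv2.imwrite("landscape_segmented.jpg", segmented)
```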
c. Edge-Based Segmentation
❖ This method detects object boundaries using edge-detection techniques like Canny or Sobel
filters.
❖ Identifies areas where pixel intensity changes sharply
❖ Connects edge points to form object boundaries.
❖ Example: Detecting lanes on a road for self-driving cars
❖ Advantage: Effective for objects with well-defined edges.
❖ Limitation: Struggles with blurry images or objects with weak edges.
• Example: Robot Identifying an Apple
• Imagine a robot sorting apples. It uses segmentation to separate the apple from the
background:
• Thresholding: If the apple is bright red and the background is dark, thresholding can isolate
it.
• K-means Clustering: Groups similar colors (red for apples, green for leaves, brown for
stems).
• Edge-Based Segmentation: Detects the round shape of the apple against the background.
• Once segmented, the robot can identify, pick, and place the apple correctly.
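A hedged sketch of how the apple example could be approached with colour-based thresholding in HSV space; the file name and the red hue/saturation ranges are illustrative assumptions, not calibrated values.

```python
import cv2
import numpy as np

image = cv2.imread("apples.jpg")                 # file name is only an example
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

# Red wraps around the hue axis, so two ranges are combined (values are illustrative)
mask1 = cv2.inRange(hsv, np.array([0, 80, 50]),   np.array([10, 255, 255]))
mask2 = cv2.inRange(hsv, np.array([170, 80, 50]), np.array([180, 255, 255]))
mask = cv2.bitwise_or(mask1, mask2)

# Keep only the red (apple) pixels; contours of the mask outline each apple
apple_only = cv2.bitwise_and(image, image, mask=mask)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print("candidate apple regions:", len(contours))
```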
4. Feature Extraction
• Feature extraction is a crucial step in image processing where key characteristics of an image are
identified and represented in a way that helps with object recognition, classification, or analysis.
• Features provide meaningful information about the object while reducing the complexity of raw
image data.
a. Color Features (RGB, HSV, etc.)
❖ Color features help differentiate objects based on their hue, saturation, and intensity.
❖ Extracts pixel values from color spaces like RGB (Red, Green, Blue) or HSV (Hue, Saturation,
Value).
❖ Used for distinguishing objects with different colors, like a red apple vs. a green apple.
b. Shape Features (Edges, Contours)
❖ Shape features help recognize objects based on their geometric properties.
❖ Edge detection (Sobel, Canny) finds boundaries between objects.
❖ Contours trace the outline of objects to detect their shape.
❖ Hough Transform is used for detecting lines (roads, edges of buildings) or circles (coins, wheels).
❖ Example: A robot differentiates between a box (rectangular edges) and a cylinder (curved
edges) by analyzing contours.
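A short sketch of contour- and Hough-based shape features with OpenCV; the file name and all Hough parameters are illustrative assumptions.

```python
import cv2

gray = cv2.imread("parts.jpg", cv2.IMREAD_GRAYSCALE)   # file name is only an example
edges = cv2.Canny(gray, 100, 200)

# Contours: outlines of the objects found in the edge map
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    # Approximating the contour by a polygon hints at the shape (4 corners -> box-like)
    approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
    print("contour with", len(approx), "corners")

# Hough transform for circles (e.g., coins, wheels); all parameters are illustrative
circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, 1, 50,
                           param1=100, param2=30, minRadius=10, maxRadius=100)
print("circles found:", 0 if circles is None else circles.shape[1])
```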
c. Texture Features (Patterns)
Texture describes surface properties, like whether an object appears rough or smooth.
❖ Gray-Level Co-occurrence Matrix (GLCM): Measures how often pixel intensity values
repeat in an image.
❖ Local Binary Patterns (LBP): Captures fine texture details by analysing pixel
relationships.
❖ Example: Medical imaging uses texture analysis to differentiate between normal and
diseased tissues (e.g., smooth vs. rough tumour surfaces).
❖ These features make it easier for AI, robots, and vision systems to understand images and
take appropriate actions.
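A minimal texture-feature sketch using Local Binary Patterns; it assumes scikit-image is installed, and the file name and LBP parameters (P = 8, R = 1) are illustrative choices.

```python
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

gray = cv2.imread("tissue.jpg", cv2.IMREAD_GRAYSCALE)   # file name is only an example

# Local Binary Patterns: 8 neighbours on a circle of radius 1, "uniform" mapping
lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")

# The normalised LBP histogram is a compact texture descriptor (10 bins for P=8 uniform)
hist, _ = np.histogram(lbp, bins=np.arange(0, 11), density=True)
print("texture descriptor:", np.round(hist, 3))
```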
• Moment invariants are a feature extraction technique used to extract global features for
shape recognition and identification. Many variants of moment invariants have been
proposed since the technique was introduced.
• Moment invariants are normalized central moments of order p + q, used as descriptors
because they are invariant to translation, rotation, and scaling.
• They are useful because they define a simply calculated set of region properties that
can be used for shape classification and part recognition.
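A minimal sketch of computing the seven Hu moment invariants with OpenCV; the file name is an assumption, and the log scaling at the end is a common convention rather than part of the slides.

```python
import cv2
import numpy as np

# Binary silhouette of a part (file name is only an example)
gray = cv2.imread("part.jpg", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Raw, central and normalized central moments of the region
moments = cv2.moments(binary)

# Hu's seven moment invariants: unchanged under translation, rotation and scaling
hu = cv2.HuMoments(moments).flatten()

# Log scaling is commonly used because the values span many orders of magnitude
hu_log = -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
print("Hu moment invariants:", np.round(hu_log, 3))
```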
5. Image Recognition
• Image recognition is the process of identifying objects in an image by matching extracted features
with predefined models. It is used in robotics, self-driving cars, security systems, and more. Image
recognition can be viewed as a labeling process applied to the segmented objects of a scene.
• That is, image recognition presumes that the objects in a scene have already been segmented into
individual elements (e.g., a bolt, a seal, a wrench).
a. Template Matching: Compares a small reference image (template) with different regions of the input
image.
❖ A template (e.g., a specific shape or symbol) is slid over the target image.
❖ The system calculates how closely the template matches different parts of the image.
❖ Works well when objects have a fixed size, orientation, and lighting.
❖ Example: A barcode scanner matches a scanned pattern with stored barcode templates.
❖ Advantage: Simple and effective for predictable objects.
❖ Limitation: Fails if the object varies in size, angle, or lighting
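A minimal template-matching sketch with OpenCV's matchTemplate; the file names and the 0.8 acceptance threshold are assumptions.

```python
import cv2

# Scene image and a stored reference pattern (file names are only examples)
scene = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("barcode_template.jpg", cv2.IMREAD_GRAYSCALE)

# Slide the template over the scene and score every position
result = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

h, w = template.shape
if max_val > 0.8:  # acceptance threshold chosen arbitrarily for illustration
    top_left = max_loc
    bottom_right = (top_left[0] + w, top_left[1] + h)
    cv2.rectangle(scene, top_left, bottom_right, 255, 2)  # mark the detected region
    print("match at", top_left, "score", round(max_val, 3))
else:
    print("no confident match (best score:", round(max_val, 3), ")")
```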
b. Machine Learning (CNNs, Deep Learning): A convolutional neural network (CNN) is trained on a
large set of labeled example images.
❖ The model learns hierarchical features, from edges to complex patterns.
❖ During recognition, the model predicts the object's identity based on learned patterns.
❖ Example: Face recognition (e.g., unlocking a phone with Face ID).
❖ Self-driving cars identifying pedestrians and traffic signs.
❖ Advantage: Handles complex images, variations in lighting, and different angles.
❖ Limitation: Requires a large dataset and computational power.
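A hedged sketch of CNN-based recognition using a pretrained ResNet-18 from torchvision; it assumes PyTorch and a recent torchvision (>= 0.13) are installed, the image file name is an example only, and a real system would be trained or fine-tuned on its own object classes.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Pretrained CNN used off the shelf; a real system would fine-tune on its own classes
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

# Standard ImageNet preprocessing: resize, crop, normalize
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("package.jpg").convert("RGB")  # file name is only an example
batch = preprocess(image).unsqueeze(0)            # add a batch dimension

with torch.no_grad():
    logits = model(batch)
print("predicted ImageNet class index:", logits.argmax(dim=1).item())
```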
c. Key Point Detection (SIFT, SURF): Used for recognizing objects by detecting unique
feature points.
❖ Common Methods: SIFT (Scale-Invariant Feature Transform): Identifies key points
that remain consistent under rotation, scale, and lighting changes.
❖ SURF (Speeded-Up Robust Features): A faster alternative to SIFT for real-time
applications.
❖ Example: A robot recognizing a specific tool in a cluttered workshop based on its key
points.
❖ Advantage: Works well for complex objects with unique details.
❖ Limitation: Computationally intensive.
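A minimal SIFT keypoint-matching sketch (SIFT is included in the main OpenCV package from version 4.4 onward); the file names and the 0.75 ratio-test threshold are assumptions.

```python
import cv2

# Reference image of the tool and a cluttered scene (file names are only examples)
tool = cv2.imread("tool.jpg", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("workshop.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(tool, None)
kp2, des2 = sift.detectAndCompute(scene, None)

# Match descriptors and keep only distinctive matches (Lowe's ratio test)
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des1, des2, k=2)
good = []
for pair in matches:
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])
print("good matches:", len(good))  # many good matches -> the tool is likely present
```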
❖ Example: A Robot Recognizing Objects
Imagine a robot in a warehouse that needs to recognize different packages:
❖ Template Matching: The robot matches predefined labels or barcodes to identify
packages.
❖ Machine Learning (CNNs): If packages have variable shapes, CNNs can classify
them based on previous training.
❖ SIFT/SURF: If a specific tool or product needs to be identified, key point detection
helps recognize it despite rotations or occlusions.
❖ Object recognition matches extracted features to predefined models using techniques
like template matching (pattern comparison), machine learning (CNNs for deep
learning), and key point detection (SIFT, SURF).
❖ These methods power modern AI applications like face recognition, robotics, and
autonomous vehicles
6. Interpretation & Decision Making
• Once an image is processed, segmented, and recognized, the final step is interpreting the
information and making decisions based on what the system "sees."
• This is essential in robotics, automation, and AI-driven applications.
a. Perception (Understanding the Scene): The robot analyzes extracted features (color,
shape, texture) and recognized objects.
❖ Example: A sorting robot detects whether a product is defective.
b. Decision Making (Choosing an Action): Based on predefined rules, AI algorithms, or
machine learning models, the robot decides what action to take.
❖ Example: If a defective product is found, the robot diverts it from the conveyor belt.
c. Action Execution (Carrying Out the Task): The robot physically interacts with the
environment based on its decision.
❖ Example: A robotic arm picks and places items in different bins.
❖ Example 1: Sorting Robot → Identifying & Separating Defective Products
Vision System: Captures images of products on a conveyor belt.
❖ Feature Extraction: Identifies scratches, cracks, or missing components.
• Decision Making: If a product is defective → Move it to the rejection bin. If a product is
good → Continue to packaging.
• Real-world Use Case: Quality control in manufacturing (electronics, food processing,
pharmaceuticals).
❖ Example 2: Autonomous Robot → Avoiding Obstacles
Vision System: Uses cameras and LiDAR sensors to detect obstacles.
❖ Decision Making: If an obstacle is detected → The robot calculates a new path. If the path is
clear → The robot continues moving forward.
❖ Action Execution: The robot adjusts its movement dynamically.
❖ Real-world Use Case: Self-driving cars avoid pedestrians and other vehicles.
❖ Warehouse robots navigate safely around workers and shelves.
❖ Interpretation & decision-making allow robots to analyze visual data, determine the best
course of action, and interact with their environment. This is crucial for automation,
robotics, and AI-driven applications, enabling systems to operate intelligently and
autonomously.
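A hedged sketch of how the perception → decision → action flow of the sorting-robot example could be organised in code; detect_defect, robot.move_to and the edge-density rule are hypothetical placeholders, not an actual robot API.

```python
import cv2

def detect_defect(frame):
    """Hypothetical check: flag the product if too many edge pixels (scratches/cracks)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    return edges.mean() > 20  # illustrative rule, not a real quality-control criterion

def sort_product(frame, robot):
    # Perception -> decision -> action, as described in the sorting-robot example
    if detect_defect(frame):
        robot.move_to("rejection_bin")   # hypothetical robot API
    else:
        robot.move_to("packaging_line")  # hypothetical robot API
```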
7. Execution & Control
❖ Execution and control involve sending movement commands to robotic systems based on
vision processing and decision-making.
❖ With actuators, sensors, and feedback loops, robots and AI-powered systems can perform
real-world tasks efficiently.
❖ After interpreting visual data and making decisions, the final step is executing actions based
on the processed information.
❖ This is where the robot or system physically interacts with the environment.
a. Actuation (Moving the Robot)
❖ The system sends commands to actuators (motors, robotic arms, wheels).
❖ Example: A robotic arm receives a command to pick up an object.
b. Feedback Loop (Adjusting in Real-Time)
❖ Sensors (cameras, LiDAR, encoders) provide feedback to refine actions.
❖ Example: A self-driving car continuously adjusts its speed and steering based on obstacles
and road signs.
c. Task Completion (Final Execution)
❖ The robot successfully performs the action.
❖ Example: A warehouse robot moves an item to the correct location.
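A hedged sketch of a vision-driven feedback loop; camera.capture, the robot motion calls and detect_obstacle are hypothetical placeholders used only to show the sense → decide → actuate → re-check cycle.

```python
import time

def control_loop(camera, robot, detect_obstacle):
    """Hypothetical sense -> decide -> actuate loop with continuous visual feedback."""
    while robot.is_running():            # hypothetical robot API
        frame = camera.capture()         # sensing: grab the latest image
        if detect_obstacle(frame):       # decision making
            robot.stop()
            robot.turn(angle_deg=30)     # actuation: steer around the obstacle
        else:
            robot.forward(speed=0.5)     # actuation: continue on the clear path
        time.sleep(0.05)                 # feedback loop runs roughly 20 times per second
```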
• Study Feature Extraction, Image segmentation in detail
Camera Sensor Hardware Interfacing
• Camera sensors are essential components in computer vision, robotics, surveillance, and
autonomous systems.
• Hardware interfacing involves connecting and communicating with a camera sensor to
capture and process images or video streams.
• Components of a Camera Sensor Interface
• Interfacing a camera sensor requires:
1. Camera Module – The physical sensor (e.g., CMOS, CCD).
2. Processor/Controller – A microcontroller (MCU), FPGA, or a computer (e.g., Raspberry
Pi, Jetson Nano).
3. Communication Bus – The interface type (e.g., MIPI, USB, I2C, SPI).
4. Power Supply – Proper voltage levels for the sensor.
5. Driver Software – Software/firmware to process the camera feed.
• In machine vision, camera sensor hardware interfacing involves connecting the camera to the
system or device that will process the images captured by the camera.
• This process typically involves a few key steps:
• Selection of the appropriate camera: Choosing the right camera for your machine vision
application is critical. Resolution, frame rate, dynamic range, and sensitivity are all important
factors to consider when selecting a camera.
• Physical connection: Once you have selected the camera, you will need to physically connect
it to your processing device. This typically involves using a cable that is compatible with both
the camera and the processing device.
• Software setup: After the camera is physically connected, you will need to configure the
software on your processing device to interface with the camera. This typically involves
installing drivers for the camera and configuring software settings such as resolution and
frame rate.
• Testing: Once the camera is connected and configured, you will need to test the system to
ensure that it is capturing images correctly and that the processing device is receiving the data
from the camera. This may involve adjusting camera settings or software configurations to
optimize image quality.
Software setup of the Camera
• Install the camera driver: The camera driver is a piece of software that enables communication
between the camera and the computer.
• Check the camera manufacturer's website to find the appropriate driver for your camera and
operating system.
• Install the camera software development kit (SDK): The camera SDK is a set of libraries and
tools that enable developers to access and control the camera.
• Download and install the camera SDK from the manufacturer's website.
• Install the machine vision software: There are various machine vision software packages
available, such as OpenCV, HALCON, and Matrox Imaging Library. Choose the software that
best suits your needs and install it on your computer.
• Configure the software: Once you have installed the necessary software, you will need to
configure it to work with your camera.
• Follow the instructions provided in the camera SDK and machine vision software
documentation to configure the software.
• Test the camera: After you have completed the software setup, test the
camera to ensure that it is functioning correctly.
• Use the sample programs provided with the camera SDK or machine
vision software to test the camera's functionality.
• Integrate with your application: Once the camera and software are
working correctly, you can integrate the camera into your application.
• Use the API provided by the machine vision software to access and
control the camera from within your application
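As a quick way to carry out the testing step described above, the sketch below grabs frames from a connected camera with OpenCV's VideoCapture; device index 0, the requested resolution and the window name are assumptions.

```python
import cv2

# Open the first camera the operating system exposes (index 0 is an assumption)
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    raise RuntimeError("Camera not detected - check the driver and the cable")

# Optionally request a resolution; the camera may ignore unsupported values
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("camera test", frame)          # live preview window
    if cv2.waitKey(1) & 0xFF == ord("q"):     # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()
```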