SUBJECT: COMPUTER VISION
CLASS:-BE
BRANCH:-Computer Science and Design
UNIT-I
Introduction to Computer Vision
Definition of Computer Vision:-
Computer vision is a field of artificial intelligence (AI) that enables
computers and systems to derive meaningful information from visual
data, such as images and videos.
It involves the use of algorithms and models to process, analyze, and
understand visual content, allowing computers to:
1. *Perceive*: Detect and recognize objects, scenes, and activities.
2. *Reason*: Interpret the meaning and context of visual data.
3. *Act*: Make decisions or take actions based on visual information.
Human Vision vs Computer Vision:
Human Vision:
1. _Biological_: Based on the human eye and brain.
2. _Dynamic_: Continuously adapts to changing environments.
3. _Contextual_: Understands scene context and semantics.
4. _Robust_: Tolerant to variations in lighting, pose, and occlusion.
5. _High-level processing_: Involves cognitive processes like attention and memory.
Computer Vision:
1. _Artificial_: Based on algorithms and mathematical models.
2. _Programmed_: Operates within predetermined parameters.
3. _Limited context_: Struggles with scene understanding and semantics.
4. _Sensitive to variations_: Can be affected by changes in lighting, pose, and
occlusion.
5. _Low-level processing_: Focuses on pixel-level processing and feature extraction.
Parameter | Human Vision | Computer Vision
1. Based on | The human eye and brain | Algorithms and mathematical models
2. Complexity | More complex and nuanced | Less complex and nuanced
3. Flexibility | Adapts to new situations | Requires retraining for new situations
4. Sensitivity to variations | Tolerant of variations in lighting, pose, and occlusion | Affected by changes in lighting, pose, and occlusion
5. Processing | High-level: involves cognitive processes like attention and memory | Low-level: focuses on pixel-level processing and feature extraction
6. Understanding | Understands the scene | Interprets pixels
7. Context | Understands scene context and semantics | Operates within predetermined parameters
Types of Computer Vision:
1. *Image Classification*: Identifying objects or scenes in images.
2. *Object Detection*: Locating and classifying objects within images.
3. *Image Segmentation*: Dividing images into regions or objects.
4. *Facial Recognition*: Identifying or verifying individuals based on
facial features.
5. *Optical Character Recognition (OCR)*: Extracting text from
images.
6. *Image Generation*: Creating new images or modifying existing
ones.
7. *Image Restoration*: Enhancing or restoring degraded or damaged
images.
8. *Motion Analysis*: Analyzing motion patterns in videos or images.
9. *Stereo Vision*: Calculating depth information from multiple
images.
10. *3D Reconstruction*: Creating 3D models from 2D images or
videos.
11. *Tracking*: Following objects or individuals across frames or
images.
12. *Scene Understanding*: Interpreting the context and meaning of
scenes.
13. *Action Recognition*: Identifying actions or activities in videos.
14. *Image Retrieval*: Searching for images based on content or
features.
15. *Medical Image Analysis*: Analyzing medical images for
diagnosis or research.
Computer Vision Pipeline:-
A computer vision pipeline is a series of steps used to process and
analyze visual data from images or videos.
Steps of a computer vision pipeline:
*Step 1: Image Acquisition*
- Capture images or videos from various sources (cameras, sensors, files)
- Handle image formats (e.g., JPEG, PNG, TIFF)
*Step 2: Image Preprocessing*
- Resize, crop, or normalize images
- Apply filters (e.g., blur, thresholding) to enhance or remove noise
- Convert images to suitable color spaces (e.g., RGB, grayscale)
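A minimal sketch of Step 2 in Python with OpenCV (the file name input.jpg and the 640x480 target size are placeholder assumptions, not values from these notes):

```python
import cv2

# Load an image from disk (path is a placeholder).
img = cv2.imread("input.jpg")

# Resize to a fixed size expected by later stages.
img = cv2.resize(img, (640, 480))

# Convert to grayscale to simplify downstream processing.
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Reduce noise with a Gaussian blur, then binarize with a fixed threshold.
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
_, binary = cv2.threshold(blurred, 127, 255, cv2.THRESH_BINARY)

cv2.imwrite("preprocessed.png", binary)
```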
*Step 3: Object Detection*
- Identify regions of interest (ROI) or objects within the image
- Use techniques like:
- Edge detection (e.g., Canny, Sobel)
- Corner detection (e.g., Harris, FAST)
- Deep learning-based methods (e.g., YOLO, SSD)
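A short sketch of the classical detection techniques listed above, using OpenCV's Canny, Harris, and FAST implementations (the input path and threshold values are placeholder assumptions):

```python
import cv2
import numpy as np

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder path

# Canny edge detection: the two values are the hysteresis thresholds.
edges = cv2.Canny(gray, 100, 200)

# Harris corner response; large values indicate corner-like regions.
corners = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)

# FAST keypoint detector as an alternative corner detector.
fast = cv2.FastFeatureDetector_create()
keypoints = fast.detect(gray, None)
```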
*Step 4: Feature Extraction*
- Extract relevant features from detected objects or ROIs
- Features can include:
- Shape (e.g., contours, shape context)
- Color (e.g., histograms, color moments)
- Texture (e.g., Gabor filters, LBP)
- Other attributes (e.g., size, orientation)
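A rough sketch of extracting colour, shape, and texture features with OpenCV (the ROI image path, histogram bin counts, and Gabor parameters are placeholder assumptions; the single-filter texture value is only illustrative):

```python
import cv2
import numpy as np

img = cv2.imread("object_roi.jpg")            # placeholder ROI image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Colour feature: a 3D histogram over the B, G and R channels.
color_hist = cv2.calcHist([img], [0, 1, 2], None, [8, 8, 8],
                          [0, 256, 0, 256, 0, 256]).flatten()

# Shape features: area and perimeter of the largest contour.
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
largest = max(contours, key=cv2.contourArea)
area, perimeter = cv2.contourArea(largest), cv2.arcLength(largest, closed=True)

# Texture feature (very rough): mean response of a single Gabor filter.
gabor = cv2.getGaborKernel((21, 21), sigma=5, theta=0, lambd=10, gamma=0.5, psi=0)
texture = cv2.filter2D(gray, cv2.CV_32F, gabor).mean()
```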
*Step 5: Object Recognition*
- Classify detected objects into categories (e.g., people, cars, buildings)
- Use machine learning models or deep learning-based approaches (e.g., CNNs, SVMs)
*Step 6: Image Segmentation*
- Partition the image into meaningful regions or objects (see the sketch after this list)
- Use techniques like:
- Thresholding (e.g., Otsu, adaptive)
- Clustering (e.g., k-means, hierarchical)
- Contour detection (e.g., active contours)
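A minimal segmentation sketch for Step 6, combining Otsu thresholding and k-means colour clustering (the file name and the choice k = 4 are placeholder assumptions):

```python
import cv2
import numpy as np

img = cv2.imread("input.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Otsu thresholding: picks a global threshold automatically from the histogram.
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# k-means clustering of pixel colours into k segments.
k = 4
pixels = np.float32(img.reshape(-1, 3))
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, k, None, criteria, 10,
                                cv2.KMEANS_RANDOM_CENTERS)
segmented = centers[labels.flatten()].astype(np.uint8).reshape(img.shape)
```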
*Step 7: Pose Estimation*
- Determine the orientation and position of objects in 3D space
- Use techniques like:
- Stereo vision
- Structure from motion
- Deep learning-based methods (e.g., PoseNet)
*Step 8: Tracking*
- Follow the movement of objects across frames in a video sequence
- Use techniques like:
- Kalman filters
- Particle filters
- Deep learning-based approaches (e.g., tracking by detection)
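A small sketch of tracking with a constant-velocity Kalman filter in OpenCV; the measurement coordinates below are made-up values standing in for a real detector's per-frame output:

```python
import cv2
import numpy as np

# Constant-velocity Kalman filter: state = [x, y, vx, vy], measurement = [x, y].
kf = cv2.KalmanFilter(4, 2)
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1

# For each frame: predict the object's position, then correct the estimate
# with the detector's measurement (dummy detections below).
for (mx, my) in [(100, 100), (104, 102), (109, 105)]:
    predicted = kf.predict()                        # prior estimate (x, y, vx, vy)
    kf.correct(np.array([[mx], [my]], np.float32))  # fuse in the new detection
    print("predicted position:", predicted[:2].ravel())
```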
*Step 9: Scene Understanding*
- Interpret the context and meaning of the visual data
- Use techniques like:
- Semantic segmentation
- Object detection
- Graph-based methods (e.g., scene graphs)
*Step 10: Visualization*
- Display the results of the computer vision pipeline
- Use techniques like:
- Image rendering
- 3D visualization
- Augmented reality (AR)
The history of computer vision
1. *1960s: Early Beginnings*
- First attempts at image processing and recognition
- Work by pioneers like Larry Roberts, David Marr, and Tomaso Poggio
2. *1970s: Image Processing*
- Development of image processing techniques (filtering, edge
detection)
- First commercial image processing systems
3. *1980s: Machine Learning and Expert Systems*
- Introduction of machine learning and expert systems
- Applications in industrial inspection and robotics
4. *1990s: Object Recognition and Tracking*
- Advances in object recognition and tracking
- Development of algorithms like SIFT and SURF
5. *2000s: Deep Learning and Convolutional Neural Networks
(CNNs)*
- Emergence of deep learning and CNNs
- Breakthroughs in image classification, detection, and
segmentation
6. *2010s: Computer Vision Boom*
- Widespread adoption of computer vision in industries
(healthcare, automotive, security)
- Advancements in areas like facial recognition, natural language
processing, and autonomous vehicles
7. *Present Day: Advancements and Applications*
- Continued improvements in accuracy and efficiency
- Expanding applications in areas like:
- Healthcare (disease diagnosis, medical imaging)
- Autonomous systems (vehicles, drones, robots)
- Augmented reality and virtual reality
Computer vision applications
1. _Healthcare_:
- Medical imaging analysis (tumor detection, disease diagnosis)
- Patient monitoring (vital signs, fall detection)
- Surgical robotics and navigation
2. _Autonomous Vehicles_:
- Object detection and tracking (pedestrians, cars, lanes)
- Scene understanding and navigation
- Driver monitoring and assistance
3. _Security and Surveillance_:
- Facial recognition and identification
- Intrusion detection and alert systems
- Object detection and tracking (people, vehicles, bags)
4. _Retail and Marketing_:
- Customer behavior analysis (foot traffic, engagement)
- Product recognition and inventory management
- Visual search and recommendation systems
5. _Industrial Inspection and Automation_:
- Quality control and defect detection
- Object recognition and sorting
- Predictive maintenance and anomaly detection
6. _Agriculture and Environmental Monitoring_:
- Crop health and yield analysis
- Object detection (animals, vehicles, equipment)
- Weather and climate monitoring
7. _Gaming and Entertainment_:
- Player tracking and motion analysis
- Virtual and augmented reality experiences
- Game development and testing
Computer Vision Applications: Object Detection, Recognition, and Surveillance
*Object Detection:*
Object detection in computer vision involves several steps:
1. *Image Preprocessing*: Enhance image quality, remove noise, and
convert to a suitable format.
2. *Feature Extraction*: Extract relevant features from the image,
such as edges, corners, or textures.
3. *Object Proposal Generation*: Generate potential object locations
and sizes.
4. *Feature Description*: Describe the features within each proposed
region.
5. *Classification*: Classify each region as an object or background.
6. *Non-Maximum Suppression*: Remove duplicate detections.
7. *Post-processing*: Refine detections, merge overlapping boxes,
and output final results.
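A plain-NumPy sketch of steps 6 and 7: computing intersection-over-union (IoU) between boxes and suppressing overlapping duplicates (the 0.5 IoU threshold is a common but arbitrary choice):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring box and drop overlapping duplicates."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(best)
        order = np.array([i for i in order[1:]
                          if iou(boxes[best], boxes[i]) < iou_thresh])
    return keep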
Techniques used for object detection:
1. *Sliding Window*: Slide a window across the image, classifying each region.
2. *Region-based CNNs (R-CNNs)*: Use CNNs to classify regions of interest.
3. *You Only Look Once (YOLO)*: Detect objects in one pass, without region
proposals.
4. *Single Shot Detector (SSD)*: Detect objects in one pass, with default boxes.
5. *Faster R-CNN*: Improve R-CNNs with a region proposal network.
6. *Mask R-CNN*: Add instance segmentation to Faster R-CNN.
Algorithms used:
1. *Haar Cascades*
2. *HOG+SVM*
3. *Deep Learning-based methods* (e.g., CNNs, YOLO, SSD)
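A short sketch of algorithm 2 (HOG+SVM) using OpenCV's built-in pedestrian detector (the image path and the detectMultiScale parameters are placeholder assumptions):

```python
import cv2

# OpenCV's HOG descriptor with a pretrained linear SVM for pedestrian detection.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

img = cv2.imread("street.jpg")                   # placeholder path
boxes, weights = hog.detectMultiScale(img, winStride=(8, 8), scale=1.05)

# Draw one rectangle per detected person.
for (x, y, w, h) in boxes:
    cv2.rectangle(img, (int(x), int(y)), (int(x + w), int(y + h)), (0, 255, 0), 2)
cv2.imwrite("detections.png", img)
```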
Evaluation metrics:
1. *Precision*
2. *Recall*
3. *AP (Average Precision)*
4. *mAP (mean Average Precision)*
Applications of Object Detection:-
1. _Autonomous Vehicles_: Detecting pedestrians, cars,
and obstacles.
2. _Quality Inspection_: Detecting defects in products on a
production line.
3. _Security Screening_: Detecting weapons or contraband
in luggage.
4. _Medical Imaging_: Detecting tumors or abnormalities in
X-rays or MRIs.
5. _Retail Analytics_: Detecting products on shelves for
inventory management.
*Object Recognition:*
Object recognition in computer vision involves several steps:
1. *Feature Extraction*: Extract relevant features from the image, such as edges,
corners, textures, or shapes.
2. *Feature Description*: Describe the features in a way that can be compared to
known objects.
3. *Object Representation*: Create a representation of the object, such as a
template or a model.
4. *Matching*: Match the features of the unknown object to the features of the
known object.
5. *Classification*: Classify the object based on the matching result.
Techniques used for object recognition:
1. *Template Matching*: Compare the image to a predefined
template.
2. *Feature-based Methods*: Extract and match features, such as
SIFT, SURF, or ORB.
3. *Deep Learning-based Methods*: Use Convolutional Neural
Networks (CNNs) to learn features and classify objects.
4. *3D Object Recognition*: Recognize objects in 3D space, using
techniques such as point cloud processing.
Algorithms used:
1. *SIFT (Scale-Invariant Feature Transform)*
2. *SURF (Speeded-Up Robust Features)*
3. *ORB (Oriented FAST and Rotated BRIEF)*
4. *CNNs (Convolutional Neural Networks)*
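A small sketch of feature-based recognition with ORB and brute-force matching (the image paths, the distance cut-off of 40, and the 20-match decision rule are placeholder assumptions, not fixed values):

```python
import cv2

img1 = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)     # unknown object
img2 = cv2.imread("template.jpg", cv2.IMREAD_GRAYSCALE)  # known object

# Detect keypoints and compute binary descriptors with ORB.
orb = cv2.ORB_create(nfeatures=500)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force matching with Hamming distance (appropriate for ORB).
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# A simple recognition rule: enough good matches implies the same object.
good = [m for m in matches if m.distance < 40]
print("match" if len(good) > 20 else "no match")
```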
Evaluation metrics:
1. *Accuracy*
2. *Precision*
3. *Recall*
4. *F1-score*
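A toy computation of these metrics for a binary recognizer, using made-up labels purely for illustration:

```python
# True and predicted labels (1 = target object, 0 = other); made-up values.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

accuracy  = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)   # 0.75 for each with these labels
```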
Applications of Object Recognition:-
1. _Facial Recognition_: Identifying individuals for security or
authentication.
2. _Product Recognition_: Identifying products for e-commerce or
inventory management.
3. _Scene Understanding_: Recognizing objects and their context in an
image.
4. _Logo Detection_: Detecting brand logos in images or videos.
5. _Animal Recognition_: Recognizing species in wildlife conservation
efforts.
*Surveillance:*
Surveillance using computer vision involves using cameras and
algorithms to monitor and analyze visual data in real-time or
retrospectively. Here's a general overview of how surveillance takes
place:
1. _Camera Installation_: Cameras are installed in strategic locations
to capture footage of the area under surveillance.
2. _Video Feed_: The cameras transmit the video feed to a central
monitoring station or a cloud-based server.
3. _Object Detection_: Computer vision algorithms detect objects,
people, or vehicles within the video feed.
4. _Object Tracking_: The algorithms track the movement of detected
objects across multiple frames.
5. _Behavioral Analysis_: The system analyzes the behavior of
detected objects, such as loitering, running, or unusual movements.
6. _Alert Generation_: The system generates alerts for suspicious
behavior or predefined events.
7. _Human Verification_: Human operators verify the alerts and take
appropriate action.
8. _Data Storage_: The video feed and analytics data are stored for
future reference.
Computer vision techniques used in surveillance:
1. _Object Detection_: YOLO, SSD, Faster R-CNN
2. _Object Tracking_: Kalman Filter, Deep SORT
3. _Facial Recognition_: Deep Learning-based methods
4. _Behavioral Analysis_: Machine Learning-based methods
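A minimal surveillance-style sketch: background subtraction with MOG2 to flag moving objects in a fixed-camera feed (the video path and the 500-pixel blob-area threshold are placeholder assumptions):

```python
import cv2

cap = cv2.VideoCapture("camera.mp4")   # placeholder video source
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)               # foreground (moving) pixels
    mask = cv2.medianBlur(mask, 5)               # remove speckle noise
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 500:             # ignore tiny blobs
            x, y, w, h = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
    # An alert could be raised here when large foreground regions persist.
cap.release()
```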
Applications of surveillance using computer vision:
1. _Public Safety_: Monitoring public areas, detecting crimes
2. _Border Control_: Detecting intruders, tracking movement
3. _Retail Security_: Preventing shoplifting, monitoring customer
behavior
4. _Traffic Monitoring_: Analyzing traffic flow, detecting accidents
5. _Industrial Inspection_: Monitoring equipment, detecting
anomalies
Image formation in the eye and the camera:-
Most neuroscientists regard the eye as effectively part of the brain. It is a roughly
spherical globe about 2 cm in diameter, free to rotate under the control of six
extrinsic muscles.
1. Light enters through the cornea (transparent outer layer)
2. The iris (colored part) controls the amount of light entering
3. Light passes through the lens, which changes shape to focus on
objects
at varying distances
4. The retina (innermost layer) converts light into electrical signals sent
to the brain
5. The brain interprets these signals as visual information
Fig.: Sketch of a cross-section of the eye
Camera models
To understand how vision might be modeled computationally and replicated on a computer,
we need to understand the image acquisition process. The role of the camera in machine
vision is analogous to that of the eye in biological systems.
Pinhole camera model
The pinhole camera is the simplest, idealized model of camera function. It has an
infinitesimally small hole through which light enters before forming an inverted image on
the camera surface facing the hole. To simplify things, we usually model a pinhole camera
by placing the image plane between the focal point of the camera and the object, so that
the image is not inverted. This mapping of three dimensions onto two is called
a perspective projection (see the figure below), and perspective geometry is fundamental
to any understanding of image analysis.
Figure: Perspective projection in the pinhole camera model
Image formation in a digital camera:
1. Light enters through the lens
2. The aperture (opening) controls the amount of light entering
3. Light passes through the lens, which focuses on objects at varying
distances
4. The image sensor (CCD or CMOS) converts light into electrical
signals
5. The camera's processor interprets these signals as visual
information, storing it as an image
Perspective geometry:-
Euclidean geometry is a special case of perspective geometry, and the
use of perspective geometry in computer vision makes for a simpler
and more elegant expression of the computational processes that
render vision possible.
A perspective projection is the projection of a three-dimensional
object onto a two-dimensional surface by straight lines that pass
through a single point. Simple geometry shows that if we denote the
distance of the image plane from the centre of projection by f, then the
image coordinates (xi, yi) are related to the object coordinates (xo, yo, zo) by
xi = f * xo / zo and yi = f * yo / zo
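A tiny worked example of this projection in Python (the point coordinates and focal length are made-up values):

```python
def project(xo, yo, zo, f):
    """Pinhole perspective projection of a 3D point onto the image plane."""
    xi = f * xo / zo
    yi = f * yo / zo
    return xi, yi

# A point 4 units in front of the camera, with focal length f = 2:
print(project(xo=1.0, yo=0.5, zo=4.0, f=2.0))   # -> (0.5, 0.25)
```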
Radiometry: Measuring Light, Sources, Shadows, and Shading
*What is Digital Image Processing?*
• Digital image processing means processing digital images by means of a digital
computer.
• Digital image processing is the use of algorithms and mathematical models to
process and analyze digital images.
• The goal of digital image processing is to enhance the quality of images, extract
meaningful information from images, and automate image-based tasks.
Image sampling and quantization are fundamental steps in digital image
processing:
*Image Sampling:*
- Sampling rate: The number of samples taken from the analog image per unit of spatial
distance (e.g., pixels per inch), which determines the spatial resolution.
- Sampling theorem: States that the sampling rate must be at least twice the highest
frequency component in the image to avoid aliasing.
- Types of sampling:
- Random sampling
- Periodic sampling
- Uniform sampling
*Image Quantization:*
- Quantization levels: The number of discrete values used to represent the sampled
image (e.g., 256 levels for 8-bit images).
- Quantization error: The difference between the original analog value and the
quantized digital value.
- Types of quantization:
- Uniform quantization
- Non-uniform quantization (e.g., logarithmic)
- Vector quantization
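A small sketch of uniform quantization: requantizing an 8-bit grayscale image to fewer grey levels and measuring the quantization error (the input path and the 16-level choice are placeholder assumptions):

```python
import numpy as np
import cv2

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)   # 8-bit image, 256 levels

def quantize(img, levels):
    """Uniformly requantize an 8-bit image to the given number of grey levels."""
    step = 256 // levels
    return (img // step) * step + step // 2   # map each bin to its mid-value

coarse = quantize(gray, 16)      # 16 grey levels: visible banding appears
quant_error = gray.astype(np.int16) - coarse.astype(np.int16)
print("max quantization error:", np.abs(quant_error).max())
```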
*Applications:*
- Image compression: Sampling and quantization reduce data rate.
- Digital cameras: Sampling and quantization convert optical signals to digital
images.
- Medical imaging: Sampling and quantization affect image quality in MRI, CT scans,
etc.
Components of digital image processing:
1. *Image Acquisition*: Obtaining the image from a source, such as a camera or scanner.
2. *Image Representation*: Storing the image in a digital format, using pixels and color
models.
3. *Image Enhancement*: Improving the image quality, contrast, and brightness.
4. *Image Restoration*: Removing noise, blur, and other distortions.
5. *Image Compression*: Reducing the image file size for storage and transmission.
6. *Image Segmentation*: Dividing the image into regions or objects.
7. *Image Feature Extraction*: Identifying and extracting specific features, such as edges,
shapes, or textures.
8. *Image Recognition*: Identifying objects, patterns, or scenes within the image.
9. *Image Analysis*: Interpreting the meaning and context of the image.
10. *Image Display*: Outputting the processed image to a display device.
Elements of Digital Image Processing:
1. _Lighting_: Illumination, shading, and shadows affect image quality.
2. _Color_: Color spaces (RGB, YUV, etc.), color correction, and color
enhancement.
3. _Contrast_: Adjusting brightness and darkness levels for better
visibility.
4. _Texture_: Analyzing and synthesizing surface patterns and details.
5. _Edges_: Detecting and enhancing boundaries between objects.
6. _Shapes_: Recognizing and manipulating geometric forms.
7. _Motion_: Tracking movement and compensating for blur.
8. _Depth Perception_: Estimating distance and 3D information from 2D
images.
9. _Optical Flow_: Calculating motion vectors between consecutive
frames.
10. _Image Quality Metrics_: Evaluating visual fidelity using metrics like
PSNR, SSIM, etc.
Image Sensing and Acquisition:
Image sensing and acquisition refer to the process of capturing and obtaining visual
data from the environment. Here are the key aspects:
*Image Sensing:*
1. *Photo detectors*: Convert light into electrical signals (e.g., CCD, CMOS).
2. *Optics*: Lenses, mirrors, and filters focus and manipulate light.
3. *Sensors*: Detect various properties like intensity, color, and polarization.
*Image Acquisition:*
1. *Cameras*: Capture images using photo detectors and optics.
2. *Scanners*: Scan documents or objects to create digital images.
3. *Imaging Modalities*: MRI, CT, X-ray, and other medical imaging techniques.
4. *Frame Grabbers*: Capture video frames from cameras or other sources.
*Types of Image Acquisition:*
1. *Visible Light Imaging*: Captures images in the visible spectrum.
2. *Infrared Imaging*: Captures heat signatures or thermal radiation.
3. *Multispectral Imaging*: Captures images across different spectral
bands.
4. *Hyperspectral Imaging*: Captures detailed spectral information.
*Applications:*
1. *Computer Vision*
2. *Medical Imaging*
3. *Surveillance*
4. *Industrial Inspection*
5. *Remote Sensing*
6. *Astronomy*
*Challenges:*
1. *Noise and Interference*
2. *Lighting Conditions*
3. *Optical Distortions*
4. *Sensor Limitations*
5. *Data Storage and Transmission*
Relationship Between Pixels:
Pixels (picture elements) are the basic units of digital images. The relationship
between pixels is crucial in understanding image processing and analysis. Here are
some key aspects:
1. _Neighboring Pixels_: Adjacent pixels in the image, which can be horizontally,
vertically, or diagonally connected.
2. _Pixel Neighborhood_: The set of neighboring pixels surrounding a central pixel.
3. _Pixel Connectivity_: The way pixels are connected, such as:
- 4-connectivity (horizontal and vertical)
- 8-connectivity (including diagonals)
4. _Pixel Proximity_: The distance between pixels, which affects image processing operations
like filtering and edge detection.
5. _Pixel Similarity_: The similarity in intensity, color, or texture between pixels, used in image
segmentation and clustering.
6. _Pixel Dependency_: The relationship between pixels in an image, such as:
- Spatial dependency (nearby pixels)
- Temporal dependency (pixels in a video sequence)
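A tiny sketch of 4- and 8-connectivity: listing the neighbour coordinates of a pixel (boundary checking is omitted for brevity):

```python
def neighbours(r, c, connectivity=4):
    """Return the 4- or 8-connected neighbour coordinates of pixel (r, c)."""
    n4 = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    diag = [(r - 1, c - 1), (r - 1, c + 1), (r + 1, c - 1), (r + 1, c + 1)]
    return n4 if connectivity == 4 else n4 + diag

print(neighbours(5, 5, connectivity=8))   # the 8 surrounding pixel coordinates
```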
Spatial and Spectral Relationships between Pixels
Spatial and spectral relationships between pixels are two fundamental concepts in image
processing:
*Spatial Relationship:*
- Refers to the relationship between pixels in the same image, based on their proximity
and location.
- Describes how pixels interact with their neighbors in the spatial domain.
- Examples:
- Adjacency: Pixels next to each other.
- Proximity: Pixels near each other.
- Connectivity: Pixels connected by edges or contours.
*Spectral Relationship:*
- Refers to the relationship between pixels based on their color or intensity values.
- Describes how pixels relate to one another in terms of their color or intensity values
(the spectral domain).
- Examples:
- Color similarity: Pixels with similar color values.
- Intensity similarity: Pixels with similar brightness values.
- Texture similarity: Pixels with similar texture patterns.
Understanding the relationships between pixels is essential in various
image processing tasks, including:
1. _Image Filtering_
2. _Edge Detection_
3. _Image Segmentation_
4. _Object Recognition_
5. _Image Compression_
Color Models:-
There are several color models used to represent and reproduce colors in various devices
and media. Here are some of the most common color models:
1. *RGB (Red, Green, Blue)*:
- Additive model
- Used in digital displays (monitors, TVs, mobile devices)
- Combines red, green, and blue light to create colors
2. *CMYK (Cyan, Magenta, Yellow, Black)*:
- Subtractive model
- Used in printing (inkjet, offset, laser)
- Combines cyan, magenta, and yellow inks to create colors, with black added for depth
3. *YUV (Luminance and Chrominance)*:
- Used in video transmission and compression (e.g., TV, video cameras)
- Separates luminance (brightness) from chrominance (color)
4. *HSV (Hue, Saturation, Value)*:
- Color wheel-based model
- Used in computer graphics, design, and image editing
- Describes colors by hue, saturation, and brightness
5. *HSL (Hue, Saturation, Lightness)*:
- Similar to HSV, but with lightness instead of value
- Used in computer graphics, design, and image editing
6. *LAB (CIELAB)*:
- Device-independent model
- Used in color management, printing, and design
- Describes colors by lightness and two color opponent channels (a* and b*)
7. *Pantone*:
- Proprietary color matching system
- Used in printing, design, and branding
- Provides precise color reproduction across different materials and devices
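A short sketch of converting between several of these colour models with OpenCV (the image path is a placeholder; note that OpenCV loads colour images in BGR order):

```python
import cv2

img = cv2.imread("photo.jpg")   # OpenCV loads colour images as BGR

# Convert between some of the colour models listed above.
rgb  = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
hsv  = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
lab  = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
yuv  = cv2.cvtColor(img, cv2.COLOR_BGR2YUV)

# Example use: the H channel of HSV isolates hue, largely independent of brightness.
hue = hsv[:, :, 0]
```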
Image Types:
Here are some common image types:
1. _Raster Images_:
- Made up of pixels (grid of tiny squares)
- Examples: JPEG, PNG, GIF, BMP
2. _Vector Images_:
- Made up of paths and shapes
- Examples: SVG, EPS, AI, CDR
3. _Bitmap Images_:
- Made up of a grid of pixels (like raster images)
- Examples: BMP, TIFF, GIF
4. _Grayscale Images_:
- Contain only shades of gray (no color)
- Examples: JPEG, PNG, TIFF
5. _Indexed Color Images_:
- Use a limited color palette (256 colors or less)
- Examples: GIF, PNG
6. _True Color Images_:
- Use a wide range of colors (16 million or more)
- Examples: JPEG, PNG, TIFF
7. _Compressed Images_:
- Reduced file size using algorithms (e.g., JPEG, GIF)
8. _Raw Images_:
- Unprocessed data from a camera sensor
- Examples: NEF, CR2, ARW
9. _HDR (High Dynamic Range) Images_:
- Capture a wider range of tonal values
- Examples: TIFF, JPEG, PNG
10. _3D Images_:
- Represent three-dimensional scenes or objects
- Examples: OBJ, STL, 3DS