Basic Image Operations
Computer vision is a field of AI that enables machines to understand and interpret visual data.
Image processing is a key component of computer vision, involving basic operations such as
color, brightness, contrast, and sharpness adjustments. Image segmentation divides images
into smaller regions or objects using techniques like thresholding and edge detection. Image
registration aligns and matches images from different viewpoints or times. These operations
enable machines to process and analyze visual data for a wide range of applications, from
healthcare and autonomous vehicles to robotics and surveillance.
Introduction
Image processing focuses on processing raw images to apply some kind of change. Typically, the purpose is to refine images or prepare them as input for a specific task, whereas computer vision aims to describe and understand images. For example, noise reduction, contrast adjustment, and rotation are common image processing operations that can be handled at the pixel level without requiring a comprehensive understanding of the image.
Image Representation
An image comprises a rectangular array of dots known as pixels and is also defined as a two-
dimensional function. To represent digital images, we mostly use two methods.
Suppose a digital image with M rows and N columns is created by sampling a continuous image f(x, y). The coordinate values (x, y) are then discrete quantities. We will use integer values for these discrete coordinates to simplify the notation. As a result, the origin has coordinates (x, y) = (0, 0). The next coordinate value along the image's first row is (x, y) = (0, 1). It is important to note that (0, 1) denotes the second sample along the first row.
By definition, an image is a set of square pixels (picture elements) arranged in columns and
rows in an array or matrix. The elements in these arrays or matrices also represent the pixels
or intensity values of the image.
Each pixel is commonly represented with one byte: 0 is black, 255 is white, and values in between reflect the pixel's intensity. A matrix of this kind is constructed for each color channel in the image. Normalizing values to the range 0 to 1 is also typical.
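As a minimal sketch (assuming NumPy is available), a grayscale image can be stored as a two-dimensional array of bytes and normalized to the range 0 to 1:

    import numpy as np

    # An 8-bit grayscale image: each entry is a pixel intensity, 0 = black, 255 = white.
    gray = np.array([[  0,  64, 128],
                     [ 64, 128, 192],
                     [128, 192, 255]], dtype=np.uint8)

    # Normalizing to the range [0, 1] is also common.
    normalized = gray.astype(np.float32) / 255.0
    print(normalized)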
Image as a function
In its most general form, an image is a function f from ℝ² to ℝ (f: ℝ² → ℝ). Representing an image as a function makes it easier to define operations on images precisely. The domain ℝ² corresponds to coordinate locations on the image: f(x, y) gives the intensity of a channel at position (x, y). If values are normalized, the intensity ranges from 0 to 1; otherwise it typically ranges from 0 to 255. The function is defined over a rectangle, with a finite range: f: [a, b] × [c, d] → [0, 1]. A color image is just three functions pasted together: f(x, y) = [r(x, y), g(x, y), b(x, y)].
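To make the "three functions" view concrete, here is a minimal sketch (assuming NumPy) that treats each channel of a small synthetic color image as its own function of (x, y):

    import numpy as np

    # A tiny synthetic 4x4 RGB image: shape (rows, cols, channels).
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)

    # Each channel is itself a function f(x, y) -> intensity.
    r, g, b = img[..., 0], img[..., 1], img[..., 2]

    x, y = 2, 3
    print(r[x, y], g[x, y], b[x, y])   # the color value f(x, y) = [r(x, y), g(x, y), b(x, y)]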
Image Acquisition
In the context of image processing, image acquisition can be roughly defined as acquiring an image from a source, typically a hardware-based source, so that it can be passed on to subsequent processes.
The initial stage in the workflow sequence for image processing is always image capture
because processing is impossible without an image. It can be crucial in some sectors to have a
constant baseline from which to work, and the obtained image is entirely unprocessed and is
the product of whatever hardware was used to generate it. If the image captured by a sensor
(e.g., a camera) is not already in digital form, it is converted using an analog-to-digital
converter.
Image Acquisition Techniques
The image acquisition process depends entirely on the hardware system, which is typically built around a sensor. A sensor converts light into electrical charges; a camera's sensor measures the energy reflected from the scene being captured.
The most common and fundamental sensor for image acquisition is the photodiode. It is made of silicon and produces an output voltage proportional to the incoming light. The typical approach for acquiring a 2D digital image employs a matrix of single sensors. Two technologies exist side by side.
Sampling is converting a continuous scene into a discrete map. The matrix of pixels does this
naturally. Sampling a continuous image results in information loss. Intuitively, it is obvious
that sampling reduces resolution.
Quantization converts continuous light intensities into a finite collection of numbers. Images
are generally quantized into 256 gray values; each pixel consumes one byte (8 bits). The
reason for allocating 256 gray values to each pixel is not only because it is well-matched to
computer architecture but also because the amount of values is sufficient to give humans the
illusion of continuous gray value change.
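As a minimal sketch (assuming NumPy), quantization can be illustrated by mapping continuous intensities in [0, 1] onto 256 gray levels:

    import numpy as np

    # Continuous intensities (e.g., normalized sensor readings) in the range [0, 1].
    continuous = np.random.default_rng(0).random((4, 4))

    # Quantize to 256 levels: one byte (8 bits) per pixel.
    quantized = np.round(continuous * 255).astype(np.uint8)
    print(quantized)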
Beyond sampling and quantization effects, distortions can also impact image acquisition. The two most common distortions are noise and blurring.
Image enhancement highlights or sharpens the edges and boundaries of an image to make it more suitable for display and inspection. To achieve this, image enhancement broadens the dynamic range of selected features, allowing them to be easily identified.
Image restoration differs from image enhancement in that the latter is intended to highlight
characteristics of the image that make the image more pleasant to the observer rather than
necessarily producing real data from a scientific standpoint. Image restoration techniques aim to decrease noise and recover lost resolution.
Image compression: Image compression reduces the memory required to store an image, or the data transmission required to communicate it, without destroying its quality. It is defined as reducing the amount of information needed to represent a digital image.
Image representation and description: This process follows the segmentation of an image into objects. It is used to discover and recognize items in a scene, describing them either by qualitative characteristics for pattern recognition or by quantitative codes for efficient storage during image compression.
Representation concerns how a segmented object is expressed, for example as a boundary or as a region. Boundary representations can capture shape properties such as corners, while regional representations capture properties such as roughness or skeletal shape. Description, on the other hand, is most commonly known as feature selection; it extracts useful information from an image. The extracted information can help distinguish accurately between object classes.
Image Labeling: The process of labeling an object based on its description for classification purposes. This is a critical stage in computer vision: a sufficiently large corpus of images must be analyzed and labeled for a computer vision model to find comparable items in new images.
Image Transformation
Image transformation is the process of altering the appearance of an image in some way. This
may involve changing the scale or orientation of the image, applying filters or other effects,
or transforming the image to a different color space.
Translation: shifting an image by (tx, ty) uses a 2x3 transformation matrix of the form

    M = [ 1  0  tx
          0  1  ty ]
You can build M as a NumPy array of type np.float32 and pass it to the cv.warpAffine() function, which applies an affine transformation to an image; a short sketch using both the translation and rotation matrices follows below.
Rotation: a transformation matrix of the following form M rotates an image by an angle θ.
    M = [  cosθ  sinθ
          −sinθ  cosθ ]
However, OpenCV also supports scaled rotation with an adjustable center of rotation, so you can rotate about any point you want. The modified transformation matrix is given by

    M = [  α   β   (1−α)⋅center.x − β⋅center.y
          −β   α   β⋅center.x + (1−α)⋅center.y ]

where α = scale⋅cosθ and β = scale⋅sinθ.
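The following is a minimal sketch of applying these matrices with OpenCV; it assumes the cv2 and numpy packages are installed and uses a hypothetical image path 'input.jpg'.

    import cv2 as cv
    import numpy as np

    img = cv.imread('input.jpg')            # hypothetical input image
    rows, cols = img.shape[:2]

    # Translation: shift the image 100 px right and 50 px down.
    M_translate = np.float32([[1, 0, 100],
                              [0, 1, 50]])
    translated = cv.warpAffine(img, M_translate, (cols, rows))

    # Scaled rotation: 45 degrees about the image center, scale 1.0.
    M_rotate = cv.getRotationMatrix2D((cols / 2, rows / 2), 45, 1.0)
    rotated = cv.warpAffine(img, M_rotate, (cols, rows))

    cv.imwrite('translated.jpg', translated)
    cv.imwrite('rotated.jpg', rotated)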
Affine Transformation
After an affine transformation, all parallel lines in the original image remain parallel in the output image. To find the transformation matrix, we need three points from the input image and their corresponding locations in the output image. cv.getAffineTransform then returns a 2x3 matrix to be passed to cv.warpAffine.
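A minimal sketch of such an affine transformation with OpenCV, assuming cv2 and numpy are installed and using a hypothetical image path and example point coordinates:

    import cv2 as cv
    import numpy as np

    img = cv.imread('input.jpg')            # hypothetical input image
    rows, cols = img.shape[:2]

    # Three points in the input image and where they should land in the output.
    src_pts = np.float32([[50, 50], [200, 50], [50, 200]])
    dst_pts = np.float32([[10, 100], [200, 50], [100, 250]])

    M = cv.getAffineTransform(src_pts, dst_pts)   # 2x3 affine matrix
    warped = cv.warpAffine(img, M, (cols, rows))
    cv.imwrite('affine.jpg', warped)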
Color Transformations
Color transforms convert three-band red, green, and blue (RGB) images to one of several specialized color spaces and back to RGB. You can generate a color-enhanced composite image by adjusting the contrast stretch between the two transforms. You can also substitute the value or lightness band with another band (generally one with higher spatial resolution) to create an image that combines one image's color features with another's spatial properties.
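As a minimal sketch of a color transform (assuming OpenCV and NumPy, with a hypothetical image path), an image can be converted to the HSV color space, adjusted, and converted back:

    import cv2 as cv
    import numpy as np

    img_bgr = cv.imread('input.jpg')               # OpenCV loads images in BGR order
    hsv = cv.cvtColor(img_bgr, cv.COLOR_BGR2HSV)   # convert to hue, saturation, value

    # Example adjustment: brighten the value channel, then convert back to BGR.
    h, s, v = cv.split(hsv)
    v = np.clip(v.astype(np.int16) + 30, 0, 255).astype(np.uint8)
    enhanced = cv.cvtColor(cv.merge([h, s, v]), cv.COLOR_HSV2BGR)
    cv.imwrite('enhanced.jpg', enhanced)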
Contrast stretching is a simple technique for increasing the contrast of a digital image by rescaling the pixel values to a larger range. It can enhance the visibility of details and features in an image, particularly if the original image has low contrast or poor lighting. However, contrast stretching has drawbacks: if the stretch parameters are chosen poorly, pixel values can be clipped or saturated. To avoid this, examine the image's histogram before and after contrast stretching and adjust the parameters accordingly. Other techniques, such as histogram equalization or adaptive contrast enhancement, can improve contrast without clipping or saturating the pixel values.
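As a minimal sketch (assuming OpenCV and NumPy, with a hypothetical grayscale image), contrast stretching can be done by rescaling pixel values between chosen percentiles, with histogram equalization shown for comparison:

    import cv2 as cv
    import numpy as np

    gray = cv.imread('input.jpg', cv.IMREAD_GRAYSCALE)

    # Stretch between the 2nd and 98th percentiles so outliers do not dominate.
    lo, hi = np.percentile(gray, (2, 98))
    stretched = np.clip((gray - lo) * 255.0 / (hi - lo), 0, 255).astype(np.uint8)

    # Histogram equalization spreads intensities according to their frequency.
    equalized = cv.equalizeHist(gray)
    cv.imwrite('stretched.jpg', stretched)
    cv.imwrite('equalized.jpg', equalized)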
There are numerous image transformation techniques, each with pros and cons. Below are some of the most frequent techniques, along with a quick comparison:
Scaling is the process of resizing an image. It can be used to enlarge or shrink an image. Enlarging an image may result in loss of detail, while shrinking it may result in loss of quality.
Rotation is spinning an image by a specific angle. It can be used to adjust an image's orientation or to create interesting visual effects. However, rotating an image may result in information loss at the image's edges.
Translation is shifting an image to a different location within the frame. It can be used to align an image or to build a composite image from several photographs. There is no information loss during translation.
Shearing alters an image by shifting pixels along one axis while holding the other axis constant. It can be used to correct an image's skew or to produce interesting aesthetic effects. Shearing does not result in information loss.
Warping is the process of changing an image by stretching or compressing it unevenly. It can be used to correct image distortion or to create interesting effects. Warping may result in information loss along the image's edges.
Cropping is the process of removing a section of an image. It can be used to isolate a certain area of an image or to remove undesired elements. Cropping results in the loss of the removed data.
Each transformation approach has advantages and disadvantages, and the unique application
and desired outcome determine the technique used.
Image Segmentation
Definition and Explanation of Image Segmentation
Image segmentation is a computer vision approach that divides an image into several
segments or areas based on pixel values. Image segmentation aims to simplify or modify an
image's representation into something more meaningful and easier to examine.
Image segmentation has numerous uses, including object detection and tracking, picture
compression, medical image analysis, and robotics.
Thresholding: This technique defines a pixel intensity threshold for an image, above or below which pixels are separated into different regions (a minimal sketch follows this list).
Edge detection is the process of recognizing edges or boundaries between various regions in
an image.
Clustering is a technique that groups comparable pixels based on their spatial closeness and
color similarity.
Watershed segmentation is a technique that simulates the flow of water over an image, with
the image's peaks and valleys determining the segmentation borders.
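As a minimal sketch of threshold-based segmentation (assuming OpenCV and a hypothetical grayscale image):

    import cv2 as cv

    gray = cv.imread('input.jpg', cv.IMREAD_GRAYSCALE)

    # Global threshold: pixels above 127 become foreground (255), the rest background (0).
    _, binary = cv.threshold(gray, 127, 255, cv.THRESH_BINARY)

    # Otsu's method chooses the threshold automatically from the image histogram.
    _, otsu = cv.threshold(gray, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)
    cv.imwrite('binary.jpg', binary)
    cv.imwrite('otsu.jpg', otsu)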
The process of image segmentation in computer vision typically involves the following
steps:
1. Image pre-processing: The input image is pre-processed in this step to eliminate noise,
improve contrast, and normalize the illumination. This step is necessary to ensure that the
image segmentation algorithm can recognize and classify the various sections in the image
accurately.
2. Feature extraction: In this step, relevant features from the pre-processed image are
extracted, such as color, texture, shape, and intensity. The features used are determined by the
application and the properties of the image being segmented.
3. Image segmentation: The image is segmented into many segments or areas based on the
extracted features in this step. This can be accomplished through the use of many approaches
such as thresholding, clustering, edge detection, and region expansion.
4. Evaluation: Lastly, the quality of the segmentation result is assessed using a variety of measures such as precision, recall, and F1 score. This phase is critical for ensuring that the segmentation algorithm is robust and capable of generalizing to new images.
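As a minimal sketch of the evaluation step, the following (assuming NumPy, with hypothetical predicted and ground-truth binary masks) computes precision, recall, and F1 score:

    import numpy as np

    def segmentation_scores(pred, truth):
        """Precision, recall, and F1 for binary segmentation masks (arrays of 0/1)."""
        pred, truth = pred.astype(bool), truth.astype(bool)
        tp = np.logical_and(pred, truth).sum()      # true positives
        fp = np.logical_and(pred, ~truth).sum()     # false positives
        fn = np.logical_and(~pred, truth).sum()     # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1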
Image Feature Extraction
Edge detection is a technique for extracting the boundaries or edges of objects in an image. It works by detecting differences in image intensity or color (a short sketch of edge and keypoint extraction follows this list of techniques).
Blob detection is a technique for extracting regions of interest in an image that has a similar
intensity or hue. It is frequently used to detect circular or elliptical objects.
Corner detection is a technique for detecting the corners or places of interest in an image. It is
frequently employed in feature-based matching and tracking.
Texture analysis is a technique for extracting the texture patterns or features of an image. It is
frequently employed in picture classification and segmentation.
Scale-invariant feature transform (SIFT): SIFT is a prominent feature extraction approach for
identifying and extracting critical points in an image that are insensitive to changes in scale,
rotation, and lighting.
Histogram of oriented gradients (HOG): HOG is a feature extraction technique that calculates the gradient magnitudes and orientations in an image and generates a histogram of the gradient orientations.
Convolutional neural networks (CNNs): CNNs are deep learning models that can learn and
extract characteristics from photos automatically. They have demonstrated extraordinary
success in a wide range of computer vision applications, including object identification and
recognition.
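As a minimal sketch (assuming OpenCV with SIFT available, i.e., opencv-python 4.4 or later, and a hypothetical image path), edges and SIFT keypoints can be extracted as follows:

    import cv2 as cv

    gray = cv.imread('input.jpg', cv.IMREAD_GRAYSCALE)

    # Edge map: the Canny detector finds boundaries from intensity gradients.
    edges = cv.Canny(gray, 100, 200)

    # SIFT keypoints and descriptors: local features robust to scale and rotation changes.
    sift = cv.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    print(len(keypoints), descriptors.shape)   # number of keypoints, 128-dim descriptors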
The steps involved in image feature extraction in computer vision typically include the following:
Image preprocessing: cleaning and improving the input image to remove noise, artifacts, and other undesired elements that could interfere with feature extraction. Image smoothing, contrast enhancement, and noise reduction are all common preprocessing techniques.
Feature selection: In this step, a set of relevant features that can represent the salient qualities
of the input image is chosen. Domain expertise, image analysis techniques, or machine
learning algorithms are frequently used to choose features.
Feature extraction: In this stage, the selected features from the input image are computed. A
variety of approaches, including edge detection, corner detection, texture analysis, and deep
learning, can be used to extract features.
Feature representation: In this stage, the extracted features are represented in a suitable
manner that may be used for further analysis or categorization. Feature vectors, histograms,
and graphs are examples of common representations.
Feature normalization: In this step, feature values are scaled or normalized so that they are comparable across images or datasets. Mean normalization, standardization, and min-max scaling are examples of common normalization approaches.
Feature reduction: In this stage, the extracted features' dimensionality is reduced to improve
computing efficiency, minimize noise, or prevent overfitting. Principal component analysis
(PCA), linear discriminant analysis (LDA), and feature selection algorithms are examples of
common reduction techniques.
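As a minimal sketch of feature reduction (assuming scikit-learn and NumPy, with a hypothetical feature matrix), PCA can project extracted feature vectors onto a lower-dimensional space:

    import numpy as np
    from sklearn.decomposition import PCA

    # Hypothetical feature matrix: 100 images, each described by a 512-dimensional vector.
    features = np.random.default_rng(0).normal(size=(100, 512))

    pca = PCA(n_components=32)                    # keep the 32 directions of greatest variance
    reduced = pca.fit_transform(features)
    print(reduced.shape)                          # (100, 32)
    print(pca.explained_variance_ratio_.sum())    # fraction of variance retained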
Disadvantages:
Information loss: Feature extraction might result in information loss since some
features may be lost or aggregated during the process.
Noise sensitivity: Feature extraction can be susceptible to noise and other picture
distortions, affecting the quality and dependability of derived features.
Overfitting: Overfitting occurs when the extracted features are particular to the
training data and do not generalize well to new data.
Computational cost: Feature extraction can be computationally expensive, particularly for large datasets and complicated feature extraction approaches.
Image Classification
Image classification is a task in computer vision in which an image is assigned to one or more predefined classes or categories. Image classification seeks to automate the process of detecting and distinguishing objects, scenes, or patterns in digital photographs.
Object Detection and Classification: Object detection and classification is a type of image
classification that detects and localizes objects within an image in addition to classifying it.
Identifying and classifying all instances of vehicles in an image, for example.
Data Collection: The process of gathering a dataset of labeled photographs, each of which is
associated with a predefined label or category.
Data Preprocessing: The preparation of data so that it can be used to train the model. This
stage involves duties including image scaling, normalization, and data augmentation.
Feature Extraction: The extraction of visual features from an input image utilizing techniques
such as edge detection, texture analysis, and deep learning.
Feature Selection: The selection of relevant and discriminative traits that can be utilized to
distinguish across classes.
Model Training: Using a supervised learning technique, such as logistic regression, decision
trees, or deep neural networks, a machine learning model is trained on the extracted features
and labels.
Model Evaluation: A separate test set is used to evaluate the trained model's accuracy and
generalization performance. To assess the model's performance, many measures such as
accuracy, precision, recall, and F1 score can be utilized.
Model Deployment: The deployment of the trained model to categorize new, unseen images
by extracting their attributes and predicting their labels using the learned model.
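As a minimal sketch of the training and evaluation steps (assuming scikit-learn and NumPy, with hypothetical feature vectors already extracted from images), a simple classifier can be trained and scored as follows:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    # Hypothetical data: 200 images reduced to 64-dimensional feature vectors, two classes.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 64))
    y = rng.integers(0, 2, size=200)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print(classification_report(y_test, clf.predict(X_test)))   # precision, recall, F1 per class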
Advantages
Automation: Image classification algorithms automate the image analysis process, making it faster and more efficient than manual analysis.
Objectivity: Picture classification systems produce consistent, objective results by
removing any potential for subjective biases presented by human analysts.
Scalability: Image classification techniques can handle vast amounts of data and can
be readily scaled up or down according to the size of the dataset.
Versatility: Image classification techniques may be applied to a large range of image types and formats, making them valuable in a variety of domains and applications.
Healthcare
Medical imaging and diagnostics have grown in importance in modern healthcare because they provide crucial insights that can assist clinicians in detecting and diagnosing disorders. Users can perform a variety of image processing procedures on 2D and 3D images, including image filtering to reduce and remove undesirable noise or distortions, and cropping and resampling of input data to make image processing easier and faster.
The progress of computer vision in healthcare in recent years has resulted in faster and more
accurate diagnoses. Medical images can be instantly examined for disease indications using
computer vision algorithms, allowing for more accurate diagnosis in a fraction of the time
and cost of traditional procedures. By avoiding unnecessary treatments, assisted or automated
diagnostics help to lower total healthcare expenses. Image recognition systems have
demonstrated tremendous success in detecting illness patterns.
Robotics
Image processing is a basic part of robotics, allowing machines to comprehend and interpret
visual data from their surroundings. Robots can extract meaningful information from images
and utilize it to make decisions and complete tasks by applying image enhancement,
restoration, and segmentation techniques.
Image enhancement techniques improve an image's quality so that a robot can better interpret
it. Image restoration techniques help a robot understand images more properly. Image
processing techniques can also be utilized for 3D reconstruction and localization, which are
critical for robots to understand their surroundings and navigate.
Security and Surveillance
The numerous security problems of daily life that can be addressed with image processing techniques fall into three primary categories: visual tracking, biometrics, and digital media security. Visual tracking refers to computer vision techniques that analyze a scene to extract features representing objects (e.g., pedestrians) and track them, providing input for analyzing any unusual behavior.
Autonomous Vehicles
Augmented Reality
Preprocessing evaluates the scanned image for noise, skew, and tilt. After preprocessing, the noise-free image is passed to the segmentation step, where it is broken into individual characters; the scanned image is converted to grayscale and then to binary. Feature extraction follows segmentation, with individual glyphs in the image considered for extraction. In augmented reality, this kind of image processing is performed in real time so that a user can hover a camera over a page and receive augmented information such as a 3D model, video, or explanation about that page.
Conclusion
Fundamental image operations are required in computer vision for processing and
evaluating digital images.
These operations include reading and writing image files, scaling images, and
converting images to different formats.
Image enhancement techniques can enhance image quality by lowering noise,
increasing contrast, or sharpening edges.
Image segmentation is separating an image into several segments or areas, whereas
feature extraction is detecting essential features in an image.
Image classification is the process of labeling or categorizing an image based on its
qualities or attributes, and numerous approaches and algorithms are available to
accomplish this goal.