Image Processing
Lecture 10: Image Analysis
Feature recognition and classification
Modified by:
Assoc. Prof. Dr. Hossam Mahmoud Moftah
Faculty of Computers and Artificial Intelligence, Beni-Suef University
Boundary Descriptor Techniques (lab assignment using Python)
There are several simple geometric measures that can be useful for describing a boundary:
Length
the number of pixels along a boundary gives a rough approximation of its length
Diameter (Major Axis)
the longest line segment connecting two points on the boundary
Minor Axis
the line perpendicular to the major axis
Eccentricity
the ratio of the major axis to the minor axis
Adapted from Rushin Shah, Site Supervisor at Adani Shantigram
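As a sketch for the lab assignment, these boundary descriptors can be computed from a binary object mask with scikit-image's regionprops; the synthetic mask below is illustrative, and the slide's eccentricity (major/minor ratio) is computed explicitly, since regionprops defines eccentricity differently (as that of the fitted ellipse).

# Sketch: boundary descriptors from a binary mask (assumes scikit-image)
import numpy as np
from skimage import measure

# Illustrative mask containing one filled ellipse
rr, cc = np.mgrid[0:100, 0:100]
mask = ((((rr - 50) / 30) ** 2 + ((cc - 50) / 15) ** 2) <= 1).astype(np.uint8)

props = measure.regionprops(measure.label(mask))[0]
length = props.perimeter                 # boundary length (rough pixel-based measure)
major = props.major_axis_length          # diameter (major axis)
minor = props.minor_axis_length          # minor axis (perpendicular to the major axis)
eccentricity = major / minor             # slide's definition: major/minor ratio
print(length, major, minor, eccentricity)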
Region Descriptor Techniques
Shape features (lab assignment using Python)
Area and perimeter
Adapted from Idar Dyrdal
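A minimal sketch of computing area and perimeter, here via OpenCV contours (the filled-circle mask is illustrative):

# Sketch: area and perimeter of a region from a binary image (assumes OpenCV)
import cv2
import numpy as np

mask = np.zeros((100, 100), dtype=np.uint8)
cv2.circle(mask, (50, 50), 30, 255, -1)            # illustrative filled circle

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
cnt = max(contours, key=cv2.contourArea)           # largest object
print("area:", cv2.contourArea(cnt))               # region area in pixels
print("perimeter:", cv2.arcLength(cnt, True))      # closed-boundary length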
Region Descriptor Techniques
Texture features
One of the simplest sets of statistical features for texture description consists of the following histogram-based descriptors of the image (or region): mean, variance (or its square root, the standard deviation), skew, energy (used as a measure of uniformity), and entropy.
Adapted from Rushin Shah, Site Supervisor at Adani Shantigram
Region Descriptor Techniques
Histogram-based (statistical) features
The energy descriptor provides another measure of how the pixel values are distributed over the gray-level range: images with a single constant value have maximum energy (i.e., energy = 1), and images with few gray levels have higher energy than those with many gray levels. The energy descriptor can be calculated as
E = Σᵢ [p(i)]²
where p(i) is the normalized histogram, i.e. the probability of gray level i.
Adapted from Rushin Shah, Site Supervisor at Adani Shantigram
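A sketch of these histogram-based descriptors for the lab assignment; the random test image and the function name are illustrative:

# Sketch: histogram-based texture descriptors of a grayscale image or region
import numpy as np

def histogram_features(img, levels=256):
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()                            # normalized histogram p(i)
    i = np.arange(levels)
    mean = np.sum(i * p)
    variance = np.sum((i - mean) ** 2 * p)
    skew = np.sum((i - mean) ** 3 * p) / (variance ** 1.5 + 1e-12)
    energy = np.sum(p ** 2)                          # uniformity; 1 for a constant image
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))  # high when many levels are equally likely
    return mean, variance, skew, energy, entropy

img = np.random.randint(0, 256, (64, 64))            # illustrative noisy texture
print(histogram_features(img))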
Region Descriptor Techniques
Histogram-based (statistical) features
Roughness
The variance is sometimes used as a normalized descriptor of roughness (R), defined as:
R = 1 − 1 / (1 + 𝜎²)
where 𝜎² is the variance, normalized to the [0, 1] interval.
R = 0 for areas of constant intensity, that is, smooth texture.
Adapted from Rushin Shah, Site Supervisor at Adani Shantigram
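Continuing the sketch above, roughness follows directly from the variance; normalizing the variance by (L − 1)², with L gray levels, is one common convention and is assumed here:

# Sketch: roughness R from the gray-level variance
def roughness(variance, levels=256):
    sigma2_norm = variance / (levels - 1) ** 2   # variance normalized to [0, 1]
    return 1 - 1 / (1 + sigma2_norm)             # 0 for constant-intensity (smooth) areas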
Texture Features Example
The texture with the highest uniformity (energy) has the lowest entropy.
Adapted from Rushin Shah, Site Supervisor at Adani Shantigram
Texture Features - Gray-Level Co-occurrence Matrix (GLCM)
(lab assignment using Python)
The GLCM tabulates how often each pair of gray levels co-occurs at a given pixel offset (distance and angle); texture descriptors such as contrast, correlation, energy and homogeneity are then computed from the normalized matrix.
Adapted from Rushin Shah, Site Supervisor at Adani Shantigram
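A sketch using scikit-image's GLCM utilities (spelled greycomatrix/greycoprops in versions before 0.19); the distance/angle parameters and the random texture are illustrative:

# Sketch: GLCM texture descriptors with scikit-image
import numpy as np
from skimage.feature import graycomatrix, graycoprops

img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # illustrative texture

# Co-occurrences at distance 1, horizontal direction; symmetric and normalized
glcm = graycomatrix(img, distances=[1], angles=[0],
                    levels=256, symmetric=True, normed=True)

for prop in ("contrast", "correlation", "energy", "homogeneity"):
    print(prop, graycoprops(glcm, prop)[0, 0])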
Classification
To design a classifier it is essential to have a training set of images.
Supervised learning: the classes to which the images belong are known.
Unsupervised learning: the classes are unknown.
Training the classifier is the process of using data to determine the best set of features for the classifier.
Adapted from Geoff Dougherty, Image Processing for Medical Applications, CAMBRIDGE, 2009
Difference between Classification and Regression in Machine Learning
Classification predicts a discrete class label; regression predicts a continuous numerical quantity.
Adapted from Jason Brownlee PhD, https://machinelearningmastery.com/about/
Statistical classification
There are two general approaches to statistical classification, parametric and nonparametric:
Parametric Methods:
require probability distributions and estimate parameters derived from them, such as the mean and standard deviation, to provide a compact representation of the classes.
assume a fixed set of parameters that determines a probability model; this approach is widely used in machine learning as well.
Examples: Logistic Regression, Naïve Bayes Model, etc.
Adapted from Geoff Dougherty, Image Processing for Medical Applications, CAMBRIDGE, 2009,
https://www.geeksforgeeks.org/difference-between-parametric-and-non-parametric-methods/
Statistical classification
Parametric Methods:
Consider the case where there are just two classes, class 1 (ω1) and class 2 (ω2), and a single feature, x.
We have a training set, i.e. representative examples from both classes, so that we can measure the feature for both classes and construct probability distributions for each.
These are formally known as the probability density functions or class-conditional probabilities (Appendix B.3), p(x|ω1) and p(x|ω2), i.e. the probabilities of measuring the value x given that the feature is in class 1 or class 2, respectively.
Adapted from Geoff Dougherty, Image Processing for Medical Applications, CAMBRIDGE, 2009,
Statistical classification
Parametric Methods:
If we have a large number of examples in each class, the probability density functions will be Gaussian in shape (by the Central Limit Theorem).
The classification problem is: given another feature measurement, x, to which class does this feature belong?
We need the posterior probability, P(ωi|x), i.e. the probability that, given a feature value of x, the feature belongs to class ωi.
Probability theory, and specifically Bayes' Rule, relates the posterior probabilities to the class-conditional probabilities or likelihoods (the derivation is given in Appendix B.3):
P(ωi|x) = p(x|ωi) P(ωi) / p(x)
Adapted from Geoff Dougherty, Image Processing for Medical Applications, CAMBRIDGE, 2009,
Statistical classification
Parametric Methods:
where P(ωi) is the a priori or prior probability, i.e. the probability of being in class ω1 or ω2 based on the relative numbers of those classes in the population, prior to taking the test,
and p(x) is often considered a mere scaling factor (the evidence) that guarantees that the posterior probabilities sum to unity.
Adapted from Geoff Dougherty, Image Processing for Medical Applications, CAMBRIDGE, 2009,
Statistical classification
Parametric Methods:
We want to maximize the posterior probability, P(ωi|x), which is the same as maximizing p(x|ωi) · P(ωi).
Bayes' decision rule is:
Decide ω1 if p(x|ω1) P(ω1) > p(x|ω2) P(ω2); otherwise decide ω2.
Adapted from Geoff Dougherty, Image Processing for Medical Applications, CAMBRIDGE, 2009,
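A minimal sketch of this two-class parametric classifier, assuming Gaussian class-conditional densities fitted to training data; the feature values are made up for illustration:

# Sketch: two-class Bayes classifier with Gaussian likelihoods
import numpy as np
from scipy.stats import norm

x1 = np.array([2.1, 2.5, 1.8, 2.2, 2.7])    # illustrative training features, class 1
x2 = np.array([4.0, 4.4, 3.8, 4.1, 4.6])    # illustrative training features, class 2
prior1 = len(x1) / (len(x1) + len(x2))       # priors from relative class frequencies
prior2 = 1 - prior1

def classify(x):
    # Likelihoods p(x|wi) from the fitted Gaussians, weighted by the priors P(wi)
    g1 = norm.pdf(x, x1.mean(), x1.std(ddof=1)) * prior1
    g2 = norm.pdf(x, x2.mean(), x2.std(ddof=1)) * prior2
    return 1 if g1 > g2 else 2               # Bayes' decision rule

print(classify(3.0))                          # -> 1 (closer to class 1's distribution)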
Statistical classification
Nonparametric Methods:
Nonparametric methods make no assumptions about the parameters (or the form) of the probability distribution of the population being studied.
Nonparametric methods are gaining popularity because they require fewer assumptions about the data.
Examples: KNN, Decision Tree Model, etc.
Adapted from https://www.geeksforgeeks.org/difference-between-parametric-and-non-parametric-methods/
k-nearest-neighbor (k-NN) classifier
(lab assignment using Python for a medical application)
K nearest neighbors (KNN) is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (a distance function).
A case is classified by a majority vote of its neighbors, the case being assigned to the class most common among its K nearest neighbors as measured by the distance function.
If K = 1, the case is simply assigned to the class of its nearest neighbor.
Adapted from Bing Liu, CS583, UIC
k-nearest-neighbor (k-NN) classifier
Distance Function Measurements:
Most common: Euclidean distance, d(x, y) = √( Σᵢ (xᵢ − yᵢ)² )
To classify a new input vector x, examine the k closest training data points to x and assign the object to the most frequently occurring class.
Adapted from https://www.cut-the-knot.org/pythagoras/DistanceFormula.shtml
Adapted from David Sontag, New York University and Vibhav Gogate, Carlos Guestrin,
Mehryar Mohri, & Luke Zettlemoyer
k-nearest-neighbor (k-NN) classifier
Adapted from Carla P. Gomes, [email protected]
KNN Example
Points | X1 (Acid Durability) | X2 (Strength) | Y (Classification)
P1 | 7 | 7 | BAD
P2 | 7 | 4 | BAD
P3 | 3 | 4 | GOOD
P4 | 1 | 4 | GOOD
P5 | 3 | 7 | ?
Adapted from Anand Bhosale presentation, International Institute of Information Technology, Innovation and Leadership
Euclidean Distance From Each Point
Distances of P5(3, 7) from each training point: P1(7, 7): √(16 + 0) = 4; P2(7, 4): √(16 + 9) = 5; P3(3, 4): √(0 + 9) = 3; P4(1, 4): √(4 + 9) ≈ 3.61.
Adapted from Anand Bhosale presentation, International Institute of Information Technology, Innovation and Leadership
3 Nearest Neighbours
Point | P1 | P2 | P3 | P4
Coordinates | (7, 7) | (7, 4) | (3, 4) | (1, 4)
Euclidean distance of P5(3, 7) from | 4 | 5 | 3 | 3.61
Class | BAD | BAD | GOOD | GOOD
The three nearest neighbours of P5 are P3 (GOOD), P4 (GOOD) and P1 (BAD), so by majority vote P5 is classified GOOD.
Adapted from Anand Bhosale presentation, International Institute of Information Technology, Innovation and Leadership
KNN Classification
Points | X1 (Durability) | X2 (Strength) | Y (Classification)
P1 | 7 | 7 | BAD
P2 | 7 | 4 | BAD
P3 | 3 | 4 | GOOD
P4 | 1 | 4 | GOOD
P5 | 3 | 7 | GOOD
Adapted from Anand Bhosale presentation, International Institute of Information Technology, Innovation and Leadership
KNN pseudocode
Adapted from Anand Bhosale presentation, International Institute of Information Technology, Innovation and Leadership
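The pseudocode itself did not survive extraction; a minimal Python sketch of the algorithm, applied to the worked example above with k = 3, might look like this:

# Sketch: k-NN classification of P5 from the worked example
import math
from collections import Counter

train = [((7, 7), "BAD"), ((7, 4), "BAD"), ((3, 4), "GOOD"), ((1, 4), "GOOD")]

def knn_classify(query, train, k=3):
    # Sort training cases by Euclidean distance to the query point
    by_dist = sorted(train, key=lambda case: math.dist(case[0], query))
    # Majority vote among the k nearest neighbours
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

print(knn_classify((3, 7), train))   # -> GOOD (neighbours P3, P4, P1)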
Unsupervised methods
With unsupervised classification, the class labels are
unknown, and the data are plotted to see whether they cluster
naturally.
Unsupervised methods examples:
k-means clustering (see lecture 5)
Hierarchical clustering
Adapted from Anand Bhosale presentation, International Institute of Information Technology, Innovation and Leadership
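As a brief unsupervised illustration (k-means itself is covered in lecture 5), a sketch with scikit-learn; the two synthetic blobs stand in for unlabeled feature vectors:

# Sketch: k-means clustering of unlabeled data
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 1, (50, 2)),    # illustrative cluster 1
                  rng.normal(5, 1, (50, 2))])   # illustrative cluster 2

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
print(kmeans.cluster_centers_)                   # discovered cluster centres
print(kmeans.labels_[:5])                        # cluster assignment per sample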
Measuring Classification Performance
Adapted from https://en.wikipedia.org/wiki/Confusion_matrix
The Confusion Matrix
(lab assignment using Python for a medical application)
A confusion matrix tabulates a classifier's predictions against the ground truth; for a binary problem its cells are the true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN).
Common metrics derived from it:
accuracy = (TP + TN) / (TP + TN + FP + FN)
sensitivity (recall, TPR) = TP / (TP + FN)
specificity = TN / (TN + FP)
precision = TP / (TP + FP)
Adapted from https://glassboxmedicine.com/2019/02/17/measuring-performance-the-confusion-matrix/
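A sketch of computing the matrix and these metrics with scikit-learn; the label vectors are made up for illustration:

# Sketch: confusion matrix and derived metrics
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]   # illustrative ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # illustrative classifier output

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy:", (tp + tn) / (tp + tn + fp + fn))
print("sensitivity (TPR):", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("precision:", tp / (tp + fp))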
A receiver operating characteristic (ROC) curve
(lab assignment using Python for a medical application)
The ROC curve is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied.
The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR = FP / (FP + TN)) at various threshold settings.
Adapted from https://en.wikipedia.org/wiki/Receiver_operating_characteristic
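A sketch of plotting a ROC curve with scikit-learn; the scores below stand in for a classifier's continuous outputs:

# Sketch: ROC curve and area under the curve (AUC)
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

y_true = [0, 0, 1, 1, 0, 1, 0, 1]                     # illustrative ground truth
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]   # illustrative scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)      # TPR/FPR at each threshold
plt.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.2f}")
plt.plot([0, 1], [0, 1], "--")                         # chance diagonal
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()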
Confusion Matrix example
Adapted from Poornima Singh, Sanjay Singh, and Gayatri S Pandi-Jain, Effective heart disease prediction system using data mining techniques, Int J Nanomedicine, 2018
The End