Image Classification

Chapter 12
Intro
 Digital image classification assigns pixels to classes (categories)
 Each pixel has as many digital values as there are bands
 The values are compared to those of pixels of known composition, and pixels are assigned accordingly
 Each class is (in theory) homogeneous
Intro
 Direct uses
 Produce a map of land-use/land-cover
 Indirect use
 Classification is an intermediate step, and may
form only one of several data layers in a GIS
 Water map vs water quality GIS
Intro
 A classifier is a computer program that performs some form of image classification
 Many different types are available
 No one method is “best”
 The simplest is a point (or spectral) classifier
 Considers each pixel individually
 Simple and economical
 Cannot describe relationships to neighboring pixels
Intro
 Spatial neighborhood classifiers consider
groups of pixels
 More difficult to program and more expensive
Intro
 Supervised vs Unsupervised classification
 Supervised requires the analyst to identify
known areas
 Unsupervised determines a set number of
categories based on a computer algorithm
 Hybrid classifiers are a mix of the two
Supervised classification
Unsupervised classification
 No previous knowledge assumed about data.
 Tries to spectrally separate the pixels.
 User has control over:
 Number of classes
 Number of iterations
 Convergence thresholds
 Two main algorithms: ISODATA and k-means
Unsupervised classification
Informational Classes
 Informational classes are categories of
interest to the users
 Geological units
 Forest types
 Land use
Spectral classes
 Spectral classes are groups of pixels with uniform brightness in each of their several channels
 The idea is to link spectral classes to
informational classes
 However, there is usually variability that
causes confusion
 A forest can have trees of varying age, health,
species composition, density, etc.
Classes
 Informational classes then are usually
composed of numerous spectral
subclasses.
 These subclasses may be displayed as a single
unit on the final product
 We often will want to examine how different
the pixels within a class are
 Look at variance and standard deviation
Variance
 The variance is defined as the average of the squared differences from the mean
 It is the square of the standard deviation, i.e. σ²

Worked example (dog shoulder heights): the heights are 600 mm, 470 mm, 170 mm, 430 mm and 300 mm.

Mean = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394

Each dog's difference from the mean is: 206, 76, −224, 36 and −94.

To calculate the variance, take each difference, square it, and then average the result:

Variance: σ² = (206² + 76² + (−224)² + 36² + (−94)²) / 5 = 108,520 / 5 = 21,704

So the variance is 21,704, and the standard deviation is just the square root of the variance:

Standard Deviation: σ = √21,704 ≈ 147

So, using the standard deviation we have a “standard” way of knowing what is normal, and what is extra large or extra small.

Rottweilers are tall dogs, and Dachshunds are a bit short ... but don't tell them!
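A minimal numpy sketch of the same calculation (population variance, dividing by n as above):

```python
import numpy as np

# Shoulder heights of the five dogs, in millimetres
heights = np.array([600, 470, 170, 430, 300], dtype=float)

mean = heights.mean()                      # 394.0
variance = ((heights - mean) ** 2).mean()  # population variance: 21704.0
std_dev = np.sqrt(variance)                # ~147.3

print(mean, variance, std_dev)
```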
Normal Distribution
Differences between means
 A crude estimate of separability might be to simply look at the difference in means between two classes
 This is too simplistic, since it does not account for differences in variability between the two
 A better way is to look at the normalized difference:

ND = | x̄a − x̄b | / (sa + sb)

where x̄ and s are the mean and standard deviation of each class.
NDWI
 The Normalized Difference Water Index (NDWI) (Gao, 1996); a code sketch follows this list
 The SWIR reflectance reflects changes in both the vegetation water content and the spongy mesophyll structure in vegetation canopies
 The NIR reflectance is affected by leaf internal structure and leaf dry matter content, but not by water content
 Combining the NIR with the SWIR removes variations induced by leaf internal structure and leaf dry matter content, improving the accuracy of retrieving vegetation water content (Ceccato et al. 2001)
 Normalized Difference Snow Index (NDSI)
 Normalized Difference Vegetation Index (NDVI)
 Normalized Difference Cloud Index (NDCI) is defined as a ratio between the difference and the sum of two zenith radiances measured for two narrow spectral bands in the visible and near-IR regions
 It provides extra tools to remove the radiative effects of the 3D cloud structure
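All of these indices share the same normalized-difference form. A minimal sketch for NDWI (Gao, 1996), assuming nir and swir hold reflectance arrays for the two bands:

```python
import numpy as np

def ndwi(nir, swir):
    # NDWI = (NIR - SWIR) / (NIR + SWIR); epsilon guards against divide-by-zero
    nir, swir = nir.astype(float), swir.astype(float)
    return (nir - swir) / (nir + swir + 1e-10)

# Hypothetical 2x2 reflectance values
nir = np.array([[0.35, 0.40], [0.30, 0.38]])
swir = np.array([[0.20, 0.15], [0.25, 0.10]])
print(ndwi(nir, swir))  # higher values indicate higher vegetation water content
```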
Unsupervised Classification
 RS images are usually composed of several
relatively uniform spectral classes
 Unsupervised classification is the
identification, labeling and mapping of such
classes
Unsupervised Classification
 Advantages
 Requires no prior knowledge of the region
 Human error is minimized
 Unique classes are recognized as distinct units
 Disadvantages
 Classes do not necessarily match informational
categories of interest
 Limited control of classes and identities
 Spectral properties of classes can change with
time
Unsupervised Classification
 Distance Measures are used to group or
cluster brightness values together
 Euclidean distance between points in space
is a common way to calculate closeness
 Euclidean metric is the "ordinary" distance
between two points that one would measure with
a ruler, and is given by the Pythagorean formula.
Euclidean Distance
 Example: the (Euclidean) distance between points (2, −1) and (−2, 2)

dist((2, −1), (−2, 2))
= √[ (2 − (−2))² + ((−1) − 2)² ]
= √[ (4)² + (−3)² ]
= √(16 + 9)
= √25
= 5
Euclidean Distance
 This can be extended to multiple dimensions (bands)
 Square the difference in each band, then sum the squares

Band        1    2    3    4
Pixel A    34   28   22    6
Pixel B    26   16   52   29
Difference  8   12  −30  −23
(Diff)²    64  144  900  529

Σ (diff)² = 64 + 144 + 900 + 529 = 1,637
√1,637 ≈ 40.5
Distances
 There are a number of other distances that
can be calculated
 L1 distance is the sum of absolute differences between the bands
 e.g. 8 + 12 + 30 + 23 = 73 for the previous example
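A minimal sketch computing both distances for the example pixels above:

```python
import numpy as np

pixel_a = np.array([34, 28, 22, 6], dtype=float)
pixel_b = np.array([26, 16, 52, 29], dtype=float)

diff = pixel_a - pixel_b
euclidean = np.sqrt(np.sum(diff ** 2))  # sqrt(64 + 144 + 900 + 529) = sqrt(1637) ~ 40.5
l1 = np.sum(np.abs(diff))               # 8 + 12 + 30 + 23 = 73

print(euclidean, l1)
```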
Example Landsat bands (figure: near-IR band and red band shown side by side)
Example spectral plot (figure: pixels plotted in Band 1 vs Band 2 spectral space)

• Two bands of data
• Each pixel marks a location in this 2-D spectral space
• Our eyes can split the data into clusters
• Some points do not fit clusters
K-means (unsupervised)
1. A set number of cluster centers are
positioned randomly through the spectral
space.
2. Pixels are assigned to their nearest cluster.
3. The mean location is re-calculated for each
cluster.
4. Repeat 2 and 3 until movement of cluster centers is below a threshold.
5. Assign class types to spectral clusters.
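A minimal sketch of these steps in numpy, assuming the image has been reshaped to an (n_pixels, n_bands) array:

```python
import numpy as np

def kmeans(pixels, k, max_iter=100, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Set k cluster centers at randomly chosen pixels
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(max_iter):
        # 2. Assign each pixel to its nearest center (Euclidean distance)
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Recalculate the mean location of each cluster
        new_centers = np.array([pixels[labels == i].mean(axis=0)
                                if np.any(labels == i) else centers[i]
                                for i in range(k)])
        # 4. Stop once the centers move less than the threshold
        if np.linalg.norm(new_centers - centers) < tol:
            break
        centers = new_centers
    return labels, centers  # 5. class types are then assigned to clusters by the analyst
```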
Example k-means (figure: three Band 1 vs Band 2 panels)

1. First iteration. The cluster centers are set at random. Pixels will be assigned to the nearest center.
2. Second iteration. The centers move to the mean-center of all pixels in their cluster.
3. N-th iteration. The centers have stabilized.
Key Components
 Regardless of the unsupervised algorithm, we need to pay attention to methods for:
 Measuring distance
 Identifying class centroids
 Testing distinctness of classes
Decision Boundary
 All classification programs try to determine
classes based on “decision boundaries”
 That is, divide feature space into an exhaustive
set of nonoverlapping regions
 Begin with a set of prelabeled points for each
class (training samples)
 Minimum Distance to Means – the boundary is the locus of points equidistant from the two class means
 Nearest neighbor – the boundary is the locus of points equidistant from the nearest members of 2 classes
Decision Boundaries
 Classification usually not so easy
 Desired classes have distributions in feature
space that are not obviously separated
 Nearly always have to use more than three
features (dimensions)
 Wind up having to use discriminant functions
Supervised classification
 Start with knowledge of class types.
 Classes are chosen at start
 Training samples are created for each class
 Ground truth used to verify the training
samples.
 Quite a few algorithms. Here we will look at:
 Parallelepiped
 Maximum likelihood
Supervised Classification
 Advantages
 Analyst has control over the selected classes
tailored to the purpose
 Has specific classes of known identity
 Does not have to match spectral categories on
the final map with informational categories of
interest
 Can detect serious errors in classification if training areas are misclassified
Supervised Classification
 Disadvantages
 Analyst imposes a classification (may not be
natural)
 Training data are usually tied to informational
categories and not spectral properties
 Remember diversity
 Training data selected may not be representative
 Selection of training data may be time consuming
and expensive
 May not be able to recognize special or unique
categories because they are not known or small
Supervised Classification
 Training data
 Specify corner points of selected areas
 Assume that the correct ID is known
 Often requires ancillary data (maps, photos, etc.)
 Field work often needed to verify
Supervised Classification
 Key Characteristics of Training areas
 Number of pixels
 Have several training areas for one category
 A total of at least 100 pixels per category
 Number depends on the number of categories,
their diversity, and resources available
 More areas also allow discarding ones that have too
high a variance
 Shape – not important, usually rectangular for ease
Supervised Classification
 More Key Characteristics
 Locations must be spread around the image and
be easily transferred from map to image
 Size must be large enough to estimate spectral
characteristics and variations
 Varies with sensor type and resolution
 Varies with heterogeneity of area
 Uniformity means that each training set should
be as homogenous as possible
Idealized Sequence
 Assemble information
 Conduct field studies
 Conduct preliminary study of scene to
determine landmarks and assess image
quality
 Identify training areas
 Evaluate training data
 Edit training data if necessary
Feature Selection
 Graphic Method – one of the first simple
feature selection aids
 Plot ±1σ in a bar graph
Feature Selection
 Cospectral parallelepiped plots (ellipse plots) give a visual representation of separability in two-dimensional feature space
 Use mean and SD of training class statistics for
each class, c, and band, k
 Parallelepipeds represent mean ±1σ of each band
for each class
Feature Selection
 Statistical Methods
 Statistically try to separate clusters
 Results in two types of errors
 A pixel is assigned to a class to which it does not
belong (error of commission)
 A pixel is not assigned to its appropriate class (error
of omission)
Parallelepiped
 Also known as box decision rule, or level-
slice procedure
 Based on the values of the training data
Parallelepiped (supervised)
 For each training region determine the range of
values observed in each band.
 These ranges form a spectral box (or parallelepiped)
which is used to classify this class type.
 Assign new image pixels to the parallelepiped which
it fits into best.
 Pixels outside all boxes can be unclassified or
assigned to the closest one.
 Problems arise with classes that exhibit high correlation between bands. This creates long ‘diagonal’ data sets that don’t fit well into a box.
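A minimal sketch of the box decision rule, assuming training_regions maps class names to (n_samples, n_bands) arrays:

```python
import numpy as np

def fit_boxes(training_regions):
    # For each class, record the min/max observed in each band (the parallelepiped)
    return {name: (s.min(axis=0), s.max(axis=0))
            for name, s in training_regions.items()}

def classify_pixel(pixel, boxes):
    # Return the first class whose box contains the pixel, else 'unclassified'.
    # Note: testing classes in sequence makes the result order dependent (see below).
    for name, (low, high) in boxes.items():
        if np.all(pixel >= low) and np.all(pixel <= high):
            return name
    return "unclassified"
```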
Parallelepiped example (figure: training classes plotted in spectral space, here using 2 bands)
Parallelepiped example continued (figure)

• Each class type defines a spectral box
• Note that some boxes overlap even though the classes are spatially separable
• This is due to band correlation in some classes
• Can be overcome by customising boxes
Parallelepiped example

• The algorithm tests a pixel to see if its spectral values fall within the bounds of each class.
• Pixels are sequentially tested against the defined
classes (i.e., class 1 is tested first, class 2 is tested
next, etc.).
• As soon as the test is passed, the pixel is classified
and the algorithm moves on to the next pixel.
• This classifier is mathematically simple.
• Problem: We will have ambiguities when working
with classes with overlapping bounds.
Parallelepiped example

• The checking procedure stops once the digital numbers associated with the investigated pixel lie within the bounds of a certain class.
– The classification result is order dependent.
• In other words, the final classification result
depends on how the classes are numbered.
• This is not a desirable feature.
• Solution: Minimum distance classifier.
Minimum Distance Classifier
• Any pixel in the scene is categorized using the
distances between:
– The digital number vector (spectral vector)
associated with that pixel, and
– The means of the information classes derived from
the training sets.
• The pixel is designated to the class with the shortest
distance.
• Some versions of this classifier use the standard
deviation of the classes to determine a minimum
distance threshold.
Minimum Distance Classifier
• If minimum distance is greater than the threshold,
the pixel will be considered unclassified.
– This pixel does not belong to any of the
classes represented by the training set.
• This classifier is slower than the parallelepiped
classifier
• This classifier is mathematically simple.
• Problem: We do not use the standard deviation
derived from the training data.
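A minimal sketch, assuming class_means maps class names to the mean spectral vectors derived from the training sets:

```python
import numpy as np

def min_distance_classify(pixel, class_means, threshold=None):
    # Assign the pixel to the class with the nearest mean; optionally leave it
    # unclassified if even the nearest mean exceeds a distance threshold
    names = list(class_means)
    dists = [np.linalg.norm(pixel - class_means[n]) for n in names]
    best = int(np.argmin(dists))
    if threshold is not None and dists[best] > threshold:
        return "unclassified"
    return names[best]
```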
Maximum likelihood
(supervised)
 For each training class the spectral variance and
covariance is calculated.
 The class can then be statistically modelled with a
mean vector and covariance matrix.
 This assumes the class is normally distributed, which is generally okay for natural surfaces
 Unidentified pixels can then be given a probability of
being in any one class.
 Assign the new pixel to the class with the highest
probability – or unclassified if all probabilities low.
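A minimal sketch using Gaussian log-likelihoods, assuming training_regions maps class names to (n_samples, n_bands) arrays; the unclassified threshold here is a hypothetical parameter:

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussians(training_regions):
    # Model each class with a mean vector and covariance matrix
    return {name: multivariate_normal(mean=s.mean(axis=0),
                                      cov=np.cov(s, rowvar=False))
            for name, s in training_regions.items()}

def ml_classify(pixel, models, min_log_prob=-50.0):
    # Assign the pixel to the most probable class, or leave it
    # unclassified if all probabilities are low
    scores = {name: m.logpdf(pixel) for name, m in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > min_log_prob else "unclassified"
```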
Maximum likelihood example (figure: equiprobability contours fitted around each training class)

• Normal probability distributions are fitted to each training class.
• The lines in the diagram show regions of equal probability.
• Point 1 would be assigned to class ‘pond culture’ as this is most probable.
• Point 2 would generally be unclassified, as the probabilities of fitting into one of the classes would be below threshold.
Maximum likelihood example
• Characteristics:
– Generally produces the most accurate
classification results.
– Assumes normal distribution of the
spectral data within the training classes.
– Mathematically complex.
– Computationally slow.
ISODATA (hybrid)
 Extends k-means. Also calculate standard deviation
for clusters.
 After mean location is re-calculated for each cluster
we can either:
 Combine clusters if centers are close.
 Split clusters with large standard deviation in any
dimension.
 Delete clusters that are too small.
 Then reclassify each pixel and repeat.
 Stop on max iterations or convergence limit.
 Assign class types to spectral clusters.
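A sketch of the extra decisions ISODATA adds on top of the k-means loop; the size, spread, and merge thresholds are hypothetical parameters:

```python
import numpy as np

def isodata_adjust(pixels, labels, centers,
                   min_size=10, max_std=15.0, min_center_dist=5.0):
    # One adjustment pass after a k-means style assignment step
    new_centers = []
    for i, c in enumerate(centers):
        members = pixels[labels == i]
        if len(members) < min_size:
            continue  # delete clusters that are too small
        stds = members.std(axis=0)
        if stds.max() > max_std:
            # split along the most-stretched band by nudging two copies apart
            offset = np.zeros_like(c)
            offset[stds.argmax()] = stds.max()
            new_centers.extend([c - offset, c + offset])
        else:
            new_centers.append(c)
    # combine centers that are closer than min_center_dist
    merged = []
    for c in new_centers:
        for j, m in enumerate(merged):
            if np.linalg.norm(c - m) < min_center_dist:
                merged[j] = (c + m) / 2
                break
        else:
            merged.append(c)
    return np.array(merged)  # then reassign pixels and repeat
```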
Example ISODATA (figure: three Band 1 vs Band 2 panels)

1. Data is clustered, but the blue cluster is very stretched in band 1.
2. The cyan and green clusters only have 2 or fewer pixels, so they will be removed.
3. Either assign outliers to the nearest cluster, or mark them as unclassified.
Bayes’s Classification
 Bayesian classification is based on the probability of
observing a particular class given a particular pixel
value
 Let’s play a simple game with two sets of dice
 One normal pair
 One augmented pair – 2 extra spots per side
 Player 1 selects a pair of dice randomly and rolls them, announcing only the outcome
 Player 2 names which type of dice were used
Bayes’s Classification
 To determine the decision boundary we can
list all possible outcomes, and how likely
each one is.
 For normal dice, values 2 – 12
 For augmented dice, values 6 – 16
 How likely is each?
 To get 2, there is only one way to roll
 To get 3, there are two ways (1 and 2, or 2 and 1)
Bayes’s Classification
 The histograms become discriminant
functions, and the decision boundary can be
set based on the most probable outcome in
any given case
 If a 7 is the outcome, it could have come from
either pair of dice, but more likely from the
standard pair
 If a 4, then it is most assuredly from the standard
pair
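A small sketch that enumerates the outcome histograms for both pairs of dice and picks the more probable source for each total:

```python
from collections import Counter

normal = [1, 2, 3, 4, 5, 6]
augmented = [f + 2 for f in normal]  # 2 extra spots per side: faces 3..8

def outcome_counts(faces):
    # Histogram of pair totals, out of the 36 equally likely rolls
    return Counter(a + b for a in faces for b in faces)

normal_counts = outcome_counts(normal)
aug_counts = outcome_counts(augmented)

# Decision rule: assign each total to the pair that produces it more often
for total in range(2, 17):
    n, a = normal_counts.get(total, 0), aug_counts.get(total, 0)
    guess = "normal" if n >= a else "augmented"
    print(f"total {total:2d}: normal {n}/36, augmented {a}/36 -> guess {guess}")
```

For a 7 this prints 6/36 vs 2/36, matching the slide: possible from either pair, but more likely from the standard one.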
Bayes’s Classification
 Now lets assume we are trying to guess the
type of groundcover
 We could estimate the probabilities from
training areas
 We would generate histograms to estimate the
probability function for each class
 Use these probabilities to separate pixels
K-Nearest Neighbors (figure: an unknown record ‘?’ among labeled training records)

k Nearest Neighbor requires 3 things:
 The set of stored records
 A distance metric to compute distance between records
 The value of k, the number of nearest neighbors to retrieve

To classify an unknown record:
 Compute the distance to the training records
 Identify the k nearest neighbors
 Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote)
k Nearest Neighbor
 Compute the distance between two points:
 Euclidean distance: d(p, q) = √( Σᵢ (pᵢ − qᵢ)² )
 Hamming distance (overlap metric)
 Determine the class from the nearest neighbor list
 Take the majority vote of class labels among the k nearest neighbors
 Optionally weight each vote by a factor w = 1/d²
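A minimal sketch of both voting schemes, assuming train_X is an (n_records, n_bands) array and train_y holds the matching class labels:

```python
import numpy as np
from collections import Counter

def knn_classify(pixel, train_X, train_y, k=3, weighted=False):
    # Distances to every training record, then the k nearest
    dists = np.linalg.norm(train_X - pixel, axis=1)
    nearest = np.argsort(dists)[:k]
    if not weighted:
        # Plain majority vote among the k nearest neighbors
        return Counter(train_y[i] for i in nearest).most_common(1)[0][0]
    votes = {}
    for i in nearest:
        w = 1.0 / (dists[i] ** 2 + 1e-10)  # w = 1/d^2; epsilon guards exact matches
        votes[train_y[i]] = votes.get(train_y[i], 0.0) + w
    return max(votes, key=votes.get)
```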
k Nearest Neighbor (figure: an unknown record ‘?’ surrounded by square and triangle classes)

 k = 1: belongs to the square class
 k = 3: belongs to the triangle class
 k = 7: belongs to the square class

 Choosing the value of k:
 If k is too small, the result is sensitive to noise points
 If k is too large, the neighborhood may include points from other classes
 Choose an odd value for k, to eliminate ties
k Nearest Neighbor
 The accuracy of any NN-based classification, prediction, or recommendation depends on the data model (representation and distance metric), no matter what specific NN algorithm is used.
 Scaling issues
 Attributes may have to be scaled to prevent distance
measures from being dominated by one of the
attributes.
 Examples
 Height of a person may vary from 4’ to 6’
 Weight of a person may vary from 100 lbs to 300 lbs
 Income of a person may vary from $10k to $500k

 Nearest Neighbor classifiers are lazy learners
 Models are not built explicitly, unlike eager learners
k Nearest Neighbor
Advantages
 Simple technique that is easily implemented
 Building model is cheap
 Extremely flexible classification scheme
 Well suited for
 Multi-modal classes
 Records with multiple class labels

 Error rate at most twice the Bayes error rate
 Cover & Hart paper (1967)
 Can sometimes be the best method
 Michihiro Kuramochi and George Karypis, Gene Classification using Expression
Profiles: A Feasibility Study, International Journal on Artificial Intelligence Tools.
Vol. 14, No. 4, pp. 641-660, 2005
 K nearest neighbor outperformed SVM for protein function prediction using
expression profiles
k Nearest Neighbor
Disadvantages
 Classifying unknown records is relatively expensive
 Requires distance computation of k-nearest
neighbors
 Computationally intensive, especially when
the size of the training set grows
 Accuracy can be severely degraded by
the presence of noisy or irrelevant
features
Many Classifiers
Ground truth
 Ideally the training regions need to be based
on ground observation.
 They should be large enough to capture all
the spectral variability in the class type.
 E.g. different types of forest, shallow water and
deep ocean etc.
 Do not need to get too detailed otherwise
classes will not be spectrally separable.
Ancillary Data
 Acquired by other means
 Used to assist in classification or analysis
 Maps, reports, other data
 Primary requirements
 Available digitally
 Pertain to the problem
 Compatible with the RS data
Ancillary Data
 Incompatibility is a serious problem
 Physical – digital formats
 Logical – data usually collected for another
reason
 Scale
 Resolution
 Date
 Accuracy
Ancillary Data
 Stratification – subdivide the image into regions that are easy to define using ancillary data
 Elevation could be used to look at alpine
vegetation separately from lowland
 Postclassification sorting – examine
confusion matrix and look in more detail at
confused classes
Post classification
 Can check non-training regions with more ground
truth if available.
 Calculate classification statistics.
 Confusion Matrix: Columns show ground truth, rows
show how many pixels are assigned to each class.
 Overall accuracy: Total correct pixels/total pixels
 Commission errors: Incorrect pixels assigned to a class
 Omission errors: Pixels in class that are assigned a
different class
 Visually check to see if any major errors or
unwanted features.
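A minimal sketch of these statistics, assuming a confusion matrix laid out as described above (columns = ground truth, rows = assigned class):

```python
import numpy as np

# Hypothetical 3-class confusion matrix
cm = np.array([[50,  3,  2],
               [ 4, 40,  5],
               [ 1,  2, 43]])

overall_accuracy = np.trace(cm) / cm.sum()     # total correct pixels / total pixels
commission = 1 - np.diag(cm) / cm.sum(axis=1)  # incorrect pixels assigned to each class
omission = 1 - np.diag(cm) / cm.sum(axis=0)    # class pixels assigned a different class

print(overall_accuracy, commission, omission)
```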
Classification and Regression
Tree Analysis
 Classification and Regression Tree Analysis
(CART) is a method to incorporate ancillary
data into image classification
 Requires accurate training data, but not prior
knowledge of the role of the variables
 An advantage is that it identifies useful variables and separates them from those that don’t contribute to the classification
Fuzzy Clustering
 Traditional methods allow a pixel to be
identified only with a single cluster
 There are many processes which can make
matching problematic
 So many pixels will be incorrectly labeled
 Fuzzy logic allows partial membership
 Instead of a water pixel, it could be 0.7 water and
0.3 forest.
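A small sketch of partial membership, using a fuzzy-c-means style weighting based on inverse distances to the cluster centers (the centers and fuzzifier m here are hypothetical):

```python
import numpy as np

def fuzzy_memberships(pixel, centers, m=2.0):
    # Membership of one pixel in each cluster; memberships sum to 1
    d = np.linalg.norm(centers - pixel, axis=1) + 1e-10
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum()

centers = np.array([[10.0, 12.0],    # e.g. a 'water' cluster center
                    [40.0, 55.0]])   # e.g. a 'forest' cluster center
print(fuzzy_memberships(np.array([18.0, 24.0]), centers))  # mostly water, partly forest
```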
Neural Networks
 Artificial Neural Networks (ANN) are
computer programs that simulate the brain
 Establishment of linkages and then reinforcement
of linkages between input and output.
 Generally comprised of three elements
 Input layers – source data
 Hidden layers – association by weights
 Output layer - classes
Neural Networks
 There can be forward propagation – the
normal training to classification sequence
 Backward propagation is a retrospective
analysis of input and output which allows
adjustment of the weights
 This creates a transfer function
 Quantitative link between input and output
 Weights may show some bands are more effective for
certain classes and other bands for different classes
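A toy sketch of the three elements (input, hidden, output layers) as one forward-propagation pass; the random weights stand in for what training by backward propagation would learn:

```python
import numpy as np

rng = np.random.default_rng(0)
n_bands, n_hidden, n_classes = 4, 8, 3
W1 = rng.normal(size=(n_bands, n_hidden))    # input layer -> hidden layer weights
W2 = rng.normal(size=(n_hidden, n_classes))  # hidden layer -> output layer weights

def forward(pixel):
    # Forward propagation: band values in, one score per class out
    hidden = np.tanh(pixel @ W1)   # hidden layer: association by weights
    scores = hidden @ W2           # output layer: classes
    return scores.argmax()         # predicted class index

print(forward(np.array([0.3, 0.5, 0.2, 0.7])))
```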
Contextual Classification
 Context is derived from spatial relationships
within the image
 Can operate on either classified or unclassified
scenes
 Usually some classification has been done
 It reassigns pixels as appropriate based on
location (context)